simdjson iterators and STL interoperability #2370

lano1106 started this conversation in General
Apr 23, 2025 * 2 comments * 18 replies

lano1106
Apr 23, 2025

I have gone back to reading Scott Meyers' classic, Effective STL.

It contains Item 5:
Prefer range member functions to their single-element counterparts.

It is a very convincing text. It made me hunt through my codebase for single-element insertion calls to replace with their range counterparts, and I stumbled on this:

std::vector<uint8_t> leverageVec;

simdjson::ondemand::array levArray{field.value().get_array()};
for (uint64_t v : levArray)
    leverageVec.push_back(v);

I was so happy to find a prime candidate for the refactoring!

leverageVec.insert(std::cend(leverageVec), std::begin(levArray), std::end(levArray));

But the compiler did not let me do it:

/usr/include/c++/14.2.1/bits/stl_iterator_base_types.h:252:57: error: no type named 'iterator_category' in 'struct std::iterator_traits >'
252 | input_iterator_tag>::value>;

Besides, I am not sure the simdjson proxy object would have performed the type conversion correctly, since the conversion that is available is uint64_t...
(Maybe boost::iterator_adaptor could help, but that starts to be a complex solution for a simple task.)

I have read in the simdjson documentation that the number of elements in a JSON array/object is expensive to compute, but I think it might be possible to assign an output_iterator tag to the simdjson iterators so that they can be used with STL containers...
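To make the compiler error above concrete, here is a minimal sketch using only the standard library. The types `BareIter` and `TaggedIter` and the helpers `has_category` and `collect` are hypothetical names invented for illustration; they are not simdjson types. The sketch assumes C++17 or later. It shows that an iterator which is good enough for a range-based for loop can still lack the `iterator_category` member that the range overload of `std::vector::insert` looks up through `std::iterator_traits`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical iterator: enough for a range-based for loop (operator*,
// pre-increment, operator!=), but it declares no iterator member types.
struct BareIter {
    std::uint64_t cur;
    std::uint64_t operator*() const { return cur; }
    BareIter& operator++() { ++cur; return *this; }
    bool operator!=(const BareIter& o) const { return cur != o.cur; }
};

// The same idea plus the five member types std::iterator_traits reads.
struct TaggedIter {
    using iterator_category = std::input_iterator_tag;
    using value_type = std::uint64_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const std::uint64_t*;
    using reference = std::uint64_t;

    std::uint64_t cur;
    reference operator*() const { return cur; }
    TaggedIter& operator++() { ++cur; return *this; }
    TaggedIter operator++(int) { TaggedIter old = *this; ++cur; return old; }
    bool operator==(const TaggedIter& o) const { return cur == o.cur; }
    bool operator!=(const TaggedIter& o) const { return cur != o.cur; }
};

// Detects whether std::iterator_traits<It> exposes iterator_category.
template <class It, class = void>
struct has_category : std::false_type {};
template <class It>
struct has_category<It, std::void_t<typename std::iterator_traits<It>::iterator_category>>
    : std::true_type {};

static_assert(!has_category<BareIter>::value, "no category: range insert is rejected");
static_assert(has_category<TaggedIter>::value, "category present: range insert is accepted");

// Range insertion of uint64_t values into a vector<uint8_t>.
// Each value is implicitly narrowed to uint8_t on insertion.
std::vector<std::uint8_t> collect(std::uint64_t first, std::uint64_t last) {
    assert(first <= last);
    std::vector<std::uint8_t> out;
    out.insert(out.end(), TaggedIter{first}, TaggedIter{last});
    return out;
}
```

Calling `collect(1, 5)` fills the vector with the values 1 through 4. Note that the narrowing from `uint64_t` to `uint8_t` happens silently here, which is the conversion concern raised above.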


lemire
Apr 23, 2025
Maintainer

If you have C++20, please see
https://github.com/simdjson/simdjson/blob/master/doc/basics.md#2-use-tag_invoke-for-custom-types-c20

You should be able to just do...

std::vector<uint8_t> leverageVec = val.get<std::vector<uint8_t>>();

It is also possible prior to C++20, although a bit more busy work is needed.

lemire Apr 23, 2025
Maintainer

it generates a bigger binary...

After building in release mode, you should be able to examine the size of the various symbols with standard tools. The automated std::vector stuff should be quite small: it should not generate a binary larger than what you would get by writing it by hand. It is just template metaprogramming. If not, file an issue.

lano1106 Apr 23, 2025
Author

I bundle debug symbols with release version... there is no better way to diagnose an occasional core dump...

lemire Apr 23, 2025
Maintainer

it seems to be breakage from the old age C++ design philosophy that using an abstraction should not be less efficient than if you were handwriting it...

I don't think so. That statement has to do with compute efficiency, and we try hard to maintain high efficiency throughout.

A simdjson array is not an array of uint8_t but rather an array of simdjson_result<value>.

You can't easily mix and match iterators of different types in STL. The following might not compile:

std::vector<uint8_t> fun(std::vector v) {
    std::vector<uint8_t> z;
    z.insert(z.end(), v.begin(), v.end());
    return z;
}

So I do not think we are particularly inconvenient.
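As a rough illustration of the point about mismatched element types, the sketch below converts explicitly with `std::transform` instead of relying on a range `insert`. `MaybeByte` and `unwrap_all` are hypothetical names made up for this example; `MaybeByte` is only a stand-in for a result-like wrapper and is not simdjson's type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical result-like element: a value plus a success flag,
// with no implicit conversion to uint8_t.
struct MaybeByte {
    std::uint8_t value;
    bool ok;
};

// Copies the payload of every element into a vector<uint8_t>.
// z.insert(z.end(), in.begin(), in.end()) would not compile here,
// because MaybeByte does not convert to uint8_t on its own.
std::vector<std::uint8_t> unwrap_all(const std::vector<MaybeByte>& in) {
    std::vector<std::uint8_t> out;
    out.reserve(in.size());
    std::transform(in.begin(), in.end(), std::back_inserter(out),
                   [](const MaybeByte& m) {
                       assert(m.ok);  // a real caller would handle the error instead
                       return m.value;
                   });
    return out;
}
```

The explicit lambda makes the conversion and the error check visible at the call site, at the cost of element-by-element insertion.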

Now, you might be able to convert our array::iterator from an iterator over simdjson_result<value> to an iterator over uint8_t... It might be cool to add this to simdjson... but that's an extra feature.
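One way such a converting iterator could look is sketched below. This is not simdjson code and not a proposed simdjson API: `Wrapped`, `ConvertingIter` and `append_converted` are hypothetical names, and the source range is an ordinary `std::vector` so that the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical result-like element: converts to uint8_t only on request.
struct Wrapped {
    std::uint64_t raw;
    explicit operator std::uint8_t() const { return static_cast<std::uint8_t>(raw); }
};

// Input-iterator adapter: walks an underlying iterator `It` and yields `Out`
// by explicit conversion on each dereference.
template <class It, class Out>
class ConvertingIter {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = Out;
    using difference_type = std::ptrdiff_t;
    using pointer = const Out*;
    using reference = Out;

    explicit ConvertingIter(It it) : it_(it) {}
    reference operator*() const { return static_cast<Out>(*it_); }
    ConvertingIter& operator++() { ++it_; return *this; }
    ConvertingIter operator++(int) { ConvertingIter old = *this; ++it_; return old; }
    bool operator==(const ConvertingIter& o) const { return it_ == o.it_; }
    bool operator!=(const ConvertingIter& o) const { return !(*this == o); }

private:
    It it_;
};

// Appends every element of `in` to `out` with a single range insert.
void append_converted(std::vector<std::uint8_t>& out, const std::vector<Wrapped>& in) {
    using Src = std::vector<Wrapped>::const_iterator;
    using Conv = ConvertingIter<Src, std::uint8_t>;
    const std::size_t before = out.size();
    out.insert(out.end(), Conv(in.begin()), Conv(in.end()));
    assert(out.size() == before + in.size());
}
```

Because the adapter only claims `std::input_iterator_tag`, the container cannot know the element count in advance, so this buys the range-insert syntax rather than a single up-front allocation.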

lemire Apr 23, 2025
Maintainer

@lano1106 This being said, we could do better, for sure, but I am really not sure I want people passing around On-Demand simdjson iterators as if they were STL iterators.

It works with the DOM API because the array is an actual thing. (It is supported right now.) In On-Demand, there is no array... we are faking it.

So I want to entice users to move to their materialized types as soon as possible.

I do not want people to use a simdjson array as if it were an STL container. It is really not. It is a particular abstraction.

lemire Apr 23, 2025
Maintainer

To elaborate on why we don't want people to use iterators in On-Demand as STL iterators, there is one key ingredient: arrays should be iterated only once, and then the array is no longer valid (unless it is rewound).

The STL did not plan for this: it is designed for actual containers, not emulated ones like On-Demand.
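The iterate-once behaviour can be illustrated with a standard-library analogy. The sketch below uses `std::istream_iterator`, which is also single-pass; it is only an analogy and says nothing about how simdjson implements its iterators. `drain` and `second_pass_is_empty` are hypothetical helper names.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

// Reads all whitespace-separated ints from `in` in one pass.
std::vector<int> drain(std::istream& in) {
    assert(!in.bad());  // a stream in a hard-error state cannot be read at all
    return std::vector<int>(std::istream_iterator<int>(in),
                            std::istream_iterator<int>());
}

// Returns true when the first traversal yields data and a second
// traversal of the same stream yields nothing.
bool second_pass_is_empty(std::istream& in) {
    const std::vector<int> first = drain(in);
    const std::vector<int> second = drain(in);
    return !first.empty() && second.empty();
}
```

Generic code that walks a range twice would silently see an empty second pass here, which is the kind of surprise that moving the data into your own container avoids.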

So, really, we do want people to move to their own containers as soon as possible.

Now, we can improve this support in various ways, for sure.

lemire
Apr 23, 2025
Maintainer

@lano1106 If your goal is to append to an existing container, you can do it like so:

std::vector<uint32_t> array = {0, 0};

simdjson::padded_string json = R"({"data" : [1,2,3,4]})"_padded;

simdjson::ondemand::parser parser;
simdjson::ondemand::document d = parser.iterate(json);

d["data"].get<std::vector<uint32_t>>(array);
// array is now {0,0,1,2,3,4}

This works now with C++20.

lano1106 Apr 23, 2025
Author

It is nice to know...

Assignment was what I wanted in the context of my example, but this newly shared information will certainly be useful in the future.
