German strings

note: If you only want to know what this string type has to do with Germany, you'll have to wait till the end ;)

`<const> char*` - may live on the heap, on the stack, or in static storage; requires strlen to do anything with it, but takes up just 8 bytes. In this talk I am talking strictly about modern consumer computers, which are predominantly 64-bit and little-endian.

note: Based on totally trustworthy benchmarks! Latest clang and libstdc++. todo: maybe log scale for Y?

note: Although the three big implementations make different tradeoffs while staying within the bounds of the std::string interface, they have a lot in common.

note: We check the size, but then basically get the data pointers and call memcmp. For a non-small string that might be bad, since the data lives behind a pointer. I'm very grateful for how simple the libstdc++ implementation is, as the assembly of both MSVC and libc++ is much longer and more complex due to their SSO representations.
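The comparison described in this note can be sketched roughly as follows. This is an illustration only, not the libstdc++ source; the function name `equal_size_then_memcmp` is hypothetical.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper, NOT the libstdc++ implementation:
// compare the sizes first, then hand both data pointers to memcmp.
inline bool equal_size_then_memcmp(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) {
        return false;  // cheap early exit, no character data touched
    }
    // For a non-small string, data() points to heap memory, so this call
    // has to follow the pointer even if the very first byte differs.
    return std::memcmp(a.data(), b.data(), a.size()) == 0;
}
```

Two strings of equal length that differ in the first character still pay for reaching the character data, which is the cost the note hints at.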

note: We will go over each part now

note: This will take up 24 bytes, because the union wants to be aligned, which introduces padding.
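A minimal sketch of how alignment can push such a type to 24 bytes, assuming a 64-bit target. All type and member names here (`NaiveLayout`, `PackedLayout`, `LongPart`, `Payload`, `Tail`) are hypothetical and may not match the layout shown on the slide.

```cpp
#include <cstdint>

static_assert(sizeof(void*) == 8, "sketch assumes a 64-bit target");

// Hypothetical naive layout: a size field followed by a union of
// "inline bytes" and "prefix + pointer".
struct LongPart {
    char prefix[4];
    const char* ptr;       // 8-byte alignment: 4 padding bytes after prefix
};
union Payload {
    char inline_data[12];
    LongPart big;          // makes the union 8-byte aligned and 16 bytes large
};
struct NaiveLayout {
    std::uint32_t size;    // 4 bytes, then 4 padding bytes
    Payload payload;       // starts at offset 8
};                         // 24 bytes in total

// Hypothetical tighter layout: hoist the prefix out of the union
// so that it fills the gap after the size field.
union Tail {
    char rest[8];
    const char* ptr;
};
struct PackedLayout {
    std::uint32_t size;    // offset 0
    char prefix[4];        // offset 4
    Tail tail;             // offset 8, no padding needed
};                         // 16 bytes in total
```

On such a target `sizeof(NaiveLayout)` comes out as 24 while `sizeof(PackedLayout)` is 16; the only difference is where the pointer's alignment requirement lands.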

note: This relies on little-endian byte order.

note: This is approximately 2x faster than std::memcmp for 4-byte values and is branchless. It may arguably count as SWAR.
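One way a branchless 4-byte comparison can be written is sketched below. It is not necessarily the code from the slide, and the names `load_prefix_be` and `compare_prefix4` are hypothetical. The idea is to turn each 4-byte prefix into an integer whose numeric order matches lexicographic byte order, then compare the integers.

```cpp
#include <cstdint>

// Assemble four bytes into an integer in which the first byte is the
// most significant one, so integer order equals memcmp order.
// On little-endian machines compilers typically reduce this to a
// 4-byte load plus a byte swap.
inline std::uint32_t load_prefix_be(const char* p) {
    const unsigned char* u = reinterpret_cast<const unsigned char*>(p);
    return (std::uint32_t{u[0]} << 24) | (std::uint32_t{u[1]} << 16) |
           (std::uint32_t{u[2]} << 8)  |  std::uint32_t{u[3]};
}

// Returns a negative, zero, or positive value, like memcmp(a, b, 4),
// computed without data-dependent branches.
inline int compare_prefix4(const char* a, const char* b) {
    const std::uint32_t x = load_prefix_be(a);
    const std::uint32_t y = load_prefix_be(b);
    return static_cast<int>(x > y) - static_cast<int>(x < y);
}
```

Both arguments must point to at least four readable bytes; a real string type would guarantee that by zero-padding short prefixes.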

2023 viral data aggregation challenge

note: Very fresh and definitely not biased. Aggregating a large volume of data from a CSV.

note: Maybe a demo

note: We basically took an unoptimized version that copied strings from the file into std::string.

note: The optimized version already used string_views and rarely compared them. The strings are city names and are therefore mostly small.
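To illustrate the difference between copying into std::string and using string_views, here is a minimal sketch of splitting one input record without copying. The helper `split_record` and the `;` separator are assumptions made for illustration and are not taken from the benchmarked code.

```cpp
#include <string_view>
#include <utility>

// Hypothetical helper: split "name<sep>value" into two views that point
// into the original buffer; no characters are copied or allocated.
// The default separator ';' is an assumption about the input format.
inline std::pair<std::string_view, std::string_view>
split_record(std::string_view line, char sep = ';') {
    const auto pos = line.find(sep);
    if (pos == std::string_view::npos) {
        return {line, std::string_view{}};  // no separator: whole line is the name
    }
    return {line.substr(0, pos), line.substr(pos + 1)};
}
```

The returned views stay valid only as long as the underlying buffer does, which is the usual price of avoiding the copy.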

note: memory mapping is inadvisable

note: Because of this guy

Plan

We love strings...

So why would we need any more?

Anatomy of std::string

What's lacking?

What can we do differently?

Sometimes owning the data - string classes

Pros

Cons

Prefix

Show me the code!

Show me the actual code!

Looking at benchmarks

Applications - 1BRC

Applications - Toy LSM-tree KV-store

Applications - Real World

Why German though?

Sources and further info

Thank you for your attention
