B-Trees: More Than I Thought I'd Want to Know

lysdexic@programming.dev · 2 months ago

B-Trees: More Than I Thought I'd Want to Know

arendjr@programming.dev · edit-2 2 months ago

Apart from all the interesting performance characteristics and their use in databases, the reason I tend to recommend B-Tree maps over hash maps for ordinary programming is consistent iteration order. It is simply too easy to run into a situation where you think iteration order doesn’t matter, but then it turns out it does in some subtle unforeseen way.

Of course it’s the way of our trade that unforeseen things cause bugs. But if there’s one kind of bug that is particularly annoying, it’s the hard-to-reproduce ones: those introduced by timing issues or (semi-)randomness. The moment you start iterating over a hash map you risk falling prey to the second one. So I’ll just prefer to default to a B-Tree map or set instead.

lysdexic@programming.dev · 2 months ago

the reason I tend to recommend B-Tree maps over hash maps for ordinary programming is consistent iteration order.

Hash maps tend to be used to take advantage of constant time lookup and insertion, not iterations. Hash maps aren’t really suites for that usecase.

Programming languages tend to provide two standard dictionary containers: a hash map implementation suited for lookups and insertions, and a tree-based hash map that supports sorting elements by key.

arendjr@programming.dev · 2 months ago

Oh, I agree, they both have their use cases. But that doesn’t mean there’s not plenty of situations where the performance is effectively irrelevant, but where people tend to default to using a hash map because they heard it’s faster (probably because lookups are O(1) indeed). So that’s where I would say, as long as performance doesn’t matter it’s better to default to B-Tree maps than to hash maps, because the chance of avoiding bugs is more valuable than immeasurable performance benefits (not to mention that for smaller data sets B-Tree maps can often outperform hash maps due to better cache locality, but again that’s hardly relevant since the data set is small anyway).

lysdexic@programming.dev · 2 months ago

So that’s where I would say, as long as performance doesn’t matter it’s better to default to B-Tree maps than to hash maps, because the chance of avoiding bugs is more valuable than immeasurable performance benefits (…)

I don’t quite follow. What leads you to believe that a B-Tree map implementation would have a lower chance of having a bug when you can simply pick any standard and readily available hash map implementation?

Also, you fail to provide any concrete reasoning for b-tree maps. It’s not performance on any of the dictionary operationd, and bugs ain’t it as well. What’s the selling point that you are seeing?

arendjr@programming.dev · 2 months ago

I mentioned it in the first comment:

the reason I tend to recommend B-Tree maps over hash maps for ordinary programming is consistent iteration order. It is simply too easy to run into a situation where you think iteration order doesn’t matter, but then it turns out it does in some subtle unforeseen way.

I’m not talking about bugs in the implementation of the map itself, I’m talking about unforeseen consequences in the user’s code since they may not anticipate properly for the randomness in iteration.