B-Trees

A B-Tree is a self-balancing tree data structure that maintains sorted data and allows searches, insertions, and deletions in logarithmic time. Unlike binary search trees, which store one key per node, B-Trees store many keys per node — each node may hold hundreds or thousands of entries. This design makes B-Trees the natural fit for databases and file systems, where data lives on disk and traversing many small nodes is prohibitively expensive.

B-Trees were invented by Rudolf Bayer and Edward McCreight in 1970 while working at Boeing Research Labs. Together with their variant B+ Trees, they underlie the index layer of virtually every major database engine and file system in production today.

Rendering diagram...

Why B-Trees?#

The Disk I/O Problem#

Modern databases and file systems store far more data than fits in RAM. When data lives on disk, the dominant cost is I/O latency — not CPU time. A single disk read takes ~10 ms while a memory access takes ~100 ns: a 100,000× difference. Every additional level in a tree is another disk read, so tree height is a direct multiplier on query cost.

Binary search trees handle this poorly. With 1 million keys, a balanced BST is ~20 levels deep, requiring up to 20 disk reads per search. A B-Tree of order 1001 (where each node fits ~1000 keys on a single disk page) stays within 2–3 levels for the same dataset — roughly a 10× reduction in disk reads.

The Key Insight: Match Node Size to Disk Block#

Disk hardware reads data in fixed-size blocks (typically 4 KB or 16 KB). Reading one byte costs the same as reading the entire block. B-Trees exploit this by packing each node with as many keys as fit in one disk block. Loading a node costs the same whether it holds 1 key or 1000 — but 1000 keys per node means far fewer nodes to visit, and thus far fewer disk reads per operation.

PropertyBinary TreeB-Tree
Keys per node1Many (often 100s–1000s)
Children per node2Many (keys + 1)
Height for 1M keys~20 levels~2-4 levels
Disk reads per search~20~2-4
Node sizeOne key + 2 pointersTuned to fill a disk block
Optimized forIn-memory operationsDisk-based storage
Rendering diagram...
Rendering diagram...

Real-World Usage#

B-Trees (and their variant B+ Trees) are the default index structure in virtually every major storage system:

SystemUse of B-Tree
PostgreSQL, MySQL (InnoDB)Primary and secondary indexes
SQLiteEntire database file is one B-Tree
NTFS, HFS+, ext4Directory and file metadata indexes
MongoDB (WiredTiger)Collection and index storage
macOS APFSFile system B-Tree for all metadata

B-Tree Properties#

A B-Tree of order m (also called the degree or branching factor) controls how wide the tree is. A larger order means each node holds more keys and branches into more children, producing a shorter, wider tree. Formally, the following constraints must hold at all times:

PropertyDescription
Maximum keysEach node has at most m-1 keys
Minimum keys (non-root)Each node has at least ⌈m/2⌉-1 keys
Root minimumRoot has at least 1 key (or is empty)
ChildrenNon-leaf node with k keys has k+1 children
BalanceAll leaves are at the same level
OrderingKeys within a node are sorted
B-Tree of Order 5 (m=5)
Rendering diagram...
Rendering diagram...

B-Tree Node Structure

A node containing multiple keys and child pointers

B-Tree node with keys array and children pointers

Search Operation#

Searching in a B-Tree follows the same principle as a binary search tree — compare the target against the node's keys and descend into the correct subtree. The key difference is that each node holds many keys, so at each level you scan all of them to determine which of the many children to enter next. This multi-way branching is exactly what keeps the tree shallow and minimizes disk reads.

Searching for key 25
Rendering diagram...

B-Tree Search

Find a key in the B-Tree

Time:O(log n)Space:O(1)Best:O(1)Worst:O(log n)

Insertion#

Insertion in a B-Tree works in two phases:

  1. Descend to the leaf node where the key belongs, following the same path as a search
  2. Insert the key in sorted position; if the node is now full (exceeds the maximum key count), split it and push the middle key up to the parent

The implementation uses a top-down splitting strategy: any full node encountered on the way down is split proactively, before it actually overflows. This ensures there is always room in the parent to receive a promoted key, avoiding the need to backtrack up the tree after an insertion.

ScenarioAction
Leaf has roomSimply insert key in sorted position
Leaf is fullSplit the node: middle key moves up to parent
Parent is fullRecursively split parent nodes
Root splitsCreate new root, tree grows in height
Before Insert 25 (Order 5)
Rendering diagram...
After Insert 25: Node Splits
Rendering diagram...

B-Tree Insert

Insert a key into the B-Tree, splitting nodes as needed

Time:O(log n)Space:O(1)Best:O(log n)Worst:O(log n)

Deletion#

Deletion is the most complex B-Tree operation. Removing a key is straightforward when the target node has more than the minimum required keys — but if it does not, the tree must be restructured to preserve the balance invariant.

There are two repair strategies: borrowing a key from an adjacent sibling (rotated through the parent) and merging the node with a sibling. A merge reduces the parent's key count by one, which may require rebalancing the parent as well. In the worst case, restructuring cascades all the way to the root, shrinking the tree's height by one level.

CaseConditionAction
Case 1Key in leaf, node has extra keysSimply remove the key
Case 2aKey in internal node, left child has extraReplace with predecessor, delete from child
Case 2bKey in internal node, right child has extraReplace with successor, delete from child
Case 2cKey in internal node, both children minimalMerge children, delete from merged node
Case 3Key not in node, child has minimal keysBorrow from sibling or merge, then recurse
Delete 20: Borrow from Sibling
Rendering diagram...
After: Borrowed 40 via Parent
Rendering diagram...

B-Tree Delete

Delete a key from the B-Tree while maintaining properties

Time:O(log n)Space:O(1)Best:O(log n)Worst:O(log n)

Complexity Analysis#

OperationTime ComplexityDisk I/O
SearchO(log_m n)O(log_m n)
InsertO(log_m n)O(log_m n)
DeleteO(log_m n)O(log_m n)
Range QueryO(log_m n + k)O(log_m n + k/m)
SpaceO(n)O(n/m) disk blocks

m = order (branching factor), n = number of keys, k = keys in range

Complete B-Tree Implementation

Full B-Tree with insert, search, and traversal

Complete B-Tree with traversal and search

Applications#

ApplicationWhy B-Tree?
Databases (MySQL, PostgreSQL)Efficient disk-based indexing with range queries
File Systems (NTFS, HFS+, ext4)Directory indexing and file metadata
MongoDBDocument indexing for fast queries
Key-Value StoresOrdered key storage with fast lookups
Search EnginesIndex structures for document retrieval

Reviewing AI-Generated Code#

B-Tree implementations require careful attention to node splitting and key counts. AI-generated code frequently contains off-by-one errors in these areas — subtle mistakes that may pass basic insertion tests but silently corrupt the tree structure under specific conditions.

IssueWhat to CheckCommon AI Mistake
Order definitionSome define order as max keys, others as max childrenInconsistent order interpretation
Split pointMiddle key goes up, not left or rightWrong key moves to parent
Key countChildren = keys + 1 for internal nodesWrong number of children after split
Leaf flagTrack if node is leaf or internalTreating all nodes the same
Minimum keysNon-root nodes have minimum key requirementAllowing underflow

Spotting AI B-Tree Bugs

Common errors in AI-generated B-Tree code

Correct split: middle key goes to parent

Questions to Ask When Reviewing B-Tree Code#

  1. "What does 'order' mean in your implementation?" — Clarify whether it refers to max keys or max children. Different textbooks and implementations use different conventions.
  2. "Where does the middle key go during a split?" — It must move up to the parent, not remain in either of the two split halves.
  3. "Walk me through the tree after inserting these values" — Trace a concrete example to verify that splits produce the correct structure.
  4. "What happens when a node has fewer than the minimum keys after deletion?" — The code must borrow from siblings or merge nodes; ignoring this breaks the B-Tree invariant.

Summary#

ConceptKey Point
StructureMultiple keys per node, all leaves at same level
Order mMax m children, max m-1 keys, min ceil(m/2)-1 keys
HeightO(log_m n) - very shallow for large m
SearchMulti-way search at each node
InsertInsert in leaf, split if overflow
DeleteRemove and rebalance (borrow/merge)
Disk OptimizationNode size matches disk block size
ApplicationsDatabases, file systems, key-value stores

B-Trees are excellent for disk-based storage. For even better range query performance, explore B+ Trees, where all data lives in the leaf nodes and the leaves are linked together in a list — enabling fast sequential scans in addition to point lookups.