B-Trees

A B-Tree is a self-balancing tree data structure that maintains sorted data and allows searches, insertions, and deletions in logarithmic time. Unlike binary search trees, which store one key per node, B-Trees store many keys per node — each node may hold hundreds or thousands of entries. This design makes B-Trees the natural fit for databases and file systems, where data lives on disk and traversing many small nodes is prohibitively expensive.

B-Trees were invented by Rudolf Bayer and Edward McCreight in 1970 while working at Boeing Research Labs. Together with their variant B+ Trees, they underlie the index layer of virtually every major database engine and file system in production today.

Rendering diagram...

Why B-Trees?#

The Disk I/O Problem#

Modern databases and file systems store far more data than fits in RAM. When data lives on disk, the dominant cost is I/O latency — not CPU time. A single disk read takes ~10 ms while a memory access takes ~100 ns: a 100,000× difference. Every additional level in a tree is another disk read, so tree height is a direct multiplier on query cost.

Binary search trees handle this poorly. With 1 million keys, a balanced BST is ~20 levels deep, requiring up to 20 disk reads per search. A B-Tree of order 1001 (where each node fits ~1000 keys on a single disk page) stays within 2–3 levels for the same dataset — roughly a 10× reduction in disk reads.

The Key Insight: Match Node Size to Disk Block#

Disk hardware reads data in fixed-size blocks (typically 4 KB or 16 KB). Reading one byte costs the same as reading the entire block. B-Trees exploit this by packing each node with as many keys as fit in one disk block. Loading a node costs the same whether it holds 1 key or 1000 — but 1000 keys per node means far fewer nodes to visit, and thus far fewer disk reads per operation.

Property	Binary Tree	B-Tree
Keys per node	1	Many (often 100s–1000s)
Children per node	2	Many (keys + 1)
Height for 1M keys	~20 levels	~2-4 levels
Disk reads per search	~20	~2-4
Node size	One key + 2 pointers	Tuned to fill a disk block
Optimized for	In-memory operations	Disk-based storage

Rendering diagram...

Real-World Usage#

B-Trees (and their variant B+ Trees) are the default index structure in virtually every major storage system:

System	Use of B-Tree
PostgreSQL, MySQL (InnoDB)	Primary and secondary indexes
SQLite	Entire database file is one B-Tree
NTFS, HFS+, ext4	Directory and file metadata indexes
MongoDB (WiredTiger)	Collection and index storage
macOS APFS	File system B-Tree for all metadata

B-Tree Properties#

A B-Tree of order m (also called the degree or branching factor) controls how wide the tree is. A larger order means each node holds more keys and branches into more children, producing a shorter, wider tree. Formally, the following constraints must hold at all times:

Property	Description
Maximum keys	Each node has at most m-1 keys
Minimum keys (non-root)	Each node has at least ⌈m/2⌉-1 keys
Root minimum	Root has at least 1 key (or is empty)
Children	Non-leaf node with k keys has k+1 children
Balance	All leaves are at the same level
Ordering	Keys within a node are sorted

B-Tree of Order 5 (m=5)

Rendering diagram...

B-Tree Node Structure

A node containing multiple keys and child pointers

B-Tree node with keys array and children pointers

Search Operation#

Searching in a B-Tree follows the same principle as a binary search tree — compare the target against the node's keys and descend into the correct subtree. The key difference is that each node holds many keys, so at each level you scan all of them to determine which of the many children to enter next. This multi-way branching is exactly what keeps the tree shallow and minimizes disk reads.

Searching for key 25

Rendering diagram...

B-Tree Search

Find a key in the B-Tree

Time:O(log n)Space:O(1)Best:O(1)Worst:O(log n)

Insertion#

Insertion in a B-Tree works in two phases:

Descend to the leaf node where the key belongs, following the same path as a search
Insert the key in sorted position; if the node is now full (exceeds the maximum key count), split it and push the middle key up to the parent

The implementation uses a top-down splitting strategy: any full node encountered on the way down is split proactively, before it actually overflows. This ensures there is always room in the parent to receive a promoted key, avoiding the need to backtrack up the tree after an insertion.

Scenario	Action
Leaf has room	Simply insert key in sorted position
Leaf is full	Split the node: middle key moves up to parent
Parent is full	Recursively split parent nodes
Root splits	Create new root, tree grows in height

Before Insert 25 (Order 5)

Rendering diagram...

After Insert 25: Node Splits

Rendering diagram...

B-Tree Insert

Insert a key into the B-Tree, splitting nodes as needed

Time:O(log n)Space:O(1)Best:O(log n)Worst:O(log n)

Deletion#

Deletion is the most complex B-Tree operation. Removing a key is straightforward when the target node has more than the minimum required keys — but if it does not, the tree must be restructured to preserve the balance invariant.

There are two repair strategies: borrowing a key from an adjacent sibling (rotated through the parent) and merging the node with a sibling. A merge reduces the parent's key count by one, which may require rebalancing the parent as well. In the worst case, restructuring cascades all the way to the root, shrinking the tree's height by one level.

Case	Condition	Action
Case 1	Key in leaf, node has extra keys	Simply remove the key
Case 2a	Key in internal node, left child has extra	Replace with predecessor, delete from child
Case 2b	Key in internal node, right child has extra	Replace with successor, delete from child
Case 2c	Key in internal node, both children minimal	Merge children, delete from merged node
Case 3	Key not in node, child has minimal keys	Borrow from sibling or merge, then recurse

Delete 20: Borrow from Sibling

Rendering diagram...

After: Borrowed 40 via Parent

Rendering diagram...

B-Tree Delete

Delete a key from the B-Tree while maintaining properties

Time:O(log n)Space:O(1)Best:O(log n)Worst:O(log n)

Complexity Analysis#

Operation	Time Complexity	Disk I/O
Search	O(log_m n)	O(log_m n)
Insert	O(log_m n)	O(log_m n)
Delete	O(log_m n)	O(log_m n)
Range Query	O(log_m n + k)	O(log_m n + k/m)
Space	O(n)	O(n/m) disk blocks

m = order (branching factor), n = number of keys, k = keys in range

Complete B-Tree Implementation

Full B-Tree with insert, search, and traversal

Complete B-Tree with traversal and search

Applications#

Application	Why B-Tree?
Databases (MySQL, PostgreSQL)	Efficient disk-based indexing with range queries
File Systems (NTFS, HFS+, ext4)	Directory indexing and file metadata
MongoDB	Document indexing for fast queries
Key-Value Stores	Ordered key storage with fast lookups
Search Engines	Index structures for document retrieval

Reviewing AI-Generated Code#

B-Tree implementations require careful attention to node splitting and key counts. AI-generated code frequently contains off-by-one errors in these areas — subtle mistakes that may pass basic insertion tests but silently corrupt the tree structure under specific conditions.

Issue	What to Check	Common AI Mistake
Order definition	Some define order as max keys, others as max children	Inconsistent order interpretation
Split point	Middle key goes up, not left or right	Wrong key moves to parent
Key count	Children = keys + 1 for internal nodes	Wrong number of children after split
Leaf flag	Track if node is leaf or internal	Treating all nodes the same
Minimum keys	Non-root nodes have minimum key requirement	Allowing underflow

Spotting AI B-Tree Bugs

Common errors in AI-generated B-Tree code

Correct split: middle key goes to parent

Questions to Ask When Reviewing B-Tree Code#

"What does 'order' mean in your implementation?" — Clarify whether it refers to max keys or max children. Different textbooks and implementations use different conventions.
"Where does the middle key go during a split?" — It must move up to the parent, not remain in either of the two split halves.
"Walk me through the tree after inserting these values" — Trace a concrete example to verify that splits produce the correct structure.
"What happens when a node has fewer than the minimum keys after deletion?" — The code must borrow from siblings or merge nodes; ignoring this breaks the B-Tree invariant.

Summary#

Concept	Key Point
Structure	Multiple keys per node, all leaves at same level
Order m	Max m children, max m-1 keys, min ceil(m/2)-1 keys
Height	O(log_m n) - very shallow for large m
Search	Multi-way search at each node
Insert	Insert in leaf, split if overflow
Delete	Remove and rebalance (borrow/merge)
Disk Optimization	Node size matches disk block size
Applications	Databases, file systems, key-value stores

B-Trees are excellent for disk-based storage. For even better range query performance, explore B+ Trees, where all data lives in the leaf nodes and the leaves are linked together in a list — enabling fast sequential scans in addition to point lookups.

PreviousRed-Black Tree

NextB+ Tree