B+ Trees
A B+ Tree is a self-balancing tree data structure and a variation of the B-Tree, optimized for disk-based storage systems that read and write data in fixed-size blocks. Two characteristics distinguish it from a regular B-Tree: all actual data records are stored exclusively in leaf nodes, and those leaf nodes are linked together into a sorted linked list.
This design makes range queries extremely efficient — once you locate the starting key, you simply follow the linked list forward rather than repeatedly traversing the tree from the root. B+ Trees are the most widely used index structure in database management systems, including MySQL (InnoDB), PostgreSQL, Oracle, and SQL Server.
B+ Tree vs B-Tree#
Understanding the structural differences between B-Trees and B+ Trees clarifies why B+ Trees are preferred for database indexing.
| Feature | B-Tree | B+ Tree |
|---|---|---|
| Data location | All nodes store data | Only leaves store data |
| Internal nodes | Keys + data + child pointers | Keys + child pointers only |
| Leaf nodes | No sibling links | Linked list for sequential access |
| Key duplication | No duplicates | Internal keys duplicated in leaves |
| Range queries | Tree traversal required | Single leaf scan |
| Fan-out | Lower (data in all nodes) | Higher (more keys per internal node) |
B+ Tree Properties#
A valid B+ Tree must satisfy the following structural invariants at all times. Violating any of them — even temporarily during an insert or delete — can corrupt the tree.
| Property | Description |
|---|---|
| Internal nodes | Store only keys and child pointers (no data) |
| Leaf nodes | Store keys and data (or pointers to data) |
| Leaf linking | All leaves connected via next pointers (optionally prev for bidirectional scans) |
| Key copies | Internal keys are copies of keys in leaves |
| All leaves | At the same depth (perfectly balanced) |
| Order m | Internal: max m children; Leaf: max m-1 key-value pairs |
B+ Tree Node Structure
Internal nodes and leaf nodes have different structures
Search Operation#
Search in a B+ Tree always ends at a leaf node. This predictability matters: regardless of what key you're looking for, the algorithm always traverses from the root down to a leaf, so search time is always proportional to the tree height — never more, never less. This is more consistent than a B-Tree, where a key stored in an internal node can be found before reaching a leaf.
B+ Tree Search
Search always traverses to a leaf node
Range Queries#
The leaf linked list is what makes B+ Trees so powerful for range queries. Instead of repeatedly navigating the tree for each key in the range, you navigate to the starting leaf once, then scan forward through the linked list until you pass the end of the range. This is why SQL queries with BETWEEN, >, or ORDER BY on indexed columns are fast — they map directly to this scan pattern.
Range Query
Find all keys in a given range using leaf links
Insertion#
Insertion in a B+ Tree always targets a leaf node. The process is: navigate to the correct leaf, insert the new key in sorted order, then check whether the leaf has overflowed (reached the order limit). If it has, split the leaf in half and push the first key of the new right leaf up to the parent as a separator. If the parent also overflows, the split propagates upward — potentially all the way to creating a new root.
B+ Tree Insert
Insert key-value pair, splitting nodes as needed
Deletion#
Deletion from a B+ Tree is more involved than insertion because each node must maintain a minimum occupancy — at least ⌈order/2⌉ - 1 keys. If removing a key drops a node below this minimum, you must either borrow a key from a neighboring sibling (redistribution) or merge the node with a sibling and remove the corresponding separator from the parent. There is also a second concern: if the deleted key was being used as a routing key in an internal node, that separator must be updated to remain consistent with the leaf contents.
| Case | Condition | Action |
|---|---|---|
| Simple delete | Leaf has enough keys after deletion | Just remove the key |
| Borrow from sibling | Sibling has extra keys | Redistribute keys, update parent |
| Merge with sibling | Both at minimum | Combine nodes, remove parent key |
| Update separator | Deleted key was separator | Replace with new first key of right child |
B+ Tree Delete
Delete a key while maintaining tree properties
Complete Implementation#
The following implementation brings together all the operations covered above — node structure, search, range query, insertion, leaf splitting, and internal splitting — into a single working class. Reading through it in full is a good way to see how the pieces fit together and how the leaf linked list is threaded through every operation.
Complete B+ Tree
Full implementation with all operations
Applications#
B+ Trees are the default index structure in most relational databases and many file systems. The following examples show where you are likely encountering B+ Trees in practice.
| Application | Why B+ Tree? |
|---|---|
| MySQL InnoDB | Primary and secondary indexes, clustered index |
| PostgreSQL | Default index type for most data types |
| SQLite | All table and index storage |
| MongoDB | WiredTiger storage engine indexes |
| File Systems (NTFS, ext4) | Directory and file metadata indexing |
| LevelDB / RocksDB | Sorted string tables (SSTables) with B+ tree-like index blocks |
Reviewing AI-Generated Code#
B+ Trees have two structurally distinct node types and a critical linked list connecting the leaves. These details are easy to overlook, and AI code generators frequently miss them. The most common mistakes are mixing up internal and leaf node structures, and failing to update the leaf linked list during a split. If either invariant breaks, the tree may still pass basic point-lookup tests while silently returning wrong results for range queries.
| Issue | What to Check | Common AI Mistake |
|---|---|---|
| Node types | Internal nodes have children, leaves have values | Mixing up node structures |
| Leaf linking | Leaves must maintain next pointers | Breaking linked list during split |
| Key duplication | Internal keys are copies from leaves | Not copying keys correctly |
| Search termination | Search always ends at leaf | Returning early from internal node |
| Range query | Use leaf links, don't traverse tree | Tree traversal for range query |
Spotting AI B+ Tree Bugs
Common errors in AI-generated B+ Tree code
Questions to Ask AI About B+ Tree Code#
- "How is the leaf linked list maintained?" — The
nextpointer on every leaf node must be updated during every split. Trace through the split code to verify this. - "What's the time complexity of the range query?" — The correct answer is O(log n + k), where k is the number of results. If the implementation traverses the entire tree, it is O(n) and is not using the leaf links at all.
- "Where is data stored?" — Data must appear only in leaf nodes. If values are stored in internal nodes, the implementation is a B-Tree, not a B+ Tree.
- "What gets pushed up to the parent when a leaf splits?" — The first key of the new right leaf becomes the separator in the parent. For internal node splits, the middle key is removed from the node and pushed up — it is not copied to the new node.
Summary#
| Concept | Key Point |
|---|---|
| Structure | Internal nodes: keys only; Leaves: keys + data |
| Leaf Linking | Leaves form linked list for sequential access |
| Search | Always traverses to leaf: O(log n) |
| Range Query | Find start leaf, then scan: O(log n + k) |
| Insert | Always at leaf, split upward if needed |
| Delete | Remove from leaf, borrow/merge if underflow |
| Fan-out | Higher than B-Tree (no data in internal nodes) |
| Applications | Database indexes, file systems |
B+ Trees are the cornerstone of database indexing. Their efficient range queries — enabled by the leaf linked list — and high fan-out — enabled by keeping data out of internal nodes — make them ideal for disk-based storage systems where every I/O operation has a measurable cost. Every time you run a WHERE clause, a JOIN, or an ORDER BY on an indexed column in a relational database, a B+ Tree is doing the work behind the scenes.