Union-Find

Union-Find (also called Disjoint Set Union or DSU) is a data structure that tracks elements partitioned into non-overlapping sets. It supports two operations efficiently: find which set an element belongs to, and union to merge two sets. It's the backbone of Kruskal's MST algorithm and the fastest way to solve connectivity problems.

What Is Union-Find?#

Imagine a social network where people form groups. Union-Find answers two questions in near-constant time:

  1. Are these two people in the same group? (Find)
  2. Merge these two groups into one. (Union)
Rendering diagram...

Real-World Analogies#

  • Country Alliances: Countries form alliances (union); checking if two countries are allies (find) should be instant
  • Network Connectivity: Routers connect to form networks; check if two devices can communicate
  • Image Processing: Group adjacent pixels of similar color into regions
  • Equivalence Classes: Group variables that must be equal based on a set of equations

Key Characteristics#

PropertyDescription
Find(x)Return the representative (root) of the set containing x
Union(x, y)Merge the sets containing x and y into one
Near-constant timeBoth operations run in O(α(n)) amortized — practically O(1)
Path CompressionFlatten the tree during Find so future lookups are faster
Union by RankAttach smaller trees under larger ones to keep trees shallow

Basic Implementation#

The simplest Union-Find uses a parent array where parent[i] points to the parent of element i. The root of a set points to itself.

Basic Union-Find (No Optimizations)

Simple parent-pointer implementation to illustrate the concept

Basic Union-Find without optimizations

The Problem with Basic Union-Find#

Without optimizations, the tree can degenerate into a long chain — making find take O(n) per call. Union the elements 0→1→2→3→...→n and every find(0) walks the entire chain. Two optimizations fix this.

Optimized Union-Find#

Path Compression#

During find, make every visited node point directly to the root. This flattens the tree so future lookups on any of those nodes are O(1).

Rendering diagram...

Union by Rank#

Always attach the shorter tree under the taller tree's root. This prevents the tree from growing unnecessarily tall. The rank is an upper bound on tree height.

Rendering diagram...

Full Optimized Implementation#

With both optimizations, every operation runs in O(α(n)) amortized time, where α is the inverse Ackermann function — a value that is at most 4 for any conceivable practical input size. For all practical purposes, this is constant time.

Optimized Union-Find

Union-Find with path compression and union by rank

Time:O(α(n)) ≈ O(1) amortized per operationSpace:O(n)Best:O(1)Worst:O(α(n)) amortized

Number of Connected Components#

Problem: Given n nodes and a list of edges, find the number of connected components in the graph.

This is the canonical Union-Find application — each union merges two components, and the final count gives the answer.

Rendering diagram...

Number of Connected Components

Count connected components using Union-Find

Count components by processing edges

Detecting Cycles in Undirected Graphs#

Problem: Given an undirected graph, determine if it contains a cycle.

If union(u, v) finds that u and v are already in the same set, the edge (u, v) creates a cycle — both endpoints are already reachable from each other.

Rendering diagram...

Cycle Detection (Undirected Graph)

Detect cycles by checking if union connects already-connected nodes

Time:O(E × α(V))Space:O(V)Best:O(1)Worst:O(E × α(V))

Kruskal's Minimum Spanning Tree#

Problem: Find the minimum spanning tree (MST) of a weighted, connected graph — the subset of edges that connects all vertices with minimum total weight.

Kruskal's algorithm sorts edges by weight and greedily adds the lightest edge that doesn't create a cycle. Union-Find makes the cycle check nearly instantaneous.

Rendering diagram...

Kruskal's MST

Find minimum spanning tree using sorted edges and Union-Find

Time:O(E log E)Space:O(V)Best:O(E log E)Worst:O(E log E)

Number of Islands (Grid Problem)#

Problem: Given a 2D grid of '1's (land) and '0's (water), count the number of islands. An island is surrounded by water and formed by connecting adjacent land cells horizontally or vertically.

While DFS/BFS is the more common approach, Union-Find handles this elegantly and extends naturally to dynamic versions (where cells flip between land and water).

Number of Islands with Union-Find

Count islands by unioning adjacent land cells in a grid

Grid-based Union-Find: flatten 2D index to 1D

Union-Find vs Other Approaches#

TaskUnion-FindDFS/BFSWhen to Prefer Union-Find
Connected componentsO(E × α(V))O(V + E)Edges arrive one at a time (streaming)
Cycle detection (undirected)O(E × α(V))O(V + E)Only need yes/no, not the cycle path
MST (Kruskal's)O(E log E)N/A (use Prim's)Sparse graphs (E ≈ V)
Dynamic connectivityO(α(n)) per queryO(V + E) per queryMany connectivity queries over time
Grid problemsO(R × C × α(RC))O(R × C)Online/dynamic changes to the grid

When to Choose Union-Find#

  • Online queries: Edges/connections arrive one at a time and you need to answer "connected?" after each
  • Kruskal's MST: Union-Find is the standard building block
  • No path needed: You only care about connectivity, not the actual path between nodes
  • Multiple queries: Many "are X and Y connected?" queries on a static graph — build once, query O(1)

When DFS/BFS Is Better#

  • Finding actual paths: Union-Find tells you if nodes are connected, not how
  • Shortest path: Union-Find doesn't track distances
  • Directed graphs: Standard Union-Find is for undirected connectivity only
  • One-time traversal: If you only need one traversal, DFS/BFS is simpler

Reviewing AI-Generated Code#

What to Verify#

IssueWhat to CheckCommon AI Mistake
Missing path compressionfind() should update parent pointersIterative find without flattening
Missing union by rankunion() should compare ranksAlways attaching to one side
Connected check before unionAvoid redundant unionsNot checking connected() in cycle detection
Off-by-one in grid mapping2D-to-1D index: row * cols + colUsing row * rows + col
Not tracking component countDecrement count on successful unionForgetting to update count

Questions to Ask AI#

When AI generates Union-Find code, verify:

  • "Does find use path compression?"
  • "Does union use rank (or size) to keep trees balanced?"
  • "Is the 2D-to-1D index mapping correct for grid problems?"
  • "Is the component count maintained correctly?"

Summary#

ApplicationTimeKey Insight
Connected ComponentsO(E × α(V))Each union merges two components; final count is the answer
Cycle DetectionO(E × α(V))If union finds endpoints already connected, edge creates a cycle
Kruskal's MSTO(E log E)Sort edges by weight; greedily add if no cycle
Grid IslandsO(R×C × α(RC))Flatten 2D to 1D; union adjacent land cells

Key Takeaways#

  1. Two optimizations are essential — path compression and union by rank together give O(α(n)) ≈ O(1) per operation; without them, worst-case is O(n)
  2. Union-Find answers "connected?" not "how?" — it tracks set membership, not paths. If you need the actual path between nodes, use BFS/DFS
  3. Cycle detection in one line: if connected(u, v) is true before union(u, v), you found a cycle
  4. Kruskal's = sort + Union-Find — sort edges by weight, add each edge if it doesn't create a cycle, stop at n-1 edges
  5. Grid problems flatten 2D to 1D — map (row, col) to row * cols + col and subtract water cells from the total component count