Real-time Chat App

A real-time chat application looks deceptively simple from the outside — a text box, a send button, and messages that appear instantly. But behind that interface lies one of the most instructive problems in distributed systems. When a friend sends you a message on WhatsApp or Slack, your phone displays it in milliseconds without you pressing a refresh button. That instant delivery is not magic: it is the result of a carefully designed stack of WebSockets, Pub/Sub messaging, and time-series storage.

This case study is the canonical introduction to WebSockets and Pub/Sub. The moment your chat app needs more than one server — a requirement that arrives early at any meaningful scale — you face a fundamental distributed systems challenge: how do you deliver a message to a user whose WebSocket connection lives on a different server than the one that received the message? The answer to that question shapes the entire architecture.

The core question this case study answers: why does instant messaging require a fundamentally different connection model than loading a web page?

Like the other case studies in this section, this one follows the same framework:

Clarify constraints (Steps 1–2) — What does the system do, and how much traffic must it handle?
High-level design (Step 3) — What are the major components and how do they connect?
Deep dives (Steps 4–7) — How do the trickiest parts actually work?
Trade-offs (Step 8) — What did we give up, and when would we choose differently?

Step 1: Clarify Requirements

Functional Requirements

These describe what the system does.

Feature	Description	Priority
1:1 direct messaging	Two users exchange messages privately in a dedicated conversation	Core
Group chat (channels)	Multiple users send and receive messages in a named channel or room	Core
Message history	Users can scroll up to load past messages in any conversation	Core
Real-time delivery	Messages appear on recipients' screens instantly, without page reload	Core
Online/offline presence	Show whether a contact is currently online	Core
Read receipts	Show the sender when their message has been read by the recipient	Core
Message status	Track each message through: Sending → Sent → Delivered → Read	Core
Typing indicators	Show 'Alice is typing...' to other participants while composing. Implementation: the client sends a `typing_start` event over the WebSocket; the Chat Server publishes it to the Redis channel for that conversation; other servers push it to their local clients. Use a short TTL (3–5 seconds) so the indicator disappears automatically if typing stops without an explicit `typing_stop` event.	Optional
File and image uploads	Send photos and files; use pre-signed object storage URLs (S3, GCS) for the actual content. Flow: client requests a pre-signed upload URL from the REST API → uploads the file directly to object storage → sends a message with type='image' and the resulting storage URL as content.	Optional
Message reactions	React to a message with an emoji	Optional

Non-Functional Requirements

These describe how well the system works.

Property	Requirement	Why It Matters
Low latency	End-to-end message delivery in under 200ms on a normal network	Chat is synchronous by nature — visible delay breaks the conversational feel and degrades UX more sharply than almost any other feature type
High availability	99.99% uptime on the message delivery path (about 53 minutes of downtime per year)	A chat outage during a business emergency or a time-sensitive coordination causes immediate, visible harm; users notice within seconds
Durability	No sent message may be silently dropped; every message must be persisted and eventually delivered	Losing a message in a business negotiation or medical coordination is not recoverable — the system must guarantee no silent loss under any failure
Scalability	Handle 50M daily active users, with up to 25M concurrent WebSocket connections at peak	Unlike stateless HTTP APIs, WebSocket servers maintain per-connection in-memory state — scaling horizontally requires explicit coordination design
Message ordering	Messages must be displayed in the order they were sent within each conversation	Out-of-order messages destroy conversational context — a reply appearing before its question is disorienting and unacceptable
Idempotency	A retried message send must not produce a duplicate message visible to the recipient	Network failures cause clients to retry; without idempotency, a user appears to have sent the same message twice

The core tension: real-time delivery and durability pull against each other. The fastest path — push directly to the recipient's open connection in memory — has no durability. The most durable path — write to disk before notifying — adds latency. The architecture resolves this by doing both concurrently: every incoming message is immediately published to the real-time fan-out layer and enqueued for durable storage. Offline users sync missed messages from the durable store on reconnect.

Step 2: Back-of-the-Envelope Estimation

Metric	Calculation	Result
Daily active users (DAU)	Assumed for a mid-scale messaging product	50 million
Messages sent per user per day	~20 messages on average	—
Total messages per day	50M × 20	1 billion messages/day
Average message throughput	1B ÷ 86,400 seconds/day	~11,600 messages/second
Peak throughput (3× average)	11,600 × 3 — spikes during business hours and live events	~35,000 messages/second
Average message size	Text content + metadata (sender ID, timestamp, channel ID, message ID)	~500 bytes
Message storage per day	1B × 500 bytes	~500 GB/day
Message storage per 30 days	500 GB × 30	~15 TB/month
Concurrent WebSocket connections at peak	50M DAU × 50% online simultaneously	~25M connections
Chat servers needed at peak	25M connections ÷ ~50,000 connections per server	~500 Chat Servers

The number that forces the architecture: 25 million concurrent WebSocket connections. A single server — even a powerful one — can handle roughly 50,000 concurrent WebSocket connections before memory and file descriptor limits become a bottleneck. (Each connection holds a TCP socket, send/receive buffers, and per-session state — roughly 10–50 KB. At 50,000 connections × 50 KB = 2.5 GB, a server's available memory fills up fast.) Across 500 servers, any design that stores user session state only in local server memory immediately breaks: when a message arrives on Server A for a user connected to Server B, Server A has no way to deliver it. This is why the Pub/Sub fan-out layer is not an optimization — it is the foundational requirement that makes horizontal scaling possible.
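
The arithmetic in the table can be reproduced in a few lines. This is only a sketch: every input is one of the assumptions stated above (50M DAU, 20 messages per user, 500 bytes per message, 50% of users online at peak, 50,000 connections per server), and the variable names are illustrative.

```python
# Back-of-the-envelope numbers from the estimation table; all inputs are assumptions.
DAU = 50_000_000
MESSAGES_PER_USER_PER_DAY = 20
AVG_MESSAGE_BYTES = 500
PEAK_FACTOR = 3
ONLINE_FRACTION = 0.5
CONNECTIONS_PER_SERVER = 50_000
SECONDS_PER_DAY = 86_400

messages_per_day = DAU * MESSAGES_PER_USER_PER_DAY
avg_msgs_per_sec = messages_per_day / SECONDS_PER_DAY
peak_msgs_per_sec = avg_msgs_per_sec * PEAK_FACTOR
storage_gb_per_day = messages_per_day * AVG_MESSAGE_BYTES / 1e9
peak_connections = int(DAU * ONLINE_FRACTION)
# Ceiling division: a partially filled server still has to exist.
chat_servers = -(-peak_connections // CONNECTIONS_PER_SERVER)

print(f"{avg_msgs_per_sec:,.0f} msg/s average, {peak_msgs_per_sec:,.0f} msg/s peak")
print(f"{storage_gb_per_day:,.0f} GB/day, {storage_gb_per_day * 30 / 1000:,.0f} TB per 30 days")
print(f"{peak_connections:,} connections -> {chat_servers} chat servers")
```

The exact results (about 11,574 and 34,722 messages per second) round to the ~11,600 and ~35,000 figures used in the table.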

Step 3: High-Level Design

The foundational principle: separate the connection layer from the message routing layer. Chat Servers maintain WebSocket connections — they hold the pipe to each user's client. Redis Pub/Sub connects all Chat Servers together so a message arriving on any server can instantly reach any user connected to any other server. Kafka and the Message Service handle durable storage asynchronously, decoupled from the real-time delivery path.

What each component does:

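Load Balancer — Routes incoming connections to Chat Servers. Important: the load balancer must proxy the WebSocket upgrade handshake (an HTTP GET carrying an Upgrade header, which the server answers with 101 Switching Protocols) and then keep the resulting long-lived connection open instead of treating it as a short request/response exchange. Most modern load balancers (NGINX, AWS ALB, HAProxy) support this, but some need explicit configuration; NGINX, for example, must be told to forward the Upgrade and Connection headers.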
Chat Servers — The core of the system. Each maintains tens of thousands of long-lived WebSocket connections in an in-memory map (userId → socket). A user connected from multiple devices (phone + laptop) gets multiple entries in this map; the server pushes messages to all of them. When a message arrives from a client, the Chat Server publishes it to Redis Pub/Sub (real-time fan-out) and enqueues it to Kafka (durable storage). When Redis delivers a message published by another server, it pushes it to any locally connected recipients.
Redis Pub/Sub — The fan-out layer. Every Chat Server subscribes to the channels (rooms) that its locally connected users belong to. When any server publishes a message to a channel, every subscribed server — including the originating server itself — receives it and pushes it to its local clients. This is the solution to the cross-server delivery problem.
Redis (Presence) — The same Redis cluster also stores user presence as keys with a time-to-live (TTL), refreshed by client heartbeats. If a user's presence key exists, they are online; if the key has expired, they are offline.
Kafka — The durability buffer. Kafka is a distributed event streaming platform — think of it as a durable, ordered log that producers write to and consumers read from at their own pace, without affecting each other's speed. Chat Servers publish to Kafka immediately (fast, non-blocking), and the Message Service consumes from Kafka to write to Cassandra. This decouples delivery latency from storage write latency. With the usual producer-retry and consumer-commit settings, Kafka provides at-least-once delivery, meaning the Message Service may occasionally receive the same message twice — the deduplication mechanism described below handles this.
Message Service — Consumes messages from Kafka and writes them to Cassandra, using client_id as an idempotency key to suppress duplicates from Kafka retries. Also provides the REST API endpoint for loading message history.
Cassandra — Durable, append-only message storage, designed for the time-series access pattern of chat: "fetch the last N messages in this conversation."
PostgreSQL — Stores structured relational data that doesn't need Cassandra's write scale: users, channel metadata, and channel memberships.
REST API — Handles non-real-time operations: loading a user's channel list on app start, fetching message history, creating or joining channels, and updating read receipts.

API Design

Endpoint	Method	Request / Body	Response
`POST /api/v1/channels`	POST	`{ name, type: 'direct' | 'group', member_ids[] }`	`201 Created` — `{ channel_id, name, type }`
`GET /api/v1/channels/{id}/messages`	GET	Query: `?before_id=<message_id>&limit=50`	`200 OK` — `{ messages[], has_more }`
`POST /api/v1/channels/{id}/members`	POST	`{ user_id }`	`200 OK`
`PUT /api/v1/channels/{id}/read`	PUT	`{ last_read_message_id }`	`200 OK`
`GET /api/v1/users/{id}/presence`	GET	—	`{ user_id, status: 'online' | 'offline', last_seen_at }`

POST /api/v1/channels — Creates a new conversation room. The type field distinguishes a 1:1 DM (direct) from a multi-person room (group). member_ids[] populates the channel_members table atomically at creation time, so the first message send can immediately look up who is in the room. The response returns a channel_id that becomes the partition key for all future message writes in Cassandra.

GET /api/v1/channels/{id}/messages — Loads message history for a channel. The {id} path parameter maps directly to the Cassandra partition key, so every read touches exactly one partition — no scatter-gather across nodes. The ?before_id / ?after_id cursors (explained below) control which slice of that partition is returned. limit=50 caps the result size to keep payloads predictable; the has_more flag tells the client whether to render a "load older messages" button.

POST /api/v1/channels/{id}/members — Adds a user to an existing group channel. This is a separate endpoint from channel creation because membership changes happen throughout the channel's lifetime (e.g., being invited to a Slack channel days after it was created). It inserts a row into channel_members and also triggers a Pub/Sub event so other online members see the join notification in real time.

PUT /api/v1/channels/{id}/read — Updates the user's read position by writing last_read_message_id to channel_members. This is a PUT (not POST) because it is idempotent: sending the same last_read_message_id twice produces the same state. It serves two purposes: (1) powering unread-count badges, where the server counts the messages after your cursor to determine the badge count; (2) providing read receipts to the sender, where the Chat Server publishes a read_receipt event to the Redis channel, which fans out to the sender's WebSocket in real time.

GET /api/v1/users/{id}/presence — Returns whether a user is online and, if offline, when they were last active. This is a read against Redis (where presence heartbeats are stored as keys with TTLs) rather than PostgreSQL, so it is fast and does not require a database query. The last_seen_at timestamp powers the "last seen 5 minutes ago" UI that chat apps show when the other person is offline.

Why cursor-based pagination for message history? The before_id parameter acts as a cursor: "give me the 50 messages sent before this message ID." This is more efficient than offset-based pagination (?page=2&limit=50) because offset pagination requires the database to count and skip a growing number of rows to find the starting position, and the cost increases as you page deeper. Cassandra has no native OFFSET support — achieving it would require fetching and discarding rows. Cursor pagination sidesteps this entirely: it fetches exactly the N rows before a given ID using a direct range scan at the correct position in the partition. It also handles new messages arriving between requests correctly: offset pagination would silently shift the window and produce duplicates or gaps, while a cursor-based scan is anchored to a fixed point.

Two cursor directions: The before_id cursor is for loading older messages — used when the user scrolls up through history. A complementary after_id parameter — "give me all messages after this message ID" — is used when reconnecting and syncing missed messages. The same endpoint supports both: pass before_id to paginate backward, or after_id to fetch everything that arrived while the client was offline. If the cursor ID no longer exists (for example, due to message deletion), the range query still works correctly — Cassandra's range scan finds the next row after the missing ID without error.
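
The two cursor directions can be sketched against an in-memory list that stands in for one Cassandra partition. This is an illustration under stated assumptions, not the Message Service's actual code: plain integers stand in for TIMEUUID message IDs, and `page_before` and `page_after` are hypothetical helper names.

```python
from bisect import bisect_left, bisect_right

def page_before(ids_asc, before_id, limit=50):
    """Return up to `limit` IDs older than `before_id`, newest first, plus has_more."""
    end = bisect_left(ids_asc, before_id)       # index of the first ID >= before_id
    start = max(0, end - limit)
    return ids_asc[start:end][::-1], start > 0

def page_after(ids_asc, after_id, limit=100):
    """Return up to `limit` IDs newer than `after_id`, oldest first, plus has_more."""
    start = bisect_right(ids_asc, after_id)     # index of the first ID > after_id
    end = start + limit
    return ids_asc[start:end], end < len(ids_asc)

# One conversation with message IDs 1..200, kept sorted ascending for bisect.
history = list(range(1, 201))
older, has_more = page_before(history, before_id=151, limit=50)

# A deleted cursor still works: the scan starts at the next surviving ID.
sparse = [i for i in history if i != 120]
newer, more_after = page_after(sparse, after_id=120, limit=3)
```

Both helpers jump straight to the cursor position with a binary search and read a contiguous slice, which mirrors why a range scan inside one partition does not need to count and skip rows the way an offset does.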

Database Schema

The system uses two databases with distinct responsibilities: PostgreSQL for structured relational data (users, channels, memberships) and Cassandra for the high-volume time-series message log.

-- Channels: 1:1 direct messages and group rooms
CREATE TABLE channels (
  id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name       VARCHAR(255),
  type       VARCHAR(10) NOT NULL,  -- 'direct', 'group'
  created_at TIMESTAMPTZ DEFAULT now()
);

-- Channel memberships, also used for read-receipt tracking
-- (references a users table with a BIGINT primary key, not shown here)
CREATE TABLE channel_members (
  channel_id           UUID REFERENCES channels(id) ON DELETE CASCADE,
  user_id              BIGINT REFERENCES users(id) ON DELETE CASCADE,
  joined_at            TIMESTAMPTZ DEFAULT now(),
  last_read_message_id VARCHAR(36),  -- UUID of last message the user read
  PRIMARY KEY (channel_id, user_id)
);

-- Index for "which channels is this user a member of?" — used on every app load
CREATE INDEX idx_channel_members_user ON channel_members(user_id);

The Cassandra messages table is the write-heavy heart of the system:

-- messages_by_channel (Cassandra, CQL)
CREATE TABLE messages_by_channel (
  channel_id  UUID,       -- partition key: all messages in one channel cluster together
  message_id  TIMEUUID,   -- clustering key DESC: newest messages are physically first
  sender_id   BIGINT,     -- matches users.id (BIGINT) in the PostgreSQL schema above
  content     TEXT,
  type        TEXT,       -- 'text', 'image', 'file'
  status      TEXT,       -- 'sent', 'deleted' (message lifecycle: live or soft-deleted)
  client_id   TEXT,       -- client-generated UUID for idempotency (deduplication key)
  created_at  TIMESTAMP,
  PRIMARY KEY ((channel_id), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

How message statuses are tracked: The status field here tracks the lifetime of the message record — whether it is live (sent) or has been soft-deleted (deleted). The higher-level statuses from the requirements ("Delivered" and "Read") are tracked separately:

Delivered is implicit: a message pushed through the WebSocket to an online user is considered delivered. There is no separate database write for each delivery event.
Read is tracked in the channel_members.last_read_message_id column in PostgreSQL — updated when the user calls PUT /api/v1/channels/{id}/read. When the sender wants to see who has read their message, they query the channel_members table.
Read receipt real-time notification: When Alice reads Bob's message and the client calls the read API, the server can also publish a read_receipt event to the Redis channel for that conversation, so Bob's Chat Server sees it and pushes it to Bob's client in real-time — the same fan-out path used for messages.
"Sending" is a client-side state only (the message appears greyed out until the server ACK arrives) and is never written to the server.

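Why TIMEUUID as the message ID? A TIMEUUID is a version 1 UUID: it embeds a 60-bit timestamp, measured in 100-nanosecond intervals, inside the 128-bit value. The timestamp bits are not laid out as one contiguous, most-significant-first run, which is why Cassandra orders TIMEUUIDs by the embedded timestamp rather than by raw bytes. This gives three properties at once: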

No coordination needed — every Chat Server generates globally unique, time-ordered IDs independently, without talking to a central counter or sequence service.
Natural time ordering — Cassandra clusters rows by TIMEUUID DESC, making "load the last 50 messages" a fast sequential read from the top of the partition.
Collision-free distributed generation — a TIMEUUID embeds a node identifier and clock sequence that ensures uniqueness even when multiple Chat Servers generate IDs at the exact same instant. Sequential integers would require a shared counter to avoid collisions across servers; TIMEUUIDs do not.

Clock skew caveat: Because TIMEUUID ordering is based on each server's local clock, messages sent simultaneously from two different servers are ordered by their respective clocks. If two servers' clocks differ by more than a few milliseconds, messages could appear in a slightly wrong order. In practice, this is mitigated by running NTP (Network Time Protocol) on all servers, which keeps clocks synchronized to within a few milliseconds. This level of ordering accuracy is acceptable for chat: a reply is typed seconds after the message it answers, so a few milliseconds of skew only affects messages sent almost simultaneously, where either order reads naturally. For use cases requiring strict ordering guarantees (financial transactions, distributed ledgers), you would use a centralized sequence counter instead.
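
To make the embedded timestamp concrete, here is a small sketch using Python's standard `uuid` module, whose `uuid1()` produces version 1 UUIDs. The helper name `timeuuid_to_datetime` is made up for this example; the offset constant is the number of 100-nanosecond intervals between the UUID epoch (15 October 1582) and the Unix epoch.

```python
import uuid
from datetime import datetime, timezone

# 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def timeuuid_to_datetime(u: uuid.UUID) -> datetime:
    """Recover the creation time embedded in a version 1 UUID."""
    unix_seconds = (u.time - UUID_EPOCH_OFFSET) / 1e7
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

# Each call needs no coordination with any other process or server.
ids = [uuid.uuid1() for _ in range(5)]

# Order by the embedded timestamp (u.time), not by the raw UUID value:
# the low timestamp bits come first in the byte layout, so raw order is not time order.
newest_first = sorted(ids, key=lambda u: u.time, reverse=True)

print(timeuuid_to_datetime(newest_first[0]).isoformat())
```

The ordering here is only as trustworthy as the generating machine's clock, which is the clock-skew caveat described above.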

Step 4: Deep Dive — WebSockets: Why Real-Time Needs a Persistent Connection

HTTP was designed for documents: a client sends a request, a server returns a response, and the connection closes. This works for loading a web page. It breaks for chat, where the server needs to push a message to a client at any time, without the client asking for it first.

Three Approaches to Real-Time: Polling, Long-Polling, and WebSockets

As you move from polling to WebSockets, latency drops and server overhead drops — at the cost of stateful connections. WebSockets require the server to maintain a persistent, stateful connection per user. This is the trade-off that drives the entire scaling challenge in chat.

The WebSocket Connection Lifecycle

When a user opens the chat app, the client initiates a WebSocket connection through a standard HTTP upgrade handshake:

1. Client → Server:
   GET /ws HTTP/1.1
   Host: chat.example.com
   Upgrade: websocket
   Connection: Upgrade
   Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==

2. Server → Client:
   HTTP/1.1 101 Switching Protocols
   Upgrade: websocket
   Connection: Upgrade
   Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

3. The TCP connection is now a full-duplex WebSocket channel.
   Both sides send lightweight frames instead of full HTTP requests.
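
The Sec-WebSocket-Accept value in step 2 is derived from the client's Sec-WebSocket-Key. As defined in RFC 6455, the server appends a fixed GUID to the key, takes the SHA-1 hash, and base64-encodes the result. A minimal sketch (the function name `accept_key` is illustrative):

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value for a given client key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# The key from the handshake above yields the accept value shown in step 2.
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

The check proves to the client that the server actually understood the WebSocket upgrade; it is not an authentication mechanism.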

After the handshake, the Chat Server registers the connection in an in-memory map: userId → WebSocketConnection. When a message needs to be delivered to a user, the server looks up their entry in this map and writes a frame directly to their socket.
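
A minimal sketch of that in-memory map, assuming one user may hold several connections at once (phone and laptop). `ConnectionRegistry` and `FakeSocket` are hypothetical names; `FakeSocket` only records frames so the example runs without a network.

```python
from collections import defaultdict

class FakeSocket:
    """Hypothetical stand-in for a WebSocket connection; records sent frames."""
    def __init__(self):
        self.sent = []

    def send(self, frame):
        self.sent.append(frame)

class ConnectionRegistry:
    """Maps each user ID to the set of sockets that user has open on this server."""
    def __init__(self):
        self._sockets = defaultdict(set)

    def add(self, user_id, sock):
        self._sockets[user_id].add(sock)

    def remove(self, user_id, sock):
        sockets = self._sockets.get(user_id)
        if sockets is None:
            return
        sockets.discard(sock)
        if not sockets:                 # last device gone: drop the user entry
            del self._sockets[user_id]

    def deliver(self, user_id, frame):
        """Write the frame to every local socket for the user; return how many."""
        sockets = list(self._sockets.get(user_id, ()))
        for sock in sockets:
            sock.send(frame)
        return len(sockets)

registry = ConnectionRegistry()
phone, laptop = FakeSocket(), FakeSocket()
registry.add("alice", phone)
registry.add("alice", laptop)
delivered = registry.deliver("alice", {"type": "message", "content": "hi"})
```

Because this state lives only in one process, a user connected to a different Chat Server is invisible here, which is the problem the Pub/Sub layer addresses.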

Detecting dead connections: Mobile devices switching between Wi-Fi and cellular, or screens going to sleep, can silently drop the underlying TCP connection without sending a WebSocket close frame. The Chat Server detects these silent disconnections using a ping/pong heartbeat: every 30 seconds, the server sends a WebSocket ping frame. If no pong is received within 60 seconds (tolerating one missed ping before declaring the connection dead), the server removes that socket from its in-memory map and marks the user offline. This is also the trigger for unsubscribing from that user's Redis Pub/Sub channels, freeing the subscription resources.
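
The timeout rule can be sketched as a small tracker that remembers when each connection last answered a ping. The class and method names are hypothetical, and timestamps are passed in explicitly so the logic can be exercised without waiting; a real server would likely read them from a monotonic clock.

```python
PING_INTERVAL = 30.0   # seconds between server pings (from the text above)
PONG_TIMEOUT = 60.0    # silence longer than this means the connection is dead

class HeartbeatTracker:
    def __init__(self):
        self._last_pong = {}   # connection id -> time of last pong (or registration)

    def register(self, conn_id, now):
        self._last_pong[conn_id] = now

    def on_pong(self, conn_id, now):
        if conn_id in self._last_pong:
            self._last_pong[conn_id] = now

    def reap(self, now):
        """Remove and return the connections that have been silent too long."""
        dead = [c for c, t in self._last_pong.items() if now - t > PONG_TIMEOUT]
        for conn_id in dead:
            del self._last_pong[conn_id]
        return dead

tracker = HeartbeatTracker()
tracker.register("conn-a", now=0.0)
tracker.register("conn-b", now=0.0)
tracker.on_pong("conn-a", now=30.0)      # conn-a answers the first ping
first_sweep = tracker.reap(now=61.0)     # conn-b never answered
```

Each connection ID returned by `reap` is where the server would close the socket, remove it from the connection map, and drop any Pub/Sub subscriptions that no longer have local members.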

Step 5: Deep Dive — The Pub/Sub Fan-out Layer

This is the architecturally most important section of this case study. Once you run more than one Chat Server, you face the central distributed chat problem:

Alice (connected to Server 1) sends a message to the "team-general" channel. Bob is connected to Server 2. How does Server 2 know to push the message to Bob?

The Cross-Server Delivery Problem and Its Solution via Redis Pub/Sub

Without a shared communication layer, Chat Server 1 has no knowledge of connections on Chat Server 2. Redis Pub/Sub solves this: every Chat Server subscribes to the Redis channels that its locally connected clients belong to. When any server publishes a message, Redis immediately delivers it to all subscribed servers, each of which pushes it to its local clients in that channel.
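
The subscribe-and-fan-out pattern can be simulated in a single process. In this sketch `FakePubSub` is an in-memory stand-in for Redis Pub/Sub (no persistence, fire-and-forget), and `ChatServer` is a hypothetical class that records deliveries instead of writing to real sockets.

```python
from collections import defaultdict

class FakePubSub:
    """In-memory stand-in for Redis Pub/Sub; delivers only to current subscribers."""
    def __init__(self):
        self._subscribers = defaultdict(list)    # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        callbacks = list(self._subscribers.get(channel, []))
        for callback in callbacks:
            callback(channel, message)
        return len(callbacks)                    # number of subscribers reached

class ChatServer:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.local_members = defaultdict(set)    # channel -> users connected here
        self.delivered = []                      # (user, message) pushed locally

    def connect(self, user_id, channel):
        if not self.local_members[channel]:      # first local member: subscribe once
            self.bus.subscribe(channel, self._on_message)
        self.local_members[channel].add(user_id)

    def send(self, channel, message):
        return self.bus.publish(channel, message)

    def _on_message(self, channel, message):
        for user_id in sorted(self.local_members[channel]):
            self.delivered.append((user_id, message))

bus = FakePubSub()
server1, server2 = ChatServer("server-1", bus), ChatServer("server-2", bus)
server1.connect("alice", "channel:team-general")
server2.connect("bob", "channel:team-general")
server2.connect("carol", "channel:team-general")
servers_reached = server1.send("channel:team-general", "Hey team!")
```

Server 1 never learns where Bob or Carol are connected; it only publishes to the channel, and each subscribed server handles its own local clients.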

The Complete Message Flow

When Alice sends "Hey team!" to the "team-general" channel, here is the full sequence:

Redis channel naming: The Redis Pub/Sub channel name mirrors the application channel concept. For a group channel named "team-general", the Redis channel is channel:<channel_id>. For a 1:1 direct message between users 42 and 99, use the same pattern with the DM's channel ID (e.g., channel:<dm_channel_id>) — this avoids collision issues from ordering user IDs differently.

Client → Server: Alice's app sends a WebSocket frame: { type: "message", channel_id: "team-general", content: "Hey team!", client_id: "msg-client-abc-123" }
Chat Server 1 receives the frame and immediately does three things in parallel:
- Generates a TIMEUUID as the message_id, then publishes to Redis: PUBLISH channel:team-general { message_id, sender_id, content, client_id, ... }
- Enqueues to Kafka topic chat-messages for durable persistence
- Sends an acknowledgment back to Alice: { type: "ack", client_id: "msg-client-abc-123", message_id: "generated-timeuuid" }
Redis delivers to all subscribers: Every Chat Server subscribed to channel:team-general receives the message — including Server 1 itself. Server 1 pushes to Dave, a local client, and to Alice's other devices if she is logged in on more than one. Server 2 pushes to Bob and Carol, its local clients.
Kafka consumer (Message Service) asynchronously writes the message to Cassandra with the final message_id.
If Bob was offline: The real-time delivery step was skipped for him. The system can send him a mobile push notification (APNs for iOS, FCM for Android) to prompt him to open the app. When he does reconnect, his client sends last_seen_message_id and fetches all missed messages via GET /api/v1/channels/{id}/messages?after_id=<last_seen_id> from Cassandra. There is no limit on how far back this sync goes — a user offline for a week fetches all messages since their last session, paginating in batches until caught up.
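
The Chat Server's part of this flow (publish, enqueue, and ACK issued together) might look roughly like the following. It is a sketch under assumptions: `handle_incoming` is a hypothetical handler, and the three `fake_*` coroutines stand in for the Redis client, the Kafka producer, and the sender's WebSocket.

```python
import asyncio
import uuid

log = []  # records what the fake collaborators were asked to do

async def fake_publish(channel, message):
    log.append(("publish", channel))

async def fake_enqueue(topic, message):
    log.append(("enqueue", topic))

async def fake_send_ack(ack):
    log.append(("ack", ack["client_id"]))

async def handle_incoming(frame, sender_id, publish, enqueue, send_ack):
    """Assign a message_id, then fan out, persist, and acknowledge concurrently."""
    message = {
        "message_id": str(uuid.uuid1()),          # version 1 UUID, i.e. a TIMEUUID
        "channel_id": frame["channel_id"],
        "sender_id": sender_id,
        "content": frame["content"],
        "client_id": frame["client_id"],
    }
    ack = {"type": "ack", "client_id": frame["client_id"],
           "message_id": message["message_id"]}
    await asyncio.gather(
        publish(f"channel:{frame['channel_id']}", message),   # real-time fan-out
        enqueue("chat-messages", message),                     # durable pipeline
        send_ack(ack),                                         # confirm to sender
    )
    return message

frame = {"type": "message", "channel_id": "team-general",
         "content": "Hey team!", "client_id": "msg-client-abc-123"}
stored = asyncio.run(
    handle_incoming(frame, "alice", fake_publish, fake_enqueue, fake_send_ack))
```

Whether the ACK should instead wait until the Kafka write is confirmed is a durability decision this sketch does not settle; it follows the parallel ordering described in the steps above.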

The role of client_id: This is a UUID generated by Alice's app before she sends the message. It is her idempotency key. If Alice's network drops after her message was delivered but before she received the ACK, her app will retry — and critically, it must resend the same client_id (not generate a new one). When the retry arrives at the server, it checks whether a message with that client_id already exists in Cassandra. If found, it returns the existing message_id instead of writing a new record. The message appears exactly once in the conversation regardless of how many times the client retried.
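
The deduplication rule can be sketched with a dictionary keyed by client_id. `MessageStore` is a hypothetical in-memory stand-in for the storage layer; in Cassandra, looking a message up by client_id would need its own table or index, which is an assumption beyond the schema shown earlier.

```python
import uuid

class MessageStore:
    """In-memory stand-in for the message store, deduplicating on client_id."""
    def __init__(self):
        self._by_client_id = {}   # (channel_id, client_id) -> message_id
        self.rows = []            # persisted messages, in write order

    def write(self, channel_id, client_id, content):
        """Return (message_id, created). A retry returns the original message_id."""
        key = (channel_id, client_id)
        existing = self._by_client_id.get(key)
        if existing is not None:
            return existing, False
        message_id = str(uuid.uuid1())
        self._by_client_id[key] = message_id
        self.rows.append({"message_id": message_id, "channel_id": channel_id,
                          "client_id": client_id, "content": content})
        return message_id, True

store = MessageStore()
first_id, created = store.write("team-general", "msg-client-abc-123", "Hey team!")
# The ACK was lost, so the client retries with the SAME client_id.
retry_id, created_again = store.write("team-general", "msg-client-abc-123", "Hey team!")
```

The retry is answered with the original message_id, so the client can reconcile its pending message without a second row being written.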

Step 6: Deep Dive — Message Persistence

Chat history has a specific access pattern: load the most recent N messages, ordered newest-first, for a given conversation — and paginate backward in time. This is a time-series workload. Cassandra is built for exactly this.

Message Storage: Cassandra Time-Series Partitioning

Cassandra organizes data into partitions (determined by the partition key) and sorts rows within each partition by the clustering key. For chat messages, the partition key is channel_id and the clustering key is message_id in descending order. All messages for a channel live together on a small set of nodes, and the newest messages are physically first — making 'load the last 50 messages' a single sequential read from the top of one partition.

Message Sync on Reconnect

Real-time delivery via Redis Pub/Sub is best-effort: if a Chat Server is restarting when a message is published, or a user's connection drops momentarily, the real-time push is missed. Cassandra is always the source of truth.

Each client tracks the ID of the last message it successfully displayed. On reconnect, it requests all messages after that point:

GET /api/v1/channels/{id}/messages?after_id=<last_seen_id>&limit=100
→ Returns all messages sent after that ID, in ascending order (oldest first)

This sync-on-reconnect pattern means the system always converges to the correct state regardless of which real-time events were missed. It also means you can safely deploy or restart Chat Servers at any time — users will automatically fill the gap from Cassandra when their connection is restored. Note that after_id is the complement of the before_id used in history pagination: before_id pages backward in time (scroll up to see older messages), while after_id pages forward in time (catch up on newer messages).
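
On the client side, catching up is a loop that keeps requesting pages with after_id until the server reports nothing more. The sketch below assumes a `fetch_page(after_id, limit)` callable that returns `(messages, has_more)`; `fake_fetch` is a made-up stand-in for the history endpoint, and integer IDs stand in for TIMEUUIDs.

```python
def sync_missed(fetch_page, last_seen_id, limit=100):
    """Page forward from last_seen_id until caught up; return (messages, new cursor)."""
    cursor = last_seen_id
    collected = []
    while True:
        messages, has_more = fetch_page(after_id=cursor, limit=limit)
        collected.extend(messages)
        if messages:
            cursor = messages[-1]["id"]     # advance to the newest message received
        if not has_more or not messages:
            return collected, cursor

# Fake server-side log with messages 1..250, oldest first.
SERVER_LOG = [{"id": i, "text": f"m{i}"} for i in range(1, 251)]

def fake_fetch(after_id, limit):
    newer = [m for m in SERVER_LOG if m["id"] > after_id]
    return newer[:limit], len(newer) > limit

# The client last displayed message 20 before going offline.
missed, cursor = sync_missed(fake_fetch, last_seen_id=20, limit=100)
```

Here the client needs three requests (100, 100, and 30 messages) to catch up, and a second call with the returned cursor finds nothing new.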

Step 7: Deep Dive — Presence Detection

Knowing whether a contact is online feels simple, but it has subtle failure modes that become important at scale.

Presence Detection via Redis TTL and Client Heartbeats

Presence is stored as a Redis key with a time-to-live (TTL). The client sends a heartbeat every 15 seconds to refresh the TTL. If the heartbeat stops — whether due to an intentional disconnect, a network failure, or a server crash — the key expires automatically and the user is considered offline. No background cleanup job required.
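
The expiry behavior can be modeled without Redis. In this sketch `PresenceStore` is a hypothetical in-memory stand-in, timestamps are passed in explicitly, and the 45-second TTL is an assumption (three heartbeat intervals); the text above specifies only the 15-second heartbeat.

```python
HEARTBEAT_INTERVAL = 15.0   # seconds, from the text above

class PresenceStore:
    """In-memory model of presence keys that expire unless refreshed."""
    def __init__(self, ttl_seconds=45.0):     # assumed TTL: three missed heartbeats
        self.ttl = ttl_seconds
        self._expires_at = {}                 # user_id -> expiry time

    def heartbeat(self, user_id, now):
        # Comparable to setting a Redis key with an expiry (SET ... EX).
        self._expires_at[user_id] = now + self.ttl

    def is_online(self, user_id, now):
        # Comparable to checking whether the key still exists.
        expires_at = self._expires_at.get(user_id)
        return expires_at is not None and now < expires_at

presence = PresenceStore()
presence.heartbeat("alice", now=0.0)
online_at_44 = presence.is_online("alice", now=44.0)
online_at_45 = presence.is_online("alice", now=45.0)
```

A TTL of several heartbeat intervals tolerates a lost heartbeat or two without flapping the user offline, at the cost of a longer delay before a real disconnect becomes visible.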

Step 8: Trade-offs

Architecture Variants: From Single Server to Fully Distributed

Chat architecture evolves through distinct steps, each one solving a specific scaling problem introduced by the previous step. The trade-off at each step is complexity against capability. Start as simple as possible and migrate when you hit a concrete wall.

Key Architectural Decisions Compared

Decision	Option A	Option B	Recommendation
Real-time transport	WebSockets — bidirectional, persistent, frame headers as small as 2 bytes	Server-Sent Events (SSE) — server-to-client only, simpler, works natively over HTTP/2	WebSockets for interactive chat; SSE for read-heavy one-way streams such as notification feeds or activity logs
Fan-out mechanism	Redis Pub/Sub — microsecond fan-out, no persistence, fire-and-forget	Kafka — millisecond fan-out, durable, ordered, replayable consumer groups	Redis for real-time delivery between Chat Servers; Kafka for the durable downstream persistence pipeline
Message storage	Cassandra — native time-series, horizontal write scale, operational expertise required	DynamoDB — fully managed, auto-scaling, pay-per-request pricing	Cassandra for cost efficiency at high sustained volume; DynamoDB if operational simplicity is the priority and volume is moderate
Message ordering	TIMEUUID — timestamp-embedded, no coordination, small clock-skew risk	Per-channel sequence counter (Redis INCR) — exact ordering, but a central bottleneck	TIMEUUID for most systems; per-channel sequence counters only when strict ordering is a non-negotiable requirement
Presence storage	Redis TTL + heartbeat — sub-millisecond read, automatic expiry on disconnect	Database row with updated_at timestamp — durable but extremely write-heavy	Redis TTL for all production systems — storing presence in a relational database is a well-known anti-pattern at scale

Summary

Concept	What It Solves	Key Insight
WebSockets	HTTP cannot push messages to a client without a client-initiated request — chat requires a persistent, bidirectional channel	One TCP connection upgraded from HTTP; both sides can send frames at any time with a frame header as small as 2 bytes (6 bytes for masked client-to-server frames) instead of ~800-byte HTTP headers
Redis Pub/Sub fan-out	Messages must reach users who are connected to different server instances	Each Chat Server subscribes to Redis channels for its locally connected users; publishing one message instantly reaches every server hosting members of that channel
Cassandra partitioning	1 billion messages per day requires a store that scales writes horizontally and reads recent messages fast	Partition by channel_id so all messages for a conversation cluster together; cluster by TIMEUUID DESC so 'last 50 messages' is a single fast sequential read from the top of the partition
Redis TTL presence	Detecting offline users cleanly when connections drop without sending a close frame	Presence is a Redis key with a TTL; client heartbeats refresh the TTL every 15 seconds; no heartbeat means the key expires and the user is automatically offline
TIMEUUID message IDs	Globally unique, time-ordered IDs without a central counter or coordination service	UUID v1 embeds the creation timestamp, enabling natural sort order and distributed ID generation with no bottleneck. Risk: if server clocks drift out of sync, messages from different servers may be ordered slightly incorrectly — mitigated by NTP synchronization. Acceptable for chat; use a centralized counter for strict ordering requirements.
Sync on reconnect	Real-time Pub/Sub delivery is best-effort and must tolerate misses	The client stores its last seen message ID and fetches all messages after it on reconnect — Cassandra is the source of truth, not Pub/Sub
client_id idempotency	Network retries can cause the same message to be sent and stored twice	The client generates a unique client_id before sending; the server stores it in Cassandra and deduplicates on it, guaranteeing each message appears exactly once regardless of retries

The real-time chat app is the canonical introduction to stateful distributed systems. Every design challenge you solve here — persistent connections, cross-server fan-out, time-series storage, TTL-based presence — reappears in live collaboration tools, multiplayer games, financial trading platforms, and any system where users expect the world to update in front of them without asking for it. Master the patterns here, and the rest follows naturally.
