Sync vs. Async: How Services Talk to Each Other

Every feature you build in a distributed system requires services to communicate. An API server calls a database. A checkout service notifies an email service. An AI agent calls an inference endpoint. The architectural choice you make — synchronous or asynchronous — determines how tightly coupled those services are, how they behave under failure, and how well they scale.

This section covers the two fundamental communication models and the three mechanisms you'll encounter most often in practice:

  • Synchronous: REST and gRPC — the caller sends a request and waits for a response
  • Asynchronous: Message Queues — the caller publishes a message and continues immediately; a separate consumer processes it later

In practice, most production systems use both — synchronous where an immediate response is required, and asynchronous where it is not. The key skill is knowing which to reach for.

The Core Distinction

Synchronous communication means the caller blocks — it waits for the response before doing anything else. The two services must both be available at the same time. If the downstream service is slow, the caller is slow. If it crashes, the caller gets an error.

Asynchronous communication means the caller publishes a message to a queue or broker, then immediately continues with its next task. The message sits in the queue until a consumer picks it up and processes it — possibly milliseconds later, possibly minutes later. The caller and consumer never need to be available simultaneously.

Diagram: synchronous communication
Diagram: asynchronous communication

A common source of confusion: using async/await in your code does not make your architecture asynchronous. When you write await fetch(url), you are writing non-blocking code (your thread is released while waiting). The interaction is still synchronous, though: the HTTP request is sent, and your function does not continue until the response arrives. From the perspective of the overall system, the caller is blocked until the downstream service responds. True asynchronous architecture requires a message broker to decouple the sender from the receiver. The sender publishes a message and returns immediately, with no knowledge of when or whether the receiver has processed it.
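
To make the distinction concrete, here is a minimal Python sketch using asyncio. `call_inventory_service` is a hypothetical stand-in for a remote call, simulated with `asyncio.sleep`. An in-process `asyncio.Queue` stands in for a message broker. Unlike RabbitMQ or SQS, it is neither durable nor shared across processes, so treat it only as an illustration of the control flow.

```python
import asyncio
import time


async def call_inventory_service() -> str:
    # Hypothetical downstream call; the sleep simulates ~200 ms of network + processing.
    await asyncio.sleep(0.2)
    return "in stock"


async def sync_style_handler() -> float:
    """Awaits the downstream call: non-blocking code, synchronous interaction."""
    start = time.perf_counter()
    # The event loop can run other tasks meanwhile, but THIS handler cannot
    # continue until the downstream service answers.
    result = await call_inventory_service()
    assert result == "in stock"
    return time.perf_counter() - start


async def async_style_handler(broker: asyncio.Queue) -> float:
    """Publishes a message and returns: asynchronous interaction."""
    start = time.perf_counter()
    # No waiting: the handler never learns when (or whether) a consumer processes it.
    broker.put_nowait({"type": "check_inventory", "sku": "ABC-123"})
    return time.perf_counter() - start


async def main() -> tuple:
    broker: asyncio.Queue = asyncio.Queue()  # in-process stand-in for a real broker
    waited = await sync_style_handler()
    published = await async_style_handler(broker)
    return waited, published, broker.qsize()


if __name__ == "__main__":
    waited, published, pending = asyncio.run(main())
    print(f"await held the handler for {waited:.3f}s")
    print(f"publish returned after {published:.6f}s; {pending} message(s) waiting for a consumer")
```

The awaiting handler is occupied for the full duration of the downstream call, even though no thread is blocked. The publishing handler returns almost instantly and leaves the work to whichever consumer reads the queue.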

Synchronous Communication

In synchronous communication, the caller and the downstream service are temporally coupled — they must both be online at the same time, and the response time of the downstream service directly affects the caller's response time. This makes synchronous communication simple to reason about but fragile under failure.

REST: The Universal Standard

REST (Representational State Transfer) is the most widely used communication style in web systems. It uses HTTP methods (GET, POST, PUT, DELETE) to operate on resources identified by URLs, typically exchanging JSON.

REST is stateless: every request carries all the information needed to process it. The server holds no session state between requests.

REST: Request-Response Over HTTP

REST is the usual default for public-facing APIs and for internal services where simplicity matters more than raw performance. Its text-based JSON payloads are human-readable and easy to debug, but they are larger and slower to parse than binary alternatives.
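
Below is a minimal sketch of a REST resource and a synchronous client, using only Python's standard library (`http.server` and `urllib.request`). The `/orders/<id>` path and the in-memory `ORDERS` data are hypothetical. A production service would normally use a web framework and a real datastore.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory resource store. The server keeps no per-client session.
ORDERS = {"42": {"id": "42", "status": "shipped"}}


class OrderHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Stateless: everything needed to answer is in the request (here, the URL).
        parts = self.path.strip("/").split("/")
        order = None
        if len(parts) == 2 and parts[0] == "orders":
            order = ORDERS.get(parts[1])
        status = 200 if order else 404
        body = json.dumps(order if order else {"error": "order not found"}).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # silence the default per-request logging


def get_order(base_url: str, order_id: str) -> dict:
    # Synchronous call: blocks until the server responds or the 5 s timeout expires.
    with urllib.request.urlopen(f"{base_url}/orders/{order_id}", timeout=5) as resp:
        return json.loads(resp.read())


def start_server() -> ThreadingHTTPServer:
    server = ThreadingHTTPServer(("127.0.0.1", 0), OrderHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    print(get_order(base, "42"))  # {'id': '42', 'status': 'shipped'}
    server.shutdown()
    server.server_close()
```

A missing order comes back as HTTP 404, which `urlopen` surfaces as an `HTTPError`. The caller finds out about the failure immediately, in the same call.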

gRPC: High-Performance Internal Communication

gRPC is a Remote Procedure Call (RPC) framework built by Google, designed for efficient service-to-service communication. Instead of JSON over HTTP/1.1, it uses Protocol Buffers (a compact binary format) over HTTP/2 (which multiplexes multiple requests over a single connection).

Benchmarks commonly report roughly 3–10× higher throughput and lower latency than JSON-over-REST for comparable workloads. The gap widens at high concurrency and with larger payloads. The exact gain depends heavily on payload shape, language, and implementation, and it comes at the cost of human-readability and more complex tooling.
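
To make the size difference concrete, here is a hand-rolled sketch of the Protocol Buffers wire format for a hypothetical message with one integer field (`id = 1`) and one string field (`name = 2`). Real code would use `protoc`-generated classes rather than encoding bytes by hand. This sketch only shows why the binary encoding is compact: field names are replaced by small numeric tags, and integers use variable-length (varint) encoding.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint: 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    # Each field starts with a key: (field_number << 3) | wire_type. Wire type 0 = varint.
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    # Wire type 2 = length-delimited: key, then byte length, then the UTF-8 bytes.
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


# Hypothetical schema: message User { int32 id = 1; string name = 2; }
binary = encode_int_field(1, 150) + encode_string_field(2, "Ada")
text = json.dumps({"id": 150, "name": "Ada"}).encode("utf-8")

print(binary.hex(), len(binary), "bytes vs", len(text), "bytes of JSON")
# 0896011203416461 8 bytes vs 26 bytes of JSON
```

The field names never travel over the wire. Both sides already know them from the shared .proto schema, which is also why the payload is not human-readable without that schema.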

gRPC: Binary RPC Over HTTP/2

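gRPC compiles your API contract from a .proto schema file into type-safe client and server code for many languages. HTTP/2 multiplexing lets many concurrent requests share one TCP connection. REST over HTTP/1.1 can reuse connections with keep-alive, but each connection carries only one in-flight request at a time, so clients under load must open and manage pools of connections.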

Asynchronous Communication: Message Queues

Asynchronous communication removes the temporal coupling entirely. The producer (the service that generates work) publishes a message to a broker and immediately continues. The consumer (the service that does the work) reads from the broker and processes the message independently — on its own schedule, at its own pace.

The broker holds the message durably until it is processed. If the consumer is offline, messages accumulate and are delivered when it comes back. If traffic spikes, messages buffer in the queue and consumers drain the backlog gradually — instead of crashing under load.

Message Queue: Decoupled Producer and Consumer

A message queue decouples when work is submitted from when it is executed. The producer is never blocked waiting for processing to complete. The consumer scales independently and processes messages at its own rate. Most brokers also provide redelivery and dead-letter handling when processing fails.
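
Below is a minimal in-process sketch of the producer/consumer pattern with retries and a dead-letter queue. It uses Python's `queue.Queue` and a worker thread as a stand-in for a real broker. The retry limit (`MAX_ATTEMPTS = 3`), the message shape, and the `send_email` handler are hypothetical. Real brokers implement the same idea durably and across processes: SQS uses a redrive policy, and RabbitMQ uses dead-letter exchanges.

```python
import queue
import threading
from typing import List, Optional

MAX_ATTEMPTS = 3  # hypothetical retry limit before a message is dead-lettered

work_queue: "queue.Queue[Optional[dict]]" = queue.Queue()  # stand-in for the broker
dead_letters: List[dict] = []  # stand-in for a dead-letter queue (DLQ)
sent: List[str] = []


def send_email(msg: dict) -> None:
    # Hypothetical handler that fails permanently on a malformed address.
    if "@" not in msg["to"]:
        raise ValueError(f"bad address: {msg['to']}")
    sent.append(msg["to"])


def publish(to: str) -> None:
    # Producer: enqueue and return immediately; it never waits for delivery.
    work_queue.put({"to": to, "attempts": 0})


def consumer() -> None:
    # Consumer: pulls messages at its own pace, independent of the producer.
    while True:
        msg = work_queue.get()
        if msg is None:  # sentinel value: stop the worker
            work_queue.task_done()
            return
        try:
            send_email(msg)
        except Exception:
            msg["attempts"] += 1
            if msg["attempts"] < MAX_ATTEMPTS:
                work_queue.put(msg)  # redeliver (real brokers usually add a delay)
            else:
                dead_letters.append(msg)  # park it for inspection instead of retrying forever
        finally:
            work_queue.task_done()


if __name__ == "__main__":
    worker = threading.Thread(target=consumer, daemon=True)
    worker.start()
    publish("ada@example.com")
    publish("not-an-address")
    work_queue.join()  # wait until every message is handled or dead-lettered
    work_queue.put(None)
    worker.join()
    print(sent, [m["to"] for m in dead_letters])  # ['ada@example.com'] ['not-an-address']
```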

Queue vs. Event Stream: Two Flavors of Async

Not all message brokers work the same way. There are two distinct models:

Traditional Queue (RabbitMQ, AWS SQS) vs. Event Stream (Apache Kafka, AWS Kinesis)

Model
  • Traditional queue: point-to-point. Each message is delivered to one consumer and deleted after acknowledgment.
  • Event stream: pub/sub log. Messages are appended to a persistent log, and multiple consumer groups can read the same message independently.
Retention
  • Traditional queue: messages are removed once consumed.
  • Event stream: messages are retained for a configurable period (hours, days, or indefinitely) regardless of consumption.
Use case
  • Traditional queue: task queues (send one email, process one payment, run one job).
  • Event stream: event-driven architectures (audit logs, real-time analytics, event sourcing, feeding multiple independent consumers).
Ordering
  • Traditional queue: standard queues (SQS Standard, RabbitMQ with multiple consumers) offer no ordering guarantee. FIFO queues (SQS FIFO, which orders within each message group; RabbitMQ with a single consumer) provide strict ordering at lower throughput.
  • Event stream: strict ordering within a partition (a shard of the topic); no ordering guarantee across partitions.
Replay
  • Traditional queue: not supported. Once consumed, the message is gone.
  • Event stream: supported. Consumers can re-read from any retained point in the log, enabling historical replay and backfill.
Complexity
  • Traditional queue: lower. Simpler to set up and operate.
  • Event stream: higher. Kafka requires more infrastructure and operational expertise.
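
The core difference between the two models is what happens after a read. The toy sketch below contrasts them. It is in-memory and single-process, and the class names are hypothetical. In the queue, taking a message removes it. In the append-only log, each consumer group only tracks its own offset, so several groups can read the same events and any group can rewind to replay them. Kafka's real consumer-group and offset-commit machinery (partitions, rebalancing, durable commits) is far more involved; only the data model is shown.

```python
from collections import deque
from typing import Dict, List, Optional


class TaskQueue:
    """Point-to-point: each message goes to exactly one consumer, then it is gone."""

    def __init__(self) -> None:
        self._messages: deque = deque()

    def publish(self, msg: str) -> None:
        self._messages.append(msg)

    def receive(self) -> Optional[str]:
        # Simplification: receiving and acknowledging happen in one step.
        return self._messages.popleft() if self._messages else None


class EventLog:
    """Append-only log: events stay put; each consumer group tracks its own offset."""

    def __init__(self) -> None:
        self._log: List[str] = []
        self._offsets: Dict[str, int] = {}

    def append(self, event: str) -> None:
        self._log.append(event)

    def poll(self, group: str) -> List[str]:
        start = self._offsets.get(group, 0)
        events = self._log[start:]
        self._offsets[group] = len(self._log)  # "commit" the group's new offset
        return events

    def seek(self, group: str, offset: int) -> None:
        # Replay: rewind one group without affecting any other group.
        self._offsets[group] = offset


if __name__ == "__main__":
    q = TaskQueue()
    q.publish("send-email-1")
    print(q.receive(), q.receive())  # send-email-1 None

    log = EventLog()
    log.append("order-created")
    log.append("order-paid")
    print(log.poll("analytics"))  # ['order-created', 'order-paid']
    print(log.poll("email"))      # ['order-created', 'order-paid'] (independent group)
    print(log.poll("analytics"))  # [] (nothing new since the last poll)
    log.seek("analytics", 0)
    print(log.poll("analytics"))  # ['order-created', 'order-paid'] (replay)
```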

Rule of thumb: Use a traditional queue (SQS, RabbitMQ) when you need to distribute tasks to workers — one message, one processor, then deleted. Use an event stream (Kafka) when multiple independent systems need to react to the same event, or when you need a persistent, replayable log of what happened. Note that Kafka can also distribute work across workers via consumer groups, but its operational overhead is significantly higher — favor a traditional queue for simple task distribution unless you specifically need event streaming capabilities. Event streams are covered in depth in the next section.

Cascading Failure: The Risk of Synchronous Chains

The most dangerous failure mode in synchronous systems is the cascading failure: a slow downstream service makes every upstream service slow — even services that have nothing wrong with them.

Synchronous Chain: How One Slow Service Brings Down Everything

In a synchronous call chain, each service waits for the next. If Service C slows down, Service B's threads fill up waiting. Then Service A's threads fill up waiting on B. The entire chain becomes unresponsive — even though Services A and B have no problem of their own.

Asynchronous communication is the structural remedy for this risk: move anything that does not need to complete before the user sees a response into a message queue. The checkout service no longer holds a thread open waiting for email and analytics — it publishes a message and returns immediately, regardless of what those downstream services are doing.
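
Some dependencies have to stay on the synchronous path. For those, the usual mitigation (also listed in the summary at the end of this section) is a timeout on every call plus a circuit breaker. The breaker stops calling a dependency after repeated failures, so callers fail fast instead of piling up waiting threads. Below is a simplified, single-threaded sketch. The thresholds and `flaky_inventory_call` are hypothetical, and production code would more typically rely on a maintained resilience library or service-mesh feature.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after = reset_after              # seconds to stay open before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: fail fast instead of tying up a thread on a struggling dependency.
                raise CircuitOpenError("dependency marked unavailable; failing fast")
            # Half-open: let one trial call through. Because failures is still at the
            # threshold, a single further failure reopens the circuit immediately.
            self.opened_at = None
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit
        return result


def flaky_inventory_call() -> str:
    # Hypothetical dependency call that exceeded its (assumed) 500 ms timeout.
    raise TimeoutError("inventory service did not answer within 500 ms")


if __name__ == "__main__":
    breaker = CircuitBreaker(failure_threshold=3, reset_after=30.0)
    for attempt in range(5):
        try:
            breaker.call(flaky_inventory_call)
        except CircuitOpenError as exc:
            print(attempt, "fast fail:", exc)
        except TimeoutError as exc:
            print(attempt, "timed out:", exc)
```

The first three attempts each wait out the timeout. Attempts 3 and 4 fail instantly, which is what keeps the upstream service's threads free while the dependency recovers.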

Decision Framework: Sync vs. Async

Each scenario below is followed by the recommended approach and the reason:

  • User submits a search query and waits for results: Synchronous (REST or gRPC). The result is needed before anything else can happen; there is nothing to defer.
  • User completes checkout; confirmation email must be sent: Asynchronous (queue). The email does not need to complete before the user sees the 'Order confirmed' page.
  • Internal service calls another at 50,000 req/s: Synchronous (gRPC). Low latency, high throughput, same datacenter; a broker would add latency without benefit.
  • Checkout event triggers email + inventory + fraud check + analytics: Asynchronous (queue/fan-out). Multiple independent consumers need the same event; none of them should block checkout.
  • AI batch inference for 1M records overnight: Asynchronous (queue). Processing time is long, no user is waiting, and consumers can be throttled to stay within API rate limits.
  • Public API called by third-party developers: Synchronous (REST). Third parties expect standard HTTP request-response; introducing a queue forces them to poll for results.
  • Real-time dashboard updated from many data sources: Asynchronous (event stream / Kafka). Events flow continuously; multiple dashboard components consume the same stream independently.
  • Payment processing (must complete before order is confirmed): Synchronous (REST or gRPC). Payment status is required immediately; the order cannot be confirmed until the payment outcome is known.
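
The critical-path reasoning behind these recommendations can be sketched as a checkout handler. Payment stays synchronous because the order cannot be confirmed without it. Email, inventory, fraud, and analytics hang off a single published event. `charge_card`, the event name, and the in-memory `EventBus` are hypothetical stand-ins. In a real system, the publish step would go to SQS, RabbitMQ, Kafka, or a similar broker.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


class PaymentDeclined(Exception):
    pass


@dataclass
class EventBus:
    """Hypothetical broker stand-in: publish() only records the event."""
    published: List[Tuple[str, dict]] = field(default_factory=list)

    def publish(self, topic: str, payload: dict) -> None:
        self.published.append((topic, payload))


def charge_card(order_id: str, amount_cents: int) -> str:
    # Hypothetical synchronous payment call (REST or gRPC in a real system).
    if amount_cents <= 0:
        raise PaymentDeclined(order_id)
    return f"pay_{order_id}"


def checkout(order_id: str, amount_cents: int, bus: EventBus) -> dict:
    # Critical path: "Order confirmed" cannot be shown until payment succeeds,
    # so a declined payment propagates to the caller and nothing is published.
    payment_id = charge_card(order_id, amount_cents)
    # Off the critical path: one event that email, inventory, fraud, and analytics
    # consumers each handle independently. Checkout does not wait for any of them.
    bus.publish("order.confirmed", {"order_id": order_id, "payment_id": payment_id})
    return {"order_id": order_id, "status": "confirmed"}


if __name__ == "__main__":
    bus = EventBus()
    print(checkout("A100", 2599, bus))  # {'order_id': 'A100', 'status': 'confirmed'}
    print(bus.published)  # [('order.confirmed', {'order_id': 'A100', 'payment_id': 'pay_A100'})]
```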

What AI Agents Get Wrong

AI Agents and Communication Pattern Defaults

AI agents default to synchronous REST for every service-to-service interaction, regardless of whether the operation needs to be synchronous. This creates unnecessarily slow critical paths and brittle systems that fail when any downstream service becomes slow.

Protocol Comparison at a Glance

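REST vs. gRPC vs. Message Queue

Communication style
  • REST: synchronous
  • gRPC: synchronous
  • Message queue: asynchronous
Protocol
  • REST: HTTP/1.1 or HTTP/2
  • gRPC: HTTP/2
  • Message queue: broker-specific (AMQP for RabbitMQ, an HTTPS API for SQS, Kafka's own binary protocol over TCP)
Payload format
  • REST: JSON (text)
  • gRPC: Protocol Buffers (binary)
  • Message queue: JSON, binary, or custom
Typical latency (illustrative only; depends heavily on network, payload size, and implementation)
  • REST: 50–200 ms
  • gRPC: 5–20 ms
  • Message queue: milliseconds to minutes (by design)
Coupling
  • REST: temporal (caller waits)
  • gRPC: temporal (caller waits)
  • Message queue: no temporal coupling; producer and consumer never need to be online at the same time
Failure handling
  • REST: caller receives the error immediately
  • gRPC: caller receives the error immediately
  • Message queue: broker redelivers failed messages; a dead-letter queue (DLQ) holds unprocessable ones
Best for
  • REST: public APIs, CRUD, simple request-response
  • gRPC: high-throughput internal calls, streaming
  • Message queue: background jobs, fan-out, long-running tasks
Browser support
  • REST: full
  • gRPC: requires a gRPC-Web proxy
  • Message queue: not applicable
Debugging
  • REST: easy (curl, Postman, browser DevTools)
  • gRPC: needs grpcurl or Postman's gRPC support
  • Message queue: needs a broker dashboard (RabbitMQ management UI, AWS Console)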

Summary

  • Synchronous: the caller blocks and waits for a response. Simple to reason about, but it creates temporal coupling: a slow downstream service makes every caller slow.
  • Asynchronous: the caller publishes a message and immediately continues. Decoupled but eventually consistent: the work may happen milliseconds or minutes later.
  • REST: HTTP + JSON. Universal and simple. Best for public APIs, CRUD, and any case where simplicity beats raw performance.
  • gRPC: HTTP/2 + Protocol Buffers. Commonly benchmarked at roughly 3–10× the throughput of REST, depending on workload. Best for high-throughput internal service-to-service calls where performance matters more than simplicity.
  • Message queue: broker-mediated. Best for background work, fan-out to multiple consumers, rate-limited processing, and anything that does not need to complete before the user sees a response.
  • Cascading failure: the key risk of synchronous chains. A slow downstream service blocks every upstream service. Add timeouts and circuit breakers to every synchronous dependency.
  • Critical path rule: ask, 'Does the user need this result before they see a response?' If yes, use synchronous. If no, use asynchronous. Most systems need both.
  • AI agent default: AI agents tend to generate synchronous REST for everything. Always specify which operations are on the critical path (synchronous) and which should be moved to a queue (asynchronous).

A practical starting rule: use REST by default, then move an operation to a message queue when you can answer yes to any of these questions: Does it take more than a few hundred milliseconds to complete? Does it need to fan out to multiple independent services? Does it depend on an external system with rate limits? Can it fail without affecting what the user sees right now? If yes to any of these, it belongs in a queue, not on the synchronous critical path.
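
A downstream system with rate limits is one of the triggers listed above, and it shows a concrete benefit of queues. The consumer can drain the backlog no faster than the limit allows, while producers keep enqueueing at whatever rate they like. Below is a minimal sketch of such a throttled consumer. The 10-requests-per-second limit and `call_inference_api` are hypothetical, and `sleep`/`clock` are injectable so the pacing can be checked without waiting in real time.

```python
import time
from collections import deque
from typing import Callable


def drain_with_rate_limit(
    backlog: deque,
    handle: Callable[[str], None],
    max_per_second: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Process queued items in order, starting at most max_per_second of them per second."""
    interval = 1.0 / max_per_second
    next_allowed = clock()
    while backlog:
        wait = next_allowed - clock()
        if wait > 0:
            sleep(wait)  # wait for the next slot rather than exceed the limit
        next_allowed = max(next_allowed, clock()) + interval
        handle(backlog.popleft())


def call_inference_api(record: str) -> None:
    # Hypothetical rate-limited external call.
    print(f"{time.monotonic():.2f} scored {record}")


if __name__ == "__main__":
    # Producers could enqueue a million records; the consumer still paces itself.
    backlog = deque(f"record-{i}" for i in range(3))
    drain_with_rate_limit(backlog, call_inference_api, max_per_second=10)
```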
