Stop Reaching for Kafka. Your Postgres Is Already a Queue.

Every time two services need to talk to each other, someone in the architecture meeting says the word “Kafka.” Then everyone nods, the ticket gets filed, and three sprints later you're running a broker cluster, managing consumer groups, dealing with schema registries, and writing runbooks for dead-letter queues — all so you could propagate a row change from one database to another.

I'm tired of watching teams do this to themselves.

We recently had to sync two separate Postgres databases bidirectionally. Accounts created in one system needed to appear in the other. Updates needed to flow both ways. Deletes needed to cascade. The classic setup that screams “message queue.”

We didn't use one. Here's why that was the right call, and why you're probably overbuilding too.


The Controversial Take

A message queue is not an architecture. It's a coping mechanism for when you've already made your system too complicated.

Kafka is extraordinary at what it does — high-throughput, multi-consumer, replayable event streams. It's also a distributed system you now have to operate, monitor, and reason about, on top of the two systems you already had. Most teams adopting it for service-to-service sync don't need 99% of what it gives them. They need: “when row A changes in database 1, update row B in database 2.” That's it.

Postgres has had everything you need to do this for decades. You're just not using it.


The Pattern: Event Tables

Every database change that needs to sync gets written to a small event table — in the same database, in the same transaction. No separate service, no network hop, no “at least once vs exactly once” philosophy debate.

CREATE TABLE sync_events (
    id         bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    record_id  integer NOT NULL,
    event_type text NOT NULL  -- 'insert' | 'update' | 'delete'
);

A database trigger populates this table. No application code touches it on the producer side. The database is the source of truth, so the database writes the event. This is the transactional outbox pattern, and it's been a known solution since at least 2015. The reason people skip it is that “queue table” doesn't sound impressive in an architecture diagram.
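
As an illustration, a trigger along these lines would do the job. It is a sketch under assumptions, not a definitive implementation: the helper name `syncTriggerSql`, the trigger function name `enqueue_sync_event`, and the idea that every synced table has an integer `id` column are all hypothetical. The DDL is wrapped in a small JavaScript helper so it can live in a migration script, and `EXECUTE FUNCTION` needs Postgres 11 or newer.

```javascript
// Builds the DDL for a row-level trigger that records every change to `table`
// in sync_events. Assumes (hypothetically) that the table has an integer `id` column.
function syncTriggerSql(table) {
    // Identifiers cannot be passed as query parameters, so validate before interpolating.
    if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
        throw new Error(`unsafe table name: ${table}`)
    }
    return `
CREATE OR REPLACE FUNCTION enqueue_sync_event() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO sync_events (record_id, event_type) VALUES (OLD.id, 'delete');
    ELSE
        -- TG_OP is 'INSERT' or 'UPDATE'; lower() matches the event_type values
        INSERT INTO sync_events (record_id, event_type) VALUES (NEW.id, lower(TG_OP));
    END IF;
    RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ${table}_sync_event
AFTER INSERT OR UPDATE OR DELETE ON ${table}
FOR EACH ROW EXECUTE FUNCTION enqueue_sync_event();
`
}
```

Because the trigger runs inside the transaction that changed the row, the event and the change commit or roll back together. Calling `syncTriggerSql('accounts')` returns the statements to run once per synced table.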


The Consumer Is Just a Loop

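A single daemon process polls the event table every few seconds, processes each row, and deletes it, all in the same transaction. The delete commits only after the side effect has succeeded, so if the process crashes midway, the row survives and gets retried on the next poll. The two steps are truly atomic only when the side effect lands in the same database; when it writes to another database, a crash between the remote write and the local commit means the event runs again. No event is ever lost, but one can be delivered twice.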

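// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Poll forever: drain whatever is pending, then pause before the next poll.
for (;;) {
    const events = await fetchPendingEvents()  // oldest first
    for (const event of events) {
        await processAndDelete(event)  // one transaction, both steps
    }
    await sleep(2000)  // 2 seconds between polls
}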

That's the entire consumer. The operational complexity of your “message broker” is now a for loop and a sleep.
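
For the "one transaction, both steps" part, a minimal sketch could look like the following. It is an assumption-laden illustration rather than a reference implementation: `db` stands for any client whose `query(text, values)` returns a promise (the shape node-postgres clients have), `applyToTarget` is a hypothetical handler that performs the side effect, and the signature takes both explicitly instead of the single-argument call shown in the loop.

```javascript
// Applies one event and removes it from the queue.
// `db` and `applyToTarget` are hypothetical stand-ins supplied by the caller.
async function processAndDelete(db, event, applyToTarget) {
    await db.query('BEGIN')
    try {
        await applyToTarget(event)  // side effect first
        await db.query('DELETE FROM sync_events WHERE id = $1', [event.id])
        await db.query('COMMIT')    // the event disappears only after success
    } catch (err) {
        await db.query('ROLLBACK')  // the event stays queued for the next poll
        throw err
    }
}
```

If the handler throws or the process dies before `COMMIT`, the delete never takes effect and the next poll sees the same row again.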


What About Exactly-Once Delivery?

Kafka evangelists love bringing up delivery semantics. Here's the reality: Kafka's exactly-once guarantees cover reads and writes that stay inside Kafka (idempotent producers plus transactions). The moment your handler writes to something else, like a second Postgres database, a retry can apply the same event twice, and you are back to at-least-once. You still have to write idempotent consumers. Once you do that, at-least-once delivery (which is what the event table gives you) is functionally identical.

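Your handlers should be upserts, and your deletes should tolerate a row that is already gone. If they are, processing the same event twice leaves the target in the same state as processing it once. The “exactly-once” conversation becomes irrelevant.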
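
To make that concrete, here is a hedged sketch of an idempotent handler. The `accounts (id, name)` target table and the function name `idempotentQuery` are assumptions made for illustration. The function returns a `{ text, values }` object, which is the query-config shape node-postgres accepts.

```javascript
// Maps a sync event to a statement that is safe to run more than once.
// `row` is the current source row for event.record_id, or null if it no longer exists.
// The accounts(id, name) columns are hypothetical.
function idempotentQuery(event, row) {
    if (event.event_type === 'delete' || row === null) {
        // Deleting an already-deleted row affects zero rows, so a repeat is harmless.
        return { text: 'DELETE FROM accounts WHERE id = $1', values: [event.record_id] }
    }
    // Insert and update collapse into one upsert, so a repeat rewrites the same values.
    return {
        text: `INSERT INTO accounts (id, name) VALUES ($1, $2)
               ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name`,
        values: [row.id, row.name],
    }
}
```

Since the event only carries `record_id`, the consumer reads the current source row and copies its state. Replaying an old event then just re-applies the latest state.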


Concurrent Workers Without a Lock Service

You might want multiple instances of the daemon running for redundancy. But you don't want all of them processing the same events. The usual answer is ZooKeeper, or Redis, or some external coordination service.

Postgres already has advisory locks. One line:

SELECT pg_try_advisory_lock(12345)

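The lock is session-level: it is held until the connection closes or explicitly releases it. The process that gets it processes events. The others poll until they can acquire it. If the holder crashes, its connection drops, Postgres releases the lock, and a standby takes over on its next poll. No external service, no distributed consensus algorithm, no new thing to operate. The one requirement is a dedicated, long-lived connection: a transaction-mode pooler such as PgBouncer hands your session to other clients between transactions, which breaks session-level locks.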

Use the non-blocking pg_try_advisory_lock, not the blocking pg_advisory_lock. A blocking call that waits for hours shows up as a long-running query in your database monitoring, and while it waits it holds a snapshot open, which can keep vacuum from cleaning up dead tuples. Poll instead.
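
The polling side could be sketched like this. As before, `db` is assumed to be a client whose `query(text, values)` resolves to an object with a `rows` array (as node-postgres does). `runConsumer`, `runWhenLeader`, and the 5-second retry interval are hypothetical choices for illustration.

```javascript
const SYNC_LOCK_KEY = 12345  // the same arbitrary key on every instance

// Returns true if this connection now holds the lock. `db` must be one dedicated,
// long-lived connection, because the lock belongs to the session.
async function tryBecomeLeader(db) {
    const result = await db.query(
        'SELECT pg_try_advisory_lock($1) AS acquired',
        [SYNC_LOCK_KEY]
    )
    return result.rows[0].acquired === true
}

// Polls for the lock, then hands control to `runConsumer` (hypothetical).
async function runWhenLeader(db, runConsumer, retryMs = 5000) {
    while (!(await tryBecomeLeader(db))) {
        await new Promise((resolve) => setTimeout(resolve, retryMs))
    }
    await runConsumer()
}
```

Each attempt is a short query that returns immediately, so standbys never leave a statement hanging on the server.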


The Fetch-Before-Process Trick

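If you're syncing entities with dependencies (say, accounts before their child users), you need to be careful about race conditions. The naive approach (fetch and process accounts, then fetch and process users) has a gap: an account and its first user created after your account fetch but before your user fetch. The user event gets picked up, but its account event was not in the batch you just processed, so the user references an account that hasn't landed yet.

The fix is simple: fetch all dependent events first, then fetch and process parents, then process the already-fetched dependents. Every user event in that snapshot was written after its account's event, so the account event has either already been processed or shows up in the later account fetch. New events generated in the gap get picked up next iteration. No locks, no coordination, just careful fetch ordering.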
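
The ordering can be sketched in a few lines. This is an illustrative model, not a full implementation: the three callbacks are hypothetical stand-ins for real queries, and everything is synchronous to keep the sketch short (real queries would be awaited). It also assumes a user is always written after its account.

```javascript
// One sync iteration. fetchUserEvents and fetchAccountEvents read pending events,
// apply(kind, event) writes to the target. All three are hypothetical callbacks.
function syncIteration({ fetchUserEvents, fetchAccountEvents, apply }) {
    const userEvents = fetchUserEvents()        // 1. snapshot the dependents first
    const accountEvents = fetchAccountEvents()  // 2. parents are fetched later
    for (const event of accountEvents) apply('account', event)  // 3. parents land
    for (const event of userEvents) apply('user', event)        // 4. then the snapshot
    return { accounts: accountEvents.length, users: userEvents.length }
}
```

An account and user created between steps 1 and 2 split cleanly: the account is processed now, and the user waits for the next iteration, by which time its parent exists.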


The Operational Win Nobody Talks About

When a Kafka consumer falls behind, you open Grafana, stare at consumer group lag metrics, and try to figure out which partition is stuck.

When an event table has a backlog, you run:

SELECT COUNT(*) FROM sync_events;
SELECT * FROM sync_events ORDER BY id LIMIT 10;

You see exactly what's waiting. You can manually delete a stuck event, replay it, inspect it, join it against the source table. Everything is just SQL. Every engineer on your team already knows how to do this.

That's not a minor convenience. Over a year of operations, that's dozens of hours not spent learning a new tool's failure modes.


When to Actually Use Kafka

To be clear: this is not “Kafka is bad.” Kafka is exceptional when you need fan-out to many consumers, event replay for analytics, or volumes that would make a polling loop impractical. If you're processing millions of events per minute, use Kafka.

But if you're syncing two internal services where a few seconds of latency is fine, where your events number in the hundreds per minute, and where your team is four engineers who also have to ship features — the event table pattern will serve you better. Not because it's technically superior at scale, but because it's simple enough that you can hold the whole thing in your head, debug it with a SQL client, and hand it off to a new hire in an afternoon.

The best architecture is the one you can actually operate. Most teams reach for Kafka before they've outgrown a cron job. Don't let the tool's reputation write checks your problem size doesn't need to cash.