Strangler-fig migration of a tightly coupled synchronous service layer to an event-driven Apache Kafka architecture, designed to run alongside the existing system in production, with a clear handover boundary, documented architecture decision records (ADRs), and a team that could own it independently afterwards.
The client had a working monolith. The problem wasn’t that it didn’t work — it was that it was becoming progressively harder to change. Every deployment required coordinating multiple teams. Failures in one service cascaded. Adding a new downstream consumer of an event meant modifying the upstream producer.
The goal was not to rewrite the system. It was to introduce an event backbone that allowed services to be progressively decoupled, at a pace the team could manage, without a big-bang cutover that put production at risk.
Strangler fig pattern — the Kafka event backbone was introduced alongside the existing synchronous calls, not in place of them. Services published events as side effects of their existing operations; downstream consumers were migrated one by one to consume from Kafka rather than calling upstream services directly. At each step, the old and new paths ran in parallel until the new path was proven stable; only then was the old path removed.
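The per-consumer cutover described above can be sketched roughly as follows. This is a minimal illustration, not the client's implementation: `InMemoryBus` stands in for a Kafka topic, and `OrderService`, `BillingConsumer`, the `OrderPlaced` event, and the three `Phase` values are hypothetical names chosen for the example. It assumes the consumer is idempotent on an event id, so that receiving the same event over both paths during the parallel phase is harmless.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Set

Handler = Callable[[dict], None]


class Phase(Enum):
    LEGACY = "legacy"      # direct synchronous call only
    PARALLEL = "parallel"  # direct call and event path both active
    EVENTS = "events"      # event path only; direct call removed


class InMemoryBus:
    """Stand-in for a Kafka topic: delivers each event to every subscriber."""

    def __init__(self) -> None:
        self.subscribers: List[Handler] = []

    def publish(self, event: dict) -> None:
        for handler in self.subscribers:
            handler(event)


class BillingConsumer:
    """Hypothetical downstream consumer, idempotent on the event id."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.invoiced: List[str] = []

    def handle(self, event: dict) -> None:
        if event["id"] in self.seen:  # already delivered via the other path
            return
        self.seen.add(event["id"])
        self.invoiced.append(event["order_id"])


@dataclass
class OrderService:
    bus: InMemoryBus
    direct: Dict[str, Handler]  # legacy synchronous callees, by name
    phase: Dict[str, Phase]     # migration phase per callee
    direct_calls: int = 0

    def place_order(self, order_id: str) -> None:
        event = {"id": f"evt-{order_id}", "type": "OrderPlaced", "order_id": order_id}
        # New path: publish as a side effect of the existing operation.
        self.bus.publish(event)
        # Old path: kept for every consumer that is not fully migrated.
        for name, call in self.direct.items():
            if self.phase[name] is not Phase.EVENTS:
                call(event)
                self.direct_calls += 1


bus = InMemoryBus()
billing = BillingConsumer()
orders = OrderService(bus, {"billing": billing.handle}, {"billing": Phase.LEGACY})

orders.place_order("o1")                # legacy: direct call only
bus.subscribers.append(billing.handle)  # consumer starts reading from the bus
orders.phase["billing"] = Phase.PARALLEL
orders.place_order("o2")                # both paths deliver; handled once
orders.phase["billing"] = Phase.EVENTS
orders.place_order("o3")                # event path only
```

The producer publishes unconditionally from the first step, so moving a consumer between phases is a configuration change on one dependency rather than a coordinated release.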
Domain event modelling — a domain events workshop was run with the engineering and product teams to agree on the canonical event vocabulary before any infrastructure was built. This was the most valuable part of the engagement: the team discovered that their existing service boundaries didn’t match their actual domain model, and the migration became an opportunity to correct that.
Operational first — dead letter queues, consumer group monitoring, schema registry for event versioning, and runbook documentation were all built before the first production event was published. Operating a Kafka-based system is different from operating a REST-based one; the team needed to be ready before they were dependent on it.
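As an illustration of two of the safeguards listed above, the sketch below shows one common way a consumer loop can combine a schema-version check with dead-letter routing. It uses plain Python lists in place of Kafka topics. The retry budget, the `schema_version` field, the supported versions, and the demo handler are all assumptions made for the example, not details from the engagement.

```python
from typing import Callable, Dict, List

SUPPORTED_VERSIONS = {1, 2}  # assumed versions this consumer can read
MAX_ATTEMPTS = 3             # assumed retry budget for transient failures


class UnsupportedSchema(Exception):
    """Raised for events whose schema_version this consumer cannot read."""


def check_version(event: dict) -> None:
    version = event.get("schema_version")
    if version not in SUPPORTED_VERSIONS:
        raise UnsupportedSchema(f"schema_version={version!r}")


def consume(events: List[dict], handler: Callable[[dict], None],
            dead_letters: List[dict]) -> int:
    """Process events in order; park failures in dead_letters. Returns the handled count."""
    handled = 0
    for event in events:
        attempt = 0
        while True:
            attempt += 1
            try:
                check_version(event)
                handler(event)
                handled += 1
                break
            except Exception as exc:
                # Retrying cannot fix an unreadable schema, so skip straight to the DLQ.
                retryable = not isinstance(exc, UnsupportedSchema)
                if retryable and attempt < MAX_ATTEMPTS:
                    continue
                dead_letters.append(
                    {"event": event, "error": repr(exc), "attempts": attempt}
                )
                break
    return handled


# Demo handler: fails a fixed number of times per event id, then succeeds.
remaining_failures: Dict[str, int] = {"e2": 1, "e3": 99}


def demo_handler(event: dict) -> None:
    if remaining_failures.get(event["id"], 0) > 0:
        remaining_failures[event["id"]] -= 1
        raise RuntimeError("downstream unavailable")


events = [
    {"id": "e1", "schema_version": 1},  # succeeds first time
    {"id": "e2", "schema_version": 2},  # succeeds on retry
    {"id": "e3", "schema_version": 1},  # exhausts retries
    {"id": "e4", "schema_version": 7},  # unknown schema version
]
dlq: List[dict] = []
handled = consume(events, demo_handler, dlq)
```

Parking a failed event together with its error and attempt count keeps the partition moving and gives the on-call engineer enough context to follow a runbook for replaying or discarding it.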
The migration ran over three months in production with no incidents. By the end, four of the eight original direct service dependencies had been replaced by event consumption. Each one reduced the blast radius of failures and removed a deployment coordination requirement. The team continued the remaining migrations independently after the engagement ended.