A fault-tolerant, serverless real-time API service ingesting complex transactional data from major financial institutions across FX, fixed income, credit, and repo markets — with cross-region disaster recovery built in from day one.
Build a zero-maintenance, fault-tolerant ingestion pipeline capable of receiving real-time transactional data from major financial institutions across FX, fixed income, credit, and repo — with no data loss under any failure scenario, including regional AWS outages.
Serverless AWS ingestion pipeline: API Gateway → SNS fan-out → SQS → Lambda, with cross-region DR replication built in from day one. Dead-letter queues and idempotent handlers ensured recoverability. Infrastructure fully defined in Ansible-templated CloudFormation stacks. Apache Druid pipeline rebuilt in Java and Python to reduce ingestion latency.
Zero data loss for institutional clients throughout the engagement. DR architecture proven under both planned and unplanned failure scenarios. Druid pipeline improvements measurably reduced data latency for end-user analytics, directly improving the client experience Mosaic's sales team could demonstrate.
"At Mosaic Smart Data, Samuel demonstrated outstanding technical depth and a genuine talent for collaboration. His contributions to our real-time data ingestion platform were instrumental in strengthening the team's capability and delivering exceptional value to our clients."
Mosaic Smart Data: Real-Time Financial Data Ingestion
Mosaic Smart Data provides sophisticated transaction cost analysis (TCA) and market analytics to institutional clients across global financial markets. My engagement there, from 2022 to 2024, centred on building and extending the core data ingestion platform: the system responsible for receiving, validating, enriching, and persisting transactional data from major banks and asset managers in real time.
The platform needed to be zero-maintenance, fault-tolerant, and capable of recovering completely from any failure without data loss. It also needed to handle the operational complexity of ingesting data from institutions with differing API protocols, data formats, and delivery cadences.
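To illustrate the format problem, the sketch below normalises two differently shaped payloads into one canonical record before anything downstream sees them. It is illustrative rather than the production code: the source names (`bank_a`, `bank_b`), the field names, and the `Trade` type are all hypothetical.

```python
import csv
import io
import json
from dataclasses import dataclass
from decimal import Decimal

ASSET_CLASSES = {"fx", "fixed_income", "credit", "repo"}


@dataclass(frozen=True)
class Trade:
    """Canonical record shared by every downstream stage (hypothetical shape)."""
    trade_id: str
    asset_class: str
    notional: Decimal
    currency: str


def from_json(payload: str) -> Trade:
    # Hypothetical source that sends JSON objects with its own key names.
    doc = json.loads(payload)
    return Trade(doc["id"], doc["assetClass"].lower(),
                 Decimal(str(doc["notional"])), doc["ccy"])


def from_csv(payload: str) -> Trade:
    # Hypothetical source that sends one CSV row: id,asset_class,notional,currency
    trade_id, asset_class, notional, currency = next(csv.reader(io.StringIO(payload)))
    return Trade(trade_id, asset_class.lower(), Decimal(notional), currency)


# One adapter per institution; adding a source means adding one entry here.
PARSERS = {"bank_a": from_json, "bank_b": from_csv}


def normalise(source: str, payload: str) -> Trade:
    """Parse a raw payload with the source's adapter, then validate it."""
    trade = PARSERS[source](payload)
    if trade.asset_class not in ASSET_CLASSES:
        raise ValueError(f"unknown asset class: {trade.asset_class!r}")
    if trade.notional <= 0:
        raise ValueError("notional must be positive")
    return trade
```

Keeping the per-institution differences inside small adapters means validation and persistence only ever deal with one record shape.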
The core of my work was a serverless AWS ingestion pipeline handling high-volume, time-sensitive transactional data across foreign exchange, fixed income, credit, and repo markets. The architecture was designed so that no component failure — whether a Lambda timeout, an SQS message delay, or a regional AWS outage — could result in lost or corrupted data.
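A minimal sketch of the idempotent-handler idea for an SQS-triggered Lambda follows. It rests on several assumptions: SNS delivers to SQS without raw message delivery, so each SQS body is an SNS envelope with a `Message` field; partial batch responses (`ReportBatchItemFailures`) are enabled on the event source mapping; and each trade carries a unique `trade_id`. The in-memory store is a stand-in for a durable store with an atomic put-if-absent operation, and `make_handler` and `persist` are hypothetical names.

```python
import json


class InMemoryDedupStore:
    """Stand-in for a durable store offering an atomic put-if-absent."""

    def __init__(self):
        self._seen = set()

    def claim(self, key):
        # Returns True only for the first caller to claim this key.
        if key in self._seen:
            return False
        self._seen.add(key)
        return True

    def release(self, key):
        # Undo a claim so a failed message can be retried later.
        self._seen.discard(key)


def make_handler(store, persist):
    """Build a Lambda-style handler around a dedup store and a persist callable."""

    def handler(event, context=None):
        failures = []
        for record in event.get("Records", []):
            try:
                envelope = json.loads(record["body"])   # SNS envelope (assumed)
                trade = json.loads(envelope["Message"])
                key = trade["trade_id"]                 # hypothetical field
                if not store.claim(key):
                    continue  # duplicate delivery: already handled, skip quietly
                try:
                    persist(trade)
                except Exception:
                    store.release(key)
                    raise
            except Exception:
                # Report only this message as failed; SQS redelivers it and,
                # once the queue's maxReceiveCount is exceeded, moves it to the DLQ.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

Because SQS standard queues deliver at least once, the claim step is what keeps a redelivered message from being persisted twice, while the per-message failure list stops one bad record from forcing a retry of the whole batch.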
I also extended the existing backend with a Java and Python data pipeline to improve the enrichment and ingestion of data into Apache Druid, the analytical data store powering Mosaic’s client-facing analytics. This significantly reduced ingestion latency and improved the accuracy of the data visible to end users.
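As a rough illustration of what an enrichment step ahead of Druid can look like, the sketch below flattens a trade event into a single row with an ISO-8601 UTC timestamp (a format Druid's `timestampSpec` can parse) and attaches reference-data dimensions. The field names, the `REFERENCE_DATA` table, and the `enrich` function are hypothetical and do not reflect the actual pipeline.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical reference data keyed by instrument.
REFERENCE_DATA = {
    "EURUSD": {"asset_class": "fx", "region": "EMEA"},
}


def enrich(event, reference_data):
    """Return a flat, analytics-ready row for one trade event."""
    ref = reference_data.get(event["instrument"], {})
    ms = event["executed_at_ms"]
    # Integer arithmetic avoids float rounding in the millisecond component.
    ts = (datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
          + timedelta(milliseconds=ms % 1000))
    return {
        "timestamp": ts.isoformat(timespec="milliseconds"),
        "trade_id": event["trade_id"],
        "instrument": event["instrument"],
        # Missing reference data is made explicit rather than dropped.
        "asset_class": ref.get("asset_class", "unknown"),
        "region": ref.get("region", "unknown"),
        "notional": event["notional"],
    }
```

Doing the lookups before ingestion keeps the stored rows flat, so queries do not have to join against reference data at read time.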
The platform delivered reliable, zero-data-loss ingestion for Mosaic’s institutional client base throughout my engagement. The DR architecture — tested under both planned and unplanned failure scenarios — performed as designed. The Apache Druid pipeline improvements measurably reduced data latency for end-user analytics, directly improving the client experience Mosaic’s sales team could demonstrate.
Mosaic Smart Data, AWS Serverless, AWS Lambda, AWS SNS, AWS SQS, AWS CloudFormation, Ansible, Apache Druid, Python