Practical SQS patterns for Spring Boot microservices — standard vs FIFO queues, @SqsListener, dead letter queues, idempotency, and visibility timeout tuning.
SQS is the bread-and-butter messaging service for AWS-based microservices. It’s simpler to operate than Kafka or RabbitMQ, integrates cleanly with IAM and other AWS services, and scales without you touching anything. But simple to operate does not mean simple to use correctly. At DWP Digital we handled high-volume benefit payment events over SQS, and the operational lessons from that work are what this post is about.
The first decision is queue type.
Standard queues offer unlimited throughput, but SQS guarantees at-least-once delivery — not exactly-once. Messages can arrive out of order and can be delivered more than once. Your consumer must be idempotent.
FIFO queues guarantee ordered delivery and exactly-once processing within a message group, but throughput is capped (300 TPS per queue, or 3,000 with batching). They require a MessageGroupId on every send, plus a MessageDeduplicationId unless content-based deduplication is enabled on the queue.
For most microservice use cases, standard queues are the right default. FIFO is worth the throughput constraint when strict ordering genuinely matters to the business domain — for example, processing customer account state changes in sequence.
Add spring-cloud-aws-sqs to your pom.xml (if you import the spring-cloud-aws-dependencies BOM, the version is managed for you):
<dependency>
    <groupId>io.awspring.cloud</groupId>
    <artifactId>spring-cloud-aws-sqs</artifactId>
</dependency>
With Spring Cloud AWS 3.x, the SqsAsyncClient is auto-configured from your AWS credentials and region. For local development, point at LocalStack:
spring:
  cloud:
    aws:
      sqs:
        endpoint: http://localhost:4566
      region:
        static: eu-west-2
      credentials:
        access-key: test
        secret-key: test
The @SqsListener annotation is the Spring Cloud AWS equivalent of @KafkaListener:
import io.awspring.cloud.sqs.annotation.SqsListener;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.messaging.handler.annotation.Payload;
import org.springframework.stereotype.Component;

@Component
public class PaymentEventConsumer {

    private static final Logger log = LoggerFactory.getLogger(PaymentEventConsumer.class);
    private final PaymentService paymentService;

    public PaymentEventConsumer(PaymentService paymentService) {
        this.paymentService = paymentService;
    }

    @SqsListener("payment-events")
    public void receive(
            @Payload PaymentEvent event,
            @Header("ApproximateReceiveCount") String receiveCount) {
        log.info("Processing payment {} (receive attempt {})",
                event.getPaymentId(), receiveCount);
        paymentService.process(event);
        // the message is deleted automatically on successful return
    }
}
When the method returns without throwing, Spring Cloud AWS deletes the message from the queue. When it throws, the message becomes visible again after the visibility timeout and will be redelivered. This is the at-least-once delivery contract.
SqsTemplate is the high-level send abstraction:
import io.awspring.cloud.sqs.operations.SqsTemplate;
import org.springframework.stereotype.Service;

@Service
public class PaymentEventPublisher {

    private final SqsTemplate sqsTemplate;

    public PaymentEventPublisher(SqsTemplate sqsTemplate) {
        this.sqsTemplate = sqsTemplate;
    }

    public void publish(PaymentEvent event) {
        sqsTemplate.send(to -> to
                .queue("payment-events")
                .payload(event)
                .header("source", "payment-service"));
    }
}
For FIFO queues, include messageGroupId and messageDeduplicationId:
sqsTemplate.send(to -> to
        .queue("payment-events.fifo")
        .payload(event)
        .messageGroupId(event.getAccountId())
        .messageDeduplicationId(event.getPaymentId()));
Configure a DLQ for every queue. Without one, a poison message that always fails will be redelivered until it expires (up to 14 days), consuming your visibility windows and polluting your metrics.
Set up the DLQ in your AWS infrastructure (CloudFormation, Terraform, or CDK):
"RedrivePolicy": {
"deadLetterTargetArn": "arn:aws:sqs:eu-west-2:123456789:payment-events-dlq",
"maxReceiveCount": 3
}
maxReceiveCount: 3 means that once a message has been received three times without being deleted, SQS routes it to the DLQ. Alert on DLQ depth — a non-zero value means something is wrong in the consumer and requires investigation.
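In production the usual answer is a CloudWatch alarm on the DLQ's ApproximateNumberOfMessagesVisible metric. If you also want a quick in-process check, here is a minimal sketch, assuming the auto-configured SqsAsyncClient, scheduling enabled via @EnableScheduling, a hypothetical app.dlq-url property, and a hypothetical AlertingService:

import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import software.amazon.awssdk.services.sqs.SqsAsyncClient;
import software.amazon.awssdk.services.sqs.model.QueueAttributeName;

@Component
public class DlqDepthMonitor {

    private final SqsAsyncClient sqs;
    private final AlertingService alertingService; // hypothetical alerting abstraction

    @Value("${app.dlq-url}") // hypothetical property holding the DLQ URL
    private String dlqUrl;

    public DlqDepthMonitor(SqsAsyncClient sqs, AlertingService alertingService) {
        this.sqs = sqs;
        this.alertingService = alertingService;
    }

    @Scheduled(fixedDelay = 60_000) // check once a minute
    public void checkDlqDepth() {
        sqs.getQueueAttributes(req -> req
                .queueUrl(dlqUrl)
                .attributeNames(QueueAttributeName.APPROXIMATE_NUMBER_OF_MESSAGES))
            .thenAccept(resp -> {
                int depth = Integer.parseInt(
                        resp.attributes().get(QueueAttributeName.APPROXIMATE_NUMBER_OF_MESSAGES));
                if (depth > 0) {
                    alertingService.notifyDlqDepth("payment-events-dlq", depth); // hypothetical method
                }
            });
    }
}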
When a consumer picks up a message, SQS hides it from other consumers for the visibility timeout period. If your consumer doesn’t delete the message before the timeout expires, SQS makes it visible again and another consumer picks it up.
The default visibility timeout is 30 seconds. If your processing regularly takes longer than that, you’ll get spurious duplicates. Set the visibility timeout to at least 6x the expected processing time:
// Extend the visibility timeout when processing will take a while.
// Visibility and Acknowledgement are injected as listener method parameters.
@SqsListener(value = "batch-jobs", acknowledgementMode = SqsListenerAcknowledgementMode.MANUAL)
public void processBatch(BatchJob job, Visibility visibility, Acknowledgement ack) {
    // Buy another 60 seconds before starting the long-running work
    visibility.changeTo(60);
    doLongRunningWork(job);
    ack.acknowledge();
}
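If processing time is predictable across all your listeners, you can instead set the visibility at the container level. A minimal sketch, assuming Spring Cloud AWS 3.x and its messageVisibility container option (defining the bean replaces the auto-configured factory):

import java.time.Duration;
import io.awspring.cloud.sqs.config.SqsMessageListenerContainerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import software.amazon.awssdk.services.sqs.SqsAsyncClient;

@Configuration
public class SqsListenerConfig {

    @Bean
    public SqsMessageListenerContainerFactory<Object> defaultSqsListenerContainerFactory(
            SqsAsyncClient sqsAsyncClient) {
        return SqsMessageListenerContainerFactory.builder()
                .sqsAsyncClient(sqsAsyncClient)
                .configure(options -> options
                        // roughly 6x an expected ~30s processing time
                        .messageVisibility(Duration.ofMinutes(3)))
                .build();
    }
}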
Even with a correctly tuned visibility timeout, SQS standard queues can deliver the same message more than once. A consumer process that crashes after processing but before deleting the message will cause redelivery. Your consumer must tolerate this.
The standard pattern is to track processed message IDs in a durable store:
import java.time.Instant;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class IdempotentPaymentProcessor {

    private static final Logger log = LoggerFactory.getLogger(IdempotentPaymentProcessor.class);
    private final ProcessedMessageRepository processedMessages;
    private final PaymentRepository payments;

    public IdempotentPaymentProcessor(ProcessedMessageRepository processedMessages,
                                      PaymentRepository payments) {
        this.processedMessages = processedMessages;
        this.payments = payments;
    }

    @Transactional
    public void process(PaymentEvent event) {
        String messageId = event.getMessageId();
        if (processedMessages.existsById(messageId)) {
            log.info("Duplicate message {}, skipping", messageId);
            return;
        }
        payments.save(Payment.from(event));
        processedMessages.save(new ProcessedMessage(messageId, Instant.now()));
    }
}
The idempotency check and the business write must be in the same transaction. If your downstream is not transactional (e.g. a third-party HTTP call), use a conditional write or an outbox pattern instead.
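To make "conditional write" concrete, here is a sketch of the same processor using a raw insert via a Spring JdbcTemplate, assuming Postgres and a processed_message table keyed on message_id (hypothetical schema). ON CONFLICT DO NOTHING makes claiming the message atomic, so no read-then-check race is possible:

@Transactional
public void process(PaymentEvent event) {
    // Atomically claim the message; update count is 0 if it was already claimed
    int claimed = jdbcTemplate.update(
            "INSERT INTO processed_message (message_id, processed_at) " +
            "VALUES (?, ?) ON CONFLICT (message_id) DO NOTHING",
            event.getMessageId(), Timestamp.from(Instant.now()));
    if (claimed == 0) {
        return; // duplicate delivery: skip without error
    }
    payments.save(Payment.from(event));
    // a crash here rolls back both the claim and the write, so the
    // redelivered message is processed cleanly on the next attempt
}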
A poison message is one that always fails processing regardless of how many times it’s retried — usually because the payload is malformed or the system it references no longer exists.
With maxReceiveCount set, these route to the DLQ automatically. In your DLQ consumer, log the raw message body along with all headers before doing anything else:
@SqsListener("payment-events-dlq")
public void handleDeadLetter(
String rawBody,
@Header("ApproximateFirstReceiveTimestamp") String firstReceived,
@Header("ApproximateReceiveCount") String receiveCount) {
log.error("Dead letter received after {} attempts. First seen: {}. Body: {}",
receiveCount, firstReceived, rawBody);
alertingService.notifyDeadLetter("payment-events", rawBody);
}
Injecting the raw String rather than a typed payload means deserialization failures won’t prevent the DLQ handler from logging the problem.
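Once the consumer bug behind the dead letters is fixed, the messages can be redriven back to the source queue server-side with the SQS message-move-task API. A minimal sketch with the SDK v2 async client, reusing the DLQ ARN from the redrive policy above:

// Start a server-side move of all DLQ messages back to the queue each
// originally came from (omitting destinationArn means "the source queue")
sqsAsyncClient.startMessageMoveTask(req -> req
        .sourceArn("arn:aws:sqs:eu-west-2:123456789:payment-events-dlq"))
    .join();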
Finally, two operational gotchas. Watch ApproximateNumberOfMessagesNotVisible alongside queue depth: a large not-visible count with low throughput means your consumers are getting messages but not processing them fast enough. And don't use Thread.sleep() in consumers — it blocks the polling thread. Use SQS visibility timeout extension for rate limiting instead.

If you’re building AWS microservices and want messaging patterns that hold up under production load, get in touch.