How Kafka consumer offsets work, the trade-offs between auto and manual commit, and how to implement at-least-once and exactly-once semantics correctly in a Spring Boot consumer.
Kafka’s durability guarantee is only as strong as your offset management. An offset is a pointer to the last message your consumer processed — commit it at the wrong time and you either lose messages (skip past failures) or reprocess them endlessly (never advance past a persistent error). Getting offset management right requires understanding what Kafka promises, what Spring Kafka provides, and where you need to take manual control.
Every message in a Kafka topic partition has a monotonically increasing offset. When a consumer group processes a partition, it commits the offset of the last successfully processed message. If the consumer restarts, it resumes from the committed offset.
Committed offsets are stored per consumer group in the internal __consumer_offsets topic, separate from the partition’s message log. Committing is an explicit act: messages are not acknowledged simply by being delivered.
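You can inspect these stored offsets directly. A minimal sketch using the Kafka AdminClient, assuming a broker at localhost:9092 and the order-processor group used later in this post:
import java.util.Map;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ShowCommittedOffsets {

    public static void main(String[] args) throws Exception {
        try (Admin admin = Admin.create(
                Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
            // Fetch the group's committed offsets, i.e. the data backing __consumer_offsets
            Map<TopicPartition, OffsetAndMetadata> committed = admin
                    .listConsumerGroupOffsets("order-processor")
                    .partitionsToOffsetAndMetadata()
                    .get();
            committed.forEach((tp, om) ->
                    System.out.printf("%s -> committed offset %d%n", tp, om.offset()));
        }
    }
}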
The raw Kafka client defaults to enable.auto.commit=true: every auto.commit.interval.ms (default 5 seconds), the consumer commits the offsets returned by its most recent poll(), regardless of whether your code has processed those messages. (Spring Kafka has set enable.auto.commit to false by default since version 2.3 and commits from the listener container instead, but the hazard below applies whenever auto-commit is switched on.)
The lie: auto-commit ties commits to the clock, not to your code. Crash after processing a batch but before the next commit fires, and on restart the consumer resumes from the older committed offset and reprocesses those messages. Or worse: if auto-commit fires after a fetch but before processing completes, and the consumer crashes mid-processing, those messages are skipped permanently.
Auto-commit is acceptable only for non-critical, idempotent consumers where occasional reprocessing or loss is tolerable. For anything that writes to a database, places an order, or sends a notification, disable it.
spring:
  kafka:
    consumer:
      enable-auto-commit: false
    listener:
      ack-mode: MANUAL_IMMEDIATE
With MANUAL_IMMEDIATE, the listener receives an Acknowledgment object and must call acknowledgment.acknowledge() explicitly after successful processing. (MANUAL queues the commit until all records from the current poll are processed; MANUAL_IMMEDIATE commits as soon as acknowledge() is called.)
@KafkaListener(topics = "order-events", groupId = "order-processor")
public void onOrderEvent(OrderEvent event, Acknowledgment ack) {
try {
orderService.process(event);
ack.acknowledge();
} catch (RetryableException e) {
log.warn("Transient failure processing order {}, will retry", event.orderId());
// Do NOT acknowledge; the error handler seeks back and redelivers this record
throw e;
} catch (FatalException e) {
log.error("Permanent failure for order {}, sending to DLQ", event.orderId());
deadLetterPublisher.send(event);
ack.acknowledge(); // advance past the poison message
}
}
This gives at-least-once delivery: every message is processed at least once and acknowledged only after success. In the transient-failure case, the thrown exception reaches the container’s error handler; Spring Kafka’s DefaultErrorHandler (configured below) seeks back to the failed offset and retries it on the next poll, with backoff, within the same consumer and without forcing a rebalance or restart.
For consumers that process messages in bulk — writing to a database in batches, for example — commit after the batch completes:
@KafkaListener(topics = "market-prices", groupId = "price-processor",
containerFactory = "batchFactory")
public void onPriceBatch(List<MarketPrice> prices, Acknowledgment ack) {
priceRepository.saveAll(prices); // single DB write for the batch
ack.acknowledge(); // commit after the entire batch is persisted
}
Configure the batch factory for batch listening with manual acknowledgment; the fetch sizes themselves are ordinary consumer properties, shown after the factory:
@Bean
public ConcurrentKafkaListenerContainerFactory<String, MarketPrice> batchFactory(
ConsumerFactory<String, MarketPrice> cf) {
ConcurrentKafkaListenerContainerFactory<String, MarketPrice> factory =
new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(cf);
factory.setBatchListener(true);
factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
return factory;
}
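A sketch of the fetch sizing in application.yml; the values are illustrative starting points, not tuned recommendations:
spring:
  kafka:
    consumer:
      max-poll-records: 500    # upper bound on records returned per poll(), i.e. per batch
      fetch-min-size: 64KB     # maps to fetch.min.bytes: broker waits for this much data...
      fetch-max-wait: 500ms    # ...or fetch.max.wait.ms, whichever elapses first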
Spring Kafka 2.8+ provides DefaultErrorHandler as the replacement for SeekToCurrentErrorHandler. Configure exponential backoff with a dead-letter topic:
@Bean
public DefaultErrorHandler errorHandler(KafkaTemplate<String, Object> template) {
DeadLetterPublishingRecoverer recoverer =
new DeadLetterPublishingRecoverer(template,
(record, ex) -> new TopicPartition(record.topic() + ".DLT", record.partition()));
ExponentialBackOffWithMaxRetries backoff = new ExponentialBackOffWithMaxRetries(3);
backoff.setInitialInterval(1_000L);
backoff.setMultiplier(2.0);
backoff.setMaxInterval(10_000L);
return new DefaultErrorHandler(recoverer, backoff);
}
After 3 retries with exponential backoff, the message is published to the .DLT topic and the original offset is committed. The consumer advances — it does not get stuck on a permanently failing message.
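One wiring detail is easy to miss: a hand-built container factory such as batchFactory above does not pick up the error handler automatically (recent Spring Boot versions wire a CommonErrorHandler bean into the auto-configured factory, but not into custom ones). Register it inside the bean method:
factory.setCommonErrorHandler(errorHandler); // DefaultErrorHandler implements CommonErrorHandler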
At-least-once delivery means your consumer must be idempotent — processing the same message twice must produce the same result as processing it once. The simplest approach: check before acting.
public void process(OrderEvent event) {
    if (orderRepository.existsByIdempotencyKey(event.idempotencyKey())) {
        log.debug("Duplicate event {}, skipping", event.idempotencyKey());
        return;
    }
    // The existence check races with concurrent redelivery; back it with a
    // unique constraint on the idempotency key (see the sketch below)
    orderRepository.save(Order.from(event));
}
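Check-then-insert alone is not safe: two redeliveries can both pass the existence check before either row is written. A database unique constraint makes the duplicate insert fail loudly instead of creating a second row. A sketch assuming Spring Boot 3 / Jakarta Persistence; entity and column names are illustrative:
import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import jakarta.persistence.Table;
import jakarta.persistence.UniqueConstraint;

@Entity
@Table(name = "orders",
       uniqueConstraints = @UniqueConstraint(columnNames = "idempotency_key"))
public class Order {

    @Id
    @GeneratedValue
    private Long id;

    // Enforced by the database: a racing duplicate insert throws a
    // constraint violation instead of silently creating a second row
    @Column(name = "idempotency_key", nullable = false)
    private String idempotencyKey;

    // ... remaining fields and the Order.from(OrderEvent) factory
}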
True exactly-once semantics, with no duplicates even across crashes, requires Kafka transactions: a transactional producer that writes results and commits offsets atomically, and consumers reading with isolation.level=read_committed (Kafka Streams wraps this as processing.guarantee=exactly_once_v2). For most Spring Boot consumers writing to a database, idempotent at-least-once is the practical choice: simpler, sufficient, and far easier to reason about under failure.
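For reference, the transactional pieces in Spring Boot configuration terms; a sketch only (the prefix value is illustrative), and note that Kafka transactions span Kafka-in/Kafka-out pipelines, not writes to an external database:
spring:
  kafka:
    producer:
      transaction-id-prefix: order-tx-   # enables the transactional producer
    consumer:
      isolation-level: read-committed    # maps to isolation.level=read_committed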
Consumer lag, the gap between a partition’s latest offset and the consumer’s committed position, is the primary health metric for a Kafka consumer. Zero lag is ideal; growing lag means the consumer cannot keep up with the producer.
Expose lag via Micrometer (included in Spring Boot Actuator):
management:
  metrics:
    tags:
      application: order-processor
Spring Kafka registers the consumer’s fetch-manager metrics with Micrometer automatically, including records-lag-max; with recent Micrometer versions this reaches Prometheus as kafka_consumer_fetch_manager_records_lag_max. Wire it to a Prometheus alert:
alert: KafkaConsumerLagHigh
expr: kafka_consumer_fetch_manager_records_lag_max > 10000
for: 5m
labels:
  severity: warning
For debugging rebalance storms or uneven partition distribution:
@Bean
public ConsumerAwareRebalanceListener rebalanceListener() {
return new ConsumerAwareRebalanceListener() {
@Override
public void onPartitionsAssigned(Consumer<?, ?> consumer,
Collection<TopicPartition> partitions) {
log.info("Assigned partitions: {}", partitions);
}
@Override
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
log.info("Revoked partitions: {}", partitions);
}
};
}
Frequent rebalances — visible as rapid assigned/revoked cycles in the logs — indicate a consumer that is too slow to process its batch within max.poll.interval.ms. Either reduce the batch size or increase the poll interval.
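Both knobs are ordinary consumer settings; a sketch with illustrative values:
spring:
  kafka:
    consumer:
      max-poll-records: 100                # smaller batches finish sooner
      properties:
        "[max.poll.interval.ms]": 600000   # allow up to 10 minutes between polls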
Offset management is not a configuration detail — it is a correctness boundary. The choice between auto-commit, manual acknowledge, and idempotent processing determines whether your consumer loses data, duplicates work, or handles failures cleanly.
If you’re working on Kafka consumer infrastructure in Spring Boot and want a review of your offset and error handling strategy, get in touch.