How to implement graceful shutdown in Spring Boot on Kubernetes — SIGTERM handling, preStop hooks, readiness probes, and draining in-flight requests safely.
Deploying a new version of a Spring Boot service on Kubernetes without dropping in-flight requests requires more than just setting server.shutdown=graceful. The full solution involves coordinating Kubernetes lifecycle hooks, readiness probes, and Spring’s shutdown sequence — and getting the order wrong means requests fail during rolling deployments.
This post covers the complete setup: what happens at pod termination, where the gaps are, and how to close them.
When you roll out a new deployment or scale down, Kubernetes sends SIGTERM to the container process and starts a grace period. The default is 30 seconds, after which it sends SIGKILL.
The problem: Kubernetes continues routing traffic to the pod for a brief window after SIGTERM is sent, because kube-proxy and the Endpoints controller remove the pod from load balancer backends asynchronously. Requests that arrive during this window hit a pod that’s already shutting down.
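The solution is a two-phase approach: first stop new traffic from reaching the pod, then let Spring finish the requests that are already in flight.
Enable Spring’s graceful shutdown in application.yml: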
server:
  shutdown: graceful
spring:
  lifecycle:
    timeout-per-shutdown-phase: 20s
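With graceful shutdown, when Spring receives SIGTERM it stops accepting new requests and waits for in-flight requests to complete, up to the limit set by timeout-per-shutdown-phase.
Without this, Spring exits immediately on SIGTERM, and any in-flight requests are abandoned mid-processing.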
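The pattern of refusing new work and then waiting for in-flight work up to a timeout can be illustrated with plain JDK classes. The sketch below is only an analogy built on an ExecutorService, not Spring's implementation; the class name DrainSketch and its method are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration: refuse new work, drain in-flight work, give up after a timeout.
final class DrainSketch {

    /** Returns true if all in-flight tasks finished before the timeout elapsed. */
    static boolean shutdownGracefully(ExecutorService pool, long timeout, TimeUnit unit)
            throws InterruptedException {
        pool.shutdown();                      // stop accepting new tasks; running ones continue
        boolean drained = pool.awaitTermination(timeout, unit);
        if (!drained) {
            pool.shutdownNow();               // timeout hit: interrupt whatever is still running
        }
        return drained;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicBoolean finished = new AtomicBoolean(false);

        // Stand-in for an in-flight request that needs 200 ms to complete.
        pool.submit(() -> {
            try {
                Thread.sleep(200);
                finished.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        boolean drained = shutdownGracefully(pool, 2, TimeUnit.SECONDS);
        System.out.println("drained=" + drained + ", request finished=" + finished.get());

        try {
            pool.submit(() -> { });           // new work after shutdown is refused
        } catch (RejectedExecutionException e) {
            System.out.println("new task rejected");
        }
    }
}
```

The timeout argument plays the role that timeout-per-shutdown-phase plays in Spring: work that has not finished by then is cut off rather than waited on indefinitely.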
Spring’s graceful shutdown handles in-flight requests, but it doesn’t stop Kubernetes from sending new requests right up until the pod is removed from the load balancer. The fix is to use the readiness probe.
Spring Boot Actuator exposes /actuator/health/readiness. When Spring begins shutdown it automatically marks itself not-ready, which causes the readiness probe to fail. Kubernetes stops routing new traffic once the probe fails.
Configure it:
management:
  endpoint:
    health:
      probes:
        enabled: true
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true
Kubernetes deployment probe config:
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
When shutdown starts, the readiness probe begins failing on the next probe cycle (within 5 seconds in this config), and Kubernetes stops sending new traffic once failureThreshold consecutive probes have failed. Spring drains existing requests during the grace period.
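How long the probe takes to flip the pod to unready follows from periodSeconds and failureThreshold. A minimal sketch of that arithmetic, assuming probes fire exactly on schedule and ignoring probe timeouts and endpoint propagation; the class and method names are hypothetical:

```java
// Hypothetical helper: estimates how long Kubernetes may take to mark a pod unready.
final class ReadinessWindow {

    /**
     * Upper bound, in seconds, between the application starting to fail its
     * readiness probe and Kubernetes observing failureThreshold consecutive failures.
     * Worst case: the application starts failing just after a successful probe.
     */
    static int worstCaseSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }

    /** Lower bound: the application starts failing just before the next probe fires. */
    static int bestCaseSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * (failureThreshold - 1);
    }

    public static void main(String[] args) {
        // Values from the probe config in this post: periodSeconds=5, failureThreshold=3.
        System.out.println("best case:  " + bestCaseSeconds(5, 3) + "s");
        System.out.println("worst case: " + worstCaseSeconds(5, 3) + "s");
    }
}
```

Lowering failureThreshold shortens this window, at the cost of making the pod more sensitive to a single slow or failed probe during normal operation.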
There’s still a gap between when SIGTERM is sent and when Kubernetes has propagated the pod removal to all kube-proxy instances across the cluster. During this window, new requests can still arrive at the pod even though it’s shutting down.
The fix is a preStop lifecycle hook that introduces a deliberate sleep before Spring’s shutdown begins:
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 5"]
Full deployment spec:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: market-data-service
spec:
  selector:
    matchLabels:
      app: market-data-service   # illustrative label; must match the template labels
  template:
    metadata:
      labels:
        app: market-data-service
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: market-data-service
          image: your-registry/market-data-service:latest
          ports:
            - containerPort: 8080
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 5"]
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
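The sleep 5 in preStop delays the SIGTERM signal by 5 seconds, because Kubernetes only signals the container after the preStop hook has finished. During those 5 seconds the application keeps serving requests normally while the pod is removed from the Service endpoints and the kube-proxy instances pick up the change.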
After the sleep, SIGTERM fires, Spring begins graceful shutdown, and drains the remaining in-flight requests.
terminationGracePeriodSeconds must be long enough to cover preStop duration + maximum expected request drain time + a safety margin. With sleep 5 and a 20-second drain, 60 seconds is comfortable.
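The budget can be checked with simple arithmetic. The sketch below adds up the preStop sleep, the drain timeout, and a safety margin; the 10-second margin and the class name GracePeriodBudget are assumptions for illustration, not values taken from the setup above.

```java
import java.time.Duration;

// Hypothetical helper for sanity-checking terminationGracePeriodSeconds.
final class GracePeriodBudget {

    /** Minimum grace period: preStop duration + drain timeout + safety margin. */
    static Duration required(Duration preStop, Duration drain, Duration margin) {
        return preStop.plus(drain).plus(margin);
    }

    /** True if the configured grace period covers the required budget. */
    static boolean fits(Duration gracePeriod, Duration preStop, Duration drain, Duration margin) {
        return gracePeriod.compareTo(required(preStop, drain, margin)) >= 0;
    }

    public static void main(String[] args) {
        Duration preStop = Duration.ofSeconds(5);   // sleep 5 in the preStop hook
        Duration drain = Duration.ofSeconds(20);    // timeout-per-shutdown-phase
        Duration margin = Duration.ofSeconds(10);   // assumed safety margin
        Duration grace = Duration.ofSeconds(60);    // terminationGracePeriodSeconds

        System.out.println("required: " + required(preStop, drain, margin).getSeconds() + "s");
        System.out.println("fits in grace period: " + fits(grace, preStop, drain, margin));
    }
}
```

If the check fails, Kubernetes sends SIGKILL before the drain completes, which brings back the dropped requests that graceful shutdown was meant to prevent.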
For services with long-running background tasks (Kafka consumers, scheduled jobs, async processors), register a custom shutdown hook:
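import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.SmartLifecycle;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.stereotype.Component;

@Component
public class KafkaConsumerShutdownHook implements SmartLifecycle {
    private static final Logger log = LoggerFactory.getLogger(KafkaConsumerShutdownHook.class);

    private final KafkaListenerEndpointRegistry registry;
    private volatile boolean running = false;

    public KafkaConsumerShutdownHook(KafkaListenerEndpointRegistry registry) {
        this.registry = registry;
    }

    @Override
    public void start() {
        running = true;
    }

    // Lifecycle.stop() is abstract, so it must be implemented alongside stop(Runnable).
    @Override
    public void stop() {
        log.info("Stopping Kafka consumers gracefully");
        registry.getListenerContainers().forEach(container -> {
            container.stop();
            log.info("Stopped container: {}", container.getListenerId());
        });
        running = false;
    }

    @Override
    public void stop(Runnable callback) {
        stop();
        callback.run(); // signal Spring that this component has finished stopping
    }

    @Override
    public boolean isRunning() { return running; }

    // Higher phases are stopped earlier during shutdown.
    @Override
    public int getPhase() { return Integer.MAX_VALUE - 10; }

    @Override
    public boolean isAutoStartup() { return true; }
}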
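The getPhase() value controls shutdown order: higher values stop first. Set Kafka consumers to stop before the HTTP server so no new messages are consumed once shutdown begins. The phase of the web server’s own graceful shutdown differs between Spring Boot versions, so verify the ordering against the version you run.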
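The ordering rule itself is easy to see in isolation. The sketch below is a plain-Java model of it, not Spring's lifecycle processor: it sorts component names by phase in descending order to show which would stop first. The component names and the phase 0 are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model of phase-based shutdown ordering: higher phase stops first.
final class PhaseOrderSketch {

    /** Returns component names in the order they would be stopped. */
    static List<String> stopOrder(Map<String, Integer> phasesByName) {
        return phasesByName.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> phases = new LinkedHashMap<>();
        phases.put("some-other-component", 0);                 // hypothetical low phase
        phases.put("kafka-consumers", Integer.MAX_VALUE - 10); // phase used in the hook above

        // Components are listed from first-stopped to last-stopped.
        System.out.println(stopOrder(phases));
    }
}
```

Startup runs in the opposite direction: components with lower phases start first, so a component that stops early in shutdown also starts late.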
Simulate a rolling deployment locally with Docker:
# Start the service in the background and wait until it is up
docker run -d --name market-data-service -p 8080:8080 market-data-service
# Send a slow request that takes 10 seconds
curl -o /dev/null -s -w "%{http_code}\n" http://localhost:8080/api/slow-endpoint &
# Give the request a moment to reach the server, then send SIGTERM
sleep 1
docker kill --signal=SIGTERM market-data-service
# Wait for curl: the slow request should complete (200), not be dropped (connection reset)
wait
With server.shutdown=graceful and timeout-per-shutdown-phase: 20s, the 10-second request should complete normally. Without it, you’d get a connection reset.
Rolling deployments: without the preStop sleep, some of the requests sent during a deployment window hit a pod that’s already shutting down. On a busy service, this shows up as 5xx errors in APM dashboards during every deploy.
Kafka consumers: without SmartLifecycle stopping consumers first, a consumer can receive a message, begin processing, and then have the database connection closed mid-transaction as the context shuts down.
Database connections: without graceful shutdown, HikariCP connection pool is closed while queries are still in flight. The queries fail with Connection is closed rather than completing or rolling back cleanly.
If you’re running Spring Boot services on Kubernetes and want to eliminate deployment errors entirely, let’s work together.