How to configure Spring Boot graceful shutdown to drain in-flight requests before terminating, and what Kubernetes lifecycle hooks and readiness probes you need for zero-downtime rolling deploys.
A rolling deploy without graceful shutdown drops requests. When Kubernetes terminates a pod, it starts removing the pod from the Service's endpoints, but if the pod's JVM receives SIGTERM and exits before finishing in-flight requests, those requests fail with connection resets. Graceful shutdown, combined with correct readiness probe and lifecycle hook configuration, makes rolling deploys invisible to clients.
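Since Spring Boot 2.3, graceful shutdown is built in. Enable it with server.shutdown, and use spring.lifecycle.timeout-per-shutdown-phase to cap how long the drain may take (30s is also the default):

```yaml
server:
  shutdown: graceful

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s
```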
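When the JVM receives SIGTERM:

1. The embedded web server stops accepting new requests.
2. Requests already in flight are allowed to run to completion.
3. When all requests complete, or once timeout-per-shutdown-phase elapses, the application context closes.

During this drain period, the readiness probe must report the pod as not ready so that Kubernetes stops sending it new traffic.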
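As a stdlib-only illustration of that contract (refuse new work, then wait a bounded time for in-flight work), here is a hypothetical InFlightTracker class. It is not how Spring or Tomcat implement graceful shutdown; it only models the behaviour the property switches on, including what happens when the timeout is reached before the last request finishes.

```java
import java.time.Duration;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of graceful-drain semantics; not Spring's implementation. */
final class InFlightTracker {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition drained = lock.newCondition();
    private int inFlight = 0;
    private boolean shuttingDown = false;

    /** Called when a request arrives; returns false once shutdown has begun (new work refused). */
    boolean tryEnter() {
        lock.lock();
        try {
            if (shuttingDown) return false;
            inFlight++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Called when an admitted request finishes. */
    void exit() {
        lock.lock();
        try {
            inFlight--;
            if (inFlight == 0) drained.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Stops admitting new requests, then waits up to {@code timeout} for in-flight
     * requests to finish. Returns true if everything drained in time.
     */
    boolean shutdownAndAwait(Duration timeout) throws InterruptedException {
        lock.lock();
        try {
            shuttingDown = true;
            long nanos = timeout.toNanos();
            while (inFlight > 0) {
                if (nanos <= 0) return false; // timeout reached: remaining requests would be cut off
                nanos = drained.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

The false return is the analogue of timeout-per-shutdown-phase expiring: the context closes anyway and whatever is still running is abandoned.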
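Spring Boot Actuator provides that signal through its liveness and readiness health groups:

```yaml
management:
  endpoint:
    health:
      probes:
        enabled: true
  health:
    readiness-state:
      enabled: true
    liveness-state:
      enabled: true
```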
This exposes:

- /actuator/health/liveness: is the process alive?
- /actuator/health/readiness: is the process ready to serve traffic?

When graceful shutdown begins, Spring automatically sets the readiness state to REFUSING_TRAFFIC and the readiness endpoint starts returning HTTP 503. Once the probe has failed failureThreshold consecutive times, Kubernetes marks the pod not ready and stops routing to it, while the drain is still in progress.
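The status-code mapping the probe relies on can be shown without Spring. The sketch below uses the JDK's built-in com.sun.net.httpserver.HttpServer; the ReadinessEndpoint class, its flag, and the JSON bodies are hypothetical stand-ins that mimic Actuator's readiness group, not Actuator's own code. It returns 200 while accepting traffic and 503 after refuseTraffic() is called.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for /actuator/health/readiness; not Actuator's implementation. */
final class ReadinessEndpoint {
    private final AtomicBoolean acceptingTraffic = new AtomicBoolean(true);
    private final HttpServer server;

    ReadinessEndpoint(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/actuator/health/readiness", exchange -> {
            boolean ready = acceptingTraffic.get();
            byte[] body = (ready ? "{\"status\":\"UP\"}" : "{\"status\":\"OUT_OF_SERVICE\"}")
                    .getBytes(StandardCharsets.UTF_8);
            // 200 counts as a successful probe; 503 counts as a failed one
            exchange.sendResponseHeaders(ready ? 200 : 503, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
    }

    void start() { server.start(); }

    int port() { return server.getAddress().getPort(); }

    /** Analogous to Spring switching the readiness state to REFUSING_TRAFFIC at shutdown. */
    void refuseTraffic() { acceptingTraffic.set(false); }

    void stop() { server.stop(0); }
}
```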
In the pod spec, point the probes at those endpoints and add a preStop hook:

```yaml
spec:
  containers:
    - name: trading-service
      livenessProbe:
        httpGet:
          path: /actuator/health/liveness
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /actuator/health/readiness
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 5
        failureThreshold: 3
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 5"]
```
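Kubernetes does not act on a single failed probe. With the readiness probe above (periodSeconds: 5, failureThreshold: 3), the pod is marked not ready only after three consecutive failures. The class below is a hypothetical, stdlib-only model of that counting rule (successThreshold defaults to 1 in Kubernetes) and of the worst-case delay it implies; it is not kubelet code and it ignores probe timeouts.

```java
/**
 * Hypothetical model of how consecutive readiness-probe results become the pod's
 * Ready condition. Mirrors the documented thresholds only; not kubelet code.
 */
final class ReadinessProbeModel {
    private final int failureThreshold;
    private final int successThreshold;
    private int consecutiveFailures = 0;
    private int consecutiveSuccesses = 0;
    private boolean ready = false; // a container starts out not ready until a probe succeeds

    ReadinessProbeModel(int failureThreshold, int successThreshold) {
        this.failureThreshold = failureThreshold;
        this.successThreshold = successThreshold;
    }

    /** Feed one probe result, e.g. HTTP 200 (success) or 503 (failure). */
    void record(boolean probeSucceeded) {
        if (probeSucceeded) {
            consecutiveFailures = 0;
            if (++consecutiveSuccesses >= successThreshold) ready = true;
        } else {
            consecutiveSuccesses = 0;
            if (++consecutiveFailures >= failureThreshold) ready = false;
        }
    }

    boolean isReady() { return ready; }

    /**
     * Worst-case seconds from the endpoint switching to 503 until the pod is marked not ready:
     * the first failing probe may run up to one period after the switch, and
     * (failureThreshold - 1) further periods are needed to reach the threshold.
     */
    static int worstCaseSecondsToNotReady(int periodSeconds, int failureThreshold) {
        return periodSeconds + periodSeconds * (failureThreshold - 1);
    }
}
```

For the manifest above, worstCaseSecondsToNotReady(5, 3) is 15 seconds, which fits inside the 30-second drain window.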
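There is a race condition in Kubernetes pod termination. When a pod is marked for deletion, two things start at once:

- The kubelet runs the container's preStop hook and then sends SIGTERM to the process.
- The endpoints controller removes the pod from the Service's endpoints, and kube-proxy on every node updates its iptables rules to match.

These happen in parallel, not in sequence. The iptables rules are updated asynchronously, so there is a window of a few seconds in which SIGTERM has been sent but traffic is still being routed to the pod.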
The preStop sleep 5 creates a buffer: the kubelet waits for the hook to finish before sending SIGTERM, so Spring begins its graceful drain 5 seconds later. By then, in most clusters, the iptables update has propagated and no new traffic is arriving.
Without this sleep, you get a brief window where the pod is draining but still receives new connections — those connections may be dropped.
The shutdown therefore has three phases to budget for:

- preStop sleep: 5s
- Graceful drain: up to 30s
- Container cleanup: a few seconds
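Set terminationGracePeriodSeconds in the pod spec to comfortably exceed the sum. The grace period starts when the pod is marked for deletion and includes the time spent in the preStop hook, and its default of 30 seconds is already shorter than 5s + 30s:

```yaml
spec:
  terminationGracePeriodSeconds: 60 # preStop + drain + margin
```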
If terminationGracePeriodSeconds expires, Kubernetes sends SIGKILL — hard kill, in-flight requests dropped. Set it generously.
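The budget is simple arithmetic, but it is easy to break when one of the timeouts changes independently. A hypothetical helper that checks a grace period against the three phases is sketched below; the 5-second cleanup allowance is an assumption standing in for "a few seconds".

```java
/** Hypothetical helper: checks terminationGracePeriodSeconds against the shutdown budget. */
final class ShutdownBudget {
    /** Kubernetes default when terminationGracePeriodSeconds is not set. */
    static final int KUBERNETES_DEFAULT_GRACE_SECONDS = 30;

    /** preStop sleep + Spring drain timeout + allowance for container cleanup. */
    static int requiredSeconds(int preStopSleep, int drainTimeout, int cleanupAllowance) {
        return preStopSleep + drainTimeout + cleanupAllowance;
    }

    /**
     * The grace period clock starts at deletion and includes the preStop hook,
     * so every phase must fit inside it or the kubelet sends SIGKILL.
     */
    static boolean fits(int gracePeriodSeconds, int preStopSleep, int drainTimeout, int cleanupAllowance) {
        return gracePeriodSeconds >= requiredSeconds(preStopSleep, drainTimeout, cleanupAllowance);
    }
}
```

With the values above, requiredSeconds(5, 30, 5) is 40, so the configured 60 passes and the 30-second default does not.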
Spring Kafka’s ConcurrentMessageListenerContainer needs additional configuration to drain cleanly:
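```yaml
spring:
  kafka:
    listener:
      shutdown-timeout: 25000 # wait 25s for in-flight messages to complete
```

On stop, the consumer stops polling and waits for the current batch to finish before closing. Set this below timeout-per-shutdown-phase so the Kafka listeners finish draining before the shutdown phase times out and the context closes. If your Spring Boot version does not bind this property, the same setting is available as ContainerProperties.setShutdownTimeout on the listener container factory (its default is 10 seconds).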
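To make "stop polling, finish the current batch, then close" concrete, here is a hypothetical stdlib-only poll loop. A BlockingQueue stands in for the Kafka topic and a Consumer callback for the listener method; this illustrates the behaviour, it is not Spring Kafka's container code, and the batch size of 10 and 100 ms poll timeout are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical poll loop modelling a listener container's stop sequence. */
final class PollLoop {
    private final BlockingQueue<String> source;   // stands in for the Kafka topic
    private final Consumer<List<String>> handler; // stands in for the listener method
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final Thread thread;

    PollLoop(BlockingQueue<String> source, Consumer<List<String>> handler) {
        this.source = source;
        this.handler = handler;
        this.thread = new Thread(this::run, "poll-loop");
    }

    void start() { thread.start(); }

    private void run() {
        while (running.get()) { // checked before every poll, so stop() prevents the next one
            List<String> batch = new ArrayList<>();
            try {
                String first = source.poll(100, TimeUnit.MILLISECONDS); // like a consumer poll with timeout
                if (first == null) continue;
                batch.add(first);
                source.drainTo(batch, 9); // up to 10 records per batch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            handler.accept(batch); // a batch already in hand always runs to completion
        }
    }

    /** Stops polling, then waits up to shutdownTimeoutMillis for the current batch to finish. */
    boolean stop(long shutdownTimeoutMillis) throws InterruptedException {
        running.set(false);
        thread.join(shutdownTimeoutMillis);
        return !thread.isAlive(); // false: the batch overran the shutdown timeout
    }
}
```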
@Scheduled tasks run on the scheduler's thread pool. By default, a task that is still running when SIGTERM arrives is interrupted. For tasks that must complete, configure the scheduler to wait:
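```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;

@Configuration
public class SchedulerConfig {

    // Declare the concrete type so the bean is recognised as the TaskScheduler for @Scheduled;
    // shutdown() is called on context close because ThreadPoolTaskScheduler is a DisposableBean
    @Bean
    public ThreadPoolTaskScheduler taskScheduler() {
        ThreadPoolTaskScheduler scheduler = new ThreadPoolTaskScheduler();
        scheduler.setPoolSize(2);
        scheduler.setWaitForTasksToCompleteOnShutdown(true); // let running tasks finish instead of interrupting them
        scheduler.setAwaitTerminationSeconds(25);             // but wait at most 25s for them
        return scheduler;
    }
}
```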
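The two settings correspond to the standard java.util.concurrent shutdown modes that ThreadPoolTaskScheduler wraps: without waiting, running tasks are interrupted (shutdownNow); with waiting, the pool is shut down and awaited (shutdown plus awaitTermination). The sketch below reproduces that difference with a plain ScheduledExecutorService; the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical demo of interrupting vs. waiting shutdown on a scheduled pool. */
final class SchedulerShutdownDemo {
    /** Starts one 300 ms task, shuts the pool down mid-task, and reports whether the task completed. */
    static boolean runTaskThenShutdown(boolean waitForTasks) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean completed = new AtomicBoolean(false);
        pool.execute(() -> {
            started.countDown();
            try {
                Thread.sleep(300); // simulated work
                completed.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // interrupted: the task never finishes its work
            }
        });
        started.await(); // ensure the shutdown arrives while the task is running
        if (waitForTasks) {
            pool.shutdown();                             // like setWaitForTasksToCompleteOnShutdown(true)
            pool.awaitTermination(25, TimeUnit.SECONDS); // like setAwaitTerminationSeconds(25)
        } else {
            pool.shutdownNow();                          // running tasks receive an interrupt
            pool.awaitTermination(1, TimeUnit.SECONDS);
        }
        return completed.get();
    }
}
```

runTaskThenShutdown(true) returns true because the task is allowed to finish; runTaskThenShutdown(false) returns false because the sleep is interrupted.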
Add a slow endpoint to test with:
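```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SlowController {
    @GetMapping("/slow")
    public String slow() throws InterruptedException {
        Thread.sleep(10_000); // simulate a slow in-flight request
        return "done";
    }
}
```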
Start a request to /slow, then send SIGTERM to the process:

```shell
kill -SIGTERM $(pgrep -f "trading-service")
```
With graceful shutdown configured, the request completes after 10 seconds and the process exits cleanly. Without it, the process exits immediately and the request fails.
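The same experiment can be reproduced without Spring or Kubernetes using the JDK's built-in com.sun.net.httpserver.HttpServer, whose stop(delay) closes the listening socket and then waits up to delay seconds for in-flight exchanges to finish. This is a hypothetical stand-in for the Spring setup, not how Tomcat implements graceful shutdown; the class name and timings are illustrative.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;

/** Hypothetical stdlib version of the /slow experiment; not Spring Boot code. */
final class SlowRequestDrainDemo {
    /** Starts a slow request, stops the server while it is in flight, and returns the client's response. */
    static HttpResponse<String> run(int handlerMillis, int stopDelaySeconds)
            throws IOException, InterruptedException, ExecutionException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        CountDownLatch inHandler = new CountDownLatch(1);
        server.createContext("/slow", exchange -> {
            inHandler.countDown();
            try {
                Thread.sleep(handlerMillis); // simulate a slow request
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "done".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();

        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/slow");
        CompletableFuture<HttpResponse<String>> pending =
                client.sendAsync(HttpRequest.newBuilder(uri).build(), HttpResponse.BodyHandlers.ofString());

        inHandler.await();             // the request is now in flight
        server.stop(stopDelaySeconds); // stop accepting, then wait up to the delay for the handler
        return pending.get();
    }
}
```

run(500, 5) should return a 200 with body "done" even though stop is called while the handler is still sleeping. With stop(0) the server closes open connections right away, so the client would typically see an IOException instead, the analogue of shutting down without a drain.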
The real test is a rolling deploy under load. Use k6 or Gatling to run continuous requests during the deploy:
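```javascript
// k6 test script: run it continuously while the rolling deploy is in progress
import http from 'k6/http';
import { check } from 'k6';

export const options = { vus: 10, duration: '2m' }; // illustrative; size it to cover the deploy window

export default function () {
  const response = http.get('http://trading-service/orders');
  check(response, { 'status is 200': (r) => r.status === 200 });
}
```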
A successful zero-downtime deploy shows 0 non-200 responses in the k6 output, even as pods are replaced.
If you’re deploying Spring Boot services to Kubernetes and want a review of your shutdown and probe configuration, get in touch.