Graceful Shutdown and Zero-Downtime Rolling Deploys

A rolling deploy without graceful shutdown drops requests. When Kubernetes terminates a pod, it removes the pod from the Service endpoints and sends SIGTERM to the container at roughly the same time. If the pod's JVM exits on SIGTERM before finishing in-flight requests, those requests fail with a connection reset. Graceful shutdown, combined with correct readiness probe configuration and a short preStop delay, makes rolling deploys invisible to clients.

Spring Boot graceful shutdown

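Since Spring Boot 2.3, graceful shutdown is built in. Enable it with server.shutdown and, optionally, cap the drain time with spring.lifecycle.timeout-per-shutdown-phase (30s is also the default):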

server:
  shutdown: graceful

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s

When the JVM receives SIGTERM:

Spring begins closing the application context
The embedded Tomcat stops accepting new requests
Requests already in flight are allowed to complete
Once all requests have completed, or timeout-per-shutdown-phase has elapsed, whichever comes first, the context finishes closing

During this drain period, the readiness probe must report the pod as not ready so the load balancer stops sending new traffic.
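
The drain sequence above can be sketched with nothing but the JDK. This is a hypothetical illustration, not how Spring Boot or Tomcat implement it internally: the class name RequestGate and its methods are invented for the example. It shows the two halves of a graceful drain: refuse new work first, then wait for in-flight work up to a timeout.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; Spring Boot and Tomcat handle this for you.
public class RequestGate {

    private final AtomicBoolean accepting = new AtomicBoolean(true);
    private final AtomicInteger inFlight = new AtomicInteger();

    /** Admits a request, or returns false once shutdown has begun. */
    public boolean tryEnter() {
        inFlight.incrementAndGet();          // count first so shutdown() cannot miss this request
        if (!accepting.get()) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    /** Marks an admitted request as finished. */
    public void exit() {
        inFlight.decrementAndGet();
    }

    /** Stops admitting requests, then waits for in-flight ones; true if all finished in time. */
    public boolean shutdown(Duration timeout) throws InterruptedException {
        accepting.set(false);
        long deadline = System.nanoTime() + timeout.toNanos();
        while (inFlight.get() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false;                // timed out; remaining requests would be cut off
            }
            Thread.sleep(10);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        RequestGate gate = new RequestGate();
        gate.tryEnter();                     // one request is in flight
        Thread request = new Thread(() -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
            gate.exit();
        });
        request.start();
        System.out.println("drained cleanly: " + gate.shutdown(Duration.ofSeconds(30)));
        System.out.println("accepts new work: " + gate.tryEnter());
    }
}
```

The timeout argument plays the role of timeout-per-shutdown-phase: a request that outlives it is abandoned rather than holding up the shutdown indefinitely.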

Readiness probe

management:
  endpoint:
    health:
      probes:
        enabled: true
  health:
    readinessstate:
      enabled: true
    livenessstate:
      enabled: true

This exposes:

/actuator/health/liveness — is the process alive?
/actuator/health/readiness — is the process ready to serve traffic?

When graceful shutdown begins, Spring automatically sets the readiness state to REFUSING_TRAFFIC, and the readiness probe returns 503. Once the probe has failed failureThreshold times in a row, Kubernetes marks the pod as not ready and stops routing to it.

Kubernetes pod spec

spec:
  containers:
    - name: trading-service
      livenessProbe:
        httpGet:
          path: /actuator/health/liveness
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10
        failureThreshold: 3

      readinessProbe:
        httpGet:
          path: /actuator/health/readiness
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 5
        failureThreshold: 3

      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 5"]

The preStop sleep: why it matters

There is a race condition in Kubernetes pod termination. When a pod is marked for deletion:

The pod’s endpoint is removed from the service’s endpoint list
SIGTERM is sent to the container

These happen in parallel, not in sequence. The endpoint change propagates asynchronously to kube-proxy's iptables rules on every node, so there is a window of a few seconds where SIGTERM has been sent but traffic is still being routed to the pod.

The preStop hook with sleep 5 creates a buffer: Kubernetes runs the hook to completion before it sends SIGTERM, so Spring begins its graceful drain 5 seconds later. By then, the iptables update has normally propagated and no new traffic is arriving.

Without this sleep, you get a brief window where the pod is draining but still receives new connections — those connections may be dropped.
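
A rough way to reason about the race is to compare the two delays. The sketch below is a hypothetical model, not a measurement: PreStopWindow and the 3-second propagation delay are assumptions chosen to stand in for "a few seconds". The real delay depends on your cluster.

```java
import java.time.Duration;

// Hypothetical model of the race; the propagation delay is an assumed input.
public class PreStopWindow {

    /** Time during which a pod that is already draining may still receive new connections. */
    public static Duration exposure(Duration endpointPropagation, Duration preStopSleep) {
        Duration gap = endpointPropagation.minus(preStopSleep);
        return gap.isNegative() ? Duration.ZERO : gap;
    }

    public static void main(String[] args) {
        Duration propagation = Duration.ofSeconds(3);   // assumed value for "a few seconds"
        System.out.println(exposure(propagation, Duration.ZERO).toMillis());          // 3000
        System.out.println(exposure(propagation, Duration.ofSeconds(5)).toMillis());  // 0
    }
}
```

If you observe dropped connections with a 5-second sleep, the model suggests the propagation delay in your cluster exceeds it and the sleep should be lengthened.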

Total termination time

preStop sleep:      5s
Graceful drain:     up to 30s
Container cleanup:  a few seconds

Set terminationGracePeriodSeconds in the pod spec to comfortably exceed the sum:

spec:
  terminationGracePeriodSeconds: 60   # preStop + drain + margin

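The grace period starts when the pod is marked for deletion, so the preStop sleep counts against it. If terminationGracePeriodSeconds expires, Kubernetes sends SIGKILL: a hard kill that drops any in-flight requests. Set it generously.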
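
The arithmetic can be written down as a small check. This is a sketch under stated assumptions: TerminationBudget is a hypothetical helper, and the 10-second margin is an illustrative allowance for container cleanup, not a recommended value.

```java
import java.time.Duration;

// Hypothetical helper for sanity-checking the termination budget.
public class TerminationBudget {

    /** preStop and drain run back to back, so the minimum grace period is their sum plus margin. */
    public static Duration required(Duration preStop, Duration drain, Duration margin) {
        return preStop.plus(drain).plus(margin);
    }

    /** True when the configured grace period covers the full shutdown sequence. */
    public static boolean fits(Duration gracePeriod, Duration preStop, Duration drain, Duration margin) {
        return gracePeriod.compareTo(required(preStop, drain, margin)) >= 0;
    }

    public static void main(String[] args) {
        Duration preStop = Duration.ofSeconds(5);
        Duration drain = Duration.ofSeconds(30);
        Duration margin = Duration.ofSeconds(10);   // assumed allowance for cleanup

        System.out.println(required(preStop, drain, margin).getSeconds());        // 45
        System.out.println(fits(Duration.ofSeconds(60), preStop, drain, margin)); // true
        System.out.println(fits(Duration.ofSeconds(30), preStop, drain, margin)); // false
    }
}
```

With the values used in this article, 60 seconds leaves 15 seconds of headroom, while the Kubernetes default of 30 seconds would not even cover the drain.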

Kafka consumer shutdown

Spring Kafka’s ConcurrentMessageListenerContainer needs additional configuration to drain cleanly:

spring:
  kafka:
    listener:
      shutdown-timeout: 25000   # wait 25s for in-flight messages to complete

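The consumer stops polling and waits for the current batch to finish before closing. Set this below timeout-per-shutdown-phase so Kafka drains before the Spring context closes. The underlying setting is shutdownTimeout on Spring Kafka's ContainerProperties (in milliseconds, default 10 seconds); if your Spring Boot version does not bind the property above, set it on the listener container factory with getContainerProperties().setShutdownTimeout(25_000).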

Scheduled task shutdown

@Scheduled tasks run on the scheduler thread pool. By default, a task that is still running when the scheduler shuts down is interrupted. For tasks that must complete, configure the scheduler to wait:

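import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;

@Configuration
public class SchedulerConfig {

    // Declare the concrete type so the container sees the bean as a TaskScheduler.
    @Bean(destroyMethod = "shutdown")
    public ThreadPoolTaskScheduler taskScheduler() {
        ThreadPoolTaskScheduler scheduler = new ThreadPoolTaskScheduler();
        scheduler.setPoolSize(2);
        scheduler.setWaitForTasksToCompleteOnShutdown(true);   // let running tasks finish instead of interrupting them
        scheduler.setAwaitTerminationSeconds(25);              // block shutdown for up to 25s while they do
        return scheduler;
    }
}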
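
The two settings correspond to a choice the JDK already exposes: shutdown() lets running tasks finish, while shutdownNow() interrupts them. The sketch below shows the difference on a plain ScheduledExecutorService. It is an analogy under that assumption rather than Spring's own code, and SchedulerShutdownSketch is a hypothetical name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical JDK-only analogy for the two ThreadPoolTaskScheduler settings.
public class SchedulerShutdownSketch {

    /** Starts a 300 ms task, stops the pool while it runs, and reports whether the task finished. */
    public static boolean taskCompletes(boolean waitForTasks) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean completed = new AtomicBoolean(false);

        pool.execute(() -> {
            started.countDown();
            try {
                Thread.sleep(300);           // stands in for real work
                completed.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // task was cut short
            }
        });

        started.await();                     // make sure the task is running before stopping
        if (waitForTasks) {
            pool.shutdown();                 // no new tasks; running ones continue
        } else {
            pool.shutdownNow();              // interrupts running tasks
        }
        pool.awaitTermination(2, TimeUnit.SECONDS);   // bounded wait, like setAwaitTerminationSeconds
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("wait for tasks:   completed = " + taskCompletes(true));
        System.out.println("interrupt tasks:  completed = " + taskCompletes(false));
    }
}
```

Keep the await time below timeout-per-shutdown-phase and well below terminationGracePeriodSeconds, otherwise Kubernetes may send SIGKILL while the scheduler is still waiting.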

Verifying graceful shutdown

Add a slow endpoint to any @RestController to test with:

@GetMapping("/slow")
public String slow() throws InterruptedException {
    Thread.sleep(10_000);   // simulate slow request
    return "done";
}

Start a request to /slow, then SIGTERM the process:

kill -SIGTERM $(pgrep -f "trading-service")

With graceful shutdown configured, the request completes after 10 seconds and the process exits cleanly. Without it, the process exits immediately and the request fails.
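
If you prefer to drive the check from code rather than a terminal, a small client built on the JDK's java.net.http package can issue the request and report how it ended. This is a minimal sketch: SlowRequestCheck is a hypothetical name, and http://localhost:8080/slow assumes the service runs locally on the port used in the pod spec.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical client for observing whether an in-flight request survives SIGTERM.
public class SlowRequestCheck {

    /** Sends one GET and returns the HTTP status; throws IOException if the connection is cut. */
    public static int fetchStatus(URI uri, Duration timeout) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).timeout(timeout).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed local address; pass a different URL as the first argument if needed.
        URI uri = URI.create(args.length > 0 ? args[0] : "http://localhost:8080/slow");
        long start = System.nanoTime();
        try {
            int status = fetchStatus(uri, Duration.ofSeconds(30));
            System.out.println("status " + status + " after " + elapsedMillis(start) + " ms");
        } catch (IOException e) {
            System.out.println("request failed after " + elapsedMillis(start) + " ms: " + e);
        }
    }

    private static long elapsedMillis(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }
}
```

Start the client, send SIGTERM to the service while the request is pending, and compare the two outcomes: a status line after roughly 10 seconds indicates the drain worked, while a failure shortly after the signal indicates the request was dropped.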

Load test during deploy

The real test is a rolling deploy under load. Use k6 or Gatling to run continuous requests during the deploy:

// k6 test script
import http from 'k6/http';
import { check } from 'k6';

export default function () {
    const response = http.get('http://trading-service/orders');
    check(response, { 'status is 200': (r) => r.status === 200 });
}

A successful zero-downtime deploy shows every check passing in the k6 output, with no non-200 responses and no connection errors, even as pods are replaced.

If you’re deploying Spring Boot services to Kubernetes and want a review of your shutdown and probe configuration, get in touch.