Graceful Shutdown in Spring Boot — Handling In-Flight Requests on Kubernetes

Deploying a new version of a Spring Boot service on Kubernetes without dropping in-flight requests requires more than just setting server.shutdown=graceful. The full solution involves coordinating Kubernetes lifecycle hooks, readiness probes, and Spring’s shutdown sequence — and getting the order wrong means requests fail during rolling deployments.

This post covers the complete setup: what happens at pod termination, where the gaps are, and how to close them.

What happens when Kubernetes terminates a pod

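When you roll out a new deployment or scale down, Kubernetes marks the pod as Terminating and starts a grace period (terminationGracePeriodSeconds, 30 seconds by default). It runs the container's preStop hook, if one is defined, and then sends SIGTERM to the container's main process. If the container is still running when the grace period expires, Kubernetes sends SIGKILL.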

The problem: removing the pod from load balancer backends happens asynchronously, in parallel with termination. The endpoints controller has to update the Service's endpoints, and kube-proxy on every node (plus any ingress controller) has to apply the change. Until that finishes, traffic can still be routed to the pod, so requests that arrive during this window hit a pod that is already shutting down.

The solution is a two-phase approach:

Stop accepting new connections before starting shutdown
Drain existing connections during the grace period

Spring Boot graceful shutdown

Enable it in application.yml:

server:
  shutdown: graceful

spring:
  lifecycle:
    timeout-per-shutdown-phase: 20s

With graceful shutdown, when Spring receives SIGTERM it:

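Stops accepting new requests (Tomcat, Jetty, and Reactor Netty refuse them at the network layer; Undertow accepts the connection but responds with 503 Service Unavailable)
Waits for active requests to complete, up to timeout-per-shutdown-phase
Proceeds to close the application context, its beans, and their connections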

Without this (server.shutdown=immediate, which was the default before Spring Boot 3.4), the web server stops as soon as shutdown begins and any in-flight requests are abandoned mid-processing.
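The same two steps (stop accepting, then drain) can be observed without Spring, Tomcat, or Kubernetes. The sketch below uses the JDK's built-in com.sun.net.httpserver.HttpServer purely as an analogy; it is not what Spring Boot uses. Its stop(delay) method closes the listening socket and then waits up to delay seconds for exchanges already in progress. The /slow endpoint, the one-second handler delay, and the class and record names are illustrative assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Analogy only: the JDK's built-in HttpServer, not Spring Boot or Tomcat.
final class GracefulStopDemo {

    record StopResult(int inFlightStatus, boolean newConnectionRefused) {}

    static StopResult run() throws Exception {
        CountDownLatch requestStarted = new CountDownLatch(1);
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        ExecutorService handlers = Executors.newCachedThreadPool();
        server.setExecutor(handlers); // run handlers off the dispatcher thread

        // Hypothetical slow endpoint that takes about one second to respond.
        server.createContext("/slow", exchange -> {
            requestStarted.countDown();
            try {
                Thread.sleep(1_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "done".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body); // closing the stream completes the exchange
            }
        });
        server.start();

        URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/slow");
        HttpRequest request = HttpRequest.newBuilder(uri).build();
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();

        CompletableFuture<HttpResponse<String>> inFlight =
                client.sendAsync(request, HttpResponse.BodyHandlers.ofString());
        if (!requestStarted.await(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("request never reached the handler");
        }

        // Step 1: close the listening socket. Step 2: wait up to 5 s for in-flight exchanges.
        server.stop(5);
        handlers.shutdown();
        int inFlightStatus = inFlight.get().statusCode();

        // A fresh client, so no pooled keep-alive connection is reused.
        HttpClient lateClient = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        boolean refused;
        try {
            lateClient.send(request, HttpResponse.BodyHandlers.discarding());
            refused = false;
        } catch (IOException e) {
            refused = true; // nothing is listening on the port any more
        }
        return new StopResult(inFlightStatus, refused);
    }
}
```

Calling GracefulStopDemo.run() should return a StopResult with inFlightStatus 200 and newConnectionRefused true: the request already inside the handler finishes, while the next connection attempt fails because the listening socket is gone.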

The readiness probe gap

Spring’s graceful shutdown handles in-flight requests, but it doesn’t stop Kubernetes from sending new requests right up until the pod is removed from the load balancer. The fix is to use the readiness probe.

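Spring Boot Actuator exposes /actuator/health/readiness. Actuator enables the liveness and readiness probe groups automatically when it detects a Kubernetes environment; the configuration below enables them everywhere. When the application context starts closing, Spring Boot sets the readiness state to REFUSING_TRAFFIC, so the readiness endpoint starts returning 503. Once the probe has failed failureThreshold times in a row, Kubernetes marks the pod not-ready and stops routing new traffic to it. (A pod that is being deleted is also removed from the Service's ready endpoints as soon as it enters the Terminating state, independently of the probe.)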

Configure it:

management:
  endpoint:
    health:
      probes:
        enabled: true
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true

Kubernetes deployment probe config:

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3

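When shutdown starts, the next readiness probe fails, but Kubernetes only marks the pod not-ready after failureThreshold consecutive failures. With periodSeconds: 5 and failureThreshold: 3, that takes 10 to 15 seconds, not a single probe cycle. Meanwhile, Spring drains existing requests within the grace period.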
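The 10-to-15-second range comes from simple arithmetic: the probe must fail failureThreshold times in a row, and the first failing probe can arrive anywhere from immediately to one full period after the state flips. A minimal sketch, assuming each probe answers well within timeoutSeconds so that only periodSeconds matters (the class and method names are hypothetical):

```java
// Models when Kubernetes marks a pod not-ready after the app starts failing
// its readiness check. Assumes each probe answers well within timeoutSeconds.
final class ProbeTiming {

    // Best case: the app flips just before a probe, which becomes failure #1.
    static int bestCaseSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * (failureThreshold - 1);
    }

    // Worst case: the app flips just after a successful probe, adding a full period.
    static int worstCaseSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }
}
```

For the probe configuration above, bestCaseSeconds(5, 3) is 10 and worstCaseSeconds(5, 3) is 15.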

The preStop hook — closing the timing gap

There’s still a gap between when SIGTERM is sent and when Kubernetes has propagated the pod removal to all kube-proxy instances across the cluster. During this window, new requests can still arrive at the pod even though it’s shutting down.

The fix is a preStop lifecycle hook that introduces a deliberate sleep before Spring’s shutdown begins:

lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 5"]

Full deployment spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: market-data-service
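spec:
  selector:
    matchLabels:
      app: market-data-service      # must match the template labels below
  template:
    metadata:
      labels:
        app: market-data-service
    spec: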
      terminationGracePeriodSeconds: 60
      containers:
        - name: market-data-service
          image: your-registry/market-data-service:latest
          ports:
            - containerPort: 8080
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 5"]
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            periodSeconds: 10
            failureThreshold: 3

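The sleep 5 in preStop delays SIGTERM by 5 seconds. During those 5 seconds:

The pod is already Terminating, so Kubernetes has started removing it from the Service's endpoints (Spring has not received SIGTERM yet, so its readiness endpoint still reports UP)
kube-proxy and any ingress controller have time to apply the removal across nodes (5 seconds is a heuristic; size the sleep to how long propagation takes in your cluster)
Requests that still arrive in the meantime are served normally, because Spring has not begun shutting down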

After the sleep, SIGTERM fires, Spring begins graceful shutdown, and drains the remaining in-flight requests.
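Why the sleep has to outlast propagation can be shown with a toy model. This is not Kubernetes code: it assumes a single propagationSeconds value for how long endpoint removal takes to reach every node (in practice it varies per node), treats time 0 as pod deletion, and counts a request as failed when it is still routed to the pod after SIGTERM, when the graceful-shutdown server has stopped accepting new connections. The class and method names are hypothetical.

```java
import java.util.stream.IntStream;

// Toy model, not Kubernetes code. Time 0 is pod deletion.
// propagationSeconds: assumed time until every node stops routing to the pod.
// preStopSeconds: the preStop sleep; SIGTERM is sent when it ends, and from then on
// the graceful-shutdown server refuses new connections.
final class PreStopRace {

    // A request fails if it is still routed to the pod after SIGTERM.
    static boolean fails(double arrivalSeconds, double propagationSeconds, double preStopSeconds) {
        boolean routedToPod = arrivalSeconds < propagationSeconds;
        boolean afterSigterm = arrivalSeconds >= preStopSeconds;
        return routedToPod && afterSigterm;
    }

    // One request per second, arriving at t = 0, 1, ..., horizonSeconds - 1.
    static long failures(int horizonSeconds, double propagationSeconds, double preStopSeconds) {
        return IntStream.range(0, horizonSeconds)
                .filter(t -> fails(t, propagationSeconds, preStopSeconds))
                .count();
    }
}
```

With an assumed 3-second propagation delay, failures(10, 3, 0) counts 3 failed requests and failures(10, 3, 5) counts none. If propagation actually takes 7 seconds, a 5-second sleep still lets 2 requests fail (failures(10, 7, 5)).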

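The grace period starts counting when the pod is deleted, so the preStop sleep uses part of it. terminationGracePeriodSeconds must therefore cover the preStop duration, the maximum expected drain time, and a safety margin. With sleep 5 and a 20-second drain, 60 seconds is comfortable. Keep in mind that timeout-per-shutdown-phase applies to each shutdown phase separately, so the context as a whole can take longer than 20 seconds to close.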
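The budget is easy to encode as a check, for example in a test that reads your deployment values. A sketch with a hypothetical ShutdownBudget record; treat drainSeconds as a worst-case estimate rather than a single timeout value:

```java
// Hypothetical helper: checks that terminationGracePeriodSeconds covers the
// preStop sleep plus the worst-case drain. The grace period clock starts at
// pod deletion, so the preStop sleep is charged against it.
record ShutdownBudget(int preStopSeconds, int drainSeconds, int gracePeriodSeconds) {

    // Seconds left before SIGKILL once the sleep and the drain are accounted for.
    int headroomSeconds() {
        return gracePeriodSeconds - preStopSeconds - drainSeconds;
    }

    // True if the headroom is at least the requested safety margin.
    boolean fits(int safetyMarginSeconds) {
        return headroomSeconds() >= safetyMarginSeconds;
    }
}
```

new ShutdownBudget(5, 20, 60).headroomSeconds() is 35, while the same sleep and drain under the default 30-second grace period leave only 5 seconds.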

Handling long-running operations

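For services with long-running background work (Kafka consumers, scheduled jobs, async processors), implement SmartLifecycle so that work is stopped at a controlled point in the shutdown sequence. This example uses Spring for Apache Kafka's KafkaListenerEndpointRegistry: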

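import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.SmartLifecycle;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.stereotype.Component;

@Component
public class KafkaConsumerShutdownHook implements SmartLifecycle {

    private static final Logger log = LoggerFactory.getLogger(KafkaConsumerShutdownHook.class);

    private final KafkaListenerEndpointRegistry registry;
    private volatile boolean running = false;

    // Spring injects the registry through the constructor
    public KafkaConsumerShutdownHook(KafkaListenerEndpointRegistry registry) {
        this.registry = registry;
    }

    @Override
    public void start() {
        running = true;
    }

    // Lifecycle.stop() is abstract, so it must be implemented
    @Override
    public void stop() {
        log.info("Stopping Kafka consumers gracefully");
        registry.getListenerContainers().forEach(container -> {
            container.stop(); // stop this listener container from polling new records
            log.info("Stopped container: {}", container.getListenerId());
        });
        running = false;
    }

    @Override
    public void stop(Runnable callback) {
        try {
            stop();
        } finally {
            // Always signal completion so the lifecycle processor is not left waiting
            callback.run();
        }
    }

    @Override
    public boolean isRunning() { return running; }

    // Higher phases stop earlier during shutdown
    @Override
    public int getPhase() { return Integer.MAX_VALUE - 10; }

    @Override
    public boolean isAutoStartup() { return true; }
}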

The getPhase() value controls ordering: higher values start last and stop first. Give the Kafka consumers a higher phase than the web server's lifecycle beans so they stop before the HTTP server, and no new messages are consumed once shutdown begins.
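The ordering rule itself fits in a few lines. This sketch models only the rule (ascending phase on startup, descending on shutdown); it is not Spring's DefaultLifecycleProcessor, which also handles bean dependencies, per-phase timeouts, and asynchronous stop callbacks. The PhaseOrder and LifecycleBean names, and the phase values in the usage note, are illustrative.

```java
import java.util.Comparator;
import java.util.List;

// Models the SmartLifecycle ordering rule only, not Spring's implementation.
final class PhaseOrder {

    record LifecycleBean(String name, int phase) {}

    // Startup: lowest phase first.
    static List<String> startOrder(List<LifecycleBean> beans) {
        return beans.stream()
                .sorted(Comparator.comparingInt(LifecycleBean::phase))
                .map(LifecycleBean::name)
                .toList();
    }

    // Shutdown: highest phase first.
    static List<String> stopOrder(List<LifecycleBean> beans) {
        return beans.stream()
                .sorted(Comparator.comparingInt(LifecycleBean::phase).reversed())
                .map(LifecycleBean::name)
                .toList();
    }
}
```

Given beans named webServer (phase 100), kafkaConsumers (phase 200), and repository (phase 0), stopOrder returns [kafkaConsumers, webServer, repository], and startOrder returns the reverse.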

Testing graceful shutdown locally

Simulate a rolling deployment locally with Docker:

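# Start the service in the background so the shell stays free
docker run -d --name market-data-service -p 8080:8080 market-data-service

# Wait until the app reports ready
until curl -sf http://localhost:8080/actuator/health/readiness > /dev/null; do sleep 1; done

# Send a slow request that takes 10 seconds
curl -o /dev/null -s -w "%{http_code}\n" http://localhost:8080/api/slow-endpoint &

# Give the request a moment to reach the app, then send SIGTERM
sleep 1
docker kill --signal=SIGTERM market-data-service

# Wait for curl: it should print 200, not fail with a connection reset
wait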

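With server.shutdown=graceful and timeout-per-shutdown-phase: 20s, the 10-second request should complete and curl should print 200. Without it, the connection is dropped and curl prints 000 instead. This test assumes SIGTERM actually reaches the JVM: if the image starts Java through a shell (for example, a shell-form ENTRYPOINT), the shell may not forward the signal, so use the exec form.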

What breaks without graceful shutdown

Rolling deployments: without the preStop sleep, requests that arrive between SIGTERM and endpoint propagation hit a pod that has already stopped accepting them. On a busy service, this shows up as 5xx errors or connection failures in APM dashboards during every deploy.

Kafka consumers: without SmartLifecycle stopping consumers first, a consumer can receive a message, begin processing, and then have the database connection closed mid-transaction as the context shuts down.

Database connections: without graceful shutdown, the HikariCP connection pool can be closed while queries are still in flight. Those queries fail with a "Connection is closed" error instead of completing or rolling back cleanly.

If you’re running Spring Boot services on Kubernetes and want to eliminate deployment errors entirely, let’s work together.