
Graceful Shutdown — Draining Requests Before the JVM Exits

How to configure Spring Boot graceful shutdown correctly, what it protects against, and how to wire it safely with Kubernetes terminationGracePeriodSeconds.

Without graceful shutdown, stopping a Spring Boot application in production is a hard kill. Any request in flight at the moment the JVM exits is terminated mid-processing: the client gets a connection reset, the work is half-done, and depending on what that work was, you may have partial writes, incomplete transactions, or silent data loss. Under Kubernetes rolling deployments this happens every time a pod is replaced, which means every deployment risks corrupting in-flight requests.

Spring Boot has supported graceful shutdown since 2.3. Enabling it takes two lines of configuration. Getting it right in Kubernetes requires a few more deliberate choices.

What Happens Without It

When a SIGTERM arrives at a Spring Boot process with the default (immediate) shutdown mode, the JVM starts terminating. Tomcat stops immediately. Active threads are interrupted. If a request was midway through a database write, that write may not complete. If it was waiting on an external service call, the caller gets an abrupt disconnection.

In a Kubernetes rolling deployment, the old pod receives SIGTERM while the new pod is starting. Without graceful shutdown there is no drain window between "SIGTERM received" and "JVM exited": requests the pod has already accepted are abandoned mid-flight.

Enabling Graceful Shutdown

Two properties in application.yml:

server:
  shutdown: graceful

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s

server.shutdown: graceful switches the embedded server (Tomcat, Jetty, or Undertow — all three support it) to drain mode on shutdown rather than immediate termination. When SIGTERM arrives, the server stops accepting new connections but continues processing requests already accepted. Once all in-flight requests complete — or the timeout is reached — the JVM exits.

timeout-per-shutdown-phase is the maximum time Spring will wait for the active request count to reach zero before forcing the shutdown. 30 seconds is a reasonable default for most services; increase it if you have long-running jobs or batch operations that need more time to complete cleanly.

How the Drain Works

When Spring receives a shutdown signal, SmartLifecycle beans are stopped in reverse phase order — the mirror image of the order they started in. The web server is one of those beans. Tomcat (and the other embedded servers) switches to a state where:

  1. The server socket stops accepting new connections.
  2. Existing keep-alive connections are closed at the next request boundary.
  3. Requests currently being processed continue until completion.
  4. Once the in-flight count reaches zero, or the timeout elapses, the server stops.

After the server phase, Spring closes its ApplicationContext — triggering @PreDestroy methods, DisposableBean.destroy() calls, and SmartLifecycle stop callbacks for other beans (connection pools, Kafka consumers, scheduled executors, etc.).
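You can participate in the same mechanism for your own components by implementing SmartLifecycle. A minimal sketch, using a hypothetical background poller as the example (the class and its fields are illustrative, not from the article's codebase):

```java
import org.springframework.context.SmartLifecycle;
import org.springframework.stereotype.Component;

// Hypothetical background poller that should stop cleanly during shutdown.
@Component
public class OrderPoller implements SmartLifecycle {

    private volatile boolean running;

    @Override
    public void start() {
        running = true;
        // Start the polling thread here.
    }

    @Override
    public void stop() {
        // Called during the lifecycle stop phase, before @PreDestroy
        // callbacks on this bean: finish the current batch, then stop.
        running = false;
    }

    @Override
    public boolean isRunning() {
        return running;
    }

    @Override
    public int getPhase() {
        // Beans with a higher phase start later and stop earlier.
        return SmartLifecycle.DEFAULT_PHASE;
    }
}
```

The phase value is how Spring decides stop ordering across lifecycle beans; the web server's drain is itself just a lifecycle bean stopping in its own phase.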

@PreDestroy for Application-Level Cleanup

For resources that need explicit cleanup on shutdown — thread pools, open file handles, external connections — use @PreDestroy:

@Service
public class ReportGenerationService {

    private final ExecutorService reportExecutor = Executors.newFixedThreadPool(4);

    @PreDestroy
    public void shutdown() {
        reportExecutor.shutdown();
        try {
            if (!reportExecutor.awaitTermination(20, TimeUnit.SECONDS)) {
                reportExecutor.shutdownNow();
            }
        } catch (InterruptedException e) {
            reportExecutor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}

@PreDestroy is called during the context shutdown phase, after the web server has finished draining. This ordering matters: you don’t want to shut down a thread pool that active request handlers are still using. Graceful shutdown gives you that ordering for free — web server drains first, then @PreDestroy cleanup runs.

Kubernetes: The Critical Timing Issue

Enabling graceful shutdown in Spring Boot is necessary but not sufficient when running under Kubernetes. There is a timing problem you need to account for explicitly.

When Kubernetes terminates a pod, it does two things concurrently:

  1. Sends SIGTERM to the pod.
  2. Begins removing the pod’s endpoint from the Service’s Endpoints list (which removes it from load balancer rotation).

The problem is that step 2 is not instantaneous. There is a propagation delay — typically a few seconds — before kube-proxy and the ingress controller stop routing new traffic to the pod. During that window, the pod has received SIGTERM and begun shutting down, but new requests are still arriving from the load balancer.

Without a preStop hook, Spring’s graceful shutdown starts immediately on SIGTERM, meaning the server may stop accepting connections before the load balancer has finished draining it. You’ll see connection resets in the logs from requests that arrived in that gap.

The fix is a preStop sleep hook in the pod spec:

lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 5"]

This delays the SIGTERM to the application process by 5 seconds, giving kube-proxy time to remove the pod from rotation before Spring starts draining. It’s an intentional pause, not a hack.
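One caveat: the exec form assumes the container image ships a shell. On Kubernetes 1.30 or newer, the sleep lifecycle action (beta and enabled by default from 1.30) expresses the same pause without one, which helps with distroless images:

```yaml
lifecycle:
  preStop:
    sleep:
      seconds: 5
```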

The full probe and lifecycle configuration:

spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: my-service
      image: my-service:latest
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 5"]
      readinessProbe:
        httpGet:
          path: /actuator/health/readiness
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 5
      livenessProbe:
        httpGet:
          path: /actuator/health/liveness
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10

terminationGracePeriodSeconds is a pod-level field (a sibling of containers, not a field of the container) and must be larger than the sum of your preStop sleep and your Spring timeout-per-shutdown-phase. If preStop sleeps for 5 seconds and Spring waits up to 30 seconds to drain, terminationGracePeriodSeconds should be at least 40, with headroom on top for the context close that runs after the drain; the example above uses 60. If Kubernetes hits terminationGracePeriodSeconds, it sends SIGKILL regardless of Spring's state. That is the same as having no graceful shutdown at all.

The Correct Configuration Stack

Putting it together, a production-safe configuration looks like this in application.yml:

server:
  shutdown: graceful

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s

management:
  endpoint:
    health:
      probes:
        enabled: true
  endpoints:
    web:
      exposure:
        include: health, info, metrics
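During shutdown, Spring Boot flips the readiness state to REFUSING_TRAFFIC, which is what the readiness probe picks up. If you want to observe (or log) that transition, a small sketch using Spring Boot's availability events:

```java
import org.springframework.boot.availability.AvailabilityChangeEvent;
import org.springframework.boot.availability.ReadinessState;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

// Logs readiness transitions. During graceful shutdown you should see
// ACCEPTING_TRAFFIC -> REFUSING_TRAFFIC before the drain completes.
@Component
public class ReadinessLogger {

    @EventListener
    public void onReadinessChange(AvailabilityChangeEvent<ReadinessState> event) {
        System.out.println("Readiness changed to " + event.getState());
    }
}
```

The readiness flip complements the preStop delay but does not replace it — the probe only fires every periodSeconds, so endpoint removal still lags SIGTERM.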

The Kubernetes side is the pod spec shown above: the preStop sleep, a terminationGracePeriodSeconds sized to cover it plus the drain timeout, and probes pointed at the Actuator readiness and liveness endpoints.

Testing Shutdown Behaviour

You can verify graceful shutdown locally by sending a SIGTERM to the running process and watching the logs:

# Find the PID
jps -l | grep your-application

# Send SIGTERM
kill -TERM <pid>

With graceful shutdown enabled, you should see log output like:

INFO --- Commencing graceful shutdown. Waiting for active requests to complete
INFO --- Graceful shutdown complete

If you see the JVM exit immediately without those lines, the configuration is not being picked up — check that your application.yml is on the classpath and that the property names are correct.

For integration testing, you can trigger shutdown programmatically via the Actuator shutdown endpoint (enable it cautiously — it’s disabled by default for good reason):

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
class GracefulShutdownTest {

    @Test
    void inFlightRequestCompletesBeforeShutdown() throws Exception {
        // Start a long-running request in a separate thread
        // Trigger shutdown
        // Assert the request completed with a valid response, not a connection reset
    }
}
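A concrete version of that skeleton might look like the following. It assumes a hypothetical /slow endpoint in the application under test (one that sleeps for a couple of seconds before responding) and triggers shutdown by closing the application context:

```java
import java.util.concurrent.CompletableFuture;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.http.ResponseEntity;

import static org.junit.jupiter.api.Assertions.assertTrue;

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
class GracefulShutdownTest {

    @Autowired
    private TestRestTemplate restTemplate;

    @Autowired
    private ConfigurableApplicationContext context;

    @Test
    void inFlightRequestCompletesBeforeShutdown() throws Exception {
        // Fire a request at a hypothetical slow endpoint on another thread.
        CompletableFuture<ResponseEntity<String>> inFlight =
                CompletableFuture.supplyAsync(() ->
                        restTemplate.getForEntity("/slow", String.class));

        Thread.sleep(500);   // give the request time to reach the server
        context.close();     // triggers the graceful shutdown sequence

        // With graceful shutdown enabled, the in-flight request should
        // complete normally rather than see a connection reset.
        assertTrue(inFlight.get().getStatusCode().is2xxSuccessful());
    }
}
```

This is a timing-based sketch, so treat it as illustrative; on a loaded CI machine the sleep may need adjusting to avoid flakiness.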

Graceful shutdown is one of those configurations that is invisible when it works and catastrophic when it doesn’t. Enable it, set the timeout correctly, and size terminationGracePeriodSeconds to match — then you can deploy with confidence.

If you’re hardening a Spring Boot service for production Kubernetes deployments, get in touch.