
Lambda SnapStart — Java Cold Start Solved?

What Lambda SnapStart actually does, how to configure it for Java, what changes in your application code to make it work correctly, and whether it actually solves the Java cold start problem in practice.

Java cold starts on AWS Lambda have been a genuine pain for years. A cold start for a Spring Boot Lambda could easily take 8–15 seconds — long enough to breach API Gateway timeouts, trigger retries, and make Java feel like the wrong choice for serverless workloads. Lambda SnapStart, released in late 2022 and extended to additional runtimes since, addresses this directly by snapshotting the initialised JVM state and restoring it rather than starting from scratch on each cold start.

The headline numbers are impressive: sub-second cold starts for workloads that previously took 10+ seconds. But there are caveats, and whether it actually solves your problem depends on what your function does during initialisation.

What SnapStart Does

Normal Lambda cold start sequence:

  1. Provision execution environment (download code, set up runtime)
  2. Start JVM
  3. Load and initialise your code (static initialisers, Spring context, class loading)
  4. Handle the first invocation

With SnapStart:

  1. Steps 1–3 happen once when you publish a Lambda version
  2. AWS takes a snapshot of the fully-initialised execution environment (memory + disk state)
  3. On cold start, AWS restores from the snapshot — skipping steps 1–3 entirely
  4. Handle the first invocation

The restored environment is a copy of the snapshot — each concurrent execution gets its own restore, so there’s no shared state between invocations. The snapshot is taken during deployment, not at invocation time.
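One way to observe this at runtime: Lambda exposes the environment variable AWS_LAMBDA_INITIALIZATION_TYPE, which is set to "snap-start" in environments restored from a snapshot. The helper below is an illustrative sketch (the class name and the fallback default are mine, not part of any AWS API):

```java
public class InitTypeCheck {

    // Lambda sets AWS_LAMBDA_INITIALIZATION_TYPE to "snap-start" when the
    // execution environment was restored from a snapshot, and "on-demand"
    // for a regular cold start.
    static String initializationType() {
        String type = System.getenv("AWS_LAMBDA_INITIALIZATION_TYPE");
        return type != null ? type : "on-demand"; // default outside Lambda
    }

    static boolean restoredFromSnapshot() {
        return "snap-start".equals(initializationType());
    }
}
```

This is handy for log lines and metrics dimensions that distinguish restores from ordinary cold starts.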

Enabling SnapStart

SnapStart requires publishing a Lambda version (it doesn’t work with $LATEST):

// CDK configuration
Function.Builder.create(this, "ClaimsProcessor")
    .functionName("claims-processor")
    .runtime(Runtime.JAVA_21)
    .handler("com.trinitylogic.claims.Handler::handleRequest")
    .code(Code.fromAsset("target/claims-processor.jar"))
    .memorySize(1024)
    .snapStart(SnapStartConf.ON_PUBLISHED_VERSIONS)  // enable SnapStart
    .build();

Via SAM template:

ClaimsProcessorFunction:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: java21
    Handler: com.trinitylogic.claims.Handler::handleRequest
    SnapStart:
      ApplyOn: PublishedVersions
    AutoPublishAlias: live

The AutoPublishAlias is important — without an alias pointing to a published version, invocations hit $LATEST where SnapStart doesn’t apply.

The Initialisation Problem: Uniqueness After Restore

Here’s the catch. When AWS restores from a snapshot, your code resumes from the exact state it was in when the snapshot was taken. If your initialisation code generated unique values — random seeds, UUIDs, timestamps, nonces — those values are baked into the snapshot and will be identical across all restored instances.

This is a serious correctness problem for cryptographic seeds, nonces and session tokens, UUIDs used as instance or request identifiers, and cached timestamps: any of these computed during initialisation repeats in every restored environment. Open network connections have a related problem, since sockets captured in the snapshot are stale after restore.

AWS supports lifecycle hooks for this through CRaC (Coordinated Restore at Checkpoint), an open-source OpenJDK API that SnapStart implements:

import java.security.SecureRandom;

import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

import jakarta.annotation.PostConstruct;

@Component
public class SnapStartLifecycleHook implements Resource {

    private static final Logger log =
        LoggerFactory.getLogger(SnapStartLifecycleHook.class);

    private final SecureRandom secureRandom;
    private final ConnectionPool connectionPool;

    public SnapStartLifecycleHook(SecureRandom secureRandom,
                                  ConnectionPool connectionPool) {
        this.secureRandom = secureRandom;
        this.connectionPool = connectionPool;
    }

    @PostConstruct
    public void registerWithCrac() {
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) {
        // Called before the snapshot is taken.
        // Close network connections — they won't be valid after restore.
        connectionPool.closeAll();
        log.info("SnapStart: connections closed before checkpoint");
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) {
        // Called after restore from snapshot, before the first invocation.
        // Re-seed random, re-establish connections, refresh timestamps.
        secureRandom.setSeed(SecureRandom.getSeed(32)); // mix in fresh entropy
        connectionPool.reconnectAll();
        log.info("SnapStart: connections re-established after restore");
    }
}

Add the CRaC dependency:

<dependency>
    <groupId>io.github.crac</groupId>
    <artifactId>org-crac</artifactId>
    <version>0.1.3</version>
</dependency>

Any resource that holds state that must be unique or current per-invocation needs to be handled in afterRestore. Forgetting this is how you get subtle security or correctness bugs that only appear under concurrent load.
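A minimal sketch of the hazard, independent of any AWS API (the class name is illustrative): a value computed once during initialisation is frozen into the snapshot, so every restored copy starts with the same value until something refreshes it.

```java
import java.util.UUID;

public class RequestIdSource {

    // Computed at init time, so it is captured in the snapshot and
    // identical in every restored environment.
    private String instanceId = UUID.randomUUID().toString();

    public String current() {
        return instanceId;
    }

    // The kind of work afterRestore must do: discard the snapshotted
    // value and generate a fresh one per restored environment.
    public void refresh() {
        instanceId = UUID.randomUUID().toString();
    }
}
```

Without the refresh, two concurrent restored environments would both report the instanceId that happened to be generated at snapshot time.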

What to Reinitialise After Restore

@Override
public void afterRestore(Context<? extends Resource> context) throws Exception {
    // 1. Re-seed random number generators. setSeed() mixes fresh entropy
    //    into the generator's existing state, so it is always safe to call.
    secureRandom.setSeed(SecureRandom.getSeed(32));
    // Note: ThreadLocalRandom cannot be explicitly re-seeded — don't use it
    // for values that must be unpredictable across restored environments.

    // 2. Refresh any cached timestamps
    CacheConfig.refreshStartupTimestamp();

    // 3. Re-establish database connections (borrowing and closing one
    //    forces the pool to validate or rebuild its connections)
    dataSource.getConnection().close();

    // 4. Refresh AWS SDK credentials (they may have rotated since the snapshot)
    credentialsProvider.resolveCredentials();

    // 5. Re-generate any instance-specific identifiers
    instanceId = UUID.randomUUID().toString();

    log.info("SnapStart restore complete, instance: {}", instanceId);
}
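The reseeding step in isolation, as a JDK-only sketch: SecureRandom.setSeed() supplements the generator's existing state with the supplied bytes rather than replacing it, which is why it is the safe default here. reseed() is an alternative on Java 9+, but some providers throw UnsupportedOperationException.

```java
import java.security.SecureRandom;

public class ReseedAfterRestore {

    public static byte[] freshToken(SecureRandom rng) {
        // setSeed() mixes the given bytes into the existing state; it never
        // narrows the entropy already present in the generator.
        rng.setSeed(SecureRandom.getSeed(16)); // 16 fresh bytes from the seed source
        byte[] token = new byte[32];
        rng.nextBytes(token);
        return token;
    }
}
```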

Real-World Cold Start Numbers

The improvement is substantial for init-heavy Java applications. Typical results for a Spring Boot Lambda:

| Scenario                   | Cold start (no SnapStart) | Cold start (SnapStart) | Warm  |
|----------------------------|---------------------------|------------------------|-------|
| Simple handler             | 3–5s                      | 200–400ms              | <10ms |
| Spring Boot (minimal)      | 6–10s                     | 300–600ms              | <20ms |
| Spring Boot (full context) | 10–20s                    | 500ms–1s               | <30ms |

The reduction scales with how much initialisation your function does. A bare Lambda handler saves less than a fully-wired Spring Boot context.
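A rough way to reason about the user-facing impact (illustrative numbers, not measurements): the expected per-request overhead is the cold-start duration weighted by the fraction of invocations that land on a cold environment.

```java
public class ColdStartImpact {

    // Expected added latency per request, in ms, given the fraction of
    // invocations that hit a cold environment and the cold start duration.
    static double expectedOverheadMs(double coldFraction, double coldStartMs) {
        return coldFraction * coldStartMs;
    }
}
```

With 1% of requests hitting cold starts, a 10s Spring Boot init adds 100ms of expected latency per request; at a 600ms SnapStart restore the same traffic pattern adds 6ms. The tail (p99+) is where the difference is actually felt.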

When SnapStart Doesn’t Help

Your function has almost no initialisation work. If your handler starts in under 1 second already, SnapStart’s overhead (snapshot storage, restore time) may not justify the complexity.

You have complex CRaC lifecycle requirements. If your function establishes many external connections during init, managing beforeCheckpoint and afterRestore correctly adds non-trivial code. For functions with simple initialisation, the effort may exceed the benefit.

Your function is already warm. SnapStart only affects cold starts. If your function receives enough traffic to stay warm, SnapStart adds deployment complexity for no runtime benefit.

You need $LATEST. SnapStart requires published versions. If your deployment process relies heavily on $LATEST for rapid iteration, this is a workflow change.

SnapStart vs Provisioned Concurrency

SnapStart and Provisioned Concurrency are complementary, not alternatives: SnapStart makes cold starts fast, while Provisioned Concurrency keeps a pool of initialised environments ready so cold starts rarely happen at all, billed continuously whether or not traffic arrives.

For APIs with strict latency SLAs, SnapStart + Provisioned Concurrency (at a lower concurrency level than without SnapStart, since SnapStart makes cold starts fast enough to tolerate) is the right combination. SnapStart alone is sufficient for most workloads where the occasional sub-second cold start is acceptable.

SnapStart doesn’t eliminate the Java cold start problem — but it reduces it from “often unacceptable” to “usually fine”. For the vast majority of Java Lambda workloads, that’s exactly the improvement needed.

If you’re building serverless Java architectures on AWS and want an engineer who understands the trade-offs in depth, get in touch.