This article covers what Lambda SnapStart actually does, how to configure it for Java, what has to change in your application code to make it work correctly, and whether it genuinely solves the Java cold start problem in practice.
Java cold starts on AWS Lambda have been a genuine pain for years. A cold start for a Spring Boot Lambda could easily take 8–15 seconds — long enough to breach API Gateway timeouts, trigger retries, and make Java feel like the wrong choice for serverless workloads. Lambda SnapStart, released in late 2022 and extended to additional runtimes since, addresses this directly by snapshotting the initialised JVM state and restoring it rather than starting from scratch on each cold start.
The headline numbers are impressive: sub-second cold starts for workloads that previously took 10+ seconds. But there are caveats, and whether it actually solves your problem depends on what your function does during initialisation.
A normal Lambda cold start sequence looks like this: download the deployment package, start the JVM, run initialisation code (static initialisers, dependency injection, framework startup), then invoke the handler. For a framework-heavy Java function, the initialisation step dominates.

With SnapStart, initialisation happens once, when you publish a version. Lambda runs the init phase, snapshots the memory and disk state of the initialised execution environment, and caches it. On a cold start, Lambda restores the snapshot and invokes the handler directly, skipping JVM startup and initialisation entirely.
The restored environment is a copy of the snapshot — each concurrent execution gets its own restore, so there’s no shared state between invocations. The snapshot is taken during deployment, not at invocation time.
SnapStart requires publishing a Lambda version (it doesn’t work with $LATEST):
```java
// CDK configuration
Function.Builder.create(this, "ClaimsProcessor")
    .functionName("claims-processor")
    .runtime(Runtime.JAVA_21)
    .handler("com.trinitylogic.claims.Handler::handleRequest")
    .code(Code.fromAsset("target/claims-processor.jar"))
    .memorySize(1024)
    .snapStart(SnapStartConf.ON_PUBLISHED_VERSIONS) // enable SnapStart
    .build();
```
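Because SnapStart only applies to published versions, traffic also has to reach the function through a version or alias in CDK. A minimal sketch, assuming `function` holds the `Function` built above (`getCurrentVersion()` publishes a new version on each code change):

```java
// Point a stable alias at the latest published version so callers
// never hit $LATEST, where SnapStart doesn't apply.
Alias.Builder.create(this, "LiveAlias")
    .aliasName("live")
    .version(function.getCurrentVersion())
    .build();
```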
Via SAM template:
```yaml
ClaimsProcessorFunction:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: java21
    Handler: com.trinitylogic.claims.Handler::handleRequest
    SnapStart:
      ApplyOn: PublishedVersions
    AutoPublishAlias: live
```
The AutoPublishAlias is important — without an alias pointing to a published version, invocations hit $LATEST where SnapStart doesn’t apply.
Here’s the catch. When AWS restores from a snapshot, your code resumes from the exact state it was in when the snapshot was taken. If your initialisation code generated unique values — random seeds, UUIDs, timestamps, nonces — those values are baked into the snapshot and will be identical across all restored instances.
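The distinction can be sketched in plain Java, with no Lambda dependencies (class and method names here are illustrative, not part of any AWS API):

```java
import java.util.UUID;

public class SnapshotUniquenessDemo {
    // Computed once at class-initialisation time. In a SnapStart function,
    // anything like this is frozen into the snapshot and comes back
    // identical in every restored execution environment.
    static final String INIT_ID = UUID.randomUUID().toString();

    // Computed per call. Values generated inside the handler, after
    // restore, are unique as usual.
    static String perInvocationId() {
        return UUID.randomUUID().toString();
    }
}
```

The fix is not to avoid init-time work, but to regenerate anything like `INIT_ID` in a restore hook.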
This is a serious correctness problem for:

- `SecureRandom` seeded during init will produce the same sequence in every restored instance
- `Instant.now()` captured during init will be stale after restore

AWS provides a hook interface to handle this — CRaC (Coordinated Restore at Checkpoint):
```java
import java.security.SecureRandom;

import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

import jakarta.annotation.PostConstruct;

@Component
public class SnapStartLifecycleHook implements Resource {

    private static final Logger log = LoggerFactory.getLogger(SnapStartLifecycleHook.class);

    private final SecureRandom secureRandom;
    private final ConnectionPool connectionPool; // application-specific pool

    public SnapStartLifecycleHook(SecureRandom secureRandom, ConnectionPool connectionPool) {
        this.secureRandom = secureRandom;
        this.connectionPool = connectionPool;
    }

    @PostConstruct
    public void registerWithCrac() {
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        // Called before the snapshot is taken.
        // Close network connections — they won't be valid after restore.
        connectionPool.closeAll();
        log.info("SnapStart: connections closed before checkpoint");
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        // Called after restore from the snapshot, before the first invocation.
        // Re-seed random, re-establish connections, refresh timestamps.
        secureRandom.reseed();
        connectionPool.reconnectAll();
        log.info("SnapStart: connections re-established after restore");
    }
}
```
Add the CRaC dependency:
```xml
<dependency>
    <groupId>io.github.crac</groupId>
    <artifactId>org-crac</artifactId>
    <version>0.1.3</version>
</dependency>
```
Any resource that holds state that must be unique or current per-invocation needs to be handled in afterRestore. Forgetting this is how you get subtle security or correctness bugs that only appear under concurrent load.
```java
@Override
public void afterRestore(Context<? extends Resource> context) throws Exception {
    // 1. Re-seed random number generators. SecureRandom supports explicit
    //    re-seeding; ThreadLocalRandom has no public re-seed API, so don't
    //    rely on it for values that must differ across restored instances.
    secureRandom.reseed();

    // 2. Refresh any cached timestamps
    CacheConfig.refreshStartupTimestamp();

    // 3. Re-establish database connections
    dataSource.getConnection().close(); // test + re-establish pool

    // 4. Refresh AWS SDK credentials (they may have rotated since the snapshot)
    credentialsProvider.resolveCredentials();

    // 5. Re-generate any instance-specific identifiers
    instanceId = UUID.randomUUID().toString();
    log.info("SnapStart restore complete, instance: {}", instanceId);
}
```
The improvement is substantial for init-heavy Java applications. Typical results for a Spring Boot Lambda:
| Scenario | Cold Start (no SnapStart) | Cold Start (SnapStart) | Warm |
|---|---|---|---|
| Simple handler | 3–5s | 200–400ms | <10ms |
| Spring Boot (minimal) | 6–10s | 300–600ms | <20ms |
| Spring Boot (full context) | 10–20s | 500ms–1s | <30ms |
The reduction scales with how much initialisation your function does. A bare Lambda handler saves less than a fully-wired Spring Boot context.
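When measuring this yourself, it helps to know which path a given cold start took. Lambda exposes this through the `AWS_LAMBDA_INITIALIZATION_TYPE` environment variable, which is set to `snap-start` in restored environments (the helper below is a sketch; the class and method names are mine):

```java
public class InitTypeCheck {
    /** True when running in an environment restored from a SnapStart snapshot. */
    public static boolean isSnapStartRestore() {
        return "snap-start".equals(System.getenv("AWS_LAMBDA_INITIALIZATION_TYPE"));
    }
}
```

Logging this flag alongside handler latency makes it easy to separate SnapStart restores from regular cold starts in CloudWatch.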
**Your function has almost no initialisation work.** If your handler starts in under 1 second already, SnapStart’s overhead (snapshot storage, restore time) may not justify the complexity.

**You have complex CRaC lifecycle requirements.** If your function establishes many external connections during init, managing `beforeCheckpoint` and `afterRestore` correctly adds non-trivial code. For functions with simple initialisation, the effort may exceed the benefit.

**Your function is already warm.** SnapStart only affects cold starts. If your function receives enough traffic to stay warm, SnapStart adds deployment complexity for no runtime benefit.

**You need `$LATEST`.** SnapStart requires published versions. If your deployment process relies heavily on `$LATEST` for rapid iteration, this is a workflow change.
SnapStart and Provisioned Concurrency address the same problem in different ways, and you cannot combine them: AWS does not support Provisioned Concurrency on SnapStart-enabled functions. Provisioned Concurrency keeps a configured number of execution environments initialised ahead of time, eliminating cold starts for that capacity at an ongoing cost. SnapStart instead makes the cold starts you do get fast.

For APIs with latency SLAs so strict that even a sub-second cold start breaches the budget, Provisioned Concurrency is still the tool to reach for. SnapStart alone is sufficient for most workloads where the occasional sub-second cold start is acceptable.
SnapStart doesn’t eliminate the Java cold start problem — but it reduces it from “often unacceptable” to “usually fine”. For the vast majority of Java Lambda workloads, that’s exactly the improvement needed.
If you’re building serverless Java architectures on AWS and want an engineer who understands the trade-offs in depth, get in touch.