
Profiling Production JVMs with JFR and JMC

How to use Java Flight Recorder and Java Mission Control to profile production JVMs — recording CPU, memory, GC, and thread behaviour without significant overhead, interpreting flame graphs, and diagnosing real performance problems.

Most performance problems I’ve investigated in production weren’t found by looking at application metrics or logs. They were found by profiling the JVM directly — watching where CPU time actually goes, how memory is allocated, where threads are blocked, and what the garbage collector is doing. Java Flight Recorder (JFR) and Java Mission Control (JMC) are the tools for that job, and they’re production-safe in a way that traditional profilers are not.

At ESG Global, a structural performance issue was causing a 30% capacity shortfall in the BOL Engine. The fix was straightforward once we could see what the JVM was actually doing — but it was invisible without profiling. This post covers how to use JFR and JMC effectively.

What JFR and JMC Are

Java Flight Recorder is a low-overhead profiling and event collection framework built into the JVM. It records a configurable stream of events — method sampling, allocations, GC activity, I/O operations, thread states, lock contention, and more — into a binary .jfr file. The overhead with the default settings is typically under 1%, making it safe to run continuously or on demand against live systems.

Java Mission Control is the desktop analysis tool for .jfr recordings. It provides flame graphs, allocation profiles, GC analysis, thread state timelines, and method-level CPU hotspot views. Both tools are open source; JFR is built into the JDK from 11 onwards, while JMC ships as a separate download rather than with the JDK.

Starting a Recording

From the command line — start a timed recording on a running process:

# Find the PID
jps -l

# Start a 60-second recording
jcmd <pid> JFR.start duration=60s filename=/tmp/recording.jfr settings=profile

# Or dump an ongoing recording
jcmd <pid> JFR.dump filename=/tmp/recording.jfr
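
To check whether a process already has recordings in flight, and what state they are in, list them with JFR.check:

# List recordings on the process
jcmd <pid> JFR.check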

At JVM startup — enable continuous recording via JVM flags:

java -XX:+FlightRecorder \
     -XX:StartFlightRecording=duration=0,filename=/tmp/recording.jfr,settings=profile,dumponexit=true \
     -jar your-application.jar

duration=0 means record indefinitely. dumponexit=true writes the recording when the JVM shuts down — useful for capturing the state up to a failure.

Settings profiles — JFR ships with two built-in profiles:

default — tuned for always-on recording; overhead is typically well under 1%.

profile — samples more frequently and enables additional events such as allocation profiling; overhead is closer to 2%, still acceptable for a short window on most systems.

For production investigation, start with profile for a 2–5 minute window, then revert to default or stop recording.
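
In practice that is a short, named recording started with the heavier settings and stopped (or left to expire) once you have what you need; the name and paths below are illustrative:

# Time-boxed investigation with the heavier profile settings
jcmd <pid> JFR.start name=investigation settings=profile duration=300s filename=/tmp/investigation.jfr

# Stop it early once the behaviour you care about has been captured
jcmd <pid> JFR.stop name=investigation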

Programmatic Recording in Spring Boot

For Spring Boot applications, you can trigger recordings programmatically — useful for capturing recordings during load tests or automatically on performance alerts:

import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

import org.springframework.stereotype.Component;

import jdk.jfr.Recording;
import lombok.extern.slf4j.Slf4j;

@Component
@Slf4j
public class JfrRecordingService {

    public Path startTimedRecording(String name, Duration duration) throws Exception {
        Path outputPath = Path.of("/tmp/jfr-" + name + "-" + Instant.now().toEpochMilli() + ".jfr");

        Recording recording = new Recording();
        recording.setName(name);
        recording.setDuration(duration);          // the recording stops itself after this window
        recording.setDestination(outputPath);     // written to disk when the recording stops

        // Method profiling samples every 20ms, plus allocation, GC and thread events
        recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(20));
        recording.enable("jdk.ObjectAllocationInNewTLAB");
        recording.enable("jdk.GarbageCollection");
        recording.enable("jdk.ThreadSleep");
        recording.enable("jdk.JavaMonitorWait");

        // Only record I/O operations slower than 10ms to keep the recording small
        recording.enable("jdk.SocketRead").withThreshold(Duration.ofMillis(10));
        recording.enable("jdk.FileRead").withThreshold(Duration.ofMillis(10));

        recording.start();
        log.info("JFR recording '{}' started, will save to {}", name, outputPath);
        return outputPath;
    }
}

This gives you fine-grained control over which events are captured, at what frequency, and with what thresholds — so you’re not collecting data you don’t need.
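
One way to wire this in is a small internal-only endpoint that triggers a recording on demand. The controller below is a sketch; the path, parameters, and lack of authentication are assumptions you would tighten for a real system:

import java.nio.file.Path;
import java.time.Duration;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class JfrRecordingController {

    private final JfrRecordingService recordingService;

    public JfrRecordingController(JfrRecordingService recordingService) {
        this.recordingService = recordingService;
    }

    // Hypothetical trigger: POST /internal/jfr?name=checkout&seconds=120
    // Lock this down (network policy, auth) before exposing it anywhere real.
    @PostMapping("/internal/jfr")
    public String record(@RequestParam String name,
                         @RequestParam(defaultValue = "120") long seconds) throws Exception {
        Path path = recordingService.startTimedRecording(name, Duration.ofSeconds(seconds));
        return "Recording to " + path;
    }
}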

Interpreting the Flame Graph

The flame graph in JMC is the first place to look for CPU hotspots. Each horizontal block represents a method; width represents the proportion of CPU samples where that method was on the stack. Methods at the top of tall stacks with wide blocks are your hotspots.

What to look for:

Wide blocks high in the stack — methods consuming significant CPU. Drill down to understand whether the work is necessary (legitimate business logic) or wasteful (redundant computation, inefficient algorithms, unnecessary serialisation).

Unexpectedly wide framework blocks — if Spring’s proxy infrastructure, Jackson serialisation, or Hibernate ORM shows up as a significant percentage of CPU, it often indicates too many small operations (N+1 queries, excessive serialisation, unnecessary bean lookups).

Flat-topped stacks — methods that appear frequently at the top with shallow stacks beneath them suggest CPU-bound hotspots — computationally intensive code that isn’t calling further into the stack. These are candidates for algorithmic optimisation.

In the ESG Global case, a wide, flat block appeared in a data transformation loop that was performing string concatenation via + inside a tight loop over thousands of records. Switching to StringBuilder eliminated the allocation pressure and resolved the throughput issue. Completely invisible without the flame graph.
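
The original code isn't reproduced here, but the shape of the problem and of the fix looked roughly like this (Record and records are stand-ins):

// The pattern behind the wide, flat block: each += copies the whole string built
// so far, so the loop does quadratic work and generates constant garbage.
String csv = "";
for (Record record : records) {
    csv += record.getId() + "," + record.getValue() + "\n";
}

// The fix: one growing buffer, appended to in place.
StringBuilder builder = new StringBuilder(records.size() * 32);   // rough pre-size estimate
for (Record record : records) {
    builder.append(record.getId()).append(',').append(record.getValue()).append('\n');
}
String output = builder.toString();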

Memory Allocation Analysis

The Memory tab in JMC shows allocation profiles — which code paths are allocating the most heap. This is separate from the flame graph (which shows CPU time) and is often where the real problems hide in throughput-constrained services.

Key views:

Allocation by class — which types dominate the allocation rate. Surprise entries such as char[] or byte[] usually point at string handling or serialisation on a hot path.

Allocation by thread — which threads are generating the pressure, which ties allocations back to specific request paths or background jobs.

High allocation rates that aren’t reflected in high GC pause times indicate either short-lived objects (allocated and collected quickly, which still consumes CPU) or allocations that are accumulating toward a future GC event.

What to look for:

// High allocation on a hot path: merge boxes the updated count into a new Long each call
Map<String, Long> counters = new HashMap<>();
counters.merge(key, 1L, Long::sum);

// Lower allocation for hot paths: use a mutable holder
Map<String, long[]> counters = new HashMap<>();
counters.computeIfAbsent(key, k -> new long[1])[0]++;

This kind of micro-optimisation only makes sense in genuinely hot paths — confirm it’s hot with the allocation profile before changing it.
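
If you would rather script this check than click through JMC, the jdk.jfr.consumer API can read a recording directly. A rough sketch follows; the objectClass and allocationSize field names belong to the in-TLAB allocation event and are worth verifying against your JDK's event metadata:

import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

import jdk.jfr.consumer.RecordedClass;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class AllocationHotspots {

    public static void main(String[] args) throws Exception {
        Map<String, Long> bytesByClass = new HashMap<>();

        try (RecordingFile recording = new RecordingFile(Path.of(args[0]))) {
            while (recording.hasMoreEvents()) {
                RecordedEvent event = recording.readEvent();
                if ("jdk.ObjectAllocationInNewTLAB".equals(event.getEventType().getName())) {
                    RecordedClass objectClass = event.getValue("objectClass");
                    bytesByClass.merge(objectClass.getName(), event.getLong("allocationSize"), Long::sum);
                }
            }
        }

        // Top 20 types by bytes allocated in the recording window
        bytesByClass.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(20)
                .forEach(e -> System.out.printf("%,15d bytes  %s%n", e.getValue(), e.getKey()));
    }
}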

GC Analysis

The GC tab shows pause times, collection frequency, and heap usage over time. What to diagnose:

Frequent minor GCs — indicates high allocation rate in the young generation. Look at the allocation profile to identify the source. Sometimes acceptable; sometimes a sign of unnecessary object creation.

Long major/full GCs — pause the application for tens to hundreds of milliseconds. In latency-sensitive systems (Betfair trading, real-time APIs), a 200ms GC pause is a critical incident. Investigate heap sizing and object tenure rates.

Heap growth over time — if heap usage grows steadily without returning to baseline after GC, you have a memory leak. JFR’s object allocation events combined with heap dump analysis (via jmap or jcmd) pinpoints the leaking type.
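
The jcmd route to that heap dump is a one-liner (expect a pause while the dump is written):

jcmd <pid> GC.heap_dump /tmp/heap.hprof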

For latency-sensitive applications, consider ZGC or Shenandoah as the GC algorithm — both offer sub-millisecond pause times at the cost of slightly higher CPU overhead:

java -XX:+UseZGC -Xmx4g -jar your-application.jar

Thread and Lock Analysis

The Threads tab shows thread states over time — RUNNABLE, BLOCKED, WAITING, TIMED_WAITING. A thread that spends most of its time BLOCKED is contending on a lock held by another thread.

The Lock Instances view shows which monitors have the most contention — which locks are being waited for, how long, and by which threads. This is where synchronized bottlenecks become visible.

Common patterns:

A single coarse-grained lock — one synchronized block or synchronized collection guarding state that every request touches; throughput flattens as concurrency rises even though CPU is idle (see the sketch below).

Synchronous logging — threads queuing on a shared appender lock, visible as contention inside the logging framework rather than in application code.

Pool starvation — threads parked WAITING on a connection or worker pool; the fix is usually pool sizing or holding connections for less time, not the code that appears blocked.
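
A sketch of the first pattern and its usual fix; the cache and loader here are stand-ins, not code from any of the systems mentioned above:

import java.math.BigDecimal;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PriceCache {

    // Before: Collections.synchronizedMap(new HashMap<>()) serialises every reader and
    // writer on one monitor, and that monitor is what the Lock Instances view surfaces.

    // After: ConcurrentHashMap locks at much finer granularity internally, and
    // computeIfAbsent replaces the external check-then-act synchronized block.
    private final Map<String, BigDecimal> prices = new ConcurrentHashMap<>();

    BigDecimal priceFor(String symbol) {
        return prices.computeIfAbsent(symbol, this::loadPrice);
    }

    private BigDecimal loadPrice(String symbol) {
        return BigDecimal.ONE;   // stand-in for the real (slow) lookup
    }
}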

Continuous Profiling in Production

For always-on visibility, enable JFR with the default profile and rotate recordings:

java -XX:StartFlightRecording=name=continuous,settings=default,maxage=1h,maxsize=500m,filename=/var/log/jfr/recording.jfr \
     -jar your-application.jar

maxage and maxsize create a circular buffer — old data is overwritten as new data arrives. When an incident occurs, dump the current recording immediately:

jcmd <pid> JFR.dump filename=/var/log/jfr/incident-$(date +%s).jfr

You now have a recording that includes the period leading up to the incident — invaluable for diagnosing problems that are hard to reproduce.

JFR and JMC take about an hour to learn and repay that investment many times over the first time you use them to find a performance problem that every other tool missed. If you’re running Java services in production and not profiling with JFR, you’re guessing at performance.

If you’re diagnosing JVM performance issues in production Java systems and need an experienced engineer, get in touch.