Private Client

Java Performance Optimisation & Profiling

Systematic investigation and resolution of latency issues in a Java Spring Boot service — profiling under load to identify non-obvious bottlenecks, resolving them methodically, and bringing p99 response times within SLA. Delivered with full knowledge transfer so the team could sustain the gains.

Java · JVM Profiling · Spring Boot · HikariCP · JMeter · async-profiler · Grafana


The client’s Java service was hitting latency spikes under load that they couldn’t reproduce in development and couldn’t explain in production. The p99 response time was breaching their SLA several times a day. The team had tried the obvious things — increasing heap, adjusting GC settings, scaling horizontally — without lasting improvement.

The root causes were not obvious. They never are in these engagements, which is why the investigation phase matters more than any fix.

Investigation

Load profile first — JMeter load tests were run against a staging environment with production-equivalent data volumes to reproduce the latency pattern reliably before any changes were made. A problem you can’t reproduce consistently is a problem you can’t verify solving.
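Reproduction runs of this kind are typically driven from JMeter's non-GUI mode so the load generator itself doesn't distort the measurement. The plan name, result file, and report path below are illustrative, not taken from the engagement:

```shell
# Non-GUI run: -n (no GUI), -t (test plan), -l (raw results),
# -e -o (generate an HTML report into the given directory)
jmeter -n -t latency-test.jmx -l results.jtl -e -o report/
```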

async-profiler under load — CPU and allocation flame graphs captured during sustained load revealed the hottest call paths. Several were surprising: a supposedly “lightweight” validation step was allocating heavily due to regex compilation on every call; a serialisation library was performing reflection on a class hierarchy far deeper than anticipated.
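The regex finding is a common one: `Pattern.compile()` in a hot path allocates and burns CPU on every call. A minimal sketch of the standard fix, with hypothetical names (the client's validator is not shown here), is to compile the pattern once at class-load time and reuse it:

```java
import java.util.regex.Pattern;

// Hypothetical validator illustrating the fix: the Pattern is compiled
// once when the class loads, not on every validate() call, so the
// per-call allocation and compilation cost disappears from the flame graph.
public class EmailValidator {
    private static final Pattern EMAIL =
            Pattern.compile("^[\\w.+-]+@[\\w-]+\\.[\\w.]+$");

    public static boolean isValid(String input) {
        return EMAIL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("dev@example.com")); // true
        System.out.println(isValid("not-an-email"));    // false
    }
}
```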

Connection pool analysis — HikariCP metrics exposed via Micrometer showed the pool was routinely exhausting under peak load, causing threads to queue waiting for a connection. The pool was sized for average throughput, not burst capacity — a classic configuration error that doesn’t show up in light testing.
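In a Spring Boot service the pool-sizing fix is usually a configuration change. A sketch using the standard `spring.datasource.hikari.*` properties; the values here are illustrative, since the right numbers come from measuring peak concurrent demand, not averages:

```properties
# Size for burst capacity, not average throughput
spring.datasource.hikari.maximum-pool-size=40
spring.datasource.hikari.minimum-idle=10
# Fail fast instead of letting request threads queue indefinitely
spring.datasource.hikari.connection-timeout=2000
# Expose HikariCP metrics through Micrometer/Actuator
management.endpoints.web.exposure.include=metrics
```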

GC log analysis — G1GC pause events were correlating with the latency spikes. Heap sizing and region configuration adjustments eliminated the major pauses.
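Tuning of this kind is applied through JVM flags. An illustrative shape of such a configuration (the values depend on the measured pause budget and heap occupancy, and are not taken from the engagement):

```shell
# Fixed heap to avoid resize pauses; G1 with an explicit pause target,
# an explicit region size, and GC logging for verification
java -Xms8g -Xmx8g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=100 \
     -XX:G1HeapRegionSize=16m \
     -Xlog:gc*:file=gc.log \
     -jar app.jar
```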

Fixes

Each finding was paired with a targeted change and re-measured under the same JMeter load profile before moving to the next:

- Hot-path regex patterns compiled once at initialisation rather than on every validation call, removing the allocation churn seen in the flame graphs.
- The serialisation hot spot reworked so the library no longer reflected over the full depth of the class hierarchy on each request.
- The HikariCP pool resized for burst capacity rather than average throughput, eliminating thread queueing at peak load.
- G1 heap sizing and region configuration adjusted, eliminating the major pauses correlated with the spikes.

Outcome

p99 response time dropped from 800ms (breaching SLA) to 95ms. The team received a full write-up of each finding, the fix, and the measurement that confirmed it — not just a diff, but an explanation. Two engineers attended a half-day knowledge transfer session on reading flame graphs and interpreting HikariCP metrics.
