Spring Boot | Production Observability with Actuator, Micrometer, and Prometheus

Observability is the difference between knowing your system is behaving correctly and hoping it is. I’ve been on teams that discovered a critical processing failure from a customer complaint rather than from a dashboard — and I’ve been on teams where we caught and resolved a Kafka consumer lag spike before a single business transaction was affected. The difference was instrumentation. Spring Boot Actuator, Micrometer, and Prometheus give you the building blocks; the craft is knowing what to measure and how to make the dashboards actionable.

Spring Boot Actuator — Health Endpoints

Actuator exposes operational endpoints over HTTP. The most important for production is /actuator/health:

management:
  endpoints:
    web:
      exposure:
        include: health, info, metrics, prometheus, loggers
  endpoint:
    health:
      show-details: when-authorized  # don't expose internals publicly
      show-components: when-authorized
  health:
    livenessState:
      enabled: true
    readinessState:
      enabled: true

/actuator/health/liveness tells Kubernetes whether the process is alive (should it restart?). /actuator/health/readiness tells it whether the service is ready to receive traffic.

Custom health indicators let you express the health of dependencies:

@Component("betfairStream")
public class BetfairStreamHealthIndicator implements HealthIndicator {

    private final BetfairStreamClient streamClient;

    @Override
    public Health health() {
        if (!streamClient.isConnected()) {
            return Health.down()
                .withDetail("reason", "Streaming connection down")
                .withDetail("lastConnected", streamClient.getLastConnectedAt())
                .build();
        }

        Duration lastHeartbeat = streamClient.timeSinceLastHeartbeat();
        if (lastHeartbeat.toSeconds() > 30) {
            return Health.down()
                .withDetail("reason", "No heartbeat for " + lastHeartbeat.toSeconds() + "s")
                .build();
        }

        return Health.up()
            .withDetail("subscriptions", streamClient.getSubscriptionCount())
            .withDetail("lastHeartbeat", lastHeartbeat.toMillis() + "ms ago")
            .build();
    }
}

Micrometer — The Metrics API

Micrometer is Spring Boot’s metrics abstraction. You write metrics once against the Micrometer API; the backend (Prometheus, Datadog, CloudWatch, etc.) is a dependency you swap.

The four core types:

@Service
public class MarketMetrics {

    private final Counter betsPlaced;
    private final Counter betsFailed;
    private final Timer betPlacementLatency;
    private final Gauge openPositions;
    private final DistributionSummary stakeDistribution;

    public MarketMetrics(MeterRegistry registry, PositionTracker tracker) {
        this.betsPlaced = Counter.builder("trading.bets.placed")
            .description("Total bets successfully placed")
            .tag("exchange", "betfair")
            .register(registry);

        this.betsFailed = Counter.builder("trading.bets.failed")
            .description("Total bet placement failures")
            .tag("exchange", "betfair")
            .register(registry);

        this.betPlacementLatency = Timer.builder("trading.bet.placement.latency")
            .description("Time from order decision to API response")
            .publishPercentiles(0.5, 0.95, 0.99)
            .register(registry);

        this.openPositions = Gauge.builder("trading.positions.open", tracker,
                PositionTracker::getOpenPositionCount)
            .description("Number of currently open positions")
            .register(registry);

        this.stakeDistribution = DistributionSummary.builder("trading.stake.distribution")
            .description("Distribution of bet stakes in pence")
            .baseUnit("pence")
            .publishPercentiles(0.5, 0.95, 0.99)
            .register(registry);
    }
}

Use tags to slice metrics by meaningful dimensions:

// Record bet placement with outcome tags
Timer.Sample sample = Timer.start();
try {
    PlaceExecutionReport report = betfairClient.placeOrders(ssoid, request);
    betsPlaced.increment();
    stakeDistribution.record(request.getTotalStakePence());
    sample.stop(betPlacementLatency.tag("result", "success").register(registry));
} catch (Exception e) {
    betsFailed.increment();
    sample.stop(betPlacementLatency.tag("result", "failure").register(registry));
}

Exporting to Prometheus

Add the Prometheus dependency:

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Expose the scrape endpoint:

management:
  endpoints:
    web:
      exposure:
        include: prometheus
  metrics:
    export:
      prometheus:
        enabled: true

Configure Prometheus to scrape:

# prometheus.yml
scrape_configs:
  - job_name: 'trading-service'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['trading-service:8080']

Business-Level Metrics in Grafana

Technical metrics (JVM heap, HTTP request rate, database pool size) are valuable but not sufficient. The dashboards I find most useful in production combine technical metrics with business metrics:

// Kafka consumer lag — technical + business signal
Gauge.builder("kafka.consumer.lag", consumerLag, ConsumerLagProvider::getTotalLag)
    .tags("topic", "claim-events", "consumer-group", "claims-processor")
    .register(registry);

// Daily P&L — business metric
Gauge.builder("trading.daily.pnl.pence", pnlTracker, PnlTracker::getDailyPnlPence)
    .register(registry);

// Kill switch state — operational metric
Gauge.builder("trading.kill.switch.active",
        riskController, rc -> rc.isKillSwitchActive() ? 1.0 : 0.0)
    .register(registry);

In Grafana, set alerts on:

Distributed Tracing

For request flows that span multiple services, add distributed tracing:

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>
<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency>
management:
  tracing:
    sampling:
      probability: 0.1  # sample 10% of requests

Traces propagate automatically across HTTP calls and Kafka messages in Spring Boot 3.x. The traceId appears in structured log output, linking logs to the trace in Zipkin or Grafana Tempo.

ProTips

If you’re looking for a Java contractor who knows this space inside out, get in touch.