Distributed Tracing with OpenTelemetry in Spring Boot

A log entry tells you what happened in one service. A trace tells you what happened across all of them. When an API call touches four services and one is slow, logs give you four separate accounts of the same journey with no shared thread. A distributed trace gives you a single timeline — which service was called, in what order, how long each step took, where the latency came from, and which downstream call blew the budget.

OpenTelemetry is now the de facto standard instrumentation layer for this. Its auto-instrumentation handles most of the wiring, and it offers a clean API for adding application-specific spans when you need to go deeper.

How It Works

OpenTelemetry models distributed work as a trace: a tree of spans. Each span represents a unit of work, such as an incoming HTTP request, a database call, a Kafka message being consumed, or a downstream API call. Spans carry:

A trace ID shared across all services in the same request chain.
A span ID unique to the unit of work.
A parent span ID linking child to parent.
Start time, duration, status code, and key-value attributes.

When a service makes an outbound HTTP call, it injects the trace context (trace ID + span ID) into the request headers. The downstream service extracts the context from those headers and creates a child span under the same trace. This propagation is what links spans across process boundaries into a single trace.
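As a rough illustration of what travels in those headers: with the W3C Trace Context propagator (OpenTelemetry's default), the context is carried in a traceparent header of the form version-traceid-parentid-flags. The sketch below is not the library's code, and the class and method names are made up for illustration. It formats the header on the calling side and parses it on the receiving side, where the child span keeps the trace ID, takes the incoming span ID as its parent, and gets a fresh span ID of its own.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: a hand-rolled model of the W3C "traceparent" header.
// TraceParentSketch and its members are hypothetical names, not OpenTelemetry API.
final class TraceParentSketch {

    record SpanContext(String traceId, String spanId, boolean sampled) {}

    // Caller side: serialise the current span's identity into the header value.
    static String inject(SpanContext current) {
        return "00-" + current.traceId() + "-" + current.spanId() + "-" + (current.sampled() ? "01" : "00");
    }

    // Receiver side: recover the caller's identity from the header value.
    static SpanContext extract(String header) {
        String[] parts = header.split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16 || parts[3].length() != 2) {
            throw new IllegalArgumentException("Malformed traceparent: " + header);
        }
        boolean sampled = (Integer.parseInt(parts[3], 16) & 0x01) == 1;
        return new SpanContext(parts[1], parts[2], sampled);
    }

    // The child keeps the trace ID and sampling flag but gets its own span ID.
    static SpanContext newChildOf(SpanContext parent) {
        String childSpanId = String.format("%016x", ThreadLocalRandom.current().nextLong());
        return new SpanContext(parent.traceId(), childSpanId, parent.sampled());
    }

    public static void main(String[] args) {
        SpanContext caller = new SpanContext("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true);
        String header = inject(caller);          // what the outbound call would carry
        SpanContext parent = extract(header);    // what the downstream service reads
        SpanContext child = newChildOf(parent);  // the downstream span's own identity
        System.out.println(header);
        System.out.println("child shares trace ID: " + child.traceId().equals(caller.traceId()));
    }
}
```

The auto-instrumentation does all of this for you; the sketch is only meant to show that propagation is nothing more exotic than a string in a header.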

Dependencies

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>io.opentelemetry.instrumentation</groupId>
            <artifactId>opentelemetry-instrumentation-bom</artifactId>
            <version>2.4.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>io.opentelemetry.instrumentation</groupId>
        <artifactId>opentelemetry-spring-boot-starter</artifactId>
    </dependency>
</dependencies>

The starter pulls in auto-instrumentation for Spring MVC, RestTemplate, WebClient, JDBC, and Spring Kafka — no additional configuration required for the basic case.

Exporter Configuration

Traces need somewhere to go. The OTLP exporter sends spans to any backend that accepts OTLP, either directly or by way of an OpenTelemetry Collector (Jaeger, Zipkin, Grafana Tempo, Datadog, Honeycomb, AWS X-Ray via the ADOT collector):

spring:
  application:
    name: order-service

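otel:
  traces.sampler: parentbased_traceidratio
  traces.sampler.arg: 1.0   # 1.0 = sample everything; use 0.1 or lower in production
  exporter:
    otlp:
      protocol: grpc        # 4317 is the gRPC receiver; http/protobuf uses 4318
      endpoint: http://localhost:4317
  resource:
    attributes:
      service.name: order-service
      deployment.environment: production

service.name is the most important resource attribute: it’s what you search for in Jaeger or Tempo to find all traces for this service.

The starter is configured through otel.* properties. management.tracing.sampling.probability belongs to Micrometer Tracing and is not read by the OpenTelemetry starter, so sampling is set with otel.traces.sampler and otel.traces.sampler.arg instead. Recent versions of the starter default to the http/protobuf protocol, which is why the gRPC protocol is set explicitly for port 4317.

For production, set otel.traces.sampler.arg to 0.1 or lower. Sampling every request at volume generates a large amount of span data and can saturate your collector. The parentbased_ prefix means a service honours the sampling decision already made upstream, so traces are kept or dropped whole rather than in fragments.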
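To make the sampling trade-off concrete, here is a simplified sketch of ratio-based sampling keyed on the trace ID. It is not the SDK's sampler; the class name and the exact arithmetic are assumptions chosen for illustration. The point it tries to show is that the decision is derived from the trace ID rather than from a random draw, so every service that sees the same trace ID reaches the same keep-or-drop decision.

```java
import java.util.Random;

// Illustrative only: a simplified trace-ID-ratio sampler. RatioSamplerSketch is a
// hypothetical name; the real OpenTelemetry sampler differs in detail.
final class RatioSamplerSketch {

    private final double ratio;
    private final long threshold;

    RatioSamplerSketch(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be between 0.0 and 1.0");
        }
        this.ratio = ratio;
        // Map the ratio onto the range of non-negative longs.
        this.threshold = (long) (ratio * Long.MAX_VALUE);
    }

    // Deterministic: the same trace ID always yields the same decision.
    boolean shouldSample(String traceIdHex) {
        if (ratio >= 1.0) {
            return true;
        }
        // Use the low 64 bits (last 16 hex characters) of the 128-bit trace ID.
        long low = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        long nonNegative = low & Long.MAX_VALUE; // clear the sign bit
        return nonNegative < threshold;
    }

    public static void main(String[] args) {
        RatioSamplerSketch sampler = new RatioSamplerSketch(0.1);
        Random random = new Random(42); // fixed seed so runs are repeatable
        int total = 100_000;
        int kept = 0;
        for (int i = 0; i < total; i++) {
            String traceId = String.format("%016x%016x", random.nextLong(), random.nextLong());
            if (sampler.shouldSample(traceId)) {
                kept++;
            }
        }
        System.out.println("kept " + kept + " of " + total + " traces");
    }
}
```

At a ratio of 0.1 roughly one trace in ten survives, which is the order-of-magnitude reduction in span volume that the production setting is after.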

Auto-Instrumentation in Practice

With the starter on the classpath, the following are instrumented automatically — no code changes needed:

Incoming HTTP requests via Spring MVC or WebFlux: a server span is created for each request, with HTTP method, route, status code, and duration. It is the root span unless the request arrives with trace context from an upstream caller.
Outbound HTTP calls via RestTemplate or WebClient: a child span is created and trace context headers are injected automatically.
JDBC queries: a span per query with the SQL statement (parameterised — no values logged).
Spring Kafka consumers and producers: trace context is propagated via Kafka message headers.

You get a working trace across a chain of REST calls with zero application code changes. For a three-service chain — gateway → order-service → inventory-service — you see a tree with a root span on the gateway and child spans on each downstream service, all linked by the same trace ID.
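The shape of that trace can be sketched as plain data. The snippet below is a toy model, not an OpenTelemetry API: the record, the span IDs, the span names, and the timings are all invented for illustration, and it leaves out the client spans a real trace would show between services. It shows how the parent span ID is enough for a backend to rebuild the tree from a flat list of spans that share one trace ID.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy span model. SpanTreeSketch, SpanRecord and the sample
// values are hypothetical, not OpenTelemetry types or real measurements.
final class SpanTreeSketch {

    record SpanRecord(String traceId, String spanId, String parentSpanId,
                      String service, String name, long durationMs) {}

    // Render the spans as an indented tree, starting from spans with no parent.
    static List<String> render(List<SpanRecord> spans) {
        List<String> lines = new ArrayList<>();
        for (SpanRecord span : spans) {
            if (span.parentSpanId() == null) {
                renderSubtree(span, spans, 0, lines);
            }
        }
        return lines;
    }

    private static void renderSubtree(SpanRecord span, List<SpanRecord> all, int depth, List<String> out) {
        out.add("  ".repeat(depth) + span.service() + " " + span.name() + " (" + span.durationMs() + " ms)");
        for (SpanRecord candidate : all) {
            // A span is a child when its parent ID matches this span's ID.
            if (span.spanId().equals(candidate.parentSpanId())) {
                renderSubtree(candidate, all, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        String traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
        List<SpanRecord> spans = List.of(
            new SpanRecord(traceId, "a1", null, "gateway", "GET /orders/{id}", 120),
            new SpanRecord(traceId, "b2", "a1", "order-service", "GET /orders/{id}", 95),
            new SpanRecord(traceId, "c3", "b2", "inventory-service", "GET /stock/{sku}", 60));
        render(spans).forEach(System.out::println);
    }
}
```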

Adding Custom Spans

Auto-instrumentation handles I/O boundaries. For internal business logic that warrants observation — a complex calculation, a decision point, a third-party SDK call that isn’t auto-instrumented — add spans manually:

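import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import org.springframework.stereotype.Service;

@Service
public class RiskEvaluationService {

    private final Tracer tracer;

    public RiskEvaluationService(OpenTelemetry openTelemetry) {
        // The tracer name identifies the instrumentation scope, not the span.
        this.tracer = openTelemetry.getTracer("uk.co.trinitylogic.risk");
    }

    public RiskScore evaluate(Order order) {
        Span span = tracer.spanBuilder("risk.evaluate")
            .setAttribute("order.id",       order.getId())
            .setAttribute("order.market",   order.getMarketId())
            .setAttribute("order.size",     order.getSize())
            .startSpan();

        // Make the span current so spans started inside runEvaluation() become its children.
        try (Scope scope = span.makeCurrent()) {
            RiskScore score = runEvaluation(order);
            span.setAttribute("risk.score",    score.value());
            span.setAttribute("risk.approved", score.approved());
            return score;
        } catch (Exception e) {
            // Attach the failure to the span before rethrowing.
            span.recordException(e);
            span.setStatus(StatusCode.ERROR, e.getMessage());
            throw e;
        } finally {
            // Always end the span, or it is never exported.
            span.end();
        }
    }
}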

span.makeCurrent() sets the span as active on the current thread, so any child spans created during runEvaluation() — including auto-instrumented JDBC calls — are automatically parented to this span. The try-with-resources closes the scope, restoring the previous active span when the block exits.
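The scoping behaviour is easier to see in miniature. The sketch below is a stand-in built on a plain ThreadLocal, with hypothetical names throughout. It is not how the OpenTelemetry classes are implemented, only a model of the same contract: making a value current returns a handle, and closing that handle puts the previous value back.

```java
// Illustrative only: a minimal model of "make current, then restore on close".
// CurrentSpanSketch is a hypothetical name, not part of the OpenTelemetry API.
final class CurrentSpanSketch {

    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Handle returned by makeCurrent(); closing it restores the previous value.
    interface Scope extends AutoCloseable {
        @Override
        void close(); // narrowed so callers need not handle a checked exception
    }

    static String current() {
        return CURRENT.get();
    }

    static Scope makeCurrent(String spanName) {
        String previous = CURRENT.get();
        CURRENT.set(spanName);
        // Closing the scope reinstates whatever was current before this call.
        return () -> CURRENT.set(previous);
    }

    public static void main(String[] args) {
        try (Scope outer = makeCurrent("http.request")) {
            try (Scope inner = makeCurrent("risk.evaluate")) {
                System.out.println("inside inner scope: " + current());
            }
            System.out.println("after inner scope:  " + current());
        }
        System.out.println("after outer scope:  " + current());
    }
}
```

This is also why the scope belongs in a try-with-resources block: if it is never closed, the thread keeps a stale current span, and on a pooled thread that span would leak into unrelated work.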

Attributes are key-value pairs on the span. Use them to record business-meaningful data — order ID, market, score — so you can filter and aggregate traces by business dimension in your backend.

Context Propagation Across Async Boundaries

Auto-instrumentation handles context propagation for RestTemplate and WebClient automatically. For manual async code — CompletableFuture, thread pools, @Async — you must propagate the context explicitly, because ThreadLocal (where span context lives) does not cross thread boundaries:

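import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import org.springframework.stereotype.Service;

@Service
public class AsyncSettlementService {

    private final Tracer tracer;
    private final Executor executor;

    // Assumption: an Executor bean is available for injection; the tracer name is illustrative.
    public AsyncSettlementService(OpenTelemetry openTelemetry, Executor executor) {
        this.tracer = openTelemetry.getTracer("uk.co.trinitylogic.settlement");
        this.executor = executor;
    }

    public CompletableFuture<Void> settleAsync(SettlementRequest request) {
        // Capture the caller's context on the calling thread.
        Context currentContext = Context.current();

        return CompletableFuture.runAsync(() -> {
            // Restore it on the worker thread before starting the child span.
            try (Scope scope = currentContext.makeCurrent()) {
                Span span = tracer.spanBuilder("settlement.process")
                    .setParent(currentContext)
                    .startSpan();
                try (Scope childScope = span.makeCurrent()) {
                    processSettlement(request);
                } finally {
                    span.end();
                }
            }
        }, executor);
    }
}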

Capture Context.current() on the calling thread before submitting to the executor. Inside the async lambda, restore it with makeCurrent(). Without this, the span started on the worker thread has no parent, so it becomes the root of a new trace and shows up disconnected from the request that triggered it.
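The failure mode and the fix can both be reproduced without any tracing library. The sketch below uses a plain ThreadLocal as a stand-in for the tracing context, and all names in it are hypothetical. A task submitted as-is sees nothing on the worker thread, while a task wrapped at submission time carries the caller's value with it. OpenTelemetry's Context offers wrappers in the same spirit, such as Context.current().wrap(runnable) and Context.taskWrapping(executorService), which may be tidier than restoring the context by hand in each lambda.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: ThreadLocal state does not follow work onto another thread
// unless it is captured and restored. All names here are hypothetical.
final class ContextHandoffSketch {

    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Capture the caller's value now; restore it around the task on the worker thread.
    static <T> Callable<T> wrap(Callable<T> task) {
        String captured = TRACE_ID.get(); // runs on the submitting thread
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(captured);
            try {
                return task.call();
            } finally {
                TRACE_ID.set(previous); // leave the pooled thread as we found it
            }
        };
    }

    // Returns what a task observes on a worker thread, with or without wrapping.
    static String observeOnWorker(boolean wrapped) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = TRACE_ID::get;
            Callable<String> toSubmit = wrapped ? wrap(task) : task;
            return pool.submit(toSubmit).get();
        } catch (Exception e) {
            throw new IllegalStateException("worker task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        TRACE_ID.set("4bf92f3577b34da6a3ce929d0e0e4736");
        System.out.println("unwrapped task sees: " + observeOnWorker(false));
        System.out.println("wrapped task sees:   " + observeOnWorker(true));
        TRACE_ID.remove();
    }
}
```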

Kafka Trace Propagation

The Spring Kafka auto-instrumentation propagates trace context via message headers. When a producer sends a message, the current span context is injected as headers. When a consumer receives the message, the context is extracted and used as the parent for the consumer span.

This works out-of-the-box for @KafkaListener methods. For manual consumer code, extract the context explicitly:

// Assumes openTelemetry and tracer are fields injected as in RiskEvaluationService,
// so the propagators match the SDK the starter configured.
public void processRecord(ConsumerRecord<String, OrderPlaced> record) {
    // Rebuild the producer's context from the message headers.
    Context extractedContext = openTelemetry.getPropagators()
        .getTextMapPropagator()
        .extract(Context.current(), record.headers(), new KafkaHeadersGetter());

    Span span = tracer.spanBuilder("orders.process")
        .setParent(extractedContext)
        .setSpanKind(SpanKind.CONSUMER)
        .setAttribute("messaging.system",           "kafka")
        .setAttribute("messaging.destination.name", record.topic())
        .startSpan();

    try (Scope scope = span.makeCurrent()) {
        handleOrder(record.value());
    } finally {
        span.end();
    }
}

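KafkaHeadersGetter is a small adapter that reads header bytes as strings. Implement TextMapGetter<Headers> with keys() returning the header names and get() looking up the last header for the requested key, returning new String(header.value(), StandardCharsets.UTF_8), or null when the header is absent.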
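A sketch of that adapter's shape follows. To keep it runnable on its own, it declares local stand-ins: a Getter<C> interface with the same two methods as TextMapGetter<C>, and a Header record in place of Kafka's header type. In a real consumer you would implement io.opentelemetry.context.propagation.TextMapGetter<Headers> against org.apache.kafka.common.header.Headers instead. Treat the names and the last-value-wins lookup as assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: local stand-ins so the sketch compiles without Kafka or
// OpenTelemetry on the classpath. All type names here are hypothetical.
final class HeaderGetterSketch {

    // Stand-in for a Kafka record header: a key plus raw bytes.
    record Header(String key, byte[] value) {}

    // Same two-method shape as OpenTelemetry's TextMapGetter<C>.
    interface Getter<C> {
        Iterable<String> keys(C carrier);
        String get(C carrier, String key);
    }

    static final class ListHeadersGetter implements Getter<List<Header>> {

        @Override
        public Iterable<String> keys(List<Header> headers) {
            Set<String> keys = new LinkedHashSet<>();
            for (Header header : headers) {
                keys.add(header.key());
            }
            return keys;
        }

        @Override
        public String get(List<Header> headers, String key) {
            if (headers == null) {
                return null;
            }
            String found = null;
            for (Header header : headers) {
                if (header.key().equals(key) && header.value() != null) {
                    // Header bytes are decoded as UTF-8; the last matching header wins.
                    found = new String(header.value(), StandardCharsets.UTF_8);
                }
            }
            return found; // null tells the propagator the header is absent
        }
    }

    public static void main(String[] args) {
        String traceparent = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
        List<Header> headers = List.of(new Header("traceparent", traceparent.getBytes(StandardCharsets.UTF_8)));
        ListHeadersGetter getter = new ListHeadersGetter();
        System.out.println("traceparent = " + getter.get(headers, "traceparent"));
        System.out.println("tracestate  = " + getter.get(headers, "tracestate"));
    }
}
```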

Running Jaeger Locally

For local development, Jaeger is the simplest trace backend:

# docker-compose.yml
services:
  jaeger:
    image: jaegertracing/all-in-one:1.57
    ports:
      - "16686:16686"   # Jaeger UI
      - "4317:4317"     # OTLP gRPC receiver

Start with docker compose up -d jaeger, run your services, and open http://localhost:16686. Select your service name from the dropdown, hit “Find Traces”, and you’ll see the waterfall view with all spans.

Pro Tips

Name spans after the operation, not the class. risk.evaluate is a better span name than RiskEvaluationService.evaluate. Span names are used for aggregation in trace backends — operation-level names aggregate usefully; class names don’t.
Keep aggregation dimensions low-cardinality. High-cardinality attribute values (UUIDs, timestamps, per-request counts) are fine for finding an individual trace, but using them as group-by dimensions can explode your trace backend’s index. Aggregate on bounded values such as market or approval outcome instead.
Don’t log inside spans — add attributes. If you find yourself logging a value that would be useful for debugging a slow trace, add it as a span attribute instead. It lands in the trace and is searchable without flooding your log aggregator.
Record exceptions on the span. span.recordException(e) attaches the stack trace to the span. In Jaeger and Tempo, this surfaces in the span detail view — much easier to find than grepping logs by trace ID.
Set the service name via otel.resource.attributes, not just spring.application.name. Some exporters and collectors read the OTEL resource attributes directly; spring.application.name is a Spring context property that the OTEL starter reads and converts, but the explicit attribute is more reliable across different collector configurations.
