Resilience4j | Circuit Breakers, Rate Limiters, and Bulkheads in Spring Boot

Distributed systems fail. The question isn’t whether a downstream service will become unavailable — it’s whether your system degrades gracefully or takes everything down with it. At Mosaic Smart Data, where our real-time financial data API depended on feeds from multiple third-party providers, a poorly isolated failure in one provider would cascade and degrade unrelated client queries. Resilience4j, combined with Spring Boot’s annotation-driven configuration, gave us the isolation and recovery behaviour we needed with a fraction of the code Hystrix required.

Adding Resilience4j to Spring Boot

<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>

The AOP dependency is required — Resilience4j’s annotations are implemented as aspects.

Circuit Breaker

The circuit breaker has three states: Closed (normal operation), Open (failing — calls short-circuit immediately), and Half-Open (testing recovery). When the failure rate in the sliding window exceeds the configured threshold, the circuit opens:

@Service
public class MarketDataClient {

    @CircuitBreaker(name = "market-data", fallbackMethod = "fallbackMarketData")
    public MarketData fetchMarket(String marketId) {
        return externalProvider.fetch(marketId); // may throw
    }

    private MarketData fallbackMarketData(String marketId, Exception ex) {
        // Invoked on any recorded failure, not only when the circuit is open
        log.warn("market-data call failed for {} ({}), returning cached data",
                 marketId, ex.toString());
        return cache.get(marketId).orElse(MarketData.empty());
    }
}
}

Configuration in application.yml:

resilience4j:
  circuitbreaker:
    instances:
      market-data:
        sliding-window-type: COUNT_BASED
        sliding-window-size: 20
        failure-rate-threshold: 50          # open when ≥50% of the last 20 calls fail
        wait-duration-in-open-state: 30s
        permitted-number-of-calls-in-half-open-state: 5
        slow-call-rate-threshold: 80        # also open on slow calls
        slow-call-duration-threshold: 2s
        register-health-indicator: true

The fallback method must have the same return type and the same parameters as the original method, plus an additional Exception parameter as the last argument. Spring’s AOP wiring handles the routing.
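The YAML above maps one-to-one onto the programmatic builder, which is useful when instances need to be created at runtime rather than from configuration. A sketch using plain resilience4j-core (the factory class name is mine, not from the project):

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig.SlidingWindowType;

import java.time.Duration;

public class MarketDataCircuitBreakerFactory {

    // Programmatic equivalent of the market-data instance in application.yml
    public static CircuitBreaker marketDataBreaker() {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .slidingWindowType(SlidingWindowType.COUNT_BASED)
                .slidingWindowSize(20)
                .failureRateThreshold(50)
                .waitDurationInOpenState(Duration.ofSeconds(30))
                .permittedNumberOfCallsInHalfOpenState(5)
                .slowCallRateThreshold(80)
                .slowCallDurationThreshold(Duration.ofSeconds(2))
                .build();
        return CircuitBreaker.of("market-data", config);
    }
}
```

Instances built this way start in the Closed state and can be registered with a CircuitBreakerRegistry to share them across the application.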

Rate Limiter

Rate limiters prevent your service from overwhelming a downstream dependency:

@RateLimiter(name = "betfair-api", fallbackMethod = "rateLimitedFallback")
public PlaceExecutionReport placeOrder(PlaceOrdersRequest request) {
    return betfairClient.placeOrders(request);
}

private PlaceExecutionReport rateLimitedFallback(
        PlaceOrdersRequest request, RequestNotPermitted ex) {
    throw new TradingException("API rate limit reached — order rejected", ex);
}

The limiter configuration in application.yml:

resilience4j:
  ratelimiter:
    instances:
      betfair-api:
        limit-for-period: 10          # 10 calls
        limit-refresh-period: 1s      # per second
        timeout-duration: 100ms       # wait up to 100ms for a permit

Unlike circuit breakers (which respond to failures), rate limiters enforce throughput limits regardless of success or failure.
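The permit semantics are easiest to see with the core API, outside the Spring wiring. A minimal sketch (class and supplier are illustrative, not from the codebase above):

```java
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;

import java.time.Duration;
import java.util.function.Supplier;

public class RateLimiterSketch {
    public static void main(String[] args) {
        RateLimiterConfig config = RateLimiterConfig.custom()
                .limitForPeriod(10)                        // 10 permits
                .limitRefreshPeriod(Duration.ofSeconds(1)) // replenished every second
                .timeoutDuration(Duration.ofMillis(100))   // wait up to 100ms for a permit
                .build();
        RateLimiter limiter = RateLimiter.of("betfair-api", config);

        Supplier<String> guarded =
                RateLimiter.decorateSupplier(limiter, () -> "order placed");

        // The first 10 calls in a refresh window acquire permits immediately;
        // an 11th would wait up to 100ms and then throw RequestNotPermitted.
        for (int i = 0; i < 10; i++) {
            guarded.get();
        }
    }
}
```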

Bulkhead — Isolating Thread Pools

A bulkhead limits concurrent calls to a dependency, preventing resource starvation. Resilience4j offers two variants: the default semaphore bulkhead caps concurrency on the caller's thread, while the ThreadPoolBulkhead runs calls in a dedicated thread pool:

@Bulkhead(name = "price-data", type = Bulkhead.Type.THREADPOOL,
          fallbackMethod = "priceDataFallback")
public CompletableFuture<PriceData> fetchPrices(String marketId) {
    // Runs on the bulkhead's own thread pool; the method must return a
    // CompletableFuture. No @Async is needed: the bulkhead supplies the executor.
    return CompletableFuture.completedFuture(priceProvider.fetch(marketId));
}

private CompletableFuture<PriceData> priceDataFallback(
        String marketId, BulkheadFullException ex) {
    return CompletableFuture.completedFuture(PriceData.empty());
}

The pool configuration in application.yml:

resilience4j:
  thread-pool-bulkhead:
    instances:
      price-data:
        max-thread-pool-size: 10
        core-thread-pool-size: 5
        queue-capacity: 20

If the thread pool is full and the queue is at capacity, calls go to the fallback immediately rather than blocking the calling thread. This is the key isolation property — a slow or failing price data feed doesn’t exhaust your shared thread pool.
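The same pool can be built with the core API; executeSupplier submits the work to the bulkhead's executor and hands back a CompletionStage. A sketch assuming only resilience4j-core on the classpath (class name and payload are illustrative):

```java
import io.github.resilience4j.bulkhead.ThreadPoolBulkhead;
import io.github.resilience4j.bulkhead.ThreadPoolBulkheadConfig;

import java.util.concurrent.CompletionStage;

public class BulkheadSketch {
    public static void main(String[] args) throws Exception {
        ThreadPoolBulkheadConfig config = ThreadPoolBulkheadConfig.custom()
                .coreThreadPoolSize(5)
                .maxThreadPoolSize(10)
                .queueCapacity(20)
                .build();
        ThreadPoolBulkhead bulkhead = ThreadPoolBulkhead.of("price-data", config);

        // The supplier runs on the bulkhead's pool, never the caller's thread
        CompletionStage<String> stage =
                bulkhead.executeSupplier(() -> "prices for market 1.12345");
        System.out.println(stage.toCompletableFuture().get());
    }
}
```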

Retry

@Retry retries on failure with configurable backoff:

@Retry(name = "external-validation", fallbackMethod = "validationFallback")
@CircuitBreaker(name = "external-validation")
public ValidationResult validate(ClaimData claim) {
    return validationService.validate(claim);
}

Retry configuration in application.yml:

resilience4j:
  retry:
    instances:
      external-validation:
        max-attempts: 3
        wait-duration: 500ms
        enable-exponential-backoff: true
        exponential-backoff-multiplier: 2
        retry-exceptions:
          - java.net.ConnectException
          - java.net.SocketTimeoutException
        ignore-exceptions:
          - com.example.ValidationException  # don't retry business validation failures

Combine @Retry with @CircuitBreaker. With the default aspect order, the retry is the outermost decorator, so the circuit breaker records every individual attempt: three failed attempts count as three failures against the sliding window, not one. If you want failures aggregated differently, the ordering is configurable via the retryAspectOrder and circuitBreakerAspectOrder properties.
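The equivalent core-API composition makes that ordering explicit. A sketch (the instances here are built inline; in practice you would pull them from the registries):

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;

import java.util.function.Supplier;

public class RetryCircuitBreakerSketch {
    public static void main(String[] args) {
        CircuitBreaker circuitBreaker = CircuitBreaker.ofDefaults("external-validation");
        Retry retry = Retry.of("external-validation",
                RetryConfig.custom().maxAttempts(3).build());

        Supplier<String> call = () -> "valid";

        // Innermost decorator first: the circuit breaker sees every
        // individual attempt; Retry wraps the whole thing.
        Supplier<String> resilient =
                Retry.decorateSupplier(retry,
                        CircuitBreaker.decorateSupplier(circuitBreaker, call));

        System.out.println(resilient.get());
    }
}
```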

Monitoring via Actuator and Micrometer

Resilience4j publishes Micrometer metrics automatically once Micrometer is on the classpath; register-health-indicator: true additionally surfaces circuit breaker state through the health endpoint:

GET /actuator/health
{
  "status": "UP",
  "components": {
    "circuitBreakers": {
      "status": "UP",
      "details": {
        "market-data": {
          "status": "UP",
          "state": "CLOSED",
          "failureRate": "5.0%",
          "slowCallRate": "0.0%"
        }
      }
    }
  }
}
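For that health detail to appear at all, Actuator must expose the endpoint and include details. A typical application.yml fragment (standard Spring Boot Actuator properties):

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,metrics
  endpoint:
    health:
      show-details: always
```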

Micrometer exposes circuit breaker state, failure rate, call duration, and buffered call counts as metrics. Wire them into Grafana dashboards to alert when circuits open.

Testing Circuit Breakers

Testing resilience patterns requires controlling failure injection:

@SpringBootTest
class MarketDataCircuitBreakerTest {

    @Autowired
    private MarketDataClient marketDataClient;

    @Autowired
    private CircuitBreakerRegistry registry;

    @MockBean
    private ExternalProvider externalProvider;

    @Test
    void shouldOpenCircuitAfterThresholdFailures() {
        when(externalProvider.fetch(any())).thenThrow(new RuntimeException("Provider down"));

        // The failure rate is only evaluated once minimum-number-of-calls
        // (capped at the window size, here 20) calls are recorded, so drive
        // a full window of failures rather than just the 50% threshold.
        IntStream.range(0, 20).forEach(i -> {
            try { marketDataClient.fetchMarket("1.12345"); }
            catch (Exception ignored) {}
        });

        CircuitBreaker cb = registry.circuitBreaker("market-data");
        assertThat(cb.getState()).isEqualTo(CircuitBreaker.State.OPEN);
    }
}
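For unit tests that don't need a full Spring context, the state machine can also be driven directly with the core API. A sketch using plain resilience4j-core (CallNotPermittedException is what a short-circuited call throws):

```java
import io.github.resilience4j.circuitbreaker.CallNotPermittedException;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;

import java.util.function.Supplier;

public class ForcedOpenSketch {
    public static void main(String[] args) {
        CircuitBreaker cb = CircuitBreaker.ofDefaults("market-data");
        cb.transitionToOpenState(); // force OPEN without generating failures

        Supplier<String> guarded =
                CircuitBreaker.decorateSupplier(cb, () -> "live data");
        try {
            guarded.get();
        } catch (CallNotPermittedException ex) {
            // Calls short-circuit immediately while the circuit is open
            System.out.println("short-circuited: " + ex.getMessage());
        }
    }
}
```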

If you’re looking for a Java contractor who knows this space inside out, get in touch.