How to handle Betfair Streaming API disconnections, market suspensions, and state recovery correctly in Java — connection lifecycle management, exponential backoff reconnection, market status tracking, and safe order management during suspension.
An automated Betfair trading system that works perfectly in normal conditions and falls apart when the connection drops or a market suspends is not a production system — it’s a liability. Both events happen regularly. Streaming connections drop due to network blips, Betfair infrastructure maintenance, and the occasional unexplained timeout. Markets suspend for false starts, jockey changes, abandoned races, and stewards’ enquiries. Your system needs to handle both gracefully, and the handling needs to be built in from the start, not bolted on after the first incident.
I’ve run headless trading systems on AWS through hundreds of race days. This is what the resilience layer looks like.
The Betfair Streaming API is a persistent TCP connection. The lifecycle has four states your system must explicitly model:
public enum ConnectionState {
    DISCONNECTED,
    CONNECTING,
    CONNECTED,
    AUTHENTICATED
}
CONNECTED means the TCP connection is established but authentication hasn’t completed. AUTHENTICATED means the connection is live and subscriptions are active. Only in AUTHENTICATED state should your system act on market data or execute orders.
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class StreamingConnectionManager {

    // Single source of truth for the connection lifecycle; safe to read from any thread
    private final AtomicReference<ConnectionState> state =
            new AtomicReference<>(ConnectionState.DISCONNECTED);
    private final AtomicInteger reconnectAttempts = new AtomicInteger(0);
    private volatile ScheduledFuture<?> reconnectTask;

    public ConnectionState getState() { return state.get(); }

    public boolean isReady() { return state.get() == ConnectionState.AUTHENTICATED; }

    public void onConnected() {
        state.set(ConnectionState.CONNECTED);
        log.info("Streaming connection established");
    }

    public void onAuthenticated() {
        state.set(ConnectionState.AUTHENTICATED);
        reconnectAttempts.set(0); // a successful login resets the backoff
        log.info("Streaming connection authenticated and ready");
    }

    public void onDisconnected(String reason) {
        state.set(ConnectionState.DISCONNECTED);
        log.warn("Streaming disconnected: {}", reason);
        scheduleReconnect(); // defined with the backoff logic in the next listing
    }
}
Reconnecting immediately after a disconnect hammers Betfair’s infrastructure and can get your client throttled or blocked. Exponential backoff with jitter is the correct approach: it spreads reconnection attempts over time and avoids thundering-herd problems when many connections drop simultaneously (e.g. after a Betfair maintenance window):
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

// Continuation of StreamingConnectionManager: the state field and the
// onConnected/onAuthenticated/onDisconnected callbacks are in the previous listing.
@Component
@RequiredArgsConstructor
@Slf4j
public class StreamingConnectionManager {

    private static final long BASE_DELAY_MS = 1_000;
    private static final long MAX_DELAY_MS = 60_000;
    private static final int MAX_ATTEMPTS = 20;

    private final StreamingClient streamingClient;
    // Assumed collaborator: any service that exposes sendCriticalAlert(String)
    private final AlertingService alertingService;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(
                    r -> new Thread(r, "streaming-reconnect"));
    private final AtomicInteger reconnectAttempts = new AtomicInteger(0);

    public void scheduleReconnect() {
        int attempt = reconnectAttempts.incrementAndGet();
        if (attempt > MAX_ATTEMPTS) {
            log.error("Max reconnection attempts reached, manual intervention required");
            alertingService.sendCriticalAlert("Betfair streaming: max reconnects exceeded");
            return;
        }
        long delayMs = computeBackoff(attempt);
        log.info("Scheduling reconnect attempt {} in {}ms", attempt, delayMs);
        scheduler.schedule(this::reconnect, delayMs, TimeUnit.MILLISECONDS);
    }

    private long computeBackoff(int attempt) {
        // attempt is 1-based; the shift is clamped at 10 so it can never overflow
        long exponential = BASE_DELAY_MS * (1L << Math.min(attempt - 1, 10));
        long capped = Math.min(exponential, MAX_DELAY_MS);
        long jitter = (long) (capped * 0.2 * Math.random()); // up to 20% added on top
        return capped + jitter;
    }

    private void reconnect() {
        try {
            log.info("Attempting streaming reconnect...");
            state.set(ConnectionState.CONNECTING);
            streamingClient.connect();
        } catch (Exception e) {
            log.error("Reconnect attempt failed: {}", e.getMessage());
            // Loops back into scheduleReconnect() with the next, longer delay
            onDisconnected("reconnect failed: " + e.getMessage());
        }
    }
}
The bit-shift 1L << (attempt - 1) doubles the delay on each attempt: 1s, 2s, 4s, 8s, 16s, 32s, then 60s (64s, capped) from the seventh attempt onward. The jitter adds up to 20% on top, so a capped delay lands between 60s and 72s, which keeps reconnecting clients from all hitting the endpoint in the same second.
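To sanity-check the schedule, the arithmetic can be pulled out into a standalone class. This is a minimal sketch rather than part of the manager: `BackoffSchedule` and its method names are hypothetical, the constants mirror the listing above, and the random jitter is replaced by its upper bound so the numbers are deterministic.

```java
/** Hypothetical helper mirroring computeBackoff() without the random component. */
class BackoffSchedule {
    static final long BASE_DELAY_MS = 1_000;
    static final long MAX_DELAY_MS = 60_000;

    /** Capped exponential delay for a 1-based attempt number. */
    static long baseDelayMs(int attempt) {
        long exponential = BASE_DELAY_MS * (1L << Math.min(attempt - 1, 10));
        return Math.min(exponential, MAX_DELAY_MS);
    }

    /** Upper bound on the delay once up to 20% jitter is added on top. */
    static long maxDelayWithJitterMs(int attempt) {
        long base = baseDelayMs(attempt);
        return base + (long) (base * 0.2);
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 8; attempt++) {
            System.out.printf("attempt %d: %d ms (at most %d ms with jitter)%n",
                    attempt, baseDelayMs(attempt), maxDelayWithJitterMs(attempt));
        }
    }
}
```

Keeping the calculation free of side effects like this also makes it easy to unit-test a change to the base delay or the cap before it reaches a live session.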
A streaming connection can go silent without raising a TCP disconnect — the socket remains open but no data arrives. Betfair sends heartbeat messages periodically; if you stop receiving them, the connection is stale. Monitor for this explicitly:
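import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
@RequiredArgsConstructor // injects connectionManager
@Slf4j
public class HeartbeatMonitor {

    private static final Duration HEARTBEAT_TIMEOUT = Duration.ofSeconds(30);

    private final AtomicReference<Instant> lastMessageReceived =
            new AtomicReference<>(Instant.now());
    private final StreamingConnectionManager connectionManager;

    public void onMessageReceived() {
        lastMessageReceived.set(Instant.now());
    }

    // Requires scheduling to be enabled in the Spring context (@EnableScheduling)
    @Scheduled(fixedDelay = 10_000)
    public void checkHeartbeat() {
        // Silence is only meaningful once the stream is fully up
        if (connectionManager.getState() != ConnectionState.AUTHENTICATED) return;
        Instant last = lastMessageReceived.get();
        if (Duration.between(last, Instant.now()).compareTo(HEARTBEAT_TIMEOUT) > 0) {
            log.warn("No message received for {}s, forcing reconnect",
                    HEARTBEAT_TIMEOUT.getSeconds());
            connectionManager.onDisconnected("heartbeat timeout");
        }
    }
}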
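Call onMessageReceived() from your message handler on every inbound message, whether market data, connection status, or heartbeat. If 30 seconds pass with nothing, the connection is considered dead and reconnection begins. Because the check itself runs every 10 seconds, detection can lag the 30-second mark by up to another 10 seconds.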
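The timeout comparison is easier to test when the time source is injectable. A minimal sketch, assuming a hypothetical `StalenessDetector` class (not part of the listing above) that takes a `Supplier<Instant>` in place of direct calls to `Instant.now()`:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical, framework-free version of the heartbeat check. */
class StalenessDetector {
    private final Supplier<Instant> now;
    private final Duration timeout;
    private final AtomicReference<Instant> lastMessage;

    StalenessDetector(Supplier<Instant> now, Duration timeout) {
        this.now = now;
        this.timeout = timeout;
        this.lastMessage = new AtomicReference<>(now.get());
    }

    /** Record any inbound message: market data, status, or heartbeat. */
    void onMessageReceived() {
        lastMessage.set(now.get());
    }

    /** True once strictly more than the timeout has passed in silence. */
    boolean isStale() {
        return Duration.between(lastMessage.get(), now.get()).compareTo(timeout) > 0;
    }

    public static void main(String[] args) {
        // A controllable clock stands in for Instant::now
        AtomicReference<Instant> clock =
                new AtomicReference<>(Instant.parse("2024-01-01T12:00:00Z"));
        StalenessDetector detector =
                new StalenessDetector(clock::get, Duration.ofSeconds(30));

        clock.set(clock.get().plusSeconds(31));
        System.out.println("stale after 31s of silence: " + detector.isStale());

        detector.onMessageReceived();
        System.out.println("stale after a new message: " + detector.isStale());
    }
}
```

In production the supplier would simply be `Instant::now`; the scheduled check then reduces to calling `isStale()` and triggering the reconnect.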
Betfair suspends markets for several reasons: a horse is withdrawn, a false start is called, an objection is lodged in-running. During suspension, your orders cannot be matched — but they remain on the exchange, and you’re still exposed to whatever positions you held when suspension happened.
The Streaming API signals suspension via the market status in MarketChange:
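import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;

@Slf4j
@RequiredArgsConstructor
public class MarketStatusTracker {

    private final Map<String, MarketStatus> marketStatuses = new ConcurrentHashMap<>();
    // StrategyGuard (next listing) reacts to suspension and resumption
    private final StrategyGuard strategyGuard;

    public void onMarketChange(MarketChange change) {
        // Status is only present when the change carries a market definition
        if (change.getMarketDefinition() == null) return;
        String marketId = change.getId();
        MarketDefinition def = change.getMarketDefinition();
        MarketStatus current = MarketStatus.from(def.getStatus());
        // put() returns the previous status, or null the first time a market is seen.
        // A null previous still counts as a transition, so a market that is first
        // observed as SUSPENDED is guarded too.
        MarketStatus previous = marketStatuses.put(marketId, current);
        if (previous != current) {
            onStatusTransition(marketId, previous, current);
        }
    }

    private void onStatusTransition(String marketId,
                                    MarketStatus from,
                                    MarketStatus to) {
        log.info("Market {} status: {} -> {}", marketId, from, to);
        if (to == MarketStatus.SUSPENDED) {
            strategyGuard.onMarketSuspended(marketId);
        } else if (from == MarketStatus.SUSPENDED && to == MarketStatus.OPEN) {
            strategyGuard.onMarketResumed(marketId);
        } else if (to == MarketStatus.CLOSED) {
            onMarketClosed(marketId);
        }
    }

    private void onMarketClosed(String marketId) {
        // Hook for settlement and cleanup; the details depend on the application
    }
}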
When a market suspends, your strategy engine must stop generating new orders immediately. Attempting to place orders during suspension wastes API calls and can trigger rate limiting. More critically, when the market resumes, you need a clear picture of your open positions — you may be entering in-play having expected a pre-race exit that never executed.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

@Component
@RequiredArgsConstructor
@Slf4j
public class StrategyGuard {

    private final Set<String> suspendedMarkets = ConcurrentHashMap.newKeySet();
    private final OrderManager orderManager;

    public void onMarketSuspended(String marketId) {
        // Block the strategy first, then clean up
        suspendedMarkets.add(marketId);
        log.warn("Market {} suspended, halting strategy for this market", marketId);
        // Cancel any pending (unmatched) orders
        orderManager.cancelAllUnmatchedOrders(marketId);
    }

    public void onMarketResumed(String marketId) {
        log.info("Market {} resumed, reconciling positions before re-enabling strategy", marketId);
        // Reconcile while the market is still blocked, so the strategy never
        // sees a half-reconciled position
        orderManager.reconcilePositions(marketId);
        suspendedMarkets.remove(marketId);
    }

    public boolean isStrategyPermitted(String marketId) {
        return !suspendedMarkets.contains(marketId);
    }
}
Cancelling unmatched orders on suspension is a matter of preference and strategy design — some traders want to hold unmatched orders through suspension in case they fill immediately on resume. My default is to cancel, reconcile, and re-evaluate from a clean state. The cost of an unwanted matched order on resume is higher than the cost of re-placing an order.
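The two conditions described so far, an authenticated connection and a market that is not suspended, can be combined into a single check in front of order placement. The following is a rough sketch under stated assumptions: `PreTradeGate` is a hypothetical class, and in the real system the readiness supplier would be `connectionManager::isReady` while the suspension set would live in `StrategyGuard`.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

/** Hypothetical gate: an order may go out only if every safety condition holds. */
class PreTradeGate {
    private final BooleanSupplier connectionReady;
    private final Set<String> suspendedMarkets = ConcurrentHashMap.newKeySet();

    PreTradeGate(BooleanSupplier connectionReady) {
        this.connectionReady = connectionReady;
    }

    void markSuspended(String marketId) {
        suspendedMarkets.add(marketId);
    }

    void markResumed(String marketId) {
        suspendedMarkets.remove(marketId);
    }

    /** Connection must be authenticated AND the market must not be suspended. */
    boolean mayPlaceOrder(String marketId) {
        return connectionReady.getAsBoolean() && !suspendedMarkets.contains(marketId);
    }

    public static void main(String[] args) {
        PreTradeGate gate = new PreTradeGate(() -> true); // pretend we are authenticated
        String marketId = "1.234567890";                  // made-up market id

        System.out.println("open market: " + gate.mayPlaceOrder(marketId));
        gate.markSuspended(marketId);
        System.out.println("suspended market: " + gate.mayPlaceOrder(marketId));
        gate.markResumed(marketId);
        System.out.println("resumed market: " + gate.mayPlaceOrder(marketId));
    }
}
```

Routing every order through one method like this means a new safety condition only has to be added in one place.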
The most important thing to understand about Streaming reconnection is that the initial snapshot is your only source of truth. When you reconnect and resubscribe, Betfair sends a full market snapshot before delta updates begin. Your in-memory state must be completely replaced by this snapshot — not merged with your previous state.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class MarketStateManager {

    private final Map<String, RunnerState> runnerStates = new ConcurrentHashMap<>();
    // volatile: may be reset from the reconnect thread and read from the stream thread
    private volatile boolean snapshotReceived = false;

    /** Call on every disconnect so deltas are ignored until a fresh snapshot arrives. */
    public void onDisconnected() {
        snapshotReceived = false;
    }

    public void onMarketChange(MarketChange change) {
        if (Boolean.TRUE.equals(change.getImg())) {
            // img=true signals a full image (snapshot): replace state entirely
            handleSnapshot(change);
        } else {
            if (!snapshotReceived) {
                log.warn("Received delta before snapshot, discarding");
                return;
            }
            handleDelta(change);
        }
    }

    private void handleSnapshot(MarketChange change) {
        runnerStates.clear();
        snapshotReceived = true;
        if (change.getRc() != null) {
            for (RunnerChange rc : change.getRc()) {
                runnerStates.put(String.valueOf(rc.getId()), RunnerState.fromSnapshot(rc));
            }
        }
        log.info("Market state snapshot applied: {} runners", runnerStates.size());
    }

    private void handleDelta(MarketChange change) {
        if (change.getRc() == null) return;
        for (RunnerChange rc : change.getRc()) {
            // Deltas only update runners already known from the snapshot
            runnerStates.computeIfPresent(
                    String.valueOf(rc.getId()),
                    (id, existing) -> existing.applyDelta(rc));
        }
    }
}
The img flag on MarketChange is the critical field. Always check it. Failing to reset state on reconnection is one of the most dangerous bugs you can have in a trading system — you’ll be making decisions based on stale prices and order book data from before the disconnect.
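A toy example shows why replacing matters. This is an illustrative sketch only: `SnapshotDemo` is hypothetical, the price ladder is simplified to a price-to-size map, and the numbers are made up. A price level that disappeared while the connection was down survives a merge but not a replace.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical demo: overlaying a snapshot on old state keeps stale levels alive. */
class SnapshotDemo {

    /** Wrong: vanished price levels from before the disconnect survive. */
    static Map<Double, Double> merge(Map<Double, Double> old, Map<Double, Double> snapshot) {
        Map<Double, Double> result = new TreeMap<>(old);
        result.putAll(snapshot);
        return result;
    }

    /** Right: the snapshot is the new state; nothing from before carries over. */
    static Map<Double, Double> replace(Map<Double, Double> old, Map<Double, Double> snapshot) {
        return new TreeMap<>(snapshot);
    }

    public static void main(String[] args) {
        // Available-to-back ladder before the disconnect: price -> size
        Map<Double, Double> beforeDisconnect = Map.of(2.00, 100.0, 2.02, 50.0);
        // Snapshot after reconnect: the 2.00 level was taken while we were offline
        Map<Double, Double> snapshot = Map.of(2.02, 80.0, 2.04, 30.0);

        System.out.println("merged:   " + merge(beforeDisconnect, snapshot));
        System.out.println("replaced: " + replace(beforeDisconnect, snapshot));
    }
}
```

The merged ladder still advertises size at 2.00 that no longer exists on the exchange, which is exactly the stale data a strategy must never act on.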
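The complete resilience stack for a production streaming system: an explicit connection state model, exponential backoff reconnection with jitter, heartbeat monitoring for silent connections, market status tracking, a strategy guard that halts and reconciles around suspensions, and full state replacement from the snapshot on every reconnect.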
None of these are optional for a system running unattended. Each one addresses a failure mode that will occur on a long enough timeline. Build them in before the first live session, not after the first incident.
If you’re building or hardening a Betfair trading system and need an engineer who’s been through these failure modes in production, get in touch.