Building a Backtesting Framework for Trading Strategies in Java

Every strategy I’ve ever seen look great on a spreadsheet has eventually met the backtesting framework. Some survive. Most don’t. The gap between “this pattern looks profitable in the data” and “this strategy produces real edge after commission, latency, and realistic fill assumptions” is where most amateur trading systems die.

I’ve run a Betfair trading framework in production for several years, and the backtesting infrastructure I’ve built around it has been at least as valuable as the strategies themselves. This is how to build a backtesting engine in Java that gives you results you can actually trust — and how to interpret those results honestly.

Loading and Replaying Streaming Data

Betfair’s Historical Data service provides gzipped stream files in the same format as the live Streaming API: one JSON market change message per line, from which the full order book state can be reconstructed. This is the gold standard for backtesting: you’re replaying the actual market state, not synthesised data.

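import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

public class StreamingDataReplayer {

    private final Path dataFile;
    private final ObjectMapper objectMapper;

    public StreamingDataReplayer(Path dataFile) {
        this.dataFile = dataFile;
        this.objectMapper = new ObjectMapper()
            .registerModule(new JavaTimeModule());
    }

    public void replay(MarketChangeHandler handler) throws IOException {
        // Decode explicitly as UTF-8 rather than relying on the platform default charset
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(
                    new GZIPInputStream(Files.newInputStream(dataFile)),
                    StandardCharsets.UTF_8))) {

            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) continue;
                // One JSON message per line, delivered to the handler in file order
                MarketChange change = objectMapper.readValue(line, MarketChange.class);
                handler.onMarketChange(change);
            }
        }
    }
}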

Each MarketChange carries a publish time (pt) in epoch milliseconds. Use this, not wall-clock time, to drive your simulation, so that timeouts and order expiry behave identically however fast the file is replayed. The replayer must be a pure event source: it drives the simulation engine with no knowledge of strategy logic.

public interface MarketChangeHandler {
    void onMarketChange(MarketChange change);
}
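One way to keep publish time as the only clock in the simulation is to wrap it in a small object that the engine advances on every message. The sketch below is an illustration rather than part of the framework above: ReplayClock and its methods are hypothetical names, and it assumes the recorded file is ordered by publish time.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical simulation clock that is advanced only by replayed publish times. */
final class ReplayClock {

    private boolean started = false;
    private long nowMs;

    /** Advance to the publish time of the message currently being processed. */
    void advanceTo(long publishTimeMs) {
        if (started && publishTimeMs < nowMs) {
            // Assumption: files are time-ordered, so fail loudly rather than go backwards.
            throw new IllegalStateException(
                "Publish time went backwards: " + publishTimeMs + " < " + nowMs);
        }
        nowMs = publishTimeMs;
        started = true;
    }

    /** Current simulated time; never reads the system clock. */
    Instant now() {
        requireStarted();
        return Instant.ofEpochMilli(nowMs);
    }

    /** True if at least {@code d} of simulated time has passed since {@code sinceMs}. */
    boolean hasElapsed(long sinceMs, Duration d) {
        requireStarted();
        return nowMs - sinceMs >= d.toMillis();
    }

    private void requireStarted() {
        if (!started) throw new IllegalStateException("Clock has not been advanced yet");
    }
}
```

Anything time-dependent, such as order expiry, can then ask this clock instead of calling System.currentTimeMillis(), which keeps a replay deterministic regardless of how quickly the file is read.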

The Simulation Engine

The simulation engine is the core of the framework. It maintains a reconstructed order book state, processes strategy signals, simulates order fills, and tracks P&L:

import java.util.List;

public class SimulationEngine implements MarketChangeHandler {

    private final MarketStateTracker stateTracker;
    private final TradingStrategy strategy;
    private final SimulatedOrderBook orderBook;
    private final PnlTracker pnlTracker;

    public SimulationEngine(TradingStrategy strategy) {
        this.stateTracker = new MarketStateTracker();
        this.strategy     = strategy;
        this.orderBook    = new SimulatedOrderBook();
        this.pnlTracker   = new PnlTracker();
    }

    @Override
    public void onMarketChange(MarketChange change) {
        // 1. Update the reconstructed market state
        stateTracker.update(change);
        MarketSnapshot snapshot = stateTracker.snapshot();

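        // 2. Check for pending order fills against new prices, then pass any
        //    newly settled bets to the P&L tracker (otherwise it never sees them)
        int alreadySettled = orderBook.settledBets().size();
        orderBook.attemptFills(snapshot);
        List<SettledBet> settled = orderBook.settledBets();
        settled.subList(alreadySettled, settled.size()).forEach(pnlTracker::record);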

        // 3. Ask the strategy for instructions
        List<OrderInstruction> instructions = strategy.evaluate(snapshot, orderBook.openPositions());

        // 4. Submit simulated orders
        for (OrderInstruction instruction : instructions) {
            orderBook.submit(instruction, snapshot.publishTime());
        }
    }

    public BacktestResult result() {
        return BacktestResult.from(pnlTracker, orderBook.settledBets());
    }
}

The key separation: the strategy only receives the market snapshot and open positions. It has no access to the order book internals or the P&L tracker — this prevents accidentally writing strategies that cheat by looking at their own unrealised P&L.

Realistic Order Fill Modelling

Naive backtests assume your order fills instantly at the price you requested. That is almost always wrong on Betfair, especially for lay orders where you are offering odds and waiting for a backer to match you.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SimulatedOrderBook {

    private final List<SimulatedOrder> pendingOrders = new ArrayList<>();
    private final List<SettledBet> settledBets = new ArrayList<>();

    public void submit(OrderInstruction instruction, long publishTimeMs) {
        pendingOrders.add(new SimulatedOrder(instruction, publishTimeMs));
    }

    public void attemptFills(MarketSnapshot snapshot) {
        Iterator<SimulatedOrder> it = pendingOrders.iterator();
        while (it.hasNext()) {
            SimulatedOrder order = it.next();
            if (canFill(order, snapshot)) {
                SettledBet bet = fill(order, snapshot);
                settledBets.add(bet);
                it.remove();
            } else if (isExpired(order, snapshot)) {
                order.cancel("EXPIRED");
                it.remove();
            }
        }
    }

    private boolean canFill(SimulatedOrder order, MarketSnapshot snapshot) {
        RunnerSnapshot runner = snapshot.runner(order.selectionId());
        if (runner == null) return false;

        if (order.side() == Side.BACK) {
            // Back order fills if available-to-back price is >= our requested price
            return runner.bestAvailableToBack()
                .map(av -> av.price() >= order.requestedPrice())
                .orElse(false);
        } else {
            // Lay order fills if available-to-lay price is <= our requested price
            return runner.bestAvailableToLay()
                .map(av -> av.price() <= order.requestedPrice())
                .orElse(false);
        }
    }
}

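This fill model is pessimistic about price: a resting order fills only once the opposite side of the book reaches your price, not when someone trades against it at that price. That is deliberate, because in reality you’re competing with other orders at the same price and queue position matters. It is not pessimistic about size, though. As shown, canFill checks price only, so unless fill() accounts for the volume on offer, an order is treated as fully matched however little is available. Keep stakes small relative to displayed volume, or extend the check to compare sizes.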
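If you want resting orders to fill without the price crossing, one common refinement is to estimate queue position: note how much volume was displayed at your price when you joined, and count the order as matched only once more than that has traded there. The sketch below is a hypothetical illustration, not the framework’s fill logic. QueuedOrder and its fields are invented names, and it assumes your reconstructed market state exposes cumulative traded volume per price in the same units as displayed size.

```java
/**
 * Hypothetical queue-position estimate for a passive order resting at one price.
 * Assumes traded volume is cumulative and in the same units as displayed size.
 */
final class QueuedOrder {

    private final double size;           // our requested stake
    private final double queueAhead;     // volume displayed at our price when we joined
    private final double tradedAtSubmit; // cumulative traded volume at our price when we joined

    QueuedOrder(double size, double displayedAtPrice, double tradedAtPrice) {
        this.size = size;
        this.queueAhead = displayedAtPrice;
        this.tradedAtSubmit = tradedAtPrice;
    }

    /** Stake matched so far, given the latest cumulative traded volume at our price. */
    double matchedSize(double tradedAtPriceNow) {
        double tradedSinceSubmit = Math.max(0, tradedAtPriceNow - tradedAtSubmit);
        // Volume ahead of us is consumed first; only the excess reaches our order.
        double reachedUs = Math.max(0, tradedSinceSubmit - queueAhead);
        return Math.min(size, reachedUs);
    }

    boolean isFullyMatched(double tradedAtPriceNow) {
        return matchedSize(tradedAtPriceNow) >= size;
    }
}
```

The estimate ignores cancellations by orders ahead of you, which errs on the side of fewer fills. It also supports partial matches, which the price-only check cannot express.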

Commission and P&L Tracking

public class PnlTracker {

    private static final double COMMISSION_RATE = 0.05; // 5% of net winnings

    private double grossProfit = 0;
    private double grossLoss   = 0;
    private int    winners     = 0;
    private int    losers      = 0;

    public void record(SettledBet bet) {
        double pnl = bet.pnl(); // positive = profit, negative = loss
        if (pnl > 0) {
            grossProfit += pnl;
            winners++;
        } else if (pnl < 0) {
            grossLoss += Math.abs(pnl);
            losers++;
        }
        // Zero P&L (for example a voided bet) counts as neither a winner nor a loser
    }

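    public double netProfit() {
        // Approximation: charges commission on the sum of winning bets. Betfair charges
        // on net winnings per market, so this overstates commission when winning and
        // losing bets share a market.
        double commission = grossProfit * COMMISSION_RATE;
        return grossProfit - grossLoss - commission;
    }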

    public double winRate() {
        int total = winners + losers;
        return total == 0 ? 0 : (double) winners / total;
    }
}

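Commission is charged on net winnings, so apply it after netting profits and losses, never to individual stakes or prices. Betfair nets per market, whereas the tracker above applies the rate to the sum of all winning bets. That overstates commission whenever a market contains both winning and losing bets, so treat it as a conservative approximation. Use the actual commission rate for your Betfair account too: at higher turnover volumes the rate drops under the Market Base Rate discount scheme, which can significantly change profitability at scale.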
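Because Betfair charges commission on a market’s net winnings rather than on each winning bet, a closer model nets P&L per market first. The following is a minimal sketch under that assumption. MarketCommissionTracker is a hypothetical name, and it takes the market ID as a plain string because the fields of SettledBet are not shown in this article.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tracker that nets P&L per market before charging commission. */
final class MarketCommissionTracker {

    private final double commissionRate; // e.g. 0.05; use your account's actual rate
    private final Map<String, Double> pnlByMarket = new HashMap<>();

    MarketCommissionTracker(double commissionRate) {
        this.commissionRate = commissionRate;
    }

    /** Add one bet's P&L (before commission) to its market's running total. */
    void record(String marketId, double betPnl) {
        pnlByMarket.merge(marketId, betPnl, Double::sum);
    }

    /** Commission is charged only on markets that finished in profit. */
    double totalCommission() {
        return pnlByMarket.values().stream()
            .mapToDouble(net -> Math.max(0, net) * commissionRate)
            .sum();
    }

    double netProfit() {
        double gross = pnlByMarket.values().stream()
            .mapToDouble(Double::doubleValue)
            .sum();
        return gross - totalCommission();
    }
}
```

For a market with a 100.00 winner and a 60.00 loser, this charges 5% of the 40.00 net, or 2.00. A per-bet calculation would charge 5.00 on the winner alone. The gap matters most for strategies that trade in and out of the same market.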

Reporting: Sharpe Ratio, Drawdown, Win Rate

A single P&L number tells you almost nothing. The distribution of returns matters:

import java.util.List;

public class BacktestResult {

    private final List<Double> dailyPnl; // P&L per racing day
    private final PnlTracker tracker;
    private final List<SettledBet> settledBets;

    public double sharpeRatio() {
        if (dailyPnl.size() < 2) return 0;
        double mean = dailyPnl.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        double stdDev = stdDev(dailyPnl, mean);
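        // Annualised with 252 periods, the equity-market convention; substitute the
        // number of racing days you actually trade per year.
        return stdDev == 0 ? 0 : (mean / stdDev) * Math.sqrt(252);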
    }

    public double maxDrawdown() {
        // Equity starts at zero, so the peak does too; otherwise a losing
        // first day would set the peak and never register as drawdown.
        double peak = 0;
        double maxDd = 0;
        double running = 0;

        for (double pnl : dailyPnl) {
            running += pnl;
            if (running > peak) peak = running;
            double drawdown = peak - running;
            if (drawdown > maxDd) maxDd = drawdown;
        }
        return maxDd;
    }

    private double stdDev(List<Double> values, double mean) {
        // Sample standard deviation (n - 1); sharpeRatio() guarantees at least two values
        double sumOfSquares = values.stream()
            .mapToDouble(v -> Math.pow(v - mean, 2))
            .sum();
        return Math.sqrt(sumOfSquares / (values.size() - 1));
    }

    public void print() {
        System.out.printf("Net P&L:      £%.2f%n", tracker.netProfit());
        System.out.printf("Win Rate:     %.1f%%%n", tracker.winRate() * 100);
        System.out.printf("Sharpe Ratio: %.2f%n",  sharpeRatio());
        System.out.printf("Max Drawdown: £%.2f%n", maxDrawdown());
        System.out.printf("Total Bets:   %d%n",    settledBets.size());
    }
}

A Sharpe ratio above 1.0 is the minimum I’d consider taking to live with real money. Above 2.0 is compelling. Below 1.0, the risk-adjusted return isn’t worth it regardless of absolute profit.
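Both the Sharpe ratio and the drawdown figure depend on dailyPnl being one value per day in chronological order, and the article does not show how that list is built. One possible approach is to bucket bets by settlement date in a fixed time zone. DailyPnlAggregator and BetResult below are hypothetical stand-ins, not the framework’s own types.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: buckets per-bet P&L into calendar days in a given time zone. */
final class DailyPnlAggregator {

    /** Minimal stand-in for a settled bet: settlement time (epoch ms) and P&L. */
    record BetResult(long settledAtMs, double pnl) {}

    static List<Double> dailyPnl(List<BetResult> bets, ZoneId zone) {
        // TreeMap keeps days in chronological order, which a drawdown calculation relies on.
        Map<LocalDate, Double> byDay = new TreeMap<>();
        for (BetResult bet : bets) {
            LocalDate day = Instant.ofEpochMilli(bet.settledAtMs())
                .atZone(zone)
                .toLocalDate();
            byDay.merge(day, bet.pnl(), Double::sum);
        }
        return new ArrayList<>(byDay.values());
    }
}
```

Days with no bets are simply absent from the result. If the strategy trades only some days, either insert zeros for the idle days or annualise by the number of days it actually trades. Otherwise the Sharpe ratio will be inflated.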

Strategy Pluggability

The TradingStrategy interface is the seam that makes the framework reusable:

public interface TradingStrategy {
    List<OrderInstruction> evaluate(MarketSnapshot snapshot, List<OpenPosition> positions);
    String name();
}

Any strategy implementation can be dropped in and tested against the same data set. This makes A/B comparison straightforward — run both strategies against the same historical files and compare BacktestResult outputs.
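To make the seam concrete, here is a toy implementation. It is a sketch only: the record types are minimal stand-ins invented so that the example compiles on its own (the framework’s real MarketSnapshot, OpenPosition and OrderInstruction will carry more), and BackAboveThreshold is an illustration of the interface, not a strategy with any claimed edge.

```java
import java.util.List;
import java.util.Optional;

// Minimal stand-ins so the sketch compiles on its own; the framework's real types are richer.
record PriceSize(double price, double size) {}
record MarketSnapshot(long selectionId, Optional<PriceSize> bestAvailableToBack) {}
record OpenPosition(long selectionId) {}
record OrderInstruction(long selectionId, String side, double price, double size) {}

interface TradingStrategy {
    List<OrderInstruction> evaluate(MarketSnapshot snapshot, List<OpenPosition> positions);
    String name();
}

/** Toy strategy: place one back order when the best back price reaches a threshold. */
final class BackAboveThreshold implements TradingStrategy {

    private final double minPrice;
    private final double stake;

    BackAboveThreshold(double minPrice, double stake) {
        this.minPrice = minPrice;
        this.stake = stake;
    }

    @Override
    public List<OrderInstruction> evaluate(MarketSnapshot snapshot, List<OpenPosition> positions) {
        // Hold at most one position; the decision uses nothing but the two arguments.
        if (!positions.isEmpty()) return List.of();
        return snapshot.bestAvailableToBack()
            .filter(best -> best.price() >= minPrice)
            .map(best -> List.of(
                new OrderInstruction(snapshot.selectionId(), "BACK", best.price(), stake)))
            .orElse(List.of());
    }

    @Override
    public String name() {
        return "back-above-" + minPrice;
    }
}
```

Because the strategy holds no reference to the engine or the order book, two instances with different thresholds can be replayed against the same files and compared directly.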

The Limitations You Must Accept

Backtesting Betfair data is genuinely useful, but there are three limitations that cannot be engineered away.

Liquidity assumptions. The historical data shows the order book as it was when real participants were present. In live trading your orders take part in that market: your back orders consume available-to-back volume, your lay orders consume available-to-lay volume. The backtest cannot model this market impact, because you weren’t in the market when the data was recorded. Strategies that require placing large orders relative to available volume will face worse fills in production than the backtest shows.

Latency. Historical data has no network latency. Your live system will. A strategy that depends on reacting within 50ms of a price change will have a different fill rate live than in the backtest, because in the backtest you always “react instantly”.
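Real latency cannot be reproduced, but you can at least test how sensitive a strategy is to it by holding each instruction back for a fixed amount of simulated time before it reaches the simulated order book. The sketch below assumes a constant delay and non-decreasing publish times. LatencyBuffer is a hypothetical name and is not part of the engine shown earlier.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical buffer that holds instructions until a fixed simulated latency has elapsed. */
final class LatencyBuffer<T> {

    private record Delayed<E>(E item, long releaseAtMs) {}

    private final long latencyMs;
    private final Deque<Delayed<T>> inFlight = new ArrayDeque<>();

    LatencyBuffer(long latencyMs) {
        this.latencyMs = latencyMs;
    }

    /** Called when the strategy emits an instruction at the given publish time. */
    void send(T item, long publishTimeMs) {
        inFlight.addLast(new Delayed<>(item, publishTimeMs + latencyMs));
    }

    /** Removes and returns the items that would have arrived by the given publish time. */
    List<T> arrivedBy(long publishTimeMs) {
        List<T> arrived = new ArrayList<>();
        // FIFO order is enough because the delay is constant and time never goes backwards.
        while (!inFlight.isEmpty() && inFlight.peekFirst().releaseAtMs() <= publishTimeMs) {
            arrived.add(inFlight.removeFirst().item());
        }
        return arrived;
    }
}
```

Running the same backtest at, say, 0 ms, 50 ms and 200 ms of simulated delay shows how quickly the results degrade. A strategy whose profit disappears at a delay you cannot beat in production is unlikely to survive live.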

Market evolution. Markets from two years ago were different — different participants, different typical behaviours. A strategy calibrated on old data may be fit to a market that no longer exists.

Treat the backtest as a filter for obviously bad strategies, not a predictor of live performance. A strategy that fails the backtest is definitely not worth running live. A strategy that passes is a candidate for live testing at minimum stake — not guaranteed profitability.
