
Stream Collectors Deep Dive — groupingBy, partitioningBy, and Custom Collectors

A practical guide to the most useful Java Stream collectors beyond toList — covering groupingBy, partitioningBy, counting, summarising, joining, and how to write a custom collector.

Collectors.toList() is where most developers stop. The full set of collectors in java.util.stream.Collectors covers grouping, partitioning, counting, summarising, joining, and arbitrary reduction — operations that would otherwise require three intermediate variables and a for loop. Knowing which collector to reach for makes stream pipelines dramatically shorter.

groupingBy

Group elements by a classifier function, producing a Map<K, List<V>>:

Map<String, List<Order>> byMarket = orders.stream()
    .collect(Collectors.groupingBy(Order::marketId));

The downstream collector (default toList()) can be replaced with any other collector:

// Count orders per market
Map<String, Long> countByMarket = orders.stream()
    .collect(Collectors.groupingBy(Order::marketId, Collectors.counting()));

// Sum stake per market
Map<String, Double> stakeByMarket = orders.stream()
    .collect(Collectors.groupingBy(Order::marketId,
        Collectors.summingDouble(Order::stake)));

// Group then sort each group
Map<String, List<Order>> byMarketSorted = orders.stream()
    .collect(Collectors.groupingBy(Order::marketId,
        Collectors.collectingAndThen(
            Collectors.toList(),
            list -> list.stream()
                .sorted(Comparator.comparingDouble(Order::price))
                .collect(Collectors.toList())
        )));
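When a global sort is acceptable, sorting before grouping is often shorter: for an ordered stream, the default toList() downstream preserves encounter order within each group. A minimal sketch, using a stand-in Order record:

```java
import java.util.*;
import java.util.stream.*;

public class SortThenGroup {
    // Stand-in for the article's Order type
    record Order(String marketId, double price) {}

    static Map<String, List<Order>> byMarketSorted(List<Order> orders) {
        return orders.stream()
            .sorted(Comparator.comparingDouble(Order::price))   // sort once, up front
            .collect(Collectors.groupingBy(Order::marketId));   // groups inherit the order
    }

    public static void main(String[] args) {
        var grouped = byMarketSorted(List.of(
            new Order("MKT-1", 3.5), new Order("MKT-1", 1.5), new Order("MKT-2", 2.0)));
        System.out.println(grouped);
    }
}
```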

groupingBy with a specific Map implementation

The one- and two-argument forms of groupingBy make no guarantee about the Map implementation they return (in practice it is a HashMap). For predictable iteration order, use the three-argument form and pass a TreeMap supplier:

Map<String, List<Order>> sorted = orders.stream()
    .collect(Collectors.groupingBy(
        Order::marketId,
        TreeMap::new,
        Collectors.toList()
    ));
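A LinkedHashMap supplier is the other common choice: it keeps the groups in first-encounter order rather than sorted key order. A sketch with a stand-in Order record:

```java
import java.util.*;
import java.util.stream.*;

public class EncounterOrderGrouping {
    record Order(String marketId) {}   // stand-in for the article's Order type

    static Map<String, List<Order>> byFirstSeen(List<Order> orders) {
        return orders.stream()
            .collect(Collectors.groupingBy(
                Order::marketId,
                LinkedHashMap::new,        // groups keep first-encounter order
                Collectors.toList()));
    }

    public static void main(String[] args) {
        var grouped = byFirstSeen(List.of(
            new Order("ZZZ"), new Order("AAA"), new Order("ZZZ")));
        System.out.println(grouped.keySet());   // [ZZZ, AAA]
    }
}
```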

Multi-level grouping

Group by two keys by nesting groupingBy:

Map<String, Map<String, List<Order>>> byMarketAndSide = orders.stream()
    .collect(Collectors.groupingBy(Order::marketId,
        Collectors.groupingBy(Order::side)));

For more than two levels, a composite key record is usually cleaner than nesting:

record MarketSideKey(String marketId, String side) {}

Map<MarketSideKey, List<Order>> grouped = orders.stream()
    .collect(Collectors.groupingBy(o ->
        new MarketSideKey(o.marketId(), o.side())));
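Because records implement equals and hashCode, a freshly constructed key finds its group directly. A self-contained sketch with stand-in types:

```java
import java.util.*;
import java.util.stream.*;

public class CompositeKeyGrouping {
    record Order(String marketId, String side) {}         // stand-in types
    record MarketSideKey(String marketId, String side) {}

    static Map<MarketSideKey, List<Order>> group(List<Order> orders) {
        return orders.stream()
            .collect(Collectors.groupingBy(o -> new MarketSideKey(o.marketId(), o.side())));
    }

    public static void main(String[] args) {
        var grouped = group(List.of(
            new Order("MKT-1", "BACK"), new Order("MKT-1", "LAY"), new Order("MKT-1", "BACK")));
        // Record equality makes lookup with a new key instance work
        System.out.println(grouped.get(new MarketSideKey("MKT-1", "BACK")).size());   // 2
    }
}
```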

partitioningBy

Split a stream into two groups — elements that match a predicate and those that don’t — producing a Map<Boolean, List<T>>:

Map<Boolean, List<Order>> partition = orders.stream()
    .collect(Collectors.partitioningBy(o -> o.price() > 2.0));

List<Order> highPrice = partition.get(true);
List<Order> lowPrice  = partition.get(false);

This is cleaner than filtering twice. Like groupingBy, it accepts a downstream collector:

Map<Boolean, Long> countByHighPrice = orders.stream()
    .collect(Collectors.partitioningBy(
        o -> o.price() > 2.0,
        Collectors.counting()
    ));

Summarising statistics

summarizingDouble (and Int/Long variants) returns count, sum, min, max, and average in one pass:

DoubleSummaryStatistics priceStats = orders.stream()
    .collect(Collectors.summarizingDouble(Order::price));

double avg   = priceStats.getAverage();
double max   = priceStats.getMax();
long   count = priceStats.getCount();

This avoids four separate stream operations for the same data.

joining

Concatenate strings from a stream:

String csv = orders.stream()
    .map(Order::marketId)
    .distinct()
    .collect(Collectors.joining(", "));

// With prefix and suffix:
String ids = orders.stream()
    .map(Order::id)
    .collect(Collectors.joining(", ", "[", "]"));
// → "[ORD-001, ORD-002, ORD-003]"

toUnmodifiableMap

Map<String, Order> byId = orders.stream()
    .collect(Collectors.toUnmodifiableMap(
        Order::id,
        o -> o,
        (existing, replacement) -> existing   // merge function for duplicate keys
    ));

The merge function is required when duplicate keys are possible. Without it, a duplicate key throws IllegalStateException. The merge function decides which value wins — here the first one.
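If the later value should win instead, say when a stream carries successive snapshots of the same order, flip the merge function. A sketch with a stand-in Order record:

```java
import java.util.*;
import java.util.function.*;
import java.util.stream.*;

public class KeepLatest {
    record Order(String id, double price) {}   // stand-in for the article's Order type

    static Map<String, Order> latestById(List<Order> snapshots) {
        return snapshots.stream()
            .collect(Collectors.toUnmodifiableMap(
                Order::id,
                Function.identity(),
                (existing, replacement) -> replacement));   // later snapshot wins
    }

    public static void main(String[] args) {
        var latest = latestById(List.of(
            new Order("ORD-001", 1.90), new Order("ORD-001", 2.10)));
        System.out.println(latest.get("ORD-001").price());   // 2.1
    }
}
```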

collectingAndThen

Perform a final transformation on the result of another collector:

List<Order> topFive = orders.stream()
    .sorted(Comparator.comparingDouble(Order::stake).reversed())
    .collect(Collectors.collectingAndThen(
        Collectors.toList(),
        list -> list.subList(0, Math.min(5, list.size()))
    ));

Or wrap the result in an unmodifiable collection:

List<Order> immutable = orders.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.toList(),
        Collections::unmodifiableList
    ));
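On Java 10 and later, Collectors.toUnmodifiableList() gives the same result without the wrapper (and on Java 16+, Stream.toList() also returns an unmodifiable list). A sketch with a stand-in Order record:

```java
import java.util.*;
import java.util.stream.*;

public class UnmodifiableShortcut {
    record Order(String id) {}   // stand-in for the article's Order type

    static List<String> ids(List<Order> orders) {
        return orders.stream()
            .map(Order::id)
            .collect(Collectors.toUnmodifiableList());   // Java 10+
    }

    public static void main(String[] args) {
        System.out.println(ids(List.of(new Order("ORD-001"), new Order("ORD-002"))));
    }
}
```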

Writing a custom collector

When none of the built-ins fit, implement Collector<T, A, R> directly:

import java.util.Set;
import java.util.function.*;
import java.util.stream.Collector;

public class RunningAverageCollector implements Collector<Double, double[], Double> {

    @Override
    public Supplier<double[]> supplier() {
        return () -> new double[]{0.0, 0};   // [sum, count]
    }

    @Override
    public BiConsumer<double[], Double> accumulator() {
        return (state, value) -> {
            state[0] += value;
            state[1]++;
        };
    }

    @Override
    public BinaryOperator<double[]> combiner() {
        return (a, b) -> new double[]{a[0] + b[0], a[1] + b[1]};
    }

    @Override
    public Function<double[], Double> finisher() {
        return state -> state[1] == 0 ? 0.0 : state[0] / state[1];
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Set.of();   // the double[] state is not thread-safe, so claim nothing
    }
}

// Usage:
double avg = prices.stream().collect(new RunningAverageCollector());

The combiner is what enables parallel streams: each thread fills its own accumulator, and the combiner merges the partial results. Do not declare Characteristics.CONCURRENT here, because that characteristic promises the framework a thread-safe accumulator that can be shared across threads, which a plain double[] is not.
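For one-off collectors, the Collector.of factory is more concise than implementing the interface. A sketch of the same running average, declaring no characteristics because the double[] state is not thread-safe:

```java
import java.util.*;
import java.util.stream.*;

public class RunningAverage {
    static final Collector<Double, double[], Double> RUNNING_AVERAGE = Collector.of(
        () -> new double[2],                                    // supplier: [sum, count]
        (state, value) -> { state[0] += value; state[1]++; },   // accumulator
        (a, b) -> new double[]{a[0] + b[0], a[1] + b[1]},       // combiner
        state -> state[1] == 0 ? 0.0 : state[0] / state[1]);    // finisher

    public static void main(String[] args) {
        double avg = Stream.of(1.5, 2.0, 2.5).collect(RUNNING_AVERAGE);
        System.out.println(avg);   // 2.0
    }
}
```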

When not to use collectors

Collectors are most valuable when producing a single aggregate result from a stream. For side effects (writing to a database, publishing events), use forEach. For complex stateful transformations where each element depends on previous ones, a for loop with explicit state is clearer and safer than forcing it into a stream pipeline.

If you’re working on a Java codebase and want a review of stream usage and performance characteristics, get in touch.

Samuel Jackson

Senior Java Back End Developer & Contractor

Senior Java Back End Developer — Betfair Exchange API specialist, Spring Boot, AWS, and event-driven architecture. 20+ years delivering high-performance systems across betting, finance, energy, retail, and government. Available for Java contracting.