A practical guide to the most useful Java Stream collectors beyond toList — covering groupingBy, partitioningBy, counting, summarising, joining, and how to write a custom collector.
Collectors.toList() is where most developers stop. The full set of collectors in java.util.stream.Collectors covers grouping, partitioning, counting, summarising, joining, and arbitrary reduction: operations that would otherwise need a mutable intermediate map or accumulator and an explicit for loop. Knowing which collector to reach for keeps stream pipelines short and makes their intent obvious.
Group elements by a classifier function, producing a Map<K, List<V>>:
Map<String, List<Order>> byMarket = orders.stream()
.collect(Collectors.groupingBy(Order::marketId));
The downstream collector (default toList()) can be replaced with any other collector:
// Count orders per market
Map<String, Long> countByMarket = orders.stream()
.collect(Collectors.groupingBy(Order::marketId, Collectors.counting()));
// Sum stake per market
Map<String, Double> stakeByMarket = orders.stream()
.collect(Collectors.groupingBy(Order::marketId,
Collectors.summingDouble(Order::stake)));
// Group then sort each group
Map<String, List<Order>> byMarketSorted = orders.stream()
.collect(Collectors.groupingBy(Order::marketId,
Collectors.collectingAndThen(
Collectors.toList(),
list -> list.stream()
.sorted(Comparator.comparingDouble(Order::price))
.collect(Collectors.toList())
)));
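For comparison, here is roughly what the count-per-market collector replaces. This is a minimal sketch: the Order record and its sample values are assumptions made for illustration, since the article only implies accessors such as marketId().

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CountPerMarketDemo {
    // Hypothetical Order shape, assumed for this sketch only.
    record Order(String id, String marketId, double stake) {}

    // Explicit loop: one mutable map and a merge() call per element.
    static Map<String, Long> countWithLoop(List<Order> orders) {
        Map<String, Long> counts = new HashMap<>();
        for (Order o : orders) {
            counts.merge(o.marketId(), 1L, Long::sum);
        }
        return counts;
    }

    // Same result as a grouping collector with a counting downstream.
    static Map<String, Long> countWithCollector(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(Order::marketId, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("ORD-001", "M1", 10.0),
                new Order("ORD-002", "M2", 5.0),
                new Order("ORD-003", "M1", 2.5));
        // Both approaches produce equal maps.
        System.out.println(countWithLoop(orders).equals(countWithCollector(orders))); // prints "true"
    }
}
```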
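The one- and two-argument forms of groupingBy make no guarantee about the type of the returned Map (in practice it is a HashMap), so iteration order is not predictable. For keys in sorted order, use the three-argument form and pass a TreeMap supplier: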
Map<String, List<Order>> sorted = orders.stream()
.collect(Collectors.groupingBy(
Order::marketId,
TreeMap::new,
Collectors.toList()
));
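A small sketch of the effect, again with a hypothetical Order record and made-up market ids: the orders arrive as M3, M1, M2, but the TreeMap-backed result iterates its keys in natural String order.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SortedGroupingDemo {
    // Hypothetical Order shape, assumed for this sketch only.
    record Order(String id, String marketId) {}

    // Three-argument groupingBy: classifier, map supplier, downstream collector.
    static Map<String, List<Order>> groupSorted(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(
                        Order::marketId,
                        TreeMap::new,
                        Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("ORD-001", "M3"),
                new Order("ORD-002", "M1"),
                new Order("ORD-003", "M2"));
        System.out.println(groupSorted(orders).keySet()); // prints "[M1, M2, M3]"
    }
}
```

If insertion order matters more than sorted order, LinkedHashMap::new can be passed in the same position.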
Group by two keys by nesting groupingBy:
Map<String, Map<String, List<Order>>> byMarketAndSide = orders.stream()
.collect(Collectors.groupingBy(Order::marketId,
Collectors.groupingBy(Order::side)));
Nested maps get awkward to read and query as levels are added. A composite key record is usually cleaner, shown here with the same two keys:
record MarketSideKey(String marketId, String side) {}
Map<MarketSideKey, List<Order>> grouped = orders.stream()
.collect(Collectors.groupingBy(o ->
new MarketSideKey(o.marketId(), o.side())));
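The composite key works as a map key because records generate equals and hashCode from their components, so a freshly constructed key finds the existing group. The sketch below assumes a hypothetical Order record and invented side values ("BACK", "LAY").

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CompositeKeyDemo {
    // Hypothetical shapes, assumed for this sketch only.
    record Order(String id, String marketId, String side) {}
    record MarketSideKey(String marketId, String side) {}

    // Group by the composite key and count each group.
    static Map<MarketSideKey, Long> countByMarketAndSide(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(
                        o -> new MarketSideKey(o.marketId(), o.side()),
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("ORD-001", "M1", "BACK"),
                new Order("ORD-002", "M1", "LAY"),
                new Order("ORD-003", "M1", "BACK"));
        Map<MarketSideKey, Long> counts = countByMarketAndSide(orders);
        // A new key instance with equal components hits the same entry.
        System.out.println(counts.get(new MarketSideKey("M1", "BACK"))); // prints "2"
    }
}
```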
Split a stream into two groups — elements that match a predicate and those that don’t — producing a Map<Boolean, List<T>>:
Map<Boolean, List<Order>> partition = orders.stream()
.collect(Collectors.partitioningBy(o -> o.price() > 2.0));
List<Order> highPrice = partition.get(true);
List<Order> lowPrice = partition.get(false);
This is cleaner than filtering twice and traverses the source only once. Like groupingBy, it accepts a downstream collector:
Map<Boolean, Long> countByHighPrice = orders.stream()
.collect(Collectors.partitioningBy(
o -> o.price() > 2.0,
Collectors.counting()
));
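One detail worth knowing: the map returned by partitioningBy always contains both the true and the false key, even when one side is empty, so get(...) does not return null. A minimal sketch, with a hypothetical Order record and a threshold parameter added for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionDemo {
    // Hypothetical Order shape, assumed for this sketch only.
    record Order(String id, double price) {}

    // Split orders into those priced above the threshold and the rest.
    static Map<Boolean, List<Order>> splitByPrice(List<Order> orders, double threshold) {
        return orders.stream()
                .collect(Collectors.partitioningBy(o -> o.price() > threshold));
    }

    public static void main(String[] args) {
        // No order is priced above 2.0, so the "true" side is empty.
        List<Order> orders = List.of(
                new Order("ORD-001", 1.5),
                new Order("ORD-002", 1.8));
        Map<Boolean, List<Order>> split = splitByPrice(orders, 2.0);
        System.out.println(split.get(true));          // prints "[]"
        System.out.println(split.get(false).size());  // prints "2"
    }
}
```

By contrast, groupingBy only creates entries for keys that actually occur in the stream.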
summarizingDouble (and Int/Long variants) returns count, sum, min, max, and average in one pass:
DoubleSummaryStatistics priceStats = orders.stream()
.collect(Collectors.summarizingDouble(Order::price));
double avg = priceStats.getAverage();
double max = priceStats.getMax();
long count = priceStats.getCount();
This avoids a separate pass over the data for each statistic.
Concatenate strings from a stream:
String csv = orders.stream()
.map(Order::marketId)
.distinct()
.collect(Collectors.joining(", "));
// With prefix and suffix:
String ids = orders.stream()
.map(Order::id)
.collect(Collectors.joining(", ", "[", "]"));
// → "[ORD-001, ORD-002, ORD-003]"
Collect into a map with an explicit key function and value function. toUnmodifiableMap (Java 10+) returns a map that rejects later modification:
Map<String, Order> byId = orders.stream()
.collect(Collectors.toUnmodifiableMap(
Order::id,
o -> o,
(existing, replacement) -> existing // merge function for duplicate keys
));
The merge function is required when duplicate keys are possible. Without it, a duplicate key throws IllegalStateException. The merge function decides which value wins — here the first one.
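The difference is easy to demonstrate. The sketch below uses a hypothetical Order record and two sample orders that share an id: the three-argument form keeps the first one, while the two-argument form throws.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapMergeDemo {
    // Hypothetical Order shape, assumed for this sketch only.
    record Order(String id, double stake) {}

    // With a merge function: the first order seen for an id wins.
    static Map<String, Order> indexKeepingFirst(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.toUnmodifiableMap(
                        Order::id,
                        o -> o,
                        (existing, replacement) -> existing));
    }

    // Without a merge function: returns true if a duplicate key was rejected.
    static boolean failsWithoutMerge(List<Order> orders) {
        try {
            orders.stream()
                    .collect(Collectors.toUnmodifiableMap(Order::id, o -> o));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Order> duplicates = List.of(
                new Order("ORD-001", 10.0),
                new Order("ORD-001", 99.0));
        System.out.println(indexKeepingFirst(duplicates).get("ORD-001").stake()); // prints "10.0"
        System.out.println(failsWithoutMerge(duplicates));                        // prints "true"
    }
}
```

Swapping the merge function to (existing, replacement) -> replacement would keep the last order instead.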
Perform a final transformation on the result of another collector:
List<Order> topFive = orders.stream()
.sorted(Comparator.comparingDouble(Order::stake).reversed())
.collect(Collectors.collectingAndThen(
Collectors.toList(),
list -> list.subList(0, Math.min(5, list.size()))
));
Or wrap the result in an unmodifiable collection:
List<Order> immutable = orders.stream()
.collect(Collectors.collectingAndThen(
Collectors.toList(),
Collections::unmodifiableList
));
When none of the built-ins fit, implement Collector<T, A, R> directly:
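import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
public class RunningAverageCollector implements Collector<Double, double[], Double> {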
@Override
public Supplier<double[]> supplier() {
return () -> new double[]{0.0, 0}; // [sum, count]
}
@Override
public BiConsumer<double[], Double> accumulator() {
return (state, value) -> {
state[0] += value;
state[1]++;
};
}
@Override
public BinaryOperator<double[]> combiner() {
return (a, b) -> new double[]{a[0] + b[0], a[1] + b[1]};
}
@Override
public Function<double[], Double> finisher() {
return state -> state[1] == 0 ? 0.0 : state[0] / state[1];
}
@Override
public Set<Characteristics> characteristics() {
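return Set.of(); // no characteristics: double[] is not thread-safe, and the finisher is not an identity function
}
}
// Usage:
double avg = prices.stream().collect(new RunningAverageCollector());
The combiner enables parallel streams: it merges two partial accumulators, each filled by a different thread. The collector deliberately declares no characteristics. Characteristics.CONCURRENT would tell the framework that a single accumulator can be updated from several threads at once, which a plain double[] cannot do safely. IDENTITY_FINISH does not apply either, because the finisher converts the double[] into a Double.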
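For a collector this small, the static factory Collector.of can be a shorter alternative to a full class. The sketch below is a rewrite of the same averaging logic under that assumption, not part of the original listing, and the averaging() helper name is hypothetical. It passes no characteristics, since a plain double[] is not safe to share between threads, and it checks that sequential and parallel runs agree.

```java
import java.util.List;
import java.util.stream.Collector;

class AverageCollectors {
    // Hypothetical helper: builds the collector from its four functions.
    static Collector<Double, double[], Double> averaging() {
        return Collector.of(
                () -> new double[2],                                   // supplier: [sum, count]
                (state, value) -> { state[0] += value; state[1]++; },  // accumulator
                (a, b) -> { a[0] += b[0]; a[1] += b[1]; return a; },   // combiner: merge b into a
                state -> state[1] == 0 ? 0.0 : state[0] / state[1]);   // finisher: sum / count
    }

    public static void main(String[] args) {
        List<Double> prices = List.of(1.0, 2.0, 3.0, 4.0);
        // Each parallel worker gets its own double[]; the combiner merges them.
        System.out.println(prices.stream().collect(averaging()));         // prints "2.5"
        System.out.println(prices.parallelStream().collect(averaging())); // prints "2.5"
    }
}
```

For production code, the built-in Collectors.averagingDouble already covers this case; a custom collector earns its place only when the accumulation logic is genuinely different.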
Collectors are most valuable when producing a single aggregate result from a stream. For side effects (writing to a database, publishing events), use forEach. For complex stateful transformations where each element depends on previous ones, a for loop with explicit state is clearer and safer than forcing it into a stream pipeline.
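As an illustration of that last point, a running total depends on every earlier element, which a plain loop expresses directly. The method name and sample values below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RunningTotalDemo {
    // Each output element is the sum of all inputs up to and including that position.
    static List<Double> runningTotals(List<Double> stakes) {
        List<Double> totals = new ArrayList<>(stakes.size());
        double sum = 0.0; // explicit state carried from one iteration to the next
        for (double stake : stakes) {
            sum += stake;
            totals.add(sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(runningTotals(List.of(10.0, 5.0, 2.5))); // prints "[10.0, 15.0, 17.5]"
    }
}
```

The same logic written as a stream would need a mutable variable captured by a lambda, which breaks as soon as the stream runs in parallel.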