
Advanced Stream Collectors You Should Actually Be Using

A practical guide to the Stream API collectors most developers overlook — groupingBy, partitioningBy, teeing, mapping, and custom Collector implementations that eliminate boilerplate.

Most Java developers know toList(), toSet(), and joining(). They use groupingBy occasionally and reach for a manual loop when the collector they need doesn’t immediately come to mind. The result is code that is longer, less expressive, and harder to compose than it needs to be.

The Collectors utility class has more than 40 factory methods. Most are genuinely useful. This post covers the ones that produce the biggest improvement in real-world code.

groupingBy with a downstream collector

groupingBy(Function) groups stream elements into a Map<K, List<T>>. The downstream collector is what most people miss — it transforms the grouped values rather than collecting them into a list.

Group orders by status and count them:

Map<OrderStatus, Long> countByStatus = orders.stream()
        .collect(groupingBy(Order::status, counting()));

Group orders by status, extract just the IDs:

Map<OrderStatus, List<String>> idsByStatus = orders.stream()
        .collect(groupingBy(
                Order::status,
                mapping(Order::id, toList())
        ));

Group orders by status, summarise the total value:

Map<OrderStatus, DoubleSummaryStatistics> valueStatsByStatus = orders.stream()
        .collect(groupingBy(
                Order::status,
                summarizingDouble(order -> order.value().doubleValue())
        ));

The downstream collector composes: you can pass mapping(...) into toUnmodifiableSet(), or filtering(...) into counting(). Any Collector works as the downstream.
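A sketch of that composition, assuming a simplified Order record with only the fields used here:

```java
// Simplified Order for illustration — the real domain type has more fields
enum OrderStatus { PENDING, COMPLETED }
record Order(String id, String customerId, OrderStatus status) {}

List<Order> orders = List.of(
        new Order("o1", "c1", OrderStatus.PENDING),
        new Order("o2", "c1", OrderStatus.PENDING),
        new Order("o3", "c2", OrderStatus.COMPLETED));

// mapping(...) composed into toUnmodifiableSet(): distinct customer IDs per status
Map<OrderStatus, Set<String>> customersByStatus = orders.stream()
        .collect(groupingBy(
                Order::status,
                mapping(Order::customerId, toUnmodifiableSet())));
// {PENDING=[c1], COMPLETED=[c2]}
```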

partitioningBy

partitioningBy is a specialised two-bucket groupingBy. It always produces a Map<Boolean, List<T>> keyed on a predicate. Use it when the grouping is binary:

Map<Boolean, List<Order>> partitioned = orders.stream()
        .collect(partitioningBy(order -> order.value().compareTo(threshold) > 0));

List<Order> highValue = partitioned.get(true);
List<Order> lowValue  = partitioned.get(false);

It also accepts a downstream collector:

Map<Boolean, Long> countByHighValue = orders.stream()
        .collect(partitioningBy(
                order -> order.value().compareTo(threshold) > 0,
                counting()
        ));

Prefer partitioningBy over groupingBy with a boolean classifier — it’s more expressive and the true/false keys are self-documenting.
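There is a behavioural difference too: partitioningBy always materialises both buckets, even when one is empty, whereas groupingBy only creates keys that actually occur. A quick sketch:

```java
List<Integer> nums = List.of(1, 2, 3);

// partitioningBy: both keys are always present
Map<Boolean, List<Integer>> partitioned = nums.stream()
        .collect(partitioningBy(n -> n > 10));
// partitioned.get(true) is an empty list, never null

// groupingBy with a boolean classifier: only keys that occurred exist
Map<Boolean, List<Integer>> grouped = nums.stream()
        .collect(groupingBy(n -> n > 10));
// grouped.get(true) is null — no element matched, so no key was created
```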

toMap — and how to avoid the duplicate key trap

toMap(keyMapper, valueMapper) builds a Map. The trap: it throws IllegalStateException on duplicate keys. The third argument is a merge function that resolves conflicts:

// Last writer wins
Map<String, Order> latestByCustomer = orders.stream()
        .collect(toMap(
                Order::customerId,
                Function.identity(),
                (existing, replacement) -> replacement
        ));
// Accumulate into a BigDecimal sum
Map<String, BigDecimal> totalByCustomer = orders.stream()
        .collect(toMap(
                Order::customerId,
                Order::value,
                BigDecimal::add
        ));

The fourth argument is a map factory — useful when you need a LinkedHashMap for insertion-order preservation or a TreeMap for sorted keys:

Map<String, Order> sortedByCustomer = orders.stream()
        .collect(toMap(
                Order::customerId,
                Function.identity(),
                (a, b) -> a,
                TreeMap::new
        ));

teeing — split the stream in two

teeing was added in Java 12. It applies two downstream collectors to the same stream, then combines their results with a merger function. The stream is traversed once.

The classic use: compute min and max in one pass:

record MinMax(Order min, Order max) {}

MinMax result = orders.stream()
        .collect(teeing(
                minBy(Comparator.comparing(Order::value)),
                maxBy(Comparator.comparing(Order::value)),
                (min, max) -> new MinMax(min.orElseThrow(), max.orElseThrow())
        ));

Split into two lists based on different predicates in one pass:

record SplitOrders(List<Order> pending, List<Order> completed) {}

SplitOrders split = orders.stream()
        .collect(teeing(
                filtering(o -> o.status() == PENDING, toList()),
                filtering(o -> o.status() == COMPLETED, toList()),
                SplitOrders::new
        ));

teeing removes the need for two separate stream passes or intermediate collections when you need two independent aggregations from the same data.
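Another one-pass pairing — count and sum together, sketched here with plain doubles rather than the Order type:

```java
record CountAndSum(long count, double sum) {}

List<Double> values = List.of(120.0, 45.5, 80.25);

CountAndSum stats = values.stream()
        .collect(teeing(
                counting(),
                summingDouble(Double::doubleValue),
                CountAndSum::new));
// stats.count() == 3, stats.sum() == 245.75
```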

filtering and mapping as downstream collectors

The filtering and mapping collectors mirror the filter and map intermediate operations, but they run inside the terminal collect — which means they can run per-group inside groupingBy. Where the filter happens matters:

Filtering before grouping (a plain Stream.filter) loses the empty-group keys:

// If no HIGH orders exist for a category, that category won't appear in the map
Map<String, List<Order>> grouped = orders.stream()
        .filter(o -> o.priority() == HIGH)
        .collect(groupingBy(Order::category));

As a downstream — filter after grouping, preserving empty groups as empty lists:

// Every category appears, but only HIGH-priority orders are in the lists
Map<String, List<Order>> grouped = orders.stream()
        .collect(groupingBy(
                Order::category,
                filtering(o -> o.priority() == HIGH, toList())
        ));

Choose based on whether you need to know that a category had no matching orders.

collectingAndThen

collectingAndThen applies a finishing function to the result of another collector. The most common use: wrap a toList() in Collections.unmodifiableList():

List<Order> immutable = orders.stream()
        .collect(collectingAndThen(toList(), Collections::unmodifiableList));

More usefully, transform the collected result into a different type:

// Collect to a map, then wrap in a domain object
OrderIndex index = orders.stream()
        .collect(collectingAndThen(
                toMap(Order::id, Function.identity()),
                OrderIndex::new
        ));

Writing a custom Collector

When no combination of built-in collectors fits, implement Collector<T, A, R> directly. The interface has five methods:

  • supplier() — creates the mutable accumulator
  • accumulator() — folds one element into the accumulator
  • combiner() — merges two accumulators (for parallel streams)
  • finisher() — converts the accumulator to the result type
  • characteristics() — hints to the stream pipeline (UNORDERED, CONCURRENT, IDENTITY_FINISH)
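For one-off cases you can often skip implementing the interface: the static factory Collector.of(...) takes these functions directly. A small sketch collecting into an ArrayDeque:

```java
// The overload without a finisher implies IDENTITY_FINISH
Collector<String, ArrayDeque<String>, ArrayDeque<String>> toDeque = Collector.of(
        ArrayDeque::new,                                       // supplier
        ArrayDeque::addLast,                                   // accumulator
        (left, right) -> { left.addAll(right); return left; }  // combiner
);

Deque<String> queue = Stream.of("a", "b", "c").collect(toDeque);
// queue: [a, b, c]
```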

A collector that groups consecutive elements with the same key (useful for run-length encoding or change detection):

public class ConsecutiveGroupingCollector<T, K>
        implements Collector<T, List<List<T>>, List<List<T>>> {

    private final Function<T, K> classifier;

    public ConsecutiveGroupingCollector(Function<T, K> classifier) {
        this.classifier = classifier;
    }

    @Override
    public Supplier<List<List<T>>> supplier() {
        return ArrayList::new;
    }

    @Override
    public BiConsumer<List<List<T>>, T> accumulator() {
        return (groups, element) -> {
            if (groups.isEmpty()) {
                groups.add(new ArrayList<>(List.of(element)));
            } else {
                var lastGroup = groups.getLast();
                var lastKey = classifier.apply(lastGroup.getLast());
                if (lastKey.equals(classifier.apply(element))) {
                    lastGroup.add(element);
                } else {
                    groups.add(new ArrayList<>(List.of(element)));
                }
            }
        };
    }

@Override
public BinaryOperator<List<List<T>>> combiner() {
    return (left, right) -> {
        // Merge the boundary runs when a parallel split lands mid-run;
        // otherwise one run would come back as two separate groups.
        if (!left.isEmpty() && !right.isEmpty()
                && classifier.apply(left.getLast().getLast())
                        .equals(classifier.apply(right.getFirst().getFirst()))) {
            left.getLast().addAll(right.removeFirst());
        }
        left.addAll(right);
        return left;
    };
}

    @Override
    public Function<List<List<T>>, List<List<T>>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Set.of(Characteristics.IDENTITY_FINISH);
    }
}

Usage:

var priceChanges = ticks.stream()
        .collect(new ConsecutiveGroupingCollector<>(Tick::direction));
// Groups of consecutive ticks sharing a direction, e.g. [[UP, UP, UP], [DOWN], [UP, UP], ...]

Custom collectors are rarely necessary — the composition of built-in collectors covers most cases. But when you find yourself doing post-processing on a collected list that could have been done inside the terminal operation, a custom collector is the right tool.

ProTips

Avoid Collectors.toUnmodifiableList() boilerplate: In Java 16+, Stream.toList() returns an unmodifiable list directly. Use it over collect(toUnmodifiableList()).
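For instance — the returned list rejects mutation:

```java
List<String> names = Stream.of("ann", "bob").toList();
// names.add("carol") would throw UnsupportedOperationException
```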

groupingByConcurrent for parallel streams: If you’re collecting from a parallel stream and order doesn’t matter, groupingByConcurrent avoids the combine phase and can be significantly faster on large datasets.
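A sketch with strings grouped by length — groupingByConcurrent returns a ConcurrentMap that parallel threads accumulate into directly:

```java
ConcurrentMap<Integer, Long> countByLength = Stream.of("a", "bb", "cc", "ddd")
        .parallel()
        .collect(groupingByConcurrent(String::length, counting()));
// {1=1, 2=2, 3=1}
```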

Don’t chain collect calls: If you find yourself doing stream.collect(...).entrySet().stream().collect(...), you likely need a teeing or a nested downstream collector instead.
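A sketch of that refactor — the chained version streams the intermediate map a second time; the nested downstream collector produces the same result in one pass:

```java
// Chained: collect, re-stream the entry set, collect again
Map<Integer, Long> twoPasses = Stream.of("a", "bb", "cc")
        .collect(groupingBy(String::length))
        .entrySet().stream()
        .collect(toMap(Map.Entry::getKey, e -> (long) e.getValue().size()));

// Nested downstream collector: one pass, same result
Map<Integer, Long> onePass = Stream.of("a", "bb", "cc")
        .collect(groupingBy(String::length, counting()));
```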

If you’re working through a codebase that mixes streams with manual loops and want to review the boundaries, get in touch.

Samuel Jackson

Senior Java Back End Developer & Contractor

Senior Java Back End Developer — Betfair Exchange API specialist, Spring Boot, AWS, and event-driven architecture. 20+ years delivering high-performance systems across betting, finance, energy, retail, and government. Available for Java contracting.