Java has Streams. Do we need third-party collections?
Java 8 added Streams. Are competing collections libraries like Eclipse Collections, Trove, Guava, etc. effectively deprecated now?
I saw this question on Stack Overflow, specifically about Eclipse Collections. Here is my answer.
Is there any reason to still use Eclipse Collections? Yes! Streams are a huge step forward, and a welcome improvement to Java. However, Eclipse Collections includes many features not yet in the JDK.
- Eager evaluation
- Efficient Maps and Sets
- Multimaps and Bags
- Immutable Collections
- Primitive Collections
- Hashing Strategies
Eager evaluation
Streams always use lazy evaluation. We start a stream by calling collection.stream(), stack one or more lazy operations, and finish the stream by calling a terminal method like collect().
With Eclipse Collections, you may use lazy evaluation. Instead of stream() you'd call asLazy().
Or you may use the eager API and call the iteration methods directly on the collection.
Lazy evaluation is great when your computation may short circuit, or when the result is reduced down to a primitive, like a boolean or count. Otherwise, there’s usually a performance penalty when using lazy evaluation, and the code winds up a lot longer.
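To make the short-circuit point concrete, here is a minimal JDK-only sketch. The class and method names are made up for illustration. It counts how many elements a lazy pipeline actually visits before anyMatch() stops, and contrasts that with a pipeline that has to materialize a full result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

// Hypothetical class name; uses only the JDK.
class LazyVersusEagerSketch {

    // Counts how many elements the lazy pipeline visits before anyMatch() short-circuits.
    static int elementsVisitedByAnyMatch(List<Integer> numbers) {
        AtomicInteger visited = new AtomicInteger();
        numbers.stream()
                .peek(n -> visited.incrementAndGet()) // lazy: runs only when an element is pulled
                .anyMatch(n -> n > 2);                // terminal operation, stops at the first match
        return visited.get();
    }

    // No short circuit here: every element is visited and a new list is built.
    static List<Integer> doubled(List<Integer> numbers) {
        return numbers.stream()
                .map(n -> n * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(elementsVisitedByAnyMatch(numbers)); // 3
        System.out.println(doubled(numbers));                   // [2, 4, 6, 8, 10]
    }
}
```

In the first method laziness pays off, because two of the five elements are never touched. In the second, the stream plumbing is overhead on top of what a plain loop, or an eager API, would do.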
Efficient Maps and Sets
HashMap is implemented as a table of Entry objects, where each Entry wraps a key-value pair. These entries waste memory, and the extra hop can waste time too. Eclipse Collections includes UnifiedMap, which uses about half the memory on average.
HashSet is implemented by delegating to a HashMap and just ignoring the Map's values, which is even more wasteful. Eclipse Collections includes UnifiedSet, which uses about a quarter of the memory on average.
Trove, fastutil, HPPC, and others have similar replacements for these JDK types.
These issues with HashMap and HashSet have been around since the collections library was added back in Java 1.2. They are unlikely ever to be fixed due to backwards compatibility concerns.
Multimaps and Bags
Multimaps are like Maps where each key maps to multiple values. Bags, aka multisets, are like Sets where each item is mapped to a count, or number of occurrences.
Guava popularized these types in Java. Keep in mind that Guava doesn't implement replacements for built-in types. So Guava's HashMultimap is backed by a HashMap<K, HashSet<V>>, which wastes a lot of memory, as we just learned.
Eclipse Collections includes iteration patterns like groupBy() which return efficient Multimaps. For example, when people is a MutableSet, groupBy() returns a MutableSetMultimap (backed by a UnifiedMap of UnifiedSets).
Java 8 added grouping but no multimaps.
A Map isn't quite as convenient as a Multimap, because we always have to remember to deal with null. With Streams, we also have to remember to use the right collector to match the collection. If we had used the single-argument groupingBy(), we would have gotten back a map of lists.
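Here is a rough sketch of the Streams side of that comparison, using only the JDK. The Person class and its fields are hypothetical, invented for this illustration. It shows the two-argument groupingBy() with a set collector, the single-argument form that yields lists, and the null that comes back for a missing key.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical class name; uses only the JDK.
class GroupingSketch {

    // Hypothetical Person type, used only for illustration.
    static final class Person {
        private final String firstName;
        private final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
    }

    // Two-argument groupingBy(): we pick the downstream collector, so the values are Sets.
    static Map<String, Set<String>> firstNamesByLastName(List<Person> people) {
        return people.stream().collect(
                Collectors.groupingBy(
                        Person::getLastName,
                        Collectors.mapping(Person::getFirstName, Collectors.toSet())));
    }

    // Single-argument groupingBy(): the values are always Lists.
    static Map<String, List<Person>> peopleByLastName(List<Person> people) {
        return people.stream().collect(Collectors.groupingBy(Person::getLastName));
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Alice", "Smith"),
                new Person("Bob", "Smith"),
                new Person("Carol", "Jones"));

        Map<String, Set<String>> byLastName = firstNamesByLastName(people);
        System.out.println(byLastName.get("Smith").contains("Bob")); // true
        System.out.println(byLastName.get("Nobody"));                // null

        // The caller has to remember to supply an empty default for missing keys.
        Set<String> none = byLastName.getOrDefault("Nobody", Collections.emptySet());
        System.out.println(none.isEmpty());                          // true
    }
}
```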
The same points apply to counting and to Bags.
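As a sketch of what counting looks like with plain Streams, the JDK-only example below builds a Map of occurrence counts. The class and method names are made up for illustration. Note that an item that never occurred maps to null rather than to zero, which is the kind of detail a Bag is meant to hide.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class name; uses only the JDK.
class CountingSketch {

    // Maps each distinct word to the number of times it occurs.
    static Map<String, Long> occurrences(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = occurrences(Arrays.asList("a", "b", "a"));
        System.out.println(counts.get("a"));              // 2
        System.out.println(counts.get("z"));              // null, not 0
        System.out.println(counts.getOrDefault("z", 0L)); // 0
    }
}
```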
Immutable Collections and Primitive Collections
Most third-party collections libraries add immutable collections or primitive collections. Eclipse Collections adds both. As of Java 8, the JDK has unmodifiable wrappers, but no immutable collections. Java 8 added primitive streams, but there's no way to finish the stream by collecting into a primitive collection.
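Both JDK limitations can be seen in a few lines. This is a minimal sketch against the Java 8 API, with made-up class and method names. It shows that an unmodifiable wrapper is only a read-only view over a list that can still change underneath it, and that a primitive stream has to be boxed before it can be collected into a collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical class name; uses only the Java 8 JDK.
class WrappersAndPrimitiveStreamsSketch {

    // The wrapper is a view: an element added to the backing list shows through it.
    static int sizeSeenThroughViewAfterAdd() {
        List<String> backing = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> view = Collections.unmodifiableList(backing);
        backing.add("c");
        return view.size();
    }

    // Writes through the wrapper itself are rejected at runtime.
    static boolean viewRejectsWrites() {
        List<String> view = Collections.unmodifiableList(new ArrayList<>(Arrays.asList("a")));
        try {
            view.add("b");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    // To end up with a collection, the ints have to be boxed into Integer objects first.
    static List<Integer> squares(int n) {
        return IntStream.rangeClosed(1, n)
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sizeSeenThroughViewAfterAdd()); // 3
        System.out.println(viewRejectsWrites());           // true
        System.out.println(squares(3));                    // [1, 4, 9]
    }
}
```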
Hashing Strategies
In relational databases, a table may have a primary key and additional unique indices, allowing lookups in different ways. Hash tables always perform lookups using equals() and hashCode(), which is like only allowing a primary key. This is where HashingStrategy comes in.
Eclipse Collections includes iteration patterns that work with HashingStrategies, and data structures that work with HashingStrategies.
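To give a feel for the idea without relying on any particular library API, here is a JDK-only stand-in. It is not how Eclipse Collections implements HashingStrategy. The sketch emulates a strategy derived from a function by hashing on an extracted key instead of on the element's own equals() and hashCode(). The Person class and the distinctBy() helper are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical class name; a JDK-only stand-in, not the Eclipse Collections API.
class HashingStrategySketch {

    // Hypothetical Person type that does not override equals() or hashCode().
    static final class Person {
        private final String firstName;
        private final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
    }

    // Keeps the first element seen for each distinct key, in encounter order.
    // The key function plays the role of a "secondary index" on the elements.
    static <T, K> List<T> distinctBy(List<T> items, Function<? super T, ? extends K> keyFunction) {
        Map<K, T> firstByKey = new LinkedHashMap<>();
        for (T item : items) {
            firstByKey.putIfAbsent(keyFunction.apply(item), item);
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Alice", "Smith"),
                new Person("Bob", "Smith"),
                new Person("Carol", "Jones"));

        // The same list, made distinct under two different notions of equality.
        System.out.println(distinctBy(people, Person::getLastName).size());  // 2
        System.out.println(distinctBy(people, Person::getFirstName).size()); // 3
    }
}
```

The cost of this workaround is an extra map of keys. A collection that accepts a hashing strategy directly avoids that by applying the strategy inside the hash table itself.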
As future versions of Java pull in more features, there will be less of a need for third-party collections libraries. For now, they provide compelling features beyond what’s included in the JDK.