Java has Streams. Do we need third-party collections?
Java 8 added Streams. Are competing collections libraries like Eclipse Collections, Trove, Guava, etc. effectively deprecated now?
I saw this question on Stack Overflow, specifically about Eclipse Collections. Here is my answer.
Is there any reason to still use Eclipse Collections? Yes! Streams are a huge step forward, and a welcome improvement to Java. However, Eclipse Collections includes many features not yet in the JDK:
- Eager evaluation
- Efficient Maps and Sets
- Multimaps and Bags
- Immutable Collections
- Primitive Collections
- Hashing Strategies
Eager Evaluation
Streams always use lazy evaluation. We start a stream by calling collection.stream(), stack one or more lazy operations, and finish the stream by calling a method like collect().
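With Eclipse Collections, you may use lazy evaluation. Instead of stream() you’d call asLazy().
Or you may use the eager API, calling iteration methods such as select() directly on the collection and getting the result back immediately.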
Lazy evaluation is great when your computation may short-circuit, or when the result is reduced to a primitive, like a boolean or a count. Otherwise, there’s usually a performance penalty when using lazy evaluation, and the code winds up a lot longer.
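To make the lazy behavior concrete, here is a small JDK-only sketch. The class and method names are made up for illustration. It records which elements a pipeline actually touches: nothing runs without a terminal operation, and anyMatch() stops as soon as it has an answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LazyEvaluationSketch {

    // Builds a pipeline but never calls a terminal operation,
    // so the filter lambda never runs and nothing is recorded.
    static List<Integer> visitedWithoutTerminalOperation() {
        List<Integer> visited = new ArrayList<>();
        Stream<Integer> pipeline = Arrays.asList(1, 2, 3, 4, 5).stream()
                .filter(n -> {
                    visited.add(n);
                    return n % 2 == 0;
                });
        return visited;
    }

    // anyMatch() short-circuits: this sequential stream stops pulling
    // elements as soon as the first even number is found.
    static List<Integer> visitedByAnyMatch() {
        List<Integer> visited = new ArrayList<>();
        Arrays.asList(1, 2, 3, 4, 5).stream()
                .peek(visited::add)
                .anyMatch(n -> n % 2 == 0);
        return visited;
    }

    // Without short-circuiting, we still need the stream()/collect() ceremony
    // just to get a filtered list back.
    static List<Integer> evens() {
        return Arrays.asList(1, 2, 3, 4, 5).stream()
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(visitedWithoutTerminalOperation()); // []
        System.out.println(visitedByAnyMatch());               // [1, 2]
        System.out.println(evens());                           // [2, 4]
    }
}
```

The last method is the case where an eager API tends to read better: there is nothing to short-circuit, so the laziness buys nothing.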
Efficient Maps and Sets
The JDK’s HashMap is implemented as a table of Entry objects, where each Entry wraps a key-value pair. These entries waste memory, and the extra hop can waste time too.
Eclipse Collections includes UnifiedMap, which uses about half the memory of a HashMap on average.
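To illustrate why dropping the Entry objects saves memory, here is a deliberately simplified sketch of a hash map that keeps keys and values side by side in one flat array. This is not UnifiedMap’s actual implementation. The class name is hypothetical, collisions are handled with linear probing purely for brevity, and null keys and removal are not supported.

```java
// A simplified flat hash map: no per-pair Entry object is allocated.
// Assumptions for brevity: linear probing, no null keys, no remove().
final class FlatMapSketch<K, V> {
    // Key at each even index, its value at the following odd index.
    private Object[] table = new Object[16];
    private int size;

    // Even slot index where probing for this key starts.
    private int startIndex(Object key) {
        int pairs = table.length / 2;
        return (key.hashCode() & 0x7fffffff) % pairs * 2;
    }

    public V put(K key, V value) {
        // Grow before the table is more than half full, so probing always ends.
        if ((size + 1) * 4 > table.length) {
            resize();
        }
        int i = startIndex(key);
        while (table[i] != null) {
            if (table[i].equals(key)) {
                @SuppressWarnings("unchecked")
                V old = (V) table[i + 1];
                table[i + 1] = value;
                return old;
            }
            i = (i + 2) % table.length;
        }
        table[i] = key;
        table[i + 1] = value;
        size++;
        return null;
    }

    @SuppressWarnings("unchecked")
    public V get(Object key) {
        int i = startIndex(key);
        while (table[i] != null) {
            if (table[i].equals(key)) {
                return (V) table[i + 1];
            }
            i = (i + 2) % table.length;
        }
        return null;
    }

    public int size() {
        return size;
    }

    // Doubles the array and re-inserts every existing pair.
    private void resize() {
        Object[] old = table;
        table = new Object[old.length * 2];
        for (int i = 0; i < old.length; i += 2) {
            if (old[i] != null) {
                int j = startIndex(old[i]);
                while (table[j] != null) {
                    j = (j + 2) % table.length;
                }
                table[j] = old[i];
                table[j + 1] = old[i + 1];
            }
        }
    }

    public static void main(String[] args) {
        FlatMapSketch<String, Integer> ages = new FlatMapSketch<>();
        ages.put("Alice", 30);
        ages.put("Bob", 25);
        ages.put("Alice", 31); // replaces the earlier value
        System.out.println(ages.get("Alice")); // 31
        System.out.println(ages.size());       // 2
    }
}
```

Each pair costs two array slots here, instead of two slots' worth of references plus a separate Entry object with its own header and cached hash.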
The JDK’s HashSet is implemented by delegating to a HashMap and just ignoring the Map’s values, which is even more wasteful. Eclipse Collections includes UnifiedSet, which uses about 25% of the memory of a HashSet on average.
Trove, fastutil, HPPC, and others have similar replacements for HashMap and HashSet.
These issues with HashMap and HashSet have been around since the collections library was added back in Java 1.2. They are unlikely ever to be fixed due to backwards compatibility concerns.
Multimaps and Bags
Multimaps are like Maps where each key maps to multiple values. Bags, aka multisets, are like Sets where each item is mapped to a count, or number of occurrences.
Guava popularized these types in Java. Keep in mind that Guava doesn’t implement replacements for built-in types. So Guava’s HashMultimap is backed by a HashMap<K, HashSet<V>>, which wastes a lot of memory, as we just learned.
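For readers who haven’t used these types, here is roughly what a multimap and a bag look like when built by hand on top of the JDK’s own HashMap and HashSet. This is only a sketch with made-up names and sample data, not how Guava or Eclipse Collections implement them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class MultimapAndBagSketch {

    // A hand-rolled "multimap": every key owns its own HashSet of values.
    static Map<String, Set<String>> citiesByCountry() {
        Map<String, Set<String>> multimap = new HashMap<>();
        multimap.computeIfAbsent("France", k -> new HashSet<>()).add("Paris");
        multimap.computeIfAbsent("France", k -> new HashSet<>()).add("Lyon");
        multimap.computeIfAbsent("Japan", k -> new HashSet<>()).add("Tokyo");
        return multimap;
    }

    // A hand-rolled "bag": each item maps to its number of occurrences.
    static Map<String, Integer> wordCounts(String... words) {
        Map<String, Integer> bag = new HashMap<>();
        for (String word : words) {
            bag.merge(word, 1, Integer::sum);
        }
        return bag;
    }

    public static void main(String[] args) {
        System.out.println(citiesByCountry().get("France").size()); // 2
        System.out.println(wordCounts("a", "b", "a").get("a"));     // 2
    }
}
```

Every put goes through computeIfAbsent() or merge(), and every key drags along a full HashSet (itself wrapping a HashMap), which is the overhead described above.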
Eclipse Collections includes iteration patterns like groupBy() which return efficient Multimaps. For example, if people is a MutableSet, then calling groupBy() on it returns a MutableSetMultimap (backed by a UnifiedMap of UnifiedSets).
Java 8 added grouping but no multimaps.
A Map isn’t quite as convenient as a Multimap, because we always have to remember to deal with null for missing keys. With Streams, we also have to remember to use the right collector to match the collection. If we had used the single-argument groupingBy(), we would have gotten a map of lists.
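Here is a sketch of the Streams version. The Person class and the sample data are hypothetical stand-ins. It shows the two-argument groupingBy() with toSet(), the single-argument form that silently yields lists, and the null check a plain Map forces on us.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class GroupingSketch {

    // Hypothetical domain class, for illustration only.
    static final class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        String getLastName() {
            return lastName;
        }
    }

    static Set<Person> samplePeople() {
        return new HashSet<>(Arrays.asList(
                new Person("Alice", "Smith"),
                new Person("Bob", "Smith"),
                new Person("Carol", "Jones")));
    }

    // We must ask for toSet() explicitly to keep set semantics.
    static Map<String, Set<Person>> byLastName(Set<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(Person::getLastName, Collectors.toSet()));
    }

    // The single-argument form compiles just as happily but groups into Lists.
    static Map<String, List<Person>> byLastNameAsLists(Set<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(Person::getLastName));
    }

    // A missing key yields null from get(), so callers substitute an empty set.
    static Set<Person> lookup(Map<String, Set<Person>> groups, String lastName) {
        return groups.getOrDefault(lastName, Collections.emptySet());
    }

    public static void main(String[] args) {
        Map<String, Set<Person>> groups = byLastName(samplePeople());
        System.out.println(groups.get("Smith").size());       // 2
        System.out.println(groups.get("Nobody"));             // null
        System.out.println(lookup(groups, "Nobody").size());  // 0
    }
}
```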
The same points apply to counting and to Bags.
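For counting with Streams, a sketch along these lines (class and method names are illustrative) shows the same two wrinkles: we have to pick the right downstream collector, and an item that never occurred comes back as null rather than zero.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class CountingSketch {

    // Groups equal items together and counts each group.
    static Map<String, Long> countByValue(List<String> items) {
        return items.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // A plain Map has no entry for unseen items, so we default to zero ourselves.
    static long occurrencesOf(Map<String, Long> counts, String item) {
        return counts.getOrDefault(item, 0L);
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countByValue(Arrays.asList("apple", "pear", "apple"));
        System.out.println(counts.get("apple"));           // 2
        System.out.println(counts.get("kiwi"));            // null
        System.out.println(occurrencesOf(counts, "kiwi")); // 0
    }
}
```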
Immutable Collections and Primitive Collections
Most third-party collections libraries add immutable collections or primitive collections. Eclipse Collections adds both. The JDK has unmodifiable wrappers, but no immutable collections. Java 8 added primitive streams, but there’s no way to finish the stream by collecting into a primitive collection.
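Both gaps are easy to see with a few lines of plain JDK code. In this sketch (names are illustrative), the unmodifiable wrapper turns out to be a read-only view over a list that can still change underneath it, and an IntStream can end in an int[] or a boxed List<Integer>, but not in a primitive collection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class WrapperAndPrimitiveSketch {

    // The wrapper is only a view: changes made through the backing
    // list remain visible through it.
    static int sizeSeenThroughWrapperAfterMutation() {
        List<String> backing = new ArrayList<>();
        backing.add("a");
        List<String> readOnlyView = Collections.unmodifiableList(backing);
        backing.add("b");
        return readOnlyView.size();
    }

    // Writes through the wrapper itself are rejected at runtime.
    static boolean wrapperRejectsWrites() {
        List<String> view = Collections.unmodifiableList(new ArrayList<String>());
        try {
            view.add("x");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    // A primitive stream can finish in a primitive array...
    static int[] squaresAsArray() {
        return IntStream.rangeClosed(1, 4).map(n -> n * n).toArray();
    }

    // ...but collecting into a List means boxing every int into an Integer.
    static List<Integer> squaresAsBoxedList() {
        return IntStream.rangeClosed(1, 4)
                .map(n -> n * n)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sizeSeenThroughWrapperAfterMutation()); // 2
        System.out.println(wrapperRejectsWrites());                // true
        System.out.println(squaresAsBoxedList());                  // [1, 4, 9, 16]
    }
}
```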
Hashing Strategies
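In relational databases, a table may have a primary key and additional unique indices, allowing lookups in different ways. Hash tables always perform lookups using equals() and hashCode(), which is like only allowing a primary key. This is where HashingStrategy comes in. A HashingStrategy supplies equality and hashing from outside the element’s class, so the same objects can be hashed in more than one way.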
Eclipse Collections includes both iteration patterns and data structures that work with HashingStrategies.
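To show the idea without depending on the library, here is a JDK-only sketch. The HashingStrategy interface below is a local stand-in declared inside the example, and StrategyKey is a hypothetical wrapper. Wrapping each element is a workaround chosen for this sketch, not how strategy-aware collections work internally.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class HashingStrategySketch {

    // Local stand-in: equality and hashing supplied from outside the element type.
    interface HashingStrategy<T> {
        int computeHashCode(T object);

        boolean equals(T object1, T object2);
    }

    // Hypothetical wrapper that routes equals()/hashCode() through a strategy,
    // so an ordinary HashSet can be used.
    static final class StrategyKey<T> {
        private final T value;
        private final HashingStrategy<T> strategy;

        StrategyKey(T value, HashingStrategy<T> strategy) {
            this.value = value;
            this.strategy = strategy;
        }

        @Override
        public int hashCode() {
            return strategy.computeHashCode(value);
        }

        @Override
        @SuppressWarnings("unchecked")
        public boolean equals(Object other) {
            return other instanceof StrategyKey
                    && strategy.equals(value, ((StrategyKey<T>) other).value);
        }
    }

    // Treats strings that differ only by case as the same element.
    static final HashingStrategy<String> IGNORE_CASE = new HashingStrategy<String>() {
        @Override
        public int computeHashCode(String s) {
            return s.toLowerCase(Locale.ROOT).hashCode();
        }

        @Override
        public boolean equals(String a, String b) {
            return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
        }
    };

    // Counts distinct values according to the given strategy.
    static int distinctCount(HashingStrategy<String> strategy, String... values) {
        Set<StrategyKey<String>> seen = new HashSet<>();
        for (String value : values) {
            seen.add(new StrategyKey<>(value, strategy));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount(IGNORE_CASE, "Java", "JAVA", "java", "Kotlin")); // 2
    }
}
```

The wrapper costs one extra object per element. A data structure that accepts the strategy directly avoids that allocation, which is the appeal of having strategy-aware collections built in.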
Conclusion
As future versions of Java pull in more features, there will be less of a need for third-party collections libraries. For now, they provide compelling features beyond what’s included in the JDK.
This article was originally posted at https://motlin.com/2018-05-15-third-party-collections/