HashMap changes in Java 8
The way java.util.HashMap
entries are indexed and stored has changed in the Java 8 update. Hash elements use balanced trees instead of linked lists under certain circumstances now. All these is what this article is about.
The main idea is that when the number of items in a hash is larger than a certain value, the hash will change from using a linked list of elements or entries to a balanced tree, this will improve the worst case performance from O(n) to O(log n).
The fix has been implemented in the classes java.util.HashMap
, java.util.LinkedHashMap
and java.util.concurrent.ConcurrentHashMap
. No interfaces or methods specifications have been changed, only the behavior in the implementation of the concurrent hash map is different. So there is no need to change the applications using these classes. However, the iteration order when accessing hash maps entries may be different, this is explained in this article and should be reviewed in your programs.
Here is a list of classes implementing hash maps have not changed in relation to this fix:
java.util.concurrent.ConcurrentHashMap
already contains this implementation. Portions of the code already used in this class have been reused in the changes explained above.- The
java.util.HashTable
class (present since java 1) has not been changed with this new technique. The main reason for that is that some legacy code uses and expects the historical iteration order of thejava.util.Hashtable
class. - The class
java.util.WeakHashMap
does not contain this change neither because of the complexity would be to high and is not worth it. - The class
java.util.IdentityHashMap
does not need this improvement. This class generates hash codes by using theSystem.identityHashCode()
method and collisions are very rare or non existent.
1. Consequences
This change has some risks and consequences that have to be taken into consideration. We are going to explain here the fact that the iteration order when accessing hash map entries may be different when using java 8 because of the implementation explained above.
Several applications rely in the fact that hash map entries are retrieved in the same order that they were inserted in the map. This was never assured by the java.util.HashMap
but some programmers ignored it and built their programs assuming that the iteration order will be historical. Using java 7 entries will be retrieved in the same way that they were inserted (more or less). The following program shows the differences when using linked hash maps and normal hash maps in the iteration order:
public static void main( String[] args ) { /** * Using HashMap */ System.out.println( "Using plain hash map with balanced trees:" ); HashMap stringMap = new HashMap(); for( int i = 0; i < 100; ++i ) { stringMap.put( "index_" + i, String.valueOf( i ) ); } stringMap.values().forEach( System.out::println ); /** * Using LinkedHashMap */ System.out.println( "Using LinkedHashMap:" ); LinkedHashMap linkedHashMap = new LinkedHashMap(); for( int i = 0; i < 100; ++i ) { linkedHashMap.put( "index_" + i, String.valueOf( i ) ); } linkedHashMap.values().forEach( System.out::println ); }
The output would be:
Using plain hash map with balanced trees: 99 98 95 94 97 96 91 90 18 93 19 92 ... Using LinkedHashMap: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ... 98 99
We can appreciate that the order in the hash map implementation is not predictable. In case the order of iteration is dependent on the historical insertion order of the hash map, the class java.util.LinkedHashMap
should be used, since this class guarantees the iteration order.
If we would compile the program above using java 8 and java 7 compilers we can distinguish the differences in the order of iteration using the HashMap
between them, so programs that rely on that order will probably not work after updating to java 8. However this is an error in the assumption that the iteration order through hash maps should be somehow predictable.
2. Used concepts
It is useful to explain some concepts used in this article:
2.1. O(n) performance
The big-O notation is a measure of complexity for a given algorithm. “n” is the amount of data used in the algorithm. It indicates the amount of time the algorithm will take when n tends to infinitive. O(2n) or O(constant * n) do not exist, O(1) means constant time (performance is not related to the data that is processed) and O(n) means that the performance is directly related or proportional to the amount of data that is processed.
2.2. O(log n) performance
In this case, it means that the algorithm will perform better when the amount of data is larger. Performance is not directly proportional to the large of the processed data but in a log n relation. O(log n) performs better than O(n).
You can find several good articles, discussions and books about algorithm performance and measures, here are a couple of links:
- http://stackoverflow.com/questions/4776219/algorithm-performance-explanation-ex-on
- http://www.perlmonks.org/?node_id=227909
2.3. Balanced trees
A tree is balanced if the left and the right sub trees are balanced (recursion!) and their height differ by at most one. The main goal is to keep the depths of all nodes to be O(log n). The maintenance of the balanced tree has a penalty in the insertion of new elements but improve the indexing and accessing performance.
This article contains a lot of information about balanced trees: http://webdocs.cs.ualberta.ca/~holte/T26/balanced-trees.html.
2.4 Linked lists
From Wikipedia: In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a data and a reference (in other words, a link) to the next node in the sequence; more complex variants add additional links. This structure allows for efficient insertion or removal of elements from any position in the sequence.
Its worst case performance for accessing and indexing is O(n).
3. Summary
In this small article we explained one of the enhancements in the java.util.HashMap
class. The performance has been improved by using balanced trees instead of linked lists under specific circumstances. It has only been implemented in the classes java.util.HashMap
, java.util.LinkedHashMap
and java.util.concurrent.ConcurrentHashMap
.
We explained the basic concepts used in this implementation like balanced trees and linked lists and we saw one of the main consequences in the usage of hash maps: the iteration order may be affected.
4. Links
More information about this Java 8 enhancement, its causes and consequences and details related to Maps improvements and changes in Java8:
- http://openjdk.java.net/jeps/180
- http://hg.openjdk.java.net/jdk8/jdk8/jdk/rev/43bd5ee0205e
- https://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html
Download the Source Code
So in this example we show some improvements about HashMap implementation in Java 8.
You can download the full source code of this example here: hashMapJava8.zip