HashMap

HashMap changes in Java 8

The way java.util.HashMap entries are indexed and stored has changed in the Java 8 update. Hash elements use balanced trees instead of linked lists under certain circumstances now. All these is what this article is about.

The main idea is that when the number of items in a hash is larger than a certain value, the hash will change from using a linked list of elements or entries to a balanced tree, this will improve the worst case performance from O(n) to O(log n).
 
 
 
 

 
The fix has been implemented in the classes java.util.HashMap, java.util.LinkedHashMap and java.util.concurrent.ConcurrentHashMap. No interfaces or methods specifications have been changed, only the behavior in the implementation of the concurrent hash map is different. So there is no need to change the applications using these classes. However, the iteration order when accessing hash maps entries may be different, this is explained in this article and should be reviewed in your programs.

Here is a list of classes implementing hash maps have not changed in relation to this fix:

  • java.util.concurrent.ConcurrentHashMap already contains this implementation. Portions of the code already used in this class have been reused in the changes explained above.
  • The java.util.HashTable class (present since java 1) has not been changed with this new technique. The main reason for that is that some legacy code uses and expects the historical iteration order of the java.util.Hashtable class.
  • The class java.util.WeakHashMap does not contain this change neither because of the complexity would be to high and is not worth it.
  • The class java.util.IdentityHashMap does not need this improvement. This class generates hash codes by using the System.identityHashCode() method and collisions are very rare or non existent.

1. Consequences

This change has some risks and consequences that have to be taken into consideration. We are going to explain here the fact that the iteration order when accessing hash map entries may be different when using java 8 because of the implementation explained above.

Several applications rely in the fact that hash map entries are retrieved in the same order that they were inserted in the map. This was never assured by the java.util.HashMap but some programmers ignored it and built their programs assuming that the iteration order will be historical. Using java 7 entries will be retrieved in the same way that they were inserted (more or less). The following program shows the differences when using linked hash maps and normal hash maps in the iteration order:

 public static void main( String[] args )
 {
     /**
     * Using HashMap
     */
     System.out.println( "Using plain hash map with balanced trees:" );
     
     HashMap stringMap = new HashMap();
     
     for( int i = 0; i < 100; ++i )
     {
          stringMap.put( "index_" + i, String.valueOf( i ) );
     }
     
     stringMap.values().forEach( System.out::println );
     
     /**
     * Using LinkedHashMap
     */
     System.out.println( "Using LinkedHashMap:" );
     
     LinkedHashMap linkedHashMap = new LinkedHashMap();
     
     for( int i = 0; i < 100; ++i )
     {
          linkedHashMap.put( "index_" + i, String.valueOf( i ) );
     }
     
     linkedHashMap.values().forEach( System.out::println );
 }

The output would be:

Using plain hash map with balanced trees:
99
98
95
94
97
96
91
90
18
93
19
92
...
Using LinkedHashMap:
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
...
98
99

We can appreciate that the order in the hash map implementation is not predictable. In case the order of iteration is dependent on the historical insertion order of the hash map, the class java.util.LinkedHashMap should be used, since this class guarantees the iteration order.

If we would compile the program above using java 8 and java 7 compilers we can distinguish the differences in the order of iteration using the HashMap between them, so programs that rely on that order will probably not work after updating to java 8. However this is an error in the assumption that the iteration order through hash maps should be somehow predictable.

2. Used concepts

It is useful to explain some concepts used in this article:

2.1. O(n) performance

The big-O notation is a measure of complexity for a given algorithm. “n” is the amount of data used in the algorithm. It indicates the amount of time the algorithm will take when n tends to infinitive. O(2n) or O(constant * n) do not exist, O(1) means constant time (performance is not related to the data that is processed) and O(n) means that the performance is directly related or proportional to the amount of data that is processed.

2.2. O(log n) performance

In this case, it means that the algorithm will perform better when the amount of data is larger. Performance is not directly proportional to the large of the processed data but in a log n relation. O(log n) performs better than O(n).

You can find several good articles, discussions and books about algorithm performance and measures, here are a couple of links:

2.3. Balanced trees

A tree is balanced if the left and the right sub trees are balanced (recursion!) and their height differ by at most one. The main goal is to keep the depths of all nodes to be O(log n). The maintenance of the balanced tree has a penalty in the insertion of new elements but improve the indexing and accessing performance.

This article contains a lot of information about balanced trees: http://webdocs.cs.ualberta.ca/~holte/T26/balanced-trees.html.

2.4 Linked lists

From Wikipedia: In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a data and a reference (in other words, a link) to the next node in the sequence; more complex variants add additional links. This structure allows for efficient insertion or removal of elements from any position in the sequence.

Its worst case performance for accessing and indexing is O(n).

3. Summary

In this small article we explained one of the enhancements in the java.util.HashMap class. The performance has been improved by using balanced trees instead of linked lists under specific circumstances. It has only been implemented in the classes java.util.HashMap, java.util.LinkedHashMap and java.util.concurrent.ConcurrentHashMap.
We explained the basic concepts used in this implementation like balanced trees and linked lists and we saw one of the main consequences in the usage of hash maps: the iteration order may be affected.

4. Links

More information about this Java 8 enhancement, its causes and consequences and details related to Maps improvements and changes in Java8:

Download the Source Code

So in this example we show some improvements about HashMap implementation in Java 8.

Download
You can download the full source code of this example here: hashMapJava8.zip

Dani Buiza

Daniel Gutierrez Diez holds a Master in Computer Science Engineering from the University of Oviedo (Spain) and a Post Grade as Specialist in Foreign Trade from the UNED (Spain). Daniel has been working for different clients and companies in several Java projects as programmer, designer, trainer, consultant and technical lead.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Inline Feedbacks
View all comments
Back to top button