Java HashSet becomes a little faster

I was experimenting with Java HashSet, which is a pretty expensive collection to create, but it has a benefit of the O(1) performance while finding the elements from this collection. Based on my experiments performance of HashSet is improved over the last year.

I’ve written a small benchmark comparing the performance of the one year old JRE 1.8.0_05 with the latest 1.8.0_60. In my tests I’m creating a HashSet containing 100000 objects. The object looks like this:

import java.math.BigDecimal;

public class MyObject {

	public String s1 = "aaaaaaaaaaaaa";
	public Double d1 = 222222222222.22;
	public BigDecimal b1 = new BigDecimal(1.54);
	public int i1;
	
	public String s2 = "aaaaaaaaaaaaa&amp";
	public Double d2 = 222222222222.22;
	public BigDecimal b2 = new BigDecimal(1.54);
	public int i2;
	
	public String s3 = "aaaaaaaaaaaaa";
	public Double d3 = 222222222222.22;
	public BigDecimal b3 = new BigDecimal(1.54);
	public int i3;	
}

My test program runs two test loops. First, I preallocate the memory for the HashSet capable of storing 133000 object with the load factor 0.75. In the second loop I create the HashSet with the default initial size 16 and the same load factor. The load factor 0.75 causes the HashSet to grow if the number of elements becomes greater than 75% of the initial capacity. I was expecting the first loop to perform a little better because there is no need to do additional memory allocations. Here’s the test program:

import java.time.Duration;
import java.time.LocalTime;
import java.util.HashSet;

public class Main {

	static int iterations = 100000;
	static float loadFactor=0.75f;
	static int initialSize = (int)Math.round(iterations/loadFactor);
	
	public static void main(String[] args) {
       
	   int nTests = 10;
	   long totalTime = 0; 
	   
	   // HashSet with large initial size
	   for (int i=0; i&amp;lt;nTests; i++){
		totalTime += populateHashSet(initialSize, loadFactor);					
	   }
	   
	   System.out.println("With JRE &amp;quot; + System.getProperty(&amp;quot;java.version") + " the average time (in milis) to populate the HashSet of initial size " + initialSize + " is "+ totalTime/nTests);
	  
	   
	   // HashSet with default initial size
	   initialSize = 16;  // default for HandSet
	   totalTime = 0; 
	   
	   for (int i=0; i < nTests; i++){
		totalTime += populateHashSet(initialSize, loadFactor);					
	   }
	   
	   System.out.println("With JRE &amp;quot; + System.getProperty("java.version") + " the average time (in milis) to populate the HashSet of initial size " + initialSize + " is "+ totalTime/nTests);
	}
	

	static long populateHashSet(int size, float loadFactor){
     
		System.gc();
		
		HashSet&amp<MyObject> hashSet = new HashSet<>(initialSize, loadFactor);
		
			LocalTime before = LocalTime.now(); 
			
	
			for (int i =0; i < iterations; i++){
				MyObject obj = new MyObject();
				obj.i1 = i*2; 
				hashSet.add(obj);
			}
		
		LocalTime after = LocalTime.now();
		
       return Duration.between(before, after).toMillis();		
				
	}
}

Running this program in two different JREs shows the following results:

With JRE 1.8.0_05 the average time (in milis) to populate the HashSet of initial size 133333 is 167
With JRE 1.8.0_05 the average time (in milis) to populate the HashSet of initial size 16 is 152

With JRE 1.8.0_60 the average time (in milis) to populate the HashSet of initial size 133333 is 153
With JRE 1.8.0_60 the average time (in milis) to populate the HashSet of initial size 16 is 141

My test shows that the HashSet in JRE 1.8.0_60 gets populated about 10% faster than with JRE 1.8.0_05. If your application works with large HashSets you should upgrade your JRE.

What I can’t explain is why the numbers with default initial size are better than with the HashSet with pre-allocated memory.

2 thoughts on “Java HashSet becomes a little faster”

Dmitry says:

August 18, 2015 at 12:59 pm

Nice post!
Looks like the benchmark is a little bit unfair. It’s a microbenchmark. It doesn’t take into account JIT compilation. Also, It seems more reliable to do a set of experiments and use standard deviation, mean value, or something. Take a look at http://openjdk.java.net/projects/code-tools/jmh/ it’s a standard tool for benchmarking which works correctly with GC, JIT, … impacts.

Yakov Fain says:

August 18, 2015 at 1:19 pm

This is microbenchmark, and the time measurement numbers may be different in the real-world applications, but this test shows that the numbers are smaller in newer JREs.