Friday, March 15, 2013

When 2 is less than 1

Suppose you have a data structure with a boolean field, but the field is lazily initialized, so you need to know whether it has been initialized. Do you immediately think of using a Boolean object instead of a boolean primitive so that you can use the null reference to represent not initialized? What about using a second boolean instead to represent initialization state?

Which way is better? Let’s analyze this question from memory usage standpoint. A boolean primitive takes up one byte. The object header for an empty object on a 64-bit JVM is twelve bytes. Of course, a Boolean object is not empty, it has a boolean primitive inside of it, so that’s thirteen bytes. So we are looking at thirteen bytes for the one field solution vs. two bytes for the two field solution.

Does this matter in all situations? Of course, not. But if there will be many instances of your data structure, the memory savings add up very quickly.

6 comments:

Ian bull said...

This is a good point, and something that many people don't think about. However, it should be noted that a 'byte' doesn't really just take up one 'byte'.

I created 2 x 100,000 Objects. The first set had a single int field (4 bytes) and the second had a single byte field. I then looked at the memory usage of each set, and they were about the same. This is likely because of how objects are word aligned in memory. If you have multiple byte fields however, then they do take up less room than a bunch of int fields.

However, certainly using a byte is less memory than using a whole other data structure, which was your point.

Ian bull said...

This is a good point, and something that many people don't think about. However, it should be noted that a 'byte' doesn't really just take up one 'byte'.

I created 2 x 100,000 Objects. The first set had a single int field (4 bytes) and the second had a single byte field. I then looked at the memory usage of each set, and they were about the same. This is likely because of how objects are word aligned in memory. If you have multiple byte fields however, then they do take up less room than a bunch of int fields.

However, certainly using a byte is less memory than using a whole other data structure, which was your point.

Eric Rizzo said...

Really? Even with 1 million instances, the Boolean storage only amounts to just over 10MB of memory usage difference - pretty negligible on any machine I've used in recent memory.
The readability sacrifice that using two booleans brings is more than the memory footprint, IMO. Micro-managing performance/memory like this should only be considered after performance/memory issues are identified and measured in a production environment - I think you'd be better to make that clear with this post.

Konstantin Komissarchik said...

Ian,

You are correct. Due to memory alignment considerations, the sizes that I listed need to be padded if there are no other fields in the object.

Actually, I just saw that I forgot to add the size of the reference, which is up to eight bytes on a 64-bit JVM.

So, worst case:

Boolean : 12 + 4 + 8 = 24 bytes
Two booleans : 4 bytes

Eric,

1 million instances is not that many in a lot of situations and 10MB multiplied by the number of places the pattern comes up adds up to a significant amount of memory in a hurry.

While I generally agree with "don't optimize until you know you have a performance problem and then measure" thinking, too often this is used to justify sloppy programming techniques and never thinking about performance of code that's written. By the time you know you you've got a problem, it's a very expensive mess to analyze and detangle.

SeB said...

You should have a look at the Optional pattern : http://www.coconut-palm-software.com/the_new_visual_editor/doku.php?id=blog:improvements_on_null_safety_for_java_and_eclipse

Konstantin Komissarchik said...

SeB,

I suspect that you didn't understand the point of my post. The blog you have referenced discusses techniques for making null return easier to deal with in some ways. It does so at the expense of performance and memory utilization, which is the focus of my blog post.

For instance, consider class Some (concrete implementation of Possible as presented in the blog). It holds an object reference and an IStatus reference.

Even ignoring IStatus, the extra wrapper around Boolean would bring memory cost to 44 bytes (24 for Boolean and another 20 for the wrapper object with a reference to the Boolean).