The main reason is that, when you ask Java for a new lump of memory, it goes straight to the free end of the heap and gives you a block. In this way, memory allocation is as fast as allocating on the stack (which is how you allocate most of the time in C/C++ anyway).
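To make that concrete, here's a minimal C++ sketch of what bump-pointer allocation looks like - the class name and layout are made up for illustration, not the JVM's actual implementation:

```cpp
#include <cstddef>

// Toy bump-pointer allocator: hand out memory from the end of a
// pre-reserved region. Illustrates the idea only.
class BumpAllocator {
public:
    BumpAllocator(std::byte* start, std::size_t size)
        : next_(start), end_(start + size) {}

    // Allocation is just a pointer bump plus a bounds check.
    void* allocate(std::size_t bytes) {
        bytes = (bytes + 7) & ~std::size_t{7};   // round up to 8-byte alignment
        if (next_ + bytes > end_) return nullptr; // out of space - a real GC would collect here
        void* p = next_;
        next_ += bytes;
        return p;
    }

private:
    std::byte* next_;  // current "free end" of the heap
    std::byte* end_;
};
```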
So allocations are as fast as anything, but that doesn't count the cost of freeing the memory. Just because you don't free anything until much later doesn't mean it doesn't cost quite a lot, and in the case of a GC system the cost is quite a lot more than with 'normal' heap allocations: not only does the GC have to run through all the objects to see whether they're alive, it then has to free the dead ones, and (the big cost) copy memory around to compact the heap, so the fast allocate-at-the-end mechanism keeps working and you don't run out of memory. C/C++, by contrast, will walk the heap on every allocation looking for the next block of free space that can fit the object.
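For contrast, here's a toy first-fit free-list sketch of what that per-allocation heap walk means. It's not any particular libc's design, just the shape of the cost:

```cpp
#include <cstddef>

// Toy first-fit free list: each allocation walks the list of free blocks
// until it finds one big enough. Layout and names are illustrative only.
struct FreeBlock {
    std::size_t size;
    FreeBlock* next;
};

void* first_fit_allocate(FreeBlock*& free_list, std::size_t bytes) {
    FreeBlock* prev = nullptr;
    for (FreeBlock* b = free_list; b != nullptr; prev = b, b = b->next) {
        if (b->size >= bytes) {                        // found a block that fits
            (prev ? prev->next : free_list) = b->next; // unlink it from the list
            return b;                                  // a real allocator would split off the remainder
        }
    }
    return nullptr; // nothing fits - a real allocator would grow the heap
}
```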
This is one reason why Java/.NET benchmarks show such good performance, yet real-world applications show such bad performance. I only have to look at the apps on my phone - the really fast, responsive ones are all written using the NDK, so much so that even I was surprised.
Collections nowadays can be fast if all the objects are allocated locally, e.g. in a single contiguous block. In Java, though, you simply don't get contiguous blocks of objects, as they're allocated one at a time from the free end of the heap. They can end up happily contiguous, but only by luck (i.e. down to the whim of the GC's compaction routines and how they copy objects around). C/C++ on the other hand explicitly supports contiguous allocations (via the stack, obviously, but also any array of objects stored by value). Generally, heap objects in C/C++ are no different from Java's, BTW.
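A rough sketch of why that contiguity matters when walking a collection - the types here are made up for illustration, and the pointer-based version stands in for the Java-style layout:

```cpp
#include <vector>

struct Point { double x, y; };

// Objects stored by value sit in one contiguous block, so iteration is
// sequential, cache-friendly memory access.
double sum_x_contiguous(const std::vector<Point>& pts) {
    double total = 0.0;
    for (const Point& p : pts) total += p.x;   // sequential reads
    return total;
}

// Objects reached through pointers may be scattered across the heap,
// which is roughly the layout a Java collection of objects gives you.
double sum_x_scattered(const std::vector<Point*>& pts) {
    double total = 0.0;
    for (const Point* p : pts) total += p->x;  // one pointer chase per element
    return total;
}
```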
Now, with C/C++ you can do better than the default allocators, which were designed to save memory and use it efficiently. You can replace the allocator with a set of fixed-block pools, so you can always find a block that is exactly the right size for the object you're allocating. Walking the heap becomes a matter of a bitmap lookup to see where a free block is, and de-allocation is simply clearing a bit in that bitmap. The cost is that you use more memory because you allocate in fixed-size blocks: you have one pool of 4 byte blocks, another of 16 byte blocks, and so on.
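Here's a toy version of that fixed-block pool idea in C++. The block size, block count and the linear bitmap scan are all illustrative - a real pool would keep a free hint or use find-first-set - but it shows how freeing collapses to clearing one bit:

```cpp
#include <cstddef>
#include <cstdint>

// Toy fixed-block pool: every block is the same size, and a bitmap records
// which blocks are in use. Allocation scans the bitmap, freeing clears a bit.
template <std::size_t BlockSize, std::size_t BlockCount>
class FixedPool {
public:
    void* allocate() {
        for (std::size_t i = 0; i < BlockCount; ++i) {
            std::size_t word = i / 64, bit = i % 64;
            if (!(used_[word] & (1ULL << bit))) {      // found a free block
                used_[word] |= (1ULL << bit);
                return storage_ + i * BlockSize;
            }
        }
        return nullptr; // pool exhausted
    }

    void deallocate(void* p) {
        std::size_t i = (static_cast<std::byte*>(p) - storage_) / BlockSize;
        used_[i / 64] &= ~(1ULL << (i % 64));           // freeing is one bit clear
    }

private:
    alignas(std::max_align_t) std::byte storage_[BlockSize * BlockCount]{};
    std::uint64_t used_[(BlockCount + 63) / 64]{};      // one bit per block
};

// Usage: one pool per size class, e.g. FixedPool<4, 1024> and FixedPool<16, 1024>.
```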