The main reason is that, when you ask Java for a new lump of memory, it goes straight to the free end of the heap and gives you a block. In this way, memory allocation is as fast as allocating on the stack (which is how you allocate most of the time in C/C++ anyway).
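To make that concrete, here's a minimal C++ sketch of what bump-pointer allocation looks like - the class name and layout are made up for illustration, not the JVM's actual implementation:

```cpp
#include <cstddef>

// Toy bump-pointer allocator: hand out memory from the end of a
// pre-reserved region. Illustrates the idea only.
class BumpAllocator {
public:
    BumpAllocator(std::byte* start, std::size_t size)
        : next_(start), end_(start + size) {}

    // Allocation is just a pointer bump plus a bounds check.
    void* allocate(std::size_t bytes) {
        bytes = (bytes + 7) & ~std::size_t{7};   // round up to 8-byte alignment
        if (next_ + bytes > end_) return nullptr; // out of space - a real GC would collect here
        void* p = next_;
        next_ += bytes;
        return p;
    }

private:
    std::byte* next_;  // current "free end" of the heap
    std::byte* end_;
};
```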
So allocations are as fast as anything, but that doesn't count the cost of freeing the memory. Just because you don't free anything until much later doesn't mean it doesn't cost quite a lot, and in the case of a GC system the cost is quite a lot more than with 'normal' heap allocations: not only does the GC have to run through all the objects to see whether they're alive, it then has to free the dead ones, and (the big cost) copy memory around to compact the heap, so the fast allocate-at-the-end mechanism keeps working and you don't run out of memory. C/C++, by contrast, will walk the heap on every allocation looking for the next block of free space that can fit the object.
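For contrast, here's a toy first-fit free-list sketch of what that per-allocation heap walk means. It's not any particular libc's design, just the shape of the cost:

```cpp
#include <cstddef>

// Toy first-fit free list: each allocation walks the list of free blocks
// until it finds one big enough. Layout and names are illustrative only.
struct FreeBlock {
    std::size_t size;
    FreeBlock* next;
};

void* first_fit_allocate(FreeBlock*& free_list, std::size_t bytes) {
    FreeBlock* prev = nullptr;
    for (FreeBlock* b = free_list; b != nullptr; prev = b, b = b->next) {
        if (b->size >= bytes) {                        // found a block that fits
            (prev ? prev->next : free_list) = b->next; // unlink it from the list
            return b;                                  // a real allocator would split off the remainder
        }
    }
    return nullptr; // nothing fits - a real allocator would grow the heap
}
```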
This is one reason why Java/.NET benchmarks show such good performance, yet real-world applications show such bad performance. I only have to look at the apps on my phone - the really fast, responsive ones are all written using the NDK, so much so that even I was surprised.
Collections nowadays can be fast if all the objects are allocated locally, e.g. in a single contiguous block. In Java, though, you simply don't get contiguous blocks of objects, as they're allocated one at a time from the free end of the heap. They can end up happily contiguous, but only by luck (i.e. down to the whim of the GC's compaction routines and how they copy objects around). C/C++ on the other hand explicitly supports contiguous allocations (via the stack, obviously, but also any array of objects stored by value). Generally, heap objects in C/C++ are no different from Java's, BTW.
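A rough sketch of why that contiguity matters when walking a collection - the types here are made up for illustration, and the pointer-based version stands in for the Java-style layout:

```cpp
#include <vector>

struct Point { double x, y; };

// Objects stored by value sit in one contiguous block, so iteration is
// sequential, cache-friendly memory access.
double sum_x_contiguous(const std::vector<Point>& pts) {
    double total = 0.0;
    for (const Point& p : pts) total += p.x;   // sequential reads
    return total;
}

// Objects reached through pointers may be scattered across the heap,
// which is roughly the layout a Java collection of objects gives you.
double sum_x_scattered(const std::vector<Point*>& pts) {
    double total = 0.0;
    for (const Point* p : pts) total += p->x;  // one pointer chase per element
    return total;
}
```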
Now, with C/C++ you can do better than the default allocators, which were designed to save memory and use it efficiently. You can replace the allocator with a set of fixed-block pools, so you can always find a block that is exactly the right size for the object you're allocating. Walking the heap becomes a matter of a bitmap lookup to see where a free block is, and de-allocation is simply clearing a bit in that bitmap. The cost is that you use more memory because you allocate in fixed-size blocks: you have one pool of 4 byte blocks, another of 16 byte blocks, and so on.
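Here's a toy version of that fixed-block pool idea in C++. The block size, block count and the linear bitmap scan are all illustrative - a real pool would keep a free hint or use find-first-set - but it shows how freeing collapses to clearing one bit:

```cpp
#include <cstddef>
#include <cstdint>

// Toy fixed-block pool: every block is the same size, and a bitmap records
// which blocks are in use. Allocation scans the bitmap, freeing clears a bit.
template <std::size_t BlockSize, std::size_t BlockCount>
class FixedPool {
public:
    void* allocate() {
        for (std::size_t i = 0; i < BlockCount; ++i) {
            std::size_t word = i / 64, bit = i % 64;
            if (!(used_[word] & (1ULL << bit))) {      // found a free block
                used_[word] |= (1ULL << bit);
                return storage_ + i * BlockSize;
            }
        }
        return nullptr; // pool exhausted
    }

    void deallocate(void* p) {
        std::size_t i = (static_cast<std::byte*>(p) - storage_) / BlockSize;
        used_[i / 64] &= ~(1ULL << (i % 64));           // freeing is one bit clear
    }

private:
    alignas(std::max_align_t) std::byte storage_[BlockSize * BlockCount]{};
    std::uint64_t used_[(BlockCount + 63) / 64]{};      // one bit per block
};

// Usage: one pool per size class, e.g. FixedPool<4, 1024> and FixedPool<16, 1024>.
```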