I spoke the other day with Gil Tene, Vice President of Technology and CTO, Co-Founder of Azul Systems, about Azul's Pauseless Garbage Collection and heard about the 2 Gbyte boundary that Java and .NET seem to be running into. The boundary comes from garbage collection: going past that point significantly increases the delays associated with collection.
Yes, garbage collectors are getting better but, with some exceptions, most collectors eventually have to compact memory. When they do, the world stops. This has led to an interesting architecture in which large systems consist of dozens or hundreds of virtual machines, each running a Java virtual machine (JVM) and an associated Java application. These communicate via sockets. It is a good approach given the memory limitation but it leads to all sorts of hassles. For example, systems are often reset in the evening just so the compaction phase does not occur.
Azul started out to break this barrier and has done so with its latest Vega 3 processor that is based on their 54-core chip. Yes, that's a 54-core, 64-bit processor that just runs Java, and a garbage collector. It implements a load value barrier (LVB) instruction that follows any load of a Java reference. I won't get into all the details. You can check them out in a later article I am working on or on Azul's site where you can find various whitepapers on the topic. Essentially the LVB works hand in glove with the garbage collection system to make sure the reference target can be accessed. This is a little trickier than you might imagine since their pauseless garbage collection system is always running. It is also compacting on the fly.
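To make the idea of a barrier on every reference load a bit more concrete, here is a toy model in plain Java. It is only a sketch under my own assumptions, not Azul's implementation: the simulated heap, the forwarding table, and the names `Slot`, `relocate`, and `loadReference` are all hypothetical, and the step that repairs the slot after a relocation is my assumption about how such barriers are commonly built.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a barrier applied after each reference load. Not Azul's code. */
public class LoadBarrierSketch {
    /** Simulated heap: address -> object payload. */
    static final Map<Integer, String> heap = new HashMap<>();
    /** Filled in by the simulated collector: old address -> new address. */
    static final Map<Integer, Integer> forwarding = new HashMap<>();

    /** A field that holds a reference (an address in the simulated heap). */
    static final class Slot {
        int address;
        Slot(int address) { this.address = address; }
    }

    /** Simulated compaction step: move one object and record where it went. */
    static void relocate(int from, int to) {
        heap.put(to, heap.remove(from));
        forwarding.put(from, to);
    }

    /** Load a reference from a slot, running the barrier check right after the load. */
    static int loadReference(Slot slot) {
        int ref = slot.address;               // the plain load
        Integer moved = forwarding.get(ref);  // barrier: has the target been relocated?
        if (moved != null) {
            ref = moved;                      // use the new location
            slot.address = moved;             // assumption: repair the slot for later loads
        }
        return ref;
    }

    public static void main(String[] args) {
        heap.put(100, "order-42");
        Slot field = new Slot(100);
        relocate(100, 200);                   // collector moves the object mid-run
        int ref = loadReference(field);
        // prints "order-42 at 200, slot now holds 200"
        System.out.println(heap.get(ref) + " at " + ref + ", slot now holds " + field.address);
    }
}
```

The application code never sees the stale address 100. The check sits between the load and any use of the reference, which is why the collector can keep moving objects while the program runs.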
What is interesting about the compaction process is that each garbage collection thread only needs one virtual memory page. Many other compacting collectors require twice the amount of free memory to do this chore. Of course, Azul uses a 2 Mbyte page size, not the usual 4 Kbyte page. Still, the hardware handles this and it improves handling of large amounts of data common in the enterprise arena that Azul operates in.
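A little arithmetic shows why this matters. The sketch below compares the free memory held back for compaction under two models. The 2 Mbyte page size comes from the text above; everything else is my assumption: the thread count and live data size are made-up numbers, and `copyingReserve` models a copying collector as needing free space equal to the live data it moves, which is only my reading of "twice the amount".

```java
public class CompactionHeadroom {
    /** Page size quoted in the text: 2 Mbyte. */
    static final long PAGE_BYTES = 2L * 1024 * 1024;

    /** Free memory needed when each collector thread holds one free page. */
    static long pagePerThreadReserve(int gcThreads) {
        return gcThreads * PAGE_BYTES;
    }

    /** Assumed model of a copying collector: free space equal to the live data moved. */
    static long copyingReserve(long liveBytes) {
        return liveBytes;
    }

    static long toMbytes(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        int gcThreads = 8;                          // hypothetical thread count
        long liveBytes = 64L * 1024 * 1024 * 1024;  // hypothetical 64 Gbyte of live data
        System.out.println("page per thread: "
                + toMbytes(pagePerThreadReserve(gcThreads)) + " Mbyte");
        System.out.println("copying reserve: "
                + toMbytes(copyingReserve(liveBytes)) + " Mbyte");
    }
}
```

With these made-up numbers the per-thread page model holds back 16 Mbyte while the copying model holds back 65536 Mbyte, and the gap grows with the amount of live data.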
Custom hardware is nice but it turns out that the same tricks can be done on stock hardware with the right virtual memory support. That is what Azul did with its Zing Elastic Software Platform. Zing's Java JIT compiler runs on an x86_64 platform and emits a few instructions instead of the custom LVB. The rest of the system runs about the same though.
The end result is pauseless garbage collection on stock hardware that allows a single environment to utilize one to two orders of magnitude more memory with better overall performance.
The trick is that Zing runs in a virtual machine so it can gain direct access to the virtual memory support. This also means that it works like the Vega appliance because Azul essentially provides a stub that runs on a host platform like Windows or Linux. The actual Java applications run on the Zing or Vega appliance and communicate with the host via sockets.
In the case of Zing, this is still required because operating systems like Windows and Linux do not provide the level of virtual memory support that Azul needs. To this end they have started the Managed Runtime Initiative (MRI), where they are showing off extensions to the Linux kernel that improve the virtual memory support they use by two orders of magnitude. All the source code is at the MRI run site but it takes someone experienced with rebuilding OpenJDK and the Linux kernel to get a running system. Still, MRI is really a discussion point rather than a platform. The idea is to get these changes into the Linux kernel and other operating systems. There seems to be some forward movement on this but it will likely be a couple of years before this happens. As it turns out, these enhancements could be useful for other environments like .NET.
I'm working on another blog post that will address another reason for wanting this type of support. One of the advantages Java has from a framework perspective is garbage collection. Memory management is a royal pain with C++ frameworks and it is a significant limiting factor on the creation and use of frameworks on platforms that explicitly manage memory. But that is for another day.
By the way, Azul's approach is quite different from Atego's. Atego has an embedded product called AonixPerc (see Hands On Real Time Java - Atego PERC). This is a real-time Java solution that also runs a parallel garbage collector.