There is perhaps a common impression that 64bit software is generally better than 32bit software on PCs. This tends to be based on:
- 64bit obviously provides sufficient virtual address space to allow all RAM in any PC to be easily addressable – 32bit runs into problems with modern PCs.
- AMD64 provides twice as many general-purpose registers (GPRs) as IA32: 16 rather than 8.
- AMD64 provides a RIP-relative addressing mode, which eases the implementation of position-independent (PI) code.
This impression is not quite correct, though. There are costs to 64bit, particularly increased memory usage, which can be of consequence in certain environments, such as VM hosting. More surprisingly, even the common wisdom that 64bit improves performance can have the odd exception.
All this means that there is value in running a 32bit userspace, even on 64bit kernels on 64bit hardware. Only those applications which require 64bit pointers, or which have an actual, demonstrable performance benefit from 64bit word sizes, need be run as 64bit. Doing so can save significant amounts of memory (30%, 60%, sometimes more).
64 Bit Virtual Address Space
This matters to any software which wants to address large amounts of memory. 32 bits are only sufficient to directly address 4GB of memory. Beyond that, software either will not be able to access some of the memory, or must use special bank-switching schemes to swap different regions of memory in and out of its virtual-memory context: e.g. PAE at the low level, or, in userspace, using mmap() to map different files in and out. This is an awkward extra complication that costs performance. 4GB of physical RAM is now well within the reach of PCs, and the limit bites even sooner because RAM on things like graphics cards must also be mapped into those 32 bits. Also, even on systems with less than 4GB of physical RAM, 4GB of virtual address space, which once seemed immense, can now be quite cramped given the demands made of it, e.g. by the ever-growing number of libraries to be loaded and by security techniques such as ASLR.
The prime example is the core “kernel” of an operating system, which needs to manage the mappings between physical and virtual address spaces in order for the system to work. Memory it cannot address is likely memory the system cannot use at all, ever. Userspace examples include large databases, scientific computing on large datasets, naive text editors and, very sadly, if trends continue, single-process web browsers may also hit problems with just 32 bits.
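As a rough illustration of userspace bank-switching, here is a minimal Python sketch that processes a large file through a fixed-size mmap() window instead of mapping the whole file at once. The function name count_byte_windowed and the 64MB default window are illustrative choices, not taken from any particular program; the sketch uses only the standard mmap module, whose offsets must be multiples of mmap.ALLOCATIONGRANULARITY.

```python
import mmap
import os


def count_byte_windowed(path, byte, window_size=64 * 1024 * 1024):
    """Count occurrences of a single byte in a file, mapping one window at a time.

    Only `window_size` bytes of the file are mapped into the address space
    at once, so the file may be far larger than the free address space.
    """
    if len(byte) != 1:
        # A multi-byte pattern could straddle two windows; keep the sketch simple.
        raise ValueError("expected a single byte")
    gran = mmap.ALLOCATIONGRANULARITY
    # mmap() offsets must be multiples of the allocation granularity,
    # so round the window size down to one (but never below one unit).
    window_size = max(gran, window_size // gran * gran)
    file_size = os.path.getsize(path)
    count = 0
    with open(path, "rb") as f:
        offset = 0
        while offset < file_size:
            length = min(window_size, file_size - offset)
            # Map just this window, read-only, at the current offset.
            with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                           offset=offset) as window:
                count += window[:].count(byte)  # slicing copies the window into bytes
            # Leaving the with-block unmaps the window before the next one is mapped.
            offset += length
    return count
```

The extra bookkeeping (rounding to the granularity, remapping, and in real code handling records that straddle a window boundary) is the kind of complication that a 64bit address space lets software avoid.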
There is, however, a down-side to 64bit: increased memory usage. On 32bit machines, the “int”, “long” and “pointer” data types are typically all 32bit, a model often referred to as “ILP32”. To make use of 64 bits, the pointer data type obviously must be 64bit, and usually the “long” type is too, a model called “LP64”. I.e. longs and pointers take twice the space on 64bit. This means there is usually at least a 30% increase in overall system memory usage with 64bit. In the worst case, for codes with rich, interlinked data structures (e.g. indices over lots of small pieces of data), memory usage can even increase several fold. For example, I’ve measured the increase in memory usage between 32bit and 64bit Fedora, booting to the GDM login prompt, at just under 60%.
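To make the ILP32/LP64 difference concrete, the Python sketch below computes the size of a hypothetical index node (an int key plus two pointers) under each data model. It assumes natural alignment, i.e. each of these types aligned to its own size, which is what the i386 and x86-64 System V ABIs use for int, long and pointers. It then prints what ctypes reports for the interpreter actually running it.

```python
import ctypes

# Sizes in bytes of the C types that matter here, per data model.
# Assumption: each type is aligned to its own size (true for int, long
# and pointers on the i386 and x86-64 System V ABIs).
DATA_MODELS = {
    "ILP32": {"int": 4, "long": 4, "ptr": 4},
    "LP64": {"int": 4, "long": 8, "ptr": 8},
}


def struct_size(fields, model):
    """Return sizeof(struct) for a list of field types under a data model."""
    sizes = DATA_MODELS[model]
    offset = 0
    max_align = 1
    for ftype in fields:
        size = sizes[ftype]
        align = size
        offset = (offset + align - 1) // align * align  # pad up to the field's alignment
        offset += size
        max_align = max(max_align, align)
    # Pad the tail so every element of an array of these structs stays aligned.
    return (offset + max_align - 1) // max_align * max_align


# Hypothetical index node: an int key, a 'next' pointer and a pointer to the data.
INDEX_NODE = ["int", "ptr", "ptr"]

if __name__ == "__main__":
    for model in DATA_MODELS:
        print(f"{model}: sizeof(index node) = {struct_size(INDEX_NODE, model)} bytes")
    # The data model the running Python interpreter was built for:
    print("host: int=%d long=%d pointer=%d" % (
        ctypes.sizeof(ctypes.c_int),
        ctypes.sizeof(ctypes.c_long),
        ctypes.sizeof(ctypes.c_void_p),
    ))
```

Even though the int stays at 4 bytes, the padding inserted before the first 8-byte pointer means the node exactly doubles, from 12 to 24 bytes. A 32bit Python running on a 64bit kernel reports the ILP32 sizes on the “host” line, which is precisely the saving a 32bit userspace keeps.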
However, many programmes are O(1) in their memory usage and will pretty much never have a direct need for large amounts of memory. Of the rest, many have modest memory needs and are unlikely to have problems with 32bits of address space any time soon.
So there is a cost to that extra address space, and it’s one which need not come with a benefit to much of the software running on a system.
AMD64 also brings other architectural improvements over IA32, as per the last 2 items in the list above. The doubled number of GPRs particularly should make a noticeable difference to performance, it is argued. This is of course true, and many performance-sensitive codes can make great use of these extra registers. The compiler should have extra opportunities for avoiding having to store and reload values to/from memory, which might allow memory traffic to be reduced a little and the CPU to wait a little less. The 8-byte integer data type inherent to LP64 can also bring great performance benefits to certain applications.
As an example, let’s look at the performance of OpenSSL, using its built-in “speed” benchmark. The underlying system (for all the benchmark results here) is a dual-core, 1.8GHz AMD 2210 CPU, running Fedora 13 with a primarily 32bit userspace on an x86_64 kernel (184.108.40.206-56.fc13.x86_64):
We can see that AMD64 generally offers significant speed increases, particularly for the RSA and DSA public-key algorithms, and particularly with larger key sizes. There are a few exceptions (MD2, DES, Blowfish), which might be because these algorithms do not yet have highly optimised AMD64 implementations, while the i386 versions might be hand-crafted assembly (plus, MD2, who cares?).
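One plausible contributor to the large RSA and DSA gains is the 64bit word size itself: public-key operations are dominated by multiplying big integers split into machine-word “limbs”. The sketch below assumes plain schoolbook multiplication (real libraries such as OpenSSL use more refined techniques and hand-written assembly) and counts the word-by-word multiplications needed for a given limb width. schoolbook_mul is a hypothetical helper for illustration, not an OpenSSL function.

```python
def schoolbook_mul(a, b, limb_bits):
    """Multiply non-negative ints a and b using limbs of `limb_bits` bits.

    Returns (product, number_of_limb_multiplications), modelling how a
    bignum routine multiplies with word-sized digits.
    """
    mask = (1 << limb_bits) - 1

    def to_limbs(x):
        # Least-significant limb first.
        limbs = []
        while x:
            limbs.append(x & mask)
            x >>= limb_bits
        return limbs or [0]

    a_limbs, b_limbs = to_limbs(a), to_limbs(b)
    result = [0] * (len(a_limbs) + len(b_limbs))
    muls = 0
    for i, ai in enumerate(a_limbs):
        carry = 0
        for j, bj in enumerate(b_limbs):
            # One word x word multiply giving a double-width value, plus accumulation.
            t = ai * bj + result[i + j] + carry
            muls += 1
            result[i + j] = t & mask
            carry = t >> limb_bits  # always fits in one limb
        result[i + len(b_limbs)] = carry
    product = 0
    for limb in reversed(result):
        product = (product << limb_bits) | limb
    return product, muls
```

For two 2048-bit operands this needs 64 x 64 = 4096 limb multiplications with 32-bit limbs but only 32 x 32 = 1024 with 64-bit limbs. Assuming a 64x64-bit multiply costs roughly the same as a 32x32-bit one, halving the limb count cuts the core work to about a quarter.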
What is surprising, though, is that the advantage of these extra registers can sometimes be cancelled out by the extra memory traffic from having to load and store 64bit longs and pointers, and by the reduced coverage of the caches. Indeed, it seems the balance of these two competing effects can sometimes adversely impact performance when switching from IA32 to AMD64 / x86_64. That is, AMD64 / x86_64 can be slower for certain applications.
- Mozilla js, package js-1.70-10.fc13
  - 32bit: Score: 68.6 max: 68.6 min: 61.2 Avg: 67.06 run: 20
  - 64bit: Score: 90.4 max: 91.1 min: 78.8 Avg: 88.3571 run: 49
- Google V8, package v8-2.3.5-1.20100806svn5198.fc13
  - 32bit: Score: 2940 max: 2996 min: 2399 Avg: 2840.25 run: 20
  - 64bit: Score: 2816 max: 2917 min: 2415 Avg: 2818.78 run: 50
There is some indeterminism in these benchmarks. Higher scores are better.
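To put these figures on a common scale, this short Python snippet computes the relative change from the 32bit to the 64bit run for both the Score and Avg values, using only the numbers listed above.

```python
# Scores copied from the Fedora 13 runs listed above (higher is better).
RESULTS = {
    "Mozilla js 1.70": {"32bit": {"score": 68.6, "avg": 67.06},
                        "64bit": {"score": 90.4, "avg": 88.3571}},
    "Google V8 2.3.5": {"32bit": {"score": 2940, "avg": 2840.25},
                        "64bit": {"score": 2816, "avg": 2818.78}},
}


def pct_change(old, new):
    """Percentage change from old to new; positive means the 64bit run scored higher."""
    return (new - old) / old * 100.0


for engine, runs in RESULTS.items():
    for metric in ("score", "avg"):
        delta = pct_change(runs["32bit"][metric], runs["64bit"][metric])
        print(f"{engine} {metric}: {delta:+.1f}% (64bit vs 32bit)")
```

This gives about +31.8% for Mozilla js on both lines, and for V8 about -4.2% on Score and -0.8% on Avg, so the size of the V8 gap depends on which line is read, consistent with the run-to-run variation in these benchmarks.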