Critical comparison of benefits of 64bit to 32bit

Summary

There is a common impression that 64bit software is always better than 32bit software on PCs. This tends to be based on:

  1. 64bit obviously provides sufficient virtual address space to allow all RAM in any PC to be easily addressable, whereas 32bit runs into problems with modern PCs.
  2. AMD64 provides twice as many general-purpose registers (GPRs) as IA32: 16 rather than 8.
  3. AMD64 provides a RIP-relative addressing mode, which eases the implementation of position-independent (PI) code.

This impression is not quite correct though. There are costs to 64bit, particularly in increased memory usage, which can be of consequence in certain environments, such as hosting VMs. More surprisingly, even the common wisdom that 64bit improves performance can have the odd exception.

All this means that there is value in running a 32bit userspace, even on 64bit kernels on 64bit hardware. Only those applications which require 64bit pointers, or which have an actual, demonstrable performance benefit from 64bit word sizes, need be run as 64bit. Doing so can save significant amounts of memory (30%, 60%, sometimes more).

64 Bit Virtual Address Space

This matters to any software which wants to address large amounts of memory. 32 bits are only sufficient to directly address 4GB of memory. Beyond that, software either will not be able to access some memory, or else must use special bank-switching schemes to swap different regions of memory in and out of its virtual-memory context, e.g. PAE at the low level or, in userspace, using mmap() to map different files in and out. This is an awkward extra complication that costs performance. 4GB of physical RAM is well within the reach of PCs today, and the limit bites sooner still given that RAM on things like graphics cards must also be mapped into those 32 bits. Also, even for systems with less than 4GB of physical RAM, 4GB of virtual address space, which once seemed immense, can now be quite cramped given the demands made of it, e.g. by the ever-growing number of libraries to be loaded, as well as by security techniques such as ASLR.
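To make the mmap() windowing idea concrete, here is a minimal sketch in Python (it is not taken from any of the software discussed here). It reads a byte range out of a file by mapping one small window at a time, rather than mapping the whole file at once. The names read_through_window and demo, and the choice of a single allocation granule per window, are purely illustrative assumptions; a real 32bit program would use far larger windows.

```python
import mmap
import os
import tempfile

# 32 bits of address give 2**32 distinct byte addresses, i.e. 4 GiB.
ADDRESSABLE_32BIT = 2 ** 32


def read_through_window(path, offset, length, window=mmap.ALLOCATIONGRANULARITY):
    """Read `length` bytes at `offset`, mapping only one window at a time.

    `window` must be a multiple of mmap.ALLOCATIONGRANULARITY, because
    mmap offsets have to be aligned to that granularity.
    """
    out = bytearray()
    size = os.path.getsize(path)
    end = min(offset + length, size)
    with open(path, "rb") as f:
        pos = offset
        while pos < end:
            base = (pos // window) * window      # aligned start of this window
            span = min(window, size - base)      # do not map past end of file
            with mmap.mmap(f.fileno(), span, access=mmap.ACCESS_READ,
                           offset=base) as m:
                start = pos - base
                stop = min(span, end - base)
                out += m[start:stop]
                pos = base + stop                # continue in the next window
    return bytes(out)


def demo():
    """Read a range that straddles window boundaries and check it."""
    window = mmap.ALLOCATIONGRANULARITY
    data = bytes(i % 251 for i in range(3 * window + 100))
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
        path = f.name
    try:
        got = read_through_window(path, window - 10, window + 20)
        return got == data[window - 10:2 * window + 10]
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

Every pass around the loop costs a map and an unmap, which is the sort of bookkeeping and overhead a 64bit process avoids by simply mapping the whole file once.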

The prime example is the core “kernel” of an operating system, which needs to manage the mappings between physical and virtual address spaces in order to allow the system to work. Memory it cannot address is likely memory the system cannot use at all, ever. Userspace examples include large databases, scientific computing on large datasets and naive text editors. Very sadly, if trends continue, single-process web browsers may also hit problems with just 32 bits.

There is however a downside to 64 bits: increased memory usage. On 32bit machines, the “int”, “long” and pointer data types typically are all 32bit, often referred to as “ILP32”. To make use of 64 bits the pointer data type obviously must be 64bit, and usually the “long” type is too, which is called “LP64”. I.e. longs and pointers take twice the space on 64bit. This means there is usually at least a 30% increase in overall system memory usage with 64bit. In the worst case, for code with rich, interlinked data structures (e.g. indices over lots of small pieces of data), the increase in memory usage can even be several-fold. E.g. I’ve measured the increase between 32bit and 64bit for Fedora, for booting to the GDM login prompt, to be just under 60%.
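As a rough illustration of where that extra memory goes, the following Python sketch models the size of a C struct under the two data models. It assumes natural alignment (each field aligned to its own size, and the struct padded to its largest alignment), which holds for the int, long and pointer types used here. The two layouts, NODE and RECORD, are hypothetical examples invented for this sketch, not structures from any real program.

```python
# Sizes in bytes of the C types under the two data models.
ILP32 = {"int": 4, "long": 4, "pointer": 4}
LP64 = {"int": 4, "long": 8, "pointer": 8}


def struct_size(fields, model):
    """Size of a C struct, assuming each field is aligned to its own size."""
    offset = 0
    max_align = 1
    for field in fields:
        size = model[field]
        align = size
        # Pad up to the field's alignment, then place the field.
        offset = (offset + align - 1) // align * align
        offset += size
        max_align = max(max_align, align)
    # Pad the tail so that arrays of the struct stay aligned.
    return (offset + max_align - 1) // max_align * max_align


def growth(fields):
    """Fractional size increase when moving from ILP32 to LP64."""
    return struct_size(fields, LP64) / struct_size(fields, ILP32) - 1


# Hypothetical pointer-heavy index node: key, count and three links.
NODE = ["int", "long", "pointer", "pointer", "pointer"]
# Hypothetical integer-heavy record: eight ints and a single link.
RECORD = ["int"] * 8 + ["pointer"]

if __name__ == "__main__":
    for name, fields in (("NODE", NODE), ("RECORD", RECORD)):
        print(name, struct_size(fields, ILP32), struct_size(fields, LP64),
              f"{growth(fields):.0%}")
```

Under this model the pointer-heavy node goes from 20 to 40 bytes, while the integer-heavy record only goes from 36 to 40. How much a given program grows therefore depends heavily on how much of its data is pointers and longs.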

However, many programmes are O(1) in their memory usage and will pretty much never have a direct need for large amounts of memory. Of the rest, many have modest memory needs and are unlikely to have problems with 32bits of address space any time soon.

So there is a cost to that extra address space, and it’s one which need not come with a benefit to much of the software running on a system.

Architectural Improvements

AMD64 also brings other architectural improvements over IA32, as per the last 2 items in the list above. The doubled number of GPRs in particular should make a noticeable difference to performance, it is argued. This is of course true, and much performance-sensitive code can make great use of these extra registers. The compiler has extra opportunities to avoid storing values to memory and reloading them, which can reduce memory traffic a little and leave the CPU waiting a little less. The 8-byte integer data type inherent to LP64 can also bring great performance benefits to certain applications.

As an example, let’s look at the performance of OpenSSL, using its built-in “speed” benchmark. The underlying system (for all the benchmark results here) is a dual-core, 1.8GHz AMD 2210 CPU, running Fedora 13 with a primarily 32bit userspace on an x86_64 kernel (2.6.34.7-56.fc13.x86_64):

[Chart: Percentage speed increase for AMD64 over IA32, for OpenSSL enciphering algorithms]

[Chart: Percentage speed increase for AMD64 v i386, for OpenSSL cryptographic hashing algorithms]

[Chart: Percentage speed increase for AMD64 v i386, for OpenSSL public-key algorithms]

We can see that AMD64 generally offers significant speed increases, particularly for the RSA and DSA public-key algorithms, and particularly with larger key sizes. There are a few exceptions (MD2, DES, Blowfish), which might be due to something like these algorithms not yet having highly optimised AMD64 implementations, while the i386 versions might be hand-crafted assembly (plus, MD2, who cares?).

As another example, the POVRay 3D ray-tracer’s standard benchmark scene took 30m29s for i386 and 26m36s for AMD64 (userspace times). On those times that is a nice speedup of roughly 15% (1829s against 1596s).

What is surprising though is that sometimes the advantage of these extra registers can actually be cancelled out by the extra memory traffic from having to load and store 64bit longs and pointers, and by the reduced coverage of the caches. Indeed, it seems the balance of these two competing effects can sometimes adversely impact performance when switching from IA32 to AMD64 / x86_64. That is, AMD64 / x86_64 can be slower for certain applications.
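The reduced cache coverage is simple arithmetic, which this small Python sketch spells out. The 64-byte cache line and the 1 MiB cache size are assumptions chosen for illustration, not measured properties of the benchmark machine.

```python
CACHE_LINE_BYTES = 64        # assumed cache line size
CACHE_BYTES = 1024 * 1024    # assumed cache size (1 MiB)

# Pointer size in bytes under each data model.
POINTER_BYTES = {"ILP32": 4, "LP64": 8}


def pointers_per_line(model):
    """How many pointers a single cache line can hold."""
    return CACHE_LINE_BYTES // POINTER_BYTES[model]


def pointers_in_cache(model):
    """How many pointers the whole cache can hold."""
    return CACHE_BYTES // POINTER_BYTES[model]


if __name__ == "__main__":
    for model in ("ILP32", "LP64"):
        print(model, pointers_per_line(model), pointers_in_cache(model))
```

With these assumptions a cache line holds 16 pointers under ILP32 but only 8 under LP64, so pointer-chasing code touches more cache lines to do the same work on 64bit.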

To demonstrate this, let’s run Google’s JavaScript test-suite for V8:

  • Mozilla js, package js-1.70-10.fc13
    • 32bit: Score: 68.6 max: 68.6 min: 61.2 Avg: 67.06 run: 20
    • 64bit: Score: 90.4 max: 91.1 min: 78.8 Avg: 88.3571 run: 49
  • Google V8, package v8-2.3.5-1.20100806svn5198.fc13
    • 32bit: Score: 2940 max: 2996 min: 2399 Avg: 2840.25 run: 20
    • 64bit: Score: 2816 max: 2917 min: 2415 Avg: 2818.78 run: 50

There is some indeterminism in these benchmarks. Higher scores are better.

The Mozilla js scores show a nice improvement going from 32bit to 64bit, of the order of 30%. However, Mozilla js is an older implementation of the language and it’s quite slow. Google’s V8 is a more sophisticated JIT implementation and clearly much faster. For V8, however, 64bit causes a slight but repeatable slowdown, of about 1 to 3%. So, while 64bit is an improvement for the slower JavaScript engine, 32bit is an improvement for the faster engine. A somewhat surprising result.

Comments

  1. Paul Jakma said

    Oh, as an aside, there have been architectures with 64bit word sizes but just 32bit pointers. The 64bit RISC architectures in particular often defined an ILP32 ABI with a 64bit “long long” data type (call it “LL64+ILP32”, say), such as SGI IRIX on MIPS64 with its “n32” ABI. The GMP library’s ABI support documentation suggests PA-RISC’s 2.0n ABI, IA-64’s ILP32 ABI and PPC64’s “mode32” ABI are all such LL64+ILP32 ABIs.
