NUMA and you, perfect together (Part 1)

I know this is a slightly more esoteric topic, even for me, but I want to address cc:NUMA platforms and how they matter to Windows and Windows applications. What is NUMA, you ask? NUMA stands for Non-Uniform Memory Access (you'll also see it expanded as Non-Uniform Memory Architecture). (The cc: stands for Cache Coherent, by the way, because there is non-cache-coherent NUMA as well, but I won't address that here since none of the platforms Windows supports are non-cache-coherent.)

To understand why NUMA exists, we need to look at Symmetric Multiprocessing (SMP). SMP is built around a few core principles, and one is that every CPU in the system has an identical view of the system: memory, the I/O subsystem, and the other CPUs can all be treated the same by software. The problem comes when this assumption is no longer true. As you scale up the size of a system, it becomes harder and harder to keep everything close together, literally. The more switches and buses your data flows through, the longer it takes to arrive.

This means that to squeeze the maximum performance out of the system, it behooves the OS as well as the programmer to try to keep data as close as possible to the place where it's needed. By keeping track of which pages of memory and which CPUs have the best locality to each other, better decisions can be made when threads are scheduled and memory is allocated, and those decisions squeeze that extra little bit out of the system.
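To make the locality idea concrete, here is a toy sketch in Python. It is not how Windows actually does it; the two-node layout, the relative access costs, and the function names `preferred_memory_node` and `preferred_cpu` are all made up for illustration. It only shows the kind of decision described above: given a table of how expensive it is for each node's CPUs to reach each node's memory, pick the closest memory for a thread, or the closest CPU for a piece of data.

```python
# Toy model only: the node count, CPU layout, and costs are invented numbers.
# ACCESS_COST[i][j] is the relative cost for a CPU on node i to touch memory on node j.
ACCESS_COST = [
    [1.0, 1.6],
    [1.6, 1.0],
]

# Hypothetical mapping of CPU index to the node it belongs to.
CPU_TO_NODE = {0: 0, 1: 0, 2: 1, 3: 1}


def preferred_memory_node(cpu, free_pages):
    """Return the node that still has free pages and is cheapest to reach from `cpu`."""
    home = CPU_TO_NODE[cpu]
    candidates = [node for node, free in enumerate(free_pages) if free > 0]
    if not candidates:
        raise MemoryError("no free pages on any node")
    return min(candidates, key=lambda node: ACCESS_COST[home][node])


def preferred_cpu(memory_node, idle_cpus):
    """Return the idle CPU that reaches memory on `memory_node` most cheaply."""
    if not idle_cpus:
        raise RuntimeError("no idle CPUs")
    return min(idle_cpus, key=lambda cpu: ACCESS_COST[CPU_TO_NODE[cpu]][memory_node])


if __name__ == "__main__":
    # A thread on CPU 2 (node 1) gets local memory while node 1 has pages free...
    print(preferred_memory_node(2, [10, 10]))
    # ...and falls back to the remote node once node 1 is exhausted.
    print(preferred_memory_node(2, [10, 0]))
    # Data living on node 0 is best served by an idle CPU on node 0.
    print(preferred_cpu(0, [1, 3]))
```

A real scheduler has to weigh a lot more than this (load balancing, cache warmth, and so on), but the lookup at the heart of it is the same: know the topology, then prefer the short path.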

Until only a few years ago, this was exclusively the realm of large mainframe-style computers, not the PC world. But with the introduction of the Unisys ES7000 in 2000, the PC suddenly had something to gain from being NUMA-aware. Even then, this mostly concerned large scale-up server implementations, not the average user or programmer. That is, until AMD announced the unique design of its new Opteron and Athlon64 processors. Suddenly, any system with more than one of those CPUs could potentially benefit from NUMA optimizations. I'll go into why in the next entry.

Comments

  • Anonymous
    August 30, 2004
    I need to thank whoever implemented NUMA in xp sp2 - I get over 10GB/s memory bandwidth on my dual Opteron!

    (SiSoft Sandra benchmark on my blog till it rolls off - but I'm on a pocket pc and can't get the permalink - sorry!)
  • Anonymous
    September 01, 2004
    I didn't realize that had happened. I've been too busy playing with the amd64 port, which is awesome, by the way. Any idea if NUMA is in that build?
  • Anonymous
    September 02, 2004
    AFAIK, the Win32 APIs to use NUMA smartly from within an app are only exposed on Windows Server 2003 or later.
  • Anonymous
    September 08, 2004
    amd64 == 3790 kernel, same as server 2003. I bet the APIs are there too but i'm too lazy to look atm.