Some excerpts from the original post:
- Computers now have 1000 times the memory they had 15 years ago, when they first started handling 10 thousand connections.
- The limit on scalability isn’t the hardware but the software, specifically the operating system.
- The first problem is that there is no “fast path” for data. It’s a conflict between a “multi-user” design and a “single-purpose” design.
- The second problem is multi-core scalability. Old code is written according to “multi-threaded” or “multi-tasking” principles, where multiple tasks must share a single CPU. New code must be written using different principles, where a single task must be split across multiple CPUs.
- The third problem is memory. In the last 15 years, the size of main memory has gone up a thousand times; it’s now practical to get 128 gigabytes in a cheap desktop computer. Yet L1 and L2 caches have stayed the same size, and L3 (or last-level) caches have only grown about 10 times. This means that at scale, every pointer dereference is a potential cache miss. To fix this, code must pay attention to the issues of caching and paging.
- The solution to all these problems is to partition the machine into two parts: the “control” part and the “data” part. The data-partition grabs the network cards it wants and replaces the drivers with ones that bypass the kernel. The data-partition grabs all the CPU cores it wants and removes them from the operating system scheduler, so no other tasks will run on them. The data-partition grabs the bulk of the RAM and manages it itself.
- A decade ago, engineers tackled what they called “the C10K problem”: making servers handle 10 thousand simultaneous connections. The conclusion was to fix the kernels and write applications in an asynchronous manner. Today, with C10M, we start with asynchronous code, but instead of better kernels, we move our code completely out of the kernel.