Multi-Core vs Multi-Threaded

That a program is multi-threaded does not mean it is suitable for multi-core. Multi-threaded programs are (were) written to synchronize between threads, and to do so they use various locking mechanisms such as mutexes, spinlocks (within the kernel), semaphores, etc. In most user-level programs (for example, those that use pthreads), whenever a thread wants to synchronize, it tries to acquire a lock. If the lock is already held, the acquiring thread goes to sleep on a queue. Sleeping means the kernel takes over control: it does a context switch, runs the scheduling algorithm to find the best candidate to run, and runs that thread instead.

On a single CPU, or a CPU with a single core, this is necessary: unless the current thread sleeps, no other thread can run and release the lock. On a multi-core CPU, however, the lock-holding thread may be running on another core at the very moment the thread on the current core is being put to sleep, and the lock may be released almost immediately. The context-switch overhead therefore makes scaling sub-linear rather than linear: the kernel gets in the way so often for multi-threaded programs with heavy synchronization that performance begins to hurt after adding just a few cores.

Robert Graham at ErrataSec has a very nice article on multi-threaded vs multi-core programs. He mentions that the performance of such a system peaks at around 4 cores and begins to decline after that.

The comments section also mentioned Erlang, which is said to work very well on multi-core platforms. I should take a look at it sometime.

Some very important points to note from that post:

  • Unable to increase clock speeds further, chip companies have been adding extra logic to increase throughput – multiple instructions per clock cycle, multiple cores, etc.
  • Multi-threaded programs are not multi-core ready.
  • For multi-core, we need more independent execution, rather than mutually synchronizing models.
  • Avoid sharing as much as possible.
  • Do a “Divide and Conquer” w.r.t. sharing – that is, have each thread maintain its own state, and merge/join those states when needed to get the combined result.
  • Use lock free versions of data structures. Use stuff such as RCU.
  • Two basic models in multi-core: pipelining and worker threads. The former is like an assembly line, while the latter is like a bunch of robots each doing everything from start to end. Where there is something to share, put a pipeline stage there; where work can be done independently, the worker-thread model is preferable.
