Ways of Sleeping in the Linux Kernel

There are three main ways of sleeping in the Linux kernel.

The first and simplest way of sleeping is to set the state of the current process to either TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE and then call schedule(). Setting the state to something other than TASK_RUNNING is important because only then will the kernel take the process off the run queue. Once the process is scheduled out, it has to be scheduled back in some way; that is achieved using wake_up_process(), which takes the task_struct of the process as a parameter. Here is a sample piece of code taken from Linux Journal:

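// Process A (the sleeper):
if (list_empty(&list_head)) {
    set_current_state(TASK_INTERRUPTIBLE);
    schedule();
}

/* Rest of the code ... */

// Process B (the waker); task_a is a hypothetical pointer to Process A's task_struct:
list_add_tail(new_node, &list_head);
wake_up_process(task_a);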

There is a race condition in the above piece of code that results in a “lost wake-up”. Process A checks the condition, then sets its task state to TASK_INTERRUPTIBLE, and then goes to sleep. If Process B makes the condition true and wakes Process A just after the condition has been checked but before the state has been set, the wake-up has no effect: waking up a process only sets its task state to TASK_RUNNING and puts it on the run queue, and Process A is already running. Process A then sets its state to TASK_INTERRUPTIBLE and calls schedule(), so it goes to sleep after the waking process has already woken it up. This is the lost wake-up problem.

How serious this is depends on who drives the condition. If the waker signals only once, for example because it is waiting for the sleeping process to consume the condition and reset it, then the sleeping process remains asleep forever. If the condition is satisfied repeatedly by an external source, a later wake-up will eventually wake the sleeping process, at the cost of a delay.

The fix is to set the task state before checking the condition. If the wake-up arrives between the check and the call to schedule(), it resets the state to TASK_RUNNING. Since schedule() removes a process from the run queue only if its state is something other than TASK_RUNNING, the process simply stays on the run queue. If the check finds the condition already true, the process must set its state back to TASK_RUNNING before continuing.
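The two orderings can be compared with a small user-space model. This is only a sketch, not kernel code: every `toy_` name is hypothetical, `toy_wake_up()` stands in for wake_up_process(), `toy_schedule()` stands in for schedule(), and the interleaving of Process A and Process B is written out by hand as a single sequence of steps.

```c
#include <stdbool.h>

/* Toy stand-ins for kernel task states; these are not the real kernel values. */
enum toy_state { TOY_RUNNING, TOY_INTERRUPTIBLE };

struct toy_task {
    enum toy_state state;
    bool on_runqueue;
};

/* Models wake_up_process(): mark the task runnable and put it on the run queue. */
static void toy_wake_up(struct toy_task *t)
{
    t->state = TOY_RUNNING;
    t->on_runqueue = true;
}

/* Models schedule(): the task leaves the run queue only if it is not RUNNING. */
static void toy_schedule(struct toy_task *t)
{
    if (t->state != TOY_RUNNING)
        t->on_runqueue = false;
}

/* Buggy order: check, then set state. Returns true if A is still runnable. */
bool toy_check_then_set(void)
{
    struct toy_task a = { TOY_RUNNING, true };
    bool list_is_empty = true;

    if (list_is_empty) {                 /* A: checks the condition        */
        list_is_empty = false;           /* B: adds a node ...             */
        toy_wake_up(&a);                 /* B: ... and wakes A (no effect) */
        a.state = TOY_INTERRUPTIBLE;     /* A: sets its state too late     */
        toy_schedule(&a);                /* A: dropped from the run queue  */
    }
    return a.on_runqueue;
}

/* Fixed order: set state, then check. Returns true if A is still runnable. */
bool toy_set_then_check(void)
{
    struct toy_task a = { TOY_RUNNING, true };
    bool list_is_empty = true;

    a.state = TOY_INTERRUPTIBLE;         /* A: sets its state first        */
    if (list_is_empty) {                 /* A: checks the condition        */
        list_is_empty = false;           /* B: adds a node ...             */
        toy_wake_up(&a);                 /* B: ... and resets A to RUNNING */
        toy_schedule(&a);                /* A: stays on the run queue      */
    }
    a.state = TOY_RUNNING;               /* A: restore the state           */
    return a.on_runqueue;
}
```

In this model toy_check_then_set() leaves the task off the run queue with nobody left to wake it, while toy_set_then_check() leaves it runnable, which is the behaviour the fix relies on.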

One more issue with this form of sleeping is that the waker has to know the task_struct of the sleeping process. This becomes tedious when more than one process may be sleeping. So another form of sleeping in the kernel is to use wait queues. Here is a sample piece of code that declares a wait queue and puts the calling process to sleep on it.

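DECLARE_WAIT_QUEUE_HEAD(my_event);

/* Sleeper: blocks until (cond == x) or until a signal arrives. */
wait_event_interruptible(my_event, (cond == x));

void my_wake_up(void)
{
    if (cond == x)
        set_bit(2, &my_flags);
    wake_up_interruptible(&my_event);  /* sleepers re-check (cond == x) */
}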

As can be seen in the above piece of code, when we wait on a queue we also pass a condition to be checked. If the condition is already true, the process does not sleep at all. Otherwise the kernel adds the process to the wait queue, sets its task state, and checks the condition again before calling schedule(), so a wake-up that arrives in between is not lost. When it is time, a waker wakes up the processes on the wait queue. It does not need to know which process is sleeping; all it needs to know is the wait queue. This scales better than the earlier form of sleeping.

There is also an API, wake_up_all(), which wakes up every process on the wait queue. Used carelessly, it causes a thundering herd problem: many processes wake up, only one of them can make progress, and the rest go back to sleep. However, there are use cases for it too. Take the problem of multiple readers and one writer: while the writer holds the lock, all the readers are put on the wait queue; when the writer is done, it can call wake_up_all() on that wait queue, and all the readers will successfully acquire the lock, because readers do not exclude one another.
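The readers-and-writer case can be sketched with a toy user-space model of a wait queue. This is an illustration under stated assumptions, not the kernel implementation: all `toy_` names are hypothetical, the queue is a fixed-size array, and "sleeping" is just a flag. What it shows is that the waker only touches the queue and never needs to know the individual waiters.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 8

struct toy_waiter {
    bool asleep;
};

/* Toy wait queue: the waker only ever touches this structure. */
struct toy_waitq {
    struct toy_waiter *waiters[TOY_MAX_WAITERS];
    size_t count;
};

/* Models wait_event(): sleep on the queue only if the condition is false.
 * Returns true if the caller was put to sleep (false also if the toy queue is full). */
bool toy_wait_event(struct toy_waitq *q, struct toy_waiter *w, bool condition)
{
    if (condition || q->count == TOY_MAX_WAITERS)
        return false;
    w->asleep = true;
    q->waiters[q->count++] = w;
    return true;
}

/* Models wake_up_all(): wake every waiter and return how many were woken. */
size_t toy_wake_up_all(struct toy_waitq *q)
{
    size_t woken = q->count;

    for (size_t i = 0; i < q->count; i++)
        q->waiters[i]->asleep = false;
    q->count = 0;
    return woken;
}

/* One writer, three readers: the readers wait while the writer holds the lock. */
size_t toy_readers_woken_by_writer(void)
{
    struct toy_waitq q = { .count = 0 };
    struct toy_waiter readers[3] = { { false }, { false }, { false } };
    bool writer_done = false;

    for (size_t i = 0; i < 3; i++)
        toy_wait_event(&q, &readers[i], writer_done);

    writer_done = true;          /* writer releases the lock ...      */
    return toy_wake_up_all(&q);  /* ... and wakes all readers at once */
}
```

Here waking everyone is the right call because all three readers can proceed together. If only one waiter could make progress, the same call would wake the other two for nothing, which is the thundering herd case.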

Another way of sleeping is to sleep for a definite period of time. The process can sleep for a given number of jiffies, or for a given number of milliseconds or seconds. Here is sample code for sleeping in milliseconds.

// Waiting thread
read_done = 0;
while (read_done == 0) {
    msleep(2); // sleep for a couple of milliseconds
}

// Another thread
read_done = 1;

The process does not know how long the wait is going to take, but it is sure that it will not take long, so it avoids creating another wait queue and simply calls msleep() to sleep for a couple of milliseconds at a time. Sooner or later the condition becomes true and the process moves forward. This costs some CPU overhead for the repeated context switches, but it is a simple design. To sleep for a number of jiffies instead, set the task state (for example to TASK_INTERRUPTIBLE) and then call schedule_timeout(); if the state is left as TASK_RUNNING, the task does not sleep at all. Internally, schedule_timeout() arms a kernel timer that wakes the process up when the timeout expires, rather than putting it on a wait queue. msleep() is built on the same mechanism.
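Because timed sleeps are counted in jiffies (one jiffy is 1/HZ seconds), a millisecond delay is rounded up to whole ticks. The helpers below are a hypothetical user-space sketch of that arithmetic, not the kernel's own msecs_to_jiffies(); the tick rates 100, 250 and 1000 used in the note afterwards are assumed example values of HZ.

```c
/* Hypothetical helper: convert milliseconds to timer ticks for a given tick
 * rate (hz), rounding up so the sleep is never shorter than requested. */
unsigned long toy_msecs_to_jiffies(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999) / 1000;
}

/* Length in milliseconds of the rounded-up sleep (integer division). */
unsigned long toy_rounded_sleep_ms(unsigned long ms, unsigned long hz)
{
    return toy_msecs_to_jiffies(ms, hz) * 1000 / hz;
}
```

With these helpers a 2 ms request becomes 1 jiffy (10 ms) at HZ=100, 1 jiffy (4 ms) at HZ=250, and 2 jiffies (2 ms) at HZ=1000, so a short msleep() in a polling loop may sleep noticeably longer than the number passed to it.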

