Calibrating Loops Per Jiffy in Linux Kernel

I was going through the kernel code of calibrating the delay loop and was just amazed at the beauty of it. Thought, would share my joy here. Here is the source code for the warm up before calculation:

        while ((loops_per_jiffy <<= 1) != 0) {
            /* wait for "start of" clock tick */
            ticks = jiffies;
            while (ticks == jiffies)
                /* nothing */;
            /* Go .. */
            ticks = jiffies;
            ticks = jiffies - ticks;
            if (ticks)

The above loop tries to set the most possible significant bit in loops_per_jiffy variable such that the resultant number (loops_per_jiffy) when passed to __delay function, will cause the __delay to take more than one jiffy to execute. In other words, starting from LSB, it moves the bit to the left (in effect doubling loops_per_jiffy) in every iteration such until this number is big enough to cause __delay function to take more than one jiffy. The void while loop inside is basically to let go of the current jiffy – that is, to start at the beginning of the next jiffy time window, by letting go of the current jiffy window (we don’t know where in the current jiffy window we are so if we start __delay now, then our loops_per_jiffy calculation is skewed).

Lets take a small notation here: lpj_x is loops_per_jiffy variable with only bit x set. And LPJ be the closest value of the variable that can cause the __delay function to lapse exactly jiffy unit of time.

When we come out of loop, the following assertion is true: lpj_x causes __delay to cross a jiffy and lpj_(x-1) does not. Does this mean that lpj_x is the exact delay that can cause a jiffy to pass? No. That exact LPJ is somewhere between lpj_x and lpj_(x-1). So, starting from here we have to find LPJ further.

And here in comes another piece of beauty.

        loops_per_jiffy >>= 1;
        loopbit = loops_per_jiffy;
        while (lps_precision-- && (loopbit >>= 1)) {
            loops_per_jiffy |= loopbit;
            ticks = jiffies;
            while (ticks == jiffies)
                /* nothing */;
            ticks = jiffies;
            if (jiffies != ticks)   /* longer than 1 tick */
                loops_per_jiffy &= ~loopbit;

Since lpj_x is a loose upper bound on LPJ, we proceed to calculate a tighter upper bound. The above loop does it. To understand this, consider x to be 3. Then lpj_3 8 – it is 1000 in binary. And lpj_2 4 is 100. The more precise LPJ is between lpj_3 and lpj_2. We would start from 4 and proceed to 8. We start from lpj_2 and set further lsbs and see it comes close. That is, we start with 4, make it 6 and then 7 (100, 110 and 111). Although we miss out on some numbers, this is apparently considered to be good enough an approximation.

That is what the loop does exactly. As usual, the inner void while loop is to align ourselves to the beginning of a jiffy window.

At the end, we will have lpj close to LPJ.

The only other magic sauce here seems to be __delay. It is just an instruction that is used to some CPU intensive (or CPU only) operation for so many iterations. We use that as a tool to tune loops_per_jiffy.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s