Loops Per Jiffy to mdelay, udelay and ndelay

What do we do with LPJ (loops per jiffy)? You can use that value to create delays in execution without sleeping (or blocking). But what if we want to delay for shorter durations, such as microseconds or nanoseconds? Then we need to do some arithmetic.

loops_per_jiffy = x
loops_per_sec = x * HZ (there are HZ jiffies in a second)
loops_per_usec = x * HZ * (1/1000000)
loops_per_n_usec = n * loops_per_usec

The basic problem with this is that loops_per_usec depends on the factor (1/1000000), which is a fraction smaller than one. Only integer operations are fast, and in integer arithmetic that factor becomes zero. Let's take an example:

loops_per_jiffy = 3500
HZ = 1000
loops_per_usec = 3500 * 1000 * (1/1000000) = 3.5

There is one problem with the above calculation. The Linux kernel does not use floating point arithmetic, because floating point arithmetic is slow. If we use integer arithmetic instead, loops_per_usec comes out as zero because of the (1/1000000) component.
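To make the truncation concrete, here is a minimal sketch in plain user-space C (not kernel code). The function names are hypothetical and exist only for illustration; the values are the ones from the example above.

```c
#include <stdint.h>

/*
 * Hypothetical helper: a literal translation of the formula into integer
 * arithmetic. The sub-expression (1 / 1000000) is an integer division
 * and evaluates to 0, so the whole product is 0.
 */
uint32_t loops_per_usec_int(uint32_t lpj, uint32_t hz)
{
    return lpj * hz * (1 / 1000000);
}

/*
 * Hypothetical helper: the same formula in floating point, shown only
 * for comparison. This is the value we would like to approximate.
 */
double loops_per_usec_float(uint32_t lpj, uint32_t hz)
{
    return (double)lpj * hz / 1000000.0;
}
```

With loops_per_jiffy = 3500 and HZ = 1000, the integer version returns 0, while the floating point version returns about 3.5.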

So, how do we make it non-zero? The path taken by the kernel is as follows: it scales the (1/1000000) factor by 2^32, which turns it into the integer constant 4294. Because of that, the whole product is shifted left by 32 bits, and taking the most significant 32 bits of the 64-bit result divides by 2^32 again.
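The scaling idea can be sketched in portable C with a 64-bit intermediate. This is an illustration of the fixed-point trick only, not the kernel's code; USEC_SCALE and usecs_to_loops are names made up for this example.

```c
#include <stdint.h>

/* 2^32 / 10^6 = 4294.967..., truncated to an integer (assumed name). */
#define USEC_SCALE 4294u

/*
 * Hypothetical helper: number of delay loops for `usecs` microseconds.
 * Multiplying by USEC_SCALE is the same as multiplying by
 * (2^32 / 10^6); shifting right by 32 then removes the 2^32 factor.
 * Note: for large inputs the 64-bit product itself can overflow, which
 * is the problem the a/b split in the kernel code addresses.
 */
uint32_t usecs_to_loops(uint32_t usecs, uint32_t lpj, uint32_t hz)
{
    uint64_t product = (uint64_t)usecs * lpj * hz * USEC_SCALE;

    return (uint32_t)(product >> 32);
}
```

With loops_per_jiffy = 3500 and HZ = 1000, a 1 usec delay gives 3 loops and a 1000 usec delay gives 3499 loops. The results land slightly below the exact values (3.5 and 3500) because 4294 is a truncation of 4294.967 and the shift truncates as well.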

The asm-ppc code has this:

/*
* Note that 19 * 226 == 4294 ==~ 2^32 / 10^6, so
* loops = (4294 * usecs * loops_per_jiffy * HZ) / 2^32.
*
* The mulhwu instruction gives us loops = (a * b) / 2^32.
* We choose a = usecs * 19 * HZ and b = loops_per_jiffy * 226
* because this lets us support a wide range of HZ and
* loops_per_jiffy values without either a or b overflowing 2^32.
* Thus we need usecs * HZ <= (2^32 - 1) / 19 = 226050910 and
* loops_per_jiffy <= (2^32 - 1) / 226 = 19004280
* (which corresponds to ~3800 bogomips at HZ = 100).
*  -- paulus
*/
#define __MAX_UDELAY    (226050910UL/HZ)    /* maximum udelay argument */

extern __inline__ void __udelay(unsigned int x)
{
    unsigned int loops;

    __asm__("mulhwu %0,%1,%2" : "=r" (loops) :
            "r" (x), "r" (loops_per_jiffy * 226));
    __delay(loops);
}

#define udelay(n) (__builtin_constant_p(n)? \
    ((n) > __MAX_UDELAY? __bad_udelay(): __udelay((n) * (19 * HZ))) : \
    __udelay((n) * (19 * HZ)))

PPC’s mulhwu instruction multiplies two unsigned 32-bit numbers and returns the upper 32 bits of the 64-bit product, which is the same as dividing the product by 2^32. So, the code splits all the factors into a and b such that neither a nor b overflows 32 bits for the maximum permitted value of udelay. __MAX_UDELAY is defined from these constraints. The udelay() macro takes the number of usecs, checks constant arguments against __MAX_UDELAY, and passes “a” (usecs * 19 * HZ) as the parameter to __udelay(). That function computes “b” (loops_per_jiffy * 226) itself, uses mulhwu to get the number of loops, and passes it to __delay() to produce the delay.
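For readers without a PowerPC at hand, the same computation can be approximated in portable C. This is a sketch under assumptions: mulhwu_emul, udelay_loops and max_udelay are hypothetical names, and the 64-bit multiply stands in for the single mulhwu instruction.

```c
#include <stdint.h>

/*
 * Portable stand-in for mulhwu: the high 32 bits of the unsigned
 * 64-bit product of two 32-bit values (hypothetical helper).
 */
static uint32_t mulhwu_emul(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/*
 * Mirrors the split used by udelay()/__udelay():
 *   a = usecs * 19 * HZ, b = loops_per_jiffy * 226, loops = (a * b) / 2^32.
 * As in the original, a and b silently wrap if the inputs are too large.
 */
uint32_t udelay_loops(uint32_t usecs, uint32_t lpj, uint32_t hz)
{
    uint32_t a = usecs * (19u * hz);
    uint32_t b = lpj * 226u;

    return mulhwu_emul(a, b);
}

/* Same bound as __MAX_UDELAY: largest usecs for which a fits in 32 bits. */
uint32_t max_udelay(uint32_t hz)
{
    return 226050910u / hz;
}
```

With loops_per_jiffy = 3500 and HZ = 1000, udelay_loops(1000, ...) gives 3499 loops, and max_udelay(1000) is 226050, that is, roughly 226 ms as the longest supported udelay argument at that HZ.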

And for the ndelay calculation, the code looks like this:

#define __MAX_NDELAY    (4294967295UL/HZ)   /* maximum ndelay argument */

extern __inline__ void __ndelay(unsigned int x)
{
    unsigned int loops;

    __asm__("mulhwu %0,%1,%2" : "=r" (loops) :
            "r" (x), "r" (loops_per_jiffy * 5));
    __delay(loops);
}

#define ndelay(n) (__builtin_constant_p(n)? \
    ((n) > __MAX_NDELAY? __bad_ndelay(): __ndelay((n) * HZ)) : \
    __ndelay((n) * HZ))

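The only thing that needs to be observed here is the scaling constant: 2^32/10^6 is about 4294, while 2^32/10^9 is about 4.294. This is the only change that occurs. a and b are again separated so that the maximum ndelay argument can be supported: a = nsecs * HZ and b = loops_per_jiffy * 5. loops_per_jiffy is multiplied by 5 (the ceiling of 4.294), so the computed loop count errs on the long side (by roughly 16%) rather than the short side.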
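The effect of rounding 4.294 up to 5 can be checked with a small user-space sketch. As before, this is only an emulation: ndelay_loops is a hypothetical name, and the 64-bit multiply replaces mulhwu.

```c
#include <stdint.h>

/*
 * Mirrors the split used by ndelay()/__ndelay():
 *   a = nsecs * HZ, b = loops_per_jiffy * 5, loops = (a * b) / 2^32.
 * The constant 5 stands in for 2^32 / 10^9 = 4.294..., rounded up.
 */
uint32_t ndelay_loops(uint32_t nsecs, uint32_t lpj, uint32_t hz)
{
    uint32_t a = nsecs * hz;
    uint32_t b = lpj * 5u;

    return (uint32_t)(((uint64_t)a * b) >> 32);
}
```

With loops_per_jiffy = 3500 and HZ = 1000, a request of 1000000 nsecs yields 4074 loops, where the exact value would be 3500. A request of 100 nsecs yields 0 loops, since the exact value (0.35) is below one loop.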