An Easy Way to Generate Aliases in Bash

I was looking at this quick bash tip and found that there is a better way to create aliases, than what is mentioned in the article:

function z(){
   varName=$1
   cwd=`pwd`
   eval alias $varName=”cd\ $cwd”
}

so, its usage will then become, say, “z webd” which will take the current dir and assign an alias webd to cd to it. Instead of remembering g1, g2 etc., we can remember the aliases with meaningful names. What do you think?

Loops Per Jiffy to mdelay, udelay and ndelay

What do we do with LPJ? You can use that value and create delays in execution without sleeping (or blocking). But, what if we want to sleep for lesser durations? Like microseconds or nanoseconds? Then we need to do some arithmetic.

loops_per_jiffy = x;
loops_per_sec = x * HZ (there are HZ jiffies in a sec)
loops_per_usec = x * HZ * (1/1000000)
loops_per_n_usec = n * loops_per_usec

The basic problem with this is – loops_per_usec falls below zero and only integer operations are fast and it becomes zero.  Lets take an example:

loops_per_jiffy = 3500
HZ = 1000
loops_per_usec = 3500 * 1000 * (1/1000000) = 3.5

There is one problem with the above calculation. Linux does not use floating point arithmetic. Because floating point arithmetic is slow. If we use integer arithmetic – then we will have loops_per_usec as zero because of the (1/1000000) component.

So, how to make it non-zero? The path taken by kernel is as follows: it multiplies loops_per_usec with 2^32.  Because of that, whole product moves left by 32 bit and then you take the most significant 32 bits.

The asm-ppc has this in its code:

/*
 * Note that 19 * 226 == 4294 ==~ 2^32 / 10^6, so
 * loops = (4294 * usecs * loops_per_jiffy * HZ) / 2^32.
 *
 * The mulhwu instruction gives us loops = (a * b) / 2^32.
 * We choose a = usecs * 19 * HZ and b = loops_per_jiffy * 226
 * because this lets us support a wide range of HZ and
 * loops_per_jiffy values without either a or b overflowing 2^32.
 * Thus we need usecs * HZ <= (2^32 - 1) / 19 = 226050910 and
 * loops_per_jiffy <= (2^32 - 1) / 226 = 19004280
 * (which corresponds to ~3800 bogomips at HZ = 100).
 *  -- paulus
 */
#define __MAX_UDELAY    (226050910UL/HZ)    /* maximum udelay argument */

extern __inline__ void __udelay(unsigned int x)
{
    unsigned int loops;

    __asm__("mulhwu %0,%1,%2" : "=r" (loops) :
        "r" (x), "r" (loops_per_jiffy * 226));
    __delay(loops);
}

#define udelay(n) (__builtin_constant_p(n)? \
    ((n) > __MAX_UDELAY? __bad_udelay(): __udelay((n) * (19 * HZ))) : \
    __udelay((n) * (19 * HZ)))

PPC’s mulhw instruction takes two numbers, multiplies them and divides them by 2^32. So, the code takes all the numbers and segregates them into a and b such that a and b does not overflow for the maximum possible value of udelay. With these constraints, __MAX_UDELAY is defined. So, the __udelay() function then taken the number of usecs, makes sanity checks and then passes on “a” as the parameter to the function. The function already has “b” with it, takes a and uses mulhw to get us the loops which is then passed to __delay function to produce so much delay.

And for the ndelay calculation, the code looks like this:

#define __MAX_NDELAY    (4294967295UL/HZ)   /* maximum ndelay argument */

extern __inline__ void __ndelay(unsigned int x)
{
    unsigned int loops;

    __asm__("mulhwu %0,%1,%2" : "=r" (loops) :
        "r" (x), "r" (loops_per_jiffy * 5));
    __delay(loops);
}

#define ndelay(n) (__builtin_constant_p(n)? \
    ((n) > __MAX_NDELAY? __bad_ndelay(): __ndelay((n) * HZ)) : \
    __ndelay((n) * HZ))

The only thing that needs to be observed here is that – since 2^32/10^6 is 4294, 2^32/10^9 is 4.2. This is the only change that occurs. a and b are again separated so that max ndelay can be achieved. loops_per_jiffy is now multiplied with 5 (ceiling of 4.294).

Calibrating Loops Per Jiffy in Linux Kernel

I was going through the kernel code of calibrating the delay loop and was just amazed at the beauty of it. Thought, would share my joy here. Here is the source code for the warm up before calculation:

        while ((loops_per_jiffy <<= 1) != 0) {
            /* wait for "start of" clock tick */
            ticks = jiffies;
            while (ticks == jiffies)
                /* nothing */;
            /* Go .. */
            ticks = jiffies;
            __delay(loops_per_jiffy);
            ticks = jiffies - ticks;
            if (ticks)
                break;
        }

The above loop tries to set the most possible significant bit in loops_per_jiffy variable such that the resultant number (loops_per_jiffy) when passed to __delay function, will cause the __delay to take more than one jiffy to execute. In other words, starting from LSB, it moves the bit to the left (in effect doubling loops_per_jiffy) in every iteration such until this number is big enough to cause __delay function to take more than one jiffy. The void while loop inside is basically to let go of the current jiffy – that is, to start at the beginning of the next jiffy time window, by letting go of the current jiffy window (we don’t know where in the current jiffy window we are so if we start __delay now, then our loops_per_jiffy calculation is skewed).

Lets take a small notation here: lpj_x is loops_per_jiffy variable with only bit x set. And LPJ be the closest value of the variable that can cause the __delay function to lapse exactly jiffy unit of time.

When we come out of loop, the following assertion is true: lpj_x causes __delay to cross a jiffy and lpj_(x-1) does not. Does this mean that lpj_x is the exact delay that can cause a jiffy to pass? No. That exact LPJ is somewhere between lpj_x and lpj_(x-1). So, starting from here we have to find LPJ further.

And here in comes another piece of beauty.

        loops_per_jiffy >>= 1;
        loopbit = loops_per_jiffy;
        while (lps_precision-- && (loopbit >>= 1)) {
            loops_per_jiffy |= loopbit;
            ticks = jiffies;
            while (ticks == jiffies)
                /* nothing */;
            ticks = jiffies;
            __delay(loops_per_jiffy);
            if (jiffies != ticks)   /* longer than 1 tick */
                loops_per_jiffy &= ~loopbit;
        }

Since lpj_x is a loose upper bound on LPJ, we proceed to calculate a tighter upper bound. The above loop does it. To understand this, consider x to be 3. Then lpj_3 8 – it is 1000 in binary. And lpj_2 4 is 100. The more precise LPJ is between lpj_3 and lpj_2. We would start from 4 and proceed to 8. We start from lpj_2 and set further lsbs and see it comes close. That is, we start with 4, make it 6 and then 7 (100, 110 and 111). Although we miss out on some numbers, this is apparently considered to be good enough an approximation.

That is what the loop does exactly. As usual, the inner void while loop is to align ourselves to the beginning of a jiffy window.

At the end, we will have lpj close to LPJ.

The only other magic sauce here seems to be __delay. It is just an instruction that is used to some CPU intensive (or CPU only) operation for so many iterations. We use that as a tool to tune loops_per_jiffy.

SYN-Cookies

I found a nice thing mentioned in the comments section of a blog entry which talks about asynchronous network communication for better performance:

Nkiller2 uses the technique of client syn-cookies to keep track of the SYNs it sends without any additional memory overhead. Essentially, it encodes the quadruple { src port, src IP address, dst port, dst IP address } along with a secret key into the TCP sequence field and upon receiving a SYN-ACK it can deduce whether or not this belonged to a SYN it previously sent by subtracting 1 from the TCP ACK field, and checking the number against the current packet’s reencoded quadruple.

The idea of treating the sequence number as a place holder for keeping custom handles is cool. :-)

This is possible only with SYNs. Because the ack that gets exchanged during the SYN will just increment sequentially after that. They cannot have random values once the intial ack is negotiated.

The article also highlights some important things which are unknown to me earlier:

  • How do you find the list of DNS servers out there on the web? – by iterating through all possible IP addresses and sending a DNS request packet to it. If it comes, good enough, note it down.
  • How to do things parallely? – by using asynchronous communication. It also improves throughput/performance. For example, in the above case, we cannot traverse the IP address space sequentially. We send it asynchronously – by creating as many sockets as possible and sending out a DNS packet to them.

Comments on Haskell

From Slashdot:

The thing about functional languages, and strict lazy functional languages like Haskell, is that the underlying principles are quite different from procedural languages like C. In C, you tell the computer to do things. In Haskell, you tell the computer the relationships between things, and it figures out what to do all on its own.

Personally, I suck at Haskell — I’m too much of a procedural programmer. My mind’s stuck in the rails of doing thing procedurally. But I’d very much like to learn it more, *because* it will teach me different ways of thinking about problems. If I can think of an ethernet router as a mapping between an input and output stream of packets rather than as a sequence of discrete events that get processed sequentially, then it may well encourage me to solve the problem in a some better way.

Hascal, and other functional languages may be good for multi-core development. However not to many programmers program in them… Plus I find they do not scale well for larger application. Its good for true computing problem solving. But today most developopment is for larger application which doesn’t necessarly solve problems per-say but create a tool that people can use.

The truth is that the vast majority of the software out there does pretty dull, mostly procedural jobs. That’s why the main languages in use are just dull variations on the procedural, C/Java/Perl style. No matter how much maths geeks go on about functional programming, procedural systems will always be more suited and easier to use for most of the problems out there.

The point is that there’s nothing those languages can do that can’t be done, often more easily, with the current crop of popular languages. Elegance cannot beat convenience in the workplace, or in most at any rate.

Your real problem with Haskell is that it is more complex per written token, and so you have to think more per token. Most people seem to generate some inner fear for things they don’t understand as good as they expect. And that’s the base of all your motivation to find reasons why you dislike Haskell. Of course you could simplify it, and get something like Python. But this is a bad idea on the long run, because then nature will only create bigger idiots. It’s better to wise up a bit, because what you get then, is really really nice!

A pure functional language would be a language where there are no side effects. I.e., you can’t change the state of anything, you can only construct new things out of existing things. As this gives some problems with IO, Haskell, taking purity to the extreme, had to wait for the invention of Monads to be able to do IO. Yes, Haskell was not capable of IO (reading/writing) for years. Functional languages follow this pattern: side-effects are only permitted if there is no other way. Examples are Lisp and Scheme, but also Matlab, Mathematica and Scala. Other languages allow side-effects by default, and have functional aspects in other respects. Examples of these are Javascript, Ruby, Python, Java, C++, C and even assembler (programming without any functional aspects is going to be hard). Quite likely Javascript programming can be done almost purely functionally. But so can C.

Don’t be ridiculous. Functional vs procedural isn’t a matter of intelligence. It’s simply a way of thinking. And the reality is that procedural languages better match the way the human mind works.

IMHO, learning to program in a functional style is like a right-handed person learning to write with their left. Yeah, they can do it, but it requires a ton of work for dubious real-world benefit, and in the end, it’s never really natural, simply because that’s not the way the brain is wired (except for the odd freakish exception ;) .

Some More Perl Facts

  • In Perl, every parameter is passed by reference. If we convert the implicit @_ array to named parameters, that’s when we switch from call by reference to call by value, which is obvious. Literals, if any, that are passed to a function are strictly read-only. Any attempt to modify them will result in an exception.
  • The return value of a function is the value of the last expression evaluated.
  • If an array or hash is passed to a subroutine, then they will be list interpolated. That means that the array contents or hash contents are just expanded and passed to the subroutine. In that sub routine, the array or hash cannot be accessed as a single entity, however the values modified are still reflected.
  • There is a keyword in Perl called wantarray that will be true (in Perl sense) if the sub routine is called in a list context, false otherwise.
  • Just like the way in which the input parameters are list interpolated, the returned parameters are also flat, so assigning them to multiple arrays is a wrong thing to do. Everything will go to the first array.
  • The & thing has more confusion to it than one can expect. When a function is called like &foo;, the parameter list of the caller is passed to the function. However, when it is called like &foo(), the parameter list is empty. The & thing should be placed as a prefix to the function whenever the function name is used in places like defined, undefined etc. Also the presence of & means that type checking will be disabled.

I think that finding and understanding the basic rules of a programming language is an essential thing to do rather than remembering the rest of the gory details. Because the latter follows from the former and the former is, in general, more compact and can easily fit inside one’s head. :-)

The Zen of Python

It would rather be the gist of Python:

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren’t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one– and preferably only one –obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it’s a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea — let’s do more of those!

Strange C++ Code

I found this piece of code:

#include <iostream>
using namespace std;
int main() {
   const int a = 10;
   int *b = (int *)&a;
  *b = 20;
   cout << a << ‘\n’ << *b << endl;
   return 0;
}

The output of the above code is 10, 20. Because the compiler, upon finding a constant, does compile time optimizations. Casting away constness is undefined and anything can happen (like sth mentioned above).

Dijsktra in “The Humble Programmer”

Dijkstra in his Turing lecture The Humble Programmer, says/writes some classic things. First, he tells us that the two most popular opinions about programming are not worth the salt:

  • That to program, one has to be puzzle minded.
  • That programming is essentially all about optimizing the computational process.

He says that, in the earlier days, the automatic computer (a term that is not recognized now) was mild in its power and so programming was a small problem. But now they have become gigantic in power and so programming has also become a gigantic problem:

In this sense the electronic industry has not solved a single problem, it has only created them, it has created the problem of using its products. To put it in another way: as the power of available machines grew by a factor of more than a thousand, society’s ambition to apply these machines grew in proportion, and it was the poor programmer who found his job in this exploded field of tension between ends and means.

He says some profound words:

…I have the feeling that one of the most important aspects of any computing tool is its influence on the thinking habits of those that try to use it…

He comments about the software development process in the economic context that:

…as long as machines were the largest item on the budget, the programming profession could get away with its clumsy techniques, but that umbrella will fold very rapidly.

And regarding proving a program’s correctness, he says:

..We all know that the only mental tool by means of which a very finite piece of reasoning can cover a myriad cases is called abstraction; as a result the effective exploitation of his powers of abstraction must be regarded as one of the most vital activities of a competent programmer. In this connection it might be worth-while to point out that the purpose of abstracting is not to be vague, but to create a new semantic level in which one can be absolutely precise. Of course I have tried to find a fundamental cause that would prevent our abstraction mechanisms from being sufficiently effective. But no matter how hard I tried, I did not find such a cause. As a result I tend to the assumption -up till now not disproved by experience- that by suitable application of our powers of abstraction, the intellectual effort needed to conceive or to understand a program need not grow more than proportional to program length.

Finally he concludes:

It has already taught us a few lessons and the one I have chosen to stress in this talk is the following. We shall do a much better programming job, provided that we approach the task with a full appreciation of its tremendous difficulty, provided that we stick to modest and elegant programming languages, provided that we respect the intrinsic limitations of the human mind and approach the task as Very Humble Programmers.

Finding Whether a SLL is circular or not.

SLL – Singly Linked List.

We all know the obvious one:

List *ptr1;

if (!curPtr) return 1; //An empty list is circular or non-circular?

ptr1 = curPtr->next;

while (ptr2 != NULL) {

if(ptr1 == ptr2) return 0;

ptr2 = ptr2->next;

}

return 1;

In this case, if the list has n elements and the list is circular, then there will be n pointer dereferences.

I have another one in mind: instead of traversing ptr1 in single steps, traverse it in double steps – so we will loop over fast:

ptr1=curPtr->next;

while(ptr1 != NULL) {

if(ptr1 == curPtr || ptr1->next == curPtr) return 0;

if(ptr1->next) ptr1 = ptr1->next->next;

else ptr1 = NULL;

}

return 1

Although I wonder if this is any better than the above one. At best, this is some loop unrolling I think. :-)

I came to know about another mechanism. Instead of having a single pointer, you keep two pointers. You make the second pointer traverse fast – that is instead of walking past single node, you walk past two nodes at a time – the basic idea being if it is a circular list, you will quickly loop over and come back to the slow moving pointer. Even if it is not a circular pointer, you will fall off the list quickly.

The interesting thing here is analysis – first question to ask is – will they meet eventually?, because it may happen that the fast walking pointer might skip the slow walking pointer in a hurry. So this needs some proof that both will meet.

So here is such an argument that they will eventually meet: after the slow and the fast pointers (so to speak) are initialized to i and i+1, it will take exactly n-1 iterations for them to meet. To make sense of it, consider this: after n-1 iterations, the slow pointer will be at (i+n-1)%n th position – which is equivalent to (i-1)%n. The fast pointer will be at:

(i+1 + 2(n-1))%n

= (i+1+2n -2)%n

= (i-1+2n)%n

= (i-1)%n th position.

Hence after n-1 iterations, both will be at (i-1)%n th position which means they both meet. But, this is slower than the alternative where we keep the first pointer static and move only the second pointer.

« Older entries

Follow

Get every new post delivered to your Inbox.