On Formal Reasoning

As I am going through Introduction to Algorithms course from MIT, just past the first lecture, we can see the importance of formally proving an algorithm correct. If an algorithm is intuitively incorrect at first, then the need for formal reasoning becomes even more important.

Coming to the 2-d peak finding algorithm, here is a semi-formal reasoning:

  • When we find a global maximum in a column and shift to either left (right), it is because this global maximum is less than the left (right) element. So, if there is a new global maximum in this column, then it is definitely more than all the elements in the previous column because it is more than the left (element) itself. So, the algorithm works.
  • Will the same argument apply if we find a 1-D peak in a column instead of a global maximum? Does not look to me like so…

Theta, O and Omega…

The first recitation lecture of Introduction to algorithms gives a different introduction to the asymptotic complexity.

  • Theta (n) is when you can get a function g(x) and constants m and n such that our candidate complexity function f(x) is between m.g(x) and n.g(x) – i.e Theta(n) can be said if we can find a function such that it can serve as both tight upper and lower bounds of the candidate function
  • When such a function is not possible (lets say we have a sin(x) in the function, per lecture), then we can only find a function g(x) such that f(x) is not more than m.g(x) (there is no “not less than” clause) here. When this is the case, we have O(n)
  • When there are both lower bounds and upper bounds for the candidate function f(x), but the lower and upper bounds are not with in a constant factor away, then Theta cannot be used. We can use Omega(lb(x)) or O(ub(x)) – where lb(x) and ub(x) are the lower and the upper bound functions in x respectively.

Divide and Conquer Approach to Peak Find

I was listening to the peak find lecture from MIT and was confused as to why divide and conquer approach would work as the array is not sorted. It took me some time and a few examples to understand what is going on.

  • The comparison at the mid point of array is not same as that of searching, but to check for the peak
  • When we branch to the left (or right), we have already checked one side for the property of peakness, the other side remains to be checked.
    • if the left (right) element is the only element, then it is the peak (trivial case)
    • if not, we further sub-divide the problem and continue on the pieces with the same logic.

Basically, by having “greater than or equal to” in the definition of the peak, we are guaranteed to find a peak in any given array.


Linux Parameters to Tune for Large Concurrent Connections

MigratoryData has achieved 12 million concurrent connections, that to with a JVM!! Time to rethink about JVM. Some Linux parameters that are tuned are:

  • The maximum number of open files the system can handle (by tuning /proc/sys/fs/nr_open)
  • The maximum number of open files a process can have (by tuning with “ulimit -n”)
  • Increase the number of ephemeral ports on a box (the default being from 32768 to 61000) from “500 to 65535”, with “sysctl -w net.ipv4.ip_local_port_range”. On the server side, it is the same listening address and the listening port that is used to pair with each client end point address.
    • Once a connection is established, the quadruple that is used to uniquely identify a connection is <serverIP, serverListeningPort, clientIp, clientEphemeralPort>
    • We dont need so many port numbers on the server side, we need only so many open sockets
  • Reuse TCP’s TIME_WAIT sockets with “/proc/sys/net/ipv4/tcp_tw_recycle”
  • Amount of memory allocated to tcp/udp with “sysctl -w net.ipv4.[tcp|udp]_mem”
  • Balance the interrupt processing by changing the smp affinity of each tx/rx queue (now is the time to read more about the Intel 10G ethernet adapter) to a different core (writing the bit to /proc/irq/<interruptNum>/smp_affinity

One thing to note is that with all this connections, CPU was not the bottleneck, but memory was. In fact, CPU was still at around 47% when the server reached 12M connections, but memory reached its brim.

Besides, some JVM tuning is also done, which is out of scope of my mind. :-). All in all, they broke the C10M benchmark with a stock kernel properly tuned!! Which is a magnificient achievement.


Custom ASICs or General Purpose ASICs For Networking?

If you read some papers such as this, you will see that it is a bit hard to beat custom ASICs for packet routing at line rate when compared to general purpose x86 ASICs. So, how do we make the networking world more open like the server world, given this constraint? If we see the x86 world, what is standardized is an ISA (Instruction Set Architecture) on top of which layers upon layers of multi-source stacks have been built.

Is it possible to standardize such an ISA for custom ASICs? Even though they are custom, all of the ASICs (whether they are closed ASICs from Cisco, Brocade or commodity ASICs from Broadcom, Cavium etc.) do one specific thing – network related processing. Given this, is it not possible to standardize the ISA? May be it is not so straight forward to achieve this, so people ended up with things such as openflow to standardize the interface to the hardware layer.

Anyways, the article linked above (by James Hamilton) gives a very good contrast of what happened in the server world and where we are in the networking world and how it is moving). As Cumulus guys said, history does not repeat, but does rhyme! 🙂



This article gives a very nice summary of what are the pros and cons of different I/O multiplexing mechanisms provided by POSIX operating systems (like Linux/FreeBSD)

We will assume here, select is level 0 to start with (and unfortunately, it is still being used by tons of legacy code). The basic problems of select are:

  • A fixed size bitmap with which you can specify the fds
  • Stateless – every time one has to set all the interested fds, as once select returns, only active (result) fds will be set
  • Scan through all the fds to see which are set (both by the process when select returns, as well as by the kernel when select is called) – doing unnecessary work

which poll tries to solve to some extent are:

  • A fixed size bitmap is replaced by a variable array
  • Two separate fields are provided to track interest and the result – i.e. interest is given through one field and result is returned in another field – the only advantage here is that we don’t need to re-initialize all the interest set every time we call poll
  • Kernel does not need to scan through all the fds as only the interested fds are passed on to it (no bitmap, just an array of interested fds). However, when the select returns, the process will still have to scan through all the fds to find which is active


  • poll is still stateless – interest fds are to be copied everytime – from user space to kernel space and vice versa

Then came along epoll:

  • It made a transition to stateful event tracking – programs register an fd once and it will be tracked until program wishes to remove it.
  • So, everytime there is no need to send a list of interested events to the kernel and neither the kernel needs to scan the list of interested events everytime, it is already maintained.
  • A separate function blocks waiting for any event to happen and when it returns, the process scans only through the result events, but not through all events (like poll does)

Then came along kqueue:

  • kqueue is even more generic framework where not just sockets, but timers, signals, regular files, processes etc. can be kept and tracked for events.
  • The framework is such that it is easily extensible to any future events that may get implemented.
  • In other words, kqueue is one stop call for tracking all kinds of events. Just that it is not available in Linux yet, only in FreeBSD. 🙂