Posted in software

## Installing and Using Perf tool

To Install perf:

``apt-get install linux-tools-common``
``apt-get install `linux-tools-generic`
``apt-get install `linux-tools-`uname -r``

To start monitoring a service using perf (for 10 seconds): (link from Brendan Gregg)

perf record  -p PID sleep 10

To report usage

perf report

Posted in software

## PCI – Lanes/Express etc.

This article gives a very good introduction about what PCI “lanes” are, what is the difference between raw speed and data speed and how the old and new standards fare w.r.t speed.

Posted in software

## The Three Types of Data Centers..

This blog post does a nice job in summarizing the difference between three data centers based on various parameters. I have summarized the same as a table:

Posted in software

## On Formal Reasoning

As I am going through Introduction to Algorithms course from MIT, just past the first lecture, we can see the importance of formally proving an algorithm correct. If an algorithm is intuitively incorrect at first, then the need for formal reasoning becomes even more important.

Coming to the 2-d peak finding algorithm, here is a semi-formal reasoning:

• When we find a global maximum in a column and shift to either left (right), it is because this global maximum is less than the left (right) element. So, if there is a new global maximum in this column, then it is definitely more than all the elements in the previous column because it is more than the left (element) itself. So, the algorithm works.
• Will the same argument apply if we find a 1-D peak in a column instead of a global maximum? Does not look to me like so…
Posted in software

## Theta, O and Omega…

The first recitation lecture of Introduction to algorithms gives a different introduction to the asymptotic complexity.

• Theta (n) is when you can get a function g(x) and constants m and n such that our candidate complexity function f(x) is between m.g(x) and n.g(x) – i.e Theta(n) can be said if we can find a function such that it can serve as both tight upper and lower bounds of the candidate function
• When such a function is not possible (lets say we have a sin(x) in the function, per lecture), then we can only find a function g(x) such that f(x) is not more than m.g(x) (there is no “not less than” clause) here. When this is the case, we have O(n)
• When there are both lower bounds and upper bounds for the candidate function f(x), but the lower and upper bounds are not with in a constant factor away, then Theta cannot be used. We can use Omega(lb(x)) or O(ub(x)) – where lb(x) and ub(x) are the lower and the upper bound functions in x respectively.
Posted in software

## Divide and Conquer Approach to Peak Find

I was listening to the peak find lecture from MIT and was confused as to why divide and conquer approach would work as the array is not sorted. It took me some time and a few examples to understand what is going on.

• The comparison at the mid point of array is not same as that of searching, but to check for the peak
• When we branch to the left (or right), we have already checked one side for the property of peakness, the other side remains to be checked.
• if the left (right) element is the only element, then it is the peak (trivial case)
• if not, we further sub-divide the problem and continue on the pieces with the same logic.

Basically, by having “greater than or equal to” in the definition of the peak, we are guaranteed to find a peak in any given array.

Posted in software

## Linux Parameters to Tune for Large Concurrent Connections

MigratoryData has achieved 12 million concurrent connections, that to with a JVM!! Time to rethink about JVM. Some Linux parameters that are tuned are:

• The maximum number of open files the system can handle (by tuning /proc/sys/fs/nr_open)
• The maximum number of open files a process can have (by tuning with “ulimit -n”)
• Increase the number of ephemeral ports on a box (the default being from 32768 to 61000) from “500 to 65535”, with “sysctl -w net.ipv4.ip_local_port_range”. On the server side, it is the same listening address and the listening port that is used to pair with each client end point address.
• Once a connection is established, the quadruple that is used to uniquely identify a connection is <serverIP, serverListeningPort, clientIp, clientEphemeralPort>
• We dont need so many port numbers on the server side, we need only so many open sockets
• Reuse TCP’s TIME_WAIT sockets with “/proc/sys/net/ipv4/tcp_tw_recycle”
• Amount of memory allocated to tcp/udp with “sysctl -w net.ipv4.[tcp|udp]_mem”
• Balance the interrupt processing by changing the smp affinity of each tx/rx queue (now is the time to read more about the Intel 10G ethernet adapter) to a different core (writing the bit to /proc/irq/<interruptNum>/smp_affinity

One thing to note is that with all this connections, CPU was not the bottleneck, but memory was. In fact, CPU was still at around 47% when the server reached 12M connections, but memory reached its brim.

Besides, some JVM tuning is also done, which is out of scope of my mind. :-). All in all, they broke the C10M benchmark with a stock kernel properly tuned!! Which is a magnificient achievement.

Posted in software

## Custom ASICs or General Purpose ASICs For Networking?

If you read some papers such as this, you will see that it is a bit hard to beat custom ASICs for packet routing at line rate when compared to general purpose x86 ASICs. So, how do we make the networking world more open like the server world, given this constraint? If we see the x86 world, what is standardized is an ISA (Instruction Set Architecture) on top of which layers upon layers of multi-source stacks have been built.

Is it possible to standardize such an ISA for custom ASICs? Even though they are custom, all of the ASICs (whether they are closed ASICs from Cisco, Brocade or commodity ASICs from Broadcom, Cavium etc.) do one specific thing – network related processing. Given this, is it not possible to standardize the ISA? May be it is not so straight forward to achieve this, so people ended up with things such as openflow to standardize the interface to the hardware layer.

Anyways, the article linked above (by James Hamilton) gives a very good contrast of what happened in the server world and where we are in the networking world and how it is moving). As Cumulus guys said, history does not repeat, but does rhyme! 🙂