Custom ASICs or General Purpose ASICs For Networking?

If you read papers such as this one, you will see that it is hard for general purpose x86 processors to match custom ASICs at routing packets at line rate. So, given this constraint, how do we make the networking world as open as the server world? In the x86 world, what is standardized is the ISA (Instruction Set Architecture), on top of which layer upon layer of multi-source stacks has been built.

Is it possible to standardize such an ISA for custom ASICs? Even though they are custom, all of these ASICs (whether closed ASICs from Cisco and Brocade or commodity ASICs from Broadcom, Cavium, etc.) do one specific thing: network processing. Given this, is it not possible to standardize the ISA? Maybe it is not so straightforward to achieve, which is why people ended up with things such as OpenFlow to standardize the interface to the hardware layer instead.

Anyway, the article linked above (by James Hamilton) gives a very good contrast between what happened in the server world and where the networking world is today and where it is heading. As the Cumulus guys said, history does not repeat, but it does rhyme! 🙂

select/poll/epoll/kqueue

This article gives a very nice summary of the pros and cons of the different I/O multiplexing mechanisms provided by Unix-like operating systems (such as Linux and FreeBSD).

Let us take select as level 0 to start with (unfortunately, it is still used by tons of legacy code). The basic problems of select are:

  • A fixed-size bitmap (fd_set) is used to specify the fds, so only descriptors below FD_SETSIZE (typically 1024) can be watched
  • It is stateless: the interest set has to be rebuilt before every call, because select overwrites the bitmaps in place and only the active (result) fds remain set when it returns
  • All the fds have to be scanned to see which are set, both by the kernel when select is called and by the process when select returns, which is unnecessary work
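
To make these problems concrete, here is a minimal sketch using Python's `select` module, which wraps the system call closely. It assumes a Unix-like system and uses `socket.socketpair()` only to get a few connected descriptors to watch; the helper name `wait_readable_select` is made up for illustration.

```python
import select
import socket


def wait_readable_select(socks, timeout=1.0):
    """Return the sockets in `socks` that are readable, using select()."""
    # select() keeps no state between calls: the whole interest list is
    # handed over on every call. On Unix, CPython packs it into an fd_set
    # bitmap (raising ValueError for fds >= FD_SETSIZE) and, once the kernel
    # has overwritten that bitmap with the results, checks every entry again
    # to build the returned list.
    readable, _writable, _exceptional = select.select(socks, [], [], timeout)
    return readable


# Three connected socket pairs; only the second pair has pending data.
pairs = [socket.socketpair() for _ in range(3)]
pairs[1][1].sendall(b"ping")
readers = [r for r, _w in pairs]

ready = wait_readable_select(readers)
print([readers.index(s) for s in ready])  # prints [1]
```

Every call passes the full `readers` list again; nothing is remembered between calls.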

poll tries to solve these to some extent:

  • The fixed-size bitmap is replaced by a variable-length array (of struct pollfd)
  • Two separate fields are provided to track interest and result: interest is given in one field (events) and the result is returned in another (revents). The only advantage here is that we don't need to re-initialize the whole interest set every time we call poll
  • The kernel does not need to scan through all possible fds, as only the interested fds are passed to it (no bitmap, just an array of interested fds). However, when poll returns, the process still has to scan through all the array entries to find which ones are active

However:

  • poll is still stateless: the interest array has to be copied from user space to kernel space on every call, and the results copied back
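
The same scenario with poll, again as a minimal Python sketch on a Unix-like system. Python's `poll` object hides the array-building behind `register()`, but the kernel still receives the whole array on every call.

```python
import select
import socket

# Three connected socket pairs; only the third pair has pending data.
pairs = [socket.socketpair() for _ in range(3)]
pairs[2][1].sendall(b"ping")

poller = select.poll()
for reader, _writer in pairs:
    # The interest mask goes into the `events` field of a struct pollfd.
    poller.register(reader, select.POLLIN)

# CPython keeps the pollfd array in user space, but the whole array is still
# copied into the kernel on every poll() call. Results come back through the
# `revents` field, reported here as (fd, revents) pairs for entries with a
# non-zero revents. The timeout is in milliseconds.
events = poller.poll(1000)
ready_fds = [fd for fd, revents in events if revents & select.POLLIN]
print(ready_fds == [pairs[2][0].fileno()])  # prints True
```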

Then came epoll (Linux only):

  • It made the transition to stateful event tracking: programs register an fd once and it is tracked until the program removes it.
  • So there is no need to send the list of interested events to the kernel on every call, nor does the kernel need to scan that list every time; it is already maintained in the kernel.
  • A separate call (epoll_wait) blocks waiting for any event to happen, and when it returns, the process scans only through the ready events rather than through all the registered fds (as it must with poll)
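
And with epoll, as a minimal Python sketch (Linux only): the descriptors are registered once, and each wait returns only what is ready. The second wait reuses the same registrations without passing anything in again.

```python
import select
import socket

pairs = [socket.socketpair() for _ in range(3)]

ep = select.epoll()
for reader, _writer in pairs:
    # Registered once (epoll_ctl with EPOLL_CTL_ADD); the kernel keeps the set.
    ep.register(reader.fileno(), select.EPOLLIN)

pairs[0][1].sendall(b"ping")

# epoll_wait returns only the ready descriptors, never the whole interest set.
first = [fd for fd, mask in ep.poll(timeout=1.0) if mask & select.EPOLLIN]

pairs[0][0].recv(4)          # drain the pending data
# Nothing is re-registered; the kernel still holds the interest set, and with
# no data pending nothing is reported (timeout=0 means do not block).
second = ep.poll(timeout=0)
ep.close()
```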

Finally, there is kqueue (which, on the BSDs, actually predates epoll):

  • kqueue is an even more generic framework, where not just sockets but also timers, signals, regular files, processes etc. can be registered and tracked for events.
  • The framework is designed so that it can easily be extended to new kinds of events that may be implemented in the future.
  • In other words, kqueue is a one-stop call for tracking all kinds of events. The catch is that it is not available on Linux; it is found on FreeBSD and the other BSDs, including macOS. 🙂
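
A minimal Python sketch of that generality, assuming FreeBSD or macOS (Python's `select.kqueue` does not exist on Linux): one kqueue watches a socket for readability and also runs a timer, and the same `control()` call reports both kinds of events. The function name `kqueue_demo`, the timer ident `1` and the 50 ms duration are arbitrary choices for the example.

```python
import select
import socket


def kqueue_demo(timer_ms=50):
    """Watch a socket and a timer through a single kqueue (BSD/macOS only)."""
    kq = select.kqueue()
    reader, writer = socket.socketpair()
    changes = [
        # A read filter on the socket; ONESHOT removes it after it fires once.
        select.kevent(reader.fileno(), filter=select.KQ_FILTER_READ,
                      flags=select.KQ_EV_ADD | select.KQ_EV_ONESHOT),
        # A one-shot timer; ident 1 is an arbitrary label, data is in ms.
        select.kevent(1, filter=select.KQ_FILTER_TIMER,
                      flags=select.KQ_EV_ADD | select.KQ_EV_ONESHOT,
                      data=timer_ms),
    ]
    kq.control(changes, 0, 0)   # register both filters, fetch no events yet
    writer.sendall(b"ping")

    seen = set()
    wanted = {select.KQ_FILTER_READ, select.KQ_FILTER_TIMER}
    for _ in range(5):          # each call blocks for at most 1 s
        for ev in kq.control(None, 2, 1.0):
            seen.add(ev.filter)
        if seen >= wanted:
            break
    kq.close()
    reader.close()
    writer.close()
    return seen
```

On Linux the function can be defined but not called; on the BSDs it should report both the read filter and the timer filter.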

 

The MLAG Components

When we want to connect two switches and make them an MLAG pair, the following things should be kept in mind:

  • Both switches should advertise the same LACP system ID to the partner
  • BUM (broadcast, unknown unicast and multicast) traffic, whether northbound or southbound, should not be sent twice to the LACP partner (perhaps some rules in Linux could drop the duplicates?)
  • The MAC tables should be synchronized between the two switches
  • MAC learning should be such that MAC addresses coming from the LACP partner do not bounce (flap) between ports
  • The neighbor table should be kept in sync between the two switches (since it is not purely L3 state; it maps IP addresses to L2 MAC addresses)
  • Since we use LACP, the links between the two switches and the partner must be direct
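
For the BUM rule, one common approach (as far as I know, the one several MLAG implementations use) is an egress filter on the peer link: a BUM frame that arrives over the peer link is not flooded out of a dual-homed bond as long as the peer switch's own link to that partner is up, because the peer has already delivered the frame there. The Python sketch below is a hypothetical model of that flooding decision, not code from any real switch; `Port`, `flood_targets` and the port names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    dual_homed: bool = False   # bond whose partner also connects to the peer switch
    is_peerlink: bool = False  # link between the two MLAG switches


def flood_targets(ingress, ports, peer_member_up):
    """Names of the ports a BUM frame is flooded to on one MLAG switch.

    Hypothetical model: `peer_member_up` maps a dual-homed port name to
    whether the peer switch's link to the same partner is currently up.
    """
    targets = []
    for port in ports:
        if port == ingress:
            continue  # never flood a frame back out of its ingress port
        if (ingress.is_peerlink and port.dual_homed
                and peer_member_up.get(port.name, False)):
            # The peer switch has already flooded this frame to the partner
            # over its own link; sending it again would duplicate it.
            continue
        targets.append(port.name)
    return targets


peerlink = Port("peerlink", is_peerlink=True)
bond1 = Port("bond1", dual_homed=True)
swp5 = Port("swp5")
ports = [peerlink, bond1, swp5]

print(flood_targets(peerlink, ports, {"bond1": True}))   # ['swp5']
print(flood_targets(peerlink, ports, {"bond1": False}))  # ['bond1', 'swp5']
print(flood_targets(swp5, ports, {"bond1": True}))       # ['peerlink', 'bond1']
```

The second case shows why the filter depends on the peer's link state: if the peer has lost its link to the partner, this switch becomes the only path and must flood there.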

There can be many links to the switches, and they can be asymmetrical. When two pairs of switches form an MLAG with each other, the result is a fully meshed network (every switch connects to both switches of the other pair), with the two switches of each pair sharing one system MAC address. When a host sits there instead of a switch pair, the bonding driver takes care of presenting that single system MAC address.

Nice summary here. Apparently, Cumulus Linux uses "clagd" to periodically synchronize the MAC database with the peer switch. Among other things, "clagd" sets up the peering relationship and determines the primary/secondary roles for sending BPDUs etc. There is also a presentation by Cumulus on MLAGs which is a simple but nice read.
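
A rough sketch of what such MAC synchronization has to decide, as a hypothetical Python model (not clagd's actual logic or data format): when the peer reports a MAC learned on a dual-homed bond, the local switch can point that MAC at its own bond to the same partner instead of at the peer link, which also keeps the MAC from bouncing between ports. Identifying the two halves of a bond by a shared MLAG id, and the name `merge_peer_macs`, are assumptions of this sketch.

```python
def merge_peer_macs(local, peer, local_bond_for_id, peerlink="peerlink"):
    """Merge MAC entries reported by the MLAG peer into the local MAC table.

    Hypothetical model, not clagd's actual logic:
      local:             (vlan, mac) -> local port where this switch learned it
      peer:              (vlan, mac) -> MLAG id of the peer's bond that learned
                         it, or None if the peer learned it on a single-homed port
      local_bond_for_id: MLAG id -> name of the local bond with that id (if up)
    """
    merged = dict(local)
    for key, mlag_id in peer.items():
        if key in merged:
            continue  # keep locally learned entries as they are
        if mlag_id in local_bond_for_id:
            # The same partner is attached here too: point at the local bond,
            # so the MAC does not bounce between the bond and the peer link.
            merged[key] = local_bond_for_id[mlag_id]
        else:
            # Only reachable through the peer switch.
            merged[key] = peerlink
    return merged


local = {(10, "00:00:00:00:00:01"): "bond7"}
peer = {
    (10, "00:00:00:00:00:01"): 7,
    (10, "00:00:00:00:00:02"): 7,
    (10, "00:00:00:00:00:03"): None,
    (20, "00:00:00:00:00:04"): 9,   # bond with MLAG id 9 is down locally
}
table = merge_peer_macs(local, peer, {7: "bond7"})
```

Here the MAC ending in 02 lands on the local `bond7`, while 03 (single-homed on the peer) and 04 (its local bond is down) point at the peer link.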

On Maturity of an Industry

here:

..bare-metal networking is more than being affordable; it’s about giving customers degrees of freedom, transparency, and choice that they deserve in a mature industry.

In the dark ages of computing (aka 1983), a customer running IBM DB2 had to buy an IBM mainframe (complete with cables, disks, power distribution, memory, and IO) to go with the application. The compute industry has matured to a point where DB2 runs on hardware ranging from mainframes through p-series down into non-IBM x86 platforms hosted on operating systems including z/OS, Unix, Linux, and Windows. Application independent from OS independent from hardware; degrees of freedom and choice.

We view the natural state of the networking supply chain to be one by which customers are able to purchase their networking hardware as “close to the source” as they’d like, starting at original manufacturers all the way through well known Enterprise IT providers. Regardless of the hardware source, customers are able to deploy the networking software of their choice (CumulusLinux for instance :-). There are zero technical barriers to this model, and we do it around compute every day. This allows customers to make a safe capital investment in infrastructure without being locked into any one vendor; suppliers get to earn their spot every cycle.

The attributes of transparency, choice, and degrees of freedom, not price, are driving all of the mega-scale customers to bare-metal networking solutions, whether they do it in house or leverage companies like Cumulus Networks. Make no mistake, ALL mega-scales have revolted and are somewhere on the path towards independence.

And finally:

Mass availability and access to best-of-breed software and a value chain of features on top of a bare metal switch, as opposed to a proprietary stack pre-integrated custom ASIC solution (which you are stuck with) is NOT about changing the game. It’s what you need to do to STAY in the game. It’s time to scale up, bring the network out of the dark ages, and welcome the unbundled OS.