Posted in software

Programming Tidbits – C

  • When using the getchar function in C to read a character from standard input, store the result in an int, not a char. At end of input getchar returns EOF, a negative int (-1 on typical implementations), and a char cannot reliably tell EOF apart from a valid byte.
  • getline is a handy (POSIX) function that reads a whole line from a given FILE stream (a file or standard input), allocating or growing the buffer as needed.
  • strtok can take a different set of delimiter bytes on each call while tokenizing the same string! Did not know this.
    • A run of two or more consecutive delimiter bytes counts as a single delimiter, and delimiter bytes at the start or end of the string are ignored, so strtok never returns an empty token.
    • When multiple threads tokenize strings, strtok does not work, because it keeps its position in hidden static state. Use strtok_r instead, which takes an extra parameter to store that context.
  • waitpid does not just wait for the child to exit; it waits for the child’s state to change. A child can exit, or be stopped or resumed by a signal, and all of these can be caught.
    • By default, it waits only for child termination, unless options such as WUNTRACED (stopped children) or WCONTINUED (resumed children) are passed to change that.
  • wait is a simpler version of waitpid that blocks until any one of its children terminates; it is equivalent to waitpid(-1, &status, 0).
  • The pid argument of waitpid carries a lot of meaning. If pid is:
    • > 0, it is the pid of a specific child process
    • -1, it means any child process
    • < -1, it means any child process whose process group ID equals the absolute value of pid
    • 0, it means any child process in the caller’s process group
  • WIFEXITED and WIFSIGNALED are two macros that tell whether the child process terminated normally or was killed by a signal (WEXITSTATUS and WTERMSIG then give the exit code or the signal number)
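A small C sketch tying several of the tidbits above together: reading bytes into an int so EOF is detectable, reading lines with getline, tokenizing with strtok_r, and reaping a child with waitpid. It assumes a POSIX system (getline, strtok_r, fork and waitpid are POSIX, not ISO C) and uses fgetc on an arbitrary stream, since getchar() is just getc(stdin). The function names count_bytes, count_tokens and run_child are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* expose getline(), strtok_r(), fork() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Count the bytes in a stream. The result of fgetc() is kept in an int,
   not a char, so that EOF (a negative int) never collides with a byte. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c; /* int, not char */
    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}

/* Count whitespace-separated tokens in a stream. getline() reads lines of
   any length into a buffer it grows as needed; strtok_r() keeps its
   position in the caller-owned `save` pointer instead of hidden state. */
long count_tokens(FILE *fp)
{
    char *line = NULL;
    size_t cap = 0;
    long tokens = 0;
    while (getline(&line, &cap, fp) != -1) {
        char *save = NULL;
        for (char *tok = strtok_r(line, " \t\n", &save); tok != NULL;
             tok = strtok_r(NULL, " \t\n", &save))
            tokens++; /* runs of delimiters never yield empty tokens */
    }
    free(line); /* getline() allocated it */
    return tokens;
}

/* Fork a child that exits with `code`, wait for it, and report how it
   ended. Returns the exit status, 128 + signal number if it was killed
   (a shell-style convention, chosen arbitrarily here), or -1 on error. */
int run_child(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code); /* child: exit immediately, skip stdio flushing */
    int status;
    if (waitpid(pid, &status, 0) == -1) /* options 0: wait for termination */
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

For input like "  alpha beta\n\ngamma  \n", count_tokens should return 3: the leading, trailing and repeated delimiters, and the empty line, produce no empty tokens.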

Core Cost of Doing a Business

A couple of Intel folks gave a talk on the cost of using a core for the business of networking. Some key points to take away are:

  1. For a packet that hits the kernel fast path, about 1900 cycles are spent on average, and they say even this is expensive
  2. When the packet misses the fast path, it takes about 25000 cycles!! That is roughly 13x more costly
  3. Caches do matter! They are game changers.
  4. Even with a hardware accelerator, the worst-case performance is the same as that of a software data plane, because of the misses
  5. Surprisingly, even if we increase the number of cores allocated to handle miss processing, the performance does not increase!!
  6. Moving to OVS+DPDK increased the performance from 1.3 MPPS to 25 MPPS, but it still cannot beat a hardware-accelerated environment. The improvement comes from avoiding frequent context switches between kernel and user space.
  7. Every core that is freed up from network services is a core that goes to the revenue-generating application
  8. Hardware offload of network services on a server is where things are heading. It increases performance, is cost effective and generates more revenue.
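A quick back-of-the-envelope check of the figures above. The 13x ratio comes straight from the talk's cycle counts; the per-core packet rates additionally assume a 2.5 GHz core spending every cycle on packet processing, which is an illustrative assumption and not a number from the talk:

$$
\begin{aligned}
\frac{25000\ \text{cycles}}{1900\ \text{cycles}} &\approx 13.2 \\
\frac{2.5 \times 10^{9}\ \text{cycles/s}}{1900\ \text{cycles/packet}} &\approx 1.3 \times 10^{6}\ \text{packets/s} \\
\frac{2.5 \times 10^{9}\ \text{cycles/s}}{25000\ \text{cycles/packet}} &= 1.0 \times 10^{5}\ \text{packets/s}
\end{aligned}
$$

Likewise, going from 1.3 MPPS to 25 MPPS with OVS+DPDK is roughly a 19x gain (25 / 1.3 ≈ 19.2).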

Soft Skills for Senior Engineers

Hacker News is running a thread on the wisdom (or enhanced knowledge) that senior engineers have acquired over time. Some nuggets, in no particular order:

  • The ability to learn things quickly enough to be useful
  • Understand the customer – the best engineers are half product managers
  • Advanced googling
  • Ability to formulate more creative hypotheses when the obvious lines of investigation run out
  • Be comfortable with more tools
  • Don’t bitch about legacy software – not all of it is avoidable
  • Be thorough
  • Be quantitative

QOTD

From Scott Berkun’s page:

No matter what you do, a bored mind will always find a way to do something else

 

Every device with an off switch is “distraction free” if you choose to flip that switch. Every app has a close button. Take some responsibility. Sure, gadgets and software may help you, but at some point the problem is you, your commitment and your habits


An Exercise for Concentration

From Scott Berkun’s page:

You can sit or lay down. Once you are in position, close your eyes and relax. Just sit there for a minute and think about anything you want. After about a minute, start thinking about the neighborhood or subdivision in which your home is located. In your mind see your neighbors houses, the roads, the streets, the trees. After you can see your neighborhood clearly, move down the road your house is on and see the houses along the way…

When you begin, you will only be able to concentrate for a few seconds or minutes. If you work at it every day, though, you can build your concentration just as you can build the strength in your arms


Monitoring Tools

There are a couple of tools that we use frequently but don’t really know how to use properly. This page explains some quirks of the most commonly used monitoring tools:

  • For top, the following shortcuts are handy:
    • z – make top display colorful!
    • 1 – show per-CPU stats
    • c – show the full command line (path of the executable and its arguments)
    • M – sort the processes by memory consumption
  • For htop, the following shortcuts are handy:
    • l – display the list of open files of a process
    • s – display the strace output of a process (this may need root privileges)
    • Clicking a column header with the mouse sorts the list by that column
  • There is also glances, a tool that monitors pretty much everything. Most importantly, it can export data in various formats!!

The Real Skill of Programming

is not algorithmic wizardry but organizational skill: the ability to manage state, to do error handling properly and so on, says one guy. It makes sense! The better we organize the code, the easier it is to maintain and to keep sane. I have not used any crazy algorithms in my code so far, just a height-balanced tree, a hash table and that’s about it. It is software development skills that one needs to develop and master.


Docstring Guidelines from Python

This PEP (PEP 257) describes how to write docstrings consistently with the rest of the community. Some noteworthy points are:

  • A docstring should be phrased as a command – "Do this", "Return that" – rather than as a description – "Does this", "Returns that"
  • Use triple double quotes even for a one-line docstring
  • Multi-line docstrings have a summary line, then a blank line, then a more elaborate description of the object at hand

More such conventions are mentioned there, worth noting…


On Passwords and Security

Passwords should never be stored in plain text, and on Unix-based systems they are not. Instead, a hash of the password is computed and stored in the shadow file; at login, the entered password is hashed the same way and compared against the stored hash.

But what if the shadow file is stolen? Can somebody recover the passwords by brute force? Of course, yes. Tools like John the Ripper are sophisticated enough to do the job, and they are assisted by lists of frequently used passwords, so common passwords are found quickly. Moreover, general-purpose hashing algorithms are designed to be very fast (no suspicious motives, though!), and that speeds up cracking too: the faster a hash can be computed, the more guesses an attacker can try per second!!

Dropbox engineering added extra layers of security on top of the plain hash stored in the password database. In short, it does the following with the password:

  • Calculate a SHA-512 hash of the password (to normalize the length of the input to the next layer; bcrypt only uses the first 72 bytes of its input)
  • The hash is then hashed again with bcrypt, using a per-user salt and a cost of 10. A cost of 10 makes each computation deliberately slow (about 100 ms, apparently, which means only about 10 per second per core), so brute-force attacks become significantly more expensive even on modern CPUs. (Apparently, the bcrypt command-line tool on Linux just takes the password [salt] and not any cost.) bcrypt is built on the Blowfish cipher, while traditional crypt uses DES.
  • The resulting bcrypt output (bcrypt is built on Blowfish encryption, but its output serves as a one-way hash) is then encrypted with AES-256 using a global key, and the result is stored in the password database.
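The cost factor in bcrypt is a base-2 logarithm: the expensive key-setup loop runs 2^cost times, so each +1 doubles the work. Taking the ~100 ms per hash quoted above for cost 10 (actual timings depend on the hardware), and a hypothetical attacker trying a list of one million candidate passwords on a single core:

$$
\begin{aligned}
\text{cost } 10 &: 2^{10} = 1024 \text{ iterations}, \quad \frac{1\ \text{s}}{100\ \text{ms/hash}} = 10\ \text{hashes/s} \\
\text{cost } 11 &: 2^{11} = 2048 \text{ iterations} \approx 200\ \text{ms/hash} = 5\ \text{hashes/s} \\
\frac{10^{6}\ \text{guesses}}{10\ \text{hashes/s}} &= 10^{5}\ \text{s} \approx 27.8\ \text{hours}
\end{aligned}
$$

Because the salt is per-user, that work has to be repeated for every account.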

With so many password database leaks these days, Dropbox engineering has done serious work to protect users’ passwords. Multiple layers of hashing and encryption, and keeping the encryption key separate from the password database, are the two important things to note here.

While reading this, I got curious about how passwords are stored in the shadow file. How does one find out which algorithm (for example crypt or bcrypt) was used to hash a password? It turns out that this information is encoded within the hashed string itself.

Apparently, passwords are commonly encoded in the Modular Crypt Format, where the stored string looks like $identifier$content: the identifier says which hashing scheme is used and the content holds the actual data (typically the salt and the hash). The content usually consists of characters from the set [A-Za-z0-9./]. Popular schemes are (taken from the above link):

[Image: table of popular hash schemes]

The above schemes are used natively by the OS to authenticate users at login. Applications are known to use some other schemes as well (again from the above link):

[Image: table of application hash schemes]

There is also one more encoding standard: the PHC String Format. The Modular Crypt Format has no standard way to encode parameters such as the cost or other settings; each scheme does its own thing. That’s where the PHC string format comes in: it allows a generic list of param=value pairs alongside the salt and the hash. I haven’t seen the PHC string format in use, at least on the Linux boxes that I work on; it’s mostly the Modular Crypt Format.
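To see how the scheme identifier can be pulled out of a stored hash, here is a small C sketch. It splits a $-delimited string into fields, which works for both MCF strings such as $6$salt$hash and PHC strings such as $argon2id$v=19$m=65536,t=3,p=4$salt$hash. The function names split_hash and mcf_scheme are hypothetical, and the identifier table lists only a few well-known schemes ($1$ MD5-crypt, $2a$/$2b$/$2y$ bcrypt, $5$ SHA-256-crypt, $6$ SHA-512-crypt); it is not a complete registry.

```c
#include <stddef.h>
#include <string.h>

/* Map a Modular Crypt Format identifier to a scheme name.
   Only a few well-known identifiers are listed; not exhaustive. */
const char *mcf_scheme(const char *id)
{
    static const struct { const char *id; const char *name; } known[] = {
        {"1", "md5-crypt"},
        {"2a", "bcrypt"}, {"2b", "bcrypt"}, {"2y", "bcrypt"},
        {"5", "sha256-crypt"},
        {"6", "sha512-crypt"},
    };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(id, known[i].id) == 0)
            return known[i].name;
    return "unknown";
}

/* Split a '$'-delimited hash string (MCF or PHC style) into fields.
   Copies `hash` into `buf` (size `bufsize`) and stores up to `max` field
   pointers in `fields`; the last slot keeps any remainder. Returns the
   number of fields, or -1 if the string does not start with '$' or does
   not fit in `buf`. strchr() is used instead of strtok() so that empty
   fields are preserved rather than skipped. */
int split_hash(const char *hash, char *buf, size_t bufsize,
               char *fields[], int max)
{
    size_t len = strlen(hash);
    if (hash[0] != '$' || len >= bufsize)
        return -1;
    memcpy(buf, hash, len + 1); /* work on a writable copy */
    int n = 0;
    char *p = buf + 1; /* skip the leading '$' */
    while (n < max) {
        fields[n++] = p;
        char *dollar = strchr(p, '$');
        if (dollar == NULL || n == max)
            break;
        *dollar = '\0'; /* terminate this field */
        p = dollar + 1;
    }
    return n;
}
```

For "$6$saltsalt$abcdefg", split_hash yields the fields "6", "saltsalt" and "abcdefg", and mcf_scheme("6") reports sha512-crypt; for the PHC example, the third field is the whole "m=65536,t=3,p=4" parameter list.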

The above link (PHC format) also explains in some detail how the content is base64-encoded. Base64 splits the input into 6-bit groups, so each group has a value from 0 to 63, and each value is mapped to a printable ASCII character. You do the math to convert the bytes to that form and back.
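To make the 6-bit idea concrete, here is a small C sketch of base64 encoding without '=' padding, which is my reading of how the PHC document describes its encoding. It assumes the standard RFC 4648 alphabet (A-Z, a-z, 0-9, +, /), and the function names are hypothetical. MCF schemes such as SHA-512-crypt use their own ./0-9A-Za-z alphabet and byte ordering, so this is not a drop-in encoder for those strings.

```c
#include <stddef.h>
#include <string.h>

/* Standard base64 alphabet (RFC 4648): index = 6-bit value 0..63. */
static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode `len` bytes from `in` into `out` without '=' padding.
   `out` needs room for 4 * ceil(len / 3) + 1 bytes.
   Returns the number of characters written (excluding the NUL). */
size_t b64_encode_nopad(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0, i = 0;
    /* Each full group of 3 bytes (24 bits) becomes 4 six-bit values. */
    for (; i + 2 < len; i += 3) {
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) | in[i + 2];
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = B64[(v >> 6) & 63];
        out[o++] = B64[v & 63];
    }
    /* A trailing 1 or 2 bytes yields 2 or 3 characters, with no '='. */
    if (len - i == 1) {
        unsigned long v = (unsigned long)in[i] << 16;
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
    } else if (len - i == 2) {
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8);
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = B64[(v >> 6) & 63];
    }
    out[o] = '\0';
    return o;
}

/* Convenience wrapper for NUL-terminated input such as a salt string. */
size_t b64_encode_cstr(const char *s, char *out)
{
    return b64_encode_nopad((const unsigned char *)s, strlen(s), out);
}
```

For example, the three bytes of "Man" become "TWFu", while "Ma" becomes "TWE" instead of the padded "TWE=".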

To sum up, Dropbox has gone a step further in making passwords more secure. While the OS way of authenticating users is going to stay the same for some time, how applications securely store passwords is a space open for innovation. 🙂


Architectural Complexity and Fat

This is a bit of an old one, but it is an excellent article on how architectural complexity compares with body fat, why architectural complexity needs to be kept at bay, and some common patterns that increase complexity.

I came to it via the IETF’s article Reflections on Architecture which, among other things, highlights the difference between emergent complexity (which the author says is OK to have) and architectural complexity (which is undesirable and results from poor design decisions, due to laziness or resource pressures).