Posted in software

The Real Skill of Programming

is not algorithmic wizardry, but organizational skill: the ability to manage state, to do error handling properly, and so on. So says one guy. It makes sense! The better we organize the code, the easier it is to maintain and to keep sane. I have not used any crazy algorithms in my code so far, just a height-balanced tree and a hash, and that's about it. It is the software development skills that one needs to develop and master.


DocString guidelines from Python

This PEP (PEP 257) talks about how to write docstrings so that they are consistent with the rest of the community. Some noteworthy points are:

  • A docstring should be phrased as a command ("Do this", "Return that") rather than as a description ("Does this", "Returns that")
  • Use triple double quotes even for a single-line docstring
  • Multiline docstrings have a summary on the first line, then a blank line, and then a more elaborate explanation of the object in hand
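As a small illustration of these three points, here is a sketch with two made-up functions (`scale` and `mean` are hypothetical names, not taken from the PEP):

```python
def scale(values, factor=2):
    """Return a new list with every value multiplied by factor."""
    return [value * factor for value in values]


def mean(values):
    """Return the arithmetic mean of values.

    Raise ValueError if values is empty, since the mean of an empty
    sequence is undefined.
    """
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)
```

Both docstrings read as commands ("Return ...", "Raise ..."), both use triple double quotes, and the multiline one keeps its summary on the first line followed by a blank line.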

More such conventions are mentioned there and are worth noting…


On Passwords and Security

It is a known fact that passwords are never stored in plain text, especially on Unix-based systems. Instead, a hash of the password is calculated and stored in a shadow file; at login, the entered password is hashed and compared against the stored hash.

But what if the shadow file is stolen? Can somebody recover the passwords by brute force? Of course, yes. Tools like John the Ripper are sophisticated enough to do the job. Those tools are assisted by lists of frequently used passwords, so that common passwords are found quickly. Moreover, general-purpose hashing algorithms, which generate a signature of the given content, are designed to be very fast (no suspicious motives, though!). That speed works in the attacker's favor: the faster the hash, the more guesses per second, and the faster the passwords can be cracked!
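To make the word-list idea concrete, here is a minimal sketch. It assumes an unsalted SHA-256 hash purely for illustration; real shadow entries are salted, and real tools such as John the Ripper are far more elaborate. The function names and the tiny word list are made up.

```python
import hashlib


def sha256_hex(password):
    """Return the hex SHA-256 digest of a password string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def dictionary_attack(target_hex, candidates):
    """Return the first candidate whose hash matches target_hex, else None."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hex:
            return candidate
    return None


# Hypothetical leaked hash and a tiny stand-in for a common-password list.
leaked = sha256_hex("letmein")
common_passwords = ["123456", "password", "letmein", "qwerty"]
recovered = dictionary_attack(leaked, common_passwords)
```

Because each guess costs only one fast hash, the loop can run through a very large list in little time, which is exactly the problem described above.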

Dropbox engineering came up with extra layers of security over and above the plain hash that would normally be stored in a password database. In short, it does the following with the password:

  • Calculate a SHA-512 hash of the password (to normalize the length of the input that is sent to the next layer)
  • The SHA-512 hash is then hashed with bcrypt, using a per-user salt and a cost of 10. The cost is a work factor (2^10 rounds of key setup), so computing one hash takes a significant amount of time: about 100 ms, apparently, which means only about 10 hashes per second and makes brute force far costlier even on modern CPUs. (Apparently, the bcrypt command-line tool on Linux takes just a password and no cost; as far as I can tell, that tool is a separate Blowfish-based file encryption utility.) bcrypt is built on the Blowfish cipher, while the traditional crypt is built on DES.
  • The resulting bcrypt hash is then encrypted with AES-256 using a global key, and the result is stored in the password database. (Although bcrypt is built on a cipher, its output is a one-way hash rather than reversible encryption; the AES layer is the only step that can be decrypted.)
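The first two layers can be sketched with the Python standard library alone. This is not Dropbox's code: bcrypt is not in the standard library, so PBKDF2 stands in for it here, and the AES layer with the global key is left out for the same reason. The iteration count and function names are assumptions for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for the stand-in, not Dropbox's setting


def slow_hash(password, salt):
    """Pre-hash with SHA-512, then apply a salted, deliberately slow hash."""
    # Layer 1: SHA-512 turns a password of any length into 64 bytes.
    prehash = hashlib.sha512(password.encode("utf-8")).digest()
    # Layer 2: PBKDF2 stands in for bcrypt with a per-user salt.
    return hashlib.pbkdf2_hmac("sha256", prehash, salt, ITERATIONS)


def verify(password, salt, stored):
    """Recompute the hash and compare it to the stored one in constant time."""
    return hmac.compare_digest(slow_hash(password, salt), stored)


salt = os.urandom(16)  # per-user salt, stored next to the hash
stored = slow_hash("correct horse", salt)
```

Raising `ITERATIONS` plays the same role as raising the bcrypt cost: every guess an attacker makes becomes proportionally more expensive.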

With so many password database leaks these days, Dropbox engineering has done a serious job of protecting users' passwords. Multiple layers of protection, and separate storage for the password hashes and the key, are the two important things to note here.

While I was reading this, I got curious about how passwords are stored in the shadow database. How does someone find out which algorithm (for example crypt or bcrypt) was used to hash the password? It turns out that this piece of information is encoded within the hashed string.

Apparently, the passwords are encoded in the Modular Crypt Format, where the password string is encoded as $identifier$content. The identifier says what kind of encoding/hashing is used, and the content is the actual hashed content, usually made up of characters matching the regular expression [A-Za-z0-9./]. Popular schemes used are (taken from the above link)

[image: hash schemes]

The above schemes are natively used by OS for authenticating users while logging in. Applications are known to use some of the other schemes (again from the above link):

[image: app hashes]
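Reading the identifier out of a Modular Crypt Format string is straightforward. The sketch below knows only a handful of widely used identifiers (an illustrative subset, not the full tables from the link), and the example entry is made up.

```python
# Illustrative subset of identifiers; not a complete list.
SCHEMES = {
    "1": "MD5-crypt",
    "2a": "bcrypt",
    "2b": "bcrypt",
    "2y": "bcrypt",
    "5": "SHA-256-crypt",
    "6": "SHA-512-crypt",
}


def identify_scheme(entry):
    """Return (identifier, scheme name, remaining content) for an MCF string."""
    if not entry.startswith("$"):
        # No $identifier$ prefix; possibly a traditional DES-based crypt hash.
        return None, "no identifier (possibly traditional DES crypt)", entry
    identifier, _, content = entry[1:].partition("$")
    return identifier, SCHEMES.get(identifier, "unknown"), content


# Made-up entry: the salt and hash parts are placeholders, not real output.
example = "$6$examplesalt$placeholderhash"
result = identify_scheme(example)
```

For this example, `result` holds the identifier `6`, the name `SHA-512-crypt`, and the remaining `examplesalt$placeholderhash` content.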

There is also one more encoding standard, the PHC String Format. The modular crypt format does not offer a standard, flexible way to encode parameters such as the salt and cost settings. That's where the PHC string format comes in: it allows arbitrary param=value pairs. I haven't seen the PHC string format in vogue, at least on the Linux boxes that I work on. It's mostly the modular crypt format.
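A rough sketch of pulling the param=value pairs out of a PHC-style string follows. It assumes the general layout `$id[$v=version][$params][$salt[$hash]]`; the `argon2id` identifier and all values in the example are placeholders chosen for illustration, and `parse_phc` is a hypothetical helper, not part of any library.

```python
def parse_phc(text):
    """Split a PHC-style string into id, version, params, salt and hash."""
    if not text.startswith("$"):
        raise ValueError("a PHC string must start with '$'")
    fields = text.split("$")[1:]  # drop the empty piece before the first "$"
    result = {"id": fields.pop(0), "version": None, "params": {},
              "salt": None, "hash": None}
    # Optional version field, written as v=<number>.
    if fields and fields[0].startswith("v="):
        result["version"] = fields.pop(0)[2:]
    # Optional comma-separated param=value pairs.
    if fields and "=" in fields[0]:
        for pair in fields.pop(0).split(","):
            name, _, value = pair.partition("=")
            result["params"][name] = value
    # Optional salt, then optional hash.
    if fields:
        result["salt"] = fields.pop(0)
    if fields:
        result["hash"] = fields.pop(0)
    return result


# Placeholder string for illustration only.
parsed = parse_phc("$argon2id$v=19$m=65536,t=2,p=1$c2FsdA$aGFzaA")
```

The parser relies on the salt and hash fields containing no "=" character, which holds when they are Base64 without padding.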

The above link (PHC format) also explains in some detail how the Base64 encoding of the content is done. Base64 is just a system that splits the data into 6-bit values (0-63) and maps each value to one of 64 printable ASCII characters. You do the math to convert to Base64 and back to the original bytes.
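Here is a short sketch of that round trip using Python's standard `base64` module. It assumes the standard Base64 alphabet with the trailing "=" padding dropped, which is how I understand the PHC document to describe it; the helper names are made up.

```python
import base64


def b64_encode_nopad(data):
    """Encode bytes as standard Base64 and drop the trailing "=" padding."""
    return base64.b64encode(data).decode("ascii").rstrip("=")


def b64_decode_nopad(text):
    """Restore the padding that was dropped, then decode back to bytes."""
    padding = "=" * (-len(text) % 4)
    return base64.b64decode(text + padding)


encoded = b64_encode_nopad(b"salt")
decoded = b64_decode_nopad(encoded)
```

Here the four bytes of `b"salt"` become the six characters `c2FsdA`, since every 3 bytes of input map to 4 characters of output and the last group is partial.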

To sum up, Dropbox has gone a step ahead in making the passwords more secure. While the OS way of authenticating is going to be the same for some time, how the application securely stores the passwords is a space which is open for innovation. 🙂