I believe there are essentially three kinds of problems: technical, engineering and research problems. Most of the problems we come across in our working lives are just technical problems. We rarely encounter engineering problems, and research problems are rarer still.

This is just my opinion.


Myths of Creativity

A Harvard Business School professor argues that the following common beliefs about creativity are myths. None of them actually brings out creativity:

  • Only some people can be creative – this myth lives not only in the heads of entrepreneurs but also in the heads of individual employees, whether they count themselves in the creative group or out of it.
  • Money is a creativity motivator – her research found that once people become conscious of being tied to rewards, their minds can no longer fly free.
  • Time pressure fuels creativity – this may hold only as long as the pressure does not take a toll on the person. Pressure can eat up the psyche, and creativity needs a head free of such burdens.
  • Fear forces breakthroughs – even to me, this seems counterintuitive. Creativity does not come this way.
  • Competition beats collaboration when it comes to creativity – again, competition brings stress, stress hurts the psyche, and a troubled psyche hurts creativity.

Having dispelled these myths, the author also describes the necessary preconditions for a creative mind:

  • Love the work you do, i.e. have intrinsic motivation.
  • Deeply engage in the work without any distractions.
  • Bring experience, talent, the ability to think in different ways, and the capacity to push through uncreative dry spells.

How can you lose your data?

Robin Harris enumerates many ways in which you can lose your data. He is not talking only about disks, because disks are just one part of the story.

We live in a world where complex problems are solved by building systems. In these systems each subsystem does its own job and interacts with the other subsystems to form a whole. The storage subsystem is no exception. It is made up of different components: controllers, disks, the software that sits at every layer, and so on. Any of them can be faulty and can potentially damage the data that is ultimately stored on the disk.

However, some components are more prone to failure than others because of their tryst with physics. For example, disks can fail more frequently than controllers and caches because they have many moving parts. Beyond the disks themselves, the software that resides on the controllers and disks can be buggy and can lose data. Robin covers all of this well.

The point driven home is this: you cannot avoid losing hardware, but you can avoid losing your data!! Take backups and save your skin.

I have been thinking about this for quite some time now. I was prompted, or rather worried, by a couple of papers from CMU and Google that talk about disk failures. I don’t want a disk as my backup medium again; I am still not convinced that disks make good backup media. I am fine with USB flash drives and CDs/DVDs, for the sole reason that they have no moving parts and can potentially last longer! It is still on my TBD list: I have to start backing up my data before it is too late.
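Here is a minimal Python sketch of such a backup. It assumes the backup target (say, a USB flash drive) is mounted as an ordinary directory. The names `backup_tree` and `sha256_of` are hypothetical, while `shutil.copy2` and `hashlib.sha256` are from the standard library. Each copy is checked by comparing the SHA-256 checksums of the source and the copy.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_tree(src: Path, dest: Path) -> list[Path]:
    """Copy every file under src into dest and verify each copy by checksum.

    Returns the source files whose copy did not match (empty list = all good).
    """
    mismatches = []
    for file in src.rglob("*"):
        if not file.is_file():
            continue  # directories are recreated via mkdir below
        target = dest / file.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(file, target)  # copies contents and timestamps
        if sha256_of(file) != sha256_of(target):
            mismatches.append(file)
    return mismatches
```

For example, `backup_tree(Path.home() / "docs", Path("/media/usb/docs"))` returns an empty list when every copy verified (both paths are hypothetical). The read-back may be served from the operating system's cache. So this catches copy errors but does not prove that the medium itself is sound.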

Design principles

I have been reading Practical Data Structures in C++ these days. In it, the author suggests the following design principles for writing a program:

  • Maintain the viewpoint of a practitioner: By this, the author means that you should first get the program working, by whatever inelegant means. Then look for more elegant ways of doing the same thing, if that is warranted.
  • There is no such thing as a complex solution: Keep the design as simple as possible. The simple solution is separated from you not by distance, but by time.
  • Don’t overgeneralize: Overgeneralizing can lead to a lot of special cases and thus incur a lot of overhead.
  • Follow the 80-20 rule: Spend most of your effort on the 20% of the code that accounts for most of the running time.
  • Design first, then optimize: In other words, optimize first at the design level and only then at the implementation level.
  • Try to get the most leverage out of any overhead you introduce.
  • You shouldn’t have to pay for something that you are not going to use.

The last two principles need some thought.
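One possible reading of the last two principles is sketched below in Python. This is my own interpretation, not code from the book. An expensive structure is built only when someone first asks for it, and once built it is reused for every later request. The `Document` class is hypothetical; `functools.cached_property` is from the standard library.

```python
from functools import cached_property


class Document:
    """Hypothetical example: a text whose word index is costly to build."""

    def __init__(self, text: str):
        self.text = text
        self.index_builds = 0  # counts how often the expensive work actually ran

    @cached_property
    def word_index(self) -> dict[str, list[int]]:
        # Built only on first access: no cost if nobody ever searches.
        # Cached afterwards: every later lookup reuses the same work.
        self.index_builds += 1
        index: dict[str, list[int]] = {}
        for pos, word in enumerate(self.text.split()):
            index.setdefault(word, []).append(pos)
        return index

    def positions(self, word: str) -> list[int]:
        """Word positions of `word` in the text (empty list if absent)."""
        return self.word_index.get(word, [])
```

`Document("to be or not to be").positions("be")` returns `[1, 5]`, and the index is built exactly once however many lookups follow. A caller who never searches the document never pays for the index at all.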

1 TB disk

The 1 TB disk is finally here. Tech Report has tested the new Hitachi 7K1000 hard drive. Some quick facts:

  • 32 MB cache
  • 200GB per platter, with a maximum (interface) transfer rate of 300MB/sec
  • UER of one unrecoverable error per 1.0E15 bits read, i.e. per about 1.25E14 bytes. This means we can expect an error roughly once in every 125 complete reads of the disk.
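The UER arithmetic can be checked with a few lines of Python. This sketch assumes that 1 TB means 10^12 bytes. It also assumes that errors occur independently at exactly the quoted rate, which real drives need not do.

```python
import math

UER_BITS = 1e15    # one unrecoverable error per 1e15 bits read (quoted spec)
DISK_BYTES = 1e12  # 1 TB, counted in decimal as drive vendors do


def full_reads_per_error(disk_bytes: float = DISK_BYTES,
                         uer_bits: float = UER_BITS) -> float:
    """Expected number of complete disk reads per unrecoverable error."""
    return uer_bits / (disk_bytes * 8)


def p_error_in_one_read(disk_bytes: float = DISK_BYTES,
                        uer_bits: float = UER_BITS) -> float:
    """Probability of at least one unrecoverable error in one full read,
    assuming independent errors at exactly the quoted rate."""
    bits = disk_bytes * 8
    # 1 - (1 - 1/uer)^bits, computed via log1p/expm1 to avoid rounding loss
    return -math.expm1(bits * math.log1p(-1.0 / uer_bits))
```

`full_reads_per_error()` gives 125.0, and `p_error_in_one_read()` comes out a little under 0.8%.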

There may be many techniques for getting around the UER problem. For example, scrubbing could rely only on opportunistic scanning, which verifies data whenever it is read during normal operation. That way we don’t end up scanning the surface separately.
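Opportunistic scrubbing could be sketched as follows. Every block read by normal I/O is verified against a stored checksum as it passes through, and the time of that verification is recorded. Only blocks that nobody has read recently would then need a dedicated scan. This is a hypothetical Python sketch: the class name, the per-block CRC-32 checksums and the `max_age` policy are all assumptions, not a description of any real storage system.

```python
import zlib


class OpportunisticScrubber:
    """Hypothetical sketch: piggyback verification on normal reads."""

    def __init__(self, num_blocks: int, max_age: float):
        self.max_age = max_age  # seconds a verification stays "fresh"
        # -inf means "never verified", so such blocks always count as stale
        self.last_verified = [float("-inf")] * num_blocks

    def on_read(self, block: int, data: bytes, expected_crc: int, now: float) -> bool:
        """Called on every normal read; verifies the data as it passes by.

        Returns False on a checksum mismatch so the caller can repair the
        block from redundancy (mirror, parity, backup).
        """
        ok = zlib.crc32(data) == expected_crc
        if ok:
            self.last_verified[block] = now
        return ok

    def stale_blocks(self, now: float) -> list[int]:
        """Blocks not verified within max_age: the only ones needing a scan."""
        return [b for b, t in enumerate(self.last_verified)
                if now - t > self.max_age]
```

Passing `now` explicitly keeps the sketch deterministic. A real system would use a clock and persist `last_verified`.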

Jim Gray (now missing, lost at sea) predicted that the pipe coming out of the hard disk is not getting fatter, while the storage tank itself keeps getting fatter. With the recording density increasing, the throughput is increasing too. Still, 300MB/sec is the peak interface rate. Even at that rate, reading an entire 1 TB disk takes close to an hour (10^12 bytes ÷ 3 × 10^8 bytes/sec ≈ 3,300 seconds). At the lower sustained rate the platters actually deliver, it takes a couple of hours or more.

What new problems may arise with a 1 TB disk? As capacity grows, any complete disk scan (disk scrubbing, backup, indexing, etc.) reads more bits. It therefore has a higher chance of hitting an unrecoverable error. So these sweep operations have to be kept to a minimum or batched together.
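Batching could look like the following hypothetical Python sketch. The disk is swept once, and every block is handed in turn to each interested consumer (scrubber, backup, indexer), so the surface is read once rather than once per operation. The function name and the `(block_no, data)` iterable, which stands in for real disk reads, are assumptions.

```python
from typing import Callable, Iterable

# A consumer receives (block number, block contents) for every block swept.
Consumer = Callable[[int, bytes], None]


def batched_sweep(blocks: Iterable[tuple[int, bytes]],
                  consumers: list[Consumer]) -> int:
    """Read each block once and hand it to every consumer.

    Returns the number of blocks read. With k consumers, this reads the
    surface once instead of k times.
    """
    count = 0
    for block_no, data in blocks:
        count += 1
        for consume in consumers:
            consume(block_no, data)
    return count
```

For example, a backup consumer might append each block to an archive while an indexing consumer records block sizes, all in the same pass.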

Another problem is that the failure of a single disk means the loss of an entire TB of data. There is no point in going for RAID-5 if you cannot afford more than two disks, since RAID-5 needs at least three. Even if you can, Robin Harris has already pointed out that RAID-5 won’t work as the number of disks in the array increases. So, for the most part, the data will have to be mirrored.
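The RAID-5 concern can be made concrete with the same UER arithmetic. Rebuilding after a disk failure means reading every surviving disk in full. The chance of hitting an unrecoverable error during the rebuild therefore grows with the number of disks. This Python sketch uses the 1 TB size and the 1.0E15-bit UER quoted above and assumes independent errors. It is a simplification, not a reliability model.

```python
import math


def p_rebuild_hits_ure(num_disks: int, disk_bytes: float = 1e12,
                       uer_bits: float = 1e15) -> float:
    """Probability that a RAID-5 rebuild hits at least one unrecoverable
    read error. The rebuild must read all (num_disks - 1) surviving disks.
    Assumes independent errors at exactly the spec rate.
    """
    if num_disks < 3:
        raise ValueError("RAID-5 needs at least three disks")
    bits_read = (num_disks - 1) * disk_bytes * 8
    # 1 - (1 - 1/uer)^bits_read, computed stably
    return -math.expm1(bits_read * math.log1p(-1.0 / uer_bits))


for n in (3, 5, 8, 12):
    print(f"{n} disks: {p_rebuild_hits_ure(n):.1%}")
```

With these inputs, a three-disk array has roughly a 1.6% chance of an error during rebuild, and a twelve-disk array roughly 8.4%. A two-disk mirror, by contrast, rebuilds by reading just one disk.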

Geeki’s take:

I have learned that there are two kinds of people in this world: those who crib/complain and those who do not.