Homo Logicus

I am Homo Logicus, not Homo sapiens. I am satisfied once my quest to understand things is fulfilled. Success is not something that I look for. I had this notion in my mind all along, but until I found this terminology, I was not sure I was one.

How do I become a Homo sapiens?


Taking Notes and Tweets

I very vaguely remember a statement made by David Allen in his Getting Things Done book: that we, being part of an advanced civilization, cannot afford to be so unproductive when we can actually be productive.

I have long been thinking about keeping a log of my work-related activities, with the only goal of coming back to them in the future for reference.

When I first heard about a Twitter-like application for the corporate world, I thought it was a great idea. I like the idea of people tweeting about different work-related things as and when they pop up in their minds. They might discover something (a bug, a hidden feature, the way something is done), come up with a question that was never asked before, or gain a new perspective on things. Whatever comes to their minds gets tweeted. Interested people can follow them and comment on their thoughts. What’s more, it is searchable, so with just a click we can find out whether something has occurred in the past. Even if not for “what’s happening right now”, the search would still let us explore this loosely coupled knowledge base of employees. I am very excited about such a system!!

The point is this: taking notes during work hours is definitely going to be useful in the long term. We never know when something will be useful. One guy, Bill Shaw, has this point to make:

Now again, if you’re like me, you might be saying, “This is a pain.” I admit it, it is a pain, but a small one compared to the pain it can alleviate later. The payoff comes when you see an InvalidNamespaceException two months (or two years) later. I absolutely, positively, 100% guarantee this will happen to you (not a guarantee). There will be a nagging feeling in your brain that you’ve seen this before. “Of course I’ve seen this before! What did I do?” If you’ve taken notes, you won’t have to rack your brain. Simply search for “InvalidNamespaceException” and the answer will pop up.

This reminds me of Agent Smith’s dialogue from The Matrix Revolutions. It goes like – “Wait a minute, I have seen this before… I am standing here in front of you and I am supposed to say something.” :-).

It may take a while to get the entire company convinced of the advantages of tweeting; till then, I have decided to use the EverNote note-taking tool. I was using it for some time, but stopped. I don’t remember why. Now, I have a newfound purpose for it.

Bill Shaw suggests taking backups:

One more thing. Be sure to back up your note files! There’s only one thing that’s worse than keeping notes, and that’s learning to rely on keeping notes, and then losing them all. I have used notes that are seven years old to solve a problem. They will quickly become one of the most valuable tools in your toolbox. No, they will be the most valuable tool in your toolbox. The most valuable knowledge of your career doesn’t come from books, webpages, or conferences, but from the contents of your previous experience. Take notes!

How true!!

The Nature of Data

Jeff Jonas makes some interesting points, with good examples, on the nature of data:

According to Jonas, organizations need to be asking questions constantly if they want to get smarter. If you don’t query your data and test your previous assumptions with each new piece of data that you get, then you’re not getting smarter.

An example:

Jonas related an example of a financial scam at a bank. An outside perpetrator is arrested, but investigators suspect he may have been working with somebody inside the bank. Six months later, one of the employees changes their home address in payroll system to the same address as in the case. How would they know that occurred, Jonas asked. “They wouldn’t know. There’s not a company out there that would have known, unless they’re playing the game of data finds data and the relevance finds the user.”


Constantly asking questions and evaluating new pieces of data can help an organization overcome what Jonas calls enterprise amnesia. “The smartest your organization can be is the net sum of its perceptions,” Jonas told COMMON attendees.

A metaphor:

Getting smarter by asking questions with every new piece of data is the same as putting a picture puzzle together, Jonas said. This is something that Jonas calls persistent context. “You find one piece that’s simply blades of grass, but this is the piece that connects the windmill scene to the alligator scene,” he says. “Without this one piece that you asked about, you’d have no way of knowing these two scenes are connected.”

And here is where he explains the need for continuous query of data:

Sometimes, new pieces reverse earlier assertions. “The moment you process a new transaction (a new puzzle piece) it has the chance of changing the shape of the puzzle, and right before you go to the next piece, you ask yourself, ‘Did I learn something that matters?'” he asks. “The smartest your organization is going to be is considering the importance right when the data is being stitched together.”

Another concrete example:

Another project (not related to the government, but a commercial effort) had Jonas assisting an organization in compiling a database that correlated the identities of Americans with pieces of data from public records (such as property records, DMV records, phone books, etc). He knew there were about 300 million people in the U.S. But as Jonas started loading the data into his warehouse, the machine soon counted more than 300 million Americans. “We keep loading it, and pretty soon it says there are 600 million people in America–and if the number kept climbing to three billion, it surely would be a piece of junk. But my theory was it would collapse,” he said.

He was right. Consider what happens when there are two records describing two different people as they appear to share the same name. “What happens is a third record shows up in the future that works like glue, which causes them to collapse,” he said. Eventually, “the more data we loaded, the fewer number of people there were.”
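The "glue" effect Jonas describes can be sketched with a union-find structure. This is only a toy illustration, not his actual system: the class name, the record ids, and the rule that two records merge whenever they share a strong identifier (a phone number or an address here) are all assumptions made for the example.

```python
class IdentityStore:
    """Toy identity resolution: records that share a feature collapse into one identity.

    Hypothetical sketch; the merge rule (any shared feature) is an assumption.
    """

    def __init__(self):
        self.parent = {}         # record id -> parent record id (union-find forest)
        self.feature_owner = {}  # feature -> first record id seen carrying it

    def _find(self, rec):
        # Walk up to the root, shortening the path as we go.
        while self.parent[rec] != rec:
            self.parent[rec] = self.parent[self.parent[rec]]
            rec = self.parent[rec]
        return rec

    def add(self, rec_id, features):
        self.parent[rec_id] = rec_id
        for feat in features:
            if feat in self.feature_owner:
                # Shared feature: merge the two identities.
                a = self._find(rec_id)
                b = self._find(self.feature_owner[feat])
                self.parent[a] = b
            else:
                self.feature_owner[feat] = rec_id

    def identity_count(self):
        return len({self._find(r) for r in self.parent})


store = IdentityStore()
store.add("r1", [("phone", "555-0100")])
store.add("r2", [("address", "12 Oak St")])
count_before = store.identity_count()  # two records, nothing in common yet

# The third record carries both features and glues the first two together.
store.add("r3", [("phone", "555-0100"), ("address", "12 Oak St")])
count_after = store.identity_count()
```

Three records go in, yet the identity count drops from two to one once the third arrives, which is the same shape as "the more data we loaded, the fewer number of people there were."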


“What’s happening is data volumes are growing at this pace, yet an organization’s ability to make sense of them isn’t keeping up,” Jonas said. “Today, say you can make sense of 7 percent of what’s available, and in a few years it might be 4 percent, and in a few years after that it might be one percent. So the percentage of what’s knowable is on the decline.”

So while the sum of our knowledge is increasing, the ratio of what’s knowable to the data that’s available is getting smaller. Without some new technology to help “stitch things together,” as Jonas puts it, we’ll soon be wallowing in gobs of structured and unstructured data, with no discernable path out.

Does web 2.0’s collective intelligence help in any way to provide some annotation to the huge amounts of data lying latent? Another proposal to let the user know when a question is answerable:

Jonas sees this type of technology–loading queries into a database as data–helping to overcome the counter-terrorism intelligence analyst’s dilemma of knowing when a question can be answered. “This is a nice and easy method that enables a future piece of data to find the question,” he said in a follow-up e-mail after this story was first published. “In other words, if the question asked by the user has no answer today…if a piece of data that can answer the question arrives tomorrow, the system can alert the user that their question is now true.”
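The idea of storing a question as data, so that a future piece of data can find it, can be sketched roughly as follows. Everything here is hypothetical (the class, the field names, and the bank-style example borrowed from the scam story above); it is a minimal sketch of the pattern, not a description of Jonas's technology.

```python
class StandingQueries:
    """Questions are kept as data; each new fact is checked against them."""

    def __init__(self):
        self.facts = []    # every piece of data seen so far
        self.pending = []  # (user, description, predicate) with no answer yet

    def ask(self, user, description, predicate):
        matches = [f for f in self.facts if predicate(f)]
        if matches:
            return matches  # answerable today
        # Not answerable yet: remember the question for later.
        self.pending.append((user, description, predicate))
        return []

    def ingest(self, fact):
        self.facts.append(fact)
        alerts, still_pending = [], []
        for user, description, predicate in self.pending:
            if predicate(fact):
                alerts.append((user, description, fact))  # the data found the question
            else:
                still_pending.append((user, description, predicate))
        self.pending = still_pending
        return alerts


sq = StandingQueries()
suspect_address = "12 Oak St"  # made-up address from the hypothetical case file

first_try = sq.ask(
    "investigator",
    "employee living at the suspect's address",
    lambda f: f.get("type") == "payroll_address" and f.get("address") == suspect_address,
)

# Months later, payroll updates arrive one by one.
no_alerts = sq.ingest({"type": "payroll_address", "employee": "E17", "address": "9 Elm Rd"})
alerts = sq.ingest({"type": "payroll_address", "employee": "E42", "address": suspect_address})
```

The investigator's question has no answer when it is asked, but the later address change triggers an alert without anyone having to re-run the query.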

Man!! If I think about it, how true this is. We all encounter situations daily where we have to understand something. We read, we ask and we think. Yet we may not fully understand, because we may not have all the data that we need to understand the current situation. So the question lingers in our minds, and one fine day we encounter data that answers it. For those of us who are really involved with the question, it automatically pops up in our minds and disappears with the answer.


Some Beautiful Nuggets From Web 2.0 Summit White Paper

Beautiful paper; it kind of serves as a motivational introduction to the whole world of Web 2.0. I strongly recommend reading it. The actual white paper can be found here:

On what we have learnt:

In our first program, we asked why some companies survived the dotcom bust, while others had failed so miserably. We also studied a burgeoning group of startups and asked why they were growing so quickly. The answers helped us understand the rules of business on this new platform.

Chief among our insights was that “the network as platform” means far more than just offering old applications via the network (“software as a service”); it means building applications that literally get better the more people use them, harnessing network effects not only to acquire users, but also to learn from them and build on their contributions.

On how Web 2.0 is really collective intelligence (the ability to take feedback and respond better):

Consider search – currently the lingua franca of the Web. The first search engines, starting with Brian Pinkerton’s webcrawler, put everything in their mouth, so to speak. They hungrily followed links, consuming everything they found. Ranking was by brute force keyword matching.

In 1998, Larry Page and Sergey Brin had a breakthrough, realizing that links were not merely a way of finding new content, but of ranking it and connecting it to a more sophisticated natural language grammar. In essence, every link became a vote, and votes from knowledgeable people (as measured by the number and quality of people who in turn vote for them) count more than others.

Modern search engines now use complex algorithms and hundreds of different ranking criteria to produce their results. Among the data sources is the feedback loop generated by the frequency of search terms, the number of user clicks on search results, and our own personal search and browsing history. For example, if a majority of users start clicking on the fifth item on a particular search results page more often than the first, Google’s algorithms take this as a signal that the fifth result may well be better than the first, and eventually adjust the results accordingly.
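The "every link is a vote, and votes from well-voted pages count more" idea from the passage above can be illustrated with a simplified PageRank-style iteration. This is a textbook sketch under my own assumptions (a four-page toy graph, a damping factor of 0.85, a fixed number of iterations), not how any real search engine computes its rankings.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict mapping page -> list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of votes.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote over everyone.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: a, b and d all link to c; c links back to a.
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
```

Page c collects the most votes, and page a outranks b and d purely because its single vote comes from the highly ranked c.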

Once we read the above, the network as a platform idea makes sense to us. We take the network as the basic entity which has all that we want and use our devices (whatever) to get what we want as services from the network.

To get a glimpse of how much smarter it has become:

Now consider an even more current search application, the Google Mobile Application for the iPhone. The application detects the movement of the phone to your ear, and automatically goes into speech recognition mode. It uses its microphone to listen to your voice, and decodes what you are saying by referencing not only its speech recognition database and algorithms, but also the correlation to the most frequent search terms in its search database. The phone uses GPS or cell-tower triangulation to detect its location, and uses that information as well. A search for “pizza” returns the result you most likely want: the name, location, and contact information for the three nearest pizza restaurants.

On how things are still haphazard:

It’s easy to forget that only 15 years ago, email was as fragmented as social networking is today, with hundreds of incompatible email systems joined by fragile and congested gateways. One of those systems – internet RFC 822 email – became the gold standard for interchange.

We expect to see similar standardization in key internet utilities and subsystems. Vendors who are competing with a winner-takes-all mindset would be advised to join together to enable systems built from the best-of-breed data subsystems of cooperating companies.

On how the learning component adds even more value to the web 2.0, how learning happens and an example of its application:

Speech recognition and computer vision are both excellent examples of this kind of machine learning. But it’s important to realize that machine learning techniques apply to far more than just sensor data. For example, Google’s ad auction is a learning system, in which optimal ad placement and pricing is generated in real time by machine learning algorithms.

In other cases, meaning is “taught” to the computer. That is, the application is given a mapping between one structured data set and another. For example, the association between street addresses and GPS coordinates is taught rather than learned. Both data sets are structured, but need a gateway to connect them.

It’s also possible to give structure to what appears to be unstructured data by teaching an application how to recognize the connection between the two. For example, You R Here, an iPhone app, neatly combines these two approaches. You use your iPhone camera to take a photo of a map that contains details not found on generic mapping applications such as Google maps – say a trailhead map in a park, or another hiking map. Use the phone’s GPS to set your current location on the map. Walk a distance away, and set a second po

On why this learning component is necessary:

Some of the most fundamental and useful services on the Web have been constructed in this way, by recognizing and then teaching the overlooked regularity of what at first appears to be unstructured data.

Ti Kan, Steve Scherf, and Graham Toal, the creators of CDDB, realized that the sequence of track lengths on a CD formed a unique signature that could be correlated with artist, album, and song names. Larry Page and Sergey Brin realized that a link is a vote. Marc Hedlund at Wesabe realized that every credit card swipe is also a vote, that there is hidden meaning in repeated visits to the same merchant. Mark Zuckerberg at Facebook realized that friend relationships online actually constitute a generalized social graph. They thus turn what at first appeared to be unstructured into structured data. And all of them used both machines and humans to do it.

It looks like we are very near to making a Terminator robot: 🙂

The Wikitude travel guide application for Android takes image recognition even further. Point the phone’s camera at a monument or other point of interest, and the application looks up what it sees in its online database (answering the question “what looks like that somewhere around here?”) The screen shows you what the camera sees, so it’s like a window but with a heads-up display of additional information about what you’re looking at. It’s the first taste of an “augmented reality” future. It superimposes distances to points of interest, using the compass to keep track of where you’re looking. You can sweep the phone around and scan the area for nearby interesting things.

A word on Information Shadows:

All of these breakthroughs are reflections of the fact noted by Mike Kuniavsky of ThingM, that real world objects have “information shadows” in cyberspace. For instance, a book has information shadows on Amazon, on Google Book Search, on Goodreads, Shelfari, and LibraryThing, on eBay and on BookMooch, on Twitter, and in a thousand blogs.

A song has information shadows on iTunes, on Amazon, on Rhapsody, on MySpace, or Facebook. A person has information shadows in a host of emails, instant messages, phone calls, tweets, blog postings, photographs, videos, and government documents. A product on the supermarket shelf, a car on a dealer’s lot, a pallet of newly mined boron sitting on a loading dock, a storefront on a small town’s main street — all have information shadows now.

As the information shadows become thicker, more substantial, the need for explicit metadata diminishes. Our cameras, our microphones, are becoming the eyes and ears of the Web, our motion sensors, proximity sensors its proprioception, GPS its sense of location. Indeed, the baby is growing up. We are meeting the Internet, and it is us.

On the role of massive data in learning:

There’s a fascinating fact noted by Jeff Jonas in his work on identity resolution. Jonas’ work included building a database of known US persons from various sources. His database grew to about 630 million “identities” before the system had enough information to identify all the variations. But at a certain point, his database began to learn, and then to shrink. Each new load of data made the database smaller, not bigger. 630 million plus 30 million became 600 million, as the subtle calculus of recognition by “context accumulation” worked its magic.

Sensors and monitoring programs are not acting alone, but in concert with their human partners. We teach our photo program to recognize faces that matter to us, we share news that we care about, we add tags to our tweets so that they can be grouped more easily. In adding value for ourselves, we are adding value to the social web as well. Our devices extend us, and we extend them.

It’s not just Web 2.0:

But as is so often the case, the future isn’t clearest in the pronouncements of big companies but in the clever optimizations of early adopters and “alpha geeks.” Radar blogger Nat Torkington tells the story of a taxi driver he met in Wellington, NZ, who kept logs of six weeks of pickups (GPS, weather, passenger, and three other variables), fed them into his computer, and did some analysis to figure out where he should be at any given point in the day to maximize his take. As a result, he’s making a very nice living with much less work than other taxi drivers. Instrumenting the world pays off.

Consider the so-called “smart electrical grid.” Gavin Starks, the founder of AMEE, a neutral web-services back-end for energy-related sensor data, noted that researchers combing the smart meter data from 1.2 million homes in the UK have already discovered that each device in the home has a unique energy signature. It is possible to determine not only the wattage being drawn by the device, but the make and model of each major appliance within – think CDDB for appliances and consumer electronics!

On real time events:

Real-time search encourages real-time response. Retweeted “information cascades” spread breaking news across Twitter in moments, making it the earliest source for many people to learn about what’s just happened. And again, this is just the beginning. With services like Twitter and Facebook’s status updates, a new data source has been added to the Web – realtime indications of what is on our collective mind.

It’s not just the web that is learning:

Even without sensor-driven purchasing, real-time information is having a huge impact on business. When your customers are declaring their intent all over the Web (and on Twitter) – either through their actions or their words, companies must both listen and join the conversation. Comcast has changed its customer service approach using Twitter; other companies are following suit.

Some applications:

But in his advice on the direction of the Government 2.0 Summit Federal CTO Aneesh Chopra has urged us not to focus on the successes of Web 2.0 in government, but rather on the unsolved problems. How can the technology community help with such problems as tracking the progress of the economic stimulus package in creating new jobs? How can it speed our progress towards energy independence and a reduction in CO2 emissions? How can it help us remake our education system to produce a more competitive workforce? How can it help us reduce the ballooning costs of healthcare?

Twitter is being used to report news of disasters, and to coordinate emergency response. Initiatives like Instedd (Innovative Support to Emergencies, Diseases, and Disasters) take this trend and amp it up. Instedd uses collective intelligence techniques to mine sources like SMS messages (e.g., Geochat), RSS feeds, email lists (e.g., ProMed, Veratect, HealthMap, Biocaster, EpiSpider), OpenROSA, Map Sync, Epi Info™, documents, web pages, electronic medical records (e.g., OpenMRS), animal disease data (e.g., OIE, AVRI hotline), environmental feed, (e.g., NASA remote sensing, etc.) for signals of emerging diseases.

Companies like 23andMe and PatientsLikeMe are applying crowdsourcing to build databases of use to the personalized medicine community. 23andMe provides genetic testing for personal use, but their long term goal is to provide a database of genetic information that members could voluntarily provide to researchers. PatientsLikeMe has created a social network for people with various life-changing diseases; by sharing details of treatment – what’s working and what’s not – they are in effect providing a basis for the world’s largest longitudinal medical outcome testing service. What other creative applications of Web 2.0 technology are you seeing to advance the state of the art in healthcare?

How do we create economic opportunities in reducing the cost of healthcare? As Stanford’s Abraham Verghese writes, the reason it’s so hard to cut healthcare costs is that “a dollar spent on medical care is a dollar of income for someone.” We can’t just cut costs. We need to find ways to make money by cutting costs. In this regard, we’re intrigued by startups like CVsim, a cardio-vascular simulation company. Increasingly accurate data from CAT scans, coupled with blood flow simulation software running on a cloud platform, makes it conceivable to improve health outcomes and reduce costs while shrinking a multi-billion dollar market for angiography, an expensive and risky medical procedure. If CVsim succeeds in this goal, they’ll build a huge company while shrinking the nation’s healthcare bill. What other similar opportunities are there for technology to replace older, less effective medical procedures with newer ones that are potentially more effective while costing less?


VoIP vulnerabilities

Forbes has an interview with Philip Zimmermann, the creator of ZRTP, the freely available VoIP encryption software. In it, Zimmermann explains why VoIP needs encryption more than traditional telephony does:

The traditional public telephone system that we’ve been using for the last hundred years is fairly well protected. It’s easy for the government to wiretap it by going to the phone company, but not easy for anyone else to wiretap it. If anyone else wanted to wiretap someone’s conversations, they’d have to find a place close to his or her office, get some alligator clips, and try to find the right wire out of thousands to clip them onto, and hope that nobody spots you doing it.

With traditional telephony, our threat model was mostly government wiretapping. With VoIP, anyone can wiretap us: the Russian mafia, foreign governments, hackers, disgruntled former employees. Anyone.

Historically, there’s been an asymmetry between government wiretapping and everyone else wiretapping that’s been in the government’s favor. As we migrate to VoIP, that differential collapses. The government itself is just as vulnerable. Wiretappers can reveal details of ongoing investigations, names and personal details of informants, conversations between officials and their wives about what time they pick up their kids at school.

Everyone thinks that VoIP is the future of telephony. It’s cheaper, more versatile, more feature-rich. So technological pressure herds us towards VoIP; we’ll have to encrypt it. Wiretapping will become so easy that the criminals–not just governments–will be able to do it routinely. There will be insider trading, blackmail, organized crime spying on judges and prosecutors, key witnesses killed before they can testify.

On his ZRTP and Zfone:

ZRTP is a protocol that defines how VoIP phones talk to each other in an encrypted way. Zfone is a program that we’ve developed for end users that employs ZRTP. They both use strong cryptographic algorithms to negotiate cryptographic keys between two parties without the participation of any phone company… They’re automatically created at the start of the call, and destroyed at the end. Only the two parties know the keys, and the phone company isn’t in a position where it can give the keys to a third party.

On why law enforcement agencies can still be doing their job:

From the point of view of law enforcement, traffic analysis can be quite useful. But for a criminal trying to get information for insider trading, he’s only interested in the content. So encryption actually hits criminals harder than it hits law enforcement agencies.


Making Code Slower

Ever seen a situation where the code has to be slowed down for a purpose? Here it is from Microsoft: 🙂

Even Office 2007, Microsoft’s most recent version, was nearly saddled with a serious encryption bug. One part of every encryption system is the portion that handles passwords. The relevant code in Office 2007 was simply too well-written; because it worked so quickly, a crypto-attacker could rapidly progress through all possible password variations. Someone caught the error before the product was shipped and rewrote the software to make it slower–50,000 times slower, as it happened. It may be one of the few times in software history that programmers deliberately made a program run slower rather than faster.
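The quote does not say how the password code was slowed down. One common way to do this on purpose is key stretching: run the password through a hash function many times, so that each guess costs an attacker that much more work. The sketch below uses PBKDF2 from Python's standard library as an assumed stand-in; the function name, the password, and the iteration counts are made up for illustration and are not taken from Office.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int) -> bytes:
    """Derive a 32-byte key from a password; more iterations means slower guessing."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


salt = os.urandom(16)  # random salt, stored alongside the protected document

# One iteration: fast for the user, but just as fast for an attacker's guesses.
fast_key = derive_key("correct horse", salt, 1)

# Many iterations: still quick for one legitimate check, costly across
# the huge number of guesses a brute-force attack needs.
slow_key = derive_key("correct horse", salt, 50_000)
```

The legitimate user pays the extra cost once per password entry, while an attacker pays it on every single guess, which is why slower is better here.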


Homomorphic Encryption

Even though I came across homomorphic functions while studying discrete mathematical structures as an undergrad, I only had a vague idea of what applications they had. An IBM fellow has found an encryption mechanism that is homomorphic. That is, if the encryption function f() is homomorphic over the operation +, then

f(a+b) = f(a) + f(b)

This means that if a and b are both encrypted with the function f, there is no need to decrypt them in order to add them. You can operate on the encrypted versions themselves. Why? Because f is homomorphic.
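To make this concrete, here is a toy sketch using the Paillier cryptosystem, a well-known scheme that is homomorphic for addition only. It is my own choice of example, not the IBM scheme discussed in this post, and the primes are far too small to be secure. Note one difference from the formula above: in Paillier, adding the plaintexts corresponds to multiplying the ciphertexts, so the operation on the encrypted side need not be the same symbol as on the plain side. The code needs Python 3.8 or later for the modular inverse via pow().

```python
import math
import secrets


def keygen(p=1789, q=1861):
    """Return (public n, private (lam, mu)). Toy primes, insecure by design."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # this shortcut holds because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding value in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the hidden plaintexts (mod n).
    return (c1 * c2) % (n * n)


n, priv = keygen()
enc_a = encrypt(n, 20)
enc_b = encrypt(n, 22)
enc_sum = add_encrypted(n, enc_a, enc_b)  # computed without the private key
total = decrypt(n, priv, enc_sum)
```

Whoever computes enc_sum never sees 20, 22, or their sum; only the holder of the private key can decrypt the result.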

This has profound implications on its applications. If everything is seen as operation on data, then one need not really know what the data is, but can still operate on the data by working on their encrypted versions.

As more computing moves to the cloud, data privacy needs to be assured (see, for example, how Facebook tried to take ownership, but failed). Homomorphic encryption is a technical solution to the problem, as opposed to a policy-based solution (licenses, MoUs etc.).

This is just theory so far. Not all operations can be made homomorphic, for example. More research is needed to find ways to make all operations homomorphic. That seems likely to take a long time: the article mentions that it will take at least a decade to get it done. Ten years is a long time to bet on one technology in this modern world. Who knows what’s in store for the future?

Moreover, the functions may not be purely homomorphic. After some operations, the original data may be corrupted (who knows whether a function is purely homomorphic until it is proven?). The solution proposed by the inventor is to double-encrypt the data and periodically refresh the inner encryption layer by decrypting and re-encrypting it.

What about the key management problems?