The Nature of Data

Jeff Jonas makes some interesting points, with good examples, on the nature of data:

According to Jonas, organizations need to be asking questions constantly if they want to get smarter. If you don’t query your data and test your previous assumptions with each new piece of data that you get, then you’re not getting smarter.

An example:

Jonas related an example of a financial scam at a bank. An outside perpetrator is arrested, but investigators suspect he may have been working with somebody inside the bank. Six months later, one of the employees changes their home address in the payroll system to the same address as in the case. How would they know that occurred, Jonas asked. “They wouldn’t know. There’s not a company out there that would have known, unless they’re playing the game of data finds data and the relevance finds the user.”


Constantly asking questions and evaluating new pieces of data can help an organization overcome what Jonas calls enterprise amnesia. “The smartest your organization can be is the net sum of its perceptions,” Jonas told COMMON attendees.
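To make the bank example concrete, here is a minimal sketch of "data finds data" in Python. It is not Jonas's system: the `AddressIndex` class, the source names, and the crude address normalization are all assumptions made for illustration. The point is only that every new record is checked against everything already seen at the moment it arrives.

```python
from collections import defaultdict


class AddressIndex:
    """Hypothetical index that remembers every address and who used it."""

    def __init__(self):
        # normalized address -> set of (source, entity) pairs seen so far
        self._seen = defaultdict(set)

    @staticmethod
    def _normalize(address):
        # Crude normalization (an assumption): lowercase, drop commas,
        # collapse runs of whitespace.
        return " ".join(address.lower().replace(",", " ").split())

    def observe(self, source, entity, address):
        """Record a new observation and return earlier matches from other sources."""
        key = self._normalize(address)
        matches = {(s, e) for (s, e) in self._seen[key] if s != source}
        self._seen[key].add((source, entity))
        return matches


index = AddressIndex()

# The fraud case is recorded first; nothing is known about this address yet.
first = index.observe("fraud_case", "case-001", "12 Elm Street, Springfield")

# Six months later a payroll change arrives and immediately finds the case.
later = index.observe("payroll", "employee-417", "12 elm street springfield")

print(first)  # set()
print(later)  # {('fraud_case', 'case-001')}
```

Nobody has to think of running a query here. The payroll update itself acts as the question, which is the sense in which the relevance finds the user.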

A metaphor:

Getting smarter by asking questions with every new piece of data is the same as putting a picture puzzle together, Jonas said. This is something that Jonas calls persistent context. “You find one piece that’s simply blades of grass, but this is the piece that connects the windmill scene to the alligator scene,” he says. “Without this one piece that you asked about, you’d have no way of knowing these two scenes are connected.”

And here is where he explains the need for continuous query of data:

Sometimes, new pieces reverse earlier assertions. “The moment you process a new transaction (a new puzzle piece) it has the chance of changing the shape of the puzzle, and right before you go to the next piece, you ask yourself, ‘Did I learn something that matters?’” he asks. “The smartest your organization is going to be is considering the importance right when the data is being stitched together.”

Another concrete example:

Another project (not related to the government, but a commercial effort) had Jonas assisting an organization in compiling a database that correlated the identities of Americans with pieces of data from public records (such as property records, DMV records, phone books, etc.). He knew there were about 300 million people in the U.S. But as Jonas started loading the data into his warehouse, the machine soon counted more than 300 million Americans. “We keep loading it, and pretty soon it says there are 600 million people in America–and if the number kept climbing to three billion, it surely would be a piece of junk. But my theory was it would collapse,” he said.

He was right. Consider what happens when there are two records describing two different people as they appear to share the same name. “What happens is a third record shows up in the future that works like glue, which causes them to collapse,” he said. Eventually, “the more data we loaded, the fewer number of people there were.”
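The "glue" effect can be illustrated with a small union-find sketch in Python. This is an assumption-laden toy, not the method Jonas used: the `EntityResolver` class is hypothetical, and it treats two records as the same person whenever they share an exact field value, which real identity resolution would not do so naively.

```python
class EntityResolver:
    """Hypothetical resolver: records sharing any exact field value are merged."""

    def __init__(self):
        self._parent = {}  # record id -> parent record id (union-find forest)
        self._owner = {}   # (field, value) -> first record id that carried it

    def _find(self, rid):
        # Follow parent links to the root, halving the path as we go.
        while self._parent[rid] != rid:
            self._parent[rid] = self._parent[self._parent[rid]]
            rid = self._parent[rid]
        return rid

    def add(self, rid, **features):
        """Load one record and merge it with any record sharing a feature."""
        self._parent[rid] = rid
        for item in features.items():
            if item in self._owner:
                self._parent[self._find(rid)] = self._find(self._owner[item])
            else:
                self._owner[item] = rid

    def entity_count(self):
        return len({self._find(rid) for rid in self._parent})


resolver = EntityResolver()
counts = []

# Two records with nothing in common look like two different people.
resolver.add("dmv-1", phone="555-0100")
counts.append(resolver.entity_count())
resolver.add("deed-7", address="12 elm st")
counts.append(resolver.entity_count())

# A third record shares the phone with one and the address with the other.
resolver.add("phonebook-3", phone="555-0100", address="12 elm st")
counts.append(resolver.entity_count())

print(counts)  # [1, 2, 1]
```

Loading the third record takes the count from two people down to one, a small version of "the more data we loaded, the fewer number of people there were."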


“What’s happening is data volumes are growing at this pace, yet an organization’s ability to make sense of them isn’t keeping up,” Jonas said. “Today, say you can make sense of 7 percent of what’s available, and in a few years it might be 4 percent, and in a few years after that it might be one percent. So the percentage of what’s knowable is on the decline.”

So while the sum of our knowledge is increasing, the ratio of what’s knowable to the data that’s available is getting smaller. Without some new technology to help “stitch things together,” as Jonas puts it, we’ll soon be wallowing in gobs of structured and unstructured data, with no discernable path out.

Does Web 2.0’s collective intelligence help in any way by annotating the huge amounts of data lying latent? Jonas also has a proposal for letting the user know when a question becomes answerable:

Jonas sees this type of technology–loading queries into a database as data–helping to overcome the counter-terrorism intelligence analyst’s dilemma of knowing when a question can be answered. “This is a nice and easy method that enables a future piece of data to find the question,” he said in a follow-up e-mail after this story was first published. “In other words, if the question asked by the user has no answer today…if a piece of data that can answer the question arrives tomorrow, the system can alert the user that their question is now true.”

Man!! If I think about it, how true this is. We all encounter situations daily in our lives where we have to understand something. We read, we ask and we think. Yet, we may not fully understand. The reason is that we may not have all the data we need to understand the situation. So the question lingers in our minds, and one fine day we encounter data that answers it. For those of us who are really involved with the question, it pops up in our minds automatically and disappears with the answer.
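The idea of storing a question so that a future piece of data can find it could be sketched roughly like this in Python. The `StandingQueries` class and its method names are hypothetical, and keeping questions as plain predicate functions in memory is an assumption for brevity. Jonas describes loading the queries into the database as data, and this sketch does not attempt to reproduce that.

```python
class StandingQueries:
    """Hypothetical store where unanswered questions wait for matching data."""

    def __init__(self):
        self._facts = []
        self._pending = []  # (user, description, predicate) with no answer yet

    def ask(self, user, description, predicate):
        """Answer from known facts, or remember the question for later."""
        hits = [fact for fact in self._facts if predicate(fact)]
        if not hits:
            self._pending.append((user, description, predicate))
        return hits

    def add_fact(self, fact):
        """Store a new fact and return alerts for questions it answers."""
        self._facts.append(fact)
        alerts, still_pending = [], []
        for user, description, predicate in self._pending:
            if predicate(fact):
                alerts.append((user, description, fact))
            else:
                still_pending.append((user, description, predicate))
        self._pending = still_pending
        return alerts


store = StandingQueries()

# The question has no answer today, so it is kept as a pending query.
answer_today = store.ask(
    "analyst",
    "Who lives at 12 elm st?",
    lambda fact: fact.get("address") == "12 elm st",
)

# Unrelated data arrives: no alert.
alerts_unrelated = store.add_fact({"name": "A", "address": "9 oak ave"})

# The answering data arrives tomorrow: the question finds its user.
alerts_match = store.add_fact({"name": "B", "address": "12 elm st"})

print(answer_today)      # []
print(alerts_unrelated)  # []
print(alerts_match)
```

Once a question has been answered it is dropped from the pending list, much like the lingering question that disappears from our minds once the answer turns up.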

