
3.6 Zettabytes – Information Overload = time for the Semantic Web

January 22, 2010

According to the How Much Information study of 2009:

In 2008, Americans consumed information for about 1.3 trillion hours, an average of almost 12 hours per day. Consumption totaled 3.6 zettabytes and 10,845 trillion words, corresponding to 100,500 words and 34 gigabytes for an average person on an average day. A zettabyte is 10 to the 21st power bytes, a million million gigabytes.
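As a rough cross-check, the study's totals can be divided by the number of person-days they imply. This is a sketch, not the study's own method. It assumes "almost 12 hours" means exactly 12 and derives person-days from the hour total rather than from a population count, so the results land near the study's 34 GB and 100,500 words but not exactly on them.

```python
# Rough per-person, per-day figures implied by the study's 2008 U.S. totals.
# Assumption (not from the study): "almost 12 hours per day" is treated as exactly 12.

ZETTABYTE = 10**21   # bytes
GIGABYTE = 10**9     # bytes

total_hours = 1.3e12            # hours of information consumed in 2008
hours_per_person_day = 12       # rounded up from "almost 12"
total_bytes = 3.6 * ZETTABYTE   # 3.6 zettabytes
total_words = 10_845 * 10**12   # 10,845 trillion words

# A zettabyte is a million million (10**12) gigabytes.
assert ZETTABYTE // GIGABYTE == 10**12

# Person-days implied by dividing total hours by hours per person per day.
person_days = total_hours / hours_per_person_day

gb_per_person_day = total_bytes / person_days / GIGABYTE
words_per_person_day = total_words / person_days

print(f"person-days: {person_days:.3e}")
print(f"GB per person per day: {gb_per_person_day:.1f}")
print(f"words per person per day: {words_per_person_day:,.0f}")
```

Both derived figures come out within a few percent of the quoted ones, which suggests the study's numbers are internally consistent.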

From the TopQuadrant whitepaper:

The amount of digitized information is growing at unprecedented rate. By 2007 the size of individual databases at many organizations reached up to hundreds and in some cases thousands of terabytes. For example, in 2004 AT&T had 11 exabytes (10⁷ TB) of wireline, wireless and Internet data. This is an equivalent amount of data to that held by 1 million Libraries of Congress. On average, the size of transactional databases doubles every five years with core databases doubling every two years. Data reporting and analysis warehouses (OLAP stores) triple in size every three years. On the web, by 2007 there were 29.7 billion pages, roughly five pages for every man, woman, and child on the planet. In 2006 alone, the size of the information created or replicated worldwide was 161 exabytes.
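The whitepaper's figures can be sanity-checked with a little arithmetic. The sketch below assumes decimal SI units (1 EB = 10⁶ TB) and reads "doubles every N years" as smooth compound annual growth; neither assumption comes from the whitepaper itself. It shows that 11 exabytes is on the order of 10⁷ terabytes, what "1 million Libraries of Congress" implies about the size of one library, and what the quoted doubling and tripling periods mean as yearly growth rates.

```python
# Sanity-checking the storage and growth figures quoted from the whitepaper.
# Assumptions (not from the source): decimal SI units and compound annual growth.

EXABYTE_IN_TB = 10**6  # 1 EB = 10**18 bytes = 10**6 TB

att_tb = 11 * EXABYTE_IN_TB
print(f"AT&T data: {att_tb:.1e} TB")  # about 1.1e7 TB, i.e. on the order of 10^7

# Size of one Library of Congress implied by "11 EB = 1 million Libraries of Congress".
loc_tb = att_tb / 1_000_000
print(f"One Library of Congress: about {loc_tb:.0f} TB")

# "29.7 billion pages, roughly five pages per person" implies a world population near:
implied_population = 29.7e9 / 5
print(f"Implied population: {implied_population:.2e}")


def annual_growth(factor: float, years: float) -> float:
    """Compound annual rate that multiplies size by `factor` over `years`."""
    return factor ** (1 / years) - 1


rates = {
    "transactional (2x every 5 yr)": annual_growth(2, 5),
    "core (2x every 2 yr)": annual_growth(2, 2),
    "OLAP (3x every 3 yr)": annual_growth(3, 3),
}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%} per year")
```

Read this way, transactional databases grow by roughly 15% a year, while core databases and OLAP stores grow by more than 40% a year.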

We are consuming more and more information, and the volume is growing at an exponential rate. And, just as Moore’s Law has held for longer than most people expected, this firehose of information will keep getting bigger for the foreseeable future. We’ll need to make sense of it. While I firmly believe that the semantic web is the way out of this trap, I’m not going to try to convince you of that here; I think you’ll see it for yourself. What I want you to see is that our current approaches to producing and consuming information are fundamentally broken and scale badly.

Television is a great example. When I watch CNN, I’m seeing a loop; I often see the same story several times. Yet CNN has plenty of material in its archives that I would like to see, and pushing everything through one big pipe isn’t going to get it to me. People in broadcast television know that their days are numbered. What we need is broadpull – the ability for everyone to specify what they want to see and then have it delivered to whatever screen or device is convenient. The good news: I won’t have to set up special CNN preferences at the CNN website. Instead, I’ll give CNN my personal ontology, and that will be my music/TV/film “director” for everything I consume. I won’t care as much about media brands; I’ll care about content. My personal ontology will mash up a hundred different broadcasters, so brand loyalty is going to go out the window.

Another thing about TV – you know those sports scores you see running across the top of the screen during a football or baseball game? They are part of the video signal. That’s fine in an analog world, but when we are watching TV on screens connected directly to the Internet, we’ll want to configure those data feeds ourselves. And we will.

Information technology is scaling terribly. Most companies spend 80% of their IT budgets fixing software that’s already broken. That is not going to change until we re-architect our data infrastructure. We can keep pouring millions of dollars into proprietary systems (SAP, Oracle, etc.) and ensure vendor lock-in, or we can switch to a new, more open, cloud-based, semantic approach using standard data formats that break the grip of vendors and make the data more important than the applications. For a few industries, that day has already arrived. For your industry, it’s coming.

