Metadata is data or information that describes or enables something. Metadata is catalog descriptions, ratings, reviews, prices, maps, menus, schedules, tickets, timelines, news coverage, tracking numbers, receipts, charts, records, and other forms of describing things. Stock price listings and advertisements are metadata. Your health records describe you – they are metadata. Everything you see on Amazon.com or eBay is metadata. The only thing that isn’t metadata online is content – material that has direct value without describing something else. Books are usually content, stories are content, artwork is content. Many things (like display ads) are made of both content and metadata.
Most metadata is made for human consumption – made to be printed or displayed on a web page in a way that humans can read and understand it. Today, people manipulate a lot of metadata by hand, copying, forwarding, updating, reading, and interpreting it. In the future, metadata will be machine understandable and human readable, so systems can make sense of it and work with it. This will let our future systems do the mechanical work for us, and much of the simple reasoning and interpretation as well. Today, we often can’t pass our calendar from our desktop to our phone because we use two different systems that can’t talk to each other. In the semantic future, common formats will allow any software to pass event data back and forth, making it easy to weave your schedule into your everyday life tools without vendor lock-in.
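To make the calendar example concrete, here is a minimal sketch of two systems exchanging an event through a shared format. It assumes JSON as the carrier; the field names (title, start, end, location), the function names, and the sample event are all hypothetical and chosen only for illustration. A real common format would define its own vocabulary.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Event:
    # Hypothetical shared field names; a real common format would define these.
    title: str
    start: str  # ISO 8601 timestamp
    end: str    # ISO 8601 timestamp
    location: str


def export_event(event: Event) -> str:
    """Desktop side: serialize the event into the agreed-upon format."""
    return json.dumps(asdict(event), sort_keys=True)


def import_event(payload: str) -> Event:
    """Phone side: rebuild the event from the agreed-upon format."""
    return Event(**json.loads(payload))


# Made-up sample data, for illustration only.
desktop_event = Event(
    title="Dentist",
    start="2024-05-01T09:00:00",
    end="2024-05-01T09:30:00",
    location="Main St Clinic",
)
phone_event = import_event(export_event(desktop_event))
```

Because both sides agree on what each field means, neither needs to know anything about the other's internals. That agreement, rather than the choice of JSON, is what removes the lock-in.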
There are three kinds of metadata:
Unstructured: Like most text on the web, metadata that’s unstructured is meant for humans to read and interpret. Unstructured data has no tags that say what the data means. Magazine articles are unstructured. Most blogs and web pages are unstructured – you would have to tell a program explicitly what each piece of information means. Most photos carry unstructured data – a search engine can’t tell who or what the photo shows or where it was taken. That’s why search engines rely on keywords – the surrounding text – to try to understand the meaning and context of information online.
Structured: Data that has structure has tags, and the software that encounters it knows what the tags mean. When you fill out a form, you’re adding data to fields, and the fields are like tags – they say what the data means. Here’s an example of a description of a Honda Accord that’s extremely structured. Yet those tags don’t mean anything to any software outside that one site, so we say it’s structured but it’s in a silo.
Semi-structured: Much of the data online is semi-structured to varying degrees. For example, most Wikipedia articles are very lightly structured, whereas most Amazon.com product pages are untagged, but you can figure out what the tags are by looking at the context. Most real-estate listings are semi-structured. In the semi-structured world of the web as we know it today, we need to do a lot of guessing and interpretation to make apples-to-apples comparisons. To do that, our programs look for labels – nearby text that describes the data, but not always reliably.
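The three kinds can be sketched side by side using the Honda Accord as the subject. This is only an illustration: the listing text, the price, the field names, and the guess_fields helper are made up for this example and do not come from any real site. The helper shows the kind of label-based guessing that semi-structured data forces on a program.

```python
import re

# Unstructured: prose that only a human can reliably interpret.
unstructured = "Selling my Honda Accord, runs great, asking around twelve grand."

# Structured: every value sits under a tag the owning site understands.
# These field names are hypothetical and mean something only inside that silo.
structured = {"make": "Honda", "model": "Accord", "price_usd": 12000}

# Semi-structured: values appear next to human-readable labels.
semi_structured = "Make: Honda\nModel: Accord\nPrice: $12,000"


def guess_fields(text: str) -> dict:
    """Guess label/value pairs from 'Label: value' lines.

    This is a heuristic: it works only when the page happens to follow
    the label-colon-value pattern, which is not guaranteed.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(.+)", line)
        if match:
            label = match.group(1).strip().lower()
            fields[label] = match.group(2).strip()
    return fields


guessed = guess_fields(semi_structured)      # labels found, values still raw text
nothing = guess_fields(unstructured)         # no labels to find, so nothing is recovered
```

Note that even when the guess succeeds, the price comes back as the string "$12,000" rather than a number, and the label "price" may not match another site's "asking" or "cost". That gap is why comparisons across sites take so much interpretation.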
Metadata is at the heart of a shift from pushing information to pulling it – the first new way to use information in over 4,000 years.
See also: The Semantic Web Acid Test.
Today, many systems translate from one system to another. These systems may be semantic (unambiguous) but they still don’t enable the semantic web because there are too many different ways to do the same thing.
Here are some of the companies and groups helping build this future: