Traditionally, a taxonomy meant a hierarchical way to categorize all the items in a given world: books, products, species, concepts, etc. In the semantic web, a taxonomy is a kind of dictionary of terms and their precise definitions. When a dictionary is ordered logically, in a hierarchy, it’s called a taxonomy. It’s a shared resource everyone in an information ecosystem uses to sync the meaning of terms. An excellent example is the US GAAP Taxonomy, which you can browse using their online viewer. This taxonomy is incredibly deep and precise. If you want to learn what a taxonomy really is, spend some time using their viewer.
By definition, a taxonomy is hierarchical. That is, it has a root level, a number of sublevels, and terms at the bottom, like a tree with branches or a nested list, similar to the structure of folders and documents in your computer filing system. The navigation for this web site is another example. The advantage of a taxonomy is that it is simple, easy to understand, easy to maintain, and easy to program into the various systems that need to use it. The tree structure of a taxonomy necessarily involves a choice of priorities and categories that may not work for everyone but may work for many people. This is the power of taxonomies: they often solve 80% of a categorization problem without getting overcomplicated, and they can be made precise without a lot of debugging.
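To make the tree idea concrete, here is a minimal sketch in Python. The category names, the `TAXONOMY` variable, and the `find_path` helper are all invented for illustration; they do not come from any real taxonomy. Nested dictionaries play the role of branches, and lists hold the terms at the bottom.

```python
# Hypothetical toy taxonomy: dicts are branches, lists hold the leaf terms.
TAXONOMY = {
    "Products": {
        "Books": {
            "Fiction": ["Novels", "Short Stories"],
            "Nonfiction": ["Biography", "History"],
        },
        "Electronics": {
            "Audio": ["Headphones", "Speakers"],
        },
    }
}


def find_path(node, term, trail=()):
    """Return the path from the root to `term` as a tuple, or None if absent."""
    if isinstance(node, list):
        # Bottom of the tree: the term is either one of these leaves or not here.
        return trail + (term,) if term in node else None
    for name, child in node.items():
        if name == term:
            return trail + (name,)
        found = find_path(child, term, trail + (name,))
        if found is not None:
            return found
    return None


print(find_path(TAXONOMY, "History"))
```

Because every term has exactly one parent, a single walk from the root is enough to place any term, which is a large part of why taxonomies are easy to program against.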
A taxonomy with no content (leaves of the tree) is called a categorization scheme. When it is ordered alphabetically, it’s called a vocabulary. When it is more flexible, it can also be called a subject backbone. There are subject backbones that are flat (Wikipedia), hierarchical (most web sites have one), and those that are more flexible, designed to work with ontologies.
There are now thousands of taxonomies online for describing everything from works of art to chemical compounds to diseases to proteins to legal terms and process descriptions. The taxonomies that will bring us into the semantic web are those that people agree are universal resources, where everyone contributes to and uses the same taxonomy, with the same set of terms (this is called a controlled vocabulary). In contrast, most taxonomies today are internal. As an example, Amazon.com has a product taxonomy that helps display related products, but it’s not an industry resource.
Taxonomies are very useful tools, mostly because people can easily build, improve, and extend them. But taxonomies have problems. They are not flexible. They can’t regroup around different kinds of problems. They can become difficult to extend to a large description space. The more elements you add, the more you find that are connected in new ways, causing the traditional tree structure to break. There are ways to improve flexibility by combining taxonomies with vocabularies and various faceted database concepts, but that makes them more susceptible to error.
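The regrouping problem, and the faceted workaround, can be sketched roughly as follows. This is only an illustration under assumed data: the item names, the facet names `category` and `use`, and the `group_by` helper are hypothetical. Each item carries several independent facets instead of a single position in a tree, so the same items can be regrouped around whichever facet a given problem needs.

```python
from collections import defaultdict

# Hypothetical items, each described along two independent facets.
ITEMS = {
    "Waterproof Speaker": {"category": "Audio", "use": "Outdoor"},
    "Hiking Boots": {"category": "Footwear", "use": "Outdoor"},
    "Studio Headphones": {"category": "Audio", "use": "Studio"},
}


def group_by(items, facet):
    """Regroup item names around the values of a single facet."""
    groups = defaultdict(list)
    for name, facets in items.items():
        groups[facets[facet]].append(name)
    # Sort names so the grouping is stable regardless of insertion order.
    return {key: sorted(names) for key, names in groups.items()}


print(group_by(ITEMS, "category"))
print(group_by(ITEMS, "use"))
```

In a strict tree, "Waterproof Speaker" would have to live under either Audio or Outdoor. With facets it appears in both groupings, at the cost of more values to keep consistent for every item.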
The next level of complexity and flexibility is an ontology, which is much more powerful and much harder to build.
Internal taxonomies are used by one company or group. Many taxonomies are hard at work today helping companies manage catalogs, software processes, org charts, research topics, etc. But if every company or organization creates its own taxonomy, semantic dispersion increases. Today, many web sites use tag clouds rather than taxonomies to
Here are some of the companies and groups helping build this future: