
April 25, 2007

The Most Important Taxonomy for your Web Site

Taxonomies continue to be popular in companies, but I have to wonder whether they are really that useful for most organizations. I can't tell you how many times otherwise intelligent people have told me, "We plan to implement a corporate enterprise search solution as soon as we finish our taxonomy project." When I hear this, I know search won't be happening for at least two years, and in the meantime every visitor to the web site suffers. Usually I spend a few seconds feeling sorry for their users and employees, but then I remember that innovative companies with real work to do are moving ahead full speed with Enterprise Search 2.0 platforms, and I feel better.

Taxonomies generally fall into one of two categories: subject-based taxonomies and content-based taxonomies.

Subject or "domain" taxonomies attempt to describe all of the terms in a field completely, along with the relationships between those terms. Typically these relationships are hierarchical, and these are the kind of taxonomies we use to classify knowledge: the kind your biology teacher would talk about. You need a real subject matter expert to create a useful subject-based taxonomy. And whatever you do, don't hire two (or more) subject experts, because they will never agree on the taxonomy.

Content-based taxonomies are organized around existing content. Organization charts, computer directory and folder structures, and social tagging are typically 'content-based' taxonomies. These taxonomies are often built by humans; you build one yourself when you decide which folders to use on your computer. But they can also be generated automatically with tools that many search and content management vendors sell.
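As a rough illustration of the automatic approach, here is a minimal Python sketch that derives a content-based taxonomy from nothing more than document folder paths. It is not how any particular vendor's tool works; the function name `folder_taxonomy` and the sample paths are hypothetical, and the sketch assumes relative, slash-separated paths.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def folder_taxonomy(paths):
    """Hypothetical helper: build a parent -> sorted children map from file paths.

    Assumes relative, slash-separated paths. Each folder becomes a node,
    identified by its full path so same-named folders stay distinct;
    the documents themselves are not added as nodes.
    """
    tree = defaultdict(set)
    for p in paths:
        folders = PurePosixPath(p).parts[:-1]  # drop the file name
        for depth in range(len(folders)):
            parent = "/".join(folders[:depth]) or "/"  # "/" marks the root
            child = "/".join(folders[: depth + 1])
            tree[parent].add(child)
    return {node: sorted(children) for node, children in tree.items()}


# Hypothetical sample documents.
docs = [
    "sales/2006/q1.doc",
    "sales/2007/plan.doc",
    "hr/policies/leave.pdf",
]
print(folder_taxonomy(docs))
# {'/': ['hr', 'sales'], 'sales': ['sales/2006', 'sales/2007'], 'hr': ['hr/policies']}
```

Identifying each folder by its full path keeps, say, a "2006" folder under Sales separate from a "2006" folder under HR; whether those belong in the same taxonomy node is the kind of judgment a human would still have to make.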

Whether you go with a subject or a content taxonomy for your company, hooking it into your enterprise search technology will be tricky. This is the dirty little secret of the search software business: there are few, if any, commercial engines that can really take advantage of a complex taxonomy. What do you do with it, after all? Do you tag every document with every term in the taxonomy hierarchy for every term the document contains? Do you expect the search engine to know automatically what to do with the taxonomy, and to look up and down the taxonomy tree to find related terms? Verity had a great concept when they invented Topics in the late '80s, but since then even they have lost some of their taxonomy emphasis.
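To make the "look up and down the taxonomy tree" question concrete, here is a minimal Python sketch of taxonomy-based query expansion. It assumes a hypothetical single-parent taxonomy stored as a dict of term to narrower terms, and hypothetical `build_parents` and `expand_term` helpers; no commercial engine's API is implied.

```python
# Hypothetical subject taxonomy: each term maps to its narrower terms.
TAXONOMY = {
    "animal": ["mammal", "bird"],
    "mammal": ["dog", "cat"],
    "bird": ["sparrow"],
}


def build_parents(taxonomy):
    """Invert the tree so each term points to its single broader term."""
    return {child: parent for parent, children in taxonomy.items() for child in children}


def expand_term(term, taxonomy, up=1, down=1):
    """Return `term`, its ancestors up to `up` levels, and descendants down to `down` levels."""
    parents = build_parents(taxonomy)
    expanded = [term]

    # Walk up toward the root, one broader term per level.
    node = term
    for _ in range(up):
        node = parents.get(node)
        if node is None:
            break
        expanded.append(node)

    # Walk down breadth-first, one level of narrower terms at a time.
    frontier = [term]
    for _ in range(down):
        frontier = [child for n in frontier for child in taxonomy.get(n, [])]
        expanded.extend(frontier)
    return expanded


print(expand_term("mammal", TAXONOMY))  # ['mammal', 'animal', 'dog', 'cat']
print(" OR ".join(expand_term("bird", TAXONOMY)))  # bird OR animal OR sparrow
```

Even this toy shows the problem: expanding "mammal" upward to "animal" will also match documents about birds, so every choice of `up` and `down` trades precision for recall. Real taxonomies are often polyhierarchical (a term with several broader terms), which this sketch ignores.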

We think there is a third kind of taxonomy that is even more important than the traditional subject and content taxonomies: we call it a Behavior-Based Taxonomy.

Really, the reason most companies want a taxonomy is to help people find content. You could probably keep several experts and a bunch of computers busy for years trying to anticipate every possible term and every possible hierarchy that someone on your internet or intranet site might use. But we think the most important taxonomy on any web site is the list of search terms that people actually use when they search the site.

If your search engine provides great results for the 'top 100' queries on your site, you have a lot of happy users. Why do you think search experts at trade shows have finally started talking about your search logs? You can't know what your behavior-based taxonomy (BBT) is unless you monitor your search activity at least quarterly. Verify that the 'top 100' queries work well, either through organic search results or through featured links (also called best bets or result promotion, depending on your search vendor).
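One way to start tracking a BBT is sketched below in Python. It assumes a hypothetical log format with one raw query per line, a hypothetical `search` callable standing in for your engine, and a hypothetical `best_bets` dictionary standing in for your vendor's featured-link configuration. It simply lists the top queries that return neither organic results nor a best bet.

```python
from collections import Counter


def top_queries(log_lines, n=100):
    """Count queries after normalizing case and whitespace.

    Assumes a hypothetical log with one raw query string per line.
    """
    counts = Counter(
        " ".join(line.lower().split()) for line in log_lines if line.strip()
    )
    return counts.most_common(n)


def audit(top, search, best_bets):
    """Return (query, count) pairs with neither organic hits nor a best bet.

    `search` is a hypothetical callable returning a list of hits for a query;
    `best_bets` is a hypothetical dict mapping a query to promoted URLs.
    """
    return [
        (query, count)
        for query, count in top
        if not search(query) and not best_bets.get(query)
    ]


# Toy stand-ins for a real search log, index, and best-bet list.
log = ["Vacation Policy", "vacation  policy", "401k", "expense report", "401K", "401k"]
index = {"expense report": ["/finance/expenses.html"]}
best_bets = {"401k": ["/hr/benefits/401k.html"]}

top = top_queries(log, n=100)
print(audit(top, lambda q: index.get(q, []), best_bets))  # [('vacation policy', 2)]
```

Normalizing case and whitespace before counting keeps "401k" and "401K" from splitting one popular query into two smaller ones. A query with some organic hits can still return poor results, so a list like this is a starting point for human review, not a replacement for it.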

Keep your Behavior-Based Taxonomy up to date, and your search users will be satisfied!






Well, the point is that neither Autonomy nor FAST nor Recommind has a solution for taxonomy. IBM does not have one either (OmniFind). The only working solution on this planet is InfoCodex. InfoCodex comes with 3 million words and can actually do cross-language search, so if you search in English it will also find Spanish or Italian documents. Another nice feature is that you can find similar documents as well. There is no other software that can do that without a day of training. InfoCodex can do that.

