July 18, 2008

Microsoft Terms for SQL Server Search Components

I found a nice article about Microsoft Language Packs and MS SQL Server, including some information on Japanese and CJK handling. Another useful tidbit in it was how Microsoft refers to certain parts of its search engine:

  • What most vendors call "indexing", Microsoft calls "population" (of an index).
  • What most vendors call a "collection", Microsoft calls a "catalog"; we've seen other vendors use that term in the past.
  • And what most vendors call "tokenization" or "tokenizers", Microsoft calls "word breakers", which is actually a bit more descriptive to a non-programmer.
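To make the three terms concrete, here's a toy sketch in Python of how they fit together. It is not how SQL Server works internally; `word_breaker`, `Catalog`, `populate` and `search` are hypothetical names chosen just to mirror the vocabulary above. The word breaker (tokenizer) splits text into terms, "population" (indexing) runs each document through it into an inverted index, and the catalog (collection) is the named container holding that index.

```python
import re
from collections import defaultdict


# Hypothetical toy "word breaker" (a.k.a. tokenizer): lowercases the text and
# splits it into runs of word characters. Real word breakers are language-
# specific; CJK text has no spaces between words, so this regex would treat a
# whole Japanese phrase as a single term.
def word_breaker(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class Catalog:
    """Hypothetical stand-in for a "catalog" (a.k.a. "collection"):
    a named container holding an inverted index of term -> document ids."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.index: dict[str, set[int]] = defaultdict(set)

    def populate(self, docs: dict[int, str]) -> None:
        """'Population' in Microsoft's terms, 'indexing' elsewhere:
        break each document into terms and record where each term occurs."""
        for doc_id, text in docs.items():
            for term in word_breaker(text):
                self.index[term].add(doc_id)

    def search(self, query: str) -> set[int]:
        """Return ids of documents containing every term in the query.
        The query goes through the same word breaker used at population time."""
        terms = word_breaker(query)
        if not terms:
            return set()
        result = set(self.index.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.index.get(term, set())
        return result


catalog = Catalog("articles")
catalog.populate({
    1: "Full-text search in SQL Server",
    2: "Relational databases and SQL",
})
print(sorted(catalog.search("sql")))        # [1, 2]
print(sorted(catalog.search("full text")))  # [1]
```

Note that the query has to be broken into terms the same way the documents were at population time; otherwise a search for "Full-Text" would never match the indexed terms "full" and "text".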

I actually wrote an article a few years ago comparing traditional relational databases to full-text search engines, which included a table of equivalent terms and concepts (near the end of the article). If you're already familiar with databases, it will get you up to speed much faster!


