
August 30, 2010

Today's Search Term: Stemming


Related Terms:  lemmatization, normalize
Search engines use stemming to determine the root of a given written word. A program or algorithm removes the affixes from a word (in English, prefixes and/or suffixes), leaving the root word. By applying the rules of the given language, inflected forms such as the third-person singular present (as cries is of the verb cry) can be reduced to the same root and indexed accurately.
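
To make the idea concrete, here is a minimal sketch of suffix stripping in Python. It assumes a tiny hand-written rule table; the names SUFFIX_RULES and toy_stem are made up for illustration, and production stemmers (Porter's algorithm, for example) use many more rules and conditions than this.

```python
# Toy suffix-stripping stemmer. The rule table is an illustrative
# assumption, not the rule set of any real search engine.
SUFFIX_RULES = [
    ("ies", "y"),  # cries -> cry
    ("ing", ""),   # searching -> search
    ("ed", ""),    # searched -> search
    ("es", ""),    # searches -> search
    ("s", ""),     # accounts -> account
]


def toy_stem(word, min_stem_len=3):
    """Return a crude root for `word` by applying the first rule that fits."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + replacement
            # Skip the rule if it would leave too little of the word behind.
            if len(stem) >= min_stem_len:
                return stem
    return word


if __name__ == "__main__":
    for w in ["cries", "cry", "searching", "searched", "searches"]:
        print(w, "->", toy_stem(w))
```

With rules like these, cries and cry end up under the same index term, so a query for either form can match documents containing the other.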

Stemmers become harder to design as the rules of the target language become more complex. For example, some languages have more verb and pronoun forms. Other languages do not always have clear breaks between words, and you can't do stemming until you've isolated the words!





Normalization also increases search matching: a de-pluralization step finds the singular form of plural words, and gender forms can be normalized to a single form.

S-removal is one useful technique in de-pluralization. A stemmer might reduce accountant and accounts to the same root word, but in the HR field this would be semantically incorrect and would cause false-positive matches.
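
As a rough illustration of the difference, here is a simplified S-removal sketch in Python. The function name remove_plural_s and the specific ending rules are assumptions chosen for the example, not a complete treatment of English plurals (irregular forms and words like "series" are not handled).

```python
def remove_plural_s(word):
    """Conservative de-pluralization: strip only plural-looking endings."""
    w = word.lower()
    if len(w) <= 3:
        return w  # too short to strip safely (e.g. "gas")
    if w.endswith("ies"):
        return w[:-3] + "y"  # companies -> company
    if w.endswith(("ss", "us", "is")):
        return w  # boss, status, analysis are not plurals
    if w.endswith(("ches", "shes", "sses", "xes", "zes")):
        return w[:-2]  # branches -> branch
    if w.endswith("s"):
        return w[:-1]  # accounts -> account
    return w


if __name__ == "__main__":
    for w in ["accounts", "accountant", "accountants", "companies"]:
        print(w, "->", remove_plural_s(w))
```

Here accounts becomes account while accountant is left alone, so the two stay distinct index terms, whereas a more aggressive stemmer that also strips "-ant" would merge them.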

In some European languages, word endings change to indicate a 'male' or 'female' job title.

Excellent points, Jon... stemming properly is much more than just removing the trailing "S"! Makes you glad that really good search engines handle all that for you!
