Today's Search Term: Stemming
stemming
Related Terms:
lemmatization, normalize
Search engines use stemming to determine the root of a written word. A program or algorithm removes the affixes of a word (prefixes and/or suffixes, in English), leaving the root word. By applying the rules of the given language, inflected forms such as the third-person singular present (as cries is of the verb cry) can be indexed accurately under the same root.
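As a rough illustration of affix removal, here is a minimal sketch of a suffix-stripping stemmer. The rule list and the name toy_stem are made up for this example; real stemmers use far larger rule sets and handle many exceptions this one ignores.

```python
# Toy suffix rules, checked in order. These are illustrative assumptions,
# not the rule set of any real stemming algorithm.
SUFFIX_RULES = [
    ("ies", "y"),   # cries -> cry
    ("ing", ""),    # walking -> walk
    ("ed", ""),     # walked -> walk
    ("s", ""),      # cats -> cat
]


def toy_stem(word):
    """Strip the first matching suffix and return the remaining root."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Require at least two characters to remain so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: len(word) - len(suffix)] + replacement
    return word


# Both forms land on the same index term.
print(toy_stem("cries"), toy_stem("cry"))
```

With rules like these, a query for cry and a document containing cries end up with the same root, which is the whole point of stemming at index time.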
Stemmers become harder to design as the rules of the target language become more complex. For example, some languages have more verb and pronoun forms. Other languages do not always have clear breaks between words, and you can't do stemming until you've isolated the words!
Normalization is also used to increase search matching. A de-pluralization step finds the singular form of plural words, and a gender step normalizes gendered forms of a word to a single form.
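S-removal is one useful technique in de-pluralization: it strips only the plural ending, so accounts becomes account while accountant is left untouched. A full stemmer, by contrast, might take accountant and accounts to the same root word; in the HR field this would be semantically incorrect and cause false positive matches.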
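The difference can be sketched in a few lines. Both functions below are hypothetical and written only for this example: s_removal handles a handful of common English plural endings, and aggressive_stem stands in for a stemmer that also strips derivational suffixes such as -ant.

```python
def s_removal(word):
    """Conservative de-pluralization: handle only common plural endings."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"      # companies -> company
    if w.endswith(("ss", "us", "is")):
        return w                 # class, status, analysis are left alone
    if w.endswith("s") and len(w) > 3:
        return w[:-1]            # accounts -> account
    return w


def aggressive_stem(word):
    """Over-eager stemmer (assumed for illustration) that strips more suffixes."""
    w = s_removal(word)
    for suffix in ("ant", "ing", "ed"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


for term in ("accounts", "accountant"):
    print(term, "->", s_removal(term), "|", aggressive_stem(term))
```

With S-removal alone, accounts and accountant stay distinct index terms. The aggressive version collapses both to account, which is exactly the false positive an HR search would want to avoid.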
In some European languages, word endings change to indicate a 'male' or 'female' job title.
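One simple way to normalize gendered job titles is a lookup table that maps each form to a single index term. The sketch below assumes such a table; the name GENDER_FORMS, the handful of entries, and the choice of which form serves as the index term are all arbitrary choices made for illustration, and a real system would need a much larger, language-specific list.

```python
# Hypothetical lookup table: gendered job title -> single form used for indexing.
GENDER_FORMS = {
    "lehrerin": "lehrer",         # German: teacher
    "ärztin": "arzt",             # German: doctor
    "directrice": "directeur",    # French: director
    "enfermera": "enfermero",     # Spanish: nurse
}


def normalize_gender(title):
    """Return the single indexed form for a job title, or the title unchanged."""
    key = title.lower()
    return GENDER_FORMS.get(key, key)


print(normalize_gender("Lehrerin"), normalize_gender("Lehrer"))
```

A table is used here rather than suffix rules because the change is not always a simple ending (Arzt and Ärztin also differ in the stem vowel).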
===
Excellent points Jon... stemming properly is much more than just removing the trailing "S"! Makes you glad that really good search engines handle all that for you!
s/Miles
Posted by: Jon Lehto | August 31, 2010 at 08:02 AM