
March 12, 2009

Search Relevancy and Japanese text, CJK, interesting thread on SearchDev.org

A really nice discussion over on SearchDev.org about relevancy when searching Japanese text and other CJK languages. It touches on a lot of technical issues, including tokenization, thesauri, character set normalization, etc.
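To make the tokenization and normalization points a bit more concrete, here is a minimal Python sketch of one common approach for CJK text. It applies Unicode NFKC normalization, which folds full-width Latin letters and digits to ASCII and half-width katakana to full-width, and then indexes runs of CJK characters as overlapping character bigrams. This is an illustrative assumption, not a description of how Autonomy IDOL, K2, Ultraseek, MarkLogic or anyone in the thread actually does it. The function names `is_cjk`, `normalize` and `tokenize` are hypothetical, and the code-point ranges are a rough, incomplete CJK check.

```python
import unicodedata
from itertools import groupby


def is_cjk(ch: str) -> bool:
    """Rough (not exhaustive) check for Japanese kana, Han ideographs and Hangul."""
    cp = ord(ch)
    return (
        0x3040 <= cp <= 0x30FF      # Hiragana and Katakana
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
        or 0xAC00 <= cp <= 0xD7AF   # Hangul Syllables
    )


def normalize(text: str) -> str:
    # NFKC maps full-width "ＡＢＣ" to "ABC" and half-width "ｶﾅ" to "カナ";
    # lowercasing afterwards makes Latin matching case-insensitive.
    return unicodedata.normalize("NFKC", text).lower()


def char_class(ch: str) -> str:
    """Classify a character as CJK, part of a non-CJK word, or a separator."""
    if is_cjk(ch):
        return "cjk"
    if ch.isalnum():
        return "word"
    return "sep"


def tokenize(text: str) -> list[str]:
    """Emit whole words for non-CJK runs and overlapping bigrams for CJK runs."""
    tokens: list[str] = []
    for cls, group in groupby(normalize(text), key=char_class):
        chunk = "".join(group)
        if cls == "word":
            tokens.append(chunk)
        elif cls == "cjk":
            if len(chunk) == 1:
                tokens.append(chunk)  # a lone character is indexed as a unigram
            else:
                tokens.extend(chunk[i:i + 2] for i in range(len(chunk) - 1))
        # separators (spaces, punctuation) are dropped
    return tokens


if __name__ == "__main__":
    print(tokenize("東京都のＣＪＫ検索"))  # → ['東京', '京都', '都の', 'cjk', '検索']
```

This also shows why relevancy is tricky with bigrams. Because they ignore word boundaries, 東京都 (Tokyo Metropolis) yields the bigram 京都, so a query for 京都 (Kyoto) will match it. Dictionary-based morphological segmentation avoids some of those false hits, but it needs a good dictionary to do so.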

Folks chime in about how a number of different search engines handle this, including Autonomy IDOL, K2, Ultraseek and MarkLogic.

The actual thread:
http://tech.groups.yahoo.com/group/search_dev/messages/718?threaded=1&m=e&var=1&tidx=1

It's a tad hard to read with all the quoted text, but well worth a full skim. Keep scrolling!
