Search relevancy for Japanese and other CJK text: an interesting thread on SearchDev.org
There's a really nice discussion over on SearchDev.org about relevancy when searching text in Japanese and other CJK languages. It touches on a lot of technical issues, including tokenization, thesauri, and character set normalization.
Folks are chiming in about how a number of different search engines handle this, including Autonomy IDOL, K2, Ultraseek, and MarkLogic.
It's a tad hard to read with all the quoted text, but well worth a full skim. Keep scrolling!
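If tokenization and character set normalization sound abstract, here is a rough sketch of what they can mean for Japanese text. It is only an illustration using Python's standard library, not how any of the engines named above works: the function names `normalize_text` and `char_bigrams` are made up for this example, and character bigrams are just one common stand-in for a real dictionary-based tokenizer.

```python
import unicodedata
from typing import List


def normalize_text(text: str) -> str:
    """Fold full-width and half-width variants into one form using NFKC."""
    return unicodedata.normalize("NFKC", text)


def char_bigrams(text: str) -> List[str]:
    """Split normalized text into overlapping two-character tokens.

    Japanese has no spaces between words, so a simple indexer can fall
    back to bigrams instead of trying to find word boundaries.
    """
    chars = [c for c in normalize_text(text) if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


if __name__ == "__main__":
    # Full-width Latin letters become plain ASCII.
    print(normalize_text("ＡＢＣ"))   # ABC
    # Half-width katakana plus voicing mark becomes one full-width character.
    print(normalize_text("ｶﾞ"))      # ガ
    # "Tokyo Metropolis" yields a bigram that is also the word "Kyoto".
    print(char_bigrams("東京都"))     # ['東京', '京都']
```

The last line hints at why relevancy gets tricky: a bigram index built this way would match a query for 京都 (Kyoto) against text about 東京都 (Tokyo), which is the kind of false hit that better tokenization tries to avoid.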