
March 12, 2009

Search Relevancy and Japanese text, CJK, interesting thread on SearchDev.org

There's a really nice discussion over on SearchDev.org about relevancy when searching Japanese text and other CJK languages. It touches on a lot of technical issues, including tokenization, thesauri, and character set normalization.

Folks are chiming in about how a number of different search engines handle this, including Autonomy IDOL, K2, Ultraseek and MarkLogic.
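
To make the tokenization and normalization issues a bit more concrete, here is a rough Python sketch. It is not taken from the thread and does not reflect how any of the engines above actually work; the function names `normalize` and `cjk_bigrams` are made up for illustration. It assumes Unicode NFKC folding for normalization and simple overlapping character bigrams for tokenization, which is one common fallback when no dictionary-based segmenter is available.

```python
import unicodedata


def normalize(text):
    """Fold compatibility forms (hypothetical helper, for illustration).

    NFKC maps full-width Latin letters to ASCII and half-width katakana
    to full-width katakana, so visually equivalent strings compare equal.
    """
    return unicodedata.normalize("NFKC", text).lower()


def cjk_bigrams(text):
    """Split text into overlapping two-character tokens.

    This is a naive sketch: it ignores whitespace and makes no attempt
    to find real word boundaries.
    """
    chars = [c for c in normalize(text) if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


if __name__ == "__main__":
    print(normalize("ＡＢＣ"))     # full-width Latin input
    print(cjk_bigrams("東京都"))   # "Tokyo Metropolis"
```

With bigrams, 東京都 (Tokyo Metropolis) yields the tokens 東京 and 京都, so a query for 京都 (Kyoto) would match a document that is only about Tokyo. That kind of false hit is one reason tokenization choices matter so much for CJK relevancy.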

The actual thread:
http://tech.groups.yahoo.com/group/search_dev/messages/718?threaded=1&m=e&var=1&tidx=1

It's a tad hard to read with all the quoted text, but well worth a full skim. Keep scrolling!
