
March 24, 2010

Indexing Text and HTML with Solr

Lucid Imagination has a tutorial, "Indexing web pages with Solr, like magic," that shows how to use Tika, Solr Cell, and curl to index HTML and plain-text files.
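
For readers who want to see what such a request looks like, here is a minimal Python sketch of sending one file to Solr Cell. It is not taken from the tutorial and makes several assumptions. It assumes a local Solr server at http://localhost:8983 with the ExtractingRequestHandler mapped at /update/extract, as in the Solr 1.4-era example configuration. Newer Solr releases put a core or collection name in the path. It also assumes the schema's unique key is `id`, which is supplied through the `literal.id` parameter. The helper names `build_extract_request` and `index_file` are hypothetical. Roughly the same call can be made with curl by posting the file with `-F "myfile=@page.html"` to the same URL.

```python
import mimetypes
import urllib.parse
import urllib.request

# Assumption: Solr Cell (ExtractingRequestHandler) is reachable at this URL.
# Newer Solr versions use a core/collection path, e.g. /solr/mycore/update/extract.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_request(path, doc_id, body, base_url=SOLR_EXTRACT_URL):
    """Build (but do not send) a POST asking Solr Cell to extract and index one file."""
    # literal.id sets the document's unique key (assumed to be the "id" field);
    # commit=true asks Solr to make the document searchable right away.
    query = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    # Guess the MIME type from the file extension so Tika can pick a parser.
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    return urllib.request.Request(
        f"{base_url}?{query}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def index_file(path, doc_id):
    """Read a local file and post it to Solr; needs a running Solr server."""
    with open(path, "rb") as f:
        req = build_extract_request(path, doc_id, f.read())
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Usage, under the same assumptions, would be `index_file("page.html", "doc1")` for a web page and `index_file("notes.txt", "doc2")` for a plain-text file. Splitting out `build_extract_request` makes the request easy to inspect without a live server.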

Comments

Indexing text is good if Google can do it.
--
Well, I agree that, for public web sites, content Google can index is good and content it cannot index is bad. But even public-facing corporate websites often hold content that Google cannot reach, yet the organization's own search engine absolutely has to find. Database content, and content protected by multiple levels of security, is generally not available to the public Google engine, but the enterprise platform still has to find it. Enterprise search is harder than public web site search. Thanks for leaving your comment!
/s/Miles