Indexing Text and HTML with Solr
Lucid Imagination has a tutorial on how to use Tika, Solr Cell, and
curl to index HTML and plain text files: "Indexing web pages with
Solr, like magic".
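For a rough idea of what that kind of indexing request looks like, here is a minimal Python sketch. It is not taken from the tutorial. It assumes a local Solr instance with the Solr Cell extracting handler mounted at /update/extract; the host, port, and path in SOLR_EXTRACT_URL and the helper names build_extract_url and index_file are illustrative assumptions to adjust for your own install.

```python
import urllib.parse
import urllib.request

# Assumption: a local Solr with the extracting handler (Solr Cell) at this path.
# Adjust host, port, and core path to match your own setup.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(doc_id, base_url=SOLR_EXTRACT_URL, commit=True):
    """Build the request URL for the extracting handler (hypothetical helper).

    literal.id sets the document's id field; commit=true asks Solr to
    commit right away so the document becomes searchable.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def index_file(path, doc_id, content_type="text/html"):
    """POST the raw file body to Solr, which hands it to Tika for parsing."""
    with open(path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        build_extract_url(doc_id),
        data=body,  # supplying data makes this a POST
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With a Solr instance running, a call such as index_file("page.html", "doc1") would send the page for extraction; for a plain text file, pass content_type="text/plain".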
Indexing text is good if Google can do it.
--
Well, I agree that, for public web sites, content that Google can index is good and content it cannot index is bad. But even for public-facing corporate websites, there is often content that Google cannot reach but that the organization's own search engine absolutely has to find. Database content and content that requires multiple levels of security are not generally available to the public Google engine, yet they still must be found by the enterprise platform. Enterprise search is harder than public web site search. Thanks for leaving your comment!
/s/Miles
Posted by: Website Templates | May 26, 2011 at 03:07 AM