2 posts from November 2007

November 02, 2007

rcvdk-like tool for Lucene? Sort of...

For folks familiar with Autonomy / Verity K2, there is a command line / console based tool for searching collections called rcvdk (and rck2, which is socket based rather than file system based).

If you're accessing a Lucene system via SSH or Telnet, you might like a similar tool. There are at least 4 or 5 options (well... one of them is a workaround). Disclaimer / TODO: this post would be much more helpful if I actually provided details and examples for each of these methods...

1: Use LucLi, which is a Lucene command line class. It reads from standard in and writes to standard out. I haven't found a good example of it yet: lots of hits on Google, but none of them narrative.

2: Use BeanShell (the Java "bean shell"), an interactive Java interpreter that lets you call Java classes from a prompt. I found several copies of Andrzej Bialecki's post:

...you can use BeanShell - just put the bsh*.jar in lib/, and then do:

# bin/nutch bsh.Interpreter
BeanShell 2.0b4 - by Pat Niemeyer ...
bsh % import org.apache.lucene.index.*;
bsh % import org.apache.lucene.document.*;
bsh % ir = IndexReader.open("indexes/part-00001");
bsh % print(ir.numDocs());
1524567
bsh %

3: (the workaround) Use the graphical tool Luke via X11 forwarding over SSH. This carries the X Window display traffic through your SSH login, so the UI appears on your local machine. Luke is a popular graphical Java utility for looking at Lucene indices; I believe it's implemented in Java Swing, so it requires a graphical display to show the UI.

4: Write a small Hello World style Java program.  I actually do this quite a bit, to get exactly what I want.

5: In theory, use Python as a front end and take advantage of the interactive nature of Python's command prompt. There are likely at least 4 ways this could be done: a) the old Lupy distribution, b & c) one of the two PyLucene distributions, or d) via Jython (Java-based Python, which can call Java classes).
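For option 4, here is a minimal sketch of the kind of console loop such a small program might use: it reads one query per line from standard in and writes hits to standard out, much as rcvdk does. The names `SearchBackend` and `MiniSearchConsole` are hypothetical, and the toy in-memory backend is only an assumption so the sketch runs without Lucene on the classpath. A real version would wrap a Lucene searcher behind the same interface, and the exact calls depend on your Lucene version.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/** Hypothetical stand-in for whatever runs the query, e.g. a wrapper around a Lucene searcher. */
interface SearchBackend {
    List<String> search(String query);
}

final class MiniSearchConsole {

    /**
     * Reads one query per line, prints the hits, and stops at "quit" or end of input.
     * Returns the number of queries that were run.
     */
    static int run(Scanner in, PrintStream out, SearchBackend backend) {
        int queries = 0;
        while (in.hasNextLine()) {
            String query = in.nextLine().trim();
            if (query.isEmpty()) {
                continue; // ignore blank lines
            }
            if (query.equals("quit")) {
                break;
            }
            List<String> hits = backend.search(query);
            out.println(hits.size() + " hit(s) for: " + query);
            for (int i = 0; i < hits.size(); i++) {
                out.println("  " + (i + 1) + ": " + hits.get(i));
            }
            queries++;
        }
        out.flush();
        return queries;
    }

    public static void main(String[] args) {
        // Toy in-memory "index" (an assumption for illustration, not real Lucene data).
        String[] titles = { "first note about lucene", "verity k2 notes", "nutch and lucene" };
        SearchBackend toy = query -> {
            List<String> hits = new ArrayList<>();
            for (String title : titles) {
                // Case-insensitive substring match stands in for a real query parser.
                if (title.toLowerCase().contains(query.toLowerCase())) {
                    hits.add(title);
                }
            }
            return hits;
        };
        run(new Scanner(System.in), System.out, toy);
    }
}
```

Because the loop only talks to standard in and standard out, it works the same over SSH or Telnet as it does locally, and you can pipe a file of queries into it.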

I hope to update this post if I get more details.  Feel free to ping us if you're reading this months from now and feeling stuck...

November 01, 2007

Nice slides comparing Lucene, Solr and Nutch

http://www.slideshare.net/dnaber/apache-lucene-searching-the-web-and-everything-else-jazoon07/

(Flash based; see slide 32 for the comparison / summary)

From the folks at danielnaber.de and mindquarry.com