November 02, 2007

rcvdk-like tool for Lucene? Sort of...

For folks familiar with Autonomy / Verity K2, there is a command-line (console) tool for searching collections called rcvdk, along with its sibling rck2; the two differ in being socket based vs. file system based.

If you're accessing a Lucene system via SSH or Telnet, you might like a similar tool.  There are at least 4 or 5 options (well... one of them is a workaround).  Disclaimer / TODO: I'd be much more helpful if I actually provided details and examples for each of these methods...

1: Use LucLi, which is a Lucene command line class.  It reads from standard input and writes to standard output.  I haven't found a good example of it yet.  There are lots of hits on Google, but none of them is a narrative walkthrough.

2: Use BeanShell (the Java "bean shell"), which lets you type Java statements at an interactive prompt.  I found several copies of Andrzej Bialecki's post:

...you can use BeanShell - just put the bsh*.jar in lib/, and then do:

# bin/nutch bsh.Interpreter
BeanShell 2.0b4 - by Pat Niemeyer ...
bsh % import org.apache.lucene.index.*;
bsh % import org.apache.lucene.document.*;
bsh % ir = IndexReader.open("indexes/part-00001");
bsh % print(ir.numDocs());
1524567
bsh %

3: (the workaround) Use the graphical tool Luke via SSH tunneling of X-Windows.  This is where you redirect a TCP/IP port over an SSH login, so the program runs on the server but draws its windows on your local display.  Luke is a popular graphical Java utility for looking at Lucene indices; I believe it's implemented in Java Swing, so it requires a graphical display to show its UI.

4: Write a small Hello World style Java program.  I actually do this quite a bit, to get exactly what I want.

5: In theory, use Python as a front end and take advantage of the interactive nature of Python's command prompt.  There are likely at least 4 ways this could be done: a) the old Lupy distribution, b & c) one of the two PyLucene distributions, or d) via Jython (Java-based Python, which can call Java classes).
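For option 4, here is a minimal sketch of what such a small console program might look like.  It only shows the rcvdk-style loop: read a query line, hand it to a search backend, print the hits.  The names SearchConsole and Backend are made up for this illustration, and the demo backend is an in-memory stand-in, not Lucene; in a real version you would presumably wrap Lucene's IndexSearcher and QueryParser behind the same interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

public class SearchConsole {

    /** Hypothetical backend interface (not part of Lucene): one query in, hit labels out. */
    public interface Backend {
        List<String> search(String query) throws Exception;
    }

    /**
     * Reads one query per line until end of input or the word "quit".
     * Blank lines are skipped. Returns the number of queries that were run.
     */
    public static int run(BufferedReader in, PrintWriter out, Backend backend) throws IOException {
        int queries = 0;
        String line;
        while ((line = in.readLine()) != null) {
            String query = line.trim();
            if (query.isEmpty()) {
                continue;
            }
            if (query.equals("quit")) {
                break;
            }
            queries++;
            try {
                List<String> hits = backend.search(query);
                out.println(hits.size() + " hit(s) for: " + query);
                for (int i = 0; i < hits.size(); i++) {
                    out.println("  " + (i + 1) + ". " + hits.get(i));
                }
            } catch (Exception e) {
                // Report the problem and keep the console alive for the next query.
                out.println("error: " + e.getMessage());
            }
        }
        out.flush();
        return queries;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data and substring matching; replace with a Lucene-backed Backend.
        final String[] docs = { "lucene in action", "verity k2 admin guide", "nutch and lucene" };
        Backend demo = new Backend() {
            public List<String> search(String query) {
                List<String> hits = new ArrayList<String>();
                for (String d : docs) {
                    if (d.contains(query.toLowerCase())) {
                        hits.add(d);
                    }
                }
                return hits;
            }
        };
        run(new BufferedReader(new InputStreamReader(System.in)),
            new PrintWriter(System.out, true), demo);
    }
}
```

Because it only uses standard input and standard output, something like this works fine over a plain SSH or Telnet session: compile it, run "java SearchConsole", and type one query per line ("quit" to exit).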

I hope to update this post if I get more details.  Feel free to ping us if you're reading this months from now and feeling stuck...