November 02, 2007

rcvdk-like tool for Lucene? Sort of...

Folks familiar with Autonomy / Verity K2 will know rcvdk, a command-line / console tool for searching collections directly on the file system (and its sibling rck2, which does the same over a socket connection).

If you're accessing a Lucene system via SSH or Telnet, you might want a similar tool. There are at least four or five options (well... one of them is a workaround). Disclaimer / TODO: this would be much more helpful if I actually provided details and examples for each of these methods...

1: Use LucLi, which is a Lucene command-line class. It reads from standard in and writes to standard out. I haven't found a good example of it yet: there are lots of hits on Google, but none of them walk through actually using it.

2: Use BeanShell, a Java interpreter that lets you type Java statements interactively. I found several copies of Andrzej Bialecki's post:

...you can use BeanShell - just put the bsh*.jar in lib/, and then do:

# bin/nutch bsh.Interpreter
BeanShell 2.0b4 - by Pat Niemeyer ...
bsh % import org.apache.lucene.index.*;
bsh % import org.apache.lucene.document.*;
bsh % ir = IndexReader.open("indexes/part-00001");
bsh % print(ir.numDocs());
bsh %

3: (the workaround) Run the graphical tool Luke over SSH with X11 forwarding (for example, by logging in with ssh -X). This redirects the X Window display traffic over your SSH login, so Luke's window shows up on your local machine. Luke is a popular Java GUI for looking at Lucene indexes; since it's a graphical application (I believe built on AWT/Swing), it needs a graphical display to draw its UI, which is why the tunnel is required.

4: Write a small "Hello World" style Java program. I actually do this quite a bit, since it gets me exactly what I want.

5: In theory, use Python as a front end and take advantage of Python's interactive prompt. There are likely at least four ways to do this: a) the old Lupy distribution, b & c) one of the two PyLucene distributions, or d) Jython (a Java-based Python that can call Java classes directly).
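For option 4, a "Hello World" style query program might look something like the minimal sketch below. It assumes the Lucene 2.x API that was current when this was written (IndexSearcher(Directory), QueryParser(String, Analyzer), the no-argument StandardAnalyzer, FSDirectory.getDirectory) and a lucene-core 2.x jar on the classpath; later Lucene releases renamed or removed several of these calls. The class name LuceneQuery, its argument order, and the tab-separated output format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * Hypothetical rcvdk-style query tool, sketched against the Lucene 2.x API.
 * Usage: java LuceneQuery <indexDir> <field> <query> [maxHits]
 */
public class LuceneQuery {

    /** Parses queryText against field and returns "docId<TAB>score<TAB>value" for the top hits. */
    public static List<String> search(Directory dir, String field, String queryText, int maxHits)
            throws Exception {
        IndexSearcher searcher = new IndexSearcher(dir);
        try {
            QueryParser parser = new QueryParser(field, new StandardAnalyzer());
            Query query = parser.parse(queryText);
            // null filter: consider every document that matches the query
            TopDocs top = searcher.search(query, null, maxHits);
            List<String> lines = new ArrayList<String>();
            for (ScoreDoc sd : top.scoreDocs) {
                Document doc = searcher.doc(sd.doc);
                // doc.get() returns null if the field was indexed but not stored
                lines.add(sd.doc + "\t" + sd.score + "\t" + doc.get(field));
            }
            return lines;
        } finally {
            // closes the IndexReader the searcher opened on dir (not dir itself)
            searcher.close();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: LuceneQuery <indexDir> <field> <query> [maxHits]");
            System.exit(1);
        }
        int maxHits = args.length > 3 ? Integer.parseInt(args[3]) : 10;
        Directory dir = FSDirectory.getDirectory(args[0]);
        try {
            for (String line : search(dir, args[1], args[2], maxHits)) {
                System.out.println(line);
            }
        } finally {
            dir.close();
        }
    }
}
```

Over an SSH session you would then run something like java -cp /path/to/lucene-core.jar:. LuceneQuery /path/to/index contents "some query", where the jar path, index path, and field name are placeholders that depend on how your index was built. Keeping the search in its own method makes it easy to swap the one-shot main for a loop that reads queries from standard in, rcvdk style.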

I hope to update this post if I get more details.  Feel free to ping us if you're reading this months from now and feeling stuck...

November 01, 2007

Nice slides comparing Lucene, Solr and Nutch


(Flash-based; see slide 32 for the comparison / summary.)

From the folks at danielnaber.de and mindquarry.com