January 12, 2009

Virtualization and Search: Performance Tests Summary

We've been researching virtualization and search, and have recently presented on the topic at a couple of shows. We wanted to measure the performance penalty of running a spider on a virtual machine instead of a physical machine. This is a summary of our findings.

What we expected to find in terms of a performance penalty:

  • An estimated 3% to 20% penalty
  • Leaning towards 20%, given the heavy disk I/O

Actual results:

  • We found an approximate 10% average penalty.
  • The spread was fairly wide: individual tests measured between 0% and 17%, always under the 20% we had estimated. (A few runs actually came in below 0%, but we labeled those as outliers.)
  • Overall, performance was better than expected and certainly reasonable for many applications.
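To make the arithmetic behind these percentages concrete, here is a minimal Python sketch. It assumes, for illustration, that each test's penalty is the extra wall-clock time of a virtual run relative to a paired physical run; the function names and the sample timings in the usage line are hypothetical, not our measured data. Negative penalties are set aside as outliers, mirroring how we treated them in the results.

```python
from statistics import mean


def percent_penalty(physical_minutes: float, virtual_minutes: float) -> float:
    """Slowdown of a virtual run relative to its physical baseline, in percent."""
    return (virtual_minutes - physical_minutes) * 100.0 / physical_minutes


def summarize(pairs: list[tuple[float, float]]) -> dict:
    """Average per-test penalties, setting aside negative ones as outliers.

    Each pair is (physical_minutes, virtual_minutes) for the same test.
    """
    penalties = [percent_penalty(p, v) for p, v in pairs]
    kept = [x for x in penalties if x >= 0.0]
    outliers = [x for x in penalties if x < 0.0]
    return {
        "average": mean(kept) if kept else 0.0,
        "min": min(kept, default=0.0),
        "max": max(kept, default=0.0),
        "outliers": outliers,
    }
```

For example, `summarize([(30.0, 33.0), (32.0, 31.0)])` reports an average of 10.0 with one outlier of -3.125; both pairs are invented purely to show the calculation.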

The test and environment:

  • HP mid-tier workstation with a dual-core AMD Athlon 64 X2 4400+
  • 8 GB of memory, local SATA disk (non-RAID)
  • Microsoft Windows Server 2008 64-bit for both host and guest, using Microsoft Hyper-V
  • Sun's 64-bit Java 6 JVM, set to a 1 GB maximum heap (which it did not fully respect)
  • Nutch 0.9, stock distribution
  • Dataset: the public Enron emails, approximately half a million messages in 1.5 GB of source data, served by IIS from a separate local machine
  • Email files were mapped to the text filter
  • Nutch fetched the data and indexed it into Lucene
  • Wall-clock time ranged from 31 to 35 minutes across all tests, with the physical (non-virtual) tests showing the widest measured deviation
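Because run-to-run variation mattered as much as the averages, a small timing harness helps when repeating tests like these. The Python sketch below is hypothetical and is not the tooling we used: it times any command with `time.perf_counter` and summarizes repeated runs by their range and sample standard deviation.

```python
import statistics
import subprocess
import time


def time_run(command: list[str]) -> float:
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return time.perf_counter() - start


def spread(times: list[float]) -> dict:
    """Summarize repeated runs: min, max, range and sample standard deviation."""
    return {
        "min": min(times),
        "max": max(times),
        "range": max(times) - min(times),
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
    }
```

Assuming the `bin/nutch` script is directly executable on the test machine, a Nutch 0.9 crawl could be timed by passing a command list such as `["bin/nutch", "crawl", "urls", "-dir", "crawl", "-depth", "3"]`, where the seed directory and depth are placeholders. Dividing each result by 60 gives minutes, comparable to the 31 to 35 minute range above.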

Drop us an email if you'd like the full PDF once it's published.

