January 12, 2009

Virtualization and Search: Performance Tests Summary

We've been researching virtualization and search, and have recently presented on the topic at a couple of shows.  We wanted to measure the performance penalty of running a spider on a virtual machine instead of a physical machine.  This is a summary of our findings.

What we expected to find in terms of a performance penalty:

  • Estimated 3 to 20% penalty
  • Leaning towards 20% given the heavy disk IO

Actual results:

  • We found an approximate 10% average penalty.
  • There was a fairly wide margin: individual tests measured between 0% and 17%, always under the 20% we had estimated.  (Some tests actually came in below zero percent, but we labeled those as outliers.)
  • Overall the performance was better than expected, and certainly reasonable for many applications.
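
For readers who want to run a similar comparison, here is a minimal sketch of how a penalty figure like this can be computed. It assumes the penalty is defined as the relative increase in clock time of the virtual run over the physical run; the function name and the sample timings are hypothetical illustrations, not our measurements.

```python
def penalty_percent(physical_minutes: float, virtual_minutes: float) -> float:
    """Relative slowdown of the virtual run versus the physical run, in percent.

    Assumed definition: (virtual - physical) / physical * 100.
    A negative result means the virtual run finished faster.
    """
    if physical_minutes <= 0:
        raise ValueError("physical_minutes must be positive")
    return (virtual_minutes - physical_minutes) / physical_minutes * 100.0


# Hypothetical (physical, virtual) clock times in minutes, for illustration only.
example_pairs = [(32.0, 35.2), (33.0, 33.0), (34.0, 33.5)]

penalties = [penalty_percent(p, v) for p, v in example_pairs]
average_penalty = sum(penalties) / len(penalties)

for (p, v), pct in zip(example_pairs, penalties):
    print(f"physical={p:.1f} min  virtual={v:.1f} min  penalty={pct:.1f}%")
print(f"average penalty: {average_penalty:.1f}%")
```

A pair where the virtual run is faster yields a negative penalty, which is the kind of result we labeled as an outlier.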

The test and environment:

  • HP mid-tier workstation, dual-core AMD Athlon 64 X2 4400+
  • 8 GB memory, local SATA disk (non-RAID)
  • Microsoft Windows Server 2008 64-bit for both host and guest, using MS Hyper-V
  • Sun's 64-bit JVM v6 set to a 1 GB maximum (which it did not fully respect)
  • Nutch 0.9 stock distribution
  • Dataset was the Enron public emails, approximately half a million emails in 1.5 GB of source data, served by IIS on a separate local machine
  • Email files were mapped to a text filter
  • Data was fetched and indexed into Lucene by Nutch
  • Clock time ranged from 31 to 35 minutes across all tests, with the physical (non-virtual) tests showing the widest measured deviation
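
When comparing runs this close together, it helps to look at the spread of each group as well as the averages. The sketch below is one way to summarize that, assuming the clock times for each group are collected as a list of minutes; the helper name and the sample numbers are hypothetical and are not our recorded timings.

```python
import statistics


def summarize_runs(minutes):
    """Summarize a group of clock times (in minutes).

    Needs at least two runs, because the sample standard deviation
    is undefined for a single data point.
    """
    if len(minutes) < 2:
        raise ValueError("need at least two runs to measure deviation")
    return {
        "min": min(minutes),
        "max": max(minutes),
        "range": max(minutes) - min(minutes),
        "mean": statistics.mean(minutes),
        "stdev": statistics.stdev(minutes),
    }


# Hypothetical timings, for illustration only.
physical_runs = [31.0, 33.0, 35.0]
virtual_runs = [33.5, 34.0, 34.5]

for label, runs in (("physical", physical_runs), ("virtual", virtual_runs)):
    summary = summarize_runs(runs)
    print(f"{label}: range={summary['range']:.1f} min, stdev={summary['stdev']:.2f} min")
```

A group with a wide range relative to the gap between the group means is a sign that more runs are needed before reading much into a single percentage.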

Drop us an email if you'd like the full PDF when published.
