Virtualization and Search: Performance Tests Summary
We've been researching Virtualization and Search, and have recently presented on the topic at a couple of shows. We wanted to measure the performance penalty of running a spider on a virtual machine instead of a physical machine. This is a summary of our findings.
What we expected to find in terms of a performance penalty:
- Estimated 3 to 20% penalty
- Leaning towards 20% given the heavy disk IO
Actual results:
- We found an approximate 10% average penalty.
- The margin was fairly wide: individual tests measured penalties between 0% and 17%, but always under the 20% we had estimated. (Some tests actually came in below zero percent, but we labeled those as outliers.)
- Overall the performance was better than expected, and certainly reasonable for many applications.
The test and environment:
- HP mid-tier workstation, dual-core AMD Athlon 64 X2 4400+
- 8 GB of memory, local SATA disk (non-RAID)
- Microsoft Windows Server 2008 64-bit for both host and guest, using MS Hyper-V
- Sun's 64-bit JVM v6 set to a 1 GB max (which it did not fully respect)
- Nutch 0.9 stock distribution
- Dataset was the Enron public emails, approximately half a million emails in 1.5 GB of source data, served by IIS on a separate local machine
- Email files were mapped to a text filter
- Data was fetched and indexed into Lucene by Nutch
- Clock time ranged from 31 to 35 minutes across all tests, with the physical (non-virtual) tests showing the widest measured deviation
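For readers who want to run a similar comparison, here is a minimal sketch of how a penalty percentage can be derived from two clock times. It assumes the penalty is expressed relative to the physical run, which is one common convention and not necessarily the exact calculation behind the figures above. The function name `penalty_percent` and the sample timings are hypothetical and are not our measured data.

```python
def penalty_percent(physical_minutes, virtual_minutes):
    """Slowdown of the virtual run relative to the physical run, in percent.

    A negative value means the virtual run finished faster than the
    physical one.
    """
    if physical_minutes <= 0:
        raise ValueError("physical_minutes must be positive")
    return (virtual_minutes - physical_minutes) / physical_minutes * 100.0


# Hypothetical clock times in minutes, chosen only for illustration.
physical = 31.0
virtual = 34.1
print(f"{penalty_percent(physical, virtual):.1f}% penalty")
```

With this definition, a virtual run that finishes faster than its physical counterpart yields a negative penalty, which is how a below-zero result like the outliers mentioned earlier can show up.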
Drop us an email if you'd like the full PDF when published.