September 05, 2012

The "Gotcha's" of Disk and Memory Issues with Search

Here are some search performance problems that can be caused by shared disk resources or by memory (RAM) constraints.

SAN / NAS Disk Latency

Note: Both SAN and NAS are types of shared disk drives that can be used by multiple machines.

Many NIE customers are using SAN storage for search indices and sometimes for content. Although this is becoming a more common practice, there is one issue to be particularly vigilant for. Search uses storage somewhat differently than other applications such as multimedia storage; search makes many round trips to disk, so the latency of all these serial transactions can stack up, and latency can be more of an issue than raw bandwidth.
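To make the "latency stacks up" point concrete, here is a rough back-of-the-envelope sketch. All figures are hypothetical and chosen only for illustration (they are not measurements from any particular SAN); the function name `serial_read_time` is likewise made up for this example. It assumes each read must finish before the next one starts.

```python
def serial_read_time(round_trips, latency_s, bytes_per_trip, bandwidth_bytes_per_s):
    """Total time for reads issued one after another (no overlap)."""
    transfer_s = bytes_per_trip / bandwidth_bytes_per_s
    return round_trips * (latency_s + transfer_s)


# Hypothetical local disk: 0.1 ms latency, 500 MB/s bandwidth.
local = serial_read_time(10_000, 0.0001, 4096, 500e6)

# Hypothetical shared storage: 2 ms latency, but twice the bandwidth.
san = serial_read_time(10_000, 0.002, 4096, 1000e6)

print(f"local: {local:.2f} s, shared: {san:.2f} s")  # roughly 1.08 s vs 20.04 s
```

With small reads, the per-trip latency dominates: in this made-up case, doubling the bandwidth does nothing to offset a 20x increase in latency.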

Symptoms of Serial Latency Issues:

  • Other applications on similar machines, or using the same storage, are not reporting performance problems, but search indexing or retrieval is slow.
  • Large performance differences between the same search application running in different environments, such as dev vs. staging.
  • Sudden changes in search performance when only “minor” changes were made to systems or networking.

Other problems with network storage are thankfully seen much less often in modern systems:

  • Filehandle limitations, or different limits between local and network filehandles.
  • Exact order of transactions not maintained.
  • File locking and cascading system failures. Note that some search engines may still have file locking limitations with simultaneous transactions, regardless of where the storage is located.
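When search is slow in one environment but not another, one way to compare them is to time many small reads at random offsets on each storage volume. The following is a minimal sketch, not a full benchmark: the function names, the 4 KB block size, and the read count are assumptions made for illustration. The OS page cache can hide storage latency, so for a meaningful comparison the test file should be large relative to RAM.

```python
import os
import random
import statistics
import time


def random_read_latencies(path, reads=200, block_size=4096, seed=0):
    """Time small reads at random offsets; return latencies in seconds."""
    size = os.path.getsize(path)
    rng = random.Random(seed)
    latencies = []
    # buffering=0 disables Python-level buffering only; the OS page cache
    # may still answer some reads without touching the storage device.
    with open(path, "rb", buffering=0) as f:
        for _ in range(reads):
            offset = rng.randrange(max(size - block_size, 1))
            start = time.perf_counter()
            f.seek(offset)
            f.read(block_size)
            latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Report median, 95th percentile, and worst-case latency in ms."""
    ordered = sorted(latencies)
    return {
        "median_ms": statistics.median(ordered) * 1000,
        "p95_ms": ordered[int(0.95 * (len(ordered) - 1))] * 1000,
        "max_ms": ordered[-1] * 1000,
    }
```

Running `summarize(random_read_latencies(path))` against a file on each volume (for example, dev vs. staging) gives numbers that can be compared directly. A large gap in the median or 95th percentile points toward storage latency rather than the search application itself.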

Memory and Virtual Machines

Search engines can be heavy users of RAM. If servers are hosting multiple applications, or many virtual servers which are each running RAM intensive apps, performance can suffer. Most operational teams are aware of this problem and try to avoid it. But there are ways this can sneak up on even the best of teams:
  • Performance degrades slowly over time
  • Performance degrades sporadically and is therefore harder to analyze
  • Search might simply be the first application that is noticeably impacted by resource constraints
  • Performance is greatly impacted by other applications, but at irregular intervals, the “bump in the night”. Organizations are often unaware of their full application loading picture over the course of an entire week or month.
  • Memory and performance can be affected by the activities of other virtual servers running on the same host. If too many virtual servers are consuming lots of memory, the physical host may need to start swapping to disk or otherwise constrain or delay memory access.
  • Performance varies between different environments on seemingly similar systems. The similar systems may actually have very different sets of applications or run schedules. However, such differences can also be related to SAN or NAS storage issues.
  • Linux may use otherwise unused memory for other purposes, such as file caching, so operators become accustomed to seeing low available memory. Later on it is then harder to tell a true memory shortage from the “normal” low-memory case.
  • Some part of the OS or application stack is accidentally using a 32-bit subsystem instead of a 64-bit one, perhaps as the result of a recent software update.
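Two of the points above lend themselves to a quick check. On Linux, `/proc/meminfo` reports `MemAvailable` (an estimate of memory usable without swapping) separately from `MemFree`, which helps distinguish a real shortage from memory that is merely being used as cache. The sketch below parses that format and also reports whether the running Python process is 32-bit or 64-bit. The function names, the 10% threshold, and the sample figures are assumptions made for illustration.

```python
import struct


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {name: number} (mostly kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_pressure(info, threshold=0.10):
    """Return (available_fraction, is_low), preferring MemAvailable."""
    available = info.get("MemAvailable")
    if available is None:
        # Rough fallback for older kernels that lack MemAvailable.
        available = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    fraction = available / info["MemTotal"]
    return fraction, fraction < threshold


def pointer_bits():
    """Pointer width of the running interpreter: 32 or 64."""
    return struct.calcsize("P") * 8


# Made-up sample: MemFree looks alarming (2.5%), MemAvailable does not (60%).
SAMPLE = """MemTotal:       16000000 kB
MemFree:          400000 kB
MemAvailable:    9600000 kB
Buffers:          200000 kB
Cached:          9000000 kB
"""

fraction, is_low = memory_pressure(parse_meminfo(SAMPLE))
```

On a real Linux host, the text would come from `open("/proc/meminfo").read()`. Note that `pointer_bits()` only describes the Python interpreter itself; other components of the stack would need their own checks.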



Another virtual-machine-related memory issue worth noting is hypervisor swapping. Hypervisor swapping can make a virtual machine slow when serving its first response after being idle for some time.

Many enterprise search systems primarily answer queries from users during normal office hours. Crawling is done at night so that it does not affect the performance of other systems while users are actively using them.

At night, when the crawl is running, the host machine detects that the virtual machines are using more memory and starts to swap out to disk the memory that is not actively being used. When the first users arrive at the office and want to run a query, all the memory for the search process has been swapped out and must first be read back in from disk, making the first response very slow.

Note that this is not the same as the normal swapping done by the operating system. The operating system's swapping is more intelligent. Most high-end search systems also have ways of telling the operating system not to swap out the most important parts of the search process memory. Unfortunately, these methods won't prevent the hypervisor from swapping.

At least in VMware there is an option called “memory reservation” that can be used to prevent this from happening. At Searchdaimon we have found that one should normally reserve at least half of the virtual machine's memory to ease hypervisor swapping issues.

Great information, Runar, thanks! (For your SharePoint users, remember too that you should 'warm up' your search result pages from time to time to keep them cached and ready to go quickly.)
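The warm-up idea raised in these comments can be sketched as a small script that runs a few representative queries before users arrive, so that the cost of reading swapped-out memory back in is paid ahead of time. This is only an illustration: `warm_up` and the `run_query` callable are hypothetical names, and `run_query` stands in for whatever actually issues a query against your search engine.

```python
import time


def warm_up(queries, run_query, clock=time.perf_counter):
    """Run each query once; return (query, seconds, ok) tuples."""
    results = []
    for query in queries:
        start = clock()
        try:
            run_query(query)
            ok = True
        except Exception:
            # One failing warm-up query should not stop the others.
            ok = False
        results.append((query, clock() - start, ok))
    return results
```

Scheduled shortly before office hours (for example, from cron), the recorded timings also show how slow the first queries after the nightly crawl really are, which can help confirm whether hypervisor swapping is the cause.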
