
July 07, 2014

Data quality as critical to 'big data' as it is to search

For years, we've been preaching in the wilderness about the importance of 'data quality' - the new name for 'garbage in, garbage out'. Maybe it gets so little respect because it was first cited on April Fools' Day (back in 1963, according to Wikipedia). Bad content has caused enterprise search owners headaches for years - heck, one of our most popular posts, Sixty Guys named Sarah, is really about a data quality problem.

Last Monday, Fortune Magazine posted an article called Big data's dirty problem, telling us all that:

"Inaccuracies, misspellings, and obsolete information makes achieving the big data utopia a slog for businesses and researchers"

For those of us who have worked in search for a while, it comes as no surprise. (It's also a reason why you need great search along with a big data distro to succeed.)

So many companies approach enterprise search with what I call a 'fire and forget' mentality. Google on the public web makes it look so easy - how hard can it be?

At so many companies, we've seen this vicious cycle: Pick the search platform that looks best, install it, ignore it, and repeat in two to four years. No - really, ask yourself: how long have you used your current enterprise search platform?

Now ask yourself, "How often do we review top queries, top misspellings, and zero-hit query reports?" In my experience, there is an inverse correlation between the longevity of a search platform and the money invested in monitoring and maintaining it. In search, big data, and life, you get what you pay for.
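If you have never run that kind of review, here is a minimal sketch of what it can look like. It assumes you can export your query log as (query text, number of hits) pairs; the function name, the field layout, and the sample data are all made up for illustration and do not come from any particular search platform.

```python
from collections import Counter


def summarize_query_log(records, top_n=3):
    """Summarize a query log given as (query_text, hit_count) pairs.

    Hypothetical helper: the record layout is an assumption, not the
    export format of any specific search product.
    """
    all_queries = Counter()
    zero_hit = Counter()
    for query, hits in records:
        # Fold case and collapse whitespace so trivial variants count together.
        normalized = " ".join(query.lower().split())
        if not normalized:
            continue
        all_queries[normalized] += 1
        if hits == 0:
            zero_hit[normalized] += 1
    total = sum(all_queries.values())
    return {
        "top_queries": all_queries.most_common(top_n),
        "zero_hit_queries": zero_hit.most_common(top_n),
        "zero_hit_rate": sum(zero_hit.values()) / total if total else 0.0,
    }


# Invented sample data, for illustration only.
sample_log = [
    ("vacation policy", 12),
    ("Vacation  Policy", 9),
    ("vaccation policy", 0),
    ("expense report", 4),
    ("vaccation policy", 0),
    ("benifits", 0),
]

report = summarize_query_log(sample_log)
print(report["top_queries"])
print(report["zero_hit_queries"])
print(report["zero_hit_rate"])
```

Zero-hit queries are often where misspellings and missing content show up first, so even a crude report like this gives you something to act on each month.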

Looking to end the vicious cycle of 'roll out, replace, repeat' with your enterprise OR big data apps? You might start with a data audit - we can help.

Miles Kehoe/July 6, 2014