26 posts categorized "Search Analytics"

January 06, 2020

It's a new year: Time for better metadata!

The new year is a time when most of us resolve to make changes in our personal lives: losing weight, exercising more, spending more time with a spouse and/or the kids. We start the year with great energy to meet our goals, but sadly many of us fall short through the year.

This often happens in the enterprise as well. Improving internal search is a common resolution at this time of the year. For eCommerce sites, January generally means fewer site visitors once the holiday rush is done; so making changes won’t have a great impact on sales. For corporations, it’s a time of new budgets and great expectations: and more than a few of the clients I’ve worked with over the years tell me how poorly their internal search performs compared to the public search sites like Google, Bing, and DuckDuckGo. Why do these search platforms work so well? And why can’t your site search match their success? It’s a numbers game. By definition, public search platforms index millions of sites; and many of these contain similar if not identical content. This makes it easy to find what you’re looking for because thousands of sites have relevant results for just about any query you may try.

Intranet sites are different. Usually, there is only one page with the information you are looking for. But often, content authors, who have read about how to promote content on Google, will add keywords using Microsoft Word’s “Properties” field in an effort to promote their documents. This attempt to ‘game’ the internal search platform generally interferes with the platform’s relevance functions and results in poor result relevance. Even the Document Properties that Microsoft Word provides can interfere with search effectiveness.

Years ago, we were working with a client who was interested in knowing which employees were contributing to the intranet content. When the data was processed, it turned out that an Administrative Assistant in Marketing had authored more documents than anyone else in the corporation. After a quick review, we discovered why this one person was apparently more prolific than any other employee. That person had created all of the template forms used throughout the company, so the Word Document Properties listed that employee’s name as the author of virtually every standard template throughout the company.
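A skew like that is easy to spot once the author metadata has been exported. The snippet below is only a minimal sketch: it assumes you already have (path, author) pairs from your repository, and the function name, sample records, and 50% threshold are all hypothetical choices for illustration.

```python
from collections import Counter


def suspicious_authors(records, threshold=0.5):
    """Return (author, count) pairs for authors credited with more than
    `threshold` of all documents.

    `records` is an iterable of (path, author) pairs. A single dominant
    author often points at template metadata rather than real authorship.
    """
    counts = Counter(author for _, author in records if author)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(author, n) for author, n in counts.most_common()
            if n / total > threshold]


# Hypothetical sample data, for illustration only.
sample = [
    ("hr/vacation-request.docx", "A. Assistant"),
    ("finance/expense-form.docx", "A. Assistant"),
    ("legal/nda-template.docx", "A. Assistant"),
    ("eng/design-notes.docx", "B. Engineer"),
]
print(suspicious_authors(sample))
```

Any author who shows up here is worth a manual look before you trust the author field for ranking or reporting.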

So in the spirit of the new year, I’d suggest that you spend a day or two performing a data audit to discover where your content – or lack thereof – is negatively impacting your enterprise search results. And if you find any doozies – I’d love to hear about them!

 

 

December 10, 2019

A Working Vacation

The month of January is associated with the Roman god Janus who, with two heads, could look forward and back. That said, I find December a quiet time that provides the opportunity to review the current year and to plan the coming new year. As I tweeted yesterday at @miles_kehoe, this is the most stressful time of the year for most sites focused on eCommerce. Changes are generally 'off-limits' - even an hour offline can put a dent in sales.

But for those responsible for corporate internal and public-facing sites, this is the time to review content, identify potential changes, and even plan new content. And if planned well, the holidays are often a great time to update intranet sites: from late November through the new year, activity tends to slow for most corporate sites. Both IT and content staff should be using this quiet time to make changes, from updates to current content - the new vacation schedule is just one that comes to mind - to minor restructuring. (Note: while the holidays are a great time to roll out major changes, these should have been in planning months ago: it's a holiday, not a sabbatical!)

For the search team, this is the time to review search activity: top queries, zero hits, misspellings, and synonyms come to mind as a minimum effort. It's also a good time to identify popular content, as well as content that was either never part of any search result or was included in result lists but never viewed.
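As a starting point for that review, here is a rough sketch of pulling top queries and zero-hit queries out of a query log. It assumes the log has already been reduced to (query, hit count) pairs; the function name and the sample log are hypothetical, and real search platforms each have their own log formats.

```python
from collections import Counter


def review_queries(log, top_n=3):
    """Summarize a query log given as (query, hit_count) pairs.

    Returns (top_queries, zero_hit_queries), each a list of
    (query, occurrences) pairs, most frequent first.
    """
    # Normalize case and whitespace so variants are counted together.
    normalized = [(q.strip().lower(), hits) for q, hits in log]
    top = Counter(q for q, _ in normalized).most_common(top_n)
    zero_hits = Counter(q for q, hits in normalized if hits == 0).most_common()
    return top, zero_hits


# Hypothetical sample log, for illustration only.
sample_log = [
    ("vacation schedule", 12),
    ("Vacation Schedule", 12),
    ("vacaton schedule", 0),
    ("expense report", 4),
    ("vacaton schedule", 0),
]
top, zero = review_queries(sample_log)
print("Top queries:", top)
print("Zero-hit queries:", zero)
```

Zero-hit queries that recur, like the misspelling in the sample, are usually the cheapest wins: they are candidates for a synonym or spelling-correction entry.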


So - December is nearly half over: take advantage of what is normally a quiet time for intranets and make that site better!

Happy Holidays!

 

November 14, 2018

Do you have big data? Or just a lot of it?

Of course, I’m a search nerd. I've been involved in enterprise search for over 20 years. I see search and big data as related technologies, but in most cases, I do not see them as synonymous.

And I'd also say that, while most enterprises have a lot of data, the term ‘big data’ is not applicable to most organizations.

Consider that Google (and others) define ‘big data’ as “extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions”.

Yes, the data that Amazon, Google, Facebook, and others collect qualifies as big data. These companies mine everything you do when you're using their sites. Amazon wants to be able to report “people like you bought …” to sell more product; Google wants to know what ‘people like you’ look at after a query so they can suggest it to the next person like you; Facebook... well, they want to know what to try to sell you as you chat with and about your friends. Is search involved? Maybe; but more often some strong machine learning and internal analytics are key.

Do consulting firms like Ernst & Young or PwC have big data? Well, my bet is they have a lot of information about their clients, business practices, accounting, etc., but is it ‘big data’? Probably not.

Solr, Elastic and other search technologies can search-enable huge sets of data, so often big data is indexed to be searchable by humans. And both Solr and Elastic come with some great analytical tools: Kibana on Elastic, and Banana, the port of Kibana for Solr-based engines.

But again, is that big data? Or just lots of it?

I’d vote the latter.

 

August 09, 2018

Fake Search?

Enterprise search was once easy. It was often bad - but understanding the results was pretty easy. If the query term(s) were in the document, it was there in the results. Period. The more times the terms appeared, the higher the result appeared in the result list. And when the user typed a multi-term query, documents with all of the terms displayed higher in the result list than those with only some of the terms.

And some search platforms could 'explain' why a particular document was ranked where it was. Those of us who have been in the business a while may remember the Verity Topic and Verity K2 product lines. One of the most advanced capabilities in these was the 'explain' function. It would reverse engineer the score for an individual document and report the critical 'why' the selected document was ranked as it was.
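To make the idea concrete, here is a toy sketch of that old-style ranking with a built-in 'explain'. It is an illustration of the behavior described above and not Verity's actual algorithm; the function names and sample documents are hypothetical.

```python
def score(query, document):
    """Score a document by counting query-term occurrences.

    Returns (rank_key, explanation). The explanation maps each query term
    to the number of times it appears in the document.
    """
    words = document.lower().split()
    explanation = {term: words.count(term) for term in query.lower().split()}
    matched = sum(1 for n in explanation.values() if n > 0)
    # Documents matching more distinct terms outrank those matching fewer;
    # total term frequency breaks ties.
    return (matched, sum(explanation.values())), explanation


def search(query, documents):
    """Return [(name, explanation)] for matching documents, best first."""
    results = []
    for name, text in documents.items():
        key, why = score(query, text)
        if key[0] > 0:  # keep only documents containing at least one term
            results.append((key, name, why))
    results.sort(key=lambda r: r[0], reverse=True)
    return [(name, why) for _, name, why in results]


# Hypothetical sample documents, for illustration only.
docs = {
    "a": "vacation policy and vacation schedule",
    "b": "holiday schedule",
    "c": "expense report",
}
for name, why in search("vacation schedule", docs):
    print(name, why)
```

The per-term counts returned with each result are the 'why': anyone can read them and see exactly how the ranking was produced, which is the property that gets harder to preserve once machine learning is in the loop.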

"People Like You"

Now, search results are generally better, but every now and then, you’ll see a result that surprises you, especially on sites that are enhanced with machine learning technologies. A friend of mine tells of a query she did on Google while she was looking for summer clothes, but the top result was a pair of shoes. She related her surprise: "I asked for summer clothes, and Google shows me SHOES?".  But, she admitted, "Those shoes ARE pretty nice!"

How did that happen? Somewhere, deep in the data Google maintains on her search history, it concluded that "people like her" purchased that pair of shoes.

In the enterprise, we don't have the volume of content and query activity of the large Internet players, but we do tend to have more focused content and a narrower query vocabulary. ML/AI tools like Apache Mahout and MLlib, the machine learning library in Spark, can help our search platforms generate such odd yet often relevant results; but these technologies are still limited when it comes to explaining the 'why' for a given result. And for those of us who still exhibit skepticism when it comes to computers, that capability would be nice.

Are you using (or planning to implement) ML-in-search? A skeptic? Which camp are you in? Let me hear from you! [email protected].

May 31, 2016

The Findwise Enterprise Search and Findability Survey 2016 is open for business

Would you find it helpful to benchmark your Enterprise Search operations against hundreds of corporations, organizations and government agencies worldwide? Before you answer, would you find that information useful enough that you’d spend a few minutes answering a survey about your enterprise search practices? It seems like a pretty good deal to me to have real-world data from people just like yourself worldwide.

This survey, the results of which are useful, insightful, and actionable for search managers everywhere, provides insight into many of the critical areas of search.

Findwise, the Swedish company with offices there and in Denmark, Norway, Poland and London, is gathering data now for the 2016 version of their annual Enterprise Search and Findability Survey at http://bit.ly/1sY9qiE.

What sorts of things will you learn?

Past surveys give insight into the difference between companies with happy search users versus those whose employees prefer to avoid using internal search. One particularly interesting finding last year was that there are three levels of ‘search maturity’, identifiable by how search is implemented across content.

The least mature search organizations, roughly 25% of respondents, have search for specific repositories (silos), but they generally treat search as ‘fire and forget’, and once installed, there is no ongoing oversight.

More mature search organizations, which represent about 60% of respondents, have one search for all silos; but maintaining and improving search technology gets very little staff attention.

The remaining 15% of organizations answering the survey invest in search technology and staff, and continuously attempt to improve search and findability. These organizations often have multiple search instances tailored for specific users and repositories.

One of my favorite findings a few years back was that a majority of enterprises have “one or less” full time staff responsible for search; and yet a similar majority of employees reported that search just didn’t work. The good news? Subsequent surveys have shown that staffing search with as few as 2 FTEs improves overall search satisfaction; and 3 FTEs seem to strongly improve overall satisfaction. And even more good news: Over the years, the trend in enterprise search shows that more and more organizations are taking search and findability seriously.

You can participate in the 2016 Findwise Enterprise Search and Findability Survey in just 10 or 15 minutes and you’ll be among the first to know what this year brings. Again, you’ll find the 2016 survey at http://bit.ly/1sY9qiE.

September 18, 2014

Lucidworks ships Fusion 1.0 - Pretty exciting next gen platform.

OK, I've known about this coming for a while, just didn't know when until this afternoon - so I stayed up late to get the download started after midnight.

Fusion is more than an updated release of Lucidworks Search. It is Solr based, but it's a re-write from top to bottom. And it's not a bare bones search API only a developer can love. Connectors? Check. Security? Check. Analytics? Check. Entity extraction? Check. All included. 

But what it adds is where the real capabilities and contributions are. Machine learning? Check. Admin console? Check. Log analytics? Check. A document pre-processing pipeline? Check. Deep signal processing (think 'automated context processing')? Check.

Even if you think these new unique capabilities are not your style, you can buy Solr support and still get licenses for connectors, entity extraction, and a handful of other formerly 'premium' products. Want it all? License the full product at a per-node price I always thought was underpriced. I'm sure you'll be hearing a lot more in the coming days and weeks, but go - download - try - and see what it does for your sites. Your developers will love it, your business owners will love it, your users will love it, and I bet even your CFO will love it.

Full disclosure: I am a former employee of Lucidworks; but I'd be just as excited even if I were not. Go download it for sure and try it on your content. But be sure to check out the 'search as killer app' video on Lucid's home page, www.lucidworks.com.

s/ Miles

 

 

September 09, 2014

Sometimes you're just wrong! (Maybe).

OK, this one falls into the 'eat your own words' category, so I have to come clean. Well, partly clean. Let me explain.

I was out of town last week, but just before I left I wrote an article asserting that Elasticsearch really isn't 'enterprise' search. The article drew a lot of attention and comments from both sides of the argument. I have to say I still think that's the case, but an announcement by Microsoft seems to differ, and ends up a net positive for Elasticsearch. Microsoft tells us that Elasticsearch is the platform under the covers of Microsoft's Azure search offering. It looks like you have a couple of options - as long as you're on Azure:

a) You can download and use the open source Elasticsearch platform available on GitHub; or

b) Use Microsoft's managed service 'Facetflow Elasticsearch' which incorporates (some of) the open source code in various places.

Microsoft calls this "a fully-managed real-time search and analytics service" while, according to ZDNet, it is for 'web and mobile application developers looking to incorporate full-text search into their applications'. 

Either way, it's certainly yet another step forward for Elasticsearch, and is a big step forward in visibility for the company. It's not clear what kind of revenue they will receive from the deal - Microsoft being relatively famous for being quite frugal. And after all, smart search folks like Kevin Green of Spantree Technology Group talk about its strengths and liabilities, saying it *is* fast ('wicked fast'), fault-tolerant, distributed and more. But it is not a crawler, a machine learner, or a user-facing front end, and it is not secure.

So I'll agree a partial 'mea culpa' is in order; adding capabilities to an open source project can make it more enterprise ready. But I think the jury may still be out on the rest of my piece. Stay tuned!

September 11, 2012

Are you Tracking MRR? - "Mean Reciprocal Rank" Trend Monitoring

MRR is a simple numerical technique to monitor the overall relevancy performance of search engines over time. It is based on click-throughs in the search results, where a click on the top document is scored as 100%, a click on the second document is 50%, 3rd document is 33%, etc. These numbers are collected and averaged over units of time.

The absolute value of MRR is not necessarily the important statistic because each site has different content, different classes of users, and different search technology. However, the trend of MRR over time can allow a site to spot changes quickly. It can also be used to score "A/B" testing.
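The calculation is simple enough to sketch in a few lines. The example below assumes click data has already been reduced to (day, rank) pairs, where rank 1 is the top result; the function name and sample data are hypothetical. Note one simplifying assumption: searches with no click at all are ignored here, though you could also choose to score them as zero.

```python
from collections import defaultdict


def mrr_by_day(clicks):
    """Average reciprocal rank of clicked results, grouped by day.

    `clicks` is an iterable of (day, rank) pairs. A click at rank 1
    scores 1.0, rank 2 scores 0.5, rank 3 scores about 0.33, and so on.
    """
    totals = defaultdict(lambda: [0.0, 0])  # day -> [sum of 1/rank, click count]
    for day, rank in clicks:
        if rank < 1:
            raise ValueError("rank must be 1 or greater")
        totals[day][0] += 1.0 / rank
        totals[day][1] += 1
    return {day: s / n for day, (s, n) in sorted(totals.items())}


# Hypothetical sample clicks, for illustration only.
sample_clicks = [
    ("2012-09-10", 1),
    ("2012-09-10", 2),
    ("2012-09-11", 4),
    ("2012-09-11", 4),
]
for day, mrr in mrr_by_day(sample_clicks).items():
    print(day, round(mrr, 2))
```

A drop like the one between the two sample days is the kind of change the trend is meant to surface. For A/B testing, the same function works if you pass a variant label in place of the day.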

There are certainly more elaborate algorithms that can be used, and MRR doesn’t account for whether a user liked the document once they opened it. But having even possibly imperfect performance data that can be trended over time is better than having nothing.

Reference: http://en.wikipedia.org/wiki/Mean_reciprocal_rank

Walter Underwood (of Ultraseek, Netflix, MarkLogic fame) gave a presentation (in PPT/PowerPoint) on this topic a couple of years ago, about Netflix's use of MRR.

January 11, 2012

Webinar: What users want from enterprise search in 2012

If you ask the average enterprise user what he or she wants from their internal search platform, chances are good that they will tell you they want search 'just like Google'. After all, people are born with the ability to use Google; why should they need to learn how to use their internal search?

The problem is that web search works so well because, at the sheer scale of the internet, search can take advantage of methodologies that are not directly applicable to the intranet. Yet many of the things that make the public web experience so good can, in fact, be adapted in the enterprise. Our opinion is that, beyond a base level, the success of any enterprise search platform depends on how it is implemented and managed rather than on the core technology.

In this webinar we'll talk about what users want, and how you can address the specific challenges of enterprise content and still deliver a satisfying and successful enterprise search experience inside the firewall.

Register today for our first webinar of the new year, scheduled for January 25: What enterprise users want from search in 2012.

 

 

 

 

 

 

December 12, 2011

New Phrase for determining Sentiment Analysis / Customer Interest

If you look up:

fedex "Package not due for delivery"

which is one of the status messages you can get when tracking a package, you'll see a lot of postings asking about it.

FYI: It means your new toy has arrived in the city you live in, but will NOT be delivered today, because they didn't promise to get it to you until tomorrow.  Whether this is to force customers into paying for express service, or simply a logistics issue, or a mix of the two, depends on your view of companies and I won't get into that here.

However, you'll notice a lot of the postings asking about it are from folks waiting for delivery of things they're very excited to get, often some big-ticket piece of shiny electronics.  They're dying for FedEx to deliver it - they're so anxious and upset about the delay that they're motivated enough to go online and search, and make ranting posts - all because their "toy" is delayed.

So we have a particular emotional response, often about an upscale product, with a reasonably distinct search phrase - cool!

Yes, yes, of course you could say that the customers are mad about the perceived injustice of it, the Occupy Wall Street spin, or that sometimes the package could be really important for other reasons, which are certainly valid points.  I'm not taking sides or passing judgement - and I discovered this today looking for a friend's overdue toy - that's not the point.  I'm just saying that I bet there's a good statistical correlation, and of course it wouldn't apply 100% of the time - which would actually be quite rare in such things.