3 posts categorized "Apache Spark"

August 09, 2018

Fake Search?

Enterprise search was once easy. It was often bad - but understanding the results was pretty easy. If the query term(s) were in the document, it was there in the results. Period. The more times the terms appeared, the higher the result appeared in the result list. And when the user typed a multi-term query, documents with all of the terms displayed higher in the result list than those with only some of the terms.
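To make that old-school ranking concrete, here is a minimal sketch in Python. It is an illustration of the rules described above, not any vendor's actual algorithm; the function names and the sample documents are made up for the example.

```python
from collections import Counter


def score(query: str, document: str) -> tuple[int, int]:
    """Return (distinct query terms matched, total occurrences of those terms)."""
    words = Counter(document.lower().split())
    terms = set(query.lower().split())
    matched = [t for t in terms if words[t] > 0]
    return (len(matched), sum(words[t] for t in matched))


def search(query: str, docs: dict[str, str]) -> list[str]:
    """Rank documents: more query terms matched first, then more occurrences."""
    scored = [(score(query, text), name) for name, text in docs.items()]
    # A document with none of the terms never appears in the results.
    hits = [(s, name) for s, name in scored if s[0] > 0]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [name for _, name in hits]


# Hypothetical sample content.
docs = {
    "a": "expense report policy",
    "b": "expense expense expense",
    "c": "travel policy",
    "d": "holiday schedule",
}
print(search("expense policy", docs))
```

Document "a" contains both terms, so it outranks "b" even though "b" repeats one term three times; "d" matches nothing and is left out.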

And some search platforms could 'explain' why a particular document was ranked where it was. Those of us who have been in the business a while may remember the Verity Topic and Verity K2 product lines. One of the most advanced capabilities in these was the 'explain' function. It would reverse engineer the score for an individual document and report the critical 'why': the reason the selected document was ranked as it was.
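The idea behind an 'explain' report can be sketched in a few lines of Python. This is a toy based on simple term counts, assumed purely for illustration; it is not how Verity computed or explained its scores.

```python
from collections import Counter


def explain(query: str, document: str) -> dict:
    """Break a simple term-count score down into its per-term parts."""
    words = Counter(document.lower().split())
    per_term = {term: words[term] for term in query.lower().split()}
    return {
        "score": sum(per_term.values()),
        "per_term": per_term,
        "missing": [term for term, count in per_term.items() if count == 0],
    }


# Hypothetical document and query.
report = explain("expense policy", "expense report expense form")
print(report)
```

The report shows which terms contributed to the score and which were missing, which is the kind of 'why' a skeptical user wants to see.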

"People Like You"

Now, search results are generally better, but every now and then, you’ll see a result that surprises you, especially on sites that are enhanced with machine learning technologies. A friend of mine tells of a query she did on Google while she was looking for summer clothes, but the top result was a pair of shoes. She related her surprise: "I asked for summer clothes, and Google shows me SHOES?".  But, she admitted, "Those shoes ARE pretty nice!"

How did that happen? Somewhere, deep in the data Google maintains on her search history, it concluded that "people like her" purchased that pair of shoes.
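One simple way to get a "people like you" result is to count what shoppers with overlapping purchases also bought. The sketch below assumes a tiny, made-up purchase history; it is certainly not how Google does it, only an illustration of the general idea.

```python
from collections import Counter


def recommend(user: str, purchases: dict[str, set[str]], top_n: int = 1) -> list[str]:
    """Suggest items bought by shoppers whose purchases overlap with the user's."""
    mine = purchases[user]
    votes = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(mine & items)  # how much "like you" this shopper is
        if overlap == 0:
            continue
        for item in items - mine:  # only items the user does not already have
            votes[item] += overlap
    return [item for item, _ in votes.most_common(top_n)]


# Hypothetical purchase histories.
purchases = {
    "ann": {"sundress", "sunhat"},
    "bea": {"sundress", "sunhat", "canvas shoes"},
    "cat": {"sundress", "canvas shoes", "beach towel"},
    "dee": {"raincoat", "umbrella"},
}
print(recommend("ann", purchases))
```

Ann asked for nothing about shoes, yet the shoes come out on top because the shoppers most like her bought them. Notice that the code can tell you the vote counts, but a real model with millions of signals rarely explains itself this neatly.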

In the enterprise, we don't have the volume of content and query activity of the large Internet players, but we do tend to have more focused content and a narrower query vocabulary. ML/AI tools like Apache Mahout and MLlib, Spark's machine learning library, can help our search platforms generate such odd yet often relevant results; but these technologies are still limited when it comes to explaining the 'why' for a given result. And for those of us who still exhibit skepticism when it comes to computers, that capability would be nice.

Are you using (or planning to implement) ML-in-search? A skeptic? Which camp are you in? Let me hear from you! [email protected].

June 22, 2017

First Impressions on the new Forrester Wave

The new Forrester Wave™: Cognitive Search And Knowledge Discovery Solutions is out, and once again I think Forrester, along with Gartner and others, misses the mark on the real enterprise search market.

In the belief that sharing my quick first impressions will at least get a conversation going until I can write up a more complete analysis, here are my first thoughts.

First, I am not wild about the new buzzterms 'cognitive search' and 'insight engines'. Yes, enterprise search can be intelligent, but it's not cognitive, which Webster defines as "of, relating to, or involving conscious mental activities (such as thinking, understanding, learning, and remembering)". HAL 9000 was cognitive software; "Did you mean" and "You might also like" are not cognition. And enterprise search has always provided insights into content, so why the new 'insight engines'?

Moving on, I agree with Forrester that Attivio, Coveo and Sinequa are among the leaders. Honestly, I wish Coveo were fully multi-platform, but they do have an outstanding cloud offering that in my mind addresses much of the issue.

However, unlike Forrester, I believe Lucidworks Fusion belongs right up there with the leaders. Fusion starts with a strong open source Solr-based core; an integrated administrative UI; a great search UI builder (with the recent acquisition of Twigkit); and multiple-platform support. (Yep, I worked there a few years ago, but well before the current product was created).

I count IDOL in with the 'Old Guard' along with Endeca, Vivisimo (‘Watson’) and perhaps others - former leaders still available, but offered by non-search companies, or removed from traditional enterprise search (Watson). And it will be interesting to see if IDOL and its new parent, Micro Focus, survive the recent shotgun wedding.

Tier 2, great search but not quite “full” enterprise search, includes Elastic (which I believe is in the enviable position as *the* platform for IoT), MarkLogic, and perhaps one or two more.

And there are several newer or perhaps less well-known search offerings like Algolia, Funnelback, Swiftype, Yippy and more. Don’t hold their size and/or youth against them; they’re quite good products.

No, I’d say the Forrester report is limited, and honestly a bit out of touch with the real enterprise search market. I know, I know; How do I really feel? Stay tuned, I've got more to say coming soon. What do you think? Leave a comment below!

January 25, 2017

Lucidworks Fusion 3 Released!

Today Lucidworks announced the release of Fusion 3, packed with some very powerful capabilities that, in many ways, set a new standard in functionality and usability for enterprise search.

Fusion is tightly integrated with Solr 6, the newest version of the popular, powerful and well-respected open source search platform. But the capabilities that really set Fusion 3 apart are the tools provided by Lucidworks on top of Solr to reduce the time-to-productivity.

It all starts at installation, which features a guided setup to allow staff who may not be familiar with enterprise search to get started quickly and to build quality, full-featured search applications.

Earlier versions of Fusion provided very powerful ‘pipelines’ that allowed users to define a series of custom steps or 'stages' during both indexing and searching. These pipelines allowed users to add custom capabilities, but they generally required some programming and a deep understanding of search.

That knowledge still helps, but Fusion 3 comes with what Lucidworks calls the “Index Workbench” and the “Query Workbench”. These two GUI-driven applications let mere mortals set up capabilities that used to require a developer, and enable developers to create powerful pipelines in much less time.

What can a pipeline do? Let's look at two cases.

On a recent project, our client had a deep, well-developed taxonomy, and they wanted to tag each document with the appropriate taxonomy terms. In the Fusion 2.x Index Pipeline, we wrote code to evaluate each document to determine the relevant taxonomy terms, and then to insert those terms into the actual document. This meant that at query time, no special effort was required to use the taxonomy terms in the query: they were part of the document.
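As a rough idea of what such an index-time tagging step does, here is a minimal Python sketch. It is not the code from that project and not Fusion's pipeline API; the field names (`body`, `taxonomy_terms`), the taxonomy and the simple phrase matching are all assumptions for illustration.

```python
def tag_with_taxonomy(doc: dict, taxonomy: dict[str, list[str]]) -> dict:
    """Return a copy of the document with matching taxonomy terms added."""
    text = doc["body"].lower()
    tags = sorted(
        term
        for term, phrases in taxonomy.items()
        if any(phrase.lower() in text for phrase in phrases)
    )
    # The tags become an ordinary field, so queries need no special handling.
    return {**doc, "taxonomy_terms": tags}


# Hypothetical taxonomy: term -> trigger phrases.
taxonomy = {
    "Finance/Expenses": ["expense report", "reimbursement"],
    "HR/Benefits": ["health plan", "401k"],
}
doc = {"id": "42", "body": "How to file an expense report for travel"}
tagged = tag_with_taxonomy(doc, taxonomy)
print(tagged["taxonomy_terms"])
```

A real implementation would want smarter matching than substring tests, but the principle is the same: do the work once at index time, and the terms simply travel with the document.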

Another common index time task is to identify and extract key terms, perhaps names and account numbers, to be used as facets.
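A sketch of that kind of extraction, again in Python and again only illustrative: the account number format (`ACCT-` followed by six digits) and the field names are invented for the example, and extracting names reliably would need more than a regular expression.

```python
import re

# Hypothetical account number format: "ACCT-" followed by exactly six digits.
ACCOUNT_PATTERN = re.compile(r"\bACCT-\d{6}\b")


def extract_account_facets(doc: dict) -> dict:
    """Return a copy of the document with account numbers pulled into their own field."""
    found = ACCOUNT_PATTERN.findall(doc["body"])
    unique = list(dict.fromkeys(found))  # drop repeats, keep first-seen order
    return {**doc, "account_numbers": unique}


doc = {"id": "7", "body": "Transfer from ACCT-123456 to ACCT-654321; ref ACCT-123456."}
enriched = extract_account_facets(doc)
print(enriched["account_numbers"])
```

Once the values sit in their own field, the search platform can facet on them like any other field.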

The Index Workbench in Fusion 3 provides a powerful front-end to these capabilities, which have long been part of Fusion but are now much easier for mere mortals to use.

The Query Workbench is similar, except that it operates at query time, making it easy to do what we’ve long called “query tuning”. Consider this: not every term a user enters for search is of equal importance. The Query Workbench lets a non-programmer tweak relevance using a point-and-click interface. In previous versions of Fusion, and in most search platforms, a developer needed to write code to do the same task.
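For a sense of what that hand-written tuning code looks like, here is a small Python sketch that rewrites a query using the Lucene/Solr `term^boost` syntax. The function name and the boost values are hypothetical, and this says nothing about what the Query Workbench does internally.

```python
def boost_query(query: str, boosts: dict[str, float]) -> str:
    """Append a ^boost to each query term that has a configured weight."""
    parts = []
    for term in query.split():
        weight = boosts.get(term.lower())
        parts.append(f"{term}^{weight:g}" if weight is not None else term)
    return " ".join(parts)


# Hypothetical weights: "laptop" matters most, "case" matters least.
tuned = boost_query("laptop carrying case", {"laptop": 3, "case": 0.5})
print(tuned)
```

Every change to those weights used to mean a code change and a deployment, which is exactly the chore a point-and-click tool takes away.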

Another capability in Fusion 3 addresses a problem everyone who has ever installed a search technology has faced: how to ensure that the production environment exactly mirrors the dev and QA servers. Doing so was a very detailed and tedious task, and any differences between QA and production could break something.

Fusion 3 has what Lucidworks calls Object Import/Export. This unique capability provides a way to export collection configurations, dashboards, and even pipeline stages and aggregations from a test or QA system; and reliably import those objects to a new production server. This makes it much easier to clone test systems; and more importantly, move search from Dev to QA and into production with high confidence that production exactly matches the test environment.

Fusion 3 also extends the Graphical Administrative User Interface to manage pretty much everything your operations department will need to do with Fusion. Admin UIs are not new; but the Fusion 3 tool sets a new high bar in functionality.

There is one other capability in Fusion 3 enabled by a relatively new capability in Solr: SQL.

I know what you’re thinking: “Why do I want SQL in a full-text application?”

Shift your focus to the other end.

Have you ever wanted to generate a report that shows information about inventory or other content in the search index? Let’s say your business team needs inventory and product reports on content in your search-driven eCommerce data. The business team has tools they know and love for creating their own reports; but those tools operate on SQL databases.

This kind of reporting has always been tough in search, and typically required some custom programming to create the reports. With the SQL querying capabilities in Solr 6, and security provided by Fusion 3, you may simply need to point your business team at the search index, verify their credentials, and connect via ODBC/JDBC, and their existing tools will work.
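To give a feel for the SQL side, here is a minimal Python sketch that builds a request for Solr 6's `/sql` handler, which accepts the statement in a `stmt` parameter. It assumes a plain local Solr with no Fusion security in front of it; the host, the `products` collection and the `category` field are all hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sql_request(base_url: str, collection: str, stmt: str) -> Request:
    """Build (but do not send) a POST to a collection's /sql handler."""
    body = urlencode({"stmt": stmt}).encode("utf-8")
    return Request(f"{base_url}/solr/{collection}/sql", data=body, method="POST")


# Hypothetical collection and field names.
stmt = "SELECT category, COUNT(*) FROM products GROUP BY category"
req = build_sql_request("http://localhost:8983", "products", stmt)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running server. Your business team would more likely skip code like this altogether and connect their reporting tool through the JDBC driver.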

What Else?

Fusion 3 is an upgrade from earlier versions, so it includes Spark, an Apache tool with built-in modules for streaming, SQL, machine learning and graph processing. It works fine on SolrCloud, which enables massive indices and query load, not to mention failover in the event of hardware problems.

I expect that Fusion 3 documentation, and the ability to download and evaluate the product, will be on the Lucidworks site today at www.lucidworks.com. “Try it, you’ll like it”.

While we here at New Idea Engineering, a Lucidworks partner, can help you evaluate and implement Fusion 3, I’d also point out that our friends at MC+A, also Lucidworks partners, are hosting a webinar Thursday, January 26th. Use this link to register and attend the webinar: http://bit.ly/2joopQK.

 

Lucidworks CTO Grant Ingersoll will be hosting a webinar on Friday, February 1st. Read about it here.

 

/s/ Miles