9 posts categorized "Federated Search"

April 25, 2012

Vivisimo: Another one bites the dust

Earlier today, IBM announced that it was acquiring Vivisimo for an undisclosed sum. Now the tough question: what’s it all about? For the answer, let's take a quick trip to the early years of the decade.

Vivisimo was founded in 2000 out of Carnegie Mellon University. The first time we saw them, in 2004, they were marketing 'Clusty', a web clustering product that could examine huge numbers of web pages and then associate - or cluster - documents on specific terms. They also had some really strong federation capabilities built in. And the product was highly scalable. In fact, Vivisimo had great success in a number of huge government sites including the US Social Security site, FirstGov, the Defense Intelligence Agency, and commercial sites such as Eli Lilly. One thing all of these sites have in common? Lots of data. We have a term for that now: 'big data'.

IBM has made huge investments in open source search over the last 10 years, specifically in Lucene/Solr. Hadoop is the Apache answer for big data, and trust me, Hadoop is a hot topic this year.

What does Vivisimo bring IBM? Well... for one thing, clustering algorithms (and probably patents); a reputation for being able to handle huge data sets; and federation.

What should Vivisimo customers do now? Well, based on IBM's strong customer ethic, I think the answer is "don't panic": do nothing for now. Assuming Velocity is working for you, this acquisition should cause you no concern.

If you are evaluating Vivisimo, that's a bit more difficult. Some acquisitions, like Verity's acquisition by Autonomy, resulted in a wholesale replacement of the platform. Some customers made the switch early on and were happy; others fought to make IDOL work like K2, even with the 'compatibility mode', and never succeeded. You'll also remember that Microsoft, after acquiring FAST Search, dropped all non-Windows platforms a year later, which affected upwards of 70% of the FAST installed base.

If you are willing to acquire a platform for a couple of years and see what happens, go for it. You may look back and discover you made the right choice. On the other hand, former President Reagan had a saying: "Trust, but verify". You might take a look around to see what platform is right for you now and into the future.

 

August 09, 2011

So how many machines does *your* vendor suggest for a 100,000,000+ document dataset?

We've been chatting with folks lately about really large data sets.  Clients who have a problem, and vendors who claim they can help.

But a basic question keeps coming up, and it's not licensing: "how many machines will we need?" Not everybody can put their data on a public cloud, and private clouds can't always spit out a dozen virtual machines to play with, plus duplicates of that for dev and staging, so it's not quite as trivial as some folks think.

The Tier-1 vendors can handle hundreds of millions of docs, sure, but usually on quite a few machines, plus of course their premium licensing, and some nontrivial setup at that point.

And as much as we love Lucene, Solr, Nutch and Hadoop, our tests show you need a fair number of machines if you're going to turn around a half billion docs in less than a week.
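To put "a half billion docs in less than a week" in perspective, here is a rough back-of-the-envelope sketch in Python. The per-machine indexing rate is a made-up placeholder, not a figure from our tests, and the linear-scaling assumption is optimistic; plug in your own benchmark numbers.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60

def required_docs_per_second(total_docs, days):
    """Sustained indexing rate needed to finish total_docs within the given days."""
    return total_docs / (days * SECONDS_PER_DAY)

def machines_needed(total_docs, days, docs_per_second_per_machine):
    """Machine count, assuming throughput scales linearly (an optimistic assumption)."""
    rate = required_docs_per_second(total_docs, days)
    return math.ceil(rate / docs_per_second_per_machine)

# Half a billion docs in one week.
rate = required_docs_per_second(500_000_000, 7)
print(f"Required sustained rate: {rate:.0f} docs/sec")

# HYPOTHETICAL per-machine rate of 50 docs/sec, for illustration only.
print("Machines needed:", machines_needed(500_000_000, 7, 50))
```

The point is the shape of the math, not the numbers: the required rate is fixed by the deadline, so the machine count is driven entirely by what one box can really sustain on your content.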

And beyond indexing time, once you start doing 3 or 4 facet filters, you also hit another performance knee.

We've got 4 Tier-2 vendors on our "short list" that might be able to reduce machine counts by a factor of 10 or more over the Tier-1 and open source guys.  But we'd love to hear your experiences.

January 31, 2011

Great new tool for Pharmaceutical researchers

Our partners over at Raritan Technologies Inc. have recently released a great tool they developed using the Lexalytics, Inc. Salience toolkit. The product, Topic Explorer, provides a way for users to dig through content and explore concepts from Raritan's extensive knowledgebase of medical terminology, augmented by the text analytics capabilities provided by Lexalytics. Many of you will remember Lexalytics as the company that provided great sentiment analysis in the original FAST ESP product prior to the acquisition by Microsoft.

Raritan co-founder Ted Sullivan gives a great video demo of the product you should see.

What's really great about Topic Explorer is that it isn't limited to just pharma. With the right taxonomy, it can be a great research tool for just about any vertical - risk management, eDiscovery, patent research, and more.

Topic Explorer is a search technology neutral product, so it will work with your current solution whether you're using Lucene/Solr or a popular commercial technology. Contact Raritan at 908-668-8181 Extension 110. Tell them you read it here!

December 21, 2010

A New Kind of Search Experience

For a while, we've talked about the ways we think enterprise search can - and likely will - improve in the future. We're big fans of conversational search, a search experience currently implemented with facets and 'related links' technologies that draw the user into an interaction with the human... finding becomes an exploration, not a single-shot event.

We've talked to companies that wanted to create search results that look more like a newsletter or a data sheet and less like the output from 'DIR' or 'ls -l'. Well, this week I've been introduced to a new kind of search experience created by Qwiki.

Currently available by invitation only, Qwiki provides information and a search experience that you can watch. The material on Qwiki is machine generated, and then vetted by humans, presumably a team working at - or for - Qwiki.

My guess is Qwiki starts by federating material from trusted public sources. I'd imagine they start by scraping content on Wikipedia and other useful information sites. They gather images, videos and text, which is read to you by a computer generated voice that sounds much like SAL 9000 in 2010, the sequel to '2001: A Space Odyssey'.

In a nice video from TechCrunch 2010, Doug Imbruce of Qwiki calls the service 'information and experience I can watch'. I think this could end up being an interesting enterprise technology as more and more companies build terabytes of HD video content.

 

December 05, 2010

Share your successes at ESS East next May

Our friends over at InfoToday who run the successful Enterprise Search Summit conferences have asked us to announce that the date for submitting papers to their Spring show in New York in May 2011 has been extended until Wednesday, December 8. You can find out what they are looking for and how to submit your proposal online at http://www.enterprisesearchsummit.com/Spring2011/CallForSpeakers.aspx.

Michelle Manafy, who runs the program again next May, really likes to have speakers who have found creative and successful ways to select, deploy, or manage ongoing enterprise search operations. We've co-presented with several of our customers in the past, and trust me, it's great fun and not bad for your career! And - no promises - the weather at ESS East has been great for just about every year - and we've been there for nearly 6 years now!

A friend told me something years ago that I've always found helpful; I hope you'll take it to heart: 'Everything you know, someone else needs to know'. Don't worry if your search project isn't perfect, or that someone will find fault with what you've done. Trust me: there are many organizations newer to enterprise search than you are, and anything you found helpful will surely be valuable for them as well. And you get to attend all of the sessions, so you might learn more as well! A 'win-win' situation if I've ever seen one!

See you in New York!

/s/Miles

 

 

November 08, 2010

Enterprise Search Summit DC November 15-18

The new home for the Fall ESS show is the Renaissance Hotel in downtown Washington, DC... so much for ESS-West! The new locale should bring a large number of new attendees and visitors, and a new co-located conference: SharePoint Symposium. InfoToday knows a trend when they see one!

In addition to the usual sessions provided to show sponsors, there are some interesting sessions by Tom Reamy of KAPS Group; Martin White of Intranet Focus; and eDiscovery expert Oz Benamram, CKO of White and Case LLP. Tony Byrne of Real Story Group will also be there, moderating the session I'll be participating in: Stump the Search Consultant on Wednesday afternoon November 17th.

I really expect the show to have a large number of government folks in attendance, given how hard it's been for these good folks to travel to previous ESS conferences in New York and San Jose. InfoToday reports higher pre-registration this year than in the past; and I'll be happy to find out I'm wrong about most of the attendees being government or government-related folks.

Come by the session Wednesday afternoon at 3PM; or leave a comment here if you want to get together.

 

 

September 01, 2010

Today's Search Term: hybrid search

hybrid search
Synonyms:  fielded search, filtered search
Related Terms:  taxonomy, parametric search, faceted search, scope of search
A search that includes both full-text and traditional database search criteria. For example, a tech support person could look for "installation errors" (full-text) within a particular product line (more like a traditional database field search). By adding the criterion "product='accounting software'", the tech support person gets a more targeted scope of search, and is more likely to find the installation error they were looking for. As another example, an analyst might search for "depreciation allowance" (the full-text) within a particular jurisdiction (a traditional database-like field). By adding the filter "state='FL'", the analyst gets a more targeted scope of search, and is more likely to find relevant documents.
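The two examples above can be sketched in a few lines of Python. This is a toy illustration only: the Doc class, the hybrid_search function and the sample documents are all made up for this sketch, and the plain substring match stands in for what a real engine would do with an inverted index and relevance ranking.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    doc_id: int
    text: str                                    # full-text content
    fields: dict = field(default_factory=dict)   # database-like metadata

def hybrid_search(docs, phrase, **filters):
    """Return docs whose text contains the phrase AND whose fields match every filter."""
    needle = phrase.lower()
    return [
        d for d in docs
        if needle in d.text.lower()
        and all(d.fields.get(k) == v for k, v in filters.items())
    ]

# Hypothetical sample data mirroring the two examples in the definition.
docs = [
    Doc(1, "Common installation errors and fixes", {"product": "accounting software"}),
    Doc(2, "Installation errors on older hardware", {"product": "payroll software"}),
    Doc(3, "Depreciation allowance rules", {"state": "FL"}),
    Doc(4, "Depreciation allowance overview", {"state": "NY"}),
]

support_hits = hybrid_search(docs, "installation errors", product="accounting software")
analyst_hits = hybrid_search(docs, "depreciation allowance", state="FL")
print([d.doc_id for d in support_hits])
print([d.doc_id for d in analyst_hits])
```

Without the field filter, each full-text query matches two documents; with it, each narrows to one. That narrowing is the "more targeted scope of search" in the definition.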

December 02, 2009

Deep Web Sponsoring a federated search challenge

Abe and Sol Lederman over at Deep Web Technologies have announced the second annual contest to discover the best federated search methodology out there. The objective, from their FederatedSearchBlog web site:

Tell us about the most impressive federated search application you’ve ever seen, or about one you’ve dreamed up. How innovative can federated search be? What unique problems can it solve?

The first ten serious entries get an Amazon gift certificate or $25 via PayPal; and the top prizes are $1000, $500, and $250 respectively. The winner will be a panelist at the April 2010 Computers in Libraries conference; and Deep Web will pick up the travel costs for the winner.

Federated Search is a hot topic, partly because nearly every organization wants to search content they may not have rights to index. Deep Web Technologies has some great examples of federated search and query time facets and clustering. Check out their web site, then write up a submission, win a few bucks, and speak at the Computers in Libraries conference next Spring! Do it now!
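For readers new to the idea, here is a minimal sketch of what federation means: instead of indexing the content yourself, you send the query to each source at query time and merge what comes back. Everything here is hypothetical (the source names, the in-memory "endpoints", the scores), and it is not how Deep Web Technologies or anyone else actually implements it.

```python
from concurrent.futures import ThreadPoolExecutor

def make_source(name, records):
    """Build a stand-in for a remote search endpoint (hypothetical, in-memory)."""
    def search(query):
        q = query.lower()
        return [(name, title, score) for title, score in records if q in title.lower()]
    return search

def federated_search(query, sources):
    """Send the query to every source in parallel, then merge by descending score."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        result_lists = list(pool.map(lambda s: s(query), sources))
    merged = [hit for hits in result_lists for hit in hits]
    return sorted(merged, key=lambda hit: hit[2], reverse=True)

# Two made-up sources we can query but do not index ourselves.
sources = [
    make_source("library_catalog", [("Federated search basics", 0.9), ("Cataloging 101", 0.4)]),
    make_source("journal_db", [("Advances in federated search", 0.7)]),
]

for source_name, title, score in federated_search("federated search", sources):
    print(source_name, title, score)
```

Merging on raw scores is the big simplification: in practice each source scores results differently, so real federators have to normalize or re-rank before combining.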

/s/Miles

January 12, 2009

Enterprise Search group on LinkedIn

There is a relatively active Enterprise Search Engine professional group over on LinkedIn - you might consider joining the group if you're there. The discussions have been about Open Source technology, visual search, Microsoft and FAST, federated search and more.

It's interesting that so many 'enterprise search' groups have grown so much in the last year or two, including searchdev.org - I guess it reflects both the fact that it's finally being recognized as a mission-critical capability and that there are so few places to go for information. Hopefully we'll see even more discussion and user participation in the coming year!