10 posts categorized "Dieselpoint"

November 08, 2011

Are you spending too much on enterprise search?

If your organization uses enterprise search, or if you are in the market for a new search platform, you may want to attend our webinar next week, "Are you spending too much for search?" The one-hour session will address:

  • What do users expect?
  • Why not just use Google?
  • How much search do you need?
  • Is an RFI a waste of time?   

Date: Wednesday, November 16 2011

Time: 11AM Pacific Standard Time / 1900 UTC

Register today!

August 09, 2011

So how many machines does *your* vendor suggest for a 100,000,000+ document dataset?

We've been chatting with folks lately about really large data sets.  Clients who have a problem, and vendors who claim they can help.

But a basic question keeps coming up - not licensing - but "how many machines will we need?"  Not everybody can put their data on a public cloud, and private clouds can't always spit out a dozen virtual machines to play with, plus duplicates of that for dev and staging, so it's not quite as trivial as some folks think.

The Tier-1 vendors can handle hundreds of millions of docs, sure, but usually on quite a few machines, plus of course their premium licensing, and some nontrivial setup at that point.

And as much as we love Lucene, Solr, Nutch and Hadoop, our tests show you need a fair number of machines if you're going to turn around a half billion docs in less than a week.
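To make the "half billion docs in less than a week" target concrete, here is a back-of-the-envelope sketch in Python. The function names are made up for illustration, and the per-machine indexing rate in the example is a placeholder assumption, not a measurement from our tests or from any vendor; plug in your own benchmark numbers.

```python
import math

SECONDS_PER_DAY = 86_400

def required_docs_per_second(total_docs, days):
    """Sustained indexing rate needed to finish total_docs within the given days."""
    return total_docs / (days * SECONDS_PER_DAY)

def machines_needed(total_docs, days, docs_per_second_per_machine):
    """Machines needed, assuming throughput scales linearly (it rarely does exactly)."""
    rate = required_docs_per_second(total_docs, days)
    return math.ceil(rate / docs_per_second_per_machine)

rate = required_docs_per_second(500_000_000, 7)
print(f"{rate:.0f} docs/sec sustained for a full week")

# 50 docs/sec per machine is a hypothetical placeholder, not a benchmark result.
print(machines_needed(500_000_000, 7, 50), "machines at that assumed rate")
```

Remember that this covers one environment only; dev and staging copies add to the count.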

And beyond indexing time, once you start doing 3 or 4 facet filters, you also hit another performance knee.

We've got 4 Tier-2 vendors on our "short list" that might be able to reduce machine counts by a factor of 10 or more over the Tier-1 and open source guys.  But we'd love to hear your experiences.

May 19, 2011

Content owners don't care about metadata

Or do they?

Our recent post about Booz & Company's 'men named Sarah' highlights just how important good metadata can be in order to provide a great search experience for employees and customers.

One of our customers who spoke at the recent ESS 2011 in New York provided some great insights into the problems organizations have getting employee content creators to include good metadata with their documents.

During the ESS talk, they reported that content owners don't really seem motivated when asked to help improve the overall intranet site by improving document metadata. However - and this is a big one - when a sub-site owner sees poor results on their own site, they are willing to invest the time to provide really good metadata.

[A bit of background: This customer provides a way for individual site owners within the organization to add search to their 'sub site' pretty much automatically - sort of a 'search as a service' within the enterprise.]

So if you've been thinking of adding the ability to search-enable sub-sites within your organization, but solving the relevance problem is your first task, you might reconsider your priorities!

/s/Miles

December 05, 2010

Share your successes at ESS East next May

Our friends over at InfoToday who run the successful Enterprise Search Summit conferences have asked us to announce that the date for submitting papers to their Spring show in New York in May 2011 has been extended until Wednesday, December 8. You can find out what they are looking for and how to submit your proposal online at http://www.enterprisesearchsummit.com/Spring2011/CallForSpeakers.aspx.

Michelle Manafy, who runs the program again next May, really likes to have speakers who have found creative and successful ways to select, deploy, or manage ongoing enterprise search operations. We've co-presented with several of our customers in the past, and trust me, it's great fun and not bad for your career! And - no promises - the weather at ESS East has been great for just about every year - and we've been there for nearly 6 years now!

A friend told me something years ago that I've always found helpful; I hope you'll take it to heart: 'Everything you know, someone else needs to know'. Don't worry if your search project isn't perfect, or that someone will find fault with what you've done. Trust me: there are many organizations newer to enterprise search than you are, and anything you found helpful will surely be valuable for them as well. And you get to attend all of the sessions, so you might learn more as well! A 'win-win' situation if I've ever seen one!

See you in New York!

/s/Miles

February 24, 2010

Enterprise Search Summit 2010 - DC

Even as we prepare for ESS East in New York (ESS NY from now on?), Information Today has issued its call for papers for the first ever ESS-DC to be held in Washington DC November 16-18 2010.

Follow this link to find background on what InfoToday is looking for; or jump right to the submissions page. Don't be shy: everyone who presents papers had, at one time, never done it before. What you know, someone else needs to know!

In our experience, the kind of content InfoToday likes is the information that can help an organization select or manage search and related technologies. Generally, real-world stories about how other companies and organizations have succeeded with search are the ones that attendees appreciate the most. 

We'll also be having a searchdev dinner at ESS DC this year. Details to come late in summer, but plan for it now!

Are you doing search now? Have you been successful getting it going on time and under budget? Tell your story. Submit your idea now!

June 08, 2009

Enterprise Search Engine Optimization: eSEO

Last week at the Gilbane Conference in San Francisco, I participated in a panel "Search Survival Guide: Delivering Great Results" moderated by Hadley Reynolds of IDC. In the presentation, I offered a new view on improving enterprise search engine relevancy that I call eSEO.

The term SEO is well understood by - and widely practiced in - the corporate world.  The concept of SEO, as summarized by one of the Gilbane talks, states that "Key to the value of any Web content is the ability for people to find it". In the SEO world this is done by combining organic results and keyword placement - advertising - to improve placement, maintain ranking, and monitor search engine position - results - over time.

While we've been helping our customers improve their enterprise search results, it's hard to convince them that search results are not a problem they can solve once. I've decided to apply a new term to this process - Enterprise Search Engine Optimization, or eSEO. To paraphrase the role of SEO, eSEO is the process of combining organic results and best bets to deliver correct, relevant, timely content to enterprise search users - employees, customers, partners, investors, and others.

For both organic and best bets, the first step is to identify what we call the "top 100" queries. Start by creating a histogram that shows the top terms from your search engine. I hope you'll agree that if the top queries - whether 100, 50, or even 20 - deliver great results, you're on your way to having happy users. Talk to your content owners as you review the histogram, and ask them to identify the best result for each.
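As a rough illustration of the histogram step, here is a minimal Python sketch. It assumes you have already exported your query log as a plain list of query strings; the function name `top_queries` and the sample log are invented for this example and are not part of any search engine's API.

```python
from collections import Counter

def top_queries(queries, n=100):
    """Return the n most frequent queries as (query, count) pairs."""
    # Fold case and collapse whitespace so trivially different spellings count together.
    normalized = (" ".join(q.lower().split()) for q in queries)
    counts = Counter(q for q in normalized if q)  # skip empty queries
    return counts.most_common(n)

# Hypothetical sample log
sample_log = [
    "vacation policy", "Vacation Policy", "expense report",
    "vacation  policy", "expense report", "org chart", "",
]

# Print a crude text histogram of the top queries
for query, count in top_queries(sample_log, n=3):
    print(f"{query:20} {'#' * count} ({count})")
```

The resulting list is what you would walk through with your content owners, asking them to name the best result for each query.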

Once you have a list of queries and results, start the two-step process. First, tune the search engine using its native query tuning capabilities. This will impact the shape of the histogram, and over time should start delivering better results. The bad news is that tuning like this doesn't position all of your top terms, and it would be silly to try to micro-manage the results for each. That's why search engines have best bets.

When you feel pretty good about the curve through query tuning, it's time to start setting up best bets - the "ad words" of eSEO. Limit the number of best bets to one or two at most - but remember that you can use other real estate like the rightmost column of the screen to suggest additional content. Some guidelines for best bets:

  • Use one or at most two best bets
  • Don't repeat a document already at the top of the organic results
  • Make sure your best bets respect security
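The three guidelines above can be sketched as a small filter. This is an illustration only: the dictionary layout for a best bet, the group-based security check, and the function name `pick_best_bets` are all assumptions made for the example, not how any particular search engine implements best bets.

```python
def pick_best_bets(organic, best_bets, user_groups, max_bets=2):
    """Choose which best bets to show above the organic results.

    organic:     ranked list of result URLs for this query
    best_bets:   list of dicts with "url" and "allowed_groups" (hypothetical schema)
    user_groups: set of groups the current user belongs to
    """
    top_organic = organic[0] if organic else None
    chosen = []
    for bet in best_bets:
        if bet["url"] == top_organic:
            continue  # don't repeat the document already at the top
        if not user_groups & set(bet["allowed_groups"]):
            continue  # respect security: the user may not see this document
        chosen.append(bet["url"])
        if len(chosen) == max_bets:
            break  # one or at most two best bets
    return chosen

# Hypothetical data for a "vacation" query
bets = [
    {"url": "/hr/vacation", "allowed_groups": ["employees"]},
    {"url": "/hr/exec-leave", "allowed_groups": ["executives"]},
    {"url": "/hr/holidays", "allowed_groups": ["employees"]},
    {"url": "/hr/faq", "allowed_groups": ["employees"]},
]
organic = ["/hr/vacation", "/hr/faq"]
print(pick_best_bets(organic, bets, {"employees"}))
```

In this example the first bet is dropped because it already leads the organic results, and the second is dropped because the user is not in the executives group.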

Once you have tuned your search engine, and set up best bets for the most timely and actionable result, you're ready to roll it out. But then the ongoing part comes in: you need to review your search activity and best bets periodically. Usually, we'd suggest once a month for a while, then perhaps quarterly thereafter. You may find seasonal variations, and if you're not watching you'll miss a golden opportunity.
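One simple way to do the periodic review is to compare this period's top queries against the last period's and look at what is new. The sketch below assumes two lists of query strings, one per period; the function name `rising_queries` and the sample data are hypothetical.

```python
from collections import Counter

def rising_queries(previous, current, n=20):
    """Queries in the current top n that were not in the previous top n."""
    prev_top = {q for q, _ in Counter(previous).most_common(n)}
    cur_top = [q for q, _ in Counter(current).most_common(n)]
    return [q for q in cur_top if q not in prev_top]

# Hypothetical logs for two consecutive months
october = ["benefits"] * 3 + ["expense report"] * 2
november = ["open enrollment"] * 4 + ["benefits"] * 2

print(rising_queries(october, november))
```

A query that suddenly appears in the top list, like a seasonal one, is a good candidate for a new best bet.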

In Summary

1. eSEO is just as critical as SEO

  • Lost time and revenue
  • Legal exposure

2. Watch for trends over time: Search is not "fire and forget"

3. Make sure SEO doesn't impact your eSEO

  • Use fielded data that web search engines ignore for your tuning (e.g., 'Abstract' rather than 'Description').

This will get you started; but because your queries and your content change over time, it's a never-ending story. Some companies - ours included - have tools that can help. But no matter what, hang in there!

s/Miles


June 06, 2009

Impressions of first Lucene/Solr SF Meetup

Kudos to Carl, our NIE Marketeer and de facto social director, for getting us to attend. It was well worth it, and conveniently coincided with Gilbane.

The Good:

  • VERY entertaining, very informative.  Lots of good info about upcoming versions of Lucene and Solr, including additional performance tweaks.
  • A friendly, supportive bunch of like-minded nerds, and I mean this in the best possible way.
  • Also discussions of other related Apache projects.  We're all gonna need a cheat sheet pretty soon to keep track of it all.
  • Lucene/Solr will soon have implemented many of the core features of Autonomy IDOL, Endeca, FAST, etc.  They really ought to be spying.  :-)

Personally I think Otis & co. might wanna fly out for the next one.  I also think Dieselpoint ought to attend and talk about Open Pipeline.  If we get up enough energy maybe we could even volunteer to do that next time, we're on the board after all, but this is really Chris's baby.

The Not-so-Good:

  • About 50 terms that clients would not understand.  Don't get me wrong, we love the Map/Reduce, Bayesian, K-Means, SVD stuff, but most corporate clients would be lost.
  • Not much for Enterprise Packaging.  Ironically it's the mundane aspects of search, from a non-developer standpoint, that are still not on the horizon.  Not a criticism of the developers, they have what they need.
  • Not much about Nutch.  Nutch 1.0 is out, along with rumors of a revised admin GUI, but not much coverage here.

Impressions of Lucid Imagination:

This event was sponsored by Lucid, a company that recently got funding for bringing commercial packaging and services to the open source search world, and their senior staff includes quite a few of the core committers.

  • A very sincere bunch of guys.
  • They haven't sold their souls to corporate America; I think their "geek cred" is still well intact.
  • Probably will not be filling in enterprise packaging pot holes any time soon.
  • Do they understand the Enterprise Market?

Also a shout out to LinkedIn and IBM for giving back to the open source community.

There was also an "open mic" segment, and I'd like to give a shout out to Avi Rappoport - I agree 1,000%, "stop words bad!" (or at least the blind use of index time stop words)


Surprises:

  • Not much of a threat to Google Appliance, due to packaging.  Yes, Google scales with their Map/Reduce and relevancy algorithms, and the open source guys have responded, but that's not the stuff that makes Google tick these days.
  • And despite the impressive and rapidly evolving core technologies, also not a real threat to the other Tier One vendors like FAST and Autonomy.  More on this seeming contradiction in a bit.
  • The Tier 2 vendors of the world, Attivio, Exalead, Dieselpoint, etc. DO need to pay attention.  There is a place for Tier 2 vendors, but they need to mind what the open source products do and do not provide more carefully.
  • It's really cool to see IBM willing to contribute so aggressively to the open source search engines, even though they sell several of their own.  A naive person might think they are competing with themselves, sabotaging their own sales guys, but they're a lot smarter than that.  They are not selling their commercial search products as pure search; those technologies are always part of a larger (and more expensive) grand business solution.  They know what they're doing!

For similar reasons, still not a huge threat to Autonomy, MS/FAST, Endeca, etc. on corporate services.  I said earlier that the Apache projects are implementing a lot of the "secret sauce" that launched Autonomy and Endeca, etc, so you'd think this represents "a clear and present danger", but Mike Lynch's secret algorithms are not why people buy IDOL anymore.  Things like giant reference accounts, professional services, and commercial grade spiders have a lot more to do with why big companies still pay six figures for search technology.

And speaking of surprises and Lucid Imagination, I wanna circle back to their PR a few months back when they got their funding and launched their company.  They talked about relevancy in their press releases!?  Wow... Yes, Lucene and Solr have some good traction there, but that specific competitive advantage has been used by almost every commercial search vendor in the past 15 years, including Verity, Autonomy and Google!

I would've expected them to say something like "we're gonna do for Lucene what RedHat did for Linux" - this would have been a very clear business-oriented proposition, though to be fair lots of companies have used that business model as well.  It wouldn't be original, but would be more business centric.  Then again, I'm not in Marketing, and their VCs obviously liked their pitch, so what do I know!

s/Mark

March 02, 2009

Enterprise Search Resources

Search Resources

There's a great deal of activity going on in the enterprise search market - groups and resources popping up everywhere. We thought we'd provide a list of the ones we know and respect best; feel free to add your own suggestions as comments and we'll post them in a follow up.

User Forums

SearchDev.org: The independent search developer's forum. A forum on the business and technology of search.

SearchDev also has two technical forums for detailed vendor-specific questions dealing with everything from coding and scripting to problem resolution, with more in the works:

autonomy.searchdev.org

fast.searchdev.org

LinkedIn Groups

Enterprise Search Engine Professionals Group: A fast-growing LinkedIn group for people working in or involved with enterprise search in corporate environments worldwide. Search for it under the Groups menu.

Enterprise Search Summit Group: A new group run by Michelle Manafy at Information Today which will provide industry news and information as well as details and podcasts about upcoming ESS events.

Newsletters

Enterprise Search Newsletter: Produced by New Idea Engineering, this newsletter covers both business and technical issues of search, generally at a more detailed technical level. It covers all vendors, provides advice for improving your search, and includes Ask Dr Search who answers technical questions from subscribers.

Blogs

Enterprise Search Blog: A blog produced by New Idea Engineering that covers all topics around the business and technology of enterprise search including opinion, news, events and more.

The Noisy Channel: This insightful blog, run by Daniel Tunkelang, CTO of Endeca, has a perspective on technology of enterprise search from someone who knows search from the ground up.

Beyond Search: Run by search guru Steve Arnold, Beyond Search contains news, interviews, and opinion on the search market delivered

SearchTools:  Avi Rappoport runs this blog which summarizes new content from her website http://searchtools.com/ which covers almost every search technology known to mankind!

SLI Systems Blog: Hosted search service SLI Systems provides a newsletter that talks about the kinds of problems they see in working with their customers. http://www.sli-systems.com/newsletter.php

FAST Forward Blog: A blog run by FAST Search staffed by FAST, Microsoft, and independent bloggers who write about search and IT issues at http://www.fastforwardblog.com/.

Attivio: The search vendor has a useful blog that has good general information as well as Attivio-specific material.

Mark Logic Blog: Written by CEO Dave Kellogg, who shares interesting information about technology. A fun read, and always informative.

Vivisimo Blog: Vivisimo runs the 'Search Done Right' blog. Like Attivio's blog, it provides great background information on enterprise search that anyone can benefit from reading.

Flax Blog: From Lemur Consulting in the UK, the creators of the Flax open source search technology. You'll find more than just Flax here, though, with good coverage of issues relevant to enterprise search in general. 

Gilbane Search Practice Blog: Written by Lynda Moulton, this is a good background blog for enterprise search as well. Gilbane holds two interesting content management conferences a year that include a search track that can be worthwhile.

Two other blogs I find most interesting are not directly related to enterprise search, but I find good value when I follow them:

Andrew McAfee, a Professor at Harvard Business School, writes about IT issues, and he always has interesting material.

John Battelle, author of 'The Search...', has an interesting blog as well, and it's always fun to follow what he's doing.

Trade Shows

Enterprise Search Summit New York: Every May, Information Today sponsors the premier show for enterprise search in New York City. If you only go to one show a year, this is the one to go to. That's also the advice we give to new vendors entering the marketplace. We'll be back again this year, speaking about how you can save money by making your existing search engine work rather than replace it. By the way, you can listen to a preview of our talk, as well as talks by other speakers including Matt Brown of Forrester and Sid Probstein of Attivio.

Search Engine Meeting: Search Engine Meeting is an interesting show run by Infonortics from the UK. In its 14th year, this year's show returns to Boston on April 27-28; see you there!

January 12, 2009

Enterprise Search group on LinkedIn

There is a relatively active Enterprise Search Engine professional group over on LinkedIn - you might consider joining the group if you're there. The discussions have been about Open Source technology, visual search, Microsoft and FAST, federated search and more.

It's interesting that so many 'enterprise search' groups, including searchdev.org, have grown so much in the last year or two - I guess it reflects both the fact that search is finally being recognized as a mission-critical capability and that there are so few places to go for information. Hopefully we'll see even more discussion and user participation in the coming year!


October 13, 2008

Reviewing OpenPipeline

OpenPipeline is an initiative proposed by search engine company Dieselpoint to begin development of standards in the enterprise and customer-facing search marketplace.

"Current solutions are proprietary and require that search administrators define and manage data source connectors, file filters, text analyzers, taxonomy, and dictionaries for each search engine technology," says Miles Kehoe, CEO of New Idea Engineering. "Defining once and maintaining a single source regardless of how many and which search engine you use is a big win for customers. We hope other search engine vendors will be adopting this strategy soon." 

"Enterprise search is not the same as web searching," Chris Cleveland, CEO of Dieselpoint, says, "because it entails all of the nitty-gritty preparation for search - that is, it requires doing all of those things you need to do to get a document and standardize it before indexing." OpenPipeline, he says, aims to streamline the preparation process through its innovative document-processing capabilities.

Additional information ... 2008 Enterprise Search Vendors: The New Fab 4 ... and 1/2. (http://www.ideaeng.com/pub/entsrch/2008/number_01/article01.html)

OpenPipeline was created by Chris and his team of developers at Dieselpoint, whose intranet and customer-facing search product is written in pure Java. Dieselpoint Search is a powerful product, and has many of what we call 'Enterprise Search 2.0' capabilities designed in from the start. For example, it has a web-based control panel for business and IT managers, and provides great support for features like dynamic facets, activity reporting, and powerful data crawling capabilities. It has an elegant and clean interface, and it is extremely scalable. Dieselpoint Search integrates OpenPipeline for crawling, parsing, analyzing, and routing documents.

About Dieselpoint
Founded in 1999, Dieselpoint provides high-performance search, navigation, and discovery/information retrieval software for structured and unstructured data. Every day, Dieselpoint customers search millions of items and terabytes of data. Customers like The Nielsen Company, Northrop Grumman, Porsche, HMV, McGraw-Hill, ITT, Waterstone’s Books, and British Telecom use Dieselpoint software for corporate portals, intranet search, product catalogs, and engineering databases. Dieselpoint has developed industry-leading advances in faceted search and scalability. Coupled with a new Open Pipeline architecture and outstanding ease of implementation, Dieselpoint is the platform of choice for corporate search needs.  Further information can be found online at www.dieselpoint.com.