9 posts categorized "Taxonomy"

December 18, 2012

Last call for submitting papers to ESS NY

This Friday, December 21, is the last day for submitting papers and workshops to ESS in New York, May 21-22. See the information on the Enterprise Search Summit Call for Speakers page.

If you work with enterprise search technologies (or supporting technologies), chances are the things you've learned would be valuable to other folks. If you have an in-depth topic, write it up as a 3-hour workshop; if you have a success story or lessons learned you can share, submit a talk for a 30-45 minute session.

I have to say, this conference has enjoyed a multi-year run of quality talks and excellent spring weather. See you in May?



March 23, 2012

Webinar: Is bad metadata costing you money?

We've planned a webinar that will help identify whether you have a metadata problem, what you can do to fix it, and how to justify the cost.

Despite what some vendors claim, enterprise search platforms rely on good metadata in order to deliver quality results. Yet few organizations have the resources to attack their metadata problems, so findability suffers and users lament "Why don't we use Google?"  Search managers know that even the Google Search Appliance, without quality metadata, can't deliver the internet search experience end users know, love, and trust. Yet it's hard to justify the time and effort to improve metadata in hopes of a better search experience.

In this webinar we will consider the issue of bad metadata, ways to address the problem, and some ideas on what the ROI can be. We will discuss:

  • Do you have a metadata problem?
  • How much is it costing you?
  • What is the risk of bad metadata?
  • What tools are available?
  • What will it cost to fix?  
  • What's the ROI of improved metadata?


We're hosting the webinar twice: on Wednesday, April 11th at 11AM Pacific time (GMT-7), and again on Thursday, April 12th at 8:30AM Pacific time. Click the link for the session you'd like to attend.

See you then!

August 01, 2011

Google Refine, Google's open source ETL tool for data cleansing, with videos!

For any of you working with Entity Extraction this might be of interest. Google has open sourced some software from their Freebase acquisition, formerly called Gridworks. It lets you interactively clean up and transform data. More importantly, it saves these steps as a reusable sequence in JSON format, so they can be reapplied to other data.
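To make the "record once, replay later" idea concrete, here is a minimal Python sketch. The step format below (the `op`, `column`, `find`, and `with` keys) and the `apply_steps` helper are invented for illustration; they are not Google Refine's actual JSON schema or API. The point is only that a cleanup session stored as data can be rerun against a fresh batch of records.

```python
import json

# Hypothetical step format for illustration only;
# this is NOT Google Refine's real operation schema.
STEPS_JSON = """
[
  {"op": "trim", "column": "company"},
  {"op": "uppercase", "column": "state"},
  {"op": "replace", "column": "company", "find": "Inc.", "with": "Inc"}
]
"""

def apply_steps(rows, steps):
    """Apply each recorded step, in order, to every row; return new rows."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not modified
        for step in steps:
            col = step["column"]
            value = row.get(col, "")
            if step["op"] == "trim":
                value = value.strip()
            elif step["op"] == "uppercase":
                value = value.upper()
            elif step["op"] == "replace":
                value = value.replace(step["find"], step["with"])
            else:
                raise ValueError(f"unknown op: {step['op']}")
            row[col] = value
        cleaned.append(row)
    return cleaned

steps = json.loads(STEPS_JSON)
batch = [{"company": "  Acme Inc. ", "state": "ca"}]
result = apply_steps(batch, steps)
print(result)
```

Because the steps live in a JSON document rather than in someone's head, the same list can be checked into source control and applied to next month's export unchanged.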

Here's the main page and wiki (and 3 intro videos):

It IS Open Source, here's the source code and license:

That type of UI makes me want to dust off our XPump code and retrofit into it...

January 31, 2011

Great new tool for Pharmaceutical researchers

Our partners over at Raritan Technologies Inc. have recently released a great tool they developed using the Lexalytics, Inc. Salience toolkit. The product, Topic Explorer, provides a way for users to dig through content and explore concepts from Raritan's extensive knowledgebase of medical terminology, augmented by the text analytics capabilities provided by Lexalytics. Many of you will remember Lexalytics as the company that provided great sentiment analysis in the original FAST ESP product prior to the acquisition by Microsoft.

Raritan co-founder Ted Sullivan gives a great video demo of the product you should see.

What's really great about Topic Explorer is that it isn't limited to just pharma. With the right taxonomy, it can be a great research tool for just about any vertical - risk management, eDiscovery, patent research, and more.

Topic Explorer is a search-technology-neutral product, so it will work with your current solution whether you're using Lucene/Solr or a popular commercial technology. Contact Raritan at 908-668-8181, extension 110. Tell them you read it here!

December 05, 2010

Share your successes at ESS East next May

Our friends over at InfoToday who run the successful Enterprise Search Summit conferences have asked us to announce that the date for submitting papers to their Spring show in New York in May 2011 has been extended until Wednesday, December 8. You can find out what they are looking for and how to submit your proposal online at http://www.enterprisesearchsummit.com/Spring2011/CallForSpeakers.aspx.

Michelle Manafy, who runs the program again next May, really likes to have speakers who have found creative and successful ways to select, deploy, or manage ongoing enterprise search operations. We've co-presented with several of our customers in the past, and trust me, it's great fun and not bad for your career! And - no promises - the weather at ESS East has been great just about every year - and we've been there for nearly 6 years now!

A friend told me something years ago that I've always found helpful; I hope you'll take it to heart: 'Everything you know, someone else needs to know'. Don't worry if your search project isn't perfect, or that someone will find fault with what you've done. Trust me: there are many organizations newer to enterprise search than you are, and anything you found helpful will surely be valuable for them as well. And you get to attend all of the sessions, so you might learn more as well! A 'win-win' situation if I've ever seen one!

See you in New York!




November 08, 2010

Enterprise Search Summit DC November 15-18

The new home for the Fall ESS show is the Renaissance Hotel in downtown Washington, DC... so much for ESS-West! The new locale should bring a large number of new attendees and visitors, and a new co-located conference: SharePoint Symposium. InfoToday knows a trend when they see one!

In addition to the usual sessions provided to show sponsors, there are some interesting sessions by Tom Reamy of KAPS Group; Martin White of Intranet Focus; and eDiscovery expert Oz Benamram, CKO of White and Case LLP. Tony Byrne of Real Story Group will also be there, moderating the session I'll be participating in: Stump the Search Consultant on Wednesday afternoon, November 17th.

I really expect the show to have a large number of government folks in attendance, given how hard it's been for these good folks to travel to previous ESS conferences in New York and San Jose. InfoToday reports higher pre-registration this year than in the past; and I'll be happy to find out I'm wrong about most of the attendees being government or government-related folks.

Come by the session Wednesday afternoon at 3PM, or leave a comment here if you want to get together.



August 22, 2008

Interpreting Your Search Activity Reports

Earley & Associates, Inc. sponsors monthly conference calls organized for the Taxonomy Community of Practice, open to any practitioner interested in learning more about taxonomy development, content management, search, and more.

Miles Kehoe, President of New Idea Engineering, Inc., participated in the call on September 3 and spoke about the role of your search engine in taxonomies and business intelligence.

Contact Earley & Associates to access the call replay.

October 29, 2007

Google Appliance Growing Up?


The newest version of the Google Search Appliance (GSA) is available, and it's starting to look like a pretty decent solution for more and more corporations.

In October, Google released Version 5, which provides what they call "Universal Search". The newest release for the entire GSA line (except the Mini) includes a number of excellent enterprise features: enhanced security; parametric search; Wiki KeyMatch, a social tagging feature for best bets; and an application called OneBox, a search federation tool.

GSA security now includes Windows Integrated Authorization (WIA) and includes a security API to customize special security needs. It handles security both at crawl time and at search time. It fully respects data store security from all sources, so users only see documents, best bets, parametric results, and features which they have permission to view.

The parametric code in Universal Search is based on open source code available from Google (http://code.google.com/p/parametric/). In demos, it looks like most of the parametric demos we've seen; so we'll have to say more once we have a chance to drill down.

The odd feature in this release is Wiki KeyMatch. Essentially, it lets any employee tag a search result list by adding "best bet" suggestions to the top of the result list for a given query. It looks like anyone can suggest a related or better result for any query. Apparently this has worked well at Google for a while, and Google folks say it's great. Administrators are notified when new tags are added or updated, and the best bet does show who created the tag. As Jimmy Wales says about Wikipedia, anyone posting understands that a best bet that is not useful or appropriate is going to be removed; so any author who wants his or her best bet to survive had better make it good. I have to admit the corporate managers I've talked to are a bit skeptical; but where it works, it can start using the 'wisdom of the crowd' to get better results.

OneBox is a search federation application that provides a way to combine results from a number of different corporate data sources, as well as from Google Apps. As one of the Google folks said recently, "One Box is a way of pulling in live data (such as employee info, salesforce.com data, business objects data) right into your search results."

Google has connectors for SharePoint, Documentum, Livelink, and FileNet, as well as for Google Apps. They provide an API so you can write your own, and we're sure third-party developers are busy working on them now. The Google-provided connectors are free, but third-party connectors may be priced depending on how the developer wants to market them.

Finally, Google also seems to have improved their existing "data biasing" to allow administrators to 'query tweak' using URL patterns and document recency.

The only bad news for small users and corporate departments is that the new upgrade and features are not (yet) available for the popular Google Mini.

If you looked at the Google offerings a while ago and they didn’t meet your needs, you may want to take a second look. It looks like they’ve started to come of age in the enterprise search market.



April 25, 2007

The Most Important Taxonomy for your Web Site

Taxonomies continue to be popular in companies, but I have to wonder if they are really that useful for the majority of organizations. I can't tell you how many times I've had otherwise intelligent people tell me "We plan to implement a corporate enterprise search solution – as soon as we finish our taxonomy project". When I hear this, I know search won't be happening for at least two years, and in the meantime every visitor to the web site suffers. Usually I spend a few seconds feeling sorry for their users and/or employees, but then I realize that innovative companies with real work to do are moving ahead full speed with Enterprise Search 2.0 platforms and I feel better.

Taxonomies generally fall into one of two categories: subject-based taxonomies and content-based taxonomies.

Subject or "Domain" taxonomies attempt to completely describe all of the terms in a field, as well as the relationships between the terms. Typically these relationships are hierarchical, and they are the kind of taxonomies we use to classify knowledge - the kind of taxonomies your biology teacher would talk about. You need a real subject matter expert to create a useful subject-based taxonomy. And whatever you do, don't hire two (or more) subject experts, because they will never agree on the taxonomy.

Content-based taxonomies are organized using existing content. Organization charts, computer directory/folder structures, and social tagging content are typical 'content-based' taxonomies. These taxonomies are often built by humans - you do it yourself when you decide what folders to use on your computer. But they can also be built automatically with tools many search and content management vendors sell.

Whether you go with a subject or a content taxonomy for your company, hooking it into your enterprise search technology will be a trick. This is the dirty little secret of the search software business: There are few, if any, commercial engines that can really take advantage of a complex taxonomy. What do you do with it, after all? Do you tag every document with the full taxonomy of terms in the hierarchy for every term in the document? Do you think that somehow the search engine will automatically know what to do with the taxonomy, and look up and down the taxonomy tree to find related terms? Verity had a great concept when they invented Topics in the late 80s, but since then even they have lost some of the taxonomy emphasis.
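For readers wondering what "looking up and down the taxonomy tree" could even mean in practice, here is one possible approach sketched in Python: expand a query term with its narrower and broader terms before sending it to the engine. The tiny `TAXONOMY` dictionary and the function names are made up for illustration, and no particular search engine is assumed to work this way.

```python
# Hypothetical taxonomy: each term maps to its narrower (child) terms.
TAXONOMY = {
    "vehicles": ["cars", "trucks"],
    "cars": ["sedans", "coupes"],
}

def narrower_terms(term, taxonomy):
    """Return all descendants of a term, depth-first."""
    found = []
    for child in taxonomy.get(term, []):
        found.append(child)
        found.extend(narrower_terms(child, taxonomy))
    return found

def broader_term(term, taxonomy):
    """Return the parent of a term, or None if it is a root or unknown."""
    for parent, children in taxonomy.items():
        if term in children:
            return parent
    return None

def expand_query(term, taxonomy):
    """Build the list of terms to search: the term, its descendants, its parent."""
    terms = [term] + narrower_terms(term, taxonomy)
    parent = broader_term(term, taxonomy)
    if parent is not None:
        terms.append(parent)
    return terms

expanded = expand_query("cars", TAXONOMY)
print(expanded)
```

Even this toy version raises the questions asked above: how far down the tree to go, whether broader terms help or just add noise, and how to weight expanded terms against the one the user actually typed.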

We think there is a third kind of taxonomy that is even more important than the traditional subject and content taxonomies: we call it a Behavior-Based Taxonomy.

Really, the reason most companies want a taxonomy is to help people find content. You can probably keep several experts and a bunch of computers working for years to anticipate every possible term and every possible hierarchy that someone on your internet or intranet site may use. But we think the most important taxonomy on any web site is the list of search terms that people actually use when they search a site.

If your search engine can provide great results for the ‘top 100’ queries on your site, you have a lot of happy users. Why do you think search experts at trade shows have finally started talking about your search logs? You can't know what your behavior-based taxonomy (BBT) is unless you are monitoring your search activity at least quarterly. Verify that the ‘top 100’ queries are working fine - either with organic search results or with featured links (or best bets or result promotion, depending on your search vendor).
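As a rough sketch of that quarterly review, the Python below counts the most frequent queries in a log and flags the ones nobody has verified yet. The sample log, the `known_good` set, and both function names are assumptions for illustration; a real search platform will have its own log format and reporting tools.

```python
from collections import Counter

def top_queries(queries, n=100):
    """Return the n most frequent queries after light normalization
    (lowercasing and collapsing whitespace); empty queries are ignored."""
    normalized = (" ".join(q.lower().split()) for q in queries)
    counts = Counter(q for q in normalized if q)
    return counts.most_common(n)

def queries_needing_attention(top, known_good):
    """Return top queries not yet verified as returning good results."""
    return [query for query, _count in top if query not in known_good]

# Hypothetical sample of logged queries, one entry per search.
sample_log = [
    "Vacation Policy", "vacation  policy", "expense report",
    "401k", "expense report", "vacation policy", "",
]

top = top_queries(sample_log, n=2)
todo = queries_needing_attention(top, known_good={"vacation policy"})
print(top)
print(todo)
```

The queries left in `todo` are the ones to test by hand, and to fix with a featured link if the organic results fall short.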

Keep your Behavior-Based Taxonomy up to date, and your search users will be satisfied!