
4 posts from June 2008

June 23, 2008

13 Powerful Entity Extraction Techniques

Modern entity extraction systems typically employ some combination of the following general techniques. The list is ordered roughly from smallest to largest scope and complexity:

  1. Simple Pattern Based:
    Examples: 5 digits in a row might be a US ZIP code, 1 - (nnn) - nnn-nnnn could be a US phone number, and nnn-nn-nnnn could be a US Social Security number.
  2. Simple Dictionary/Thesaurus Based:
    Examples: IBM, Apple and Microsoft are all US companies.  Bill Gates, George Bush and Paul Revere were all famous people.
    - - - (basic toolkits end here) - - -
  3. Hybrid Pattern plus Dictionary:
    Example: A pattern finds a sequence of words that all begin with capital letters, so the sequence is likely to be a proper name.  But to distinguish San Francisco, George Washington and Oracle Corporation as a place, person and company respectively, the system needs to consult some dictionaries.  Notice that "Washington" can be used as both a place name and a person's name.  And a sequence such as Flamingo Geodarney Foobazar, which mixes common words with names that are not in the dictionary, may be even more difficult to disambiguate.
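
The first three techniques can be sketched in a few lines of Python. This is a minimal illustration, not a production extractor: the regular expressions are deliberately loose, the `DICTIONARY` contents are a tiny hand-built sample, and the names `PATTERNS`, `DICTIONARY`, `extract_patterns` and `extract_names` are hypothetical, chosen only for this example.

```python
import re

# Technique 1: simple patterns. These regexes are illustrative and loose;
# a 5-digit number is only *possibly* a ZIP code.
PATTERNS = {
    "zip_code": re.compile(r"(?<![\d-])\d{5}(?![\d-])"),
    "phone": re.compile(r"(?:1\s*-\s*)?\(\d{3}\)\s*-?\s*\d{3}-\d{4}"),
    "ssn": re.compile(r"(?<![\d-])\d{3}-\d{2}-\d{4}(?![\d-])"),
}

# Technique 2: a simple dictionary (hypothetical sample entries).
# "Washington" is ambiguous, so it maps to two types.
DICTIONARY = {
    "IBM": {"company"},
    "Oracle Corporation": {"company"},
    "San Francisco": {"place"},
    "George Washington": {"person"},
    "Washington": {"place", "person"},
}

# Technique 3: a pattern proposes candidates (runs of capitalized words)
# and the dictionary assigns each candidate its possible types.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*")


def extract_patterns(text):
    """Return (label, matched_text) pairs for every pattern match."""
    found = []
    for label, regex in PATTERNS.items():
        for match in regex.finditer(text):
            found.append((label, match.group()))
    return found


def extract_names(text):
    """Return (candidate, types) pairs; an empty list means 'not in the dictionary'."""
    results = []
    for match in CAPITALIZED_RUN.finditer(text):
        candidate = match.group()
        results.append((candidate, sorted(DICTIONARY.get(candidate, set()))))
    return results
```

Calling `extract_names("Flamingo Geodarney Foobazar")` returns the candidate with an empty type list, which is exactly the hard case described above: the pattern fires, but the dictionary cannot say what kind of entity it is. Note that this sketch also treats any capitalized word at the start of a sentence as a candidate, one reason real systems go beyond these three techniques.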

Continue reading "13 Powerful Entity Extraction Techniques" »

June 20, 2008

New Idea Engineering Selected by Endeca as Systems Integration Partner

With the opportunities we've seen recently in online commerce environments, we're glad to be able to announce we're part of the Endeca partner program. From the MarketWatch press release announcing the partnership between New Idea Engineering and Endeca:

SANTA CLARA, Calif. AND CAMBRIDGE, Mass. – June 20, 2008 – New Idea Engineering (NIE), a Santa Clara California based consulting group, today announced that it has been selected by Endeca Technologies, Inc., an information access software company, to design and deploy joint enterprise search and information access solutions based on the Endeca Information Access Platform (IAP).

"New Idea Engineering is among the world’s foremost authorities on enterprise search and information access solutions," said Matt Eichner, senior vice president, strategic development at Endeca.  "Our partnership will introduce the benefits of the Endeca platform to new markets and industries while ensuring the highest level of support and expertise.  As a result, NIE’s clients will be able to rapidly integrate data from any source and offer employees, partners and customers with new ways to find, analyze and understand that information."

"Endeca has played a pioneering role in redefining search and its application areas," said Miles Kehoe, President of New Idea Engineering.  "By partnering with Endeca, we can introduce new types of enterprise search and discovery solutions to meet today’s most pressing client demands, while playing a leadership role in defining future market opportunities for information access and visibility."

About Endeca
Endeca’s innovative information access software helps people explore, analyze, and understand complex information, guiding them to unexpected insights and better decisions. The Endeca® Information Access Platform, built around a new class of access-optimized database, powers applications that combine the ease of searching and browsing with the analytical power of business intelligence. More than 500 leading global organizations including ABN AMRO, Boeing, Cox Enterprises, the (US) Defense Intelligence Agency, Dell, Ford Motor Company, Hyatt, IBM, John Deere, the Library of Congress, Texas Instruments, and Walmart.com rely on Endeca to power business-critical applications that increase revenue, reduce costs, and streamline operations.

Headquartered in Cambridge, Mass., Endeca has operations in North America, Europe, and Australia. For more information: endeca.com.

June 18, 2008

Search Quality: You Can't Improve What You Don't Measure

In our latest survey of new newsletter subscribers, we found that 29% had no formal metrics for measuring the quality of search results.  Search metrics help you keep search on the right track and can be a powerful tool for managing your systems.  They are also a good source of insights and trends.  Here are several that we think work well.  Many of these are covered in greater depth in Interpreting Your Search Activity Reports in the Enterprise Search newsletter.

  • Count the number of people who use search  
  • Count the total number of searches  
  • Count the number of searches that return zero results
  • Collect user feedback on the top 100 searches
  • Track email complaints about search  
  • Measure number of clicks on navigators (navigation menu items)  
  • Business Goals  
    • Reduce call volume (normalized for growth in the customer base) by enabling self-service from search: when results are good enough, fewer customers need to call.
    • Reduce e-mail volume (again adjusted for growth in the customer base) by enabling self-service from search: when results are good enough, fewer customers need to send e-mail.
    • Revenue       
    • Add-on revenue       
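
Several of the counts above can be computed directly from a search log. The following is a minimal sketch, assuming each logged search has been reduced to a user ID, the query text, the number of results returned, and the number of navigator clicks that followed. The `SearchEvent` record and its field names are hypothetical; a real search engine's log format will differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SearchEvent:
    """One logged search (hypothetical record layout)."""
    user_id: str
    query: str
    result_count: int
    navigator_clicks: int = 0


def summarize(events):
    """Compute basic search-quality metrics from a list of SearchEvent records."""
    total = len(events)
    zero = sum(1 for e in events if e.result_count == 0)
    return {
        "unique_searchers": len({e.user_id for e in events}),
        "total_searches": total,
        "zero_result_searches": zero,
        "zero_result_rate": zero / total if total else 0.0,
        "navigator_clicks": sum(e.navigator_clicks for e in events),
        # Most frequent queries, normalized for case and whitespace;
        # these are the candidates for the "top 100" feedback review.
        "top_queries": Counter(
            e.query.strip().lower() for e in events
        ).most_common(100),
    }


def calls_per_customer(call_volume, customer_count):
    """Normalize call (or e-mail) volume for growth in the customer base."""
    return call_volume / customer_count
```

Comparing `calls_per_customer` across reporting periods, rather than raw call counts, is what keeps a growing customer base from hiding an improvement in self-service.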

June 04, 2008

Tips For Building Drill Down Navigators

Tagging at social networking sites suggests that about six tags can identify most documents. Here are a couple of tips for building drill down navigators:

      Meeting Room
      White papers      
      Meeting Notes      
Why - Visitor's Goal
      Purchase Product      
      Find a Store
      HR Transactions
      Best Practices
      Service Manual
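
As a rough sketch of how tags like these drive a drill down navigator, the Python below counts documents per tag value and then narrows the set when a visitor picks one. The sample documents, the `type` facet name, and the function names are hypothetical; only the "why" values are taken from the list above.

```python
from collections import Counter

# Hypothetical tagged documents; each carries one value per facet.
DOCUMENTS = [
    {"title": "Store locator", "tags": {"why": "Find a Store", "type": "Web page"}},
    {"title": "Order form", "tags": {"why": "Purchase Product", "type": "Web page"}},
    {"title": "Pricing overview", "tags": {"why": "Purchase Product", "type": "White papers"}},
    {"title": "Q2 planning", "tags": {"type": "Meeting Notes"}},
]


def navigator_counts(docs, facet):
    """Count documents per value of one facet, skipping untagged documents."""
    return Counter(d["tags"][facet] for d in docs if facet in d["tags"])


def drill_down(docs, facet, value):
    """Keep only the documents whose facet has the selected value."""
    return [d for d in docs if d["tags"].get(facet) == value]
```

Recomputing `navigator_counts` on the output of `drill_down` gives the counts for the next level of the navigator, so the visitor only ever sees values that still have matching documents.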


We have blogged about implicit versus explicit tagging, a big difference between enterprise and public web sites. Our article 5 Steps to Better Tagging is also online in the archives of our Enterprise Search newsletter.