7 posts categorized "Mobile Search / Cell / iPhone"

November 08, 2011

Are you spending too much on enterprise search?

If your organization uses enterprise search, or if you are in the market for a new search platform, you may want to attend our webinar next week, "Are you spending too much for search?" The one-hour session will address:

  • What do users expect?
  • Why not just use Google?
  • How much search do you need?
  • Is an RFI a waste of time?   

Date: Wednesday, November 16, 2011

Time: 11AM Pacific Standard Time / 1900 UTC

Register today!

June 16, 2011

Context aware computing

The keynote speech at a recent Intel Developer Forum argued that context-aware computing will drive the future of computing. PCMag has an article about it. The basic idea is to enhance devices so that they act more like personal assistants and less like PDAs. The examples ranged from the rather outré, such as using an EEG headset to take information directly from your brain, to a smart remote control that recognizes who is holding it and alters your viewing experience, to more mundane ones, such as a smartphone that senses your gait and uses GPS to determine your location.

It had a strong hardware focus (as expected) and didn't appear to mention existing location-based search on smartphones. Instead, they demonstrated a prototype of a Fodor's Travel application that combined location awareness with other data it had learned about you (such as which cuisine you prefer) when searching for a nearby restaurant.

Microsoft Research has an interesting article called Exploring New Frontiers of Search that describes concept-based search using WS-LDA (Weakly Supervised Latent Dirichlet Allocation) and context-aware search using a Variable-length Hidden Markov Model (VLHMM). Both are data mining techniques.

 

August 27, 2010

There's an Ant on your Southwest Leg!

The WSJ has an interesting article on how language affects how we think. I particularly liked the example of an indigenous language where anything you discuss involves absolute cardinal directions (north, south, east, west, etc.). You literally can't say "There is an ant on one of your legs". Instead you say something like "There's an ant on your southwest leg." To say hello you'd ask "Where are you going?", and an appropriate response might be, "A long way to the south-southwest. How about you?" If you don't know which way is which, you literally can't get past hello.

Dr. Kevin Lim reviewed Search Engine Society, a book that explores the effect search engines have on politics, culture and economics. He is not your typical reviewer: he is himself mentioned in the book, because he records a large part of his life with cameras (one he wears, another at his desk pointed at him) while a GPS device tracks his movements.

Google throws its weight behind Voice Search, by Stephen Lawson, discusses how voice search relies on statistical models of which sequences of words are most likely to occur, and how Google trains a new language model. Another example of that would be Midomi, a web site where you search for music by singing a fragment of the song.
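To make "statistical models of which sequences of words are most likely" a little more concrete, here is a toy bigram model in Python. It is a sketch under my own assumptions, not a description of Google's system: the three-sentence training corpus and the two candidate transcriptions are hypothetical. The model counts which word follows which, then prefers the candidate whose word sequence it has seen most often.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows the previous one (a toy bigram model)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks the sentence start
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts


def score(counts, sentence):
    """Product of P(word | previous word); 0.0 as soon as a pair was never seen."""
    words = ["<s>"] + sentence.lower().split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        total = sum(counts[prev].values())
        if total == 0:
            return 0.0
        prob *= counts[prev][word] / total
    return prob


# Hypothetical training data standing in for real query logs.
corpus = ["pizza near me", "pizza delivery near me", "weather near me"]
model = train_bigrams(corpus)

# Two acoustically similar candidates; the model prefers the likelier word sequence.
candidates = ["pizza near me", "pizza knee or me"]
best = max(candidates, key=lambda c: score(model, c))
print(best)  # pizza near me
```

Real recognizers combine a language model like this with an acoustic model and smoothing for unseen word pairs; the toy version simply gives unseen pairs a probability of zero.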

Multilingual Search Engine Breaks Language Barriers discusses how the UNL Society uses the pivot language UNL to return a precise answer in the language in which the question was formulated. This still seems to be a research project, with related projects such as LACE trying to extract data from parallel corpora as a cheaper way to populate a lexical database.

XBRL Across The Language Divide by Jennifer Zaino discusses how XBRL (eXtensible Business Reporting Language) may be one of the few areas that benefit from the Monnet project, which attempts to "provide a semantics-based solution for accessing information across language barriers". It tries to "build software that breaks the link between conceptual information and linguistic expressions (the labels that point back to concepts in ontologies) for each language." When that works, it makes it easier and quicker to perform analytics across multiple languages.

The Cross-Language Evaluation Forum (CLEF) is working on infrastructure for testing, tuning and evaluating systems that retrieve information in European languages, along with benchmarks to help test them. One of its papers, for example, compares lexical and algorithmic stemming in 9 languages using Hummingbird SearchServer.
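Roughly speaking, lexical stemming looks each word up in a dictionary, while algorithmic stemming applies suffix-stripping rules. The Python sketch below contrasts the two; the tiny LEXICON and SUFFIXES tables are hypothetical stand-ins, far cruder than what SearchServer or any production stemmer uses.

```python
# Hypothetical mini-lexicon: maps inflected forms to their dictionary lemma.
LEXICON = {"ran": "run", "running": "run", "runs": "run", "mice": "mouse"}

# Hypothetical suffix rules, tried in order (a very crude rule-based stripper).
SUFFIXES = ["ing", "es", "s"]


def lexical_stem(word):
    """Dictionary lookup: precise for known words, no help for unknown ones."""
    return LEXICON.get(word, word)


def algorithmic_stem(word):
    """Rule-based suffix stripping: works on any word, but misses irregular forms."""
    for suffix in SUFFIXES:
        # Keep at least 3 characters so short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["runs", "ran", "mice", "searches"]:
    print(w, lexical_stem(w), algorithmic_stem(w))
```

The dictionary maps the irregular "ran" to "run" but knows nothing about "searches"; the rules handle "searches" but can never turn "ran" into "run". That trade-off is what the paper measures across languages.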

August 06, 2010

Coveo Expresso - free Enterprise Search Lite for up to 50 users

Coveo released a beta version of Coveo Expresso, a free entry-level enterprise search application "designed to allow users to search through corporate emails, SharePoint, network file servers and desktop files from their mobile device or desktop." It's free for up to 50 users, 1 million emails and attachments, and 100,000 documents. Each user can search from an Outlook sidebar within Outlook, a floating search bar on the desktop, a classic search page in their browser and/or a BlackBerry MIDlet (mobile search).

It is built on Coveo Enterprise Search Platform 6.0 and provides a simplified admin portal to centrally provision users with just a few clicks. Employees can then install or update Coveo Expresso on their desktop, Outlook and BlackBerry with one click. The company claims you can download and configure it in less than 45 minutes.

They sell several upgrade packs. The license can be expanded to 250 users, 5 million desktop files and email messages, and 1 million SharePoint and file share documents just by typing a new access code. Expresso can use Coveo’s Advanced Search Modules, which are highly configurable and scalable to billions of documents.

The free version of Coveo Expresso requires a permanent Internet connection to receive license keys, which are renewed every 7 days. If the keys are not renewed, it goes offline and users cannot run any searches.

Barb Masher has an overview of the new features in Enterprise Search Platform 6.1. The two products share many features; a comparison of their features is available here.

Stephen Arnold recently posted about Coveo's Enterprise Search product winning the SIAA Codie award in the “Best Enterprise Search Engine Category” for the second time. 

John Ragsdale has an interesting post (this was in January, before version 2.0 of CIAS was announced) about spending some time with Coveo, an "emerging customer information access vendor, whom I will never again refer to as a search vendor". He argues that their customer search product provides so many possibilities to retrieve, manipulate and display data that it is "much more than a search engine, or a dash boarding tool, or a reporting platform, though it can do all of these things well."

July 07, 2010

In Defense of "grep" / auto-substring Matching :-)

As some of you know, grep is the Unix utility that, in its simplest form, looks for literal strings in a file and prints out any matching lines. The database equivalent is the LIKE operator with percent signs before and after the string.
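In Python terms, the simplest grep is just a containment test on each line, and the LIKE version can be tried with the standard library's sqlite3 module. This is a minimal sketch with made-up sample lines, not a reimplementation of the real grep utility.

```python
import sqlite3


def grep(pattern, lines):
    """Simplest grep: return every line that contains the literal string."""
    return [line for line in lines if pattern in line]


lines = [
    "There were marks on the surface.",
    "Mark wrote this blog post.",
    "Nothing to see here.",
]
print(grep("mark", lines))  # ['There were marks on the surface.']

# The SQL equivalent: LIKE with % before and after the string.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?)", [(line,) for line in lines])
rows = conn.execute("SELECT body FROM docs WHERE body LIKE ?", ("%mark%",)).fetchall()
print(rows)
conn.close()
```

One wrinkle: grep is case-sensitive by default, while SQLite's LIKE ignores case for ASCII letters, so the query also returns the "Mark" line. Other databases treat case differently. Either way, both approaches examine every row.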

For years all of us fulltext search engine snobs have been saying "grep is not a search engine" (and by extension, neither is the LIKE operator in SQL), and that this type of literal matching is insufficient for real searching. For example, this type of simple matching won't get word variations like "run" and "ran", nor synonyms like "cool" and "cold".

From an implementation standpoint, the problem with grep is performance-related: it scans every line of every file to check each pattern. This is super slow if you have billions of documents. Instead, search engines index all the documents ahead of time and create a highly optimized search index. The engine then consults that index, not the original source documents, to search for specific words.
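Here is a minimal sketch of the index-ahead-of-time approach: an inverted index that maps each word to the set of documents containing it. The tokenizer, the sample documents and the AND-only query are simplifying assumptions; real engines add word positions, relevance scoring and compression.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids containing it (an inverted index)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """AND query: intersect the posting sets instead of rescanning any document."""
    postings = [index.get(word, set()) for word in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "There were marks on the surface.",
    2: "Mark wrote this blog post.",
    3: "The surface of the blog was clean.",
}
index = build_index(docs)  # done once, ahead of time
print(sorted(search(index, "blog surface")))  # [3]
```

Building the index touches every document once; each query afterwards only looks up and intersects a few posting sets.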

But I find myself doing substring searches in a few of the systems I use frequently. In our CRM, when I don't remember the exact spelling of a person, company or product, I type in just 3 or 4 letters. This doesn't always work: sometimes it brings back junk, other times it misses the mark. But it's an easy search to edit and resubmit, so I can fire off 2 or 3 variations in short order. I also use substrings quite a bit when searching through source code. OpenGrok is a very nice Lucene-based search engine and uses proper word breaks, but sometimes it doesn't find what I'm looking for because, by default, it looks for complete words. In the Eclipse editor, by contrast, search uses substring matching by default, so you can look up substrings without thinking about it. Email is yet another application that, at least on some systems, starts looking up matches after just 2 or 3 letters. There's a special case: some systems will only match those 2 or 3 characters if they're at the start of a word, similar to many autocomplete implementations.

I can hear some of you yelling "what about wildcards!?" Most engines will let you type abc* and match everything starting with abc. Engines differ on whether you can use wildcards in the middle or at the start of a word, and some engines can do it IF you enable it. This is close, and it's an improvement: it doesn't do a linear scan of all the documents, since it still consults the fulltext search index. But most folks forget to add the asterisk... or is it a percent sign? And can you put it in the middle or at the beginning, in your particular engine and configuration? Who knows!
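One way an engine can answer abc* without a linear scan is to keep its term dictionary sorted and binary-search for the prefix. The sketch below assumes a plain sorted Python list as the term dictionary (real engines use more compact structures) and uses the standard bisect module.

```python
import bisect


def prefix_terms(sorted_terms, prefix):
    """Expand a trailing wildcard like 'abc*' by binary search over sorted terms."""
    start = bisect.bisect_left(sorted_terms, prefix)  # first term >= prefix
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: once a term stops matching, no later term can
        matches.append(term)
    return matches


vocabulary = sorted(
    ["blog", "mar", "mark", "marks", "market", "mars", "mart", "post", "surface"]
)
print(prefix_terms(vocabulary, "mark"))  # ['mark', 'market', 'marks']
```

A leading wildcard such as *ark gets no help from this trick, because the matching terms are scattered throughout the sorted list; that is likely one reason engines restrict it or make you enable it.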

So what's to be done? The good news is you really can "have your cake and eat it too!" Highly configurable search engines can be told to index the same text in several different ways. One internal index can hold tokens that are the exact words. Another index can normalize the words to lower case and perform "stemming", normalizing plurals to singular form and so on. It should also be possible to coax these engines into storing all of the smaller chunks of words in yet another index. Of course, a substring match isn't as good as a full match. But search engines have an answer for this too! You can give each of these indices a different relevancy weight. A substring match is OK... if there's nothing else... but if the full word matches, it should get extra credit, and an exact match should score even higher. And keep in mind you're not paying the performance penalty: the engine is using the index, not doing a literal scan of every file.
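Here is a rough sketch of the several-indices idea: each word is analyzed three ways (exact, lowercased and crudely stemmed, and trigrams), and each index contributes a different weight to a document's score. The WEIGHTS values, the one-rule crude_stem and the sample documents are hypothetical choices for illustration, not settings from any particular engine.

```python
from collections import defaultdict

# Hypothetical weights: exact matches count most, substrings least.
WEIGHTS = {"exact": 3.0, "stemmed": 2.0, "trigram": 0.5}


def crude_stem(word):
    """Lowercase and strip a plural 's' (a stand-in for a real stemmer)."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def trigrams(word):
    """All overlapping 3-letter chunks of the lowercased word."""
    word = word.lower()
    return {word[i:i + 3] for i in range(len(word) - 2)}


def analyze(word):
    """Produce the tokens each internal index stores for one word."""
    return {"exact": {word}, "stemmed": {crude_stem(word)}, "trigram": trigrams(word)}


def build(docs):
    """One inverted index per analysis, all built from the same text."""
    indices = {name: defaultdict(set) for name in WEIGHTS}
    for doc_id, text in docs.items():
        for word in text.split():
            for name, tokens in analyze(word).items():
                for token in tokens:
                    indices[name][token].add(doc_id)
    return indices


def search(indices, word):
    """Analyze the query the same way, then sum weighted hits per document."""
    scores = defaultdict(float)
    for name, tokens in analyze(word).items():
        for token in tokens:
            for doc_id in indices[name].get(token, ()):
                scores[doc_id] += WEIGHTS[name]
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    1: "There were marks on the surface",
    2: "Mark wrote this blog post",
    3: "a market stall",
}
indices = build(docs)
print(search(indices, "Mark"))  # [(2, 6.0), (1, 3.0), (3, 1.0)]
```

Searching for just "mar" returns all three documents with the same small score from the trigram index alone, which is the recall-versus-precision trade-off in miniature.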

That's a lot of techno-babble, so let's walk through an example:

Your text has the sentence "There were marks on the surface.", and let's focus on the third word, "marks". Then another sentence has "Mark wrote this blog post."

The word "marks" gets indexed several ways:

Exact index: marks

Stemmed index: mark

Single index: m a r k s

Double index: ma ar rk ks

Triple index: mar ark rks

Then the term "Mark" is indexed as:

Exact index: Mark

Stemmed index: mark

Tuple index (combines the single, double and triple indices): m a r k ma ar rk mar ark
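The n-gram lists in this example are easy to generate mechanically. This Python sketch reproduces them; lowercasing before splitting into grams is my assumption, chosen to match the lowercase tuple list shown for "Mark".

```python
def ngrams(word, n):
    """All overlapping character substrings of length n, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def tuple_index(word, sizes=(1, 2, 3)):
    """Combine the single, double and triple grams of the lowercased word."""
    word = word.lower()
    return [gram for n in sizes for gram in ngrams(word, n)]


print(ngrams("marks", 2))   # ['ma', 'ar', 'rk', 'ks']
print(ngrams("marks", 3))   # ['mar', 'ark', 'rks']
print(tuple_index("Mark"))  # ['m', 'a', 'r', 'k', 'ma', 'ar', 'rk', 'mar', 'ark']

# A query analyzed with the same rules shares its trigram with both words.
query = set(ngrams("mar", 3))
print(query <= set(ngrams("marks", 3)), query <= set(ngrams("mark", 3)))  # True True
```

Passing sizes=(2, 3) drops the single-letter grams, which a real implementation would probably leave out anyway, since nearly every document contains them.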

Kinda techie, but you can see that, as long as the same rules are applied to the search terms, we can easily match something. If somebody doesn't remember whether my name ends in a "c" or a "k", they can find me with just "mar". Now, if there are a million documents, that search will bring back LOTS of other documents containing the substring "mar", albeit very quickly!

But if somebody searches for mark or Mark, extra credit will be given for matching the more precise indices. Actual implementations would probably leave off the single-letter index (the m, a, r and k entries), since almost every document would contain those. This approach would also take more disk space and more time to index, and substring matches would tend to bring back a lot of junk. But the good news is that folks wouldn't have to remember to add wildcard characters. In techie terms, we'd say this "helps recall, but hurts precision". Another idea would be to NOT apply substring matching by default, but instead offer a clickable option in the results list to "expand your search", which re-issues the same search with substring matching turned on, and let the user decide.

Index-based automatic substring matching has its place, along with all of the other tools in the search engine arsenal. It's a nice option to have when searching over names, source code, chemicals, domain names, and other technical data. Whether it's turned on by default, and how it's weighted against better matches, are choices to be carefully weighed.

May 11, 2010

Plink Acquired - Should Improve Google Goggles

Google Goggles is a visual search tool for smart phones. It's one of several recent search enhancements for Google Mobile mentioned in this post by Greg Sterling. A more recent post of his discusses Google's acquisition of Plink (a UK-based startup that developed a visual search engine for Android) four months after the company’s public launch, and Google's plan to use the team to improve Google Goggles.

PlinkArt won the ‘people’s choice’ award in the IQPrize last year and then $100,000 in Google’s second Android Developer Challenge.

January 20, 2010

Google I/O Open for registration!

Google has announced that Google I/O 2010 will be held May 19-20 at the Moscone Center in San Francisco.

I think this is their third such annual event, and it's always been a full two days of information. The good news is the price is $400 per person (until April 15), a bargain really. The bad news? You'll need to bring four or five people from your company to hit all of the sessions in each track!

This conference is VERY technical and VERY good. You get the most from it if you are a developer who knows Java, Ajax, Python, or the other technologies Google uses in its various products. You won't find much in the way of marketing fluff here: in our experience, most presenters are Google developers.

The conference is being held the same week that the Gilbane content management conference comes back to San Francisco. Bad timing for them, but good for you: you can probably walk to the nearby Westin at lunch and maybe catch the exhibits.

Last year, attendees received a free phone for development purposes on the Android OpSys; who knows what they might give away this year - besides the expected cool T-shirt!

Register at https://code.google.com/events/io/2010/.