12 posts categorized "Google public search"

December 12, 2011

New Phrase for determining Sentiment Analysis / Customer Interest

If you look up:

fedex "Package not due for delivery"

which is one of the status messages you can get when tracking a package, you'll see a lot of postings asking about it.

FYI: It means your new toy has arrived in the city you live in, but will NOT be delivered today, because they didn't promise to get it to you until tomorrow.  Whether this is to force customers into paying for express service, or simply a logistics issue, or a mix of the two, depends on your view of companies and I won't get into that here.

However, you'll notice a lot of the postings asking about it are from folks waiting for delivery of things they're very excited to get, often some big-ticket piece of shiny electronics.  They're dying for FedEx to deliver it - they're so anxious and upset about the delay that they're motivated enough to go online and search, and make ranting posts - all because their "toy" is delayed.

So we have a particular emotional response, often about an upscale product, with a reasonably distinct search phrase - cool!

Yes, yes, of course you could say that the customers are mad about the perceived injustice of it, the Occupy Wall Street spin, or that sometimes the package could be really important for other reasons, which are certainly valid points.  I'm not taking sides or passing judgement - and I discovered this today looking for a friend's overdue toy - that's not the point.  I'm just saying that I bet there's a good statistical correlation, and of course it wouldn't apply 100% of the time - which would actually be quite rare in such things.

November 30, 2011

Odd Google Translate Encoding issue with Japanese

Was translating a comment in the Japanese SEN tokenization library.

It seems like if your text includes the Unicode right arrow character, Google somehow gets confused about the encoding.  Saw this on both Firefox and Safari.  Not a big deal, strangely comforting to see even the big guys trip up on character encodings.

OK: サセン
OK: チャセ
Not OK: サセンチャセ?

November 22, 2011

7 things GMail Search needs to change

My General Complaint:

If you've had a gmail account for many years, either for work or personal, it's getting large enough that GMail's search is starting to break.

Any word you can think of to type in will match tons of useless results.  Eventually, as you try to think of more words to add, your results count goes to zero.

If you were lucky enough to have starred the email when you saw it, or can remember who might have sent it, or maybe the approximate timeframe, or maybe you think you might have sent the email in question from this account, you *might* have a chance.

A Tough Problem:

I realize this seems like classic precision and recall troubles, but Google is pretty smart, and they have a fair amount of metadata, and a lot of context about me, so there are some potential fixes to hang a hat on.

And some of my ideas involve making labels/tags (Gmail's equivalent of folders), but that assumes that people are using labels, which I suspect many folks don't, or at least not beyond the default ones you get.  Well... sure, but they DO have them, and there's an automated rules engine in Gmail to set them, so presumably a few people use tags / labels?  (or maybe nobody does and, in hindsight, maybe it's a legacy feature!?) So, if you're going to have labels, and you've got even a few users who bother with them, then make them as useful as possible.  AND maybe make Labels more visible, maybe easier to set, more powerful, etc.

On To The Ideas:

1: Make it easier to refine search results.

Let's face it, as you accumulate more and more email, the odds of finding the email you want on the first screen of search results go WAY down.

Google wisely uses most-recent-first sorting in search results, vs. their normal relevancy, in the GMail search UI.  I'm not sure why, this seems like an odd choice for them given all the bravado about Google's relevancy, but I'm guessing it was too weird to have email normally sorted by date in most parts of the UI, but have it switch back and forth between relevancy and date as you alternate between search and normal browsing.  Also, maybe they found it's more likely you're looking for a very recent email.  You could fold "freshness" into relevancy calculations, but just respecting date keeps it more consistent.
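To make the "fold freshness into relevancy" idea concrete, here is a minimal sketch.  The exponential decay, the 30-day half-life, the 0.3 weight and the function names are all assumptions picked for illustration, not anything GMail is known to do.

```python
import math

def freshness_boost(age_days, half_life_days=30.0):
    """Exponential decay: 1.0 for a brand-new message, 0.5 after one half-life."""
    return math.pow(0.5, age_days / half_life_days)

def blended_score(text_score, age_days, freshness_weight=0.3):
    """Mix a text-relevancy score (0..1) with the freshness boost (0..1)."""
    return ((1.0 - freshness_weight) * text_score
            + freshness_weight * freshness_boost(age_days))

# Hypothetical messages: (subject, text relevancy, age in days)
messages = [
    ("old but exact", 0.9, 365),
    ("new and vague", 0.6, 1),
    ("week-old, decent", 0.7, 7),
]

# Highest blended score first
ranked = sorted(messages, key=lambda m: blended_score(m[1], m[2]), reverse=True)
for subject, text_score, age in ranked:
    print(subject, round(blended_score(text_score, age), 3))
```

With these made-up numbers a decent week-old match outranks an exact match from a year ago, which is the kind of surprise that pure date sorting avoids.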

Yes, GMail does have some search options... I'll get to those, but suffice to say they are very "non iterative".

Other traditional filters should be facets as well.  "Sent" emails, date ranges, "has attachments" (maybe even how many, sizes, or types)

2: Promote form-based "Search options" to FULL Facets

You can limit your search to a subset of your email if you've Labeled it - this is the GMail equivalent of Folders.  But doing this is a hassle (see item 3), and you can't do this after the fact, once you're looking at results.

So, if you do a normal text search, and then remember you labeled it, you can't just click on the tags on the left of the results.  Those are for browsing, and will actually clear out your search terms.  These should be clickable drilldown facets, perhaps even with match counts in parentheses, and maybe some stylizing to make it clear that they will affect the current search results.
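A rough sketch of what drilldown facets with match counts could look like.  The message structure, the function names and the simple substring matching are assumptions for illustration; this is not how GMail stores or searches mail.

```python
from collections import Counter

def search(messages, terms, label=None):
    """Return messages whose text contains every term (as a substring),
    optionally restricted to a single label."""
    terms = [t.lower() for t in terms]
    hits = []
    for msg in messages:
        body = msg["text"].lower()
        if all(t in body for t in terms) and (label is None or label in msg["labels"]):
            hits.append(msg)
    return hits

def label_facets(hits):
    """Count how many of the current hits carry each label."""
    return Counter(label for msg in hits for label in msg["labels"])

# Hypothetical mailbox
mailbox = [
    {"text": "Invoice for the conference trip", "labels": ["travel", "receipts"]},
    {"text": "Conference schedule attached", "labels": ["travel"]},
    {"text": "Lunch invoice", "labels": ["receipts"]},
]

hits = search(mailbox, ["conference"])
facets = label_facets(hits)          # counts to show next to each label
narrowed = search(mailbox, ["conference"], label="receipts")  # click on a facet
```

The point is that clicking a label re-runs the same query with one more restriction, instead of throwing the search terms away.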

Yes, there's a syntax you can use:

label:your-label regular search terms

It's a nice option for advanced users who are accurate touch typists and remember the tag name they want, but this should also be easy from the UI.  Yes, there is an advanced search / search options form, but this brings me to item 3...

(read the rest of the ideas after the break)

Continue reading "7 things GMail Search needs to change" »

November 21, 2011

Google: Sometimes I really do want EXACT MATCHES

Disclaimer: Google only attracts my annoyances more because I use it so much.  And I'm confident they can do even better, and so I'm helping by writing this stuff down!

My Complaint:

Back in my day, when you typed something in quotes into a search engine, you'd get an exact match!  Well... OK, sometimes that meant "phrase search" or "turn off stemming"... but still, if it was only a ONE WORD query, and I took the time to still put it in quotes, then the engine knew I was being VERY specific.

But now that everyone's flying with jet-packs and hover boards, search engines have decided that they know more than I do, and so when I use quotes, they seem to ignore them!

I can't give the exact query I was using, but let's say it'd been "IS_OF".  Google tries to talk me out of it, doing a "Show results for (something else)", but then I click on the "Actually do what I said" hyperlink.  And even then it still doesn't.  In this made-up example, it'd still match I.S.O.F. and even span sentence gaps, as in "Do you know what that *is*?  *Of* course I do!"

The Technical Challenge:

To be fair, there are technical problems with trying to match arbitrary exact patterns of characters in a scalable way.  Punctuation presents a challenge, with many options.  And most engines use tokenization, which implies word breaks, which normally wouldn't handle arbitrary substring matching.

At least with some engines, if you want to support both case insensitive and case sensitive matching, you have two different indexes, with the latter sometimes being called a "casedex".  Other engines allow you to generate multiple overlapping tokens within the index, so "A-B" can be stored as both separate A's and B's, and also as "AB", and also as the literal "A-B", so any form will match.
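The overlapping-token idea can be sketched in a few lines.  This is a toy, not any particular engine's analyzer: the function name is made up, and as a simplification it puts the case-sensitive and lowercased forms in one list rather than in a separate "casedex".

```python
import re

def token_variants(raw):
    """Return the index terms for one raw token such as 'A-B'.

    Emits the literal form, the separate parts, and the joined form,
    each in its original case and lowercased, without duplicates.
    """
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", raw) if p]
    forms = [raw] + parts + ["".join(parts)]
    variants = []
    for form in forms:
        for candidate in (form, form.lower()):
            if candidate and candidate not in variants:
                variants.append(candidate)
    return variants

print(token_variants("A-B"))
print(token_variants("IS_OF"))
```

Even this tiny example turns one token into eight index entries, which hints at why indexes that support every form of matching get large quickly.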

Some would say I'm really looking for the Unix "grep" command, or the SQL "LIKE" operator.  And by the way, those tools are VERY inefficient because they use linear searching, instead of pre-indexing.  And if you tried to have a set of indexes to handle all permutations of case matching, punctuation, pattern matching, etc, you'd wind up with a giant index, maybe way larger than the source text.
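Here is the trade-off in miniature: a grep-style scan that reads every document on every query, next to a pre-built token index.  The documents, the function names and the naive whitespace tokenizer are assumptions for illustration only.

```python
from collections import defaultdict

def grep_scan(docs, needle):
    """Linear search: look through every document on every query."""
    return [doc_id for doc_id, text in docs.items() if needle in text]

def build_index(docs):
    """Pre-index: map each lowercased whitespace token to the docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def index_lookup(index, word):
    """One dictionary lookup per query, but only whole tokens can match."""
    return sorted(index.get(word.lower(), set()))

docs = {
    1: "set IS_OF flag",
    2: "Do you know what that is? Of course I do!",
    3: "the flag is set",
}
index = build_index(docs)

print(grep_scan(docs, "S_O"))        # substring match works, but scans everything
print(index_lookup(index, "S_O"))    # fast, but not a whole token, so no match
print(index_lookup(index, "is_of"))  # whole token, found without scanning
```

The scan can match any pattern but its cost grows with the size of the collection; the index answers quickly but only for the tokens someone decided to store.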

But I do think Google has moved beyond at least some of these old limitations, they DO seem to find matches that go beyond simple token indices.

Could you store an efficient, scalable set of indices that store enough information to accommodate both normal English words and complex near-regex level literal matching, and still have reasonable performance and reasonable index sizes?  In other words "could you have your cake and eat it too"?  Well... you'd think a multi-billion-dollar company full of Stanford smarties certainly could! ;-)  But then the cost would need to be justified... and outlier use-cases never survive that scrutiny.  As long as the underlying index supports finding celebrity names and lasagna recipes, and pairing them with appropriate ads, the 80% use cases are satisfied.

May 21, 2011

Google and the official search blog

A couple of days ago, Google started Inside Search, the 'official Google search blog'. It's not really enterprise search news, but because so many knowledge workers compare the behavior of their internal search platform with the Google public search experience, it may be worth monitoring for those whose job it is to keep enterprise search going.

February 02, 2011

Make your search engine seem psychic

People tell us that Google just seems to know what they want - it's almost psychic sometimes. If only every search engine could be like Google. Well, maybe it can.

Over the years, the functions performed by the actual 'search engine' have grown. At first, it was simply a search for an exact match - probably using punch card input. Then, over time, new and expanded capabilities were added, including stemming... synonyms... expanded query languages... weighting based on fields and metadata... and more. But no matter what the search technology provided, really demanding search consumers pushed the technology, often by wrapping extra processing both at index time and at query time. This let the most innovative search driven organizations stay ahead of the competition. Two great examples today: LexisNexis and Factiva.

In fact, the magic that makes public Google search so good - and so much better than even the Google Search Appliance - is the armies of specialists analyzing query activity and adding specialized actions 'above' the search engine. 

One example of this many of us know well: enter a 12 digit number. If the format of the number matches the algorithm used by FedEx in creating tracking numbers, Google will offer to let you track that package directly from FedEx. For example, search for 796579057470 and you see a delivery record; change that last zero to a 1, and you get no hits. How do they know?

The folks at Google must have noticed lots of 12 digit numbers as queries; and being smart, they realized that many were FedEx tracking numbers. I imagine, working in conjunction with FedEx, Google implemented the algorithm - what makes a valid FedEx tracking number - and boosted that as a 'best bet'.
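The post doesn't say what the validity check actually is. One commonly described scheme for 12-digit FedEx Express numbers weights the first eleven digits by 3, 1, 7 repeating, sums the products, and takes the result mod 11 and then mod 10 to get the twelfth digit. Treat that scheme as an assumption, not something confirmed by FedEx or Google; the function name is made up too. A minimal sketch:

```python
# Assumed weighting scheme for the first eleven digits (not confirmed by FedEx)
WEIGHTS = (3, 1, 7, 3, 1, 7, 3, 1, 7, 3, 1)

def looks_like_fedex_tracking_number(query):
    """Return True if the query is 12 digits and passes the assumed check digit."""
    digits = query.strip()
    if len(digits) != 12 or not (digits.isascii() and digits.isdigit()):
        return False
    total = sum(int(d) * w for d, w in zip(digits[:11], WEIGHTS))
    return total % 11 % 10 == int(digits[11])

print(looks_like_fedex_tracking_number("796579057470"))
print(looks_like_fedex_tracking_number("796579057471"))
```

Under that assumed scheme the number quoted above passes, and the same number with a different final digit fails, so a cheap arithmetic test like this is enough to decide whether to show a tracking 'best bet' at all.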

Why is this important to you? Well, first it shows that Google.com is great in part because of the army of humans who review search activity, likely on a daily basis. Oh, sure, they have automated tools to help them out - with maybe 100 million queries every day, you'd need to automate too. They look for interesting trends and search behavior that lets them provide better answers.

Secondly, you can do the same sort of thing at your organization. Autonomy, Exalead, Microsoft, Lucene, and even the Google Search Appliance, can all be improved with some custom code after the user query but before the results show up. Did the user type what looks like a name? Check the employee directory and suggest a phone number or an email address. Is the query a product name? Suggest the product page. You can make your search psychic.
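As a sketch of that 'custom code after the user query', here is a tiny best-bets step that runs before the engine's own results are shown. The lookup tables, names, phone extension and URL path are all hypothetical; in a real deployment they would come from your directory and product catalog.

```python
import re

# Hypothetical lookup tables standing in for real systems
EMPLOYEES = {"jane doe": "jane.doe@example.com, x1234"}
PRODUCTS = {"widget pro": "/products/widget-pro"}

def best_bets(query):
    """Return suggestions to show above the regular result list."""
    q = " ".join(query.lower().split())  # normalize case and spacing
    bets = []
    if q in EMPLOYEES:
        bets.append(("employee", EMPLOYEES[q]))
    if q in PRODUCTS:
        bets.append(("product", PRODUCTS[q]))
    if re.fullmatch(r"\d{12}", q):
        bets.append(("tracking", "offer a package-tracking link"))
    return bets

print(best_bets("Jane  Doe"))
print(best_bets("Widget Pro"))
print(best_bets("lasagna recipe"))
```

None of this touches the search engine itself, which is why the same approach works regardless of the platform underneath.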

Finally, does the query return no hits? You can tell what form the user was on when the search was submitted, so use that context rather than showing a generic 'No Hits' page. Was the query more than a single term? Look for any of the words, rather than all; make a guess at what the user wanted, based on the search form, previous searches, or whatever context you can find.
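The 'any of the words, rather than all' fallback might look something like this. The toy index (term mapped to a set of document ids) and the function name are assumptions for illustration; a real engine would do this through its own query syntax.

```python
def search_with_fallback(index, query):
    """Try all-terms (AND) first; if nothing matches, relax to any-term (OR).

    Returns (sorted doc ids, mode) where mode says which strategy produced them.
    """
    terms = query.lower().split()
    postings = [index.get(t, set()) for t in terms]
    if not postings:
        return [], "empty"
    all_match = set.intersection(*postings)
    if all_match:
        return sorted(all_match), "all"
    any_match = set.union(*postings)
    return sorted(any_match), ("any" if any_match else "none")

# Hypothetical index: term -> ids of documents containing it
index = {"vacation": {1, 2}, "policy": {2, 3}, "expense": {4}}

print(search_with_fallback(index, "vacation policy"))
print(search_with_fallback(index, "vacation expense"))
print(search_with_fallback(index, "zzz"))
```

Returning the mode alongside the results lets the page tell the user that the query was relaxed, instead of silently showing looser matches.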

So how do you make your search engine seem psychic? Learn about query tuning and result list pre-processing; we've written a number of articles about query tuning in our newsletter alone.

But most importantly: mimic Google: work hard at it every day.

/s/Miles

October 05, 2010

Google plans to make display ads as crucial as search advertisements

Google executives claimed that display ads will become as crucial to its business as search advertisements are during the keynote session of an international interactive advertising awards competition. They predicted that "smart and sexy" rich media ads will make the static ad banner become a thing of the past, and that in five years the online display market will grow from $20 billion to a $50 billion business, 75% of ads will be "social" (meaning that people can comment on them), and that people will be able to subscribe to them (receive notices when similar ads are available to watch).

Caroline McCarthy in Google: We're too sexy for your search talks about how Google is "unapologetically and enthusiastically optimistic about this space." Amir Efrati in Google Wants to Make Online Display Ads ‘Sexy’ and Mike Shields in Google Sees 'Smart and Sexy' Future for Banner Ads describe a TrueView ad format for YouTube. It's designed to give viewers the option to skip an ad (after 5 seconds) that they don't want to watch, and to choose from multiple ads which one they want to watch (similar to Hulu). They will alter creative elements of an ad in real-time, depending on factors like the viewer's location, the web site's content, and the time of day. Advertisers will only pay if a user decides to view their ad.

A YouTube executive stated that while television networks generally make more money by showing more ads, online video will reverse that trend. Google also predicted that 50% of all targeted ads will use a real time bidding system. In their case they will use technology from last year's purchase of ad company Teracent.

September 08, 2010

Google Instant: Predictive queries

Google today announced a pretty cool capability that looks like instant results - as you type letters in the search box, the results show up immediately. I've liked this capability in Outlook for a while: in fact, sometimes I have found myself typing a query in Google Mail and waiting for results that never show up until I press Enter.

Actually, the new capability is based on predicting what the query will be, and displaying the results (and ads) for the words Google thinks you'll want. Try this query shown during today's announcement on YouTube:

Type the letters N and Y: Given those two letters, Google predicts that you will type 'Times' next, so it displays the results for the New York Times. However, if you were to hit Enter rather than Tab (to complete the predictive query), you get a different set of results.
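The announcement doesn't spell out how the prediction is made. As a rough sketch of the general idea, here is a predictor that simply picks the most frequent past query starting with what has been typed so far; the query log, the counts and the function name are invented for illustration.

```python
def predict(prefix, query_counts):
    """Return the most frequent past query starting with the typed prefix, or None."""
    prefix = prefix.lower()
    candidates = [(count, q) for q, count in query_counts.items()
                  if q.startswith(prefix)]
    if not candidates:
        return None
    return max(candidates)[1]

# Hypothetical query log: query -> number of times it was seen
query_counts = {"ny times": 900, "nyc weather": 400, "netflix": 1200}

print(predict("n", query_counts))
print(predict("ny", query_counts))
print(predict("x", query_counts))
```

The results shown while typing would then be the results for the predicted query, which is why pressing Enter on the bare letters can give a different list.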

One thing that may impact SEO guys: as you type, the pay-per-click ads you see change along with the results.

Predictive entry is probably much easier to build than returning results based on a single initial letter. Still, the guys at Google have built another pretty cool capability. Ajax again shows how useful it can be!

What was kind of funny was a quote they used more than a few times in the announcement: 'Never underestimate fast'. Well said...

August 27, 2010

There's an Ant on your Southwest Leg!

The WSJ has an interesting article on how language affects how we think.  I particularly liked the example of an indigenous language where anything you discuss involves absolute cardinal directions (north, south, east, west etc.). You literally can't say "There is an ant on one of your legs". Instead you say something like "There's an ant on your southwest leg." To say hello you'd ask "Where are you going?", and an appropriate response might be, "A long way to the south-southwest. How about you?" If you don't know which way is which, you literally can't get past hello.

Dr. Kevin Lim reviewed Search Engine Society, a book which explores the effect search engines have on politics, culture and economics. He is not your typical reviewer since he is also mentioned in the book, due to his recording a large part of his life using cameras (one he wears, another at his desk points at him) while a GPS device tracks his movements.

Google throws its weight behind Voice Search by Stephen Lawson discusses how voice search is based on statistical models of what sequences of words are most likely to occur, and how they train a new language model. Another example of that would be Midomi, a web site where you search for music by singing a fragment of the song.

Multilingual Search Engine Breaks Language Barriers discusses how the UNL Society uses the pivot language UNL to return a precise answer in the language in which the question was formulated. This seems to be still a research project, with some related projects such as LACE trying to extract data from parallel corpora as a cheaper way to populate a lexical database.

XBRL Across The Language Divide by Jennifer Zaino discusses how XBRL (eXtensible Business Reporting Language) may be one of the few areas that benefits from the Monnet project, which attempts to "provide a semantics-based solution for accessing information across language barriers". It tries to "build software that breaks the link between conceptual information and linguistic expressions (the labels that point back to concepts in ontologies) for each language." When that works, it makes it easier and quicker to perform analytics across multiple languages.

The Cross-Language Evaluation Forum (CLEF) is working on infrastructure for testing, tuning and evaluation of systems that retrieve information in European languages, and benchmarks to help test it. One of its papers, for example, compares lexical and algorithmic stemming in 9 languages using Hummingbird SearchServer.

August 19, 2010

Microsoft has a ways to go in search...

So I discovered an article on a Microsoft forum today where someone was asking about the differences between the different versions of enterprise search. I posted a reply, with some link suggestions and a pointer to a previous posting here on our own blog.

Now, because of all the work we do with Microsoft's FAST product, some people think we see no wrong in Redmond. Well, read on.

A few minutes later, I wanted to go back and re-read the original posting; but, try as I might, I was unable to find the posting on the Microsoft forum search. The original question had a number of relatively unique terms, so I tried again. And again. No luck anywhere on the Microsoft MSDN site.

(By the way, it sometimes took up to 30 seconds to get a result back - something on the system social.msdn.microsoft.com takes forever. But the search itself, when it came back, reported it only took 0.2 seconds, so I felt much better. NOT. I noticed that if I hit the 'Search' button again in frustration after a long wait, the result came back immediately. Someone at Microsoft needs to be looking at this!)

I went back to the Google public site and, by using a bunch of unique terms, found the original post. My search? fs4sp fs14 fsis fsia reference. Only one document comes back even in Google, which may be a record.

The same search on the Microsoft forums returns ZERO hits - ironic since the document is posted on the Microsoft discussion forum. Bing returns a Japanese language page; and, to no surprise, Yahoo returns the same page. Both, by the way, are an HTTP error 403 page.

So it looks like Microsoft has its work cut out for it in the public-facing web search arena: If it cannot locate a posting (from April!) on its own forums, how can it hope to compete with Google?