40 posts categorized "Web/Tech"

May 26, 2010

Aardvark's interesting blend of Search and Social Networking

Vark.com (Aardvark) doesn't use search to try to answer your questions directly; instead, it uses search to route your question to humans who might be able to answer it.  So this is two levels removed from the classic search engine usage model:

1: It's not trying to answer your question directly with search, it's just trying to find a person who might be able to answer it.

2: It assumes a high initial raw failure rate (for example, a high percentage of people are probably busy doing something else), so it builds in retry routing logic. It even allows humans to help with routing.

There are a lot of hidden details in that retry logic, and a lot of it leverages social networking software, looking at user profiles, friends of friends, previous successes and failures, etc.  Many of those steps also mix in keyword search and related algorithms. On the surface it might seem like a "simple" hybrid once you see it spelled out, but they've done a lot of nice work on the details, which I suppose are proprietary.
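To make the idea concrete, here is a minimal sketch of route-and-retry in Python. It is not Vark's actual algorithm, which is proprietary; the `Candidate` fields, the scoring weights, and the `ask` callback are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    topics: set            # topics taken from the person's profile (assumed)
    social_distance: int   # 1 = friend, 2 = friend of a friend, ...
    past_success: float    # share of earlier questions answered, 0.0 to 1.0

def score(candidate, question_terms):
    """Blend keyword overlap, social closeness, and track record.
    The weights are arbitrary placeholders, not tuned values."""
    overlap = len(candidate.topics & question_terms) / max(len(question_terms), 1)
    closeness = 1.0 / candidate.social_distance
    return 0.5 * overlap + 0.3 * closeness + 0.2 * candidate.past_success

def route(question_terms, candidates, ask, max_attempts=3):
    """Ask the best-scoring candidates in turn until one of them answers."""
    ranked = sorted(candidates, key=lambda c: score(c, question_terms), reverse=True)
    for candidate in ranked[:max_attempts]:
        answer = ask(candidate)  # returns None if the person is busy or passes
        if answer is not None:
            return candidate.name, answer
    return None  # nobody answered within the retry budget
```

The point of the sketch is that a single failed attempt is expected and cheap: the router simply moves on to the next-best person.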

A bit of etiquette - there's an assumption that you've already tried to find the answer on your own, perhaps with a Google, Yahoo or Bing search. You shouldn't be wasting other humans' time with questions that machines could have easily answered. Vark could qualify as a "research" engine, not just a search engine.

What Vark's network of gray-matter resources is best reserved for is the "why" or "how" or "what's the difference between..." type questions. These are the types of high-level or "wisdom" questions that keyword and NLP search engines still struggle with, and Vark's come up with a nice compromise.

This type of "expert locator" system has certainly been tried before, especially in the Enterprise Search market. Those older systems had the end goal of "fixing you up" with an expert via email. Vark's managing of the actual questions and answers is nice, and I imagine this will be the norm in enterprise offerings at some point, barring any intellectual property issues. Heck, I think Vark could offer their own enterprise version.

I'd be curious to see Vark actually give me the option of searching over the previously asked set of public questions and answers. Maybe if an immediate answer isn't forthcoming from the folks Vark has asked, it could come back and at least offer to run the search as a "plan B". If it's clearly labeled and optional, and asking another human remains the primary objective, I think folks might like it.

They're also building a valuable database of questions and answers to analyze and learn from. Q&A search engines have a particular problem with Vocabulary Mismatch. The specific words people use to ask questions are different from the words used by the people answering them. Some of this is linguistic in nature, and other times experts just use fancier or more precise terms than the novices asking the questions. I imagine they could mine their corpus and derive some useful relations. Even better, when Vark lets a user re-ask a question, they have a chance to have multiple answers to the EXACT same question. And a multilingual Vark could do this for multiple languages. Presumably this is all in their business plan, plus stuff I can't fathom. Cool!

I hope Vark can continue to attract smart people!
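As a rough illustration of what mining such a corpus might look like, the sketch below counts how often a term that appears only in a question co-occurs with a term that appears only in its answer. The tokenizer, the stopword list, and the `min_count` threshold are my own assumptions; a real system would need far more data and better statistics.

```python
import re
from collections import Counter
from itertools import product

# A tiny, hypothetical stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "the", "is", "my", "i", "to", "of", "and",
             "why", "how", "does", "do", "you", "it", "have", "may", "your"}

def tokens(text):
    """Lowercase the text and return its distinct non-stopword terms."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def related_terms(qa_pairs, min_count=2):
    """Count (question term, answer term) pairs across the corpus.
    Only terms unique to each side are paired, since that is where
    the asker's and the answerer's vocabularies differ."""
    pair_counts = Counter()
    for question, answer in qa_pairs:
        q_terms, a_terms = tokens(question), tokens(answer)
        for q, a in product(q_terms - a_terms, a_terms - q_terms):
            pair_counts[(q, a)] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}
```

Pairs that keep showing up together, such as a novice's everyday word next to an expert's technical term, are candidates for synonym or query-expansion lists.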

May 17, 2010

Searching an Encrypted Cloud

Luke O'Connor has an interesting post about encrypted search. It discusses the “fully homomorphic” encryption system devised by IBM researcher Craig Gentry that was in the news last year, and the outlook for encrypted search. It was strictly a theoretical breakthrough: Gentry estimated that performing a Google search with encrypted keywords would increase the amount of computing time by a factor of about a trillion.

In "A Step Toward Better Cloud Security: Searchable Encryption", Abel Abram discusses a paper from the Microsoft Research Cryptography Group proposing a virtual private storage service as a solution.

Ray Lucchesi's post about securing the cloud also discusses several approaches towards searching encrypted data. So far there don't appear to be any viable approaches; all take tens of seconds for a single-word search.

May 11, 2010

Plink Acquired - Should Improve Google Goggles

Google Goggles is a visual search tool for smart phones. It's one of several recent search enhancements for Google Mobile mentioned in this post by Greg Sterling. A recent post of his discusses Google's acquisition of Plink (a UK-based startup that developed a visual search engine for Android) four months after the company’s public launch, and Google's plan to use the team to improve Google Goggles.

PlinkArt won the ‘people's choice’ award in the IQPrize last year and then $100,000 in Google’s second Android Developer Challenge.

May 10, 2010

Google's Opt-Out Option for Behavioral Targeting

Last month Google announced that they would provide a browser plug-in to allow users to opt out of Google Analytics tracking.  Joseph Stanhope's post explained why it was highly doubtful that this would do substantial harm to Google Analytics and its customers. Several posts, such as one by Felipe Miyata, suggested that this was an “insurance” move to silence opposition from privacy supporters, perhaps in preparation for doing more web analytics within the U.S. Federal Government.

Anil Batra's post has a quite different explanation - he suggests that it is really an attempt by Google to make more money by taking another step towards behavioral targeting.

March 25, 2010

The "Sliced Raw Fish Shoes it Wishes" - the Google Green Onion thing!

Google Translate is one example of how Google is successfully using brute-force computing power on complex problems. See "Google's Computer Might Betters Translation Tool".

January 27, 2010

A new acquisition?

I don't like talking about rumors: they are often wrong to start with, and the deals are as delicate as eggshells until the deal is complete. And when you predict one, you look silly when you are wrong.

Given all that, let me be as vague as I can...:)

Key folks at two different companies we work with have told me in the last few days that a well-known search company is going to be acquiring a smaller consulting firm with deep connections in the US federal government market. The holdup seems to be with the legal team at another search company which the consulting firm represents: apparently the second search company isn't wild about a major competitor being part of its partner program.

The funny part is that the company rumored to be the acquiring company may be more interested in the sales channel the consulting firm has, rather than its broad expert consulting group or its interesting new product line.

Stay tuned. When (if?) it breaks, all will become clear. And if it falls through, you'll hear it here. I promise.


(Just in case you're wondering, none of these parties is New Idea Engineering...)

January 20, 2010

Google I/O Open for registration!

Google has announced its Google I/O 2010 to be held in San Francisco May 19-20 at the Moscone Center.

I think this is their third such annual event, and it's always been a full two days of information. The good news is the price is $400 per person (until April 15), a bargain really. The bad news? You'll need to bring four or five people from your company to hit all of the sessions in each track!

This conference is VERY technical, VERY good. You get the most from it if you are a developer and you know Java, Ajax, Python, or the other technologies Google uses in its various products. You won't find much in the way of marketing fluff here: in our experience, most presenters are Google developers.

The conference is being held the same week that the Gilbane content management conference comes back to San Francisco. Bad timing for them, but good for you: you can probably walk to the nearby Westin at lunch and maybe catch the exhibits.

Last year, attendees received a free phone for development purposes on the Android OpSys; who knows what they might give away this year - besides the expected cool T-shirt!

Register at http://code.google.com/events/io/2010/.

September 08, 2009

Do you drive on freeways?

We've worked with most of the major commercial search vendors for a long time. We can go back and talk about companies that were once leaders in the space, companies most people have never heard of: Excalibur... Conquest... Fulcrum... Verity, and more. We continue to work with the best of commercial and open source technologies to give our customers solutions that meet their needs.

A major trend we've written about before, and that we see continuing over the next couple of years, is the significant reduction in price for what are now best-of-breed technologies in the space. This is being driven by a couple of factors, including increasing functionality in open source alternatives Lucene and Solr; and the acquisition of FAST by Microsoft, with the anticipated integration of FAST ESP into SharePoint, which many feel will result in a much lower price point.

Lately, we've seen a few major vendors engaging in some pretty severe obfuscation in their licensing parameters. I'm not sure whether it's a remnant of the 'good old days', or a last-ditch attempt to extract as much revenue as possible before the inevitable collapse in licensing costs we've talked about before. Let me explain, by way of analogy.

You want to buy a new car. You tell the sales person your budget range, and she shows you a model that is about three times what you want to spend. When you point this out to her, she acknowledges the 'oversight' in passing, and suggests that if you don't need the backseat, she could take 10% off the cost. And if you insist, she could sell you a car with no reverse and save you maybe an additional 5%. And if you were willing to get in through the window, she could weld the door shut and reduce the price a bit more. Her final offer, still about 15% above your price range, would be for a car with no motor. Are you ready to buy?

Down the street at another dealer, you start over. His burning question is 'What kind of roads do you drive on?'. You see, if you never plan to exceed 55 miles per hour, he can sell you the car for just about what your budget is. You decide that's a pretty good deal, so you buy the car. A month later, you get a call from their global maintenance organization: it seems that you have actually driven your car closer to 65 MPH on several occasions, and your new price is 25% more than you paid. You have 30 days to send in the difference, or your car will stop working. Can you hear me now?

A final dealership you now wish you had visited can pretty much give you the car for free. You'll have to assemble it, of course, but you can make it do anything you want. There's another company you can go to that will put it together for you - in fact, there are a number of them. One will even assemble it for you, and charge you an annual fee in case you have any issues. And their guys review the design on most of the parts, so you know they are pros and you can trust them.

Glad that search engines are not like cars? Or are they? This is one reason we really encourage you to use a skilled, competent partner to specify your requirements and to help you navigate the sometimes treacherous waters of acquiring enterprise search technology.



June 09, 2009

Enterprise search doesn't mean mortgaging the farm

Lynda Moulton, the Search Practice analyst at CMS firm Gilbane Group, really hit the mark in a recent blog post nominally about how advertising money can buy editorial space. It's true that many publications (and analyst firms) are happy getting paid by both sides. (Note: I was called out on this by Theresa Regli of CMS Watch, so I no longer say 'all analysts' for anything!)

In my opinion, the real news in Lynda's post is this: "there are dozens of enterprise search solutions that will serve you extremely well, with much lower cost of ownership" than with the big industry players. In fact, open source is beginning to penetrate the corporate veil, and while Lucene and Solr are not right for everyone, it looks like they've just about implemented what Mark and I consider Verity's "Topic 1.0" capabilities circa 1990. We went to a meet-up the other night that Mark has written about, and we were pretty impressed.

So before you decide you need to budget a half million dollars or more for search, consider what Walter Underwood, chief architect of Ultraseek and now search evangelist at Netflix, once told me. Paraphrasing: "You can download Solr then spend a ton of money customizing it; or you can spend a ton of money licensing enterprise search software, then spend a ton of money installing and customizing it. Your call."

But get help!


June 08, 2009

Enterprise Search Engine Optimization: eSEO

Last week at the Gilbane Conference in San Francisco, I participated in a panel "Search Survival Guide: Delivering Great Results" moderated by Hadley Reynolds of IDC. In the presentation, I offered a new view on improving enterprise search engine relevancy that I call eSEO.

The term SEO is well understood by - and widely practiced in - the corporate world.  The concept of SEO, as summarized by one of the Gilbane talks, states that "Key to the value of any Web content is the ability for people to find it". In the SEO world this is done by combining organic results and keyword placement - advertising - to improve placement, maintain ranking, and monitor search engine position - results - over time.

While we've been helping our customers improve their enterprise search results, it's hard to convince them that search results are not a problem they can solve once. I've decided to apply a new term to this process - Enterprise Search Engine Optimization, or eSEO. To paraphrase the role of SEO, eSEO is the process of combining organic results and best bets to deliver correct, relevant, timely content to enterprise search users - employees, customers, partners, investors, and others.

For both organic and best bets, the first step is to identify what we call the "top 100" queries. Start by creating a histogram that shows the top terms from your search engine. I hope you'll agree that if the top queries - whether 100, 50, or even 20 - deliver great results, you're on your way to having happy users. Talk to your content owners as you review the histogram, and ask them to identify the best result for each.
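A histogram like this takes only a few lines to build. The sketch below assumes you can export your search log as one query per line; the function names and the text-bar rendering are illustrative, not part of any particular search engine's tooling.

```python
from collections import Counter

def top_queries(log_lines, n=100):
    """Return the n most frequent queries as (query, count) pairs.
    Queries are normalized for case and extra whitespace; blank lines are skipped."""
    counts = Counter(
        " ".join(line.lower().split()) for line in log_lines if line.strip()
    )
    return counts.most_common(n)

def print_histogram(ranked, width=40):
    """Print one text bar per query, scaled to the most frequent query."""
    if not ranked:
        return
    peak = ranked[0][1]
    for query, count in ranked:
        bar = "#" * max(1, round(width * count / peak))
        print(f"{query:<30} {count:>6} {bar}")
```

Calling `print_histogram(top_queries(open("queries.log")))` on a hypothetical `queries.log` gives you a list to walk through with your content owners.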

Once you have a list of queries and results, start the two-step process. First, tune the search engine using its native query tuning capabilities. This will impact the shape of the histogram, and over time should start delivering better results. The bad news is tuning like this doesn't position all of your top terms, and it would be silly to try to micro-manage the results for each. That's why search engines have best bets.

When you feel pretty good about the curve through query tuning, it's time to start setting up best bets - the "ad words" of eSEO. Limit the number of best bets to one or two at most - but remember that you can use other real estate like the rightmost column of the screen to suggest additional content. Some guidelines for best bets:

  • Use one or at most two best bets
  • Don't repeat a document already at the top of the organic results
  • Make sure your best bets respect security
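The three guidelines above can be sketched as a small filter applied at query time. This is only an illustration: the data shapes (document ids, a set of groups per best bet, a set of groups per user) are assumptions, and your search engine's own best-bets feature may handle some of this for you.

```python
def select_best_bets(organic, best_bets, user_groups, max_bets=2):
    """Pick the best bets to display for one query.

    organic     -- ranked list of document ids from the search engine
    best_bets   -- list of (doc_id, allowed_groups) in priority order
    user_groups -- set of security groups the current user belongs to
    """
    top_organic = organic[0] if organic else None
    chosen = []
    for doc_id, allowed_groups in best_bets:
        if doc_id == top_organic:
            continue  # don't repeat the document already at the top
        if not (allowed_groups & user_groups):
            continue  # respect security: the user can't see this document
        chosen.append(doc_id)
        if len(chosen) == max_bets:
            break  # one or at most two best bets
    return chosen
```

Checking security at display time, rather than when the best bet is configured, keeps a restricted document from leaking its title to users who cannot open it.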

Once you have tuned your search engine, and set up best bets for the most timely and actionable result, you're ready to roll it out. But then the ongoing part comes in: you need to review your search activity and best bets periodically. Usually, we'd suggest once a month for a while, then perhaps quarterly thereafter. You may find seasonal variations, and if you're not watching you'll miss a golden opportunity.
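One simple way to run that periodic review is to compare the top queries from two periods and look at what entered or left the list. The sketch below assumes each period's log is available as a list of already-normalized query strings; the function name and return shape are hypothetical.

```python
from collections import Counter

def query_shifts(previous, current, n=100):
    """Compare the top-n queries of two review periods.
    Returns the queries that newly entered the top n and those that dropped out."""
    prev_top = {q for q, _ in Counter(previous).most_common(n)}
    curr_top = {q for q, _ in Counter(current).most_common(n)}
    return {
        "new": sorted(curr_top - prev_top),
        "dropped": sorted(prev_top - curr_top),
    }
```

Queries in the "new" list are the ones to check first: they may be seasonal, and they probably have no best bet yet.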

In Summary

1. eSEO is just as critical as SEO

  • Lost time and revenue
  • Legal exposure

2. Watch for trends over time: Search is not "fire and forget"

3. Make sure SEO doesn't impact your eSEO

  • Use fielded data that web search engines ignore for your tuning (e.g., 'Abstract' rather than 'Description').

This will get you started; but because your queries and your content change over time, it's a never-ending story. Some companies - ours included - have tools that can help. But no matter what, hang in there!