
6 posts from September 2010

September 30, 2010

What we did over our summer vacation

We've been very fortunate that the last several months have been unusually busy for us - in fact, almost too busy. We've been active in a number of interesting and demanding customer projects, and have invented some very powerful and useful technology along the way which we'll have more to say about over the next month or two.

However, it's time for us to formally come clean about the thing that really took up our evenings and weekends for most of the last year: 'the book'. Since early last year, we've been working on a book project with Jeff Friend (Microsoft) and Natalya Voskresenskaya (Acrovis). The book, Advanced Microsoft Search: FAST Search, SharePoint Search, and Search Server, is finally available. The ISBN is 0470584661 and you can order it from Amazon today.

A few good friends have known of the book for a few months now, and while we all felt we finished the book in record time, we're sure it seemed like geologic time to our editors and some potential readers. And I'd be remiss if I did not give special thanks to John Kane (HP), Carl Grimm (Avanade), and Jason Noble (Neudesic) for their roles as technical reviewers.

We cover the entire Microsoft search family, with technical coverage of both the SharePoint and the ESP product families. We also have a few chapters on the business side of search: centers of excellence, operations, selection, and so forth. We even have chapters that deal with ESP under Linux, possibly a first for a Microsoft search book!

We invite you to have a look at our work and let us know how we did. It took me 20 years to forget what a time sink it is to write a book, and yet now that it's done, we're already talking of a fourth. I guess my memory isn't what it used to be!

Enjoy!

/s/Miles

Solritas

Erik Hatcher wrote about Solritas: Solr 1.4's Hidden Gem last year. Solritas is a fancy name for VelocityResponseWriter, derived from the word Celeritas. It provides a simple translation layer, based on Velocity templates, that you can use to build a search user interface within a Solr environment.

It's enabled by default in LucidWorks for Solr 1.4. Eric Pugh discusses some of its improvements in Notes from using LucidWorks for Solr Distro. It doesn't support auto-completion out of the box, but this thread gives some examples of how to use jQuery's auto-complete with it.

Solritas is also mentioned in Erik Hatcher's post on Solr Search User Interface Examples and in the slides for the Rapid Prototyping with Solr presentation.

September 08, 2010

Google Instant: Predictive queries

Google today announced a pretty cool capability that looks like instant results - as you type letters in the search box, the results show up immediately. I've liked this capability in Outlook for a while: in fact, sometimes I have found myself typing a query in Google Mail and waiting for results that never show up until I press Enter.

Actually, the new capability is based on predicting what the query will be, and displaying the results (and ads) for the words Google thinks you'll want. Try this query shown during today's announcement on YouTube:

Type the letters N and Y: given those two letters, Google predicts that you will type 'Times' next, so it displays the results for the New York Times. However, if you were to hit Enter rather than Tab (to complete the predictive query), you would get a different set of results.

One thing that may impact SEO guys: as you type, the pay-per-click ads you see change along with the results.

Predictive entry is probably much easier to build than returning results based on a single initial letter. Still, the guys at Google have built another pretty cool capability. Ajax again shows how useful it can be!
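To make the idea concrete, here is a minimal sketch of prediction-driven results. It is not how Google does it; the query counts, the `predict` and `instant_search` names, and the "most popular completion wins" rule are all assumptions made up for illustration.

```python
# Hypothetical query popularity counts; a real system would mine these from logs.
QUERY_COUNTS = {
    "ny times": 900,
    "ny lottery": 400,
    "nyc weather": 350,
    "netflix": 1200,
}

def predict(prefix):
    """Return the most popular known query that starts with prefix, or None."""
    prefix = prefix.lower()
    candidates = [q for q in QUERY_COUNTS if q.startswith(prefix)]
    if not candidates:
        return None
    return max(candidates, key=QUERY_COUNTS.get)

def instant_search(typed, search):
    """Search for the predicted query, falling back to the literal text typed."""
    predicted = predict(typed)
    return search(predicted if predicted is not None else typed)

print(predict("ny"))  # ny times
print(instant_search("ny", lambda q: "results for " + q))  # results for ny times
```

Pressing Enter would correspond to calling `search("ny")` directly, which is why the literal query and the predicted one can return different results.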

What was kind of funny was a quote they used more than a few times in the announcement: 'Never underestimate fast'. Well said...

September 04, 2010

Faster sorting for Farsi / "Iranian", Danish, Turkish, other atypical languages in Lucene/Solr

By default search engines sort results by relevance or "score", to try and bring the best match to the top of the results list. That's normally what users want, but occasionally you might want to sort by a different field, such as date, title or author. Lucene and Solr support this in various ways, as do many other search engines.

When it comes to sorting by titles or author names, most languages sort words with similar rules, and this is the character ordering that's built into Unicode. But a few languages are different; they may have different policies on accented characters, for example. Java includes the concept of a "locale" to represent some language differences, such as currency and date formats, and it can also encode these differences in preferred sort order. However, apparently the performance isn't great, so sorting in some languages can be slow, or there may not be a locale for a specific language/dialect.

Lucene does include an alternate "collator" class that claims to fix this. It allows for non-default Unicode sorting rules, without the slowdown normally associated with locales. The doc mentions Farsi, Danish and Turkish as examples. I haven't tried it, but since it's buried a bit in the code tree, I wanted to surface it in a post.

The top URL (in case formatting gets lost) is:

http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/collation

Usage scenarios are given in package.html.
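One common way to get language-specific ordering without paying for a locale-aware comparison on every sort is to compute a sort key once, when the document is indexed, and compare only the stored keys at query time. Whether the Lucene class works exactly this way is an assumption on my part. The sketch below uses a toy Danish alphabet (the names `DANISH_ALPHABET`, `collation_key`, and `build_index` are made up for illustration) and ignores real-world details such as digits, punctuation, and the historical "aa" spelling of "å".

```python
# Simplified Danish ordering: the three extra letters sort after "z".
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}

def collation_key(text):
    """Map text to a tuple of letter ranks; unknown characters sort last."""
    return tuple(RANK.get(ch, len(RANK)) for ch in text.lower())

def build_index(titles):
    """Compute each key once, at 'index time', and store it beside the title."""
    return [(collation_key(t), t) for t in titles]

def sorted_titles(index):
    """At query time, sorting only compares the precomputed keys."""
    return [title for _key, title in sorted(index)]

titles = ["Århus", "Zebra", "Aalborg", "Ørsted", "Æble"]
print(sorted_titles(build_index(titles)))
# ['Aalborg', 'Zebra', 'Æble', 'Ørsted', 'Århus']
print(sorted(titles))  # plain code point order, for comparison
# ['Aalborg', 'Zebra', 'Århus', 'Æble', 'Ørsted']
```

The second line of output shows the problem: plain Unicode order puts "Århus" ahead of "Æble", while Danish readers expect "å" to come last.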

 

September 03, 2010

Domain Name Registrar Search Tweak: Indicate That You Already Own It in Search Results

Many companies own lots of domain names, and manage them on one or two registrars.

When you search for a new domain, it'd be nice if registrars listed the domains you already own differently from the domains owned by others. It may be a little tricky for them sometimes, with different account associations or something. It doesn't look like any of them do this, at least not the ones I've played with.

It's not really the main domains you'd need help with, since most people know their key domains by heart; it's all those other domain suggestions they mix into the results. Their results include different suffixes or word variations. Some of these are only suggested if they're available, but they also show the top level domains with a clickable check box or a red X.

So a search on a registrar can show 30 domains on a screen, some with red X's, even if they're taken by you.

If some registrars already do this, leave us a comment.

September 01, 2010

Today's Search Term: hybrid search

hybrid search
Synonyms:  fielded search, filtered search
Related Terms:  taxonomy, parametric search, faceted search, scope of search
A search that includes both full-text and traditional database search criteria. For example, a tech support person could look for "installation errors" (full-text) within a particular product line (more like a traditional database field search). By adding the criterion "product='accounting software'", the tech support person gets a more targeted scope of search, and is more likely to find the installation error they were looking for. As another example, an analyst might search for "depreciation allowance" (the full-text) within a particular jurisdiction (a traditional database-like field). By adding the filter "state='FL'", the analyst gets a more targeted scope of search, and is more likely to find relevant documents.
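The tech support example can be sketched in a few lines. This is only an illustration over an in-memory list, not how any particular search engine implements it; the sample tickets, the `hybrid_search` name, and the simple "every word must appear" matching are assumptions.

```python
# Hypothetical support tickets: free text plus a structured "product" field.
DOCS = [
    {"id": 1, "product": "accounting software", "text": "Installation errors on Windows setup"},
    {"id": 2, "product": "payroll software", "text": "Installation errors after upgrade"},
    {"id": 3, "product": "accounting software", "text": "Report totals do not match ledger"},
]

def hybrid_search(docs, phrase, **filters):
    """Return ids of docs containing every word of phrase and matching all field filters."""
    words = phrase.lower().split()
    hits = []
    for doc in docs:
        text_words = doc["text"].lower().split()
        text_ok = all(w in text_words for w in words)  # full-text criterion
        fields_ok = all(doc.get(f) == v for f, v in filters.items())  # field criteria
        if text_ok and fields_ok:
            hits.append(doc["id"])
    return hits

print(hybrid_search(DOCS, "installation errors"))  # [1, 2]
print(hybrid_search(DOCS, "installation errors", product="accounting software"))  # [1]
```

The full-text part alone matches two tickets; adding the field filter narrows the scope of search to the one the support person actually cares about.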