Advanced Duplicate Detection (also related to spam detection and clustering)
We need to write a dedicated article about this area, but I wanted to share some material here that we have already written about it, and that will likely reappear in that future article.
In our recent newsletter article, we covered the problem of generic duplicate detection in search, and then duplicate detection in federated search.
In a SearchDev posting, Mark talked more about why checksums aren't always enough for duplicate detection, in messages 485 and 490.
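As a rough illustration of the checksum limitation (this is a minimal sketch of one common case, not a summary of what those messages say), consider two documents that differ only in whitespace and punctuation. An exact checksum treats them as unrelated, while a word-shingle comparison treats them as the same. The helper names `checksum`, `shingles`, and `jaccard`, the shingle size of 3, and the sample text are all assumptions made for this example.

```python
import hashlib
import re


def checksum(text: str) -> str:
    """Exact-match fingerprint: any byte-level change yields a different digest."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set:
    """Set of k-word shingles over lowercased words with punctuation ignored."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical sample documents: doc_b adds an extra space and swaps "." for "!".
doc_a = "Quarterly results were released today. Revenue rose five percent."
doc_b = "Quarterly results were released today.  Revenue rose five percent!"
doc_c = "Tomorrow will be sunny with light winds expected in the afternoon."

same_checksum = checksum(doc_a) == checksum(doc_b)        # False
near_dup_score = jaccard(shingles(doc_a), shingles(doc_b))  # 1.0
unrelated_score = jaccard(shingles(doc_a), shingles(doc_c))  # 0.0

print(same_checksum, near_dup_score, unrelated_score)
```

In practice a threshold below 1.0 would be used so that small wording changes still count as duplicates; the right value depends on the collection and is not something this sketch tries to settle.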