Tropo / Dave / Bookmarks : simhash

Locality Sensitive Hashing
    If you have read the paper “Detecting Near-Duplicates for Web Crawling”, which I mentioned in an earlier post, you may be curious about exactly why the simhash algorithm works. That paper leaves the details of simhash for the reader to discover in a cited paper by Moses Charikar, “Similarity Estimation Techniques from Rounding Algorithms”. Unfortunately, Charikar doesn’t use the term simhash in his paper, so it may not be obvious how to connect the dots, and doing so isn’t necessarily easy. But here are a few bits and pieces of information that might help.
    http://www.coolsnap.net/kevin/?p=23
    tags: simhash
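
    For readers who have not seen simhash written out, the following is a minimal sketch of the fingerprinting step as it is commonly described, not code taken from either paper. The function names (simhash, hamming), the 64-bit width, the unweighted tokens, and the use of BLAKE2b as the per-token hash are all assumptions made for illustration. One way to read the link to Charikar's paper, offered here as an interpretation: each bit position acts like a random hyperplane whose coefficient for a token is +1 or -1 depending on that bit of the token's hash, and the fingerprint bit records which side of the hyperplane the document falls on.

```python
import hashlib


def simhash(tokens, bits=64):
    """Return a `bits`-wide fingerprint for an iterable of string tokens.

    Illustrative sketch: every token has weight 1, and BLAKE2b is an
    arbitrary choice of per-token hash. `bits` must be a multiple of 8
    and at most 512 (the largest BLAKE2b digest).
    """
    counts = [0] * bits
    for token in tokens:
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            # Bit i of the token hash votes +1 or -1 on fingerprint bit i.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, total in enumerate(counts):
        if total > 0:  # keep only the sign of each accumulated vote
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

    Under these assumptions, two documents that share most of their tokens should produce fingerprints that differ in only a few bits, while unrelated documents should differ in roughly half of them. The fingerprint depends only on the multiset of tokens, not on their order.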

 

