Discussion:
clustering info
Luke Tucker
2008-08-13 15:40:45 UTC
Thanks to Anil for sending this along:

http://gatekeeper.research.compaq.com/pub/DEC/SRC/technical-notes/SRC-1997-015-html/

- Luke



--
Archive: http://www.openplans.org/projects/melkjug/lists/melkjug-development-list/archive/2008/08/1218642056754
To unsubscribe send an email with subject "unsubscribe" to melkjug-dev-***@public.gmane.org Please contact melkjug-dev-manager-ZwoEplunGu1pszqg2B6Wd0B+***@public.gmane.org for questions.
Anil Makhijani
2008-08-13 17:38:08 UTC
More on this:

http://glinden.blogspot.com/2008/04/detecting-near-duplicates-in-big-data.html

http://www.conradweb.org/~jackg/pubs/SIGIR04_Conrad.pdf

http://code.google.com/p/simhash/
--
Archive: http://www.openplans.org/projects/melkjug/lists/melkjug-development-list/archive/2008/08/1218649093016