Detecting short passages of similar text in large document collections (2001)
Abstract: This paper presents a statistical method for fingerprinting text. In a large collection of independently written documents, each text is associated with a fingerprint that should be different from all the others. If two fingerprints are too close, the corresponding documents are suspected of containing passages of copied or similar text. Our method exploits the characteristic distribution of word trigrams, and the similarity measures are based on set-theoretic principles. The system was developed...
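The trigram fingerprinting idea in the abstract can be illustrated with a minimal sketch. It assumes a simple lowercase tokenizer and uses Jaccard resemblance (intersection size over union size) as one possible set-theoretic measure; the abstract does not name the exact measure or preprocessing the authors use, so the function names `trigrams` and `resemblance` and the sample sentences are hypothetical.

```python
import re


def trigrams(text):
    """Return the set of word trigrams in text.

    Assumption: words are lowercased and punctuation is dropped;
    the paper's own tokenization may differ.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def resemblance(a, b):
    """Jaccard resemblance |A & B| / |A | B| of two trigram sets.

    Assumption: chosen here as an illustrative set-theoretic measure.
    Returns 0.0 when both sets are empty.
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two short, made-up texts that share a four-word passage.
doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "a quick brown fox jumps over a sleeping cat"

score = resemblance(trigrams(doc_a), trigrams(doc_b))
print(f"resemblance = {score:.3f}")
```

Independently written texts would be expected to share few trigrams, so a score well above the background level for a collection would flag a document pair for closer inspection. Any threshold would have to be chosen empirically.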