Finding Similar Files in a Large File System
by: U. Manber
In Proceedings of the USENIX Winter 1994 Technical Conference (January 1994), pp. 1-10.
Abstract
We present a tool, called sif, for finding all similar files in a large file system. Files are considered similar if they have a significant number of common pieces, even if they are very different otherwise. For example, one file may be contained, possibly with some changes, in another file, or a file may be a reorganization of another file. The running time for finding all groups of similar files, even for as little as 25% similarity, is on the order of 500MB to 1GB an hour. The amount of...
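To make the notion of "common pieces" concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the paper: the function names `pieces` and `containment`, the fixed window size, and the use of SHA-1 digests are all assumptions, and sif itself may select and compare pieces quite differently.

```python
import hashlib


def pieces(data: bytes, size: int = 8) -> set:
    """Return digests of every `size`-byte window in `data` (hypothetical helper)."""
    if len(data) < size:
        # Too short for a full window: treat the whole input as one piece.
        return {hashlib.sha1(data).hexdigest()} if data else set()
    return {
        hashlib.sha1(data[i:i + size]).hexdigest()
        for i in range(len(data) - size + 1)
    }


def containment(a: bytes, b: bytes, size: int = 8) -> float:
    """Fraction of the distinct pieces of `a` that also occur in `b`."""
    pa, pb = pieces(a, size), pieces(b, size)
    if not pa:
        return 0.0
    return len(pa & pb) / len(pa)


if __name__ == "__main__":
    original = b"abcdefghij" * 10
    # `original` embedded in a larger file with unrelated padding around it.
    embedded = b"X" * 30 + original + b"Y" * 30
    print(containment(original, embedded))
```

Because the measure is asymmetric, a small file wholly contained in a large one scores high in one direction even though the two files differ greatly overall, which matches the containment case the abstract describes. A cutoff such as 0.25 loosely mirrors the 25% figure above, though how sif measures that percentage is not stated in this excerpt.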