CiteULike is a free online bibliography manager. Register and you can start organising your references online.

Finding Similar Files in a Large File System Export

In Proceedings of the USENIX Winter 1994 Technical Conference (JanuaryJuly--FebruaryJanuary 1994), pp. 1-10.

Citation Format

[Posts]

View FullText article


stagediver's tags for this article

data file filesystem search similarity

X Reviews [Write a review of this article]

X Find related articles from these CiteULike users

X Find related articles with these CiteULike tags

X Posting History

X Abstract

We present a tool, called sif, for finding all similar files in a large file system. Files are considered similar if they have significant number of common pieces, even if they are very different otherwise. For example, one file may be contained, possibly with some changes, in another file, or a file may be a reorganization of another file. The running time for finding all groups of similar files, even for as little as 25% similarity, is on the order of 500MB to 1GB an hour. The amount of...


X BibTeX record

X RIS record


Privacy Statement | Terms & Conditions
CiteULike organises scholarly (or academic) papers or literature and provides bibliographic (which means it makes bibliographies) for universities and higher education establishments. It helps undergraduates and postgraduates. People studying for PhDs or in postdoctoral (postdoc) positions. The service is similar in scope to EndNote or RefWorks or any other reference manager like BibTeX, but it is a social bookmarking service for scientists and humanities researchers.