CiteULike is a free online bibliography manager. Register and you can start organising your references online.

Compactly encoding unstructured inputs with differential compression Export

Journal of the ACM, Vol. 49 (2002), pp. 318-367.

Citation Format

[Posts]

View FullText article


dmeister's tags for this article

backup datadeduplication deltaencoding

X Reviews [Write a review of this article]

X Find related articles from these CiteULike users

X Find related articles with these CiteULike tags

X Posting History

X Abstract

The subject of this paper is differential compression, the algorithmic task of finding common strings between versions of data and using them to encode one version compactly by describing it as a set of changes from its companion. A main goal of this work is to present new differencing algorithms that (i) operate at a fine granularity (the atomic unit of change), (ii) make no assumptions about the format or alignment of input data, and (iii) in practice use linear time, use constant space, and give good compression. We present new algorithms, which do not always compress optimally but use considerably less time or space than existing algorithms. One new algorithm runs in O(n) time and O(1) space in the worst case (where each unit of space contains dlog ne bits), as compared to algorithms that run in O(n) time and O(n) space or in O(n 2) time and O(1) space. We introduce two new techniques for differential compression and apply these to give additional algorithms that improve compression and time performance. We experimentally explore the properties of our algorithms by running them on actual versioned data. Finally, we present theoretical results that limit the compression power of differencing algorithms that are restricted to making only a single pass over the data.


X BibTeX record

X RIS record


Privacy Statement | Terms & Conditions
CiteULike organises scholarly (or academic) papers or literature and provides bibliographic (which means it makes bibliographies) for universities and higher education establishments. It helps undergraduates and postgraduates. People studying for PhDs or in postdoctoral (postdoc) positions. The service is similar in scope to EndNote or RefWorks or any other reference manager like BibTeX, but it is a social bookmarking service for scientists and humanities researchers.