CiteULike is a free online bibliography manager. Register and you can start organising your references online.

Average Profile of the Lempel-Ziv Parsing Scheme for a Markovian Source Export

Algorithmica, Vol. 31, No. 3. (21 July 2001), pp. 318-360.

Citation Format

[Posts]

View FullText article


steve_chestnut's tags for this article

aofa dst

X Reviews [Write a review of this article]

X Find related articles from these CiteULike users

X Find related articles with these CiteULike tags

X Posting History

X Abstract

Abstract.    For a Markovian source, we analyze the Lempel—Ziv parsing scheme that partitions sequences into phrases such that a new phrase is the shortest phrase not seen in the past. We consider three models: In the Markov Independent model, several sequences are generated independently by Markovian sources, and the ith phrase is the shortest prefix of the ith sequence that was not seen before as a phrase (i.e., a prefix of previous (i-1) sequences). In the other two models, only a single sequence is generated by a Markovian source. In the second model, called the Gilbert—Kadota model, a fixed number of phrases is generated according to the Lempel—Ziv algorithm, thus producing a sequence of a variable (random) length. In the last model, known also as the Lempel—Ziv model, a string of fixed length is partitioned into a variable (random) number of phrases. These three models can be efficiently represented and analyzed by digital search trees that are of interest to other algorithms such as sorting, searching, and pattern matching. In this paper we concentrate on analyzing the average profile (i.e., the average number of phrases of a given length), the typical phrase length, and the length of the last phrase. We obtain asymptotic expansions for the mean and the variance of the phrase length, and we prove that appropriately normalized phrase length in all three models tends to the standard normal distribution, which leads to bounds on the average redundancy of the Lempel—Ziv code. For the Markov Independent model, this finding is established by analytic methods (i.e., generating functions, Mellin transform, and depoissonization), while for the other two models we use a combination of analytic and probabilistic analyses.


X BibTeX record

X RIS record


Privacy Statement | Terms & Conditions
CiteULike organises scholarly (or academic) papers or literature and provides bibliographic (which means it makes bibliographies) for universities and higher education establishments. It helps undergraduates and postgraduates. People studying for PhDs or in postdoctoral (postdoc) positions. The service is similar in scope to EndNote or RefWorks or any other reference manager like BibTeX, but it is a social bookmarking service for scientists and humanities researchers.