
Information Theory and Algorithmic Complexity: Applications to Language Discourses and DNA Sequences as Complex Systems Part II: Complexity of DNA Sequences, Analogy with Linguistic Discourses

pp. 153-183.


Abstract

Linguistic discourses and DNA sequences in molecular biology are treated as complex adaptive systems with interacting, coexisting elements of order and randomness. Following a prescription for 'effective complexity' of a system by Gell-Mann, we defined earlier a complexity function C for a linguistic discourse. C depends on two 'order' parameters x and a, which in turn depend on two kinds of entropies, Shannon entropy and Algorithmic (Kolmogorov) entropy. Algorithmic complexity is used to define an Optimum Meaning Preserving Code (OMPC) which, unlike the Shannon entropy, preserves the 'meaning' of a particular word sequence. C tends to 0 for systems of low as well as high order and is maximum (C = 1) for a mixture of order and disorder. The starting point for our analysis is the distribution of word frequencies, Zipf's law, which is a power law ($W(k) = B k^{-2}$, where $W(k)$ is the frequency of words occurring $k$ times and $B$ is a constant). In earlier papers, we deduced a modified version of Zipf's law (MPL) which was in better agreement with data from natural languages. The model used physical principles of maximum entropy and degeneracy from classical and quantum statistical mechanics. The model was extended to speech, which draws on a small invariant set of phonemes, to obtain a law similar to the MPL, called the Cumulative Modified Power Law (CMPL), which adequately fits the phoneme rank frequencies. It was shown that the near maximal value of complexity (~1) is a consequence of Zipf's law. In this paper, we extend the above concepts to DNA sequences treated as strings of symbols from a four-letter alphabet (bases A, G, C, U). The genetic code is examined at three hierarchical levels of codons (64, 26, 21). Codon rank frequencies of 20 different species are shown to follow the CMPL. Entropy, order and complexity parameters for DNA are numerically similar to those obtained for language. Complexity is ~1 for all 20 species, which span a wide range of evolutionary age. Some parameters show significant correlation with the evolutionary age of the species.
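
As a rough illustration of two quantities the abstract names, the sketch below splits a base string into codons, tabulates W(k), and computes the Shannon entropy of the codon frequencies. It is not the authors' method: the function names and the sample sequence are hypothetical, W(k) is read here as the number of distinct codons occurring exactly k times, and the complexity function C, the MPL and the CMPL are not reproduced because the abstract does not define them.

```python
from collections import Counter
import math


def codons(sequence):
    """Split a base string into non-overlapping triplets, dropping any leftover bases."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def frequency_of_frequencies(tokens):
    """Return W, where W[k] is the number of distinct tokens occurring exactly k times."""
    occurrences = Counter(tokens)          # token -> number of occurrences
    return Counter(occurrences.values())   # k -> number of tokens seen k times


def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the empirical token distribution."""
    occurrences = Counter(tokens)
    total = sum(occurrences.values())
    return -sum((n / total) * math.log2(n / total) for n in occurrences.values())


# Made-up sample sequence, used only to exercise the functions.
sample = codons("AUGGCCAUGUAA")
w = frequency_of_frequencies(sample)
h = shannon_entropy(sample)
```

The same functions work on a list of words instead of codons, which is the sense in which the abstract treats language and DNA in parallel. Fitting the exponent of W(k) would need far more data than a toy sequence provides.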

