CiteULike is a free online bibliography manager. Register and you can start organising your references online.

A class frequency mixture model that adjusts for site-specific amino acid frequencies and improves inference of protein phylogeny. Export

BMC Evolutionary Biology, Vol. 8 (16 December 2008), 331.

Citation Format

[Posts]

View FullText article


BergmanLab's tags for this article

2008-2-mdb amino-acid-usage evolutionary-rate

X Reviews [Write a review of this article]

X Find related articles from these CiteULike users

X Find related articles with these CiteULike tags

X Posting History

X Abstract

Background: Widely used substitution models for proteins, such as the Jones-Taylor-Thornton (JTT) or Whelan and Goldman (WAG) models, are based on empirical amino acid interchange matrices estimated from databases of protein alignments that incorporate the average amino acid frequencies of the data set under examination (e.g JTT + F). Variation in the evolutionary process between sites is typically modeled using a rates-across-sites distribution such as the gamma (Gamma) distribution. However, sites in proteins also vary in the kinds of amino acid interchanges that are favoured, a feature that is ignored by standard empirical substitution matrices. Here we examine the degree to which the pattern of evolution at sites differs from that expected based on empirical amino acid substitution models and evaluate the impact of this phenomenon on phylogenetic estimation. Results: We analyzed 21 large protein alignments with two statistical tests designed to detect deviation of site-specific amino acid distributions from data simulated under the standard empirical substitution model: JTT+ F + Gamma. We found that the number of states at a given site is, on average, smaller and the frequencies of these states are less uniform than expected based on a JTT + F + Gamma substitution model. With a four-taxon example, we show that phylogenetic estimation under the JTT + F + Gamma model is seriously biased by a long-branch attraction artefact if the data are simulated under a model utilizing the observed site-specific amino acid frequencies from an alignment. Principal components analyses indicate the existence of at least four major site-specific frequency classes in these 21 protein alignments. Using a mixture model with these four separate classes of site-specific state frequencies plus a fifth class of global frequencies (the JTT + cF + Gamma model), significant improvements in model fit over the standard JTT + F + Gamma model for real data sets can be achieved. This simple mixture model also reduces the long-branch attraction problem, as shown by simulations and analyses of a real phylogenomic data set. Conclusions: Protein families display site-specific evolutionary dynamics, departing from standard protein phylogenetic models. Accurate estimation of protein phylogenies requires models that accommodate the heterogeneity in the evolutionary process across sites. To this end, we have implemented the class frequency mixture model proposed here in a freely-available program called QmmRAxML for phylogenetic estimation.


X BibTeX record

X RIS record


Privacy Statement | Terms & Conditions
CiteULike organises scholarly (or academic) papers or literature and provides bibliographic (which means it makes bibliographies) for universities and higher education establishments. It helps undergraduates and postgraduates. People studying for PhDs or in postdoctoral (postdoc) positions. The service is similar in scope to EndNote or RefWorks or any other reference manager like BibTeX, but it is a social bookmarking service for scientists and humanities researchers.