
Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees

by: Fernando I. Carrasco, Stephen Smith, Alexandros Stamatakis
BMC Bioinformatics, Vol. 12, No. 1. (13 December 2011), 470, doi:10.1186/1471-2105-12-470  Key: citeulike:10126566


Abstract

BACKGROUND: The rapid accumulation of molecular sequence data, driven by novel wet-lab sequencing technologies, poses new challenges for large-scale maximum likelihood-based phylogenetic analyses on trees with more than 30,000 taxa and several genes. The three main computational challenges are numerical stability, the scalability of search algorithms, and the high memory requirements for computing the likelihood.

RESULTS: We introduce methods for solving these three key problems and provide proof-of-concept implementations in RAxML. The mechanisms presented here are not RAxML-specific and can thus be applied to any likelihood-based (Bayesian or maximum likelihood) tree inference program. We develop a new search strategy that can reduce the time required for tree inference by more than 50% while yielding equally good trees (in the statistical sense) for well-chosen starting trees. We present an adaptation of the Subtree Equality Vector technique for phylogenomic datasets with missing data (already available in RAxML v7.2.8) that can reduce execution times and memory requirements by up to 50%. Finally, we discuss issues pertaining to the numerical stability of the Gamma model of rate heterogeneity on very large trees and argue in favor of rate heterogeneity models that use a single rate or rate category for each site to resolve these problems.

CONCLUSIONS: We address three major issues pertaining to large-scale tree reconstruction under maximum likelihood and propose a solution for each. Proof-of-concept and production-level implementations of our ideas are available as open-source code.
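The numerical-stability problem arises because a per-site likelihood on a very large tree is built from many multiplications by probabilities smaller than one, so the value can fall below the smallest representable double. The Python sketch below is a generic illustration, not RAxML's code: it assumes a hypothetical constant factor `p` applied `n` times and shows that a naive product underflows to 0.0, whereas rescaling whenever the running value drops below a threshold, and counting the rescalings, keeps the result representable as a log-likelihood. The threshold of 2**-256 is an arbitrary choice for this sketch.

```python
import math

# Hypothetical scaling constants for this sketch (not taken from the paper).
THRESHOLD = 2.0 ** -256
SCALE = 2.0 ** 256  # multiplying by a power of two is exact in IEEE 754


def naive_product(p: float, n: int) -> float:
    """Multiply n copies of p directly; underflows to 0.0 for large n."""
    result = 1.0
    for _ in range(n):
        result *= p
    return result


def scaled_log_product(p: float, n: int) -> float:
    """Multiply n copies of p, rescaling whenever the running value drops
    below THRESHOLD, and return the natural log of the true product."""
    value = 1.0
    scale_count = 0  # number of times the value was multiplied by SCALE
    for _ in range(n):
        value *= p
        if value < THRESHOLD:
            value *= SCALE
            scale_count += 1
    # Undo the rescalings in log space, where the magnitude is harmless.
    return math.log(value) - scale_count * math.log(SCALE)


print(naive_product(0.01, 30000))       # 0.0 (underflow)
print(scaled_log_product(0.01, 30000))  # ≈ -138155.1, i.e. 30000 * ln(0.01)
```

Working in log space with an explicit scaling counter is the standard remedy; the harder question the abstract raises is how such scaling interacts with averaging over several Gamma rate categories at the same site.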

