| |
BMC Bioinformatics, Vol. 11, No. 1. (2010), 54.
Abstract
BACKGROUND: With the continued development of new computational tools for multiple sequence alignment, it is necessary today to develop benchmarks that aid the selection of the most effective tools. Simulation-based benchmarks have been proposed to meet this necessity, especially for non-coding sequences. However, it is not clear if such benchmarks truly represent real sequence data from any given group of species, in terms of the difficulty of alignment tasks. RESULTS: We find that the conventional simulation approach, which relies on empirically estimated values for ...
|
| |
Molecular Biology and Evolution, Vol. 24, No. 3. (March 2007), pp. 640-649.
|
| |
Genome Biology, Vol. 9, No. 10. (08 October 2008), R147.
Abstract
Controlled simulations of genome evolution are useful for benchmarking tools. However, many simulators lack extensibility and cannot measure parameters directly from data. These issues are addressed by three new open-source programs: GSIMULATOR (for neutrally evolving DNA), SIMGRAM (for generic structured features) and SIMGENOME (for syntenic genome blocks). Each offers algorithms for parameter measurement and reconstruction of ancestral sequence. All three tools out-perform the leading neutral DNA simulator (DAWG) in benchmarks. The programs are available at http://biowiki.org/SimulationTools. ...
|
| |
Genome Research, Vol. 14, No. 12. (1 December 2004), pp. 2412-2423.
Abstract
DOI: 10.1101/gr.2800104
It is believed that most modern mammalian lineages arose from a series of rapid speciation events near the Cretaceous-Tertiary boundary. It is shown that such a phylogeny makes the common ancestral genome sequence an ideal target for reconstruction. Simulations suggest that with methods currently available, we can expect to get 98% of the bases correct in reconstructing megabase-scale euchromatic regions of a eutherian ancestral genome from the genomes of ∼20 optimally chosen modern mammals. Using actual genomic sequences from 19 ...
|
| |
J Mol Biol, Vol. 229, No. 4. (20 February 1993), pp. 1065-1082.
Abstract
The exhaustive matching of the protein sequence database makes possible a broadly based study of insertions and deletions (indels) during divergent evolution. In this study, the probability of a gap in an alignment of a pair of homologous protein sequences was found to increase with the evolutionary distance measured in PAM units (number of accepted point mutations per 100 amino acid residues). A relationship between the average number of amino acid residues between indels and evolutionary distance suggests that a unit ...
|
| |
Molecular Biology and Evolution, Vol. 23, No. 11. (November 2006), pp. 2090-2100.
Note (first note only)
Not a particularly interesting paper, but they did use a hacked version of PAML to simulate alignments alongside sequences...
|
| |
Genome Research, Vol. 18, No. 11. (10 October 2008), pp. 1829-1843.
Abstract
DOI: 10.1101/gr.076521.108
Recently, attention has turned to the problem of reconstructing complete ancestral sequences from large multiple alignments. Successful generation of these genome-wide reconstructions will facilitate a greater knowledge of the events that have driven evolution. We present a new evolutionary alignment modeler, called “Ortheus,” for inferring the evolutionary history of a multiple alignment, in terms of both substitutions and, importantly, insertions and deletions. Based on a multiple sequence probabilistic transducer model of the type proposed by Holmes, Ortheus uses efficient ...
|
| |
BMC Bioinformatics, Vol. 7, No. 1. (24 October 2006), 471.
Abstract
BACKGROUND: There have been many algorithms and software programs implemented for the inference of multiple sequence alignments of protein and DNA sequences. The "true" alignment is usually unknown due to the incomplete knowledge of the evolutionary history of the sequences, making it difficult to gauge the relative accuracy of the programs. RESULTS: We tested nine of the most often used protein alignment programs and compared their results using sequences generated with the simulation software Simprot, which creates known alignments under realistic and controlled evolutionary ...
|
| |
J Comput Biol, Vol. 14, No. 4. (May 2007), pp. 446-461.
Abstract
Given a multiple alignment of orthologous DNA sequences and a phylogenetic tree for these sequences, we investigate the problem of reconstructing the most likely scenario of insertions and deletions capable of explaining the gaps observed in the alignment. This problem, which we call the Indel Maximum Likelihood Problem (IMLP), is an important step toward the reconstruction of ancestral genomic sequences, and is important for studying evolutionary processes, genome function, adaptation and convergence. We solve the IMLP using a new type of ...
|
| |
BMC Bioinformatics, Vol. 7, No. 1. (08 June 2006), 292.
Abstract
BACKGROUND: Non-coding DNA sequences comprise a very large proportion of the total genomic content of mammals, most other vertebrates, many invertebrates, and most plants. Unraveling the functional significance of non-coding DNA depends on how well we are able to align non-coding DNA sequences. However, the alignment of non-coding DNA sequences is more difficult than aligning protein-coding sequences. RESULTS: Here we present an improved pair-hidden-Markov-Model (pair HMM) based method for performing global pairwise alignment of non-coding DNA sequences. The method uses an explicit model of ...
|
| |
Bioinformatics, Vol. 23, No. 3. (1 February 2007), pp. 289-297.
Abstract
Motivation: A quantitative study of molecular evolutionary events such as substitutions, insertions and deletions from closely related genomes requires (1) an accurate multiple sequence alignment program and (2) a method to annotate the insertions and deletions that explain the 'gaps' in the alignment. Although the former requirement has been extensively addressed, the latter problem has received little attention, especially in a comprehensive probabilistic framework. Results: Here, we present Indelign, a program that uses a probabilistic evolutionary model to compute the most ...
|
| |
Bioinformatics, Vol. 21 Suppl 3 (1 November 2005)
Abstract
MOTIVATION: Relationships amongst taxa are inferred from biological data using phylogenetic methods and procedures. Very few known phylogenies exist against which to test the accuracy of our inferences. Therefore, in the absence of biological data, simulated data must be used to test the accuracy of methods which produce these inferences. Researchers have limited or non-existent options for simulations useful for studying the impact of insertions, deletions, and alignments on phylogenetic accuracy. RESULTS: To satisfy this gap I have developed a new ...
|
| |
BMC Bioinformatics, Vol. 6, No. 1. (2005)
Abstract
BACKGROUND: General protein evolution models help determine the baseline expectations for the evolution of sequences, and they have been extensively useful in sequence analysis and for the computer simulation of artificial sequence data sets. RESULTS: We have developed a new method of simulating protein sequence evolution, including insertion and deletion (indel) events in addition to amino-acid substitutions. The simulation generates both the simulated sequence family and a true sequence alignment that captures the evolutionary relationships between amino acids from different sequences. Our statistical model ...
|
| |
Bioinformatics, Vol. 14, No. 2. (1 March 1998), pp. 157-163.
Abstract
DOI: 10.1093/bioinformatics/14.2.157 ...
|
| |
Mol Biol Evol, Vol. 25, No. 8. (1 August 2008), pp. 1576-1580.
Abstract
Multiple sequence alignment is an essential tool in many areas of biological research, and the accuracy of an alignment can strongly affect the accuracy of a downstream application such as phylogenetic analysis, identification of functional motifs, or polymerase chain reaction primer design. The heads or tails (HoT) method (Landan G, Graur D. 2007. Heads or tails: a simple reliability check for multiple sequence alignments. Mol Biol Evol. 24:1380-1383.) assesses the consistency of an alignment by comparing the alignment of a set ...
|
| |
Mol Biol Evol (12 January 2008), msn008.
Abstract
Phylogenetic reconstruction based upon multiple alignments of molecular sequences is important to most branches of modern biology and is central to molecular evolution. Understanding the historical relationships among macromolecules depends upon computer programs that implement a variety of analytical methods. Because it is impossible to know those historical relationships with certainty, assessment of the accuracy of methods and the programs that implement them requires the use of programs that realistically simulate the evolution of DNA sequences. EvolveAGene 3 is a realistic ...
|