An algorithm for detecting directional and non-directional positive selection, neutrality and negative selection in protein-coding DNA sequences.
Positive selection, or adaptive evolution, is thought to be responsible, at least some of the time, for the rapid accumulation of advantageous changes in protein-coding genes. The origin of new enzymatic functions, the erection of barriers to heterospecific fertilization, and the evasion of host response by pathogens, among other things, are thought to be instances of adaptive evolution. Detecting positive selection in protein-coding genes is fraught with difficulties: saturation for sequence change, codon usage bias, ephemeral selection events and differential selective pressures on amino acids all contribute to the problem. A number of solutions have been proposed with varying degrees of success; however, they are either not accurate enough or prohibitively computationally intensive. We have developed a character-based method of identifying lineages that undergo positive selection. For each internal branch of a phylogenetic tree, we assess the possibility that an event occurred that subsequently gave rise to a greater number of replacement substitutions than might be expected. We classify these replacement substitutions into two categories, according to whether they subsequently became invariable or changed again in at least one descendant lineage. The former situation indicates that the new character state is under strong selection to preserve its new identity (directional selection), while the latter indicates that there is persistent pressure to change identity (non-directional selection). The method is fast and accurate, easy to implement, sensitive to short-lived selection events, and robust with respect to sampling density and the proportion of sites under the influence of positive selection.
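The classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that ancestral states have already been reconstructed for every internal node, it compares amino-acid states directly instead of working with codons, and it omits the comparison against the expected number of replacement substitutions, which the abstract does not specify. The names `Node`, `classify_branch` and `classify_internal_branches` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; `seq` holds amino-acid states (assumed already reconstructed)."""
    name: str
    seq: str
    children: list = field(default_factory=list)


def descendants(node):
    """Yield every node strictly below `node`."""
    for child in node.children:
        yield child
        yield from descendants(child)


def classify_branch(parent, child):
    """Count replacements on the parent->child branch by their later fate."""
    counts = {"invariable": 0, "changed_again": 0}
    below = list(descendants(child))
    for site, (old, new) in enumerate(zip(parent.seq, child.seq)):
        if old == new:
            continue  # no replacement at this site on this branch
        if all(d.seq[site] == new for d in below):
            counts["invariable"] += 1      # new state kept by every descendant
        else:
            counts["changed_again"] += 1   # changed in at least one descendant
    return counts


def classify_internal_branches(root):
    """Apply classify_branch to every branch that leads to an internal node."""
    results = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            if child.children:  # internal branch: the child is not a leaf
                results[(node.name, child.name)] = classify_branch(node, child)
            stack.append(child)
    return results


# Toy tree with made-up sequences, for illustration only.
a1 = Node("a1", "MRTS")
a2 = Node("a2", "MRTG")
a = Node("A", "MRTS", [a1, a2])
b = Node("b", "MKTA")
root = Node("root", "MKTA", [a, b])

results = classify_internal_branches(root)
# results == {("root", "A"): {"invariable": 1, "changed_again": 1}}
```

In this toy tree, the K-to-R replacement on the branch leading to A is retained by both descendants and is counted as invariable (the pattern the abstract associates with directional selection), while the A-to-S replacement changes again in a2 and is counted as changed again (non-directional selection).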