Probabilistic Orthology Analysis
Orthology analysis aims at identifying orthologous genes and gene products from different organisms and, therefore, is a powerful tool in modern computational and experimental biology. Although reconciliation-based orthology methods are generally considered more accurate than distance-based ones, the traditional parsimony-based implementation of reconciliation-based orthology analysis (most parsimonious reconciliation [MPR]) suffers from a number of shortcomings. For example, 1) it is limited to orthology predictions from the reconciliation that minimizes the number of gene duplication and loss events, 2) it cannot evaluate the support of this reconciliation in relation to the other reconciliations, and 3) it cannot make use of prior knowledge (e.g., about species divergence times) that provides auxiliary information for orthology predictions. We present a probabilistic approach to reconciliation-based orthology analysis that addresses all these issues by estimating orthology probabilities. The method is based on the gene evolution model, an explicit evolutionary model for gene duplication and gene loss inside a species tree, that generalizes the standard birth–death process. We describe the probabilistic approach to orthology analysis using 2 experimental data sets and show that the use of orthology probabilities allows a more informative analysis than MPR and, in particular, that it is less sensitive to taxon sampling problems. We generalize these anecdotal observations and show, using data generated under biologically realistic conditions, that MPR give false orthology predictions at a substantial frequency. Last, we provide a new orthology prediction method that allows an orthology and paralogy classification with any chosen sensitivity/specificity combination from the spectra of achievable combinations. We conclude that probabilistic orthology analysis is a strong and more advanced alternative to traditional orthology analysis and that it provides a framework for sophisticated comparative studies of processes in genome evolution.