Full modeling versus summarizing gene-tree uncertainty: Method choice and species-tree accuracy
With the proliferation of species-tree methods, empiricists now have to confront the daunting task of method choice. Such decisions might be made on theoretical considerations alone. However, the messiness of real data means that theoretical ideals may not hold in practice (e.g., because of difficulties with the convergence of complex MCMC algorithms, or computational times that limit analyses to small data sets). On the other hand, the simplifying assumptions made by some approaches may compromise the accuracy of species-tree estimates. Here we examine the purported tradeoff between accuracy and computational simplicity in species-tree analysis, focusing on the different ways the approaches treat gene-tree uncertainty. By considering a diversity of species trees, as well as different sampling designs and total sampling efforts, we not only compare the accuracy of species-tree estimates across methods but also partition the variation in accuracy across factors to identify their relative importance. This analysis shows that although the method of analysis affects accuracy, other factors, namely the history of species divergence and aspects of the sampling design, have a larger impact. Even when gene-tree uncertainty is fully modeled (e.g., in a Bayesian framework), species-tree estimates may be inaccurate, particularly for recent diversification histories. Nevertheless, we demonstrate that factors within the control of the empirical investigator (e.g., decisions about sampling) improve the accuracy of species-tree estimates, and do so more than the method of analysis. Lastly, because much of the attention in species-tree analysis has focused on the discord among loci arising from the coalescent, this work also highlights a previously overlooked key determinant of species-tree accuracy for recent divergences: the level of genetic variation at a locus. This finding has important implications for improving species-tree estimates in practice.
► Method of analysis is not the primary determinant of the accuracy of species trees.
► Methods that fully model gene-tree uncertainty are not necessary when loci are informative.
► Limited genetic variation is a key factor determining species-tree accuracy.
► Modeling gene-tree uncertainty improves accuracy, but species trees may still be inaccurate.