Population genomic inference of recombination rates and hotspots
As more human genomic data become available, fine-scale recombination rate variation can be inferred on a genome-wide scale. Current statistical methods for inferring recombination rates in moderate or large genomic regions rely on approximate likelihoods. Here, we develop a Bayesian full-likelihood method using Markov chain Monte Carlo (MCMC) to estimate background recombination rates and hotspots. The probability model is inspired by the patterns of recombination observed at several genomic regions analyzed in sperm-typing studies. Posterior probabilities and Bayes factors for recombination hotspots along chromosomes are inferred. For moderate-size genomic regions (e.g., with fewer than 100 SNPs), the full likelihood is used. Larger regions are split into subintervals (typically with 20 to 50 markers each), and the likelihood is approximated from the genealogies of each subinterval. The background recombination rates, hotspots, and other model parameters are then estimated with a parallel computing approach, assuming that the parameters are shared across subintervals. Simulation analyses show that our method can accurately estimate the variation in recombination rates across genomic regions; in particular, clusters of hotspots can be distinguished even when weaker hotspots are present. The method is applied to SNP data from the HLA region, the MS32 region, and chromosome 19.
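The abstract describes splitting larger regions into subintervals, approximating the likelihood for each subinterval, evaluating shared parameters in parallel, and reporting Bayes factors for hotspots. The Python sketch below illustrates only that bookkeeping, under assumptions that the abstract does not confirm: subintervals are contiguous and non-overlapping, the combined log-likelihood is the sum of the per-subinterval log-likelihoods (a composite likelihood with shared parameters), and a hotspot Bayes factor is computed with the standard definition of posterior odds divided by prior odds. The functions `split_into_subintervals`, `composite_log_likelihood`, and `hotspot_bayes_factor`, as well as the toy likelihood, are hypothetical; the genealogy-based subinterval likelihood and the MCMC sampler are not reproduced.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# Hypothetical signature for a per-subinterval log-likelihood (assumption).
SubLogLik = Callable[[Dict[str, float], range], float]


def split_into_subintervals(n_markers: int, max_size: int = 50) -> List[range]:
    """Split marker indices 0..n_markers-1 into contiguous, non-overlapping
    subintervals of near-equal size, each with at most max_size markers.
    With max_size=50, every subinterval of a larger region holds at least
    25 markers, inside the 20-50 range mentioned in the abstract.
    (Assumption: the exact boundary rule is not given in the source.)"""
    if n_markers <= max_size:
        return [range(n_markers)]
    k = math.ceil(n_markers / max_size)  # number of subintervals
    base, extra = divmod(n_markers, k)   # spread the remainder over the first chunks
    intervals, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        intervals.append(range(start, start + size))
        start += size
    return intervals


def composite_log_likelihood(params: Dict[str, float],
                             subintervals: Sequence[range],
                             sub_loglik: SubLogLik,
                             max_workers: int = 4) -> float:
    """Sum per-subinterval log-likelihoods that share one parameter set,
    evaluating the subintervals concurrently (illustrative parallelism only)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        terms = pool.map(lambda iv: sub_loglik(params, iv), subintervals)
        return sum(terms)


def hotspot_bayes_factor(posterior_prob: float, prior_prob: float) -> float:
    """Bayes factor for 'hotspot present': posterior odds / prior odds."""
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


def toy_sub_loglik(params: Dict[str, float], iv: range) -> float:
    """Placeholder likelihood (hypothetical): -rho per marker in the subinterval."""
    return -params["rho"] * len(iv)


if __name__ == "__main__":
    ivs = split_into_subintervals(120)
    print([len(iv) for iv in ivs])
    print(composite_log_likelihood({"rho": 0.5}, ivs, toy_sub_loglik))
    print(hotspot_bayes_factor(0.5, 0.1))
```

In a real analysis each subinterval term would come from the genealogy-based likelihood, and the summed value would be evaluated inside the MCMC at each proposed parameter set; threads are used here only to keep the sketch self-contained, whereas a compute-bound implementation would more likely distribute subintervals across processes or nodes.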