A Macaque's-Eye View of Human Insertions and Deletions: Differences in Mechanisms
Insertions and deletions (indels) cause numerous genetic diseases and lead to pronounced evolutionary differences among genomes. The macaque sequences provide an opportunity to gain insights into the mechanisms generating these mutations on a genome-wide scale by establishing the polarity of indels occurring in the human lineage since its divergence from the chimpanzee. Here we apply novel regression techniques and multiscale analyses to demonstrate extensive regional variation in indel rates stemming from local fluctuations in divergence, GC content, male and female recombination rates, proximity to telomeres, and other genomic factors. We find that both replication and, surprisingly, recombination are significantly associated with the occurrence of small indels. Intriguingly, the relative inputs of replication versus recombination differ between insertions and deletions; thus, the two types of mutations are likely guided in part by distinct mechanisms. Namely, insertions are more strongly associated with factors linked to recombination, while deletions are mostly associated with replication-related features. The term "indel" misleadingly groups the two types of mutations together by their effect on a sequence alignment. Here we establish that the correct identification of a small gap as an insertion or a deletion (by use of an outgroup) is crucial to determining its mechanism of origin. In addition to providing novel insights into insertion and deletion mutagenesis, these results will assist in gap penalty modeling and eventually lead to more reliable genomic alignments.
Insertions and deletions (indels) represent a significant source of evolutionary change. In this manuscript, the authors investigate the patterns of genome-wide rate variation for indels that occurred in the human lineage since its divergence from chimpanzee.
Earlier work suggested that insertion and deletion rates are correlated, implying that some genomic factors might affect both types of mutations and thus their patterns of variation across the genome. However, sequences evolving under selection and sequences evolving without it were considered together. The present study represents the first attempt to quantify the levels of variation in neutral indel rates in the framework of multiple regression analysis. The finding that insertion and deletion rates correlate with different genomic features suggests that these two types of mutation are caused in part by distinct molecular mechanisms. This conclusion has direct implications for understanding human genetic diseases, since a large number of them are caused by indels, and contributes to the growing recognition of the importance of fine-scale rearrangement in shaping genome evolution.
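The outgroup reasoning mentioned above, in which macaque is used to decide whether a gap in the human-chimpanzee alignment is an insertion or a deletion, can be illustrated with a simple parsimony rule. The sketch below is only an illustration under stated assumptions, not the authors' pipeline: the function name `classify_gap` and the input convention (three aligned segments spanning a single gap, with `-` marking absent sequence) are hypothetical.

```python
def classify_gap(human: str, chimp: str, macaque: str) -> str:
    """Polarize one human-chimp alignment gap using macaque as outgroup.

    Hypothetical helper: each argument is the aligned segment spanning a
    single gap, either entirely bases or entirely '-' characters.
    """
    def state(seg: str) -> str:
        chars = set(seg)
        if chars == {"-"}:
            return "gap"
        if seg and "-" not in chars:
            return "seq"
        return "mixed"  # partially gapped or empty: cannot be polarized here

    h, c, m = state(human), state(chimp), state(macaque)
    if "mixed" in (h, c, m):
        return "ambiguous"
    if h == c:
        return "not a human-chimp gap"

    # Parsimony: the lineage that disagrees with the outgroup carries the change.
    if h == "gap":
        # Human lacks the segment that chimp has.
        return "chimp insertion" if m == "gap" else "human deletion"
    # Human has the segment that chimp lacks.
    return "human insertion" if m == "gap" else "chimp deletion"


if __name__ == "__main__":
    print(classify_gap("---", "ACG", "ACG"))
    print(classify_gap("ACG", "---", "---"))
```

Without the macaque segment, the first two cases in the usage example would both appear only as a three-base gap between human and chimpanzee, which is why the outgroup is needed to separate insertions from deletions.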