A Novel Computational Method Identifies Intra- and Inter-Species Recombination Events in Staphylococcus aureus and Streptococcus pneumoniae
Advances in high-throughput DNA sequencing technologies have led to an explosion in the number of sequenced bacterial genomes. Comparative sequence analysis frequently reveals evidence of homologous recombination occurring through different mechanisms and at different rates in different species, but the large-scale use of computational methods to identify recombination events is hampered by their high computational cost. Here, we propose a new method to identify recombination events in large datasets of whole genome sequences. By filtering the gene conservation profiles of a test genome against a panel of strains, the algorithm identifies sets of contiguous genes acquired by homologous recombination. The locations of the recombination breakpoints are determined using a statistical test that accounts for differences in the natural rate of evolution between genes. The algorithm was tested on a dataset of 75 genomes of Staphylococcus aureus and 50 genomes comprising different streptococcal species, and was able to detect intra-species recombination events in S. aureus and in Streptococcus pneumoniae. Furthermore, we found evidence of an inter-species exchange of genetic material between S. pneumoniae and Streptococcus mitis, a closely related commensal species that colonizes the same ecological niche. The method has been implemented in an R package, Reco, which is freely available in the supplementary material and provides a rapid screening tool to investigate recombination on a genome-wide scale from sequence data. The extent to which recombination occurs in natural populations is either unknown or controversial, but it is widely accepted that recombination plays a crucial role in the evolution of many bacterial species.
Numerous methods have been developed for the investigation of recombination events, but most of them require expensive computations and are applicable only to a limited number of genomes or to short nucleotide sequences. Here we present a new algorithm designed to identify recombination events affecting a group of adjacent genes. The procedure is based on the comparison of gene sequences and requires as input the matrix of gene conservation of a test genome against a group of reference genomes. The method is fast and has minimal computational requirements. Therefore, it can be applied to datasets composed of a large number of complete genomes, and can be easily adapted to analyze data directly from high-throughput sequencing projects. We applied the algorithm to a dataset of S. aureus and streptococcal genomes and found evidence of previously undetected inter- and intra-species recombination events, suggesting that the use of Reco will shed new light on the evolution of bacterial species and provide important information to improve classification criteria of bacterial species.
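To make the idea of scanning a gene conservation matrix for runs of adjacent genes more concrete, the following minimal Python sketch may help. It is not the Reco implementation and does not reproduce its filtering procedure or statistical test. The function name `candidate_segments`, the parameters `min_run` and `threshold`, and the median-based per-gene scaling are all assumptions made for illustration. The sketch assumes the matrix holds percent identities, with one row per gene of the test genome (in chromosomal order) and one column per reference genome. It flags genes that are unusually similar to one reference relative to that gene's typical conservation, then reports runs of contiguous flagged genes.

```python
from statistics import median


def candidate_segments(conservation, min_run=3, threshold=2.0):
    """Return (reference_index, first_gene, last_gene) tuples for candidate runs.

    conservation[g][r] is the percent identity of gene g of the test genome
    against reference genome r; genes are assumed to be in chromosomal order.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    # Scale each gene by its own median and median absolute deviation, so that
    # fast- and slow-evolving genes are compared on a common footing.
    deviations = []
    for row in conservation:
        baseline = median(row)
        spread = median(abs(x - baseline) for x in row) or 1.0  # avoid dividing by zero
        deviations.append([(x - baseline) / spread for x in row])

    n_genes = len(conservation)
    n_refs = len(conservation[0]) if conservation else 0
    segments = []
    for r in range(n_refs):
        start = None
        # Iterate one step past the last gene so that a trailing run is closed.
        for g in range(n_genes + 1):
            flagged = g < n_genes and deviations[g][r] >= threshold
            if flagged and start is None:
                start = g
            elif not flagged and start is not None:
                if g - start >= min_run:
                    segments.append((r, start, g - 1))
                start = None
    return segments


# Toy data (invented for this example): genes 2 to 4 are unusually similar
# to the reference in column 3.
typical = [98.0, 97.5, 98.2, 97.8]
shifted = [98.0, 97.5, 98.2, 99.9]
profile = [typical, typical, shifted, shifted, shifted, typical, typical, typical]
print(candidate_segments(profile))
```

On this toy matrix the sketch reports a single segment covering genes 2 to 4 against the reference in column 3. A real analysis would replace the fixed threshold with a proper statistical test for the breakpoints, as the method described above does.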