Workshop: Comparative assembly of metagenomic sequences
Next-generation sequencing technologies permit metagenomic studies to characterize the entire bacterial community within an environment by producing a large number of short, noisy DNA reads. One of the most challenging computational tasks is to assemble millions of short reads into longer contigs, which serve as the basis of subsequent computational analyses. Several de novo assembly methods designed for single genomes have been tuned and applied to metagenomic data sets, but very little progress has been made in comparative assembly for metagenomics. Meanwhile, more and more bacterial genome sequences are becoming available, providing a great opportunity to conduct reference-assisted assembly. In this project, we introduce a computational tool for comparative assembly of metagenomic sequences. Our software first selects reference genomes based on taxonomic profiles estimated by MetaPhyler, and then quickly maps the metagenomic reads to those reference genomes. When building contigs, we employ a greedy solution to the minimum set-covering problem to produce longer contigs. Furthermore, we propose a hybrid assembly approach, which shows significantly better results than either comparative or de novo assembly alone. We analyzed two mock and 728 real metagenomic samples from the Human Microbiome Project and achieved results comparable to those of state-of-the-art de novo assemblers. Through our proposed hybrid approach, we assembled 79% of the reads into contigs of at least 300 bp.
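The greedy set-covering step mentioned above can be illustrated with a minimal sketch. This is not the tool's actual implementation: it assumes each mapped read is reduced to the set of reference positions it covers, and it applies the textbook greedy heuristic (repeatedly pick the read that covers the most still-uncovered positions). The function name, read names, and coordinates are all hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Return the keys of `subsets` picked by the classic greedy heuristic.

    universe: set of elements to cover (here: reference positions).
    subsets:  dict mapping a name (here: a read) to the set it covers.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            break  # the remaining elements are not covered by any subset
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical toy data: reads mapped to half-open intervals on one reference.
alignments = {
    "read_a": (0, 6),
    "read_b": (4, 10),
    "read_c": (2, 5),
    "read_d": (9, 14),
}
read_positions = {name: set(range(start, end))
                  for name, (start, end) in alignments.items()}

# Only positions hit by at least one read can be covered at all.
covered_positions = set().union(*read_positions.values())

selected = greedy_set_cover(covered_positions, read_positions)
print(selected)
```

In this toy input, read_c lies entirely inside read_a, so the greedy pass never selects it; a small, non-redundant set of reads still spans every covered position. A position-set representation is used only for clarity; an interval-based version would be needed at realistic read counts.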