A quantitative comparison of DNA sequence assembly programs.
We compared 11 sequence assembly programs for the accuracy and reproducibility with which they assemble DNA fragments into a completed sequence. To test the assemblers under controlled conditions, the rat multidrug resistance (RATMDRM) gene sequence was randomly divided into overlapping 200- to 400-base fragments. Various degrees of error, in the form of misidentified bases, missed bases, and duplicated bases, were randomly added to these fragments. The probability and type of each error were adjusted using an error distribution template that was developed by comparing the original fragments used to sequence RATMDRM with the final, edited sequence stored in GenBank. From 0 to 15% error was then added to independent sets of fragments, and assembly was attempted. The quality of each assembly was evaluated by counting the differences between the assembled sequence and the original sequence. Tests were also done to determine whether the order in which fragments were added to a project affected the final sequence and whether the quality of assembly was sequence dependent. Similar results were obtained using other, unrelated sequences. The programs could be roughly divided into three groups based on the accuracy and reproducibility of assembly. Three (GCG, FAB, and AutoAssembler) consistently produced consensus sequences with low error and high reproducibility. Intermediate results were obtained with five other programs (Sequencher, AssemblyLIGN, XBAP, SeqMan, and AutoAssembler in a mode that made use of an external special processor). Less satisfactory results were obtained with the remaining three programs (GeneWorks, GENeration, and PC/Gene). The ability of the programs to edit the assembled sequence was also compared. Five of the programs were able to display and edit automatic sequencer trace files. The Sequencher program had a particularly well-designed sequence editor that allowed rapid examination and correction of assembly errors.
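The test procedure described above (random overlapping fragments, injected errors, and a difference count against the original) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `fragment`, `add_errors`, and `differences`, the `coverage` parameter, and the flat `weights` tuple are all hypothetical; in particular, the weights are a simple stand-in for the error distribution template derived from the RATMDRM data, and edit distance is only one plausible way to count differences between two sequences.

```python
import random


def fragment(seq, min_len=200, max_len=400, coverage=5, rng=None):
    """Cut random overlapping fragments until the summed fragment length
    reaches coverage * len(seq). Assumes len(seq) >= max_len.
    The coverage target is an assumption, not a value from the study."""
    rng = rng or random.Random()
    frags, total = [], 0
    while total < coverage * len(seq):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(seq) - length)
        frags.append(seq[start:start + length])
        total += length
    return frags


def add_errors(frag, rate, weights=(1, 1, 1), rng=None):
    """Corrupt each base with probability `rate`.
    weights = relative odds of (misidentified, missed, duplicated) bases;
    a hypothetical stand-in for the empirical error template."""
    rng = rng or random.Random()
    out = []
    for base in frag:
        if rng.random() >= rate:
            out.append(base)  # base read correctly
            continue
        kind = rng.choices(("substitute", "delete", "duplicate"),
                           weights=weights)[0]
        if kind == "substitute":
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        elif kind == "duplicate":
            out.append(base * 2)
        # "delete": the base is dropped, so nothing is appended
    return "".join(out)


def differences(a, b):
    """Levenshtein (edit) distance between two sequences, used here as
    an assumed measure of the number of differences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # base missing from b
                           cur[j - 1] + 1,             # extra base in b
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = cur
    return prev[-1]
```

Under these assumptions, one test set at a given error level would be produced with `[add_errors(f, 0.05) for f in fragment(reference)]`, handed to an assembler, and the resulting consensus scored with `differences(consensus, reference)`. Passing a seeded `random.Random` as `rng` makes a run repeatable, which is useful when checking whether fragment order alone changes the result.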