Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation
As researchers begin probing deep coverage sequencing data for increasingly rare mutations and subclonal events, the fidelity of next generation sequencing (NGS) laboratory methods will become increasingly critical. Although error rates for sequencing and polymerase chain reaction (PCR) are well documented, the effects that DNA extraction and other library preparation steps could have on downstream sequence integrity have not been thoroughly evaluated. Here, we describe the discovery of novel C > A/G > T transversion artifacts found at low allelic fractions in targeted capture data. Characteristics such as sequencer read orientation and presence in both tumor and normal samples strongly indicated a non-biological mechanism. We identified the source as oxidation of DNA during acoustic shearing in samples containing reactive contaminants from the extraction process. We show generation of 8-oxoguanine (8-oxoG) lesions during DNA shearing, present analysis tools to detect oxidation in sequencing data and suggest methods to reduce DNA oxidation through the introduction of antioxidants. Further, informatics methods are presented to confidently filter these artifacts from sequencing data sets. Though only seen in a low percentage of reads in affected samples, such artifacts could have profoundly deleterious effects on the ability to confidently call rare mutations, and eliminating other possible sources of artifacts should become a priority for the research community.