A Likelihood-Based Framework for Variant Calling and De Novo Mutation Detection in Families
Family samples, which can be enriched for rare causal variants by focusing on families with multiple extreme individuals and which facilitate detection of de novo mutation events, provide an attractive resource for next-generation sequencing studies. Here, we describe, implement, and evaluate a likelihood-based framework for analysis of next-generation sequence data in family samples. Our framework identifies variant sites accurately, assigns individual genotypes, and models de novo mutation events, increasing the sensitivity and specificity of both variant calling and de novo mutation detection. Through simulations, we show that explicit modeling of family relationships is especially useful for analyses of low-frequency variants and that genotype accuracy increases with the number of individuals sequenced per family. Compared with the standard approach of ignoring relatedness, our methods identify and accurately genotype more variants and have high specificity for detecting de novo mutation events. The improvement in accuracy over the standard approach is particularly pronounced for low-frequency variants. Furthermore, the family-aware calling framework dramatically reduces Mendelian inconsistencies and is beneficial for family-based analysis. We hope our framework and software will facilitate continuing efforts to identify genetic factors underlying human diseases.
New sequencing methods can be used to study how genetic variation contributes to disease. For studies of rare variation, family designs are especially attractive because they allow even very rare variants to be observed in multiple individuals and because they can be used to study the impact of de novo mutation events. An important challenge is that most raw sequencing data include many errors. Here, we develop a new approach for interpreting sequence data.
We show that by analyzing sequence data across many family members together, it is possible to greatly reduce error rates (measured either as the number of true variants that are missed or as the number of false variants that are claimed). In addition to facilitating detection and genotyping of SNPs, our methods can interface with existing tools to improve calling accuracy for more challenging short insertion-deletion polymorphisms and other types of variants. Our methods should make studies of families even more attractive because, in addition to making it easy to study rare variants and de novo mutation events, family studies will now be able to better transform sequence data into accurate genotypes.
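To make the idea of family-aware calling concrete, the following is a minimal sketch of joint genotype calling for a single parent-offspring trio at one biallelic site. It is not the implementation described here. The function names (`hwe_prior`, `transmit_prob`, `child_prior`, `call_trio`), the Hardy-Weinberg prior for the parents, the default alternate allele frequency, and the per-allele mutation rate `mu` are all assumptions introduced for illustration. Genotypes are coded as the number of alternate alleles (0, 1, or 2), and each individual's input is a hypothetical vector of genotype likelihoods P(reads | genotype).

```python
from itertools import product


def hwe_prior(alt_freq):
    """Hardy-Weinberg prior over genotypes 0, 1, 2 for a founder (assumed model)."""
    p = alt_freq
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def transmit_prob(parent_gt, mu):
    """Probability that a parent passes on the alternate allele.

    Mendelian transmission (parent_gt / 2), with an assumed symmetric
    per-allele mutation rate mu that can flip the transmitted allele.
    """
    mendelian = parent_gt / 2.0
    return mendelian * (1 - mu) + (1 - mendelian) * mu


def child_prior(child_gt, mother_gt, father_gt, mu):
    """P(child genotype | parental genotypes), allowing de novo mutation."""
    a = transmit_prob(mother_gt, mu)
    b = transmit_prob(father_gt, mu)
    return [(1 - a) * (1 - b), a * (1 - b) + (1 - a) * b, a * b][child_gt]


def call_trio(lik_mother, lik_father, lik_child, alt_freq=0.01, mu=1e-8):
    """Return the most probable (mother, father, child) genotypes and the
    full joint posterior over all 27 genotype configurations."""
    prior = hwe_prior(alt_freq)
    joint = {}
    for m, f, c in product(range(3), repeat=3):
        joint[(m, f, c)] = (
            prior[m] * prior[f] * child_prior(c, m, f, mu)
            * lik_mother[m] * lik_father[f] * lik_child[c]
        )
    total = sum(joint.values())
    posterior = {gts: value / total for gts, value in joint.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior
```

In this sketch, a child whose reads weakly favor a heterozygous genotype is still called homozygous reference when both parents are confidently homozygous reference, because a de novo event is a priori very unlikely. When the child's evidence for the heterozygous genotype is strong, the posterior for the configuration `(0, 0, 1)` dominates and can serve as a simple de novo score. Calling each individual independently, by contrast, would use only that individual's own likelihood vector.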