Predicting Cell Types and Genetic Variations Contributing to Disease by Combining GWAS and Epigenetic Data
Genome-wide association studies (GWASs) identify single nucleotide polymorphisms (SNPs) that are enriched in individuals suffering from a given disease. Most disease-associated SNPs fall into non-coding regions, so that it is not straightforward to infer phenotype or function; moreover, many SNPs are in tight genetic linkage, so that a SNP identified as associated with a particular disease may not itself be causal, but rather signify the presence of a linked SNP that is functionally relevant to disease pathogenesis. Here, we present an analysis method that takes advantage of the recent rapid accumulation of epigenomics data to address these problems for some SNPs. Using asthma as a prototypic example; we show that non-coding disease-associated SNPs are enriched in genomic regions that function as regulators of transcription, such as enhancers and promoters. Identifying enhancers based on the presence of the histone modification marks such as H3K4me1 in different cell types, we show that the location of enhancers is highly cell-type specific. We use these findings to predict which SNPs are likely to be directly contributing to disease based on their presence in regulatory regions, and in which cell types their effect is expected to be detectable. Moreover, we can also predict which cell types contribute to a disease based on overlap of the disease-associated SNPs with the locations of enhancers present in a given cell type. Finally, we suggest that it will be possible to re-analyze GWAS studies with much higher power by limiting the SNPs considered to those in coding or regulatory regions of cell types relevant to a given disease.