Biomolecular Systems of Disease Buried Across Multiple GWAS Unveiled by Information Theory and Ontology.
A key challenge for genome-wide association studies (GWAS) is to understand how single nucleotide polymorphisms (SNPs) mechanistically underpin complex diseases. While this challenge has been partially addressed by Gene Ontology (GO) enrichment of large lists of host genes of SNPs prioritized in GWAS, these enrichments have not been formally evaluated. Here, we develop a novel computational approach anchored in information-theoretic similarity that systematically mines lists of host genes of SNPs prioritized in three adult-onset diabetes mellitus GWAS. The "gold standard" is based on GO terms associated with the host genes of 20 published diabetes SNPs and on our own evaluation. We computationally identify 69 similarity-predicted GO terms that are independently validated in all three GWAS (FDR<5%) and enriched with those of the gold standard (odds ratio=5.89, P=4.81e-05). These terms can be organized by similarity criteria into 11 groupings termed "biomolecular systems". Six biomolecular systems were corroborated by the gold standard and the remaining five were previously uncharacterized. http://lussierlab.org/publications/ITS-GWAS.
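To illustrate what an information-theoretic similarity between ontology terms can look like, the following is a minimal sketch and not the method of this study. It assumes Lin's measure, which the abstract does not name, and it runs on a hypothetical toy ontology with made-up term names and gene identifiers (G1 to G6) rather than on real GO data. The information content of a term is taken as the negative log of the fraction of genes annotated to it or to its descendants.

```python
import math

# Hypothetical toy ontology (child -> parents); NOT the real GO graph.
PARENTS = {
    "root": set(),
    "metabolic_process": {"root"},
    "signaling": {"root"},
    "glucose_metabolism": {"metabolic_process"},
    "lipid_metabolism": {"metabolic_process"},
    "insulin_signaling": {"signaling"},
}

# Hypothetical direct gene annotations; gene IDs are invented.
ANNOTATIONS = {
    "glucose_metabolism": {"G1", "G2"},
    "lipid_metabolism": {"G3"},
    "insulin_signaling": {"G4", "G5"},
    "metabolic_process": {"G6"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagated_genes():
    """Propagate each direct annotation upward to every ancestor term."""
    genes = {term: set() for term in PARENTS}
    for term, direct in ANNOTATIONS.items():
        for anc in ancestors(term):
            genes[anc] |= direct
    return genes


def information_content(genes):
    """IC(t) = log(N / n_t); terms with no annotated genes are skipped."""
    total = len(genes["root"])
    return {t: math.log(total / len(g)) for t, g in genes.items() if g}


IC = information_content(propagated_genes())


def lin_similarity(a, b):
    """Lin similarity: 2 * IC(most informative common ancestor) / (IC(a) + IC(b))."""
    common = ancestors(a) & ancestors(b)
    mica_ic = max(IC[t] for t in common)
    denom = IC[a] + IC[b]
    return 0.0 if denom == 0 else 2 * mica_ic / denom


if __name__ == "__main__":
    for pair in [("glucose_metabolism", "lipid_metabolism"),
                 ("glucose_metabolism", "insulin_signaling")]:
        print(pair, round(lin_similarity(*pair), 3))
```

In this toy setting, two terms that share only the root score zero, while sibling terms under a common informative parent score between zero and one. Grouping terms whose pairwise similarity exceeds a chosen threshold is one plausible way to arrive at clusters analogous to the "biomolecular systems" described above, though the threshold and the clustering rule would both be assumptions.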