Resolving the polymorphism-in-probe problem is critical for correct interpretation of expression QTL studies
Polymorphisms in the target mRNA sequence can greatly affect the binding affinity of microarray probe sequences, leading to false-positive and false-negative expression quantitative trait locus (QTL) signals with any other polymorphisms in linkage disequilibrium. We provide the most complete solution to this problem, by using the latest genome and exome sequence reference data to identify almost all common polymorphisms (frequency >1% in Europeans) in probe sequences for two commonly used microarray panels (the gene-based Illumina Human HT12 array, which uses 50-mer probes, and exon-based Affymetrix Human Exon 1.0 ST array, which uses 25-mer probes). We demonstrate the impact of this problem using cerebellum and frontal cortex tissues from 438 neuropathologically normal individuals. We find that although only a small proportion of the probes contain polymorphisms, they account for a large proportion of apparent expression QTL signals, and therefore result in many false signals being declared as real. We find that the polymorphism-in-probe problem is insufficiently controlled by previous protocols, and illustrate this using some notable false-positive and false-negative examples in MAPT and PRICKLE1 that can be found in many eQTL databases. We recommend that both new and existing eQTL data sets should be carefully checked in order to adequately address this issue.