Link functions in multi-locus genetic models: implications for testing, prediction, and interpretation.
"Complex" diseases are, by definition, influenced by multiple causes, both genetic and environmental, and statistical work on the joint action of multiple risk factors has, for more than 40 years, been dominated by the generalized linear model (GLM). In genetics, models for dichotomous traits have traditionally been approached via the model of an underlying, normally distributed liability. This corresponds to the GLM with binomial errors and a probit link function. Elsewhere in epidemiology, however, the logistic regression model, a GLM with logit link function, has been the tool of choice, largely because of its convenient properties in case-control studies. The choice of link function has usually been dictated by mathematical convenience, but it has important implications for (a) the choice of association test statistic in the presence of existing strong risk factors, (b) the ability to predict disease from genotype given its heritability, and (c) the definition and interpretation of epistasis (or epistacy). These issues are reviewed, and a new association test is proposed. © 2012 Wiley Periodicals, Inc.
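As an illustration of point (c), the following minimal sketch shows how the presence of statistical interaction between two loci can depend on the link function. It assumes two hypothetical binary risk genotypes whose effects are exactly additive on the liability (probit) scale; the baseline liability and effect sizes are arbitrary illustrative values, not taken from the paper, and the code is not the association test the paper proposes.

```python
from math import log
from statistics import NormalDist

# Standard normal distribution: cdf is the inverse probit link, inv_cdf the probit link.
STD_NORMAL = NormalDist()


def probit_risk(baseline, effect_a, effect_b, genotype_a, genotype_b):
    """Disease risk when two loci act additively on the liability (probit) scale."""
    liability_shift = baseline + effect_a * genotype_a + effect_b * genotype_b
    return STD_NORMAL.cdf(liability_shift)


def logit(p):
    """Logit link: log odds of a probability."""
    return log(p / (1.0 - p))


def interaction_contrast(risks, link):
    """Interaction contrast on a chosen scale: f(p11) - f(p10) - f(p01) + f(p00)."""
    return (link(risks[(1, 1)]) - link(risks[(1, 0)])
            - link(risks[(0, 1)]) + link(risks[(0, 0)]))


# Hypothetical values: baseline liability -2.0, each risk genotype adds 1.0.
risks = {
    (a, b): probit_risk(-2.0, 1.0, 1.0, a, b)
    for a in (0, 1)
    for b in (0, 1)
}

probit_contrast = interaction_contrast(risks, STD_NORMAL.inv_cdf)
logit_contrast = interaction_contrast(risks, logit)

print(f"interaction contrast on probit scale: {probit_contrast:.6f}")
print(f"interaction contrast on logit scale:  {logit_contrast:.6f}")
```

Under these assumptions the contrast is zero (up to rounding) on the probit scale but clearly nonzero on the logit scale, so the same set of risks would be described as showing no epistasis under one link and epistasis under the other.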