A probabilistic model for secondary structure prediction from protein chemical shifts
Protein chemical shifts encode detailed structural information that is difficult and computationally costly to describe at a fundamental level. Statistical and machine learning approaches have been used to infer correlations between chemical shifts and secondary structure from experimental data. These methods range from simple statistics such as the chemical shift index to complex methods using neural networks. Notwithstanding their higher accuracy, more complex approaches tend to obscure the relationship between secondary structure and chemical shift and often involve many parameters that need to be trained. We present hidden Markov models (HMMs) with Gaussian emission probabilities to model the dependence between protein chemical shifts and secondary structure. The continuous emission probabilities are modeled as conditional probabilities for a given amino acid and secondary structure type. Using these distributions as outputs of first- and second-order HMMs, we achieve a prediction accuracy of 82.3%, which is competitive with existing methods for predicting secondary structure from protein chemical shifts. Incorporation of sequence-based secondary structure prediction into our HMM improves the prediction accuracy to 84.0%. Our findings suggest that an HMM with correlated Gaussian distributions conditioned on the secondary structure provides an adequate generative model of chemical shifts. Proteins 2012. © 2012 Wiley Periodicals, Inc.
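To make the model structure concrete, the following is a minimal sketch of a first-order HMM with Gaussian emissions conditioned on amino acid and secondary structure type, decoded with the Viterbi algorithm. It is not the implementation behind the reported accuracies. It assumes a single chemical shift per residue (for example the Cα shift) with a univariate Gaussian, whereas the abstract describes correlated Gaussian distributions over several shifts. The three-state alphabet, all parameter values, and the names `EMISSION`, `TRANS`, `START`, `log_gauss`, and `viterbi` are hypothetical and chosen only for illustration.

```python
import math

# Hidden states: helix, strand, coil (assumed three-state alphabet).
STATES = ("H", "E", "C")

# Hypothetical emission parameters: (mean, std) in ppm of one chemical shift
# for each (amino acid, state) pair. Illustrative values, not fitted to data.
EMISSION = {
    ("A", "H"): (54.8, 1.0),
    ("A", "E"): (51.0, 1.2),
    ("A", "C"): (52.5, 1.5),
    ("V", "H"): (66.0, 1.3),
    ("V", "E"): (60.8, 1.3),
    ("V", "C"): (62.3, 1.8),
}

# Hypothetical first-order transition and start probabilities.
TRANS = {
    "H": {"H": 0.90, "E": 0.01, "C": 0.09},
    "E": {"H": 0.01, "E": 0.85, "C": 0.14},
    "C": {"H": 0.10, "E": 0.10, "C": 0.80},
}
START = {"H": 0.3, "E": 0.2, "C": 0.5}


def log_gauss(x, mean, std):
    """Log density of a univariate Gaussian at x."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


def viterbi(sequence, shifts):
    """Return the most probable state string for a sequence and its shifts."""
    if len(sequence) != len(shifts):
        raise ValueError("sequence and shifts must have the same length")
    if not sequence:
        return ""

    # Initialise with start probabilities and the first emission.
    score = {
        s: math.log(START[s]) + log_gauss(shifts[0], *EMISSION[(sequence[0], s)])
        for s in STATES
    }
    back = []  # one dict of back-pointers per position after the first

    for aa, x in zip(sequence[1:], shifts[1:]):
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor of state s under the first-order transitions.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev]
                + math.log(TRANS[prev][s])
                + log_gauss(x, *EMISSION[(aa, s)])
            )
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))
```

Calling `viterbi("AAAA", [54.8, 54.8, 54.8, 54.8])` decodes four alanines whose shifts sit at the assumed helix mean. A second-order variant would condition each transition on the two preceding states, and a multivariate version would replace `log_gauss` with a full-covariance Gaussian log density over all measured shifts.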