Prediction of protein residue contacts with a PDB-derived likelihood matrix
Proteins with similar folds often display common patterns of residue variability. A widely discussed question is how these patterns can be identified and deconvoluted to predict protein structure. In this respect, correlated mutation analysis (CMA) has shown considerable promise. CMA compares multiple members of a protein family and detects residues that remain constant or mutate in tandem. Often this behavior points to structural or functional interdependence between residues. CMA has been used to predict pairs of amino acids that are distant in the primary sequence but likely to form close contacts in the native three-dimensional structure. Until now these methods have used evolutionary or biophysical models to score the fit between residues. We wished to test whether empirical methods, derived from known protein structures, would provide useful predictive power for CMA. We analyzed 672 known protein structures, derived contact likelihood scores for all possible amino acid pairs, and used these scores to predict contacts. We then tested the method on 118 different protein families for which structures have been solved to atomic resolution. The mean performance was almost seven times better than random prediction. Used in concert with secondary structure prediction, the new CMA method could supply restraints for predicting still undetermined structures.