Prediction of distant residue contacts with the use of evolutionary information.
In this work we present a novel correlated mutations analysis (CMA) method that is significantly more accurate than previously reported CMA methods. Correlation coefficients are calculated from physicochemical properties of residues (predictors) rather than from substitution matrices. This yields reliable prediction of residue pairs that are distant in the protein sequence but proximal in the three-dimensional tertiary structure. Multiple sequence alignments (MSAs), each containing a sequence of known structure, were selected for 127 families from the PFAM database so that all major protein architectures described in the CATH classification database are represented. Protein sequences in the selected families were filtered so that only those evolutionarily close to the target protein remained in the MSA. For the alpha-beta class of proteins, the average accuracy was 26.8% of predicted proximal pairs, with an average improvement over random accuracy (IOR) of 6.41. The average accuracy was 20.6% for the mainly beta class and 14.4% for the mainly alpha class. The optimum correlation coefficient cutoff (cc cutoff) was found to be around 0.65. The first predictor, which correlates with hydrophobicity, provides the most reliable results. The other two predictors give good predictions that can be used in conjunction with those of the first. When a stricter cc cutoff is chosen, the average accuracy increases significantly (38.76% for the alpha-beta class), but the trade-off is a smaller number of predictions. The use of solvent-accessible area estimates to filter false positives out of the predictions is promising. Copyright 2005 Wiley-Liss, Inc.
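
The idea of scoring residue pairs by a property-based correlation and keeping those above a cc cutoff can be illustrated with a minimal sketch. It is not the method of this work: the abstract does not specify the predictors or the correlation formula, so the sketch assumes the Kyte-Doolittle hydropathy scale as a stand-in for the hydrophobicity-related predictor, a plain column-wise Pearson correlation, a hypothetical minimum sequence separation of 6, and IOR taken as prediction accuracy divided by the accuracy expected from random pair selection. The function names are illustrative only.

```python
from itertools import combinations
from math import sqrt

# Kyte-Doolittle hydropathy scale: an assumed stand-in predictor,
# not the predictor used in the paper.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a fully conserved column carries no covariation signal
    return sxy / sqrt(sxx * syy)


def predict_pairs(msa, scale=KD, cc_cutoff=0.65, min_separation=6):
    """Return (i, j, cc) for column pairs whose property values correlate.

    msa is a list of equal-length aligned sequences. Columns containing
    gaps or residues missing from the scale are skipped (a simplification).
    min_separation is a hypothetical threshold for "distant in sequence".
    """
    length = len(msa[0])
    columns = {}
    for i in range(length):
        residues = [seq[i] for seq in msa]
        if all(r in scale for r in residues):
            columns[i] = [scale[r] for r in residues]
    pairs = []
    for i, j in combinations(sorted(columns), 2):
        if j - i < min_separation:
            continue
        cc = pearson(columns[i], columns[j])
        if cc >= cc_cutoff:
            pairs.append((i, j, cc))
    return pairs


def accuracy_and_ior(predicted, contacts, n_candidate_pairs):
    """Accuracy of predicted pairs and its ratio to random accuracy.

    contacts is a set of (i, j) pairs known to be proximal in the
    structure; n_candidate_pairs is the number of pairs a random
    predictor could choose from. The IOR definition is an assumption.
    """
    if not predicted or not contacts:
        return 0.0, 0.0
    hits = sum((i, j) in contacts for i, j, _ in predicted)
    accuracy = hits / len(predicted)
    random_accuracy = len(contacts) / n_candidate_pairs
    return accuracy, accuracy / random_accuracy


if __name__ == "__main__":
    # Toy alignment: only the first and last columns vary, and they co-vary.
    toy_msa = ["IGGGGGGI", "LGGGGGGL", "AGGGGGGA", "KGGGGGGK"]
    predicted = predict_pairs(toy_msa)
    print(predicted)
    print(accuracy_and_ior(predicted, {(0, 7)}, n_candidate_pairs=3))
```

Raising `cc_cutoff` in this sketch shrinks the list of predicted pairs, which mirrors the trade-off noted above between accuracy and the number of predictions.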