Our group is interested in how single nucleotide polymorphisms (SNPs) affect the structure or function of proteins, and we have developed a method for predicting the changes in protein stability caused by mutations. Site Directed Mutator (SDM) is a statistical potential energy function, developed by Topham et al. [1], that predicts the effect of SNPs on protein stability. By analogy with the folding-unfolding cycle in Figure 1, the algorithm uses a set of conformationally constrained environment-specific substitution tables (ESSTs) to calculate the difference between the stability scores of the folded and unfolded states for the wild-type and mutant protein structures.
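The cycle in Figure 1 can be written out explicitly. The derivation below is a sketch under an assumed sign convention: stability is expressed as the unfolding free energy G(U) − G(F), so a positive ΔΔG means the mutant is more stable. The source does not state which convention SDM uses. The derivation shows why only the two "mutation" legs of the cycle are needed. ΔΔG equals the free-energy change of mutating the unfolded state minus that of mutating the folded state, which corresponds to the folded- and unfolded-state scores described above.

```latex
\begin{aligned}
\Delta G_{\mathrm{stab}} &= G(\mathrm{U}) - G(\mathrm{F}) \\
\Delta\Delta G &= \Delta G_{\mathrm{stab}}^{\mathrm{mut}} - \Delta G_{\mathrm{stab}}^{\mathrm{wt}} \\
&= \bigl[G(\mathrm{U}_{\mathrm{mut}}) - G(\mathrm{F}_{\mathrm{mut}})\bigr] - \bigl[G(\mathrm{U}_{\mathrm{wt}}) - G(\mathrm{F}_{\mathrm{wt}})\bigr] \\
&= \underbrace{\bigl[G(\mathrm{U}_{\mathrm{mut}}) - G(\mathrm{U}_{\mathrm{wt}})\bigr]}_{\Delta G_{\mathrm{U}}\ \text{(mutation in unfolded state)}}
 - \underbrace{\bigl[G(\mathrm{F}_{\mathrm{mut}}) - G(\mathrm{F}_{\mathrm{wt}})\bigr]}_{\Delta G_{\mathrm{F}}\ \text{(mutation in folded state)}}
\end{aligned}
```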


Figure 1: The thermodynamic cycle can be used to calculate protein stability changes between wild-type and mutant proteins.

The method was benchmarked on a set of 855 mutants from 17 proteins with experimental thermodynamic measurements. In predicting the magnitude of the stability change upon mutation, SDM achieved a correlation coefficient of 0.58 between predicted and measured values (Figure 2).


Figure 2: Scatter plot showing the experimentally measured energy changes versus the predicted energy changes made by SDM for the set of 855 mutant proteins. The correlation is 0.58 and the accuracy is 71%.
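The two figures quoted for Figure 2 can be computed as follows. This is a minimal sketch. It assumes the correlation is Pearson's r and that "accuracy" means the fraction of mutants whose predicted ΔΔG has the same sign as the measured one. The source does not define accuracy, and the ΔΔG values below are made up for illustration, not taken from the 855-mutant set.

```python
import math
from typing import Sequence


def pearson_r(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson's correlation coefficient between measured and predicted ddG values."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    ss_m = sum((m - mean_m) ** 2 for m in measured)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(ss_m * ss_p)


def sign_accuracy(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Fraction of mutants whose predicted ddG has the same sign as the measured ddG.

    Assumption: values > 0 form one class and values <= 0 the other. As long as
    measured and predicted values share a sign convention, which class is called
    "stabilizing" does not change the result.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    agree = sum((m > 0) == (p > 0) for m, p in zip(measured, predicted))
    return agree / len(measured)


if __name__ == "__main__":
    # Hypothetical ddG values (kcal/mol) for five mutants.
    measured = [-1.2, -2.5, 0.4, -0.3, 1.1]
    predicted = [-0.8, -1.9, -0.2, -0.6, 0.7]
    print(f"r = {pearson_r(measured, predicted):.2f}")
    print(f"accuracy = {sign_accuracy(measured, predicted):.0%}")  # accuracy = 80%
```

Pearson's r rewards agreement in magnitude, while sign accuracy only checks the stabilizing/destabilizing call, so the two numbers can diverge on the same predictions.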

The statistical potential-based method PoPMuSiC-2.0 was recently reported to achieve a correlation of 0.63 between measured and predicted stability changes (22), and its predictive power was shown to be significantly higher than that of other programs described in the literature. To compare the predictive power of SDM with that of PoPMuSiC-2.0 and the other methods tested, we used the same data set of 350 mutants. After the two PoPMuSiC algorithms, SDM has the highest linear correlation between predicted and measured ΔΔG values (Table 1). Unlike several of the other methods, it also makes a prediction for every one of the 350 mutants. Encouragingly, the performance of SDM improves when only mutations with large stability changes (|ΔΔG| > 2 kcal/mol, whether stabilizing or destabilizing) are considered: the correlation coefficient increases from 0.52 to 0.63 (Table 1).

Table 1: Comparison of the performance of different prediction methods on the PoPMuSiC-2.0 dataset of 350 mutations.

Method        Number of predictions   Pearson's r*         Standard error (kcal/mol)*
Automute      315                     0.46 / 0.45 / 0.45   1.43 / 1.46 / 1.99
CUPSAT        346                     0.37 / 0.35 / 0.50   1.91 / 1.96 / 2.14
Dmutant       350                     0.48 / 0.47 / 0.57   1.81 / 1.87 / 2.31
Eris          334                     0.35 / 0.34 / 0.49   4.12 / 4.28 / 3.91
I-Mutant-2.0  346                     0.29 / 0.27 / 0.27   1.65 / 1.69 / 2.39
PoPMuSiC-1.0  350                     0.62 / 0.63 / 0.70   1.24 / 1.25 / 1.66
PoPMuSiC-2.0  350                     0.67 / 0.67 / 0.71   1.16 / 1.19 / 1.67
SDM           350                     0.52 / 0.53 / 0.63   1.80 / 1.81 / 2.11
*Three values are given per column. The first corresponds to the whole validation set of 350 mutants, with unavailable ΔΔG predictions set to 0.0 kcal/mol. The second corresponds to the 309 mutants for which all predictors provide a ΔΔG prediction. The third corresponds to the 87 mutants whose experimental ΔΔG exceeds 2 kcal/mol in magnitude and for which all predictors provide a ΔΔG prediction.
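The three evaluation protocols described in the footnote to Table 1 can be sketched as below. This is an illustrative implementation, not the code used to produce the table. The function name `table1_correlations`, the use of `None` for an unavailable prediction and the toy data are all hypothetical. Only the correlation column is reproduced, because the source does not define how the standard error was computed.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r; returns NaN for fewer than two points or zero variance."""
    n = len(x)
    if n < 2:
        return math.nan
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / denom if denom > 0 else math.nan


def table1_correlations(
    measured: List[float],
    predictions: Dict[str, List[Optional[float]]],
    threshold: float = 2.0,
) -> Dict[str, Tuple[float, float, float]]:
    """Return (r_all, r_common, r_large) per method, following the Table 1 footnote.

    r_all:    every mutant, with a missing prediction (None) replaced by 0.0 kcal/mol
    r_common: only mutants for which *all* methods give a prediction
    r_large:  the common mutants whose measured |ddG| exceeds `threshold` kcal/mol
    """
    n = len(measured)
    common = [i for i in range(n) if all(p[i] is not None for p in predictions.values())]
    large = [i for i in common if abs(measured[i]) > threshold]

    results = {}
    for method, preds in predictions.items():
        filled = [0.0 if p is None else p for p in preds]
        r_all = pearson_r(measured, filled)
        r_common = pearson_r([measured[i] for i in common], [preds[i] for i in common])
        r_large = pearson_r([measured[i] for i in large], [preds[i] for i in large])
        results[method] = (r_all, r_common, r_large)
    return results


if __name__ == "__main__":
    # Hypothetical data: six mutants, two methods; method "B" has one missing prediction.
    measured = [-3.0, -0.5, 2.5, -1.0, -2.2, 0.3]
    predictions = {
        "A": [-3.0, -0.5, 2.5, -1.0, -2.2, 0.3],
        "B": [-2.0, None, 1.5, -0.2, -1.8, 0.1],
    }
    for method, (r_all, r_common, r_large) in table1_correlations(measured, predictions).items():
        print(f"{method}: {r_all:.2f} / {r_common:.2f} / {r_large:.2f}")
```

Setting missing predictions to 0.0 penalises methods that skip mutants, which is why the first value can differ from the second even for the same method.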

To compare our method with other published methods, we also tested SDM on a set of 388 mutants with thermodynamic measurements conducted under physiological conditions. SDM performs comparably to or better than the other methods in classifying mutations as stabilizing or destabilizing (Table 2). Sensitivity for stabilizing mutations is generally poor: nine of the 12 methods have a sensitivity of 0.36 or lower for stabilizing mutations, that is, they misclassify at least 64% of them. SDM, however, has a more balanced sensitivity for the two classes (0.70 for stabilizing and 0.71 for destabilizing mutations), although its specificity for destabilizing mutations (0.94) is far better than for stabilizing mutations (0.24). Most mutations are destabilizing, and this is reflected in the thermodynamic datasets of mutants used to develop and test such methods. A method that assigns all samples to the majority class (destabilizing mutations) will therefore achieve high accuracy even though its performance on the minority class (stabilizing mutations) is poor. This trend is observed for most of the methods reported in Table 2.

Table 2: Comparison of SDM with other methods using the set of 388 mutants with thermodynamic measurements conducted under physiological conditions.

Method          MCC    Accuracy   Sensitivity     Specificity     Sensitivity       Specificity
                                  (stabilizing)   (stabilizing)   (destabilizing)   (destabilizing)
Automute S1227  0.31   0.87       0.36            0.42            0.94              0.92
FOLDX           0.25   0.75       0.56            0.26            0.78              0.93
DFIRE           0.11   0.68       0.44            0.18            0.71              0.90
PoPMuSiC-1.0    0.20   0.85       0.25            0.33            0.93              0.90
PoPMuSiC-2.0    0.32   0.86       0.35            0.44            0.94              0.91
I-MUTANT        0.25   0.87       0.21            0.44            0.96              0.90
MUpro-SO        0.26   0.86       0.30            0.40            0.94              0.90
MUpro-TO        0.28   0.86       0.31            0.42            0.94              0.91
MUpro-ST        0.27   0.86       0.31            0.40            0.93              0.91
MuX-S           0.39   0.88       0.29            0.67            0.94              0.91
MuX-48          0.68   0.89       0.29            0.67            0.98              0.91
SDM             0.28   0.71       0.70            0.24            0.71              0.94
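All columns of Table 2 can be derived from a 2×2 confusion matrix, as in the hedged reconstruction below. This is not the code used by the authors of the compared methods. The source does not define "specificity", but the values in Table 2 are consistent with it meaning the fraction of predictions for a class that are correct (precision). Under the usual TN/(TN + FP) definition, SDM's specificity for stabilizing mutations would equal its sensitivity for destabilizing mutations (0.71) rather than 0.24. The example also illustrates the majority-class effect described above, using a hypothetical set of 50 stabilizing and 338 destabilizing mutations. These class sizes are made up, not taken from the 388-mutant dataset.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def _ratio(num: int, den: int) -> float:
    """num/den, or NaN when the denominator is zero (e.g. a class is never predicted)."""
    return num / den if den else math.nan


@dataclass
class StabilityMetrics:
    mcc: float
    accuracy: float
    sens_stabilizing: float
    spec_stabilizing: float      # assumed meaning: precision for the stabilizing class
    sens_destabilizing: float
    spec_destabilizing: float    # assumed meaning: precision for the destabilizing class


def classification_metrics(measured_stab: Sequence[bool],
                           predicted_stab: Sequence[bool]) -> StabilityMetrics:
    """Metrics for stabilizing (True) vs destabilizing (False) calls."""
    pairs = list(zip(measured_stab, predicted_stab))
    tp = sum(m and p for m, p in pairs)          # stabilizing, predicted stabilizing
    fn = sum(m and not p for m, p in pairs)      # stabilizing, predicted destabilizing
    fp = sum(not m and p for m, p in pairs)      # destabilizing, predicted stabilizing
    tn = sum(not m and not p for m, p in pairs)  # destabilizing, predicted destabilizing

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return StabilityMetrics(
        mcc=mcc,
        accuracy=_ratio(tp + tn, len(pairs)),
        sens_stabilizing=_ratio(tp, tp + fn),
        spec_stabilizing=_ratio(tp, tp + fp),
        sens_destabilizing=_ratio(tn, tn + fp),
        spec_destabilizing=_ratio(tn, tn + fn),
    )


if __name__ == "__main__":
    # Hypothetical class sizes: 50 stabilizing and 338 destabilizing mutations.
    measured = [True] * 50 + [False] * 338
    # A trivial predictor that calls every mutation destabilizing (the majority class).
    majority = [False] * len(measured)
    print(classification_metrics(measured, majority))
```

For the trivial majority-class predictor, accuracy is about 0.87, in the same range as most methods in Table 2, even though no stabilizing mutation is identified and the MCC is 0. This is why MCC and per-class sensitivity are more informative than accuracy on such imbalanced data.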

We have applied SDM to the task of identifying disease-associated mutations [2,3] and have shown that it can distinguish disease-associated from non-disease-associated mutations in terms of protein stability. When applied to a large dataset of disease-associated and non-disease-associated mutations, SDM had an accuracy of 61% [3]. This accuracy was comparable to that of the other methods tested, but SDM had a much lower false-positive rate and therefore provides a high-quality set of predictions.

Conclusions:

SDM performs comparably to or better than other published methods in predicting the sign of the ΔΔG change. In addition, SDM shows the least bias between stabilizing and destabilizing mutations, a general problem in the field. We have shown that SDM can distinguish disease-associated mutations from non-disease-associated mutations in terms of protein stability; SDM may therefore be useful for correlating SNPs with diseases caused by protein instability.

References:

[1] Topham CM, Srinivasan N and Blundell TL (1997) Prediction of the stability of protein mutants based on structural environment-dependent amino acid substitution and propensity tables. Protein Eng. 10(1), 7-21.

[2] Worth CL, Burke DF and Blundell TL (2007) Estimating the effects of single nucleotide polymorphisms on protein structure: how good are we at identifying likely disease associated mutations? Proceedings of Molecular Interactions - Bringing Chemistry to Life, 11-26.

[3] Worth CL*, Bickerton GRJ*, Schreyer A, Forman JR, Cheng TMK, Lee S, Gong S, Burke DF and Blundell TL (2007) A structural bioinformatics approach to the analysis of non-synonymous single nucleotide polymorphisms and their relation to disease. J. Bioinform. Comput. Biol. 5(6) (special issue: Making Sense of Mutations Requires Knowledge Management).
*these authors contributed equally to this work

[4] Smith RE, Lovell SC, Burke DF, Montalvao RW and Blundell TL (2007) Andante: reducing side-chain rotamer search space during comparative modeling using environment-specific substitution probabilities. Bioinformatics 23(9), 1099-1105.