- On this page you will find information about the predictor implemented in PRODIGY.
For information about the dataset used to train our predictor, visit the PRODIGY dataset page.
PRODIGY implements our simple but robust descriptor of binding affinity, which is based solely on
structural properties of protein-protein complexes (Vangone & Bonvin, eLife 2015).
Using the protein-protein binding affinity benchmark of Kastritis et al. (2011),
we showed that the number of interfacial contacts (ICs) in a protein-protein complex correlates
with the experimental binding affinity. This information, combined with properties of the non-interacting surfaces (NIS),
which have been shown to influence binding affinity (Kastritis et al., 2014),
has led to one of the best-performing predictors reported so far, and one benchmarked on a large and heterogeneous dataset.
- The predictor
- The predictive model
- In our recent work, we found a direct correlation between the number of interfacial contacts (ICs) in a protein-protein complex
and its binding strength. Our predictor is therefore based on a simple linear regression on the ICs and on some properties
of the non-interacting surfaces (NIS), which have been shown to influence binding affinity
(see Kastritis et al., 2014).
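$$\Delta G_{\text{predicted}} = -0.09459\,\mathrm{ICs}_{\text{charged/charged}} - 0.10007\,\mathrm{ICs}_{\text{charged/apolar}} + 0.19577\,\mathrm{ICs}_{\text{polar/polar}} - 0.22671\,\mathrm{ICs}_{\text{polar/apolar}} + 0.18681\,\%\mathrm{NIS}_{\text{apolar}} + 0.13810\,\%\mathrm{NIS}_{\text{charged}} - 15.9433$$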
- Equation 1. Predictive model implemented in PRODIGY.
- $\mathrm{ICs}_{xxx/yyy}$ is the number of interfacial contacts found at the interface between Interactor1 and Interactor2,
classified according to the polar/apolar/charged nature of the interacting residues (e.g. $\mathrm{ICs}_{\text{charged/apolar}}$
is the number of ICs between charged and apolar residues). Two residues are defined as in contact if any of their heavy atoms
are within a distance of 5.5 Å.
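As an illustration of how Equation 1 combines its terms, here is a minimal sketch that evaluates it directly. The function name `predict_dg` and its argument names are hypothetical (this is not PRODIGY's own code), the coefficients are copied from Equation 1 above, and the two %NIS inputs are assumed to be percentages in the range 0-100.

```python
# Minimal sketch of Equation 1 (hypothetical helper, not PRODIGY's own code).
# Coefficients are copied from Equation 1 on this page; ΔG is in kcal/mol.


def predict_dg(ic_charged_charged: int,
               ic_charged_apolar: int,
               ic_polar_polar: int,
               ic_polar_apolar: int,
               nis_apolar_pct: float,
               nis_charged_pct: float) -> float:
    """Return the predicted binding free energy (kcal/mol) from Equation 1.

    The four IC arguments are contact counts; the two NIS arguments are
    assumed to be percentages in the range 0-100.
    """
    return (-0.09459 * ic_charged_charged
            - 0.10007 * ic_charged_apolar
            + 0.19577 * ic_polar_polar
            - 0.22671 * ic_polar_apolar
            + 0.18681 * nis_apolar_pct
            + 0.13810 * nis_charged_pct
            - 15.9433)


# Hypothetical input values, for illustration only.
dg = predict_dg(10, 20, 5, 30, 35.0, 25.0)
print(f"{dg:.4f} kcal/mol")  # prints -14.7222 kcal/mol
```

Because the model is linear, each extra contact of a given class shifts the prediction by exactly that class's coefficient; for example, one more polar/apolar contact lowers the predicted ΔG by 0.22671 kcal/mol.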
- Our model predicts binding affinities with unprecedented accuracy for such a large and heterogeneous dataset
of complexes (81 in total; see Figure 1),
with a Pearson’s correlation of 0.73 (p-value < 0.0001)
between the predicted and experimental values and an RMSE (root mean square error) of 1.89 kcal mol⁻¹.
Figure 1. Scatter plot of predicted vs experimental binding affinities.
The predictions were made with Equation 1 for the ‘cleaned’ dataset of 81 protein–protein complexes
(see PRODIGY dataset page).
The correlation over all 81 complexes is r = 0.73 (p-value < 0.0001), with an RMSE of 1.89 kcal mol⁻¹.
Rigid complexes are shown as red triangles and flexible complexes as yellow diamonds (see the section "The effect of conformational changes on the
prediction accuracy" below). The x = y line is shown for reference; binding affinities are reported as absolute values.
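The two statistics quoted above, Pearson’s r and the RMSE, can be computed from paired predicted and experimental ΔG values as in the sketch below. The helper names are hypothetical and the input values are made up for illustration; they are not benchmark data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(predicted: Sequence[float], experimental: Sequence[float]) -> float:
    """Root mean square error, in the units of the inputs (here kcal/mol)."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty samples of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical ΔG values (kcal/mol), not taken from the benchmark.
experimental = [-12.1, -9.8, -7.5, -10.4]
predicted = [-11.0, -10.2, -8.9, -9.7]
print(round(pearson_r(predicted, experimental), 3),
      round(rmse(predicted, experimental), 3))
```

This sketch does not compute the p-value; `scipy.stats.pearsonr` returns both the coefficient and a p-value if SciPy is available.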
- Our results were cross-validated with a 4-fold cross-validation approach. More information can be found
online in Supplementary file 3 of
Vangone & Bonvin, eLife (2015).
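For readers unfamiliar with the procedure, the sketch below shows generic 4-fold cross-validation of a linear model: the complexes are shuffled into four folds, an ordinary least-squares fit (with intercept) is made on three folds, and the held-out fold is predicted. This is an assumption-based illustration using NumPy; the exact fold assignment and fitting details used for PRODIGY are described in Supplementary file 3 and may differ. The function names are hypothetical.

```python
import numpy as np


def kfold_indices(n_samples: int, k: int = 4, seed: int = 0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)      # shuffle sample indices
    folds = np.array_split(order, k)        # k folds of near-equal size
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def cross_validate(X: np.ndarray, y: np.ndarray, k: int = 4,
                   seed: int = 0) -> np.ndarray:
    """Return out-of-fold predictions from least squares with an intercept."""
    preds = np.empty_like(y, dtype=float)
    for train, test in kfold_indices(len(y), k, seed):
        # Append a column of ones so the fit includes an intercept term.
        A_train = np.column_stack([X[train], np.ones(len(train))])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_test = np.column_stack([X[test], np.ones(len(test))])
        preds[test] = A_test @ coef
    return preds
```

Here `X` would hold one row per complex with the six Equation 1 features, and `y` the experimental ΔG values; the out-of-fold predictions can then be compared with `y` using Pearson’s r and the RMSE.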
- From the binding affinity (ΔG) predicted with Equation 1, the dissociation constant ($K_d$) is
calculated with the following formula:
- $$\Delta G = RT \ln K_d$$
- where R is the ideal gas constant (in kcal K⁻¹ mol⁻¹), T is the temperature (in K) and ΔG is the
predicted free energy. By default, the temperature is set to 298.15 K (25.0 ℃), but it can be modified by the user.
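Solving ΔG = RT ln Kd for the dissociation constant gives Kd = exp(ΔG / RT). The sketch below applies this with the default temperature of 25.0 ℃. The value of R in kcal K⁻¹ mol⁻¹ is an assumption (derived from 8.314462618 J K⁻¹ mol⁻¹ and 4184 J per kcal), as is the 1 M standard state that gives Kd in molar units; the function name is hypothetical.

```python
import math

# Assumption: R expressed in kcal K^-1 mol^-1, obtained from
# 8.314462618 J K^-1 mol^-1 divided by 4184 J per kcal.
R_KCAL = 8.314462618 / 4184.0


def dg_to_kd(dg_kcal: float, temperature_c: float = 25.0) -> float:
    """Solve ΔG = RT ln Kd for Kd (in M, assuming a 1 M standard state)."""
    t_kelvin = temperature_c + 273.15  # 25.0 °C -> 298.15 K
    return math.exp(dg_kcal / (R_KCAL * t_kelvin))


kd = dg_to_kd(-10.0)
print(f"Kd = {kd:.2e} M")
```

A more negative ΔG yields a smaller Kd, i.e. tighter binding; at 25 ℃ every additional -1.36 kcal/mol lowers Kd roughly tenfold.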
- The effect of conformational changes on the prediction accuracy
In many assemblies, the structure of the free monomers differs from their structure in the oligomeric state
(the ‘bound’ form) because of the association process. The affinity benchmark also reports the interface
RMSD (i_RMSD) between the unbound and bound structures, which measures how much conformational change
takes place upon association. We investigated whether the predictive power of our model depends on the
amplitude of conformational change upon binding by classifying the complexes accordingly (see Figure 1). Predictions made with our
model are less affected by conformational changes than those of the previous models we compared against (see Figure 2),
with only minor differences in performance between rigid (i_RMSD ≤ 1.0 Å, r = 0.75, RMSE = 1.88 kcal mol⁻¹)
and flexible cases (i_RMSD > 1.0 Å, r = 0.73, RMSE = 1.89 kcal mol⁻¹).
This indicates that, in contrast to previous predictors, the number of interfacial residue contacts
is a robust predictor that is hardly influenced by conformational changes.
- The performance
- Our predictor shows a Pearson’s correlation r of 0.73 (p-value < 0.0001) between the predicted and
experimental values and an RMSE (root mean square error) of 1.89 kcal mol⁻¹.
- To perform a fair comparison with previously published methods, we calculated their
performance on the complexes shared by our cleaned dataset, the dataset reported by
Moal et al. (2011), and the
pre-calculated data of the Computational Characterisation of Protein–Protein Interactions (CCharPPI) web server,
resulting in 79 protein–protein complexes (Figure 2).
The models considered include the ‘global surface model’ of
Kastritis et al. (2014), the
BSA-based (buried surface area) model of Horton and Lewis (1992),
the three best-performing methods reported by Moal et al. (2011) and the composite scoring functions reported by the CCharPPI web server.
As shown in Figure 2,
our model outperformed all other methods tested. It is also less sensitive to conformational changes.
Figure 2. Comparison of the performance of our predictor
(Equation 1) with other predictive models reported by Moal et al. (2011) and the CCharPPI web server.
Performance is expressed as Pearson’s correlation coefficient between experimental
and predicted binding affinities. Predictions were made on the set of 79 complexes common to
our cleaned dataset (see PRODIGY dataset page),
the data tested by Moal et al. (2011), and the CCharPPI pre-calculated data.
Correlations for the entire set and for the rigid (43) and flexible (36) complexes are reported as absolute values for easier comparison.
- All associated data can be found online as Supplementary file 4
in Vangone & Bonvin, eLife (2015). Beyond the composite scoring functions of CCharPPI, we also checked the other 99
intermolecular parameters reported by CCharPPI; none of them outperformed our model.
- Number of Contacts at the Interface (ICs)
- We counted the number of contacts at the interface (ICs) for each protein-protein complex in the
benchmark and correlated the ICs with the binding strength of the complex. Two residues were considered in contact if any pair of
atoms, one from each residue, was within a defined cut-off distance. After systematic sampling of this threshold, we fixed it at
5.5 Å (see Figure 3).
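The contact definition above can be sketched as a brute-force search over all inter-chain residue pairs. The data layout (`Chain`, a mapping from residue identifier to heavy-atom coordinates in Å) and the function names are hypothetical, hydrogens are assumed to be removed beforehand, and "within 5.5 Å" is interpreted here as a distance ≤ 5.5 Å.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
# Hypothetical layout: residue identifier -> heavy-atom coordinates in Å
# (hydrogens assumed to be removed beforehand).
Chain = Dict[str, List[Coord]]


def residues_in_contact(atoms_a: Sequence[Coord], atoms_b: Sequence[Coord],
                        cutoff: float = 5.5) -> bool:
    """True if any atom of residue A lies within `cutoff` Å of any atom of residue B."""
    return any(math.dist(a, b) <= cutoff for a in atoms_a for b in atoms_b)


def interfacial_contacts(chain1: Chain, chain2: Chain,
                         cutoff: float = 5.5) -> List[Tuple[str, str]]:
    """All inter-chain residue pairs in contact (brute force over residue pairs)."""
    return [(r1, r2)
            for r1, atoms1 in chain1.items()
            for r2, atoms2 in chain2.items()
            if residues_in_contact(atoms1, atoms2, cutoff)]
```

The `cutoff` parameter makes the threshold scan mentioned above straightforward: counting contacts at several cut-off values only requires repeated calls. For large complexes, a spatial index (for example a k-d tree) would avoid comparing every atom pair.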
Further, residues were classified based on their physico-chemical properties as follows:
- polar: C, H, N, Q, S, T, W
- apolar: A, F, G, I, V, M, P
- charged: E, D, K, R
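Binning contacts by residue class, as needed for the IC terms of Equation 1, can be sketched as follows. The class table is copied from the list above; each contact is assumed to be given as a pair of one-letter residue codes. Leucine (L) and tyrosine (Y) do not appear in the list on this page, so this hypothetical helper skips any residue it cannot classify rather than guessing a class.

```python
from collections import Counter
from typing import Iterable, Tuple

# Residue classes exactly as listed on this page (one-letter codes).
CLASSES = {"polar": "CHNQSTW", "apolar": "AFGIVMP", "charged": "EDKR"}
RESIDUE_CLASS = {aa: cls for cls, letters in CLASSES.items() for aa in letters}
# Fixed ordering so that keys match the naming in Equation 1 (e.g. "charged/apolar").
ORDER = {"charged": 0, "polar": 1, "apolar": 2}


def count_ic_classes(contacts: Iterable[Tuple[str, str]]) -> Counter:
    """Count residue-residue contacts per class pair, e.g. 'polar/apolar'."""
    counts: Counter = Counter()
    for aa1, aa2 in contacts:
        c1, c2 = RESIDUE_CLASS.get(aa1), RESIDUE_CLASS.get(aa2)
        if c1 is None or c2 is None:
            continue  # residue type not covered by the list above: skipped here
        first, second = sorted((c1, c2), key=ORDER.__getitem__)
        counts[f"{first}/{second}"] += 1
    return counts
```

The order of the two residues in a contact does not matter, so a charged/apolar and an apolar/charged pair land in the same bin. Equation 1 uses only four of the six possible class pairs.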
Figure 3. Example of ICs.
- Our model also uses some of the NIS (non-interacting surface) properties.
Details are reported in Kastritis et al. (2014).