Scoring functions and binding affinity prediction: Can we do it?
The design of an ideal scoring
function for protein−protein docking that would also predict the binding affinity of a complex is one of the challenges
in structural proteomics. Such a scoring function would open the route to in silico, large-scale annotation and prediction
of complete interactomes.
Here we present a protein−protein
binding affinity benchmark consisting of binding constants (Kd’s) for 81 complexes. This benchmark was used to assess the performance
of nine commonly used scoring algorithms, along with a free-energy prediction algorithm, in their ability to predict binding affinities.
Our results reveal a poor correlation between binding affinity and scores for all algorithms tested. However, the diversity and validity
of the benchmark are highlighted when binding affinity data are categorized according to the methodology by which they were determined.
By further classifying the complexes into low, medium and high affinity groups, significant correlations emerge, some of which are
retained after dividing the data into more classes, showing the robustness of these correlations.
Despite this, accurate prediction of binding affinity remains outside our reach due to the large associated standard deviations of the average score
within each group. Therefore, improvements of existing scoring functions or design of new consensus tools will be required for accurate
prediction of the binding affinity of a given protein−protein complex.
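The grouping analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the conversion ΔG = RT ln(Kd) is standard thermodynamics, but the temperature of 298.15 K, the class cutoffs (1 nM and 1 µM), the function names and the toy data are all hypothetical and do not come from the benchmark.

```python
import math
from statistics import mean, stdev

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_dg(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L.

    The default temperature is an assumption; use the experimental one if known.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)


def affinity_class(kd_molar, high_cutoff=1e-9, low_cutoff=1e-6):
    """Assign a coarse affinity class. The cutoffs are illustrative assumptions."""
    if kd_molar < high_cutoff:
        return "high"
    if kd_molar < low_cutoff:
        return "medium"
    return "low"


def score_stats_by_class(entries):
    """Group (Kd, score) pairs by affinity class; return {class: (mean, stdev)}."""
    groups = {}
    for kd, score in entries:
        groups.setdefault(affinity_class(kd), []).append(score)
    return {
        name: (mean(scores), stdev(scores) if len(scores) > 1 else 0.0)
        for name, scores in groups.items()
    }


# Made-up (Kd, score) pairs for illustration only; these are NOT benchmark data.
toy_entries = [
    (1e-11, -120.0), (5e-10, -95.0),
    (2e-8, -100.0), (4e-7, -70.0),
    (3e-6, -85.0), (1e-5, -60.0),
]

for name, (avg, sd) in score_stats_by_class(toy_entries).items():
    print(f"{name:>6}: mean score {avg:.1f} +/- {sd:.1f}")
```

Even when the class means are ordered as expected, a standard deviation comparable to the gap between the means makes it impossible to place a single complex in a class from its score alone, which is the limitation noted above.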
In a collaborative effort with the groups of Prof. Janin (Université Paris-Sud), Dr. P. Bates (Cancer Research UK, London) and Prof. Z. Weng
(University of Massachusetts Medical School, Worcester) to extend this benchmark, we have discovered a number of discrepancies
in some of the reported values and entries of our previously published benchmark.
Accordingly, only 46
of the 81 reported binding affinities can be considered fully accurate. We have reanalyzed the accuracy of the various scoring functions on this subset and, although correlations
are slightly improved (sqrt(R)<0.3), current functions still have no predictive capacity.
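A re-evaluation of this kind amounts to recomputing a correlation coefficient on a filtered set of complexes. The sketch below assumes a Pearson correlation between scores and binding free energies; the source does not state which coefficient was used, and the record layout, the identifiers such as "cplx_A" and all numbers are hypothetical placeholders rather than benchmark entries.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def correlation_on_subset(records, keep_ids=None):
    """Correlate free energy with score, optionally on a curated subset.

    records maps a complex identifier to a (delta_g, score) pair;
    keep_ids, if given, restricts the calculation to those identifiers.
    """
    ids = records.keys() if keep_ids is None else [i for i in keep_ids if i in records]
    pairs = [records[i] for i in ids]
    delta_gs = [dg for dg, _ in pairs]
    scores = [score for _, score in pairs]
    return pearson_r(delta_gs, scores)


# Hypothetical records: identifier -> (delta G in kcal/mol, docking score).
toy_records = {
    "cplx_A": (-14.0, -110.0),
    "cplx_B": (-11.5, -80.0),
    "cplx_C": (-9.0, -95.0),
    "cplx_D": (-7.5, -60.0),
    "cplx_E": (-12.0, -40.0),  # pretend this entry failed curation
}
curated_ids = ["cplx_A", "cplx_B", "cplx_C", "cplx_D"]

print("all entries:   r =", round(correlation_on_subset(toy_records), 2))
print("curated subset: r =", round(correlation_on_subset(toy_records, curated_ids), 2))
```

Keeping the curated identifiers in a separate list, rather than deleting entries from the data, makes it easy to compare the full and the filtered benchmark side by side.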
We have developed the protein-protein binding affinity benchmark to be of general use to the docking community for the development of scoring functions. We welcome all suggestions
aimed at improving or expanding the benchmark.
You can email suggestions to Alexandre M.J.J. Bonvin.
Benchmark versions and correlation studies
Updates for the protein-protein binding affinity benchmark and correlation studies of protein-protein structures and their affinities will be posted here.
- 23-03-2010 | The initial version of the protein-protein affinity benchmark and the correlations of scoring functions to affinities are described in our original publication.
- 03-11-2010 | Benchmark version 1.0: Major updates. A community-wide effort is under way to review, extend and annotate the benchmark: 46/81 complexes
are considered fully accurate for benchmarking studies (either their affinity data or their 3D structures). PDB IDs of the 46 complexes along with their binding
affinities can be downloaded here.
- 05-11-2010 | Reanalyzing the accuracy of scoring functions in binding affinity prediction: We have re-evaluated whether scoring functions can predict the affinities of
protein-protein complexes, considering the 46 high-quality complexes present in our updated benchmark. Although the correlations clearly improve, scoring is still
limited in predicting binding affinities. A figure illustrating the correlations of the scoring functions can be viewed here.
Citing the benchmark
When using the protein-protein binding affinity benchmark or discussing binding affinity prediction of protein-protein complexes, please cite the following reference:
- Kastritis, P.L. and A.M.J.J. Bonvin (2010) Are Scoring Functions in Protein−Protein Docking Ready To Predict Interactomes? Clues from a Novel Binding Affinity Benchmark.
Journal of Proteome Research, 9(5) 2216-2225.