FCC Clustering: a fast clustering algorithm for biomolecular complexes
Structure prediction methods generate a large number of models of which only a fraction matches the biologically relevant structure. To identify this (near-)native model, we often employ clustering algorithms, based on the assumption that, in the energy landscape of every biomolecule, its native state lies in a wide basin neighboring other structurally similar states.

RMSD-based clustering, the current method of choice, is inadequate for large multi-molecular complexes, particularly when their components are symmetric.
We developed a novel clustering strategy that is based on a very efficient similarity measure: the fraction of common contacts (FCC). The outcome of this calculation is a number between 0 and 1, which corresponds to the fraction of residue-residue contacts that are present in both the reference and the mobile complex.
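As a rough illustration of the similarity measure described above, the following Python sketch computes a fraction of common contacts from two sets of residue pairs. It is not the FCC implementation: the function name `fraction_common_contacts`, the `(chain, residue number)` representation, and the choice to normalize by the number of reference contacts are all assumptions made for this example.

```python
from typing import Set, Tuple

# Hypothetical representation: a residue is (chain ID, residue number),
# and a contact is a pair of residues from two different chains.
Residue = Tuple[str, int]
Contact = Tuple[Residue, Residue]


def fraction_common_contacts(reference: Set[Contact], mobile: Set[Contact]) -> float:
    """Return the share of reference contacts that also occur in the mobile complex.

    Assumption: the score is normalized by the size of the reference set,
    so swapping the two arguments can give a different value.
    """
    if not reference:
        return 0.0
    return len(reference & mobile) / len(reference)


# Toy contact lists, invented for illustration only.
reference = {
    (("A", 10), ("B", 5)),
    (("A", 11), ("B", 5)),
    (("A", 12), ("B", 7)),
    (("A", 15), ("B", 9)),
}
mobile = {
    (("A", 10), ("B", 5)),
    (("A", 12), ("B", 7)),
    (("A", 20), ("B", 9)),
}

print(fraction_common_contacts(reference, mobile))  # 0.5 (2 of 4 reference contacts)
```

Because the comparison is a set intersection on contact identifiers, no superposition or atom-by-atom equivalence between the two models is needed in this sketch.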
Advantages of FCC clustering vs. RMSD-based clustering:
  • 100 times faster on average.
  • Handles symmetry by considering complexes as entities instead of collections of chains.
  • Does not require atom equivalence (clusters mutants, missing loops, etc.).
  • Handles any molecule type (protein, DNA, RNA, carbohydrates, lipids, ligands, etc).
  • Allows multiple levels of "resolution": chain-chain contacts, residue-residue contacts, residue-atom contacts, etc.
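The "resolution" levels in the list above can be pictured as different ways of truncating the same contact identifiers. The sketch below is only an assumption about how such coarsening might look; the `coarsen` helper and the `(chain, residue number, atom name)` tuples are hypothetical, and mixed levels such as residue-atom contacts are not covered.

```python
from typing import Set, Tuple

# Hypothetical atom-level identifier: (chain ID, residue number, atom name).
Atom = Tuple[str, int, str]
AtomContact = Tuple[Atom, Atom]

# Number of identifier fields kept at each level of detail.
DEPTH = {"chain": 1, "residue": 2, "atom": 3}


def coarsen(contacts: Set[AtomContact], level: str) -> Set[tuple]:
    """Truncate both partners of every contact to the requested level.

    Contacts that become identical after truncation collapse into one entry.
    """
    depth = DEPTH[level]
    return {(a[:depth], b[:depth]) for a, b in contacts}


# Toy atom-atom contacts, invented for illustration only.
atom_contacts = {
    (("A", 10, "CA"), ("B", 5, "CB")),
    (("A", 10, "CB"), ("B", 5, "CB")),
    (("A", 11, "N"), ("B", 7, "O")),
}

for level in ("chain", "residue", "atom"):
    print(level, len(coarsen(atom_contacts, level)))
```

In this toy case the three atom-atom contacts reduce to two residue-residue contacts and a single chain-chain contact, so the same set-based comparison can be run at whichever level suits the system.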
Comparison of FCC clustering and RMSD-based clustering
You can email suggestions to Alexandre M.J.J. Bonvin.
Citing FCC
When using the FCC algorithm to cluster biomolecular structures, please cite the following reference:
  • Rodrigues JPGLM et al. (2012) Clustering biomolecular complexes by residue contacts similarity. Proteins 80:1810–1817.