WHat Information does Surface Conservation Yield?


WHISCY prediction

You need to supply the structure of your protein and a multiple sequence alignment of your protein sequence with homologous sequences.

  1. You need the structure of your protein as a PDB file.
  2. You need to specify the protein chain. Many PDB files consist of multiple protein molecules (chains). Normally, you want to predict on only one chain.
  3. You need to specify the alignment of the protein (and, when you upload a file, the alignment format).


Interface propensities and surface smoothing are features that are used by WHISCY to improve the predictions.


PDB format

The structure needs to be in PDB (Protein Data Bank) format. The Protein Data Bank is an archive of 3D structure data for biological macromolecules.

You can supply a PDB code, in which case the structure is fetched from the PDB before prediction. Alternatively, you can upload your own PDB file.


Protein chain

Many PDB files consist of multiple protein molecules (protein chains). Normally, you want to predict on only one chain. WHISCY extracts the chain that you specify from the PDB file before prediction. In a PDB file, the chain identifier for each atom is given directly after the amino acid code:

ATOM    125  N   ASN A  17      -1.652  28.627  57.495  1.00 24.69           N
ATOM    126  CA  ASN A  17      -1.282  28.264  56.130  1.00 27.68           C
ATOM    127  C   ASN A  17       0.046  28.805  55.634  1.00 27.56           C
ATOM    128  O   ASN A  17       0.507  28.378  54.582  1.00 28.39           O

where A is the chain identifier.

If your protein structure contains no chain identifiers, or if you want to predict on all chains, specify None as chain. In that case, all chain identifiers will be set to A before prediction.
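
The chain handling described here can be illustrated with a short Python sketch. This is not WHISCY's own code: the function name select_chain is hypothetical, and the sketch assumes fixed-column PDB records in which the chain identifier sits in column 22 of each ATOM line. The second example record (chain B) is invented for illustration.

```python
def select_chain(pdb_text, chain=None):
    """Return the ATOM records of one chain, or all ATOM records relabelled 'A'.

    Hypothetical helper that mimics the behaviour described in the text;
    it is not taken from WHISCY itself.
    """
    selected = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue
        # Pad short records so that column 22 (index 21) always exists.
        line = line.ljust(22)
        if chain is None:
            # No chain requested: set every chain identifier to 'A'.
            selected.append(line[:21] + "A" + line[22:])
        elif line[21] == chain:
            selected.append(line)
    return selected


# First record is from the example above; the chain B record is made up.
example = (
    "ATOM    125  N   ASN A  17      -1.652  28.627  57.495  1.00 24.69           N\n"
    "ATOM    900  N   GLY B   3      10.000  11.000  12.000  1.00 20.00           N\n"
)

chain_a = select_chain(example, "A")      # keeps only the chain A record
relabelled = select_chain(example, None)  # keeps both, relabelled as chain A
```

Reading the identifier by column position rather than by splitting on whitespace matters because PDB fields can run together or be blank, so whitespace splitting is not reliable.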

Protein alignment

If your structure (or a close homolog) is deposited in the PDB, you can use the alignment from the HSSP database. The HSSP database contains an alignment for nearly every protein in the PDB. To use the HSSP alignment, specify the corresponding PDB code in the HSSP_id field.

Alternatively, you can supply your own alignment. WHISCY recognizes a number of alignment formats used by major alignment programs: .aln (default output of CLUSTAL), .fasta (default output of MUSCLE), PHYLIP and MSF. Take care: WHISCY assumes that the first sequence in the alignment corresponds to the biological sequence of your protein. This can be the sequence of the PDB structure, but only if it is not engineered!
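
Because the first sequence in the alignment is taken to be your protein, it can be worth checking this before uploading. The following Python sketch shows one way to do that for a FASTA-format alignment. The function names and the toy alignment are hypothetical and only serve as illustration; WHISCY may read alignments differently.

```python
def read_fasta_alignment(text):
    """Parse aligned FASTA text into a list of (header, aligned_sequence)."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def ungapped(aligned_sequence):
    """Remove gap characters ('-' and '.') from an aligned sequence."""
    return aligned_sequence.replace("-", "").replace(".", "")


def first_sequence_matches(alignment_text, protein_sequence):
    """Check that the first aligned sequence, without gaps, is the given protein."""
    records = read_fasta_alignment(alignment_text)
    if not records:
        raise ValueError("no sequences found in alignment")
    return ungapped(records[0][1]).upper() == protein_sequence.upper()


# Toy alignment, invented for illustration only.
toy_alignment = """>query
MK-TAYIAK
>homolog1
MKQTAY-AK
"""

ok = first_sequence_matches(toy_alignment, "MKTAYIAK")
```

If the check fails, reorder the alignment so that your protein comes first, or confirm that the sequence you compared against is the biological sequence rather than that of an engineered construct.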

If you want to generate an alignment but are new to bioinformatics, a good place to start is to run BLAST and then align the results using CLUSTAL. Always specify the format of the alignment you supply. If WHISCY fails to read your alignment, please contact me.

WHISCY is robust to poor alignment quality, so manual correction is not essential.