Predicting binding sites from a protein's sequence could have a high impact on life science research, provided the predictions are specific and accurate enough to help address relevant biological questions. In CAMEO we plan to continuously assess ligand binding site predictions to evaluate the current state of the art of prediction methods, identify possible bottlenecks, and further stimulate the development of new methods.

In previous CASP experiments, the very low number of challenging target structures with relevant ligands was a major limitation of the assessment, as it did not allow significant conclusions to be drawn about the specific strengths and weaknesses of different prediction methods. Further, the current ligand binding site prediction format used in CASP has a number of limitations: all ligands are treated uniformly, independent of their chemical type, and all potential binding sites are treated uniformly, independent of their affinity for different ligands. Hence, in CAMEO we have modified the ligand binding site prediction format to allow more fine-grained predictions and a more detailed assessment.

Format Definition

The format used by the prediction center during CASP9 is still accepted. In addition, a new format has been implemented that follows the suggestions from the CASP9 assessment of the ligand binding category.

This new format consists of three sections separated by the "|" symbol:

  1. The first section is a unique identifier for a residue or atom. It has two mandatory fields, the residue name ("r") and the residue number ("n"). In addition, two optional fields can be specified, the chain name ("c") and/or the atom name ("a").
  2. The second section contains predicted probabilities for four ligand categories: ions ("I"), organics ("O"), polynucleotides ("N") and peptides ("P"). Predictions for all four categories are mandatory. Each value is a probability representing the likelihood that the residue or atom binds a ligand of the corresponding category. All ligands in the PDB are assigned to one of these four categories based on the PDB ligand classification.
  3. The last section is optional and allows predictions for specific ligands, each identified by its PDB compound ID (three-letter code) and followed by a probability.

Format details:

Format

r=<resname>; n=<resnum>; [c=<chainname>;] [a=<atomname>;] | I=<ion prob>; O=<org prob>; N=<nucl prob>; P=<pep prob>; | [<compound ID1>=<compound prob>;] [<compound ID2>=<compound prob>;] ...
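
To make the three-section layout concrete, here is a minimal Python sketch of how one line in this format might be parsed. It is not part of CAMEO. The function names, the returned dictionary keys, the conversion of the residue number to an integer, and the check that every probability lies between 0 and 1 are all assumptions made for illustration.

```python
CATEGORIES = ("I", "O", "N", "P")  # ions, organics, polynucleotides, peptides


def _parse_fields(section):
    """Split a 'key=value; key=value;' section into a dict (hypothetical helper)."""
    fields = {}
    for piece in section.split(";"):
        piece = piece.strip()
        if not piece:
            continue  # skip the empty piece after a trailing ';'
        key, sep, value = piece.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {piece!r}")
        fields[key.strip()] = value.strip()
    return fields


def parse_prediction_line(line):
    """Parse one prediction line into a dict (illustrative sketch only)."""
    parts = line.split("|")
    if len(parts) == 2:
        parts.append("")  # assumption: tolerate a missing third section
    if len(parts) != 3:
        raise ValueError("expected sections separated by '|'")

    # Section 1: residue/atom identifier; 'r' and 'n' are mandatory.
    ident = _parse_fields(parts[0])
    for key in ("r", "n"):
        if key not in ident:
            raise ValueError(f"missing mandatory field {key!r}")
    unknown = set(ident) - {"r", "n", "c", "a"}
    if unknown:
        raise ValueError(f"unknown identifier fields: {sorted(unknown)}")

    # Section 2: all four category probabilities are mandatory.
    raw = _parse_fields(parts[1])
    if set(raw) != set(CATEGORIES):
        raise ValueError("section 2 must contain exactly I, O, N and P")
    categories = {key: float(raw[key]) for key in CATEGORIES}

    # Section 3: optional per-compound probabilities.
    ligands = {key: float(val) for key, val in _parse_fields(parts[2]).items()}

    # Assumption: probabilities are expected to lie in [0, 1].
    for prob in list(categories.values()) + list(ligands.values()):
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range: {prob}")

    return {
        "resname": ident["r"],
        "resnum": int(ident["n"]),  # assumption: plain integer residue numbers
        "chain": ident.get("c"),
        "atom": ident.get("a"),
        "categories": categories,
        "ligands": ligands,
    }
```

For example, `parse_prediction_line("r=GLU; n=170; a=CA; | I=0.047; O=0.412; N=0.000; P=0.000; | ANP=0.412; MN=0.047;")` would yield a record with `atom` set to `"CA"`, `chain` set to `None`, and two entries under `ligands`.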

Examples

1. ZN binding (3ZTT)

r=SER; n=198; | I=0.000; O=0.000; N=0.000; P=0.000; |
r=GLU; n=199; | I=1.000; O=0.000; N=0.000; P=0.000; |   
r=GLY; n=200; | I=0.000; O=0.000; N=0.000; P=0.000; |   
r=ALA; n=201; | I=0.513; O=0.000; N=0.000; P=0.000; |

2. ATP and Mg binding (3QAM)

r=GLU; n=170; a=N;   | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 
r=GLU; n=170; a=CA;  | I=0.047; O=0.412; N=0.000; P=0.000; | ANP=0.412; MN=0.047; 
r=GLU; n=170; a=C;   | I=0.337; O=0.668; N=0.000; P=0.000; | ANP=0.668; MN=0.337; 
r=GLU; n=170; a=O;   | I=0.372; O=1.000; N=0.000; P=0.000; | ANP=1.000; MN=0.372; 
r=GLU; n=170; a=CB;  | I=0.249; O=0.424; N=0.000; P=0.000; | ANP=0.424; MN=0.249; 
r=GLU; n=170; a=CG;  | I=0.000; O=0.077; N=0.000; P=0.000; | ANP=0.077; MN=0.000; 
r=GLU; n=170; a=CD;  | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 
r=GLU; n=170; a=OE1; | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 
r=GLU; n=170; a=OE2; | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 
r=ASN; n=171; a=N;   | I=0.331; O=0.353; N=0.000; P=0.000; | ANP=0.353; MN=0.331; 
r=ASN; n=171; a=CA;  | I=0.401; O=0.307; N=0.000; P=0.000; | ANP=0.307; MN=0.401; 
r=ASN; n=171; a=C;   | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 
r=ASN; n=171; a=O;   | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 
r=ASN; n=171; a=CB;  | I=0.528; O=0.251; N=0.000; P=0.000; | ANP=0.251; MN=0.528; 
r=ASN; n=171; a=CG;  | I=0.987; O=0.584; N=0.000; P=0.000; | ANP=0.584; MN=0.987; 
r=ASN; n=171; a=OD1; | I=1.000; O=0.939; N=0.000; P=0.000; | ANP=0.939; MN=1.000; 
r=ASN; n=171; a=ND2; | I=0.859; O=0.637; N=0.000; P=0.000; | ANP=0.637; MN=0.859; 
r=LEU; n=173; a=N;   | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 
r=LEU; n=173; a=CA;  | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 
r=LEU; n=173; a=C;   | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 
r=LEU; n=173; a=O;   | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 
r=LEU; n=173; a=CB;  | I=0.000; O=0.175; N=0.000; P=0.000; | ANP=0.175; MN=0.000; 
r=LEU; n=173; a=CG;  | I=0.000; O=0.509; N=0.000; P=0.000; | ANP=0.509; MN=0.000; 
r=LEU; n=173; a=CD1; | I=0.000; O=0.898; N=0.000; P=0.000; | ANP=0.898; MN=0.000; 
r=LEU; n=173; a=CD2; | I=0.000; O=0.696; N=0.000; P=0.000; | ANP=0.696; MN=0.000;
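
Lines like those in the examples above could be produced by a small helper such as the following Python sketch. It is only an illustration: the function name `format_prediction_line` and its parameters are hypothetical, and the three-decimal formatting simply mirrors the examples rather than a stated requirement.

```python
def format_prediction_line(resname, resnum, categories,
                           chain=None, atom=None, ligands=None):
    """Build one prediction line from its parts (illustrative sketch only)."""
    order = ("I", "O", "N", "P")  # ions, organics, polynucleotides, peptides
    missing = [key for key in order if key not in categories]
    if missing:
        raise ValueError(f"missing mandatory categories: {missing}")

    # Section 1: mandatory residue name and number, optional chain and atom.
    ident = [f"r={resname};", f"n={resnum};"]
    if chain is not None:
        ident.append(f"c={chain};")
    if atom is not None:
        ident.append(f"a={atom};")

    # Section 2: the four mandatory category probabilities.
    cats = " ".join(f"{key}={categories[key]:.3f};" for key in order)

    # Section 3: optional per-compound probabilities, in insertion order.
    ligs = " ".join(f"{name}={prob:.3f};"
                    for name, prob in (ligands or {}).items())

    return f"{' '.join(ident)} | {cats} | {ligs}".rstrip()


# Residue-level line without specific ligands, as in the first example.
print(format_prediction_line("GLU", 199, {"I": 1.0, "O": 0.0, "N": 0.0, "P": 0.0}))

# Atom-level line with specific ligands, as in the second example.
print(format_prediction_line(
    "GLU", 170, {"I": 0.047, "O": 0.412, "N": 0.0, "P": 0.0},
    atom="CA", ligands={"ANP": 0.412, "MN": 0.047}))
```

The helper emits single spaces between fields, so its output does not reproduce the column alignment used in the second example.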