General Workflow

The CAMEO Workflow: As part of its weekly release procedure, the PDB publishes the sequences of the entries to be released the following week five days ahead of the actual release (i.e. on Friday). CAMEO collects these sequences and, after some pre-processing, submits them to the registered servers. The assessment can be performed once the actual structure is released by the PDB, usually the following Wednesday.

The categories supported by CAMEO are protein structure modeling (3D), protein model quality assessment (QE), and protein contact prediction (CP). Ligand binding site prediction is currently under revision. Other categories will follow as requested by the community.

CAMEO servers can be registered as public servers, which are shown with their full name; as anonymous servers, for which all scoring is performed but the name is disguised ('serverx'); or as private servers. For private servers all evaluations are run, but no data is displayed on the website. See our complete list of registered servers.

CAMEO targets are taken from the weekly pre-release of new PDB structures and are submitted to all registered servers. Targets with templates covering at least 70% of the sequence at a sequence identity of more than 85% are not submitted for the 3D (protein structure) category. This also applies to targets with sequence lengths > 250 and fewer than 40 amino acids uncovered.

The difficulty of a target is defined by the average accuracy (lDDT) of the models received from the servers. A low lDDT for all servers clearly indicates a hard target. Targets are classified as Easy (average lDDT >= 75), Medium (average lDDT between 50 and 75), or Hard (average lDDT < 50).
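The classification above can be sketched in a few lines. This is a minimal illustration, not CAMEO's code: the function name `classify_difficulty` is hypothetical, scores are assumed to be on the 0 to 100 scale, and treating an average of exactly 50 as Medium is an assumption, since the text only says "between 50 and 75".

```python
from statistics import mean


def classify_difficulty(lddt_scores: list[float]) -> str:
    """Classify a target from the lDDT scores (0-100 scale) of all received models."""
    if not lddt_scores:
        raise ValueError("at least one model score is required")
    average = mean(lddt_scores)
    if average >= 75:
        return "Easy"
    if average >= 50:  # assumption: the lower boundary belongs to Medium
        return "Medium"
    return "Hard"
```

For example, a target for which the servers reach lDDT values of 40, 55 and 70 has an average of 55 and would be labeled Medium.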

The fraction of atoms below a cut-off represents the percentage of Cα atoms in the predictions deviating from the target by not more than a specified distance cut-off (in Å) for different sequence-dependent superpositions. The method uses distance cut-off values of 0.5, 1, 2, and 4 Å; the superposition is computed using the program LGA. [ref.: PMID: 12824330]

Scores

The CAD score (contact area difference) provides a single uniform framework for assessing single-domain, multi-domain, and even multi-subunit protein structural models of varying degrees of accuracy and completeness. While being highly correlated with GDT-TS on single-domain structures, CAD-score places a stronger emphasis on the physical realism of models and is superposition-free [PMID: 22933340]. Values range between 0 (bad) and 1 (good).

The coverage of a model is the percentage of residues in the model for which structural information is available in the experimental structure, relative to the length of the submitted sequence. Several CAMEO targets turned out to have low coverage because they were unresolved protein segments in larger complexes. We reserve the right to remove these targets in the future.
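As a rough illustration of the coverage definition, the sketch below counts the model residues that are also resolved in the experimental structure and divides by the length of the submitted sequence. The function name and the representation of residues as sets of sequence positions are assumptions made for this example.

```python
def coverage_percent(modeled_positions: set[int],
                     resolved_positions: set[int],
                     sequence_length: int) -> float:
    """Percentage of the submitted sequence that is both modeled and experimentally resolved."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    # Only model residues with experimental coordinates can be evaluated.
    evaluable = modeled_positions & resolved_positions
    return 100.0 * len(evaluable) / sequence_length
```

With a 10-residue sequence, a model covering positions 2 to 6 and an experimental structure resolving positions 1 to 4, three residues are evaluable and the coverage is 30%.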

The GDT_HA (Global Distance Test, High Accuracy) score identifies sets of residues in the predictions deviating from the target by not more than a specified Cα distance cut-off for different sequence-dependent superpositions. The method uses distance cut-off values of 0.5, 1, 2, and 4 Å.
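The averaging over cut-offs can be sketched as follows. This is a simplified illustration only: it takes Cα distances from one precomputed superposition, whereas LGA searches for a separate superposition at each cut-off, so the value returned here is at best an approximation of the reported score. The function name `gdt_ha` and its arguments are hypothetical.

```python
def gdt_ha(ca_distances: list[float],
           target_length: int,
           cutoffs: tuple[float, ...] = (0.5, 1.0, 2.0, 4.0)) -> float:
    """Average percentage of target residues whose Calpha deviation is within each cut-off.

    ca_distances holds one model-to-target Calpha distance (in Angstrom) per
    aligned residue, all taken from a single superposition (simplification).
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    fractions = [
        sum(1 for d in ca_distances if d <= cutoff) / target_length
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)
```

Each term in `fractions` is the "fraction of atoms below a cut-off" for one distance threshold; residues missing from the model simply do not contribute to the numerator.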

The GDC score identifies sets of residues in the predictions deviating from the target by not more than a specified all-atom distance cut-off for different sequence-dependent superpositions. The method uses distance cut-off values of 0.5, 1, 2, and 4 Å. This score is calculated with LGA, version 2009/5 ("LGA is a method for finding 3-D Similarities in Protein Structures"). [ref.: PMID: 17894352]

The lDDT score (Local Distance Difference Test on All Atoms) evaluates the quality of the local atomic environment of a model. lDDT rewards the fraction of correctly predicted inter-atomic distances in a model at different threshold levels. lDDT does not depend on a global superposition of the prediction and target structure.
Specifically, interaction distances (cutoff 15 Å) in the protein structure are compared with distances between corresponding atoms in the predictions. If the difference between the two distances is within a defined threshold, the interaction is considered to be preserved in the prediction. The final lDDT-all score is computed by averaging the fraction of correctly modeled interactions for the following four distance difference thresholds: 0.5, 1, 2, and 4 Å (the same thresholds as GDT_HA). CAMEO additionally offers a Cα-based lDDT score. [ref.: CASP9 TBM Assessment]
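The following sketch illustrates the procedure for the Cα-based variant, using one atom per residue. It is a simplified illustration under stated assumptions and not the reference implementation: the function name `lddt_ca` is hypothetical, both coordinate lists are assumed to be in the same residue order with no missing residues, and any further checks performed by the real software are omitted. The result is on a 0 to 1 scale; multiply by 100 for a percentage.

```python
import math
from itertools import combinations

Coord = tuple[float, float, float]


def lddt_ca(reference: list[Coord],
            model: list[Coord],
            inclusion_radius: float = 15.0,
            thresholds: tuple[float, ...] = (0.5, 1.0, 2.0, 4.0)) -> float:
    """Simplified Calpha-based lDDT; index i refers to the same residue in both lists."""
    if len(reference) != len(model):
        raise ValueError("reference and model must contain the same residues")
    preserved = [0] * len(thresholds)
    total = 0
    for i, j in combinations(range(len(reference)), 2):
        ref_dist = math.dist(reference[i], reference[j])
        if ref_dist >= inclusion_radius:
            continue  # only distances within the inclusion radius are scored
        total += 1
        difference = abs(ref_dist - math.dist(model[i], model[j]))
        for k, threshold in enumerate(thresholds):
            if difference <= threshold:
                preserved[k] += 1
    if total == 0:
        return 0.0
    # Average the preserved fraction over the four thresholds.
    return sum(count / total for count in preserved) / len(thresholds)
```

Because only internal distances are compared, rotating or translating the model does not change the result, which is why no superposition is needed.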

The lDDT-BS score (Local Distance Difference Test - Binding Site) is the average of the individual lDDT local scores of those residues which form the binding site(s) on the respective target. The binding site is determined by the contacts formed with the reference protein structure based on a 10 Å radius interaction shell from each atom of the ligand. When a binding site is at the interface of an oligomeric structure and the prediction does not show the same oligomeric state, the lDDT is not calculated and those predictions are not considered in the averaging procedure. This score is only calculated for targets where the experimental structure incorporates a ligand.

MM-align (MultiMer-align) is an algorithm for structurally aligning multiple-chain protein-protein complexes. The algorithm is built on a heuristic iteration of a modified Needleman-Wunsch dynamic programming (DP) algorithm, with the alignment score specified by the inter-complex residue distances. The assignment of matching chains in each complex relies on our chain mapping algorithm. [DOI: 10.1093/nar/gkp318]. Values range between 0 (bad) and 100 (good).

The evaluation of Model Confidence Values assesses a posteriori the ability of individual 3D-structure modeling servers to assign realistic error estimates to their predictions. The error estimates are expressed as the expected distance (in Å) between the Cα positions of the model compared to the target structure.
For each target we calculate the model error for each predicted amino acid residue as the Cartesian distance between the model Cα coordinates and the experimental target structure in a global superposition with LGA [ref.: PMID: 12824330] using a 4 Å cut-off. The prediction results are analyzed using receiver operating characteristic (ROC) curves. A residue is classified as correctly modeled if its Cα position error is less than 3.5 Å and as incorrectly modeled if its Cα position error is greater than or equal to 3.5 Å. For each prediction the error estimates are reranked between 0 and 1, and the enrichment of correctly identified model errors is plotted as the false positive rate (FPR) versus the true positive rate (TPR) by varying the discrimination threshold between 0 and 1. [ref.: PMID: 17894352]

The local scores shown on the individual model pages are a measure of how well the model fits the reference structure at a given residue position. Local scores are extracted from the output when calculating the average accuracy.
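The ROC analysis of model confidence values described above can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from CAMEO's implementation: the function name `confidence_roc` is hypothetical, the estimates are mapped to the range 0 to 1 by min-max scaling (one possible reading of the reranking step), and a residue counts as a positive when it is incorrectly modeled (observed error of at least 3.5 Å).

```python
def confidence_roc(estimated_errors: list[float],
                   observed_errors: list[float],
                   error_cutoff: float = 3.5) -> tuple[list[tuple[float, float]], float]:
    """Return the ROC points as (FPR, TPR) pairs and the area under the curve."""
    if len(estimated_errors) != len(observed_errors):
        raise ValueError("one estimate is required per observed error")
    low, high = min(estimated_errors), max(estimated_errors)
    span = high - low
    # Assumption: min-max scaling of the estimates to [0, 1].
    scaled = [(e - low) / span if span > 0 else 0.0 for e in estimated_errors]
    # Assumption: positives are residues that are incorrectly modeled.
    labels = [obs >= error_cutoff for obs in observed_errors]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both correctly and incorrectly modeled residues are needed")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scaled), reverse=True):
        tp = sum(1 for s, bad in zip(scaled, labels) if s >= threshold and bad)
        fp = sum(1 for s, bad in zip(scaled, labels) if s >= threshold and not bad)
        points.append((fp / n_neg, tp / n_pos))
    # Trapezoidal rule over consecutive ROC points.
    auc = sum((x2 - x1) * (y1 + y2) / 2.0
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc
```

An area close to 1 means that large error estimates are assigned mostly to residues that really deviate from the target, while an area near 0.5 corresponds to estimates that carry no information.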

The current implementation of QScore is based on the QScore defined by Xu Q. et al. [PMID: 18599072]. It is intended to reflect the similarity between two interfaces and takes values ranging between 0 (bad) and 100 (good). This score was intended as an interim score for QS-score and will be discontinued in the near future.

The QS-score considers the assembly interface as a whole and is suitable for comparing homo- or hetero-oligomers with identical or different stoichiometries, alternative relative orientations of chains, and distinct amino acid sequences (i.e. homologous complexes). To unequivocally identify the residues of all protein chains in complexes, QS-score first establishes a mapping between equivalent polypeptide chains of the compared structures by exploiting complex symmetries where possible. The resulting QS-score expresses the fraction of shared interface contacts (residues on different chains with a Cβ-Cβ distance < 12 Å) between two assemblies. A QS-score close to 1 translates to very similar interfaces, matching stoichiometry and a majority of identical interfacial contacts. A QS-score close to 0 indicates a radically diverse quaternary structure, probably different stoichiometries and potentially representing alternative binding conformations.
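To make the notion of shared interface contacts concrete, the sketch below computes an unweighted fraction of contacts common to two assemblies. It is deliberately not the published QS-score: it applies no weighting to the contacts and skips the chain mapping step, assuming instead that equivalent chains and residues carry identical names and numbers in both structures. All names (`interface_contacts`, `shared_contact_fraction`) are hypothetical.

```python
import math

Residue = tuple[str, int]            # (chain name, residue number)
Coord = tuple[float, float, float]   # Cbeta coordinates in Angstrom


def interface_contacts(cb_coords: dict[Residue, Coord],
                       cutoff: float = 12.0) -> set[frozenset[Residue]]:
    """Pairs of residues on different chains with a Cbeta-Cbeta distance below the cutoff."""
    residues = sorted(cb_coords)
    contacts = set()
    for index, first in enumerate(residues):
        for second in residues[index + 1:]:
            if first[0] == second[0]:
                continue  # same chain, not an interface contact
            if math.dist(cb_coords[first], cb_coords[second]) < cutoff:
                contacts.add(frozenset((first, second)))
    return contacts


def shared_contact_fraction(reference: dict[Residue, Coord],
                            model: dict[Residue, Coord],
                            cutoff: float = 12.0) -> float:
    """Unweighted fraction of interface contacts present in both assemblies."""
    ref_contacts = interface_contacts(reference, cutoff)
    model_contacts = interface_contacts(model, cutoff)
    union = ref_contacts | model_contacts
    if not union:
        return 0.0
    return len(ref_contacts & model_contacts) / len(union)
```

Dividing by the union of contacts penalizes both missing and additional interfaces, which mirrors the behavior described above for assemblies with different stoichiometries.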

The Response Time is the time a model needs from submission to the server until reception by CAMEO. When averaging or aggregating, we take only the first model into account (this also holds for any other score calculated). The response time strongly depends on the age of the hardware and the load on the server and is hence not necessarily an indication of the efficiency of the algorithms. Currently, the CAMEO workflow reflects as closely as possible a user's experience of submitting a modeling job and receiving the results by email. Derived timings sometimes suffer from individual delays caused by MTA forwarding hosts being unavailable. Over a period of several months this effect is averaged out. In addition, the logged response times may not reflect a typical user's experience.

The RMSD is calculated on all Cα atoms in the target-model superposition with LGA [ref.: PMID: 17894352], using a 4 Å threshold for the sequence-dependent superposition.

The TM-score (template modeling score) is a well-established score reflecting the structural similarity of two proteins. Protein pairs with a TM-score > 0.5 mostly share the same fold.
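For orientation, the TM-score sums a length-normalized term 1 / (1 + (d_i / d0)^2) over the aligned residue pairs, where d_i is the distance between the i-th pair and d0 = 1.24 * (L - 15)^(1/3) - 1.8 depends on the target length L. The sketch below evaluates this sum for one given superposition; the actual score is the maximum over superpositions, which is not attempted here. The function name `tm_score_for_superposition` is hypothetical, and using a minimum d0 of 0.5 Å for very short targets is an assumption of this example.

```python
def tm_score_for_superposition(ca_distances: list[float], target_length: int) -> float:
    """TM-score term sum for one fixed superposition (no search for the optimum)."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)  # assumption: lower bound on d0 for short targets
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in ca_distances)
    # Normalizing by the target length penalizes unaligned target residues.
    return total / target_length
```

A perfect model of the full target gives 1.0, and a perfect model of only half of the target gives 0.5, which shows how the normalization by target length accounts for incomplete models.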