CAMEO - Continuous Automated Model EvaluatiOn - Help page

General Workflow

The CAMEO workflow: as part of its weekly release procedure, the PDB publishes the sequences of the entries to be released the following week five days ahead of the actual release (i.e. on Friday). CAMEO collects these sequences and, after some pre-processing, submits them to the registered servers. The assessment can be performed once the actual structure is released by the PDB, usually the following Wednesday.

The categories supported by CAMEO are protein structure modeling (3D), protein model quality assessment (QE), and structures and complexes (Beta 3D). Protein contact prediction (CP) and ligand binding site (LB) have been discontinued.

CAMEO servers can be registered as public servers, with their full names and results available to everyone, or as development servers, where the name is disguised ('serverX') and all scoring is performed and visible to other method developers but not to the public. See our complete list of registered servers.

A CAMEO target is a sequence to be released in the upcoming PDB release, based on the weekly pre-release of new PDB structures. Targets with templates covering at least 70% of the sequence with a sequence identity of more than 85% are not submitted for the 3D (protein structure modeling) category. This also applies to targets longer than 250 amino acids in which fewer than 40 amino acids are left uncovered. Up to 20 interesting targets are selected and sent to registered participants.

Targets are typically sent out to participants within a few hours after the pre-release of the PDB, which starts on Saturday from 3:00 UTC. The prediction window is open until the release of new structures on Wednesday at 00:00 UTC. Predictions submitted after the close of the prediction window will be ignored.
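
The timing rule above can be checked on the participant side before sending a prediction. The following is a minimal sketch, not CAMEO's own code: the function names are hypothetical, timestamps are assumed to be timezone-aware, and a submission arriving exactly at the close is assumed to be too late.

```python
from datetime import datetime, timedelta, timezone


def window_close(pre_release):
    """Return the first Wednesday 00:00 UTC strictly after `pre_release`."""
    pre_release = pre_release.astimezone(timezone.utc)
    midnight = pre_release.replace(hour=0, minute=0, second=0, microsecond=0)
    # weekday(): Monday = 0, Wednesday = 2; "or 7" skips to next week
    # when the pre-release date itself is a Wednesday.
    days_ahead = (2 - midnight.weekday()) % 7 or 7
    return midnight + timedelta(days=days_ahead)


def is_accepted(submitted, pre_release):
    """True if `submitted` falls before the close of the prediction window."""
    # Assumption: a prediction arriving exactly at the close is rejected.
    return submitted.astimezone(timezone.utc) < window_close(pre_release)


pre_release = datetime(2024, 1, 6, 3, 0, tzinfo=timezone.utc)  # a Saturday
print(window_close(pre_release).isoformat())
```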

The difficulty of a target is defined by the average accuracy (lDDT) of the models received from the servers; a low lDDT for all servers clearly indicates a hard target. The classes are: Easy for an average lDDT >= 75, Medium for an average lDDT between 50 and 75, and Hard for an average lDDT < 50.
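
These thresholds map directly onto a small classifier. The sketch below is illustrative only: the function name is hypothetical, and it assumes that an average of exactly 50 counts as Medium, which the text does not state explicitly.

```python
def classify_difficulty(server_lddts):
    """Classify a target from the lDDT scores (0-100) of all received models."""
    if not server_lddts:
        raise ValueError("need at least one model score")
    average = sum(server_lddts) / len(server_lddts)
    if average >= 75:
        return "Easy"
    if average >= 50:  # assumption: 50 itself is treated as Medium
        return "Medium"
    return "Hard"


print(classify_difficulty([82.0, 79.5, 71.0]))  # average is 77.5
```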

The fraction of atoms below a cut-off represents the percentage of Cα atoms in the predictions deviating from the target by not more than a specified distance cut-off (in Å) for different sequence-dependent superpositions. The method uses distance cut-off values of 0.5, 1, 2, and 4 Å, and the superposition is computed using the program LGA. [ref.: PMID: 12824330]

Submission of predictions

Servers must return predictions by email to the address that CAMEO sent to you together with the target. This email address is different for each target.

In order to ensure your models are evaluated correctly, please make sure to follow the guidelines below:

Servers must return a separate email for every target, to the address that was provided by CAMEO during the submission.
Models must be included as attachments to the email. The names of the attached files do not matter.
Models must be valid PDB files. Residue and atom names must be set according to the PDB component dictionary. Check out the PDB-101 help page "Dealing with Coordinates" to learn more about the format.
Models must contain at least the protein backbone. C-alpha traces are not supported.
CAMEO 3D relies on the connectivity of the backbone atoms and presence of peptide bonds in early stages of the evaluation process. The connectivity is established with heuristics that look at atom radius and bond lengths. Models with an incomplete backbone or with peptide bonds that are too short/long may be rejected altogether by CAMEO.
Residue numbers should follow the target sequence (starting with residue number 1). CAMEO 3D will attempt to renumber residues if residue numbers don't match the target sequence. However, this is error-prone and likely to fail from time to time, and should be avoided.
The email can include up to 5 models (as separate attachments). All 5 models will be scored, but only the first one will be included in aggregated scores. The models are numbered according to the order of the attachment in the email (1st attachment = model1, etc.), not the file name.
A server can send multiple emails for a target. When that's the case, only the last email with valid predictions will be considered by CAMEO. This allows participants to quickly send initial or preliminary results, and replace them with better predictions later on if they could be improved.

Failure to follow these guidelines may result in models being ignored during the evaluation.
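
As an illustration of the email rules above, a submission could be assembled with Python's standard library as sketched below. The function name, the subject line, the body text and the file names are hypothetical; only the per-target address, the limit of 5 models and the attachment ordering come from the guidelines.

```python
from email.message import EmailMessage


def build_submission(target_address, sender, models):
    """Build one email for one target; `models` is an ordered list of PDB texts."""
    if not 1 <= len(models) <= 5:
        raise ValueError("an email may carry between 1 and 5 models")
    msg = EmailMessage()
    msg["To"] = target_address  # the per-target address provided by CAMEO
    msg["From"] = sender
    msg["Subject"] = "CAMEO prediction"  # hypothetical: no subject rule is given
    msg.set_content("Models attached.")
    for index, pdb_text in enumerate(models, start=1):
        # Attachment order defines the model number; the file name does not.
        msg.add_attachment(
            pdb_text.encode("ascii"),
            maintype="application",
            subtype="octet-stream",
            filename=f"model{index}.pdb",
        )
    return msg
```

The resulting message can be handed to any mail transport, for example smtplib.SMTP.send_message. Sending a second email for the same target later replaces the earlier one, as described above.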

In addition, models should follow some good practices for best results:

Participants are encouraged to aim at full accuracy and model all heavy atoms, including side chains.
Models should contain non-zero values in the occupancy column (columns 55-60).
Models should fill the B-factor column (columns 61-66) with a best estimate of the quality of each residue. This data is used by the Model Confidence scores.
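
A quick local check of these columns before submission might look like the sketch below. It is not CAMEO's validator: the function names are hypothetical, "backbone" is assumed to mean the atoms N, CA, C and O, and atom_line is only a helper for building test records in the fixed-width PDB layout.

```python
BACKBONE = {"N", "CA", "C", "O"}  # assumption: what "backbone" means here


def atom_line(serial, name, res_name, chain, res_num, x, y, z, occ, bfac):
    """Hypothetical helper: format one fixed-width PDB ATOM record."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_num:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{bfac:6.2f}"
    )


def check_model(pdb_text):
    """Return a list of human-readable problems found in the ATOM records."""
    problems = []
    atoms_by_residue = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()         # columns 13-16
        residue = (line[21], int(line[22:26]))  # chain ID, residue number
        occupancy = line[54:60].strip()         # columns 55-60
        b_factor = line[60:66].strip()          # columns 61-66
        atoms_by_residue.setdefault(residue, set()).add(atom_name)
        if not occupancy or float(occupancy) == 0.0:
            problems.append(f"{residue} {atom_name}: zero or missing occupancy")
        if not b_factor:
            problems.append(f"{residue} {atom_name}: empty B-factor column")
    for residue, names in atoms_by_residue.items():
        missing = BACKBONE - names
        if missing:
            problems.append(f"{residue}: missing backbone atoms {sorted(missing)}")
    return problems


print(check_model(atom_line(1, "CA", "ALA", "A", 1, 0.0, 0.0, 0.0, 0.0, 80.0)))
```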

Resubmissions

Sometimes things go wrong: servers crash, connections break, and you do not receive some or all of the targets for the week (typically 20). When that happens, you will receive an email informing you about the error. Please get in touch with us by replying to it to request a resubmission. Resubmissions are performed manually on a best-effort basis.

Scores

The CAD-score (contact area difference) provides a single uniform framework for assessing single-domain, multi-domain, and even multi-subunit protein structural models of varying degrees of accuracy and completeness. While being highly correlated with GDT-TS on single-domain structures, CAD-score displays a stronger emphasis on the physical realism of models and is superposition-free [PMID: 22933340]. Values range between 0 and 100 (0 bad, 100 good).

The coverage of a model is the percentage of residues in the model for which structural information is available in the experimental structure, relative to the length of the submitted sequence. Several CAMEO targets turned out to have low coverage because they were unresolved protein segments in larger complexes. We reserve the right to remove these targets in the future.

The GDT_HA (Global Distance Test) score identifies sets of residues in the predictions deviating from the target by not more than a specified Cα distance cut-off for different sequence-dependent superpositions. The method uses distance cut-off values of 0.5, 1, 2, and 4 Å.
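
The arithmetic behind this kind of score can be sketched as follows. This is a simplification, not the LGA procedure: it assumes the Cα distances come from one fixed superposition, whereas LGA searches for a favourable superposition at each cut-off. The function name, the normalisation by target length and the 0-100 scale are assumptions.

```python
def gdt_ha(ca_distances, target_length):
    """Average the fractions of Calpha atoms within 0.5, 1, 2 and 4 Angstrom.

    `ca_distances` holds one model-to-target Calpha distance per aligned
    residue, all taken from the same superposition (a simplification).
    """
    cutoffs = (0.5, 1.0, 2.0, 4.0)
    fractions = [
        sum(d <= cutoff for d in ca_distances) / target_length
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Fractions are 1/4, 2/4, 3/4 and 4/4, so the average is 62.5.
print(gdt_ha([0.3, 0.8, 1.5, 3.0], target_length=4))
```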

The GDC score identifies sets of residues in the predictions deviating from the target by not more than a specified all-atom distance cut-off for different sequence-dependent superpositions. The method uses distance cut-off values of 0.5, 1, 2, and 4 Å. (This score is calculated with LGA, version 2009/5: "LGA is a method for finding 3-D Similarities in Protein Structures". [ref.: PMID: 17894352])

The lDDT score (Local Distance Difference Test on All Atoms) evaluates the quality of the local atomic environment of a model. lDDT rewards the fraction of correctly predicted inter-atomic distances in a model at different threshold levels. lDDT does not depend on a global superposition of the prediction and target structure.
Specifically, interaction distances (cutoff 15 Å) between atoms in the reference protein structure are compared with distances between corresponding atoms in the predictions. If the difference between the two distances is within a defined threshold, the interaction is considered to be preserved in the prediction. The final lDDT-all score is computed by averaging the fraction of correctly modeled interactions for the following four distance difference thresholds: 0.5, 1, 2, and 4 Å (the same thresholds as GDT_HA). A filter based on the Engh and Huber bond lengths and angles removes stereochemical violations and steric clashes. CAMEO additionally offers a Cα-based lDDT score. [ref.: CASP9 TBM Assessment]
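
The core of the calculation can be sketched in a few lines. This is a simplified illustration, not the reference implementation: it omits the stereochemistry filter, assumes that pairs of atoms within the same residue are skipped, treats atoms missing from the model as not preserved, and returns a value between 0 and 1. The function name and the input layout are hypothetical.

```python
import math
from itertools import combinations


def simple_lddt(reference, model, inclusion_radius=15.0):
    """reference/model: dicts mapping (residue_number, atom_name) -> (x, y, z)."""
    thresholds = (0.5, 1.0, 2.0, 4.0)
    total = 0
    preserved = [0] * len(thresholds)
    for a, b in combinations(sorted(reference), 2):
        if a[0] == b[0]:
            continue  # assumption: ignore pairs inside the same residue
        ref_dist = math.dist(reference[a], reference[b])
        if ref_dist >= inclusion_radius:
            continue  # only interactions within the 15 Angstrom cutoff count
        total += 1
        if a not in model or b not in model:
            continue  # missing atoms: the interaction is not preserved
        diff = abs(ref_dist - math.dist(model[a], model[b]))
        for i, threshold in enumerate(thresholds):
            if diff <= threshold:
                preserved[i] += 1
    if total == 0:
        raise ValueError("no interactions within the inclusion radius")
    # Average the preserved fraction over the four thresholds.
    return sum(count / total for count in preserved) / len(thresholds)


reference = {(1, "CA"): (0.0, 0.0, 0.0), (2, "CA"): (3.8, 0.0, 0.0), (3, "CA"): (7.6, 0.0, 0.0)}
print(simple_lddt(reference, reference))
```

Because only internal distances are compared, rotating or translating the model leaves the result unchanged, which is why no superposition is needed.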

The lDDT-BS score (Local Distance Difference Test - Binding Site) measures the accuracy of the residues which form binding site(s) on the target structure. Here, a binding site is defined as the set of amino acid residues in the reference protein structure which have at least one atom within a 4.0 Å radius of any atom of the ligand (3.0 Å for ions). Only ligands that form non-covalent interactions with the target are considered. Common solvent molecules are also excluded from the analysis, based on a blacklist approach. The lDDT score is calculated for contacts within the binding site only, with a custom inclusion radius R_o of 10 Å and the standard thresholds of 0.5 Å, 1 Å, 2 Å and 4 Å. When the reference structure contains several biologically relevant ligands, the lDDT-BS score is the average of the lDDT-BS scores of the individual binding sites. This score is only calculated for targets where the experimental structure incorporates a ligand. When a binding site is at the interface of a homo-oligomeric structure and the prediction does not show the same oligomeric state, all possible chain mappings are assessed and the highest-scoring one is retained. Targets which cover only part of a hetero-oligomeric complex are not evaluated, since the modeled form might differ from its natural state and among evolutionarily related hetero-oligomeric complexes.
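
The binding-site definition above amounts to a distance filter. The sketch below shows only that selection step under simplifying assumptions: the function name and input layout are hypothetical, and ligand filtering (covalent ligands, solvent blacklist) is assumed to have happened beforehand.

```python
import math


def binding_site_residues(protein_atoms, ligand_atoms, is_ion=False):
    """Select residues with at least one atom close to any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) pairs.
    ligand_atoms: list of (x, y, z) coordinates of one ligand.
    """
    radius = 3.0 if is_ion else 4.0  # 4.0 Angstrom, or 3.0 Angstrom for ions
    site = set()
    for residue_id, coord in protein_atoms:
        if any(math.dist(coord, lig) <= radius for lig in ligand_atoms):
            site.add(residue_id)
    return site


protein = [(10, (0.0, 0.0, 0.0)), (11, (3.5, 0.0, 0.0)), (12, (9.0, 0.0, 0.0))]
ligand = [(0.0, 0.0, 3.2)]
print(binding_site_residues(protein, ligand))
```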

The evaluation of Model Confidence lDDT Values assesses the error estimates, given as deviation in Å, with a ROC AUC analysis. A residue is classified as correctly modeled if its local lDDT value is greater than or equal to 0.6. The local scores shown on the individual model pages are a measure of how well the model fits the reference structure at a given residue position. Local scores are extracted from the output when calculating the average accuracy. Please note that the ROC AUC is not defined for models that are too good or too bad, i.e. when all local lDDT values fall on the same side of the 0.6 threshold.

MM-align (MultiMer-align) is an algorithm for structurally aligning multiple-chain protein-protein complexes. The algorithm is built on a heuristic iteration of a modified Needleman-Wunsch dynamic programming (DP) algorithm, with the alignment score specified by the inter-complex residue distances. The assignment of matching chains in each complex relies on our chain mapping algorithm. [DOI: 10.1093/nar/gkp318]. Values range between 0 (bad) and 100 (good).

The evaluation of Model Confidence Values assesses a posteriori the ability of individual 3D-structure modeling servers to assign realistic error estimates to their predictions. The error estimates are expressed as the expected distance (in Å) between the Cα positions of the model compared to the target structure.
For each target we calculate the model error for each predicted amino acid residue as the Cartesian distance between the model Cα coordinates and the experimental target structure in a global superposition with LGA [ref.: PMID: 12824330] using a 4 Å cut-off. The prediction results are analyzed using receiver operating characteristic (ROC) curves and the area under the curve (AUC). A residue is classified as correctly modeled if its Cα position error is smaller than 3.5 Å and as incorrectly modeled if its Cα position error is greater than or equal to 3.5 Å. For each prediction the error estimates are reranked between 0 and 1, and the enrichment of correctly identified model errors is plotted as the false positive rate (FPR) versus the true positive rate (TPR) by varying the discrimination threshold between 0 and 1. [ref.: PMID: 17894352] The local scores shown on the individual model pages are a measure of how well the model fits the reference structure at a given residue position. Local scores are extracted from the output when calculating the average accuracy. Please note that the ROC AUC is not defined for models that are too good or too bad, i.e. when all Cα position errors are smaller than 3.5 Å or all are greater than or equal to 3.5 Å.
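
The ROC AUC described here can be computed without drawing the curve, using the equivalent pairwise (Mann-Whitney) formulation: the AUC is the probability that a correctly modeled residue received a smaller error estimate than an incorrectly modeled one. The sketch below is an illustration under that formulation, not CAMEO's code; the function name is hypothetical. Since the result depends only on the ordering of the estimates, rescaling them to the range 0 to 1 in any order-preserving way does not change it.

```python
def confidence_roc_auc(predicted_errors, actual_errors, cutoff=3.5):
    """ROC AUC of per-residue error estimates against the 3.5 Angstrom rule.

    Returns None when all residues fall into one class, because the
    ROC AUC is undefined in that case.
    """
    pairs = list(zip(predicted_errors, actual_errors))
    correct = [p for p, a in pairs if a < cutoff]     # correctly modeled
    incorrect = [p for p, a in pairs if a >= cutoff]  # incorrectly modeled
    if not correct or not incorrect:
        return None
    wins = 0.0
    for c in correct:
        for w in incorrect:
            if c < w:
                wins += 1.0  # estimate ranks the correct residue as better
            elif c == w:
                wins += 0.5  # ties count as half
    return wins / (len(correct) * len(incorrect))


# Three of the four correct/incorrect pairs are ordered properly.
print(confidence_roc_auc([1.0, 2.0, 5.0, 6.0], [0.5, 4.0, 1.0, 7.0]))
```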

The current implementation of QScore is based on the QScore defined by Xu Q. et al. [PMID: 18599072]. It is intended to reflect the similarity between two interfaces and takes a value between 0 (bad) and 100 (good). This score is no longer calculated on new targets and will be removed in the near future.

The QS-score considers the assembly interface as a whole and is suitable for comparing homo- or hetero-oligomers with identical or different stoichiometries, alternative relative orientations of chains, and distinct amino acid sequences (i.e. homologous complexes). To unequivocally identify the residues of all protein chains in complexes, QS-score first establishes a mapping between equivalent polypeptide chains of the compared structures by exploiting complex symmetries where possible. The resulting QS-score expresses the fraction of shared interface contacts (residues on different chains with a Cβ-Cβ distance < 12 Å) between two assemblies. A QS-score close to 1 translates to very similar interfaces, matching stoichiometry and a majority of identical interfacial contacts. A QS-score close to 0 indicates a radically diverse quaternary structure, probably different stoichiometries and potentially representing alternative binding conformations. Targets which cover only part of a hetero-oligomeric complex are not evaluated.

The Response Time is the time a model needs from submission to the server until reception by CAMEO. When averaging or aggregating, we take only the first model into account (this also holds for any other score calculated). The response time strongly depends on the age of the hardware and the load on the server and is hence not necessarily an indication of the efficiency of the algorithms. Currently, the CAMEO workflow reflects as closely as possible a user's experience of submitting a modeling job and receiving the results by email. Timings sometimes suffer from individual delays caused by MTA forwarding hosts being unavailable. Over a period of several months this effect is averaged out. In addition, the logged response times may not reflect a typical user's experience.

The RMSD is calculated on all Cα atoms in the target-model superposition with LGA [ref.: PMID: 17894352], using a 4 Å threshold for the sequence-dependent superposition.

The TM-score (template modeling score) is a well-established score reflecting the similarity of two proteins. Protein pairs with a TM-score > 0.5 are mostly of the same fold.
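
For orientation, the sketch below evaluates the commonly used TM-score expression for one fixed superposition: each aligned residue contributes 1 / (1 + (d / d0)^2), with d0 = 1.24 * (L - 15)^(1/3) - 1.8 for a target of length L. It is not the full score, which is the maximum over superpositions. The function names and the 0.5 Å value used for very short targets are assumptions made for this illustration.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Angstrom)."""
    if target_length <= 21:
        return 0.5  # assumption: fixed value for very short targets
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(ca_distances, target_length):
    """TM-score term for one fixed superposition (no maximisation).

    `ca_distances` holds the Calpha distance of each aligned residue pair;
    residues without a partner simply contribute nothing to the sum.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in ca_distances)
    return total / target_length


# A perfect match over the full length gives a score of 1.
print(tm_score([0.0] * 100, target_length=100))
```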
