Tutorial: Validation with MolProbity (2024)

  • Overview
  • Setup
  • File inputs
  • Validation tools overview
    • Polygon
    • Opening Coot
    • MolProbity Validation
  • Making general corrections
    • Rotamers
    • C-Beta Deviations
    • N/Q/H Flips
    • Clashes
    • Notes
  • Cis-nonProlines overview
  • Making cis-peptide corrections
    • Justified cis conformations
    • Mistaken cis conformations
    • Testing both cis and trans
  • References

This tutorial will show you how to do comprehensive model validation within thePHENIX graphical user interface (GUI).(Validation in the PHENIX GUI reviews many of the aspects of this tutorial.)

The example used for this tutorial is called Protein Kinase A (3dnd).To get the requisite files open the GUI and set up the tutorial, as described here, called Protein Kinase A (validation).This should setup the main/GUI window.On the right-hand side, under Crystals, click Validation and map-based comparisons and then select Comprehensive Validation (MolProbity).

Tutorial: Validation with MolProbity (1)

In the "Comprehensive Validation" window, browse to the tutorial directory that you specified above, and select 3dnd.pdb as the "Input model", 3dnd.mtz as the "Reflections file", and 3dnd.ligands.cif as the "Restraints (CIF)" file.This CIF file defines restraints for ligands in input PDB file.

For this model, adding certain validations to the Polygon representation is useful. Scroll down to find the button labeled "Polygon options" and click it. In the window that pops up, turn on the checkboxes for "Ramachandran favored" and "Rotamer outliers". Then hit OK.

Now you are ready to run the program, select Run from the top menu bar.

Polygon

When validation has run, click on the "Compare statistics" button.This will launch a tool called "Polygon" (it may take several seconds), which is used to simultaneously compare R-work, R-free, RMSD-bonds, RMSD-angles, Clashscore, and Average B factor.Well-built models will usually have a small, fairly equilateral polygon, whereas larger or significantly asymmetric deviations are indicative of model problems.

As you can see for 3dnd, both RMSD-bonds and RMSD-angles are a bit large, beyond the expected peaks for structures of similar resolution.This could indicate that misfit areas of the structure are causing geometric strain. We will visit some of these areas in the next sections.

From the measures we added, you can also see that Rotamer outliers are quite high.Ramachandran favored looks unusual, but this measure should have a high value, unlike the others.More importantly, this structure's Ramachandran favored score falls around the peak of Polygon's histogram for structures of similar resolution.This is important context for choosing which validations to focus on.

Tutorial: Validation with MolProbity (2)

You may close the "POLYGON" window and return to the comprehensive validation window.

Opening Coot

Click the "Open in Coot" button. When Coot loads, note that it says "Connected to PHENIX" in the toolbar - this allows the validation GUI to communicate with Coot.

If Coot does not load, go to Preferences->Graphics in the main Phenix window to specify a path to your Coot installation.

MolProbity Validation

Now select the "MolProbity" tab in the Phenix Comprehensive validation window.Under this tab, you will see sub-tabs for "Summary", "Basic geometry", "Protein", and "Clashes".If your model contained RNA, there would also be an "RNA" tab.

Summary

Under the "Summary" tab are most of the same overall statistics that you would find when running the MolProbity webserver.These statistics are colored stoplight-style (green, yellow, and red) as a rough guide the severity of a given result.

Notice the high percentage of rotamer outliers, and large number of C-beta deviations (those combine geometry problems around the C-alpha into a single measure, as explained in Lovell 2003).The Ramachandran favored percent appears a little low, and is colored yellow.However, as we saw in the Polygon above, this value is typical of structures around 2.26Å.The Ramachandran statistics might be improvable, but other validations take higher priority in this context.

This tab also displays Rama-Z score validation.This is a whole-model assessment of how realistic the Ramachandran distribution is and guards against overfitting to Ramachandran criteria.This structure has a reasonable Ramachandran distribution, with all its Rama-Z scores falling within +2 to -2.

Tutorial: Validation with MolProbity (3)

Basic Geometry

From the MolProbity tab, click on the "Basic geometry" tab.Here you find a summary of all bond, angle, dihedral, chirality, and planarity outliers.Outliers will be listed in the associated lists, and each item is clickable, which will center in Coot.Note the bond outlier for Ile A 163 - we'll be seeing this residue again later.

Protein-specific validations

Next, go to the "Protein" tab.This section contains validation information for Ramachandran and rotamer outliers, C-beta deviations, recommended Asn/Gln/His sidechain flips (these have NOT already been done for you, as you can tell if you click on His 39 in the flip list to see it in Coot), and non-trans peptide bonds.

RNA-specific validations

If this structure contained any RNA molecules, there would be an additional tab named "RNA".This section contains validation information for RNA nucleotides with covalent geometry outliers (also found in the Basic geometry tab), RNA nucleotides with sugar pucker outliers, and RNA suites with unexpected backbone conformations.

All-atom Contacts

Go to the "Clashes" tab. This section contains a list of all the steric overlaps >0.4Å (i.e. "clashes") in the structure, sorted by severity. Scroll down the list and note that almost all the clashes involve at least one Hydrogen atom. Hydrogens form most of the steric contacts in a structure and are crucial to contact analysis! Phenix, MolProbity, and Coot will generally ensure that hydrogens are added when you need them, but make sure to use phenix.reduce to add hydrogens manually if you are working outside these systems.

Refinement programs are excellent at optimizing details, but may struggle with large changes across energy barriers.Our corrections are therefore focused on getting residues over energy barriers and into the right local minimum.

Rotamers

Select the Protein tab and find the list of rotamer outliers.Click on Leu 27 from the A chain, which will center on this residue's CA in the Coot window.

As you can see in Coot, this orientation is not a terrible fit to the density - but it is a rotamer outlier and energetically unfavorable due to an eclipsed Chi angle.You can see the eclipsed conformation by looking down the CG-CB bond and noting that the CD1 atom nearly overlaps with the backbone CA.Recentering on CB may help you arrange this viewpoint.The residue also has a suggestive positive difference peak near the CD1.

We'll use the tools in Coot to fix this sidechain.Select the "Auto Fit Rotamer" tool.You can find it in the default toolbar or in Calculate-> Model/Fit/Refine.It looks like a sidechain surrounded by green.Then click on an atom in the Leu 27 sidechain.Did you see it rotate ~180 degrees?

To confirm the correctness of this change, use the "Undo" and "Redo" buttons to observe the change again.Look down the CG-CB bond as before. You should see that the sidechain has a favorable, staggered conformation after the change.Also note that the CD2 atom is now pointed towards the positive difference density peak - you've got the atom close enough to the correct position that refinement should be able to sort out the details.

Tutorial: Validation with MolProbity (4)

C-Beta Deviations

Return to the Validation GUI window, and find the list of C-beta position outliers.Select Ile 163 from the A chain, which will center on this residue's CB atom in the Coot window.This sidechain also appears in the rotamer outliers list on this tab and on the bond-length outliers list on the Basic Geometry tab.Misfit sidechains will often have multiple diagnostic indicators of a problem, which is useful in identifying the worst offenders.Also note the large blob of positive density near to the sidechain, further evidence that it may not be in an optimal position.

Select the "Auto Fit Rotamer" and then click an atom in the Ile 163 sidechain.Alternatively, you can define a Real Space Refine Zone around this sidechain, and drag the CG2 atom of the short arm into the large difference density peak.Use the "Undo" and "Redo" to toggle back and forth across the change.Note that the CA and especially the CB atoms have moved significantly.

Correcting misfit sidechains such as these can be tricky, as many rounds of refinement within the wrong local minimum have caused distortions in the model to accommodate the misfit.It would be important to revisit this region after another refinement.In this case, you can expect to find position of this Ile further improved, as the neighboring atoms are able to recover from the strain caused by the initial outlier.

Tutorial: Validation with MolProbity (5)

N/Q/H Flips

Return to the Validation GUI window, and find the list of Backwards Asn/Gln/His sidechains.

Asparagine, Glutamine, and Histidine sidechains are pseudo symmetric.They can easily be fit into electron density backwards, eliminating hydrogen bonds and introducing clashes.Hydrogen addition via Reduce (in Phenix, Coot, or MolProbity) will usually "flip" these N/Q/H residues to the positions with best steric contacts.Here, you can use Coot to force a necessary Histidine flip.

Select HIS 39 from the A chain, which will center on this residue in the Coot window.Note that the CD2 atom of His 39 and the N atom of Asp 41 are involved in a serious clash.If we flipped the His 180°, then the ND1 atom would be in a position to form a hydrogen bond with the backbone NH where this clash is.

Define a Real Space Refine Zone that includes HIS 39.In the refinement menu that pops up, look under "Active Refinement" and click the "Side-chain Flip 180°" button.(This flip only works on N/Q/H residues; use the Auto Fit Rotamer tool for other tasks.)

Tutorial: Validation with MolProbity (6)

Clashes

Return to the Validation GUI window, and navigate to the Clash tab.

Steric clashes usually serve as emphasis on other model problems.However, there are some errors that are clearly diagnosed by clashes alone, such as N/Q/H flips and certain misplaced water molecules.There are many tools for identifying misplaced waters, and clashes are excellent for finding the most problematic ones.

Select "A 177 GLN HG3 ___ A 554 HOH O" from the list of clashes, which will center on this clash in the Coot window.Scroll the 2Fo-Fc map contour down, and note that this water has only marginal density even at a low contour.There's no justification for keeping a water in this position.

To remove this water, select the "Delete item" tool.In the Coot sidebar, it looks like a trash can.In the Delete item tool, toggle the Water mode, then click on the offending water.

The next water clash on the list, "A 275 VAL HG21 ___ A 577 HOH O", is another good example of an unjustified water marked by a clash.Try using Coot to remove it as well.

Tutorial: Validation with MolProbity (7)

Notes

The kind of sidechain fixups you've done in KiNG or Coot can mostly be accomplished using real-space refinement in phenix.refine (which includes a rotamer correction component), up to somewhere between 2 and 2.5 Å (and of course NQH flips are done automatically by Reduce in either MolProbity or Phenix).At lower resolution, however, real-space refinement with rotamer correction cannot reliably discern between correct and incorrect rotamers.That's a hard job for people as well, but can often be done if you have the interactive information on clashes and H-bonds from the non-pairwise, H-aware all-atom-contact dots.

You can correct many of the other outliers in the GUI lists, especially the sidechains. Try a few!

You will need another set of tutorial data for this section.In the main Phenix window, go to New Project -> Set up tutorial data -> "Validation - xyloglucanase cis peptides".

If you are starting from the main Phenix window, select Comprehensive validation (X-ray/Neutron) as shown above.If you still have the comprehensive validation window open, return to the "Configure" tab.Select 2cn3.pdb as the input model and 2cn3.mtz as the reflections file.We will not use a restraints file this time, so clear that field if necessary.Now run the validation; this will take a few moments.

A type of outlier that can be very difficult for real-space refinement alone to resolve is an incorrect cis-peptide.The peptide bond between carbon and nitrogen joins adjacent amino acids.Due to the nearby carbonyl, the peptide bond has partial double bond character and does not rotate freely.Most peptides are trans, but genuine cis peptides occur preceding 5% of prolines and preceding about 1 in 3000 non-proline residues.Despite its rarity, a cis conformation can be tempting to model at lower resolutions because it may appear to fit constricted or patchy density.Once modeled, cis peptides are difficult to escape, since doing do would involve rotating 180 degrees through a high energetic barrier.Fortunately, there is a tool in Coot to simplify human-directed corrections.

Once the validation completes, open the model in Coot.Then move to the MolProbity tab of the validation output, and then the Protein tab within MolProbity.Scroll down to the bottom of the Protein tab to find Cis and Twisted peptides validation.Note the overall model statistics and the suspiciously high incidence of non-trans peptides in this model.

Peptide bonds necessarily span two residues, so both are listed in the validation output table.However, in shorthand, peptide bonds are properly associated with their following residue, due to the special relationship cis peptide bonds have with a following proline.Thus, "chain A GLY 161 to GLU 162" is a cis-Glu, and "chain A, GLY 293 to PRO 294" is a cis-Pro.

Tutorial: Validation with MolProbity (8)

This structure is a xyloglucanase.Real cis-nonProline conformations are more common in carbohydrate-active enzymes like this one than elsewhere.As a result, this structure contains both real and incorrect cis peptides.

Justified cis conformations

We'll look at a real example first.Click on "GLY 161 to GLU 162" in the table, which will center Coot on the N atom of the peptide bond of interest.Note that Coot marks cis peptides with a colored trapezoid (green for cis-Proline, red for cis-nonProline, yellow for twisted).Scroll the electron density to a higher contour to convince yourself that cis truly is the best conformation to fit this region.You need this kind of strong support - from experimental data, hom*ology, or chemistry - to justify cis-nonProlines.

Tutorial: Validation with MolProbity (9)

Mistaken cis conformations

Now we will fix a mistake.Click on "LYS 269 to GLY 270" to center the Coot window on this peptide bond.In addition to the red trapezoid marking the cis-peptide, there are large areas of difference density around the peptide bond.Also note the clash between the GLY 270 CA and the LYS 247 O.

Tutorial: Validation with MolProbity (10)

To perform the correction: Go to Calculate -> Other Modeling Tools from the Coot menu bar, and select Cis <-> Trans.Click on one of the atoms in the offending peptide bond, and Coot will flip the bond.The bond conformation has now been corrected, but the atoms have moved out of the density and geometry distortions have been introduced.Select Real Space Refine Zone, and then select atoms on either side of the peptide bond.This correction involved a large change, so be sure to give Coot at least two full residues on either side of the bond to work with.Once you have obtained a satisfactory local refinement, note that the GLY 270 N is now in position to form a hydrogen bond with the LYS 247 O, where the clash was.

ASN 298 to GLY 299 is another good example of a region with marginal electron density and a mistaken cis-peptide.Select it from the list and try fixing it.As a bonus, you can then move the ASN 298 sidechain into some highly likely positive difference density.

Testing both cis and trans

Finally, select "MET 348 to ASN 349" from the list.This region has no significant difference density, and the 2Fo-Fc map is fairly convincing.Nevertheless, let's test it.Try changing this peptide to trans and do a local refinement.

Tutorial: Validation with MolProbity (11)

It's not a good fit to the density!Convince yourself that there's no way to get even a marginal fit from the trans conformation, then hit "Undo" until you get the cis-Asn back.Trying both possibilities like this can be a useful strategy.

  • Structure validation by Calpha geometry: phi,psi and Cbeta deviation. S.C. Lovell, I.W. Davis, W.B. Arendall, P.I. de Bakker, J.M. Word, M.G. Prisant, J.S. Richardson, and D.C. Richardson. Proteins 50, 437-50 (2003).
  • MolProbity: More and better reference data for improved all-atom structure validation. C.J. Williams, B.J. Hintze, J.J. Headd, N.W. Moriarty, V.B. Chen, S. Jain, S.M. Prisant MG Lewis, L.L. Videau, D.A. Keedy, L.N. Deis, I.I.I. Arendall WB, V. Verma, J.S. Snoeyink, P.D. Adams, S.C. Lovell, J.S. Richardson, and D.C. Richardson. Protein Science 27, 293-315 (2018).
Tutorial: Validation with MolProbity (2024)

FAQs

How to interpret MolProbity score? ›

Summary statistics. Clashscore is the number of serious steric overlaps (> 0.4 Å) per 1000 atoms. In the two column results, the left column gives the raw count, right column gives the percentage. * 100th percentile is the best among structures of comparable resolution; 0th percentile is the worst.

What is the use of MolProbity? ›

MolProbity is most complete for crystal structures of proteins and RNA, but also handles DNA, ligands, and NMR ensembles. It works best as an active validation tool - used as soon as a model is available and during each rebuild/refine loop, not just at the end to provide global statistics before deposition.

What is the clash score in PDB? ›

All models submitted to the Protein Data Bank (PDB) are analyzed by MolProbity, which adds hydrogen atoms (where absent), determines a clash score ("clashscore"), and ranks the clashscore relative to other models of the same resolution. The clashscore is the number of clashes per 1,000 atoms (including hydrogens).

What does the Ramachandran plot tell you? ›

The Ramachandran plot shows the statistical distribution of the combinations of the backbone dihedral angles ϕ and ψ. In theory, the allowed regions of the Ramachandran plot show which values of the Phi/Psi angles are possible for an amino acid, X, in a ala-X-ala tripeptide (Ramachandran et al., 1963).

What is a good QMEANDisCo global score? ›

Local quality is either estimated using the raw QMEAN scoring function or one of the two specialized functions QMEANBrane and QMEANDisCo. They all provide scores in range [0,1] with one being good.

What is cablam validation? ›

It is a system designed to use protein CA geometry to evaluate mainchain geometry and identify areas of probable secondary structure.

What is the GDT HA score? ›

GDT-HA score (HA stands for High Accuracy) is calculated according to the GDT_TS formula (above), except that the distance cutoffs are halved: d1=0.5Е, d2=1Е, d3=2Е, d4=4Е. This score is slightly better suited for the evaluation of high hom*ology targets, which are usually very close to native conformations of proteins.

What are Ramachandran outliers? ›

Ramachandran outliers are those amino acids with non-favorable dihedral angles, and the Ramachandran plot is a powerful tool for making those evident. Most of the time, Ramachandran outliers are a consequence of mistakes during the data processing.

What is a pdb validation report? ›

wwPDB Validation Reports are available for every entry to provide an assessment of the quality of a structure and highlight specific concerns by considering the model coordinates, experimental data, and fit between the two.

What does clash score mean? ›

Clashscore. This score is derived from the number of pairs of atoms in the model that are unusually close to each other. It is calculated by MolProbity (Chen et al., 2010) and expressed as the number or such clashes per thousand atoms.

What is a clash in a protein? ›

Steric clash is one of the artifacts prevalent in low-resolution structures and hom*ology models. Steric clashes arise due to the unnatural overlap of any two non-bonding atoms in a protein structure.

What is the interpretation of GMQE score? ›

GMQE (Global Model Quality Estimation) is expressed as a number between 0 and 1, reflecting the expected accuracy of a model built with that alignment and template, normalized by the coverage of the target sequence. Higher numbers indicate higher reliability.

What is the Rama Z score? ›

The Rama-Z score describes how “normal” a model is compared with a reference set of high-resolution structures (Hooft et al., 1997).

What is clash score? ›

Clashscore is the number of serious steric overlaps (> 0.4 Å) per 1000 atoms. Protein. Geometry. Poor rotamers.

Top Articles
Latest Posts
Article information

Author: Dong Thiel

Last Updated:

Views: 5303

Rating: 4.9 / 5 (79 voted)

Reviews: 86% of readers found this page helpful

Author information

Name: Dong Thiel

Birthday: 2001-07-14

Address: 2865 Kasha Unions, West Corrinne, AK 05708-1071

Phone: +3512198379449

Job: Design Planner

Hobby: Graffiti, Foreign language learning, Gambling, Metalworking, Rowing, Sculling, Sewing

Introduction: My name is Dong Thiel, I am a brainy, happy, tasty, lively, splendid, talented, cooperative person who loves writing and wants to share my knowledge and understanding with you.