CGenFF FAQ

CGenFF

This page was originally developed by Kenno Vanommeslaeghe here.

MacKerell Lab
Computer-Aided Drug Design Center

University of Maryland, Baltimore
Baltimore, MD 21201
USA

Technical questions
- How do I compile CHARMM with CGenFF support?
  What should I do if I get "LEVEL -4 WARNING FROM <RTFRDR> - LIMIT EXCEEDED"?
  What should I do if I get "LEVEL -3 WARNING FROM <PARRDR> - Maximum no. of dihedrals reached"?
Using CGenFF
Parameterization questions

Top - Back to CGenFF main page

Technical questions

How do I compile CHARMM with CGenFF support?
What should I do if I get "LEVEL -4 WARNING FROM <RTFRDR> - LIMIT EXCEEDED"?
What should I do if I get "LEVEL -3 WARNING FROM <PARRDR> - Maximum no. of dihedrals reached"?

CHARMM version 36 supports CGenFF by default. Version 35 should be compiled with the optional preflx keyword "CGENFF". One way to do this is

./install.com gnu medium +CGENFF

where "gnu" and "medium" should be modified according to architecture and desired CHARMM size, resp. Version 34b2 can be compiled by cd-ing into your c34b2 directory and untarring this file before compilation (this will overwrite ./install.com, ./source/fcm/dimens.fcm and ./doc/preflx_list.doc). Older CHARMM version are not supported. Be advised that the CHARMM build system will not recompile affected modules after changing preprocessor settings! Because of this, you cannot recompile CHARMM with CGenFF support in a directory in which you previously compiled CHARMM without CGenFF support unless you do

rm -r build/gnu lib/gnu

first (again, substituting "gnu" according to your needs). To test whether CHARMM was compiled with CGenFF support, run the following CHARMM script:


* check whether CHARMM was built with CGenFF support
*

echo ?CGENFF

If the CHARMM output file contains:

 RDCMND: can not substitute energy "?CGENFF"

then CGenFF support was not compiled in correctly.

Contents - Back to CGenFF main page

Using CGenFF

How do I apply CGenFF to a new molecule of my choice?
Where does the CGenFF program fit in the old tutorial?
Where does the lsfitpar program fit in the new tutorial?

The new tutorial is effectively the answer to this question. In a nutshell, the recommended course of action for the parameterization of a new drug-like molecule is to start by submitting it in its entirety to the CGenFF program. Then, apply the divide-and-conquer strategy (as also discussed in the new tutorial) to construct small model compounds in function of the charges and parameters that need to be optimized, as indicated by the penalty scores. Finally, submit these model compounds again to the CGenFF program to obtain initial guess stream files. Note that this makes it unnecessary to identify "missing parameters" as described in the old tutorial. Also, the xyz coordinates in the mol2 file can now be used as an initial geometry instead of constructing an IC table. The rest of the procedure in the old tutorial (starting from "Target data") remains the same, except that the optimization of the bonded parameters is now made easier by the lsfitpar program. How the latter fits into the new tutorial is explained by the lsfitpar tutorial, located in the "examples" subdirectory of the lsfitpar distribution; it even revisits one of the molecules of the new CGenFF tutorial ("SULFAZ")!

Where can I find the latest CGenFF version?
Can I download the complete CGenFF tutorials, including all scripts, for offline usage?

At the download page. CGenFF is an ongoing project that is updated regularly, and new releases will usually occur first on the download page.

Can I represent a biomacromolecule with CGenFF?

No! The Protein, Nucleic Acid, Carbohydrate and Lipid force fields are highly optimized for their specific purpose. The general force field is designed to cover any combination of chemical groups. This inevitably comes with a decrease in accuracy for representing any particular subclass of molecules, such as proteins, nucleic acids, carbohydrates and lipids. It is for this reason that we removed the amino acids, nucleotides,... from CGenFF.

How do I read in CGenFF alongside the CHARMM protein force field?

With the latest CHARMM force field release, you can just read in the topology and parameter files as follows:

read rtf card name @TOPPAR/top_all36_prot.rtf
read param card flex name @TOPPAR/par_all36_prot.prm
read rtf card append name @TOPPAR/top_all36_cgenff.rtf
read para card flex append name @TOPPAR/par_all36_cgenff.prm
stream toppar_water_ions.str

(optional) read in your own stream file:
```
stream my_ligands.str
```

Note: the latest CGenFF version is often ahead of the CHARMM force field release.

How do I cite CGenFF?

For the CGenFF force field:
- Mention the version of the CGenFF force field you used (eg. 3.0.1)
- Cite:
  - K. Vanommeslaeghe, E. Hatcher, C. Acharya, S. Kundu, S. Zhong, J. Shim, E. Darian, O. Guvench, P. Lopes, I. Vorobyov, A. D. MacKerell Jr., J. Comput. Chem. 2010, 31, 671-690.
  - W. Yu, X. He, K. Vanommeslaeghe, A. D. MacKerell Jr., J. Comput. Chem. 2012, 33, 2451-2468.
For the CGenFF program:
- Mention the version of the CGenFF program you used (eg. 1.0.0)
- Cite:
  - K. Vanommeslaeghe, A. D. MacKerell Jr., J. Chem. Inf. Model. 2012, 52, 3144-3154.
  - K. Vanommeslaeghe, E. P. Raman, A. D. MacKerell Jr., J. Chem. Inf. Model. 2012, 52, 3155-3168.

Contents - Back to CGenFF main page

Parameterization questions

The online version of the old tutorial refers to file X, but I don't find any link to this file. How can I get it?

For actually running the tutorial (as opposed to just looking at it), it is strongly recommended to download the offline version from the download page, which contains all the necessary files.

I calculated charges. Specifically, I ran a semi-empirical (AM1, PM3, PM5,...), DFT (B3LYP, B97, M06,...) or ab initio (HF, MP2, CCSD,...) calculation and obtained Mulliken, NPA, AIM, Hirshfeld (aka stockholder) or Merz-Kollman (or CHELP or other (R)ESP fitted) charges. Can I use these charges as they are?

No, no, no! Any charges that are directly derived from the wavefunction (such as Mulliken, NPA, AIM, Hirshfeld,...) are generally considered incapable of reproducing accurate electrostatic interaction energies in the framework of an atom-centered point charge model (as used by Class I additive force fields such as CHARMM) without emiprical corrections such as the "BCC" scheme. Force fields that do use unmodified charges based on vacuum QM calculations typically use an indirect method, fitting the atomic charges to reproduce the ElectroStatic Potential (ESP) at many points around the molecule. However, CHARMM falls into neither of these categories. For charges to be compatible with the L-J (and dihedral) parameters in CHARMM, they must be optimized based on interactions with water, as described in the CGenFF paper and tutorials. The Merz-Kollman charges mentioned in said paper and in the old tutorial are only meant as an easy way to obtain an initial guess, and have been superseded in this role by the CGenFF charges produced by the CGenFF program. Same for other (R)ESP methods and AM1-BCC and the likes; they may be appropriate for other force fields, but not for CHARMM. Different force fields are based on different parameterization philosophies, and all parameters are interconnected and have to be consistent. If you don't have the time or resources to perform charge optimization based on interactions with water, it is strongly recommended to assign charges by analogy to existing CGenFF compounds, which were properly optimized. In fact, the CGenFF program does this in an automated fashion, and is presently the recommended source of initial guess charges (and parameters); see this related FAQ entry.

I got parameters by analogy from a different (non-CHARMM) force field. What do I do next?

Throw away all non-CHARMM parameters and replace them with CHARMM ones. Different force fields follow different parameterization philosophies and have different addition rules for the LJ parameters. And as all parameters are interconnected, this affects the bonded parameters as well. Different force fields are not compatible.
If you're absolutely sure you know what you're doing, you can use parameters from force fields that are close to CHARMM (such as OPLS) AS AN INITIAL GUESS. This means that they need to be thoroughly validated and re-optimized when necessary, using the methodology described in the CGenFF tutorials. In most cases, it is a better idea to take initial guess parameters (and charges) from the CGenFF program; see this related FAQ entry.

I can't find analogous improper dihedrals

Ususally, this is caused by defining too many impropers. In CGenFF, out-of-plane motions are often reproduced correctly by the valence angle and (proper) dihedral terms alone. Some moieties (most notably carbonyl groups) do consistently need impropers, but most centers don't. To determine whether your molecule requires improper dihedrals, look for analogous molecules in the main topology file or use the the CGenFF program and look for IMPR lines in the topology section of the output.

Why does the phase of all dihedral terms have to be 0 or 180?

This can be best illustrated with an example. Take butane and set the C-C-C-C dihedral in its gauche(+) conformation (ie. +60°). Measuring this dihedral will always yield +60°, whether one measures it as C1-C2-C3-C4 or C4-C3-C2-C1. Consequently, even though butane is not a chiral molecule, its gauche(+) conformation is an asymmetric conformation, its mirror image being the gauche(-) conformation with a C-C-C-C dihedral of -60°. Asymmetric dihedrals (ie. dihedral parameters with phases other than 0 or 180) will assign different energies and forces to these two mirror-image conformations, which, needless to say, is highly unphysical. And it can be mathematically proven that this asymmetry cannot be compensated using a finite number of dihedrals with different multiplicities.

Note that in a force field for asymmetric molecules that always have the same chirality, asymmetric dihedrals are not such a big problem. For instance, in a protein force field, they would cause D-amino acids to have different energetics than L-amino acids, but as long as a protein doesn't contain any D-amino acids, there is no problem. In fact, the CMAP term does exhibit this kind of asymmetry and consequently cannot be applied on D-amino acids; doing so would require a mirror-image CMAP term. However, CGenFF is supposed to support symmetric molecules as well as unnatural stereoisomers of asymmetric molecules, so asymmetric dihedrals should be kept out of CGenFF.

What's wrong with simultaneously fitting the 1-, 2-, 3-, 4-, 5- and 6-fold dihedral terms?

There are two schools of thought about the dihedrals. One could just see it as a Fourier-like expansion, and postulate that anything goes as long as the energy profile is reasonable. However, one could also see it as a correction term for the nonbonded 1-4 interactions (which is what it really is). From this point of view, dihedral contributions that don't reflect the symmetry (as apparent from a Newman-projection) are unphysical.

At first glance, this may seem a philosophical discussion, but there are implications. Imagine the CG331-CG321-CG2R61-CG2R61 dihedral in ethylbenzene. One might introduce a 3-fold to get a small detail in the profile right. However, for an ideal benzene geometry, the two threefolds of the two (cisoid and transoid) instances of the dihedral exactly cancel out for all values of the dihedral angle. Because the starting geometry of a minimization/simulation might be slightly off, what really happens is that the dihedrals excerpt an out-of-plane force on the benzene ring. Because of this, there will also be a net dihedral contribution that becomes quite significant for high values of the force constant. This is what may allow people to get small details in the profile right, but at the cost of pulling the benzene ring out-of-plane and/or deforming the angles and/or getting a strange vibrational spectrum, with apparently no way to make it better. There are more cases like that, for instance the C-O-P-O angle in organic phosphate, which should never contain 1- or 2-fold contributions. The bottom line is: when fitting parameters, always think about whether what you're doing is physical. If not, you will get in trouble further down the line.

A special case is the 5-fold term. I've never seen any molecule with a symmetry that justifies using it. 5-folds should be considered harmful!

How can dihedral contributions cancel out?

Why does a wind turbine or a propeller with 2 or 3 blades not wobble?

There are a number of common cases in which dihedral contributions cancel out:

If a dihedral involves an sp² center as one of the inner atoms, and both substituents on this center are identical or have identical dihedrals, all terms with an odd multiplicity will cancel out.
If a dihedral involves an sp³ center as one of the inner atoms, and all three substituents on this center are identical or have identical dihedrals, all terms with a multiplicity that is not a multiple of 3 (ie. all terms other than 3-fold and 6-fold) will cancel out.
Consequence 1: if both of the above conditions are satisfied (example: methyl rotation in toluele), only the 6-fold term will not cancel out.
Consequence 2: if the substituents are not identical but some of their dihedral terms have the same phase and multiplicity, there may be partial cancellation and some of the terms can be omitted. (In practice, one would typically choose to omit H-X-X-H terms because these are more prone to transferability problems.)

All these observations follow from the expression for the dihedral energy:
dihedral formula

Using this formula, we can make polar plots of the energy in function of the angle for a few common cases of cancellation:

Polar plot of two 1-fold dihedrals on substituents that are 180° apart; the sum is a constant.	Polar plot of two 3-fold dihedrals on substituents that are 180° apart; the sum is a constant.
Polar plot of three 1-fold dihedrals on substituents that are 120° apart; the sum is a constant.	Polar plot of three 2-fold dihedrals on substituents that are 120° apart; the sum is a constant.

If you don't believe your eyes, here's the corresponding math, assuming that the phase δ is 0, and let φ be the dihedral as measured using one of the substituents as a reference atom.
Warning: rumor has it that some browsers on some operating systems don't render the "loopy phi" (φ) and "stroked phi" (ϕ) correctly!

Two 1-fold dihedrals:
Substituent 1: ϕ=φ
Substituent 2: ϕ=φ+π
E = 1 + cos φ + 1 + cos(φ+π) = 2 + cos φ - cos φ = 2

Two 3-fold dihedrals:
Substituent 1: ϕ=φ
Substituent 2: ϕ=φ+π
E = 1 + cos 3φ + 1 + cos(3(φ+π)) = 2 + cos 3φ + cos(3φ+3π) = 2 + cos 3φ - cos 3φ = 2

Three 1-fold dihedrals:
proof for three 1-folds

Three 2-fold dihedrals:
proof for three 2-folds

Other cases...
...are left as an excercise for the reader.

My MP2 minimizations/potential energy scans are computationally too heavy. What should I do?
Why should I split my molecule into smaller subsystems for the purpose of parameterization?
What are these "contamination" and "secondary interactions" people sometimes talk about, and how can I avoid them?

There are several reasons why large compouds should be split into smaller subsystems:

One part of the molecule might sterically or electrostatically interfere with the water probe probing a different part of the molecule. For example, a steric clash may make the interaction energy less favorable, or a secondary hydrogen bond may make it more favorable than it should be. In both cases, this gives rise to difficult-to-satisfy target data, and if blindly optimized, may yield poorly transferable or even uphysical charges. Therefore, apart from splitting up the molecule, it is important to visually verify that all water probes are clear of "secondary interactions"!
Similarly, one part of the molecule might sterically or electrostatically interfere with another part of the molecule while it is being rotated around during a dihedral Potential Energy Scan. Again, this gives rise to a difficult-to-satisfy Potential Energy Surface and may yield poorly transferable or even uphysical dihedrals, and again, visual verification is important!
The MOLVIB output of large molecules becomes so unwieldy that it is nearly useless for the purpose of parameterization.
MP2 calculations on large molecules can become extrememly expensive.
Even if all of the above problems can be worked around, the resulting parameters will still be very specialized and poorly transferable to other molecules, or in many cases even to different conformations of the molecule of interest!

How should I assemble a molecule that consists of more than 2 chemical groups or subsystems?

Suppose your molecule consists of 3 groups, A, B and C, and can formally be written as A-B-C. After making sure that the 3 groups are parameterized separately, The model compound A-B should be constructed to optimize parameters (typically only dihedrals) associated with the A-B linkage. Then the same is done for for model compound B-C. This procedure ensures that C does not sterically or electrostatically interfere with the A-B scan, and A does not interfere with the B-C scan. Finally, A-B-C can be put together and submitted to bulk phase MD simulations of the user's preference for the purpose of validation.

Why can't I directly use QM or experimental bond lenghts and angles as reverence values in the parameter file?

Because intramolecular nonbonded interactions will typically stretch or compress bonds and angles by a significant amount. if you measure equilibrium values in a QM minimized structures and put them directly into the parameter file as reference values, the resulting MM minimized structure will have different bond lenghts and angles than the QM because of the force excerted by these nonbonded interactions. Therefore, the reference values in the parameter file should be iteratively optimized so that the MM minimized structure matches the QM minimized one.

I can't get an out-of-plane angle or an out-of-plane motion in the vibrational spectrum right

This is often caused by neglecting the sum of the equilibrium angles. The sum of the equilibrium angles (as found in the parameter file) around a planar center should be 360°. If the sum is smaller, the center will be pulled out-of-plane. This is what allows us to optimize a force field to reproduce out-of-plane angles. Similarly, a sum larger than 360° will result in a residual in-plane force. However, this mechanism should not be use to increase out-of-plane vibrational frequencies, as sums larger than 360° may give rise to non-transferable parameters and other nasty side effects. If an out-of-plane vibrational frequency is too low, this can be cured either by increasing the angular force constants (if this doesn't adversely effect the in-plane vibrations) or by introducing an improper dihedral.
Similarly, the sum of the inner angles in a planar 5-membered ring should be 540°, in a 6-membered ring 720°,... Making the sum smaller will make the ring nonplanar; making the sum larger will induce extra ring strain and thereby make the ring more rigid.

How can I create a covalent bond between the CHARMM biololecular force field and CGenFF?
Can I use CGenFF to represent a non-standard amino acid?

The standard methodology to do this is to cap the CGenFF part of your system with a relevant portion of the biomolecular part, and submit the result to the CGenFF program. For example, if the CGenFF part is a post-translational modification of an amino acid side chain, a relevant portion of the original side chain should be included; if the CGenFF part is a whole amino acid, relevant parts of the adjacent amino acids in the sequence (typically up to and including ^αC) should be added. If necessary, parameter optimization can be performed on this model compound. Then, typically, one would edit the CGenFF topology into a PRES or RESI that can be applied to the biomolecular part. Upon generating an actual biomolecular system than includes the newly created CGenFF portion, CHARMM will list "missing parameters" that involve atom types from both the biomolecular force field and CGenFF. These parameters can be transfered either from the CGenFF parameter file or from the biomolecular force field, depending on which source is most relevant for that particular parameter. However, care should be taken not to combine dihedral parameters from different sources around the same rotatable bond, and angles from different sources around the same trigonal planar center.

Contents- Back to CGenFF main page

This material is based upon work supported by the National Science Foundation under Grant No. 0823198. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Last updated Sunday, the 31st of May 2015