IMPORTANT NOTE: Significant portions of this tutorial are OBSOLETE! This tutorial has been superseded by the new tutorial available from the download page, which also contains an offline version of the present (old) tutorial that includes some input files and scripts that are missing from the present online version.

CHARMM General Force Field tutorial

By Kenno Vanommeslaeghe and Alexander D. MacKerell Jr.

Recommended reading: reference 1
Prerequisites: a UNIX-like operating system running CHARMM2,3
Also recommended: gnuplot



Parametrization of a small heterocycle: pyrrolidine

Atom typing and initial guess charges

To simulate the addition of pyrrolidine to the CHARMM General Force Field (CGenFF), the script remove_pyrrolidine can be used to create a version of the residue topology file (RTF) and parameter (PRM) file in which pyrrolidine and its parameters are missing. As outlined in the parametrization philosophy section of the CGenFF paper,1 the first step in treating the new compound (ie. pyrrolidine) is creation of the residue definition, as can be seen in the "toppar stream file file" toppar_prld1.str. As discussed elsewhere,2,3 the RTF includes the total charge, atom names (ideally IUPAC nomenclature), the CGenFF chemical atom types, the partial atomic charge for each atom, the covalent bonds in the molecule, any improper dihedrals used in that molecule and the internal coordinate (IC) table that is used to build the molecule from internal coordinates and for analysis (the IC table is not required for energy calculations if Cartesian coordinates for the molecule are present). In the case of pyrrolidine, the user initially assigns the appropriate CGenFF atom types, followed by the charges. Creation of the BOND list and the IC table (which may be performed via the IC GENErate command) will not be discussed further. Initial atom types for pyrrolidine were obtained by analogy with previously parametrized molecules that are chemically similar to the new molecule. Following assignment of the atom types initial charges were assigned to allow for a preliminary energy minimization from which a starting geometry may be obtained for the QM calculations.

Initial guess parameters

At this stage, when initially generating the structure and calculating its energy, CHARMM2,3 will present error messages for parameters that are missing. This is demonstrated by running
cd pyrrolidine
charmm str:toppar_prld1.str mispar: -i generate_tutorial.inp -o generate1.out
The output of this command (ie. generate1.out) will contain a block of missing bond and angle parameters, similar as in prld_missing1.txt. These are parameters for which
  1. initial estimates must be obtained or wildcard parameters applied,
  2. validation must be performed and
  3. optimization may be performed if deemed necessary.
It should be emphasized that the hierarchical development of CGenFF requires that only new parameters for a given molecule be optimized, thereby insuring that the integrity of previously optimized molecules is maintained. Alternatively, a user may choose to optimize selected parameters that are already available in the force field; however, this will comprise the quality of previously optimized molecules such that those parameters should only be used for the particular molecule under study. Assignment of initial guesses to the missing parameters is based on analogy. For a new bond parameter, an available parameter is selected that contains one of the atom types in the new bond, with the second atom type as similar as possible. For valence angles, initially, available parameters with the same central atom type are selected. This selection is refined by identifying parameters for which one of the two terminal atom types is the same. Similarly, for dihedral angles, available parameters that match the two central atom types are initially selected, followed by identification of parameters for which one of the two terminal atom types is the same. Finally, for improper dihedrals, available parameters for which the first and fourth atom types are the same should initially be identified, followed by more specific matches of the central two atom types. To facilitate this procedure, one may use the script mispar as follows:
../mispar ../par_all36_cgenff-prld.prm < prld_missing1.txt
The output of this command illustrates that in the majority of cases, there will be multiple choices of parameters for assignment of the initial guess. In certain situations, a parameter for which the central atom type(s) as well as one of the terminal types are the same may even not be the best initial guess; in such cases, the user must use their chemical intuition to select the best possible guess. It is mainly for this reason that mispar by design doesn't offer "approximate matches" for bonds. Indeed, for the particular case of bonds, using an existing parameter that has one atom type in common is often simply not satisfactory. For example, when searching for an initial guess parameter for a Caromatic-Sdisulfide bond, it is most probably better to copy a Calkene-Sthioether parameter than to copy a Caromatic-Oether parameter, even though the latter has one atom type in common. Thus, not offering approximate bond parameters again forces the user to use their chemical intuition. Doing so yields the initial guess bond and angle parameters at the end of toppar_prld2.str. Running CHARMM with this updated stream file
charmm str:toppar_prld2.str -i generate_tutorial.inp -o generate2.out
brings the missing dihedral parameters to light. This time, mispar offers good analogous parameters,
../mispar ../par_all36_cgenff-prld.prm < prld_missing2.txt
which can straightforwardly be integrated in the "toppar stream file", yielding toppar_prld3.str.

Target data

Initiation of parameter optimization involves generating target data. Step one is an MP2 optimization of the molecule. As pyrrolidine exhibits tetrahydrofuran-like conformational flexibility,4 and its NH group can undergo pyramidal inversion, we initially selected 4 starting conformations and subjected these to QM optimizations.
Pyrrolidine atom numbering and charges
Figure 1: pyrrolidine partial atomic charge optimization. a: atom numbering convention; b: Merz-Kollman charges; c: symmetrized Merz-Kollman charges with charges on aliphatic hydrogen atoms set to 0.09; d: optimized charges.
The absolute minimum from this simplistic conformational search was an asymmetric 1T2 conformation, as defined following the nomenclature of M. Sundaralingam,5 but using the standard atom numbering for the pyrrolidine ring as shown in Figure 1a rather than the atom numbering convention for furanose sugars. Conversely, building and minimizing pyrrolidine from the IC table and the initial guess parameters in the tutorial toppar stream file (toppar_prld3.str) results in a 3T2 geometry, as can be seen by running
charmm str:toppar_prld3.str -i generate_tutorial.inp -o generate3.out
This generates the Gaussian input file prld_opt_freq_mp2.gjf in the "gauss" subdirectory, which contains the minimized conformation as also found in prld_min.pdb . Luckily, re-minimizing this 3T2 conformation at MP2/6-31G(d) level using Gaussian6 will reproduce the desired 1T2 geometry. This can be demonstrated by adjusting the processor and memory requirements in prld_opt_freq_mp2.gjf and running Gaussian on it, which produces an output file similar to prld_opt_freq_mp2.log containing a number of properties calculated on the 1T2 conformation. This conformation, in which the proton on the nitrogen atom is in an axial position, is used for the optimization of the charges. This process initially involves obtaining the MP2 Merz-Kollman charges (copied from prld_opt_freq_mp2.log to Figure 1b), which are then symmetrized so that chemically equivalent atoms have the same charge. Also, per convention of the CHARMM all-atom force fields, aliphatic hydrogen atoms are assigned a default charge of +0.09 (see methodology). The resulting "normalized" charges (Figure 1c) are used as an initial guess for the charge optimization, as reflected in toppar_prld3.str. The target data for this optimization consists mainly of scaled HF minimum interaction energies and distances between the model compound and individual water molecules. In the case of pyrrolidine, 10 different monohydrates are generated, as shown in Figure 2; however, in many cases it may be better to only consider interactions with hydrogen bonding groups and non-polar moieties adjacent to a heteroatom. The HF/6-31G(d) water interactions are calculated using the Gaussian input files named prld*_hf*.gjf in the "gauss" subdirectory; corresponding sample output files (prld*_hf*.log) can be found at the same location. As pyrrolidine is a neutral molecule, the resulting water interaction energies (calculated as Ecomplex-Epyrrolidine-ETIP3 water, where ETIP3 water=-76.010530 Eh) are scaled by a factor 1.167 and used as target data for the optimization of the charges, together with the HF and MP2 dipole moments that came out of the MP2 optimization.
Pyrrolidine water interactions
Figure 2: interaction orientations of pyrrolidine with water molecules that were used for charge optimization. Note that only a single water molecule is interacting with pyrrolidine during each calculation; all water molecules are shown simultaneously only for convenience.

Initial charge optimization using the QM geometry

Once these target data are obtained, the corresponding empirical values are calculated and compared to the target values. For the optimization of the charges, this is done by the CHARMM script water_constr_tutorial.inp, which can be ran as follows:
charmm str:toppar_prld3.str -i water_constr_tutorial.inp -o water_constr1.out
This generates the human-readable text file prld_water_constr.ene , which contains the QM and MM dipole moments and water interaction energies and distances, as well as some relevant statistics. If this file shows a satisfactory level of agreement between the QM and MM data, then the charges as well as the LJ parameters are considered valid and no optimization is performed. If optimization is deemed necessary, the charges are manually adjusted to improve the overall agreement between the scaled QM and the empirical values, while maintaining the overall charge of the molecule, making sure that chemically equivalent atoms have the same charge, and constraining the charges of aliphatic hydrogen atoms or aromatic C-H moieties not adjacent to heteroatoms as described above. Following this procedure, the final set of charges shown in toppar_prld4.str and Figure 1d was obtained. Running
charmm str:toppar_prld4.str -i water_constr_tutorial.inp -o water_constr2.out
and looking at the new prld_water_constr.ene confirms that optimizing the charges yielded a significant improvement. The individual interaction energies and geometries with water as well as the dipole moment are in good agreement. This level of agreement may be considered a general rule for defining adequate agreement with the target data. The most favorable interaction, which is expected to dominate in aqueous solution, is well reproduced, as is the order of the interaction energies. In many cases the less favorable interactions, especially those involving nonpolar groups, are less well reproduced with respect to both interaction energies and geometries. Concerning the dipole moment, the magnitude of the CGenFF dipole moment is 22% higher than the HF dipole moment and 30% higher than the MP2 dipole moment. As discussed above, this overestimation of the dipole moment is desirable in order to correctly reproduce bulk phase properties. The Z-component of the dipole moment points in the opposite direction relative to the target data. As this component is relatively small, the direction of the dipole moment vector deviates only 23° and 26° from the HF and MP2 result, respectively. In accord with the iterative parameter optimization procedure,1 once the bonded parameters are optimized, the interactions with water and dipole moment should be re-considered using the CGenFF minimized conformation for the MM calculation. If deemed necessary, additional optimization of the charges could be undertaken, though typically this is not required.

Optimization of the equilibrium bond lengths and angles

The quality of the equilibrium bond lengths and angles can be judged by the equilibrium (ie. minimum energy) geometry. Analogous to the charge optimization, the CHARMM script quick_molvib_tutorial.inp will generate a side-by-side comparison of the QM and MM data as part of its output:
charmm str:toppar_prld4.str -i quick_molvib_tutorial.inp -o quick_molvib1.out
Specifically, the newly created file quick_molvib1.out, starting from line 457, contains a series of geometric measurements using CHARMM's "QUICK" command, each preceded by a comment that contains the MP2 value as found in prld_opt_freq_mp2.log. Just like with the charge optimization, the additional bond, valance angle and dihedral angle parameters that were added for this molecule (see "initial guess parameters") are optimized when the difference between the QM and the MM value for the corresponding bond or angle is not within acceptable range (typically 0.03Å for the bonds and 3° for the angles). In the particular case of pyrrolidine and other saturated 5-membered rings, there is an unusually prominent coupling between the angle and dihedral parameters as an indirect consequence of pseudorotation.4 For this reason, the newly introduced dihedral parameters were set to 0 while optimizing the bond and angle parameters; however, when removing the dihedral forces in this fashion, care should be taken that the MM minimized geometry stays reasonably close to the QM (ie. 1T2). The dihedrals will then be optimized at a later stage using Potential Energy Scans (PES).
After optimization of the equilibrium bond lengths and angles, the parameters in toppar_prld5.str were obtained. After running
charmm str:toppar_prld5.str -i quick_molvib_tutorial.inp -o quick_molvib2.out
and comparing quick_molvib1.out to quick_molvib2.out, the attentive reader will notice that, although agreement became better for a number of measurements, some of the new parameters seem to "overshoot" or even "move away from" their target. There are three reasons for this:
  1. Tradeoffs between the different target properties for one model compound often have to be made. For example, improving some of the angles in pyrrolidine may worsen others. The reader is welcome to experiment.
  2. Force field design is an iterative procedure by nature,1 and although the methodology for parametrizing CGenFF is designed so that it is possible to complete the parametrization in one iteration for the majority of compounds, pyrrolidine is one of the exceptions to this rule. To prevent the present tutorial from becoming excessively long, toppar_prld5.str contains equilibrium values that in reality were obtained after several optimization cycles. This demonstrates why, even though a single iteration often suffices to complete parametrization of a given model compound, one should always perform a second iteration to verify the self consistency of the parameters.
  3. In general, when introducing a new model compound, only the newly introduced parameters are optimized. However, if some of the existing parameters produce unacceptable results when applied to the new model compound, either a new atom type has to be introduced in order to increase the number of unique parameters, or the existing parameters have to be modified, taking all model compounds that are affected into account. Pyrrolidine is one of the rare cases where the latter approach was chosen. Consider for instance the angle parameter "NG3C51 CG3C52 HGA2". As can be seen in the comments in toppar_prld5.str, apart from pyrrolidine, this parameters also affects 2-pyrazoline, 2-pyrroline, 3-pyrroline and 2-imidazoline. The degree of pyramidization of the sp3 nitrogen in these three model compounds differs substantially, necessitating a compromise.

Optimization of the bond and valence angle force constants

Optimization of the bond stretching, angle bending and dihedral angle force constants is based on the reproduction of the scaled MP2 vibrational spectrum. In practice, this is done by creating a CHARMM input file (prld_molvib_qm.inp) that calls CHARMM's MOLVIB facility,8 providing it with Running CHARMM on this input file
charmm -i prld_molvib_qm.inp -o prld_molvib_qm.out
outputs the final QM spectrum as used in the optimization. To generate the corresponding MM spectrum, an identical MOLVIB input section can be found in quick_molvib_tutorial.inp, except that the CART and FMAT sections are absent because in this case, we're using the geometry and force constants generated respectively by the CHARMM minimization and the subsequent "VIBRAN" command. Thus, the optimization of the bond and valence angle force constants is driven by comparing prld_molvib_qm.out and quick_molvib2.out. Optimizing these force constants yields the parameters in toppar_prld6.str, which again can be verified:
charmm str:toppar_prld6.str -i quick_molvib_tutorial.inp -o quick_molvib3.out
Even though only 6 parameters were adjusted, there are substantial improvements in parts of the vibrational spectrum. The two lowest modes are torsional vibrations, which for most molecules reflect dihedral parameters. For rigid molecules, it is possible to optimize the dihedral parameters using only the equilibrium geometry and the vibrational spectrum. For flexible molecules, we usually ignore the torsional mode and only optimize the dihedral parameters in the next step using Potential Energy Scans (PES).

Optimization of the dihedral parameters

Final optimization of dihedral parameters associated with torsions involving only non-hydrogen atoms was based on adiabatic potential energy scans. Target data for these scans were obtained at the MP2 level on the improper dihedral that describes the NH pucker (prld_tscan1.gjf) and for the three ring torsions that are chemically different (prld_tscan2.gjf, prld_tscan3.gjf and prld_tscan4.gjf). The respective sample output files of these runs are prld_tscan1.log, prld_tscan2.log, prld_tscan3.log and prld_tscan4.log. The shell script extract_tutorial extracts the relevant information from these output files in a format that is convenient for the subsequent parameter optimization. Among other things, it creates a file scan/scan_prld.qme in the scan subdirectory that contains the QM energies and extracts the QM geometries at each scan point. Using these QM data, the CHARMM script scan/prld_allscan_tutorial.inp will generate the corresponding MM data after files such as scan/scan_prld.mme are initialized by the shell script scan/initialize. Finally, the shell script scanplot will set the lowest point in both the QM and MM PES to 0, respectively producing the normalized energy files scan/scan_prld.qne and scan/scan_prld.mne , then attempt to plot both surfaces simultaneously using gnuplot. In summary, one needs to issue the following commands:
cd scan
charmm str:../toppar_prld7.str -i prld_allscan_tutorial.inp -o prld_allscan1.out
../../scanplot -i 1 scan_prld.qme scan_prld.mme
If gnuplot is not available, the last line of scanplot will generate an error message, but all necessary files are produced so the user can plot the PES with the software of their choice. In the case of pyrrolidine, automatically scaling the axes of the plot to accommodate all the data (as gnuplot does by default) will yield a Y-axis that goes from 0 to 60 kcal/mol and an apparently excellent agreement between QM and MM. However, keeping in mind that high-energy regions and barriers will not be populated or crossed during a typical room temperature MD simulation, a more relevant picture should be generated by constraining the Y-axis from 0 to 8 kcal/mol:
Pyrrolidine PES using intial guess parameters

Pyrrolidine PES using optimized parameters
Figure 3: the 4 potential energy scans on pyrrolidine, pasted together in one plot. The X axis depicts the number of the scan point, which corresponds to the line number in the concatenated .qne or .mne file. The Y axis is the energy in kcal/mol. a: using the initial guess parameters; b: using the optimized parameters.
gnuplot scan_prld1b.gpt
The resulting plot, which should look similar to Figure 3a, shows substantial differences in the overall shape of the PES as well as in the locations of the minima, illustrating the fact that phenomena such as ring strain vary widely for small rings containing different atoms, making parameters that apply to these rings poorly transferable between rings. Moreover, as noted earlier, the bond and angle parameters used to generate this plot were fully optimized over several optimization cycles. If this were not the case, the initial guess PES would look even worse; see Figure 5 in the CGenFF paper.1 The latter also illustrates that for pyrrolidine and other saturated 5-membered rings, angular deformation forces play an important role in the torsional energy profile, as discussed above. After optimization, the parameters in toppar_prld8.str are obtained. The PES generated using these parameters
charmm str:../toppar_prld8.str -i prld_allscan_tutorial.inp -o prld_allscan2.out
../../scanplot -i 2 scan_prld.qme scan_prld.mme
gnuplot scan_prld2b.gpt
, which is reproduced in Figure 3b, shows excellent agreement between the MP2 and CGenFF surfaces; the minima are now in good agreement with the QM minima and the magnitude of the remaining energetic discrepancies is 0.5 kcal/mol. The evaluation considers the relative energies as well as the overall shape of the energy profiles. It is typically desirable to fit the lower energy regions of the surfaces (< 5 kcal/mol) as these are sampled to the largest extent in MD simulations. However, if conformational changes are to be studied, it is important that the barrier heights also be in satisfactory agreement with the target data. In addition, when the results from the studies to be undertaken are highly sensitive to the conformational energies, one may consider higher-level calculations for use as target data. In such situations the user should consider MP2/cc-pVTZ single point calculations on the MP2/6-31G* optimized geometry.
For the majority of model compounds, there are one or two rotatable bonds that need to be considered. In such cases, those torsions usually can be treated independently by performing potential energy scans for each dihedral while all remaining degrees of freedom on the surface are allowed to relax, then fitting the associated dihedral parameters. However, non-planar 5-membered rings as well as more complex systems require special attention. For example, with pyrrolidine, the three in-ring dihedrals as well as the amine out-of-plane surface are correlated, as are their parameters. In such cases, the individual dihedrals are still scanned at QM level with all other degrees of freedom, including other dihedrals that have to be fit, allowed to relax. However, when scanning the corresponding MM surfaces for a given dihedral, it is necessary that the remaining dihedrals associated with parameters that are being fit be restrained in the QM optimized structures from the QM PES (see the variables "con1" to "con6" in prld_allscan_tutorial.inp). The use of these additional restraints assures that the energy contributions from those dihedrals are taken into account when performing the fitting. In addition, in such situations, all the energy surfaces should be offset to the same global minimum so that these different energies are taken into account. The MCSA fitting program developed in our laboratory11 includes utilities to account for these issues and was in fact used to generate the parameters in toppar_prld8.str.

Charge re-optimization using the MM geometry

When optimizing the charges above, the CGenFF dipole moments and water interactions were calculated on the MP2 geometry. The reason for this is that the interaction energies with water were found to be very sensitive to the model compound's conformation. Taken together with the observation that the initial guess parameters often don't reproduce the MP2 equilibrium geometry well, it is necessary to constrain the model compound at its MP2 geometry in order to make a relevant comparison. However, as the MP2 geometry was later used as a target for the optimization of the bonded parameters, the optimized parameters generally reproduce the MP2 equilibrium geometry quite well. Taken together with the fact that CGenFF ultimately is expected to yield precise intermolecular interactions in a situation where no conformational constraints are present, we now can and should re-optimize the charges using the CGenFF geometry. Ideally, CGenFF at this point should reproduce the MP2 geometry so closely that no additional optimization is required, and this is very often the fact. Indeed, running
cd ..
charmm str:toppar_prld8.str -i water_tutorial.inp -o water1.out
produces a prld_water.ene file that is very similar to the file prld_water_constr.ene generated earlier and cannot be further improved by adjusting the charges.


At this point, the first optimization cycle is completed and the procedure should be re-ran starting from the optimization of the equilibrium bond lengths and angles, using the newly optimized parameters in toppar_prld8.str. This is equivalent to running the following sequence of commands
charmm str:toppar_prld8.str -i quick_molvib_tutorial.inp -o quick_molvib4.out
cd scan
charmm str:../toppar_prld8.str -i prld_allscan_tutorial.inp -o prld_allscan3.out
../../scanplot -i 3 scan_prld.qme scan_prld.mme
gnuplot scan_prld3b.gpt
cd ..
charmm str:toppar_prld8.str -i water_tutorial.inp -o water2.out
, carefully examining the output at every step. Of course, unless the validation fails and further parameter optimization is performed, the water interaction data will be identical to those obtained earlier. Similarly, the PES will be also be identical because we didn't change the charges during the last step of the previous round.
As the charges and parameters in toppar_prld8.str are identical to the ones in the main CGenFF topology and parameter files, one might as well reproduce the target data using those files. This can be accomplished by running
charmm  -i quick_molvib.inp -o quick_molvib.out
charmm  -i water.inp -o water.out
Examining the output of these CHARMM runs shows that most of the measurements describing the equilibrium geometry are in excellent agreement with the QM target data. Differences are typically lower than 0.03Å and 3° for the bonds and angles, respectively. If a similar level of agreement was obtained with the parameters directly assigned by analogy, parameter optimization would not have been required. The relatively large residual deviations of the dihedral angles and to a lesser extent the N-C-C angles can be explained by the fact that the CGenFF minimization, although starting from a 1T2 conformation, minimizes to a symmetrical 1E conformation after optimizing the parameters.
Pyrrolidine conformations
Figure 4: The 1T2 (carbon atoms in black) and 1E (carbon atoms in cyan) conformations of pyrrolidine.
As can be seen in Figure 4, these conformations are quite similar; 1E is an envelope conformation with the nitrogen atom at the tip, while 1T2 is a slightly twisted version of the same envelope conformation. Although the 1E conformation is a first order saddle point at MP2 level, its energy is only 0.005 kcal/mol higher than the 1T2 conformation. In fact, this slight deviation in the pseudorotational energy profile was a deliberate trade-off to improve other parts of pyrrolidine's potential energy surface as well as the potential energy surfaces of other molecules with which it shares parameters.
Agreement between the QM and MM vibrational spectra is generally fair, although matching the individual modes is complicated because, as discussed in the paper,1 mixing of the contributions from different internal degrees of freedom is different in the MP2 and the CGenFF spectra, a phenomenon that occurs in all but the simplest molecules. The considerable discrepancy in the lowest ring torsion is a consequence of the different conformations; indeed, this torsion is the sole coordinate along which the conformations differ. Additionally, the vibrational frequency along a degree of freedom is proportional to the second derivative (ie. curvature) of the PES along that degree of freedom. As can be seen in Figure 3, the QM and MM PES along some degrees of freedom differ in shape close to the minima. Although these differences may be considered insignificant, they will result in large differences in the vibrational frequencies. It is for this reason that torsions around rotatable bonds cannot be accurately parametrized based on vibrational frequencies alone.


Linking two rings: 3-phenoxymethylpyrrolidine

3-phenoxymethylpyrrolidine with some atom and dihedral labels; ethoxybenzene; 3-hydroxymethyltetrahydrofuran
Figure 5: 3-phenoxymethylpyrrolidine (a) and two "parent" model compounds from which it "inherited" parameters: ethoxybenzene (b) and 3-hydroxymethyltetrahydrofuran (c).
To demonstrate how a user can build a drug-like molecule out of the functional groups and heterocycles present in CGenFF, a benzene ring will be attached to the 3-position of pyrrolidine by means of an oxymethyl linker, yielding 3-phenoxymethylpyrrolidine (Figure 5a). This compound can be constructed by first removing H31 (defined as in Figure 1a) and summing its charge into C3 in order to preserve the integer charge. The atom types of C3 and H32 need to be reassigned to reflect the fact that C3 is now a tertiary carbon atom. Then, a phenoxymethyl group can be attached, using the same charges as found in the existing model compound ethoxybenzene (Figure 5b). This sequence of manipulations translates to the "patch residue" (CHARMM keyword "PRES") in toppar_3pomp1.str in the supplementary material. It should be noted that the phenoxymethyl group is treated as a separate charge group (using the "GROUP" directive in the CHARMM residue topology file) with a zero net charge. This modularity, which also applies to other chemical groups, greatly facilitates building new model compounds. Next, as with pyrrolidine, missing parameters are identified by running (from the main tutorial directory)
cd 3pomp
charmm str:toppar_3pomp1.str -i generate.inp -o generate1.out
, then creating toppar_3pomp2.str with initial guessed for the missing angle parameters, running
charmm str:toppar_3pomp2.str -i generate.inp -o generate2.out
and finally finding initial guesses for the missing dihedral parameters to obtain toppar_3pomp3.str. The relatively low number of missing parameter is due to the fact that most of the "new" parameters resulting from the attachment of the phenoxymethyl group to the 5-membered ring were already present in ethoxybenzene (Figure 5b) and 3-hydroxymethyltetrahydrofuran (Figure 5c), a model compound for the nucleic acid backbone. Most of the remaining missing parameters are in-ring dihedrals resulting from the introduction of a tertiary carbon type in the pyrrolidine ring and thus can directly be transferred from the parent compound pyrrolidine. Additionally, even though the N1-C2-C3-C31 dihedral (using the atom numbering in Figure 5a) involves an exocyclic non-hydrogen atom, it was copied from N1-C2-C3-H31 in pyrrolidine (Figure 1a) in order to avoid distorting the ring's torsional energy profile, which was carefully parametrized in the previous section. For the remainder of the missing parameters, existing parameters with excellent analogy were found.
At this point, it should be noted that with increasing system size, it becomes increasingly difficult to perform extensive parameter validation or optimization, as the computational cost of the QM calculations usually scales with the square of the system size (or worse). Moreover, vibrational analysis is rendered impractical by the extensive mixing that occurs in the vibrational spectrum of larger compounds, and the number of sites at which to perform water interactions as well as the number of dihedrals to scan become large. Therefore, it is not recommended to perform explicit parameter optimization on these larger compounds. Rather, one should focus on the regions of the molecule linking together ring systems, taking advantage of the availability of explicit parameters for a large number of ring systems already in CGenFF. Moreover, parameters for dihedrals associated with freely rotatable bonds often perform poorly when transferred to a different molecule. This is due to the dihedral parameters typically being the final term optimized, thereby accounting for the contribution of long range nonbond interactions to the PES. For example, if the charge on one of the rings on one end of a previously parametrized linker changes, the PES associated with the rotatable bonds in the linker may change, justifying validation of the PES and, if necessary, optimization of the relevant dihedral parameters. In addition, for molecules with three or more ring systems, it is suggested that validation and optimization be performed on smaller compounds that contain two rings connected by a single linker. This assumes that the dihedral parameters for the linkers between the different rings are independent, allowing for the resulting parameters to be combined to treat the full parent molecule.
In the particular case of 3-phenoxymethylpyrrolidine, potential energy scans are performed on the three rotatable dihedrals in the linker (Figure 5a). During the initial, fully relaxed QM scans, the pyrrolidine ring underwent several conformational changes, making comparison to the MM scan points problematic. To avoid this issue, the pyrrolidine ring conformation was constrained both during the QM and the MM scan. The corresponding Gaussian input files are 3pomp_pyscan.gjf, 3pomp_lscan.gjf and 3pomp_phscan.gjf, and the respective outputs are 3pomp_pyscan.log, 3pomp_lscan.log and 3pomp_phscan.log.
cd scan
charmm str:../toppar_3pomp3.str -i 3pomp_allscan.inp -o 3pomp_allscan1.out
../../scanplot -i 1 scan_3pomp.qme scan_3pomp.mme
The initial guess parameters in toppar_3pomp3.str for dihedrals I and II in Figure 5a were respectively copied from 3-hydroxymethyltetrahydrofuran (Figure 5c) and ethoxybenzene (Figure 5b). Since dihedral III involves exactly the same atom types as a dihedral in ethoxybenzene, the PES on this dihedral serves as a validation rather than being aimed at optimizing the associated dihedral parameter. Should the validation indicate that this parameter is not satisfactory, re-optimization may be considered for the purpose of performing simulations on this particular model compound (ie. 3-phenoxymethylpyrrolidine). In this case, the newly optimized parameter should not be added to CGenFF as this would compromise the force field's representation of ethoxybenzene and derivatives.
Pyrrolidine PES using QM and MM initial and optimized parameters
Figure 6: the 3 energy scans on 3-phenoxymethylpyrrolidine (dihedrals I, II and III in Figure 5a), pasted together in one plot. The X axis depicts the number of the scan point, which corresponds to the line number in the concatenated .qne or .mne file. The Y axis is the energy in kcal/mol. The red plot corresponds to the MP2/6-31G(d) data, the green plot reflects the initial guess parameters, and blue stands for the optimized parameters. For the scan on dihedral III, the two MM surfaces are identical.
As can be seen when plotting scan_3pomp.qne and scan_3pomp.mne (red line and green line in Figure 6, respectively), although the MM maxima and minima are at the right locations as compared to the QM PES, their heights show large deviations. After a few iterations of adjusting the dihedral parameters and redoing the MM calculation, toppar_3pomp4.str was obtained.
charmm str:../toppar_3pomp4.str -i 3pomp_allscan.inp -o 3pomp_allscan2.out
../../scanplot -i 2 scan_3pomp.qme scan_3pomp.mme
gnuplot scan_3pomp2b.gpt
The final MM PES (blue line in Figure 6) of dihedral I has improved considerably. The barrier at 120° (point 20 in Figure 6) is significantly too high, but this cannot be improved without sacrificing other parts of the PES. Such compromises are common during parameter optimization, requiring decisions by the user on which aspects of the model to sacrifice. The agreement for dihedral II is excellent, except for a slight, constant offset. This offset is caused by the fact that during most of this potential energy scan, dihedral I was at the local minimum at 180° (points 0 and 24 in Figure 6), the energy of which is slightly overestimated. Dihedral III displays the same constant offset. Although its barriers are overestimated, the discrepancy in barrier height is only 0.6 kcal/mol, which is more than satisfactory for a parameter that was not subject to optimization. It should also be emphasized that when performing PES on multiple dihedrals in a given molecule, the relative energies of each individual dihedral PES should be offset to the global energy minimum for all the PES, thereby assuring that the optimization of the different dihedral parameters satisfactorily reproduces the relative conformational energies of the molecule and not just of the individual dihedrals. This is the reason why in the present tutorial, all PES on the same molecule were plotted on one plot.

As a final note, the attentive reader might have noticed that the optimized parameters for 3-phenoxymethylpyrrolidine are not present in the main CGenFF topology and parameter files. This is because the "OG301 CG321 CG3C51 CG3C52" dihedral (ie. dihedral I in Figure 5a) can be anticipated to occur in a variety of sugar-like molecules, and that the optimized 3-phenoxymethylpyrrolidine parameters in toppar_3pomp4.str are too specialized to cover all these possibilities. For this reason, the parameter for dihedral I was set to a parameter from the CHARMM nucleic acid force field12,13,14 that coincides with the initial guess, and dihedral II was re-optimized in this context, yielding the parameters in toppar_3pomp5.str, which will probably be included in the main topology and parameter files in the near future. Plotting the PES using these generalized parameters is left as an exercise to the reader. Predictably, the result is significantly worse than the PES using the fully optimized parameters, yet significantly better than the PES using the initial guess. This demonstrates how general force fields have to sacrifice accuracy for transferability. The CHARMM general force field gives the user the choice of either applying the general parameters already present in the force field or using the procedure outlined in the present tutorial to generate specialized parameters for their own compounds of interest, at the expense of transferability.



Creation of this tutorial was made possible with financial support from the NSF (CHE-0823198) and the Samuel Waxman Cancer Research Foundation and computational support from NCI's ABCC.



1. K. Vanommeslaeghe, E. Hatcher, C. Acharya, S. Kundu, S. Zhong, J. Shim, E. Darian, O. Guvench, P. Lopes, I. Vorobyov, A. D. MacKerell Jr., J. Comput. Chem. 2010, 31, 671-690.
2. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, M. Karplus, J. Comput. Chem. 1983, 4, 187-217.
3. B. R. Brooks, C. L. Brooks III, A. D. MacKerell Jr., L. Nilsson, R. J. Petrella, B. Roux, Y. Won, G. Archontis, C. Bartels, S. Boresch, A. Caflisch, L. Caves, Q. Cui, A. R. Dinner, M. Feig, S. Fischer, J. Gao, M. Hodošček, W. Im, K. Kuczera, T. Lazaridis, J. Ma, V. Ovchinnikov, E. Paci, R. W. Pastor, C. B. Post, J. Z. Pu, M. Schaefer, B. Tidor, R. M. Venable, H. L. Woodcock, X. Wu, W. Yang, D. M. York, M. Karplus, J. Comput. Chem. 2009, 30, 1545-1614.
4. V. M. Rayón, J. A. Sordo, J. Chem. Phys. 2005, 122, 204303.
5. M. Sundaralingam, J. Am. Chem. Soc. 1971, 93(24), 6644-6647.
6. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, J. A. Montgomery Jr., T. Vreven, K. N. Kudin, J. C. Burant, J. M. Millam, S. S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G. A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador, J. J. Dannenberg, V. G. Zakrzewski, S. Dapprich, A. D. Daniels, M. C. Strain, O. Farkas, D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J. V. Ortiz, Q. Cui, A. G. Baboul, S. Clifford, J. Cioslowski, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham, C. Y. Peng, Nanayakkara,A. ; M. Challacombe, P. M. W. Gill, B. Johnson, W. Chen, M. W. Wong, C. Gonzalez, J. A. Pople, Gaussian, Inc.: Wallingford, CT, 2004.
7. W. L. Jorgensen, J. Tirado-Rives, J. Am. Chem. Soc. 1988, 110, 1657-1666; A. D. MacKerell Jr., M. Karplus, J. Phys. Chem. 1991, 95, 10559-10560; W. E. Reiher III, Theoretical Studies of Hydrogen Bonding, Ph.D. Thesis, Harvard University, 1985; W. L. Jorgensen, J. Phys. Chem. 1986, 90, 1276-1284.
8. K. Kuczera, J. K. Wiorkiewicz, M. Karplus, CHARMM, Harvard University, 1993.
9. P. Pulay, G. Fogarasi, F. Pang, J. E. Boggs, J. Am. Chem. Soc. 1979, 101, 2550-2560.
10. A. P. Scott, L. Radom, J. Phys. Chem. 1996, 100, 16502-16513.
11. O. Guvench, A. D. MacKerell Jr., J. Mol. Mod. 2008, 14(8), 667-679.
12. A. D. MacKerell Jr., J. Wiórkiewicz-Kuczera, M. Karplus, J. Am. Chem. Soc. 1995, 117, 11946-11975.
13. N. Foloppe, A. D. MacKerell Jr., J. Comput. Chem. 2000, 21, 86-104.
14. A. D. MacKerell Jr., N. K. Banavali, J. Comput. Chem. 2000, 21, 105-120.