Generation of Gaussian 09 Input Files for the Computation of ¹H and ¹³C NMR Chemical Shifts of Structures from a Spartan’14 Conformational Search

This protocol describes an approach to preparing a series of Gaussian 09 computational input files for an ensemble of conformers generated in Spartan’14. The resulting input files are needed to compute optimized geometries, relative conformer energies, and NMR shielding tensors with Gaussian. An ensemble of conformational isomers was obtained using the conformational search feature within Spartan’14. To convert the structures into a format readable by Gaussian 09, the conformers were first exported to a single “.sdf” file. A Python script was then used to (i) read the structural information of each conformer within the “.sdf” file and (ii) write the corresponding atomic coordinates into a series of Gaussian 09 input files. This approach reduces the active effort required to compute NMR chemical shifts for a structure that exists as an ensemble of conformers.


Introduction
NMR spectroscopy is the most useful tool for determining the structure of an unknown organic molecule. By coupling this approach with other analytical techniques (e.g., mass spectrometry), the structure of an unknown organic molecule can be elucidated. However, molecules of increasing complexity continue to be isolated and/or prepared, and their associated analytical data are increasingly difficult to interpret. Consequently, the structures assigned to these newly isolated compounds are sometimes incorrect, which can lead to years of misguided effort "chasing molecules that were never there" (ref. 1). Modern computational chemistry software packages (e.g., Spartan (refs. 2,3), Gaussian 09 (ref. 4), and Jaguar (refs. 5,6)) have enabled the routine use of density functional theory (DFT) calculations for predicting spectroscopic properties of organic molecules. For example, one of us recently reported a protocol describing how to use Gaussian 09 to compute NMR data for molecules that exist as multiple conformational isomers (ref. 7). An important, early part of that protocol required the software application MacroModel (ref. 8; part of the Schrödinger suite) to carry out a stochastic conformational search using the OPLS molecular mechanics force field. For each structure resulting from this conformational search, free energies and NMR shielding tensors were calculated. Using the free energy data, a Boltzmann factor was determined for each conformer and, in turn, converted into its relative mole fraction. The computed NMR data were then averaged (weighted by the mole fraction of each conformer), referenced, and scaled to generate a set of Boltzmann-weighted average chemical shifts.
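The Boltzmann weighting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the script distributed with ref. 7; it assumes the free energies have already been converted from hartree (the unit Gaussian reports) to kcal/mol and a temperature of 298.15 K, and the function names are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K); free energies are assumed to be in kcal/mol.
GAS_CONSTANT_KCAL = 0.0019872036
HARTREE_TO_KCAL = 627.5095  # conversion for free energies read from Gaussian output


def mole_fractions(free_energies_kcal, temperature=298.15):
    """Convert conformer free energies (kcal/mol) into Boltzmann mole fractions."""
    g_min = min(free_energies_kcal)
    # Energies relative to the most stable conformer keep exp() well behaved.
    factors = [math.exp(-(g - g_min) / (GAS_CONSTANT_KCAL * temperature))
               for g in free_energies_kcal]
    total = sum(factors)
    return [f / total for f in factors]


def weighted_shift(shifts, fractions):
    """Boltzmann-weighted average of one nucleus's shift across all conformers."""
    return sum(s * x for s, x in zip(shifts, fractions))
```

Free energies taken directly from Gaussian output (in hartree) would first be scaled, e.g. `mole_fractions([g * HARTREE_TO_KCAL for g in energies_hartree])`; subtracting the minimum inside the function means absolute energies in kcal/mol can be passed without any further preprocessing.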
Because Spartan is widely used for molecular mechanics calculations, we have prepared an addendum to this protocol that uses the structures resulting from a Spartan Conformer Distribution calculation. As discussed in our original protocol (ref. 7), molecules of increasing complexity are often accompanied by many conformational isomers. We have developed a Python script, "write-g09-inputs-sdf.py", that generates two Gaussian 09 input files for each structure resulting from the conformational search. For convenience, we have also provided an additional script, "write-g03-inputs-sdf.py", that prepares Gaussian 03 input files. These input files comprise an "-opt_freq" file for determining the optimized geometry and free energy and an "-nmr" file for calculating NMR shielding tensors. The script expedites the DFT computations by greatly simplifying the preparation of the Gaussian input files: it extracts structural information from a ".sdf" file generated in Spartan and writes the coordinates of each conformation into the Gaussian input files. Because the ".sdf" file type is routinely used to store molecular information for multiple structures and can be produced by many software applications, the script should also be useful for writing Gaussian input files from ".sdf" files prepared in other chemistry software applications.
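The core idea, reading each record from the ".sdf" file and turning its atom block into Cartesian coordinates for a Gaussian molecule specification, can be sketched as follows. This is a hypothetical reimplementation for illustration, not the distributed "write-g09-inputs-sdf.py" script; it assumes V2000-style records, each terminated by a "$$$$" line, with the atom count in the first three columns of the fourth (counts) line of each record.

```python
def parse_sdf(text):
    """Split SDF text into (title, atoms) tuples; atoms are (symbol, x, y, z)."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # standard end-of-record delimiter
            if current:
                records.append(current)
            current = []
        else:
            current.append(line)
    # Any trailing lines not followed by "$$$$" are ignored.

    conformers = []
    for lines in records:
        title = lines[0].strip()        # first header line (molecule name)
        n_atoms = int(lines[3][0:3])    # counts line: atom count in columns 1-3
        atoms = []
        for atom_line in lines[4:4 + n_atoms]:
            fields = atom_line.split()  # x, y, z, element symbol, ...
            x, y, z = (float(value) for value in fields[:3])
            atoms.append((fields[3], x, y, z))
        conformers.append((title, atoms))
    return conformers


def coordinate_block(atoms):
    """Format atoms as 'symbol x y z' lines for a Gaussian molecule specification."""
    return "\n".join("{0:<2} {1:12.6f} {2:12.6f} {3:12.6f}".format(*atom)
                     for atom in atoms)
```

A typical call would be `parse_sdf(open("cis-3-methylcyclohexanol.sdf").read())`, giving one `(title, atoms)` pair per exported conformer. Reading the atom count from fixed columns, rather than splitting the counts line on whitespace, follows the fixed-width layout of the format and avoids misreading counts that run together when a molecule has 100 or more atoms.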

Software required to run the Python scripts
• Command-line interface application (Terminal in Mac OS X or Linux; or Command Prompt in Windows)
• Python, version 2 or 3 (included with Mac OS X and Linux operating systems)
• Python script editor (e.g., IDLE; see http://www.python.org/download/)
• Text editing application (e.g., TextEdit in Mac OS X or Notepad in Windows)

Software requirements for calculations
• This protocol has been written for use with Spartan'14 (refs. 2,3); however, we have tested earlier versions of Spartan (e.g., Spartan'08) and found that they are also compatible with the following Procedure.
• The approach described in the Procedure is amenable to any software application that can perform a conformational search and export the family of conformers as a ".sdf" file (e.g., MacroModel (ref. 8) and ChemBio3D (ref. 9)).

Hardware requirements for use of Python scripts
• Most standard personal computers built after 2008 are capable of executing the Python scripts included in this protocol.

group. This will add an oxygen atom to the cyclohexane ring to give cis-3-methylcyclohexanol. Ensure that the overall structure is cis-3-methylcyclohexanol before continuing.

the right so that they display Molecular Mechanics and MMFF. Check the box next to Maximum: and change the conformers examined to "1000". Click Submit and a Save As window will appear. Change the computational filename to "cis-3-methylcyclohexanol", change the directory (i.e., folder) to a location that is convenient for storing the associated computational files, and click Save. Click OK in the window that appears, which indicates that the conformational search has started.

This step will export all structures from the conformational search to a single ".sdf" file, "cis-3-methylcyclohexanol.sdf", located in the same directory as the conformational search output file.

8 | Examine the resulting ".sdf" file to ensure that the results of the conformational search were correctly exported (optional). Open the ".sdf" file in a text-editing application (e.g., TextEdit in Mac OS X or Notepad in Windows) and check that an entry is included for every unique conformation. A unique entry typically begins with the text "Spartan" followed by a series of numbers. Additionally, structures are usually labeled systematically; for example, the first conformation is titled "M0001" by default.
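The optional check in Step 8 can also be done programmatically. The hypothetical helper below is not part of the protocol's scripts; it relies only on the standard SDF convention that every record ends with a "$$$$" line and returns the first header line of each complete record, so the number of exported entries can be compared with the number of conformers reported by Spartan.

```python
def list_sdf_titles(text):
    """Return the first header line of every complete record in SDF text."""
    titles, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":      # a record is complete
            if current:
                titles.append(current[0].strip())
            current = []
        else:
            current.append(line)
    # Lines after the final "$$$$" (e.g., trailing blank lines) are not counted.
    return titles
```

For example, `len(list_sdf_titles(open("cis-3-methylcyclohexanol.sdf").read()))` gives the number of exported records, which should match the number of unique conformers found by the search.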

10 | Editing the "write-g09-inputs-sdf.py" Python script to change the memory and number of processors used in Gaussian calculations (optional).
To accommodate different users' needs, we have prepared the "write-g09-inputs-sdf.py" Python script so that the amount of memory and the number of processors allocated to the computationally intensive Gaussian 09 jobs are easy to change. Open the "write-g09-inputs-sdf.py" Python script in IDLE or any other Python script editor. Adjust the amount of memory used in the Gaussian 09 optimization/frequency and NMR jobs by changing the value to the right of "%mem=" on lines 86 and 113, respectively. Adjust the number of processor cores used in the Gaussian 09 optimization/frequency and NMR jobs by changing the number to the right of "%nproc=" on lines 87 and 114, respectively. Save the edited script file in the same directory as the ".sdf" file created in Step 9.
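The "%mem=" and "%nproc=" lines edited in this step are Link 0 commands at the top of each Gaussian input file, followed by the route section, a title, the charge and multiplicity, and the atomic coordinates. The function below is a hypothetical sketch of that layout, not the code in "write-g09-inputs-sdf.py"; the default memory and processor values are placeholders, and the route line must be supplied by the user (see ref. 7 for the functionals and basis sets used there).

```python
def gaussian_input(title, coordinates, route, mem="4GB", nproc=4,
                   charge=0, multiplicity=1):
    """Return the text of a Gaussian input file (illustrative sketch).

    coordinates: preformatted 'symbol x y z' strings, one per atom.
    route: the '#' route line (method/basis set plus job keywords).
    """
    lines = [
        "%mem={0}".format(mem),      # memory allocated to the job
        "%nproc={0}".format(nproc),  # number of processor cores
        route,
        "",                          # blank line ends the route section
        title,
        "",                          # blank line ends the title section
        "{0} {1}".format(charge, multiplicity),
    ]
    lines.extend(coordinates)
    lines.append("")                 # blank line terminates the molecule specification
    return "\n".join(lines) + "\n"
```

For instance, `gaussian_input("conf-1", coords, "# B3LYP/6-31G(d) opt freq", mem="8GB", nproc=8)` would produce an optimization/frequency input with 8 GB of memory and 8 cores; the route shown is only a placeholder. Keeping memory and core count as keyword arguments mirrors the intent of Step 10, where these are the values users most often need to change.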

11 | In a command-line interface application (i.e., Terminal in Mac OS X or Linux, or Command Prompt in Windows), navigate to the directory that contains the ".sdf" file, the "write-g09-inputs-sdf.py" Python script, and the associated computational files.
12 | Execute the "write-g09-inputs-sdf.py" Python script (or the edited script that may have been created in Step 10) by entering the following command:

> python write-g09-inputs-sdf.py cis-3-methylcyclohexanol.sdf
The script will request the name of the candidate structure by displaying the following prompt: "Enter the name of the candidate structure:". Enter "cis-3-methylcyclohexanol" as the candidate structure name; avoid using spaces in this name. If the script executes successfully, the following message will be displayed: "The script successfully performed the task of creating Gaussian input files for each unique structure within the cis-3-methylcyclohexanol.sdf file and moved these input files to the cis-3-methylcyclohexanol-gaussian_files directory."
For each unique conformation within the associated ".sdf" file, the script will create two Gaussian input files. The script also creates a new directory labeled "cis-3-methylcyclohexanol-gaussian_files" and moves all of the Gaussian input files into this newly created directory. The Gaussian input files labeled "cis-3-methylcyclohexanol-opt_freq-conf-#.com" are the input files for geometry optimization and frequency calculation. The Gaussian input files labeled "cis-3-methylcyclohexanol-nmr-conf-#.com" are the input files for NMR shielding tensor calculations.
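The file-naming convention and output directory described above can be reproduced with the standard library alone. This sketch is hypothetical: it writes the files directly into the "-gaussian_files" directory instead of creating and then moving them as the distributed script does, and it assumes conformers are numbered from 1 without zero padding.

```python
import os


def input_filenames(name, conf_number):
    """Return the (opt_freq, nmr) input filenames for one conformer."""
    return ("{0}-opt_freq-conf-{1}.com".format(name, conf_number),
            "{0}-nmr-conf-{1}.com".format(name, conf_number))


def write_inputs(name, texts, out_dir=None):
    """Write one opt_freq and one nmr file per conformer.

    texts: list of (opt_freq_text, nmr_text) tuples, one per conformer.
    Returns the list of paths written.
    """
    if out_dir is None:
        out_dir = "{0}-gaussian_files".format(name)
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    written = []
    for number, pair in enumerate(texts, start=1):
        for filename, text in zip(input_filenames(name, number), pair):
            path = os.path.join(out_dir, filename)
            with open(path, "w") as handle:
                handle.write(text)
            written.append(path)
    return written
```

Separating the naming rule from the writing step makes the convention easy to check on its own, and the same `input_filenames` helper could be reused by downstream scripts that need to locate the corresponding Gaussian output files.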

Perform DFT calculations in Gaussian 09 (cf. Procedure in ref. 7) • TIMING 1 h (Step 13)
13 | To obtain the computed NMR data for the candidate structure, consult the Procedure in ref. 7 for instructions on using the input files from Step 12 within Gaussian 09 to calculate (i) DFT-optimized conformer geometries and (ii) free energies, using the "opt_freq-conf" input files, and (iii) NMR shielding tensors, using the "nmr-conf" input files. The Procedure in ref. 7 also includes Python scripts and detailed instructions for (i) assembling the free energy and NMR shielding tensor data into well-organized spreadsheet files, (ii) referencing and scaling the NMR data, (iii) determining the Boltzmann weighting factors of all conformers, and (iv) applying these weighting factors to generate the set of Boltzmann-weighted chemical shifts for the candidate structure. Details regarding the choice of computational methodology (e.g., DFT functional and basis set preferences) are discussed in ref. 7. In addition, the previously reported protocol (ref. 7) highlights methods for determining the "best fit" for a candidate structure when comparing experimental spectral data with the computed NMR chemical shifts. Alternative approaches to determining the "best fit" have recently been reported by Goodman (refs. 10,11) and Sarotti (refs. 12,13), and more traditional approaches are described in several excellent reviews (refs. 14,15).

Timing
A novice user can complete the Procedure described above in less than one hour. The time required to complete the molecular mechanics conformational search will increase with molecular complexity; however, in our experience, this increase has not been substantial. Subsequent Gaussian computations will require significantly more computational time to complete, but the amount of active effort by the user is minimized because several steps have been automated with Python scripts. A summary of the time required to complete various steps in the Procedure is shown below.
Steps 1-4: <10 min of active effort; ca. 1-30 minutes to complete the conformational search depending on the structural complexity of the candidate structure.
Steps 5-8: 15 min
Steps 9-12: 15 min
Step 13: <60 min for the 3-methylcyclohexanols; timing depends on the number of conformational isomers and the structural complexity of the candidate structure.
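Step 13 defers referencing and scaling of the computed data to ref. 7. As a rough illustration of what those operations do, the sketch below converts a computed isotropic shielding constant σ into a chemical shift either by referencing to a computed standard (δ = σ_ref − σ) or by an empirical linear scaling, δ = (intercept − σ)/(−slope), where slope and intercept come from a regression of computed shielding against experimental shift. It also computes the mean absolute error, one simple measure of agreement with experiment. The function names are hypothetical, and the slope, intercept, and σ_ref values must be taken from ref. 7 (or a comparable source) for the level of theory actually used.

```python
def referenced_shift(sigma, sigma_reference):
    """Chemical shift (ppm) relative to a computed reference such as TMS."""
    return sigma_reference - sigma


def scaled_shift(sigma, slope, intercept):
    """Empirically scaled shift (ppm): delta = (intercept - sigma) / (-slope)."""
    return (intercept - sigma) / -slope


def mean_absolute_error(computed, experimental):
    """Average |computed - experimental| over paired chemical shifts (ppm)."""
    if len(computed) != len(experimental):
        raise ValueError("shift lists must be the same length")
    return sum(abs(c - e) for c, e in zip(computed, experimental)) / len(computed)
```

In practice the shielding passed to these functions would be the Boltzmann-weighted value for each nucleus, and a lower mean absolute error across all assigned nuclei indicates a better match between a candidate structure and the experimental spectrum.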

Anticipated Results
After successful completion of the Procedure, six conformations of cis-3-methylcyclohexanol will be generated from the conformational search in Spartan, and the structure coordinates for each conformation will be exported to a ".sdf" file. Executing the Python script "write-g09-inputs-sdf.py" will create the directory "cis-3-methylcyclohexanol-gaussian_files", which will contain two Gaussian 09 input files for each conformation of the candidate structure. Once submitted to Gaussian 09, the input files having "opt_freq" in their filenames will instruct Gaussian to perform a geometry optimization and frequency calculation on the included structural coordinates. The input files having "nmr" in their filenames will instruct Gaussian to calculate NMR shielding tensors of the optimized geometry. For reference, we have provided the Spartan conformational search files and the ".sdf" file as "Supplementary Data