Thursday, December 1, 2016

Computational Chemistry Highlight: November issue

The November issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach and Jan Jensen:

ANI-1: An extensible neural network potential with DFT accuracy at force field computational cost

Calculation of NMR Spin–Spin CouplingConstants in Strychnine

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily


This work is licensed under a Creative Commons Attribution 4.0

Tuesday, November 1, 2016

Computational Chemistry Highlight: October issue

The October issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach and Jan Jensen:

Automatic chemical design using a data-driven continuous representation of molecules


Expanding DP4: application to drug compounds and automation

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily


This work is licensed under a Creative Commons Attribution 4.0

Saturday, October 1, 2016

Computational Chemistry Highlight: September issue

The September issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach and Jan Jensen:

Multiscale Quantum Mechanics/Molecular Mechanics Simulations with Neural Networks




Friday, September 2, 2016

Computational Chemistry Highlight: August issue

The August issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach, David Bowler, and Jan Jensen:

Ab Initio Calculation of Rate Constants for Molecule–Surface Reactions with Chemical Accuracy


A Total Synthesis of Paeoveitol

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily


This work is licensed under a Creative Commons Attribution 4.0

Monday, August 1, 2016

Computational Chemistry Highlight: July issue

The July issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach and Jan Jensen:

Diverse Optimal Molecular Libraries for Organic Light-Emitting Diodes



Friday, July 1, 2016

Computational Chemistry Highlight: June issue

The June issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach, David Bowler, and and Jan Jensen:

Systematic Error Estimation for Chemical Reaction Energies

Different approaches to creating (meta-GGA) DFT functionals

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily


This work is licensed under a Creative Commons Attribution 4.0

Wednesday, June 1, 2016

Computational Chemistry Highlight: May issue

The May issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach, Tobias Schwabe, and and Jan Jensen:

Navigating molecular space for reaction mechanisms: an efficient, automated procedure



Wednesday, May 25, 2016

Automating calculations: pKa predictions

I want to compute the standard free energy difference for proton exchange between heliotridane and a molecule which I'll call compound 1 (comp1)

heliotridaneH+ + comp1 <=> heliotridane + comp1H+

To do this I need to geometry optimize in the gas phase and then do a single point calculation using a solvation model, all using GAMESS. I want to automate this so I can do it for comp2, comp3, ...

To do this I use bash and the python programs listed below

Step 1 make a file file called smiles.list which contains the name and SMILES strings, which defines both the molecule, the protonation state, and the conformation around the protonated N. I still need to automate this step.

Step 2  make so-called header files for the geometry optimization (pm6_opt.header) and solvation energy (pm6_sol.header) calculations.  These files contain the relevant keywords for the calculations.

Step 3 convert the SMILES strings to 3D coordinates (xyz files).

python smiles2coord.py smiles.list

This also creates som sdf files which I can now remove (rm *.sdf).


Step 4 convert the xyz files to GAMESS input files for geometry optimization
When naming the molecules in smiles.list I was carefull to label all the protonated forms with a "+".

for x in *.xyz; do babel -ixyz $x -ogamin ${x%.*}_opt.inp -xf pm6_opt.header; done

I could have modified smiles2coord_lis.py to generate these files directly, but I like to keep things modular.  At this point I can also get rid of the xyz files (xyz files).

Step 5 change the charge for the protonated forms.

sed -i 's/icharg=0/icharg=1/g' *+*.inp


Step 6 submit the GAMESS calculations.

for x in *_opt.inp; do submit_gamess $x; done

My GAMESS submit script is called "submit_gamess"

Step 7 wait til all the optimizations are done and that they all converged.

grep EQUILIBRIUM *.log

Step 8 transfer the optimized geometry to a new input file for the solvation energy calculations

for x in *_opt.log; do babel -igamout $x -ogamin ${x%.*}_sol.inp -xf pm6_sol.header; done

Step 9 change the charge for the protonated forms.

sed -i 's/icharg=0/icharg=1/g' *+*_sol.inp

Step 10 submit the GAMESS calculations.

for x in *_sol.inp; do submit_gamess $x; done

Step 11 wait til all the optimizations are done and that the calculations finished normally.

grep NORMALLY *_sol.log

Step 12 extract the last heat of formation in each file and put them in a file called energies

python readv2.py *._sol.log > energies

Step 13 find the lowest energy for each conformer and compute the relative energies and pKa values
python pka.py energies

Files referred to above





This work is licensed under a Creative Commons Attribution 4.0

Thursday, May 5, 2016

A brief introduction to SMILES and InChI



SMILES and InChI are two notations that that can be used to compactly describe molecules.  As molecules get complicated SMILES and InChI are are great alternative to a chemical names but there are many other uses (see the end of this post for an example).  Here is a brief intro to a few of the basics.

Ethane
SMILES: CC
InChI: InChI=1S/C2H6/c1-2/h1-2H3

SMILES is pretty straight forward: ethane is two C atoms singly bonded to each other.  Two atoms next to each other imply a single bond and programs that read SMILES strings are smart enough to know that each C atom needs 3 H atoms to fill the valence shell.

InChI requires a little more explanation: "1S" is version 1 of Standard InChi (more about that later). C2H6 is the empirical formula and tells you that the first two atoms are C atoms. c1-2 means that atom 1 and 2 are connected. h1-2H3 means that atom 1-2 each have 3 H atoms. Programs that read SMILES strings are smart enough to know that this means that the two C atoms are connected by a single bond.

So, SMILES defines the bond types from which one can infer protonation, while InChI defines protonation from which one can infer the bond types (key differences indicated in bold):

Ethene and ethyne
SMILES: C=C and C#C
InChI: InChI=1S/C2H4/c1-2/h1-2H2 and InChI=1S/C2H2/c1-2/h1-2H

SMILES: double and triple bonds are indicated by "=" and "#", respectively.
InChI: "CH2" and "CH" are indicated by "H2" and "H", respectively.

Ethanol and dimethyl ether
SMILES: CCO and COC
InChI: InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3 and  InChI=1S/C2H6O/c1-3-2/h1-2H3

SMILES: pretty self explanatory I think (if not leave a comment).
InChI: "C2H6O" means that the first and second atoms (1 and 2) are C atoms and the third (3) is an O atom.  The connectivity is 1-2-3 for ethanol and 1-3-2 for dimethyl ether.  For ethanol atom 3 has 1 H atom, atom 2 has 2 H atoms, and atom 1 has 3 H atoms.  For dimethyl ether atom 1-2 have 3 H atoms, while atom 3 has none.

Note that InChI keeps the atom ordering the same in the two molecules while SMILES changes it, i.e. the oxygen is atom number 3 for both ethanol and dimethyl ether defined by InChI, while it is atom number 3 and 2, respectively, when defined by SMILES.

Benzene and toluene
SMILES: c1ccccc1 and Cc1ccccc1
InChI: InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H and InChI=1S/C7H8/c1-7-5-3-2-4-6-7/h2-6H,1H3

SMILES: in the case of benzene atom 1 is connected to both atom 2 and atom 6, i.e.  a ring is formed. "1" is a label and does not refer to atom number 1 (see toluene). A lower case "c" is used to indicate aromatic carbons, meaning they should be singly protonated.  For toluene, the methyl group is bonded to atom number 2, which is also bonded to atom number 7.

InChI: In the case of benzene aromaticity is inferred from the fact that all 6 carbon have 1 H atom (h1-6H).  For toluene, the methyl group is bonded to atom number 7, which is also bonded to atom number 6.

Ethylamine and ethylammonium
SMILES: CCN and CC[NH3+]
InChI: InChI=1S/C2H7N/c1-2-3/h2-3H2,1H3 and InChI=1S/C2H7N/c1-2-3/h2-3H2,1H3/p+1

SMILES: pretty self explanatory I think (if not leave a comment).
InChI: The protonation (h2-3H2,1H3) is identical in both cases an corresponds to ethylamine. For ethylammonium "p+1" indicates that an extra proton is added, but it doesn't specify where (more on that later).

Acetic acid and acetate
SMILES: CC(=O)O and CC(=O)[O-]
InChI: InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4) and InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/p-1

SMILES: pretty self explanatory I think (if not leave a comment).
InChI: In the case of acetic acid InChi recognises that there are two "tautomers", i.e. that both oxygens can have the hydrogen atom.  In this particular case, there is free rotation about the CC bond so the oxygens are equivalent and we don't really think about acetic acid having tautomers, but see next case.

Formamide
SMILES: C(=O)N
InChI: InChI=1S/CH3NO/c2-1-3/h1H,(H2,2,3)

Here SMILES and InChI start to differ a bit in the chemistry.  InChI recognises that formamide can exist in two tautomeric states, HC(=O)NH2 and HC(OH)=NH, while SMILES only specifies the dominant one, HC(=O)NH2, by default.

Both SMILES and InChI can be used to specify particular tautomers.  For SMILES the other tautomer is "C(OH)=N"while for InChI it is InChI=1/CH3NO/c2-1-3/h1H,(H2,2,3)/f/h2H2 and InChI=1/CH3NO/c2-1-3/h1H,(H2,2,3)/f/h2,3H.  Notice that the S is missing in InChI=1 because the use of a fixed H layer (indicated by /f) is a "non-standard" InChI.

cis- and trans-2-butene
SMILES: C/C=C\C and C/C=C/C
InCHhI: InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3- and InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3+

L- and D-alanine
SMILES: C[C@@H](C(=O)O)N and C[C@H](C(=O)O)N
InChI: InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1 and InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1

Building molecules using SMILES and InChI
One use of SMILES and InChI is to quickly and automatically build molecules. 

Furthermore, it is not too difficult to write scripts that generate SMILES strings based on some template.  For example, you can define a template c1ccc(c(c1)X)Y and substituents X, Y = O, F, and N and generate all possible combinations using a bit of python.  The resulting list of SMILES strings can then be converted to coordinates using Cactus.  Imagine the tedium of building all these molecules in a builder.

Learning more about SMILES and InChI
Theres a lot more to SMILES and InChI than covered in this post and plenty more info online (e.g. here for SMILES and InChI).  However, personally I learned most from generating a bunch of examples using this cactus site, which can convert chemical names to SMILES or InChI as well as SMILES and InChI to a 3D structure (pick TwirlyMol in the menu).

Related posts
A simple script to get molecular coordinates from a chemical name using Open Babel
Automating calculations: pKa predictions


This work is licensed under a Creative Commons Attribution 4.0

Saturday, April 16, 2016

A simple script to get molecular coordinates from a chemical name using Open Babel


Here is a simple cshell script to get coordinates from a chemical name using Open Babel.

Acknowledgements (Jimmy's "friends" please read carefully): I learned about curl from Jimmy who didn't contribute to the code in any way (and hates cshell anyway).


This work is licensed under a Creative Commons Attribution 4.0

Tuesday, March 1, 2016

Computational Chemistry Highlights: February issue

The February issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach, Alán Aspuru-Guzik, and Jan Jensen:

From C60 to Infinity: Large-Scale Quantum Chemistry Calculations of the Heats of Formation of Higher Fullerenes




Saturday, February 20, 2016

Making your computational chemistry data available as supplementary material

It is fairly normal (and good practise!) to make atomic coordinates available as supplementary material when submitting a paper.  From what I can tell this is typically done by copying the coordinates into a text editor and making a pdf file.  This can be a tedious, time consuming, and error prone process and the irony is that the intended user will have to reverse the process to actually use the data. There is a better way.

Use an online digital repository
tl;dr: upload your files to an online digital repository, such as Figshare, and simply provide the link to the data in the supplementary material.

To use Figshare you simply make an account, click "Create a new project", and choose "File set". Then you can upload the files you want to share and describe what you are sharing.  Once you have everything the way you want it, you can make it public (you even get a DOI). Here I describe what I did for my latest paper (supplementary material on FigShare here)

Sharing coordinates
Most people will only be interested in the coordinates so I decided to make them available in xyz format, since this format can be read by almost any molecular visualization program.  So I copied the output files that contained the coordinates I wanted to share into a single folder and used OpenBabel to extract the coordinates and convert them into xyz format.

For this I used a bash command.  If you use another shell you can temporarily switch to bash by typing /bin/bash (to get out of bash later type exit).  First make a new folder called "coordinates". Then convert Gaussian log files to xyz files and place them in the new folder (all one line)

for i in *.out; do babel -ig09 "$i" -oxyz coordinates/"${i/.out}".xyz; done

If you want you can add "opt" to the xyz files to indicate that they are optimized by (all one line)

for i in *.out; do babel -ig09 "$i" -oxyz coordinates/"${i/.out}"opt.xyz; done

When doing the calculations I used a different naming scheme for the systems so I added a text file called README to the folder where I describe the change. Then I created a zip file of the folder

zip -r coordinates_for_supmat.zip coordinates

which I then uploaded to FigShare and made public to get the DOI.

Sharing everything else
One or two people may be interested in more than coordinates.  So I also make a zip file of the entire project folder and upload that to the same FigShare fileset.  Actually, calculations were done by three different people so I made zip files for each of their project folders separately.  This only takes a few minutes not counting compression and upload time.  These folders also contain calculations that did not make it into the manuscript but I am not willing to spend much time weeding them out because it is quite likely that no one will ever look at it.  Anyway, if there are questions they can always ask me. I also used a Google sheet for the data analysis (don't judge me!) so I included a link to that on FigShare.  If you used Excel you could just upload that file.

Sharing Gaussian output
Some of the calculations were done with Gaussian09, so I checked with Gaussian about whether they allow sharing output files.  They wrote that "yes, you may include the relevant parts of the output in the manuscript. We just ask that you not include any timing information".  I had a look at the output files and concluded that if I delete lines with the word "cpu" I should be OK.

You can do this for one file (here called "file.out") by

grep -v "cpu" file.out > temp && mv temp file.out

and all the files in a project (here located in folder "project") using bash (all one line)

for i in project/*.out project/**/*.out project/**/**/*.out; do grep -v "cpu" "$i" > temp && mv temp  "$i"; done

This  command goes three folders deep.  If you have more layers add "project/**/.../**/*.out" as needed.  There are also many other ways of going this.  Just to be safe I also deleted lines containing "terminated" because they contain a time and date stamp.



This work is licensed under a Creative Commons Attribution 4.0

Monday, February 8, 2016

DFTB3 in GAMESS

I recently did my first DFTB3 calculations in GAMESS.  The process wasn't completely obvious, hence this post.

First I got the parameter files from dftb.org.  You have to fill in and email a registration form to get a login so that might take a few days.  I downloaded the 3ob-3-1 parameter set.

The DFTB3-related input can be found below.  My molecule contains C, N, O, and H atoms.  If there are additional atoms then you need to define additional pairs.  It is OK to specify atom pairs for atoms that are not in the molecule, so the easiest might be to define all relevant pairs once and for all and use it in all input files.

The value for dampex and hubder are found in the README file that is located in the 3ob-3-1 folder I downloaded.  hubder defines parameters for N, C, O, and H - in that order.  It was not clear from the manual what the order should be.  I found the correct order by running the input file without hubder. This run prints out the default hubder values in the order  N, C, O, and H and it turns out that is the order you need to use in the input file.

2016.02.09 Update: Yoshio Nishimoto has told me that the order of the hubder parameters is determined by the order in which they appear in $DATA.  Also using MODIO=31 in $SYSTEM will speed up the calculation.

 $basis gbasis=dftb $END
 $dftb ndftb=3 DAMPXH=.t. dampex=4.0
       hubder(1)=-0.1535,-0.1492,-0.1575,-0.1857 $end
 $dftbsk
 C C "/Applications/gamess/3ob-3-1/C-C.skf"
 C H "/Applications/gamess/3ob-3-1/C-H.skf"
 C N "/Applications/gamess/3ob-3-1/C-N.skf"
 C O "/Applications/gamess/3ob-3-1/C-O.skf"
 H C "/Applications/gamess/3ob-3-1/H-C.skf"
 H H "/Applications/gamess/3ob-3-1/H-H.skf"
 H N "/Applications/gamess/3ob-3-1/H-N.skf"
 H O "/Applications/gamess/3ob-3-1/H-O.skf"
 N C "/Applications/gamess/3ob-3-1/N-C.skf"
 N H "/Applications/gamess/3ob-3-1/N-H.skf"
 N N "/Applications/gamess/3ob-3-1/N-N.skf"
 N O "/Applications/gamess/3ob-3-1/N-O.skf"
 O C "/Applications/gamess/3ob-3-1/O-C.skf"
 O H "/Applications/gamess/3ob-3-1/O-H.skf"
 O N "/Applications/gamess/3ob-3-1/O-N.skf"
 O O "/Applications/gamess/3ob-3-1/O-O.skf"
 $end

I would like to thank +Anders Steen Christensen for his help in figuring this out.


This work is licensed under a Creative Commons Attribution 4.0

Friday, January 1, 2016

Computational Chemistry Highlights: December issue

The December issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach, David Bowler, Grant Hill and Jan Jensen:

Scalable Quantum Simulation of Molecular Energies






Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily


This work is licensed under a Creative Commons Attribution 4.0