import gzip
from pathlib import Path
from ase.build import molecule
import matplotlib.pyplot as plt
from chemparseplot.parse.orca import geomscan
Geometry Scan¶
Often it is of interest to scan for energies over a particular aspect of molecular geometry. Here we consider the example of scanning over the bond length of the H2 molecule.
System construction¶
For simplicity, we will just use ase
for this:
h2 = molecule("H2")
h2.write("data/h2_base.xyz", comment="\n")
This can be visualized via ase gui h2_base.xyz
or similar, to see[1]:
Note that for ORCA
, we need the “plain” .xyz
file, so we need to remove the
metadata added by ase
, hence the newline comment.
ORCA Input¶
To run orca
, we can use an input file such as:
!OPT UHF def2-SVP
%geom Scan
# B <atmid1> <atmid2> = init, final, npoints
# Converted from Angstrom to Bohr
B 0 1 = 7.5589039543, 0.2116708996, 33
end
end
*xyzfile 0 1 h2_base.xyz
Execution¶
Following the best practices for running ORCA, we have[2]:
export PATH=$PATH:/blah/orca_5_0_4_linux_x86-64_openmpi411/
mkdir uhf
cp orca.inp h2_base.xyz uhf
cd uhf
($(which orca) orca.inp 2>&1) | tee scan_uhf
Also because it is annoying to keep blobs of text, and because it is large:
gzip -9 -c scan_uhf > scanuhf.gz
mv scanuhf.gz ../data/
Analysis with chemparseplot
¶
Now we can finally get to the good bit. Rather than painstakingly parsing “by
eye” the scan_uhf
file, we will simply use chemparseplot
:
orcaout = gzip.open(Path("data/scanuhf.gz"), 'rt').read()
act_energy = geomscan.extract_energy_data(orcaout, "Actual")
print(act_energy)
(<Quantity([7.55890395 7.32930292 7.09970189 6.87010086 6.64049982 6.41089879
6.18129776 5.95169672 5.72209569 5.49249466 5.26289362 5.03329259
4.80369156 4.57409053 4.34448949 4.11488846 3.88528743 3.65568639
3.42608536 3.19648433 2.9668833 2.73728226 2.50768123 2.2780802
2.04847916 1.81887813 1.5892771 1.35967606 1.13007503 0.900474
0.67087297 0.44127193 0.2116709 ], 'bohr')>, <Quantity([-7.42398620e-01 -7.43499390e-01 -7.44674460e-01 -7.45933440e-01
-7.47288480e-01 -7.48755240e-01 -7.50354120e-01 -7.52111840e-01
-7.54063390e-01 -7.56254250e-01 -7.58743070e-01 -7.61604610e-01
-7.64933080e-01 -7.68845710e-01 -7.73486280e-01 -7.79028130e-01
-7.85676030e-01 -7.93666670e-01 -8.03268580e-01 -8.14784250e-01
-8.28557780e-01 -8.44988870e-01 -8.64547310e-01 -8.87779890e-01
-9.15309070e-01 -9.47797640e-01 -9.85735890e-01 -1.02886903e+00
-1.07506391e+00 -1.11633350e+00 -1.12403918e+00 -9.84131910e-01
-7.37660000e-04], 'hartree')>)
Note that there are units attached, which makes subsequent analysis much easier,
since pint
will ensure that the correct units are always used.
act_energy[0].to('angstrom')
Magnitude | [3.9999997097520588 3.8785000770759783 3.757000444399898 |
---|---|
Units | angstrom |
Plotting¶
This is simple enough to plot.
fig, ax = plt.subplots(figsize=(4, 4), dpi=100)
ax.scatter(act_energy[0].magnitude, act_energy[1].magnitude)
<matplotlib.collections.PathCollection at 0x7f6e001af230>
However, this is not super satisfying, and since this is a “supported” workflow, we can leverage chemparseplot
instead.
# Create an instance of TwoDimPlot
twodim_plot = TwoDimPlot()
# Set the units for the plot
twodim_plot.set_units('angstrom', 'hartree')
# Add data (EnergyPath instances) to the plot
energy_path1 = EnergyPath('Path 1', np.linspace(0, 10, 30) * ureg.angstrom, np.random.uniform(-1, 0, 30) * ureg.hartree)
energy_path2 = EnergyPath('Path 2', np.linspace(1, 9, 25) * ureg.angstrom, np.random.uniform(-0.5, 0.5, 25) * ureg.hartree)
twodim_plot.add_data(energy_path1)
twodim_plot.add_data(energy_path2)
# Display the plot
twodim_plot.show_plot("Energy Paths")
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[6], line 2
1 # Create an instance of TwoDimPlot
----> 2 twodim_plot = TwoDimPlot()
4 # Set the units for the plot
5 twodim_plot.set_units('angstrom', 'hartree')
NameError: name 'TwoDimPlot' is not defined
twodim_plot.set_units('bohr', 'electron_volt')
twodim_plot.fig