import gzip
from pathlib import Path
from ase.build import molecule
import matplotlib.pyplot as plt

from chemparseplot.parse.orca import geomscan

Geometry Scan

Often it is of interest to scan for energies over a particular aspect of molecular geometry. Here we consider the example of scanning over the bond length of the H2 molecule.

System construction

For simplicity, we will just use ase for this:

h2 = molecule("H2")
h2.write("data/h2_base.xyz", comment="\n")

This can be visualized via ase gui h2_base.xyz or similar, to see[1]:

H2 Molecule

Note that for ORCA, we need the “plain” .xyz file, so we need to remove the metadata added by ase, hence the newline comment.

ORCA Input

To run orca, we can use an input file such as:

!OPT UHF def2-SVP
%geom Scan
 # B <atmid1> <atmid2> = init, final, npoints
 # Converted from Angstrom to Bohr
 B 0 1 = 7.5589039543, 0.2116708996, 33
 end
end
*xyzfile 0 1 h2_base.xyz

Execution

Following the best practices for running ORCA, we have[2]:

export PATH=$PATH:/blah/orca_5_0_4_linux_x86-64_openmpi411/
mkdir uhf
cp orca.inp h2_base.xyz uhf
cd uhf
($(which orca) orca.inp 2>&1) | tee scan_uhf

Also because it is annoying to keep blobs of text, and because it is large:

gzip -9 -c scan_uhf > scanuhf.gz
mv scanuhf.gz ../data/

Analysis with chemparseplot

Now we can finally get to the good bit. Rather than painstakingly parsing “by eye” the scan_uhf file, we will simply use chemparseplot:

orcaout = gzip.open(Path("data/scanuhf.gz"), 'rt').read()
act_energy = geomscan.extract_energy_data(orcaout, "Actual")
print(act_energy)
(<Quantity([7.55890395 7.32930292 7.09970189 6.87010086 6.64049982 6.41089879
 6.18129776 5.95169672 5.72209569 5.49249466 5.26289362 5.03329259
 4.80369156 4.57409053 4.34448949 4.11488846 3.88528743 3.65568639
 3.42608536 3.19648433 2.9668833  2.73728226 2.50768123 2.2780802
 2.04847916 1.81887813 1.5892771  1.35967606 1.13007503 0.900474
 0.67087297 0.44127193 0.2116709 ], 'bohr')>, <Quantity([-7.42398620e-01 -7.43499390e-01 -7.44674460e-01 -7.45933440e-01
 -7.47288480e-01 -7.48755240e-01 -7.50354120e-01 -7.52111840e-01
 -7.54063390e-01 -7.56254250e-01 -7.58743070e-01 -7.61604610e-01
 -7.64933080e-01 -7.68845710e-01 -7.73486280e-01 -7.79028130e-01
 -7.85676030e-01 -7.93666670e-01 -8.03268580e-01 -8.14784250e-01
 -8.28557780e-01 -8.44988870e-01 -8.64547310e-01 -8.87779890e-01
 -9.15309070e-01 -9.47797640e-01 -9.85735890e-01 -1.02886903e+00
 -1.07506391e+00 -1.11633350e+00 -1.12403918e+00 -9.84131910e-01
 -7.37660000e-04], 'hartree')>)

Note that there are units attached, which makes subsequent analysis much easier, since pint will ensure that the correct units are always used.

act_energy[0].to('angstrom')
Magnitude
[3.9999997097520588 3.8785000770759783 3.757000444399898
3.6355008117238174 3.5140011737559647 3.392501541079884 3.271001908403804
3.1495022704359514 3.0280026377598706 2.9065030050837906
2.7850033671159378 2.6635037344398573 2.542004101763777
2.4205044690876965 2.2990048311198437 2.1775051984437637
2.056005565767683 1.9345059277998302 1.8130062951237498
1.6915066624476693 1.570007029771589 1.4485073918037363 1.327007759127656
1.2055081264515755 1.0840084884837227 0.9625088558076422
0.8410092231315618 0.7195095851637091 0.5980099524876287
0.47651031981154823 0.3550106871354678 0.2335110491676152
0.11201141649153473]
Unitsangstrom

Plotting

This is simple enough to plot.

fig, ax = plt.subplots(figsize=(4, 4), dpi=100)
ax.scatter(act_energy[0].magnitude, act_energy[1].magnitude)
<matplotlib.collections.PathCollection at 0x7f6e001af230>
../../_images/de8d5cd0b18667ff293b88312d768648bcb5f6ec339bf67fb74aec3095fcc97f.png

However, this is not super satisfying, and since this is a “supported” workflow, we can leverage chemparseplot instead.

# Create an instance of TwoDimPlot
twodim_plot = TwoDimPlot()

# Set the units for the plot
twodim_plot.set_units('angstrom', 'hartree')

# Add data (EnergyPath instances) to the plot
energy_path1 = EnergyPath('Path 1', np.linspace(0, 10, 30) * ureg.angstrom, np.random.uniform(-1, 0, 30) * ureg.hartree)
energy_path2 = EnergyPath('Path 2', np.linspace(1, 9, 25) * ureg.angstrom, np.random.uniform(-0.5, 0.5, 25) * ureg.hartree)
twodim_plot.add_data(energy_path1)
twodim_plot.add_data(energy_path2)

# Display the plot
twodim_plot.show_plot("Energy Paths")
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[6], line 2
      1 # Create an instance of TwoDimPlot
----> 2 twodim_plot = TwoDimPlot()
      4 # Set the units for the plot
      5 twodim_plot.set_units('angstrom', 'hartree')

NameError: name 'TwoDimPlot' is not defined
twodim_plot.set_units('bohr', 'electron_volt')
twodim_plot.fig