AFQMC Tutorials

Below we will run through some full AFQMC workflow examples. The necessary scripts can be found in the qmcpack/examples/afqmc directory.

Example 1: Neon atom

In this example we will go through the basic steps necessary to generate AFQMC input from a pyscf SCF calculation on a simple closed-shell system (the neon atom in the aug-cc-pvdz basis).

The pyscf scf script is given below (scf.py in the current directory):

from pyscf import gto, scf, cc
from pyscf.cc import ccsd_t
import h5py

mol = gto.Mole()
mol.basis = 'aug-cc-pvdz'
mol.atom = (('Ne', 0,0,0),)
mol.verbose = 4
mol.build()

mf = scf.RHF(mol)
mf.chkfile = 'scf.chk'
ehf = mf.kernel()

ccsd = cc.CCSD(mf)
ecorr_ccsd = ccsd.kernel()[0]
ecorr_ccsdt = ccsd_t.kernel(ccsd, ccsd.ao2mo())
print("E(CCSD(T)) = {}".format(ehf+ecorr_ccsd+ecorr_ccsdt))

The most important point above is that we create an SCF checkpoint file by setting the chkfile attribute of the SCF object (mf.chkfile). Note that we also compute the CCSD and CCSD(T) energies for comparison purposes, since this system is trivially small.

We next run the pyscf calculation using

python scf.py > scf.out

which will yield a converged restricted Hartree–Fock total energy of -128.496349730541 Ha, a CCSD value of -128.7084878405062 Ha, and a CCSD(T) value of -128.711294157 Ha.

The next step is to generate the necessary qmcpack input from this SCF calculation. To do this, run (assuming afqmctools is in your PYTHONPATH):

/path/to/qmcpack/utils/afqmctools/bin/pyscf_to_afqmc.py -i scf.chk -o afqmc.h5 -t 1e-5 -v

which will perform the necessary AO-to-MO transformation of the one- and two-electron integrals and a modified Cholesky decomposition of the two-electron integrals. For a full explanation of the options available for pyscf_to_afqmc.py, run

pyscf_to_afqmc.py -h

In the above example, -i designates the input pyscf checkpoint file, -o specifies the output file to which the qmcpack Hamiltonian/wavefunction is written, -t specifies the convergence threshold for the Cholesky decomposition, and -v increases verbosity. You can optionally pass -q/--qmcpack-input to generate a qmcpack input file based on the generated Hamiltonian and wavefunction. Greater control over input file generation can be achieved using the write_xml_input function provided with afqmctools. Run gen_input.py after the integrals/wavefunction have been generated to generate the input file afqmc.xml.

Running the above will generate one file: afqmc.h5. The plain-text wavefunction files are deprecated and will be removed in later releases. The qmcpack input file afqmc.xml is a skeleton input file, meaning that it is created from the information in afqmc.h5 and is meant as a convenience, not as a guarantee that the convergeable parameters (timestep, walker number, bias bound, etc.) are converged or appropriate.

We will next run through the relevant sections of the input file afqmc.xml below:

<project id="qmc" series="0"/>
<random seed="7"/>

<AFQMCInfo name="info0">
    <parameter name="NMO">23</parameter>
    <parameter name="NAEA">5</parameter>
    <parameter name="NAEB">5</parameter>
</AFQMCInfo>

We first specify how to name the output file. We have also fixed the random number seed so that the results of this tutorial are reproducible (if run on the same number of cores).

Next comes the system description, which is mostly a sanity check, as these parameters will be read from the Hamiltonian file. They specify the number of single-particle orbitals in the basis set (NMO) and the number of alpha (NAEA) and beta (NAEB) electrons, respectively.

Next we specify the Hamiltonian and wavefunction to use:

<Hamiltonian name="ham0" info="info0">
  <parameter name="filetype">hdf5</parameter>
  <parameter name="filename">afqmc.h5</parameter>
</Hamiltonian>

<Wavefunction name="wfn0" type="NOMSD" info="info0">
  <parameter name="filetype">hdf5</parameter>
  <parameter name="filename">afqmc.h5</parameter>
</Wavefunction>

The above should be enough for most calculations. A NOMSD (non-orthogonal multi-Slater determinant) wavefunction allows for a general wavefunction input in the form of one or more matrices of molecular orbital coefficients; for the RHF calculation we perform here, this is a single matrix.

We next set the walker options:

<WalkerSet name="wset0" type="shared">
  <parameter name="walker_type">CLOSED</parameter>
</WalkerSet>

The important point here is that, as we are using an RHF trial wavefunction, we must set walker_type to CLOSED. For a UHF trial wavefunction one would set this to COLLINEAR.

And now the propagator options:

<Propagator name="prop0" info="info0">
  <parameter name="hybrid">yes</parameter>
</Propagator>

In the above we specify that we will be using the hybrid approach for updating the walker weights. If you wish to use the local energy approximation instead, you should switch this flag off.

Finally comes the execute block which controls how the simulation is run:

<execute wset="wset0" ham="ham0" wfn="wfn0" prop="prop0" info="info0">
  <parameter name="ncores">1</parameter>
  <parameter name="timestep">0.01</parameter>
  <parameter name="nWalkers">10</parameter>
  <parameter name="blocks">100</parameter>
  <parameter name="steps">10</parameter>
</execute>

The time step (timestep), number of Monte Carlo samples (blocks × steps), and number of walkers (nWalkers) should be adjusted as appropriate. Note that nWalkers sets the number of walkers per group of ncores cores. For example, if we wanted to use 100 walkers, we could run the above input file on 10 cores. If the problem size is very large, we may want to distribute the workload over more cores per walker, say 10. In this case we would require 100 cores to maintain the same number of walkers. Typically in this case you want to specify fewer walkers per core anyway.
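The walker bookkeeping described above is simple arithmetic, made explicit in the Python sketch below. The helper names walker_layout and num_samples are hypothetical (they are not part of qmcpack or afqmctools), and the sketch assumes the total core count is an exact multiple of ncores.

```python
def walker_layout(total_cores, ncores_per_group, nwalkers_per_group):
    """Return (number of walker groups, total number of walkers).

    Assumes every group of `ncores_per_group` cores propagates
    `nwalkers_per_group` walkers (the nWalkers parameter).
    """
    if total_cores % ncores_per_group != 0:
        raise ValueError("total_cores must be a multiple of ncores_per_group")
    ngroups = total_cores // ncores_per_group
    return ngroups, ngroups * nwalkers_per_group


def num_samples(blocks, steps):
    """Total number of Monte Carlo steps in the run: blocks * steps."""
    return blocks * steps


# Input above (ncores=1, nWalkers=10) on 10 cores: 100 walkers.
print(walker_layout(10, 1, 10))    # (10, 100)
# ncores=10, nWalkers=10: 100 cores are needed for the same 100 walkers.
print(walker_layout(100, 10, 10))  # (10, 100)
print(num_samples(100, 10))        # 1000
```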

We can now run the qmcpack simulation:

qmcpack afqmc.xml > qmcpack.out

Assuming the calculation finishes successfully, the very first thing you should do is check qmcpack.out to confirm that no warnings were raised. The second thing to check is that the energy of the starting determinant matches the Hartree–Fock energy you computed earlier with pyscf, to within roughly the threshold you specified when generating the Cholesky decomposition. This check is not very meaningful if you use, say, DFT orbitals. However, if this energy is wildly off, it is a good sign that something went wrong with either the wavefunction or the integral generation. Next you should inspect the qmc.s000.scalar.dat file, which contains the mixed estimates for various quantities and can be plotted using gnuplot. EnergyEstim__nume_real contains the block-averaged values of the local energy and should be the 7th column.
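If you prefer Python to gnuplot, a minimal sketch for pulling one column out of the scalar file is shown below. It assumes the first line of qmc.s000.scalar.dat is a '#'-prefixed header of whitespace-separated column names; read_scalar_column is a hypothetical helper, not an afqmctools function, and it falls back to the 7th column mentioned above if the requested name is absent.

```python
def read_scalar_column(filename, name="EnergyEstim__nume_real", fallback_col=7):
    """Return one column of a qmcpack scalar.dat file as a list of floats.

    Assumes the first line is a '#'-prefixed header of whitespace-separated
    column names. If `name` is not in the header, the 1-based column
    `fallback_col` is used instead.
    """
    with open(filename) as fh:
        header = fh.readline().lstrip("#").split()
        # Skip blank lines and any further comment lines.
        rows = [line.split() for line in fh
                if line.strip() and not line.lstrip().startswith("#")]
    col = header.index(name) if name in header else fallback_col - 1
    return [float(row[col]) for row in rows]
```

The returned list can be passed straight to a plotting library, e.g. energies = read_scalar_column("qmc.s000.scalar.dat").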

Assuming everything worked correctly we need to analyse the afqmc output using:

/path/to/qmcpack/nexus/bin/qmca -e num_skip -q el qmc.s000.scalar.dat

where num_skip is the number of blocks to skip for the equilibration stage. For a practical calculation you may want to use more walkers and run for longer to get meaningful statistics.

See the options for qmca for further information. Essentially, we discard the first num_skip blocks as equilibration and compute only the mixed estimate of the local energy, internally called EnergyEstim__nume_real, which is selected with -q el. We see that the ph-AFQMC energy agrees well with the CCSD(T) value. However, we probably did not run the simulation for long enough to really trust the error bars.
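qmca does this analysis for you. Purely to illustrate what discarding equilibration blocks and estimating an error bar for serially correlated data involves, here is a simplified reblocking sketch in the spirit of Flyvbjerg and Petersen. It is not qmca's actual algorithm, and reblock_error is a hypothetical name.

```python
import math
import statistics


def reblock_error(samples, num_skip=0):
    """Mean and a crude blocking-analysis error bar for correlated data.

    After discarding `num_skip` equilibration blocks, neighbouring points are
    repeatedly averaged in pairs; the largest standard error seen over all
    blocking levels is returned as a conservative estimate.
    """
    data = [float(x) for x in samples[num_skip:]]
    if len(data) < 2:
        raise ValueError("need at least two samples after equilibration")
    mean = statistics.fmean(data)
    worst = 0.0
    while len(data) >= 2:
        err = statistics.stdev(data) / math.sqrt(len(data))
        worst = max(worst, err)
        # Average neighbouring pairs; a trailing odd point is dropped.
        data = [(data[i] + data[i + 1]) / 2 for i in range(0, len(data) - 1, 2)]
    return mean, worst


toy = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]  # synthetic, not AFQMC output
print(reblock_error(toy, num_skip=2))
```

Taking the maximum over blocking levels is a conservative but noisy choice; a proper analysis looks for a plateau in the error as the block length grows.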

Example 2: Frozen Core

In this example we show how to perform a frozen-core calculation, which only affects the integral generation step. We will use the previous neon example and freeze 2 core electrons. This currently works only for RHF/ROHF trial wavefunctions.

mpirun -n 1 /path/to/qmcpack/utils/afqmctools/bin/pyscf_to_afqmc.py -i scf.chk -o afqmc.h5 -t 1e-5 -v -c 8,22

Again, run gen_input.py to generate the input file afqmc.xml.

Comparing the above to the previous example, we see that we have added the -c or --cas option, followed by a comma-separated pair N,M defining an (N,M) CAS space; here it contains 8 electrons in 22 spatial orbitals (freezing the lowest MO).
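The (N,M) pair follows from simple counting: each frozen, doubly occupied orbital removes two electrons and one orbital. A short sketch for neon, with NMO = 23 taken from the AFQMCInfo block of Example 1 (frozen_core_cas is a hypothetical helper, not part of afqmctools):

```python
def frozen_core_cas(nelec, nmo, nfrozen):
    """Return the (N, M) active space left after freezing `nfrozen` lowest MOs.

    Each frozen (doubly occupied) orbital removes two electrons and one orbital.
    """
    n_active_elec = nelec - 2 * nfrozen
    n_active_orb = nmo - nfrozen
    if n_active_elec < 0 or n_active_orb < (n_active_elec + 1) // 2:
        raise ValueError("too many frozen orbitals for this system")
    return n_active_elec, n_active_orb


# Neon: 10 electrons in 23 orbitals; freeze the 1s orbital.
n, m = frozen_core_cas(10, 23, 1)
print("-c {},{}".format(n, m))  # -c 8,22
```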

The rest of the qmcpack process follows as before.

Example 3: UHF Trial

In this example we show how to use an unrestricted Hartree–Fock (UHF) style wavefunction to find the ph-AFQMC (triplet) ground-state energy of the carbon atom (cc-pvtz). Again we first run the SCF calculation (scf.py), followed by the integral generation script:

mpirun -n 1 /path/to/qmcpack/utils/afqmctools/bin/pyscf_to_afqmc.py -i scf.chk -o afqmc.h5 -t 1e-5 -v -a

Note the new flag -a/--ao, which tells the script to transform the integrals to an orthogonalised atomic orbital basis rather than the more typical MO basis. This is necessary because the UHF alpha and beta orbitals differ, so an MO-basis transformation would produce spin-dependent two-electron integrals, which qmcpack does not support.

Running qmcpack as before should yield a mixed estimate for the energy of roughly -37.78471 +/- 0.00014 Ha.
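The triplet state fixes the numbers of alpha and beta electrons that appear as NAEA and NAEB in the AFQMCInfo block. Using pyscf's convention that the spin attribute of a Mole is 2S = nalpha - nbeta, the counting can be sketched as follows (alpha_beta_counts is a hypothetical helper):

```python
def alpha_beta_counts(nelec, spin):
    """Split `nelec` electrons into (nalpha, nbeta) given spin = nalpha - nbeta."""
    if (nelec + spin) % 2 != 0 or abs(spin) > nelec:
        raise ValueError("inconsistent electron count and spin")
    nalpha = (nelec + spin) // 2
    return nalpha, nelec - nalpha


print(alpha_beta_counts(6, 2))   # carbon triplet: (4, 2)
print(alpha_beta_counts(10, 0))  # neon singlet: (5, 5)
```

Because NAEA ≠ NAEB for carbon, the UHF trial here goes with walker_type COLLINEAR, as noted in Example 1.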

Example 4: NOMSD Trial

In this example we will show how to format different trial wavefunctions in such a way that qmcpack can read them.

Rather than use the pyscf_to_afqmc.py script, we will break up the process to allow for more flexibility and show what is going on under the hood.

The qmcpack input can be generated with the scf.py script. See the comments in scf.py for a breakdown of the steps involved.

Currently QMCPACK can deal with trial wavefunctions in two forms: non-orthogonal multi-Slater determinant (NOMSD) and particle-hole style (PHMSD) trial wavefunctions. The NOMSD trial wavefunctions are the most general form and expect Slater determinants in the form of M × N matrices of molecular orbital coefficients, where N is the number of electrons and M is the number of orbitals, along with a list of CI coefficients. Importantly, the Slater determinants must be non-orthogonal.
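To make "non-orthogonal" concrete: for real orbital-coefficient matrices A and B (each M × N, expressed in an orthonormal basis), the overlap of the corresponding determinants is det(AᵀB). The stdlib-only sketch below uses hypothetical helpers det and slater_overlap, and real coefficients for simplicity (QMCPACK works with complex coefficients in general). It shows that a single excitation within one orthonormal orbital set gives zero overlap, as in a particle-hole expansion, while a rotated orbital set generally does not.

```python
def det(a):
    """Determinant of a small square matrix by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in a]  # work on a copy
    n = len(a)
    result = 1.0
    for k in range(n):
        # Choose the row with the largest pivot in column k.
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        if abs(a[p][k]) < 1e-14:
            return 0.0
        if p != k:
            a[k], a[p] = a[p], a[k]
            result = -result  # a row swap flips the sign
        result *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return result


def slater_overlap(A, B):
    """<Phi_A|Phi_B> = det(A^T B) for real M x N orbital-coefficient matrices."""
    m, n = len(A), len(A[0])
    S = [[sum(A[p][i] * B[p][j] for p in range(m)) for j in range(n)]
         for i in range(n)]
    return det(S)


ref = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]      # orbitals 1 and 2 occupied
excited = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]  # single excitation 2 -> 3
rotated = [[1.0, 0.0], [0.0, 0.6], [0.0, 0.8]]  # orbital 2 mixed with orbital 3
print(slater_overlap(ref, excited))  # 0.0: orthogonal determinants
print(slater_overlap(ref, rotated))  # 0.6: non-orthogonal determinants
```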

Example 5: CASSCF Trial

In this example we will show how to format a CASSCF trial wavefunction.

Rather than use the pyscf_to_afqmc.py script, we will break up the process to allow for more flexibility and show what is going on under the hood.

The qmcpack input can be generated with the scf.py script followed by gen_input.py.

See the relevant code below for a breakdown of the steps involved.

The first step is to run a CASSCF calculation. Here we consider N₂. This replicates the calculations of Al-Saidi et al., J. Chem. Phys. 127, 144101 (2007), who found a CASSCF energy of -108.916484 Ha and a ph-AFQMC energy of -109.1975(6) Ha with a 97-determinant CASSCF trial.

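from pyscf import gto, scf, mcscf

mol = gto.M(atom=[['N', (0,0,0)], ['N', (0,0,3.0)]],
            basis='cc-pvdz',
            unit='Bohr')
nalpha, nbeta = mol.nelec
rhf = scf.RHF(mol)
rhf.chkfile = 'scf.chk'
rhf.kernel()

# CAS(6e, 12o): M active orbitals, N active electrons.
M = 12
N = 6
nmo = rhf.mo_coeff.shape[-1]
mc = mcscf.CASSCF(rhf, M, N)
# Store the CASSCF results in the same checkpoint file as the RHF data.
mc.chkfile = 'scf.chk'
mc.kernel()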

Next we unpack the wavefunction into CI coefficients and alpha/beta occupation lists:

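from pyscf import fci

# Number of alpha and beta electrons in the active space (6 active electrons).
nalpha = 3
nbeta = 3
# Threshold on |CI coefficient| for keeping a determinant. This value is
# illustrative only; it is not taken from the original example.
tol = 1e-4
ci, occa, occb = zip(*fci.addons.large_ci(mc.ci, M, (nalpha,nbeta),
                     tol=tol, return_strs=False))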

and sort the determinants by the magnitude of their weight:

import numpy

# Sort determinants by decreasing magnitude of their CI coefficient.
ci = numpy.array(ci)
ixs = numpy.argsort(numpy.abs(ci))[::-1]
ci = ci[ixs]
occa = numpy.array(occa)[ixs]
occb = numpy.array(occb)[ixs]

Next we reinsert the core orbitals, since the AFQMC simulation is not run in an active space:

core = [i for i in range(mc.ncore)]
occa = [numpy.array(core + [o + mc.ncore for o in oa]) for oa in occa]
occb = [numpy.array(core + [o + mc.ncore for o in ob]) for ob in occb]
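The index shift is easy to check by hand. For N₂ (14 electrons) with a CAS(6e,12o), pyscf's mc.ncore is (14 - 6) / 2 = 4, so every determinant regains four doubly occupied core orbitals and ends up with 7 alpha and 7 beta electrons, matching mol.nelec. A stdlib sketch with toy active-space occupations (reinsert_core is a hypothetical helper mirroring the list comprehensions above):

```python
def reinsert_core(occ_active, ncore):
    """Map active-space occupation lists to full-MO indices, prepending the core."""
    core = list(range(ncore))
    return [core + [o + ncore for o in occ] for occ in occ_active]


ncore = (14 - 6) // 2  # N2: 14 electrons, 6 of them active -> 4 core MOs
occa = reinsert_core([[0, 1, 2], [0, 1, 3]], ncore)  # toy occupations
print(occa)  # [[0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 7]]
```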

Next we need to generate the one- and two-electron integrals. Note that we need to use the CASSCF MO coefficients to rotate the integrals.

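# Module paths as found in afqmctools at the time of writing; check your version.
from afqmctools.utils.pyscf_utils import load_from_pyscf_chk_mol
from afqmctools.hamiltonian.mol import write_hamil_mol

# Load the CASSCF ('mcscf') data from the checkpoint and write the
# Cholesky-decomposed Hamiltonian (threshold 1e-5) to afqmc.h5.
scf_data = load_from_pyscf_chk_mol('scf.chk', 'mcscf')
write_hamil_mol(scf_data, 'afqmc.h5', 1e-5, verbose=True)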

Finally we can write the wavefunction to the QMCPACK format:

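# Module path as found in afqmctools at the time of writing; check your version.
from afqmctools.wavefunction.mol import write_qmcpack_wfn

ci = numpy.array(ci, dtype=numpy.complex128)
uhf = True # UHF always true for CI expansions.
write_qmcpack_wfn('afqmc.h5', (ci, occa, occb), uhf, mol.nelec, nmo)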

To generate the input file we again run gen_input.py. Note the rediag option, which is necessary if the CI code uses a different convention for ordering creation and annihilation operators when defining determinant strings.

Example 6: Back Propagation

Note: matplotlib is required to generate the figure in this example.

The basic estimators printed in the qmcpack *.scalar.dat files are mixed estimates. Unless the operator for which the mixed estimate is computed commutes with the Hamiltonian, this result will generally be biased. To obtain pure estimates we can use back propagation, as outlined in Motta & Zhang, JCTC 13, 5367 (2017). For this example we will compute the one-body energy of a methane molecule (see Fig. 2 of M&Z).
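For reference, the standard textbook forms of the mixed and pure estimators of an operator \(\hat{O}\) are given below, where \(|\Psi_0\rangle\) is the ground state sampled by the walkers and \(|\Psi_T\rangle\) is the trial wavefunction:

\[\langle \hat{O} \rangle_{\mathrm{mixed}} = \frac{\langle \Psi_T | \hat{O} | \Psi_0 \rangle}{\langle \Psi_T | \Psi_0 \rangle}, \qquad \langle \hat{O} \rangle_{\mathrm{pure}} = \frac{\langle \Psi_0 | \hat{O} | \Psi_0 \rangle}{\langle \Psi_0 | \Psi_0 \rangle}\]

Back propagation replaces the bra by \(\langle \Psi_T | e^{-\tau_{BP} \hat{H}}\), which becomes proportional to \(\langle \Psi_0 |\) for large \(\tau_{BP}\) provided \(\langle \Psi_T | \Psi_0 \rangle \neq 0\):

\[\langle \hat{O} \rangle_{\mathrm{pure}} = \lim_{\tau_{BP} \to \infty} \frac{\langle \Psi_T | e^{-\tau_{BP} \hat{H}} \hat{O} | \Psi_0 \rangle}{\langle \Psi_T | e^{-\tau_{BP} \hat{H}} | \Psi_0 \rangle}\]

For \(\hat{O} = \hat{H}\) the mixed estimator is already exact, since \(\hat{H} | \Psi_0 \rangle = E_0 | \Psi_0 \rangle\).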

As before, run scf.py and generate the integrals using pyscf_to_afqmc.py:

mpirun -n 1 /path/to/qmcpack/utils/afqmctools/bin/pyscf_to_afqmc.py -i scf.chk -o afqmc.h5 -t 1e-5 -v

Note we are working in the MO basis. The input file is generated using gen_input.py and comparing to the previous examples we can now see the estimator block:

<Estimator name="back_propagation">
    <parameter name="naverages">4</parameter>
    <parameter name="block_size">2</parameter>
    <parameter name="ortho">1</parameter>
    <OneRDM />
    <parameter name="nsteps">200</parameter>
</Estimator>

This tells QMCPACK to compute the back-propagated one-RDM. Setting block_size to 2 means that the back-propagated estimates are averaged into bins of length 2, which helps reduce the size of the HDF5 files. We also set nsteps to 200, meaning that we back propagate the bra wavefunction in the estimator by 200 × 0.01 = 2 a.u., where the timestep has been set to 0.01 a.u. Finally, naverages splits the full path into naverages chunks, so we will have averaged data at \(\tau_{BP}=[0.5, 1.0, 1.5, 2.0]\) a.u. This allows us to monitor the convergence of the estimator with back-propagation time.
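The back-propagation times follow directly from nsteps, timestep and naverages. A sketch of the arithmetic (bp_times is a hypothetical helper, and it assumes nsteps divides evenly into naverages chunks):

```python
def bp_times(nsteps, timestep, naverages):
    """Back-propagation times (a.u.) at which averaged estimates are available."""
    if nsteps % naverages != 0:
        raise ValueError("this sketch assumes nsteps is divisible by naverages")
    chunk = nsteps // naverages
    # Round to hide floating-point noise from multiplying by the timestep.
    return [round((i + 1) * chunk * timestep, 12) for i in range(naverages)]


print(bp_times(200, 0.01, 4))  # [0.5, 1.0, 1.5, 2.0]
```

With zero-based indexing, index 2 of this list corresponds to \(\tau_{BP} = 1.5\) a.u.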

Running QMCPACK as before, we notice that in addition to the qmc.s000.scalar.dat file a new file, qmc.s000.scalar.h5, has been generated. This file contains the back-propagated estimates, which currently means the back-propagated one-particle reduced density matrix (1RDM), given by

\[P^{\sigma}_{ij} = \langle c_{i\sigma}^{\dagger} c_{j\sigma} \rangle\]

Before we analyse the output, we should ask why we chose a back-propagation time of 2 a.u. The back-propagation time is yet another parameter that must be carefully converged.

In this example we will show how this is done. In this directory you will find a script, check_h1e_conv.py, which shows how to use various helper functions provided in afqmctools/analysis. The most important of these are:

from afqmctools.analysis.extraction import get_metadata
metadata = get_metadata(filename)

which returns a dict containing the RDM metadata,

from afqmctools.analysis.average import average_one_rdm
rdm_av, rdm_errs = average_one_rdm(f, name='back_propagated', eqlb=3, ix=2)

which computes the average of the 1RDM, where ix specifies the index of the desired back-propagation time (e.g. \(\mathrm{ix}=2 \rightarrow \tau_{BP} = 1.5\) a.u.), and eqlb controls how much of the initial data is discarded as equilibration.

from afqmctools.analysis.extraction import extract_observable
dm = extract_observable(filename,
                        estimator='back_propagated',
                        name='one_rdm',
                        ix=2)

which extracts the 1RDM for all blocks and finally,

from afqmctools.analysis.extraction import extract_observable
dm, weights = extract_observable(filename,
                                 estimator='back_propagated',
                                 name='one_rdm',
                                 ix=2,
                                 sample=index)

which extracts a single density matrix for block index.
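Once you have a density matrix, the one-body energy is a contraction with the one-electron integrals, \(E_1 = \sum_{\sigma}\sum_{ij} h_{ij} P^{\sigma}_{ij}\), using the 1RDM defined above. The sketch below assumes the RDM has already been split into alpha and beta nmo × nmo arrays in the same basis as the integrals \(h_{ij}\); one_body_energy is a hypothetical helper, and any constant energy shift is not included.

```python
def one_body_energy(h1e, rdm_alpha, rdm_beta):
    """E1 = sum_sigma sum_ij h_ij P^sigma_ij for real, symmetric h_ij.

    The RDMs may be complex; for Hermitian P the result is real, so only the
    real part is returned.
    """
    n = len(h1e)
    total = sum(h1e[i][j] * (rdm_alpha[i][j] + rdm_beta[i][j])
                for i in range(n) for j in range(n))
    return total.real


h = [[-1.0, 0.1], [0.1, -0.5]]  # toy integrals
p = [[1.0, 0.0], [0.0, 0.0]]    # toy RDM: lowest orbital singly occupied per spin
print(one_body_energy(h, p, p))  # -2.0
```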

Have a look through check_h1e_conv.py and run it. A plot should be produced which shows the back propagated AFQMC one-body energy as a function of back propagation time, which converges to a value of roughly -78.888(1). This system is sufficiently small to perform FCI on. How does ph-AFQMC compare? Why are the error bars getting bigger with back propagation time?

Finally, we should mention that the path restoration algorithm introduced in M&Z is also implemented and can be turned on using the path_restoration parameter in the Estimator block.

In QMCPACK path restoration restores both the cosine projection and phase along the back propagation path. In general it was found in M&Z that path restoration always produced better results than using the standard back propagation algorithm, and it is recommended that it is always used. Does path restoration affect the results for methane?

Example 7: 2x2x2 Diamond supercell

In this example we will show how to generate the AFQMC input from a pbc pyscf calculation for a 2x2x2 supercell of diamond using a RHF trial wavefunction.

Again the first step is to run a pyscf calculation using the scf.py script in this directory.

The first part of the pyscf calculation is straightforward. See pyscf/examples/pbc for more examples on how to set up Hartree–Fock and DFT simulations.

import h5py
import numpy
import sys
from pyscf.pbc import scf, dft, gto

cell = gto.Cell()
cell.verbose = 5
alat0 = 3.6
cell.a = (numpy.ones((3,3))-numpy.eye(3))*alat0 / 2.0
cell.atom = (('C',0,0,0), ('C',numpy.array([0.25,0.25,0.25])*alat0))
cell.basis = 'gth-szv'
cell.pseudo = 'gth-pade'
cell.mesh = [28,28,28]
cell.build()
nk = [2,2,2]
kpts = cell.make_kpts(nk)

mf = scf.KRHF(cell, kpts=kpts)
mf.chkfile = 'scf.chk'
mf.kernel()

In addition to a standard pyscf calculation, we add the following lines:

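from afqmctools.utils.linalg import get_ortho_ao

# Core Hamiltonian and Fock matrix at each k-point.
hcore = mf.get_hcore()
fock = (hcore + mf.get_veff())
# Transformation to the orthogonalised AO basis and number of MOs per k-point.
X, nmo_per_kpt = get_ortho_ao(cell,kpts)
# Open the existing checkpoint in read/write mode ('r+'); h5py >= 3.0
# opens files read-only by default.
with h5py.File(mf.chkfile, 'r+') as fh5:
    fh5['scf/hcore'] = hcore
    fh5['scf/fock'] = fock
    fh5['scf/orthoAORot'] = X
    fh5['scf/nmo_per_kpt'] = nmo_per_kpt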

Essentially, this stores the Fock matrix, the core Hamiltonian, the transformation matrix to the orthogonalised AO basis and the number of MOs per k-point in the checkpoint file. This is currently required for running PBC AFQMC calculations.

Once the above (scf.py) script is run we will again use the pyscf_to_afqmc.py script to generate the necessary AFQMC input file.

mpirun -n 8 /path/to/qmcpack/utils/afqmctools/bin/pyscf_to_afqmc.py -i scf.chk -o afqmc.h5 -t 1e-5 -v -a

Note that the commands necessary to generate the integrals are identical to those for the molecular calculations, except that we now accelerate the calculation using MPI. If the number of MPI tasks exceeds the number of k-points, the number of tasks must be divisible by the number of k-points.
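The task-count rule can be written as a one-line check. The sketch below (valid_ntasks is a hypothetical helper that encodes only the rule stated above) lists the allowed task counts for the 8 k-points of a 2x2x2 mesh:

```python
def valid_ntasks(ntasks, nkpts):
    """True if `ntasks` satisfies the rule: above nkpts, it must be a multiple of nkpts."""
    return ntasks <= nkpts or ntasks % nkpts == 0


nkpts = 2 * 2 * 2  # 2x2x2 Monkhorst-Pack mesh
print([n for n in range(1, 25) if valid_ntasks(n, nkpts)])
# [1, 2, 3, 4, 5, 6, 7, 8, 16, 24]
```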

Once this is done we will again find a Hamiltonian file and an AFQMC input XML file. Inspecting these, you will notice that their structure is identical to that of the molecular calculations seen previously. This is because we have not exploited k-point symmetry and are writing the integrals in a supercell basis. In the next example we will show how k-point symmetry can be exploited explicitly, which leads to a faster, lower-memory algorithm for AFQMC.

Example 8: 2x2x2 Diamond k-point symmetry

In this example we will show how to run an AFQMC simulation that exploits k-point symmetry, which is much more efficient than running in the supercell way discussed in the previous example. We will again look at the same 2x2x2 cell of diamond. We assume you have run the SCF calculation in the previous example.

Essentially, all that changes in the integral generation step is that we pass the -k/--kpoint flag to pyscf_to_afqmc.py.

mpirun -n 8 /path/to/qmcpack/utils/afqmctools/bin/pyscf_to_afqmc.py -i ../07-diamond_2x2x2_supercell/scf.chk -o afqmc.h5 -t 1e-5 -v -a -k

You will notice that the Cholesky decomposition is now done for each momentum transfer independently and that the form of the Hamiltonian file has changed to be k-point dependent.
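Why the decomposition splits by momentum transfer: in the usual k-point formulation, the two-electron integrals (k1 k2|k3 k4) vanish unless k1 - k2 + k3 - k4 is a reciprocal lattice vector, so they can be grouped, and factorised separately, by the momentum transfer Q = k1 - k2. The sketch below only illustrates this bookkeeping for a 2x2x2 mesh using integer k-point labels; momentum_transfer_pairs is a hypothetical helper, not QMCPACK or pyscf code, and the sign convention for Q is a choice.

```python
from itertools import product


def momentum_transfer_pairs(mesh):
    """Group (k, k') label pairs of an n1 x n2 x n3 mesh by Q = k - k' (mod mesh).

    With integer labels, equality up to a reciprocal lattice vector becomes
    modular arithmetic on the labels.
    """
    kpts = list(product(*(range(n) for n in mesh)))
    groups = {}
    for k, kp in product(kpts, repeat=2):
        q = tuple((a - b) % n for a, b, n in zip(k, kp, mesh))
        groups.setdefault(q, []).append((k, kp))
    return groups


groups = momentum_transfer_pairs((2, 2, 2))
print(len(groups))                                # 8 momentum transfers Q
print({len(pairs) for pairs in groups.values()})  # {8}: 8 (k, k') pairs per Q
```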

Apart from these changes, running the AFQMC simulation proceeds as before, however you should see a significant performance boost relative to the supercell simulations, particularly on GPU machines.