.. _lab-advanced-molecules: Lab 3: Advanced molecular calculations ====================================== Topics covered in this lab -------------------------- This lab covers molecular QMC calculations with wavefunctions of increasing sophistication. All of the trial wavefunctions are initially generated with the GAMESS code. Topics covered include: - Generating single-determinant trial wavefunctions with GAMESS (HF and DFT) - Generating multideterminant trial wavefunctions with GAMESS (CISD, CASCI, and SOCI) - Optimizing wavefunctions (Jastrow factors and CSF coefficients) with QMC - DMC time step and walker population convergence studies - Systematic progressions of Jastrow factors in VMC - Systematic convergence of DMC energies with multideterminant wavefunctions - Influence of orbitals basis choice on DMC energy Lab directories and files ------------------------- :: abs/lab3_advanced_molecules/exercises │ ├── ex1_first-run-hartree-fock - basic work flow from Hatree-Fock to DMC │ ├── gms - Hatree-Fock calculation using GAMESS │ │ ├── h2o.hf.inp - GAMESS input │ │ ├── h2o.hf.dat - GAMESS punch file containing orbitals │ │ └── h2o.hf.out - GAMESS output with orbitals and other info │ ├── convert - Convert GAMESS wavefunction to QMCPACK format │ │ ├── h2o.hf.out - GAMESS output │ │ ├── h2o.ptcl.xml - converted particle positions │ │ └── h2o.wfs.xml - converted wave function │ ├── opt - VMC optimization │ │ └── optm.xml - QMCPACK VMC optimization input │ ├── dmc_timestep - Check DMC timestep bias │ │ └── dmc_ts.xml - QMCPACK DMC input │ └── dmc_walkers - Check DMC population control bias │ └── dmc_wk.xml - QMCPACK DMC input template │ ├── ex2_slater-jastrow-wf-options - explore jastrow and orbital options │ ├── jastrow - Jastrow options │ │ ├── 12j - no 3-body Jastrow │ │ ├── 1j - only 1-body Jastrow │ │ └── 2j - only 2-body Jastrow │ └── orbitals - Orbital options │ ├── pbe - PBE orbitals │ │ └── gms - DFT calculation using GAMESS │ │ └── h2o.pbe.inp - GAMESS DFT input │ ├── pbe0 - PBE0 orbitals │ ├── blyp - BLYP orbitals │ └── b3lyp - B3LYP orbitals │ ├── ex3_multi-slater-jastrow │ ├── cisd - CISD wave function │ │ ├── gms - CISD calculation using GAMESS │ │ │ ├── h2o.cisd.inp - GAMESS input │ │ │ ├── h2o.cisd.dat - GAMESS punch file containing orbitals │ │ │ └── h2o.cisd.out - GAMESS output with orbitals and other info │ │ └── convert - Convert GAMESS wavefunction to QMCPACK format │ │ └── h2o.hf.out - GAMESS output │ ├── casci - CASCI wave function │ │ └── gms - CASCI calculation using GAMESS │ └── soci - SOCI wave function │ ├── gms - SOCI calculation using GAMESS │ ├── thres0.01 - VMC optimization with few determinants │ └── thres0.0075 - VMC optimization with more determinants │ └── pseudo ├── H.BFD.gamess - BFD pseudopotential for H in GAMESS format ├── O.BFD.CCT.gamess - BFD pseudopotential for O in GAMESS format ├── H.xml - BFD pseudopotential for H in QMCPACK format └── O.xml - BFD pseudopotential for H in QMCPACK format Exercise #1: Basics ------------------- The purpose of this exercise is to show how to generate wavefunctions for QMCPACK using GAMESS and to optimize the resulting wavefunctions using VMC. This will be followed by a study of the time step and walker population dependence of DMC energies. The exercise will be performed on a water molecule at the equilibrium geometry. Generation of a Hartree-Fock wavefunction with GAMESS ----------------------------------------------------- From the top directory, go to “``ex1_first-run-hartree-fock/gms``.” This directory contains an input file for a HF calculation of a water molecule using BFD ECPs and the corresponding cc-pVTZ basis set. The input file should be named: “h2o.hf.inp.” Study the input file. See Section :ref:`lab-adv-mol-gamess` for a more detailed description of the GAMESS input syntax. However, there will be a better time to do this soon, so we recommend continuing with the exercise at this point. After you are done, execute GAMESS with this input and store the standard output in a file named “h2o.hf.output.” Finally, in the “convert” folder, use ``convert4qmc`` to generate the QMCPACK ``particleset`` and ``wavefunction`` files. It is always useful to rename the files generated by ``convert4qmc`` to something meaningful since by default they are called ``sample.Gaussian-G2.xml`` and ``sample.Gaussian-G2.ptcl.xml``. In a standard computer (without cross-compilation), these tasks can be accomplished by the following commands. :: cd ${TRAINING TOP}/ex1_first-run-hartree-fock/gms jobrun_vesta rungms h2o.hf cd ../convert cp ../gms/h2o.hf.output jobrun_vesta convert4qmc -gamess h2o.hf.output -add3BodyJ mv sample.Gaussian-G2.xml h2o.wfs.xml mv sample.Gaussian-G2.ptcl.xml h2o.ptcl.xml The HF energy of the system is -16.9600590022 Ha. To search for the energy in the output file quickly, you can use :: grep "TOTAL ENERGY =" h2o.hf.output As the job runs on VESTA, it is a good time to review Section :ref`lab-adv-mol-convert4qmc`, “Appendix B: convert4qmc," which contains a description on the use of the converter. Optimize the wavefunction ~~~~~~~~~~~~~~~~~~~~~~~~~ When execution of the previous steps is completed, there should be two new files called ``h2o.wfs.xml`` and ``h2o.ptcl.xml``. Now we will use VMC to optimize the Jastrow parameters in the wavefunction. From the top directory, go to “``ex1_first-run-hartree-fock/opt``.” Copy the xml files generated in the previous step to the current directory. This directory should already contain a basic QMCPACK input file for an optimization calculation (``optm.xml``) Open ``optm.xml`` with your favorite text editor and modify the name of the files that contain the ``wavefunction`` and ``particleset`` XML blocks. These files are included with the commands: .. code-block:: xml (the particle set must be defined before the wavefunction). The name of the particle set and wavefunction files should now be ``h2o.ptcl.xml`` and ``h2o.wfs.xml``, respectively. Study both files and submit when you are ready. Notice that the location of the ECPs has been set for you; in your own calculations you have to make sure you obtain the ECPs from the appropriate libraries and convert them to QMCPACK format using ppconvert. While these calculations finish is a good time to study :ref:`lab-adv-mol-opt-appendix`, which contains a review of the main parameters in the optimization XML block. The previous steps can be accomplished by the following commands: :: cd ${TRAINING TOP}/ex1_first-run-hartree-fock/opt cp ../convert/h2o.wfs.xml ./ cp ../convert/h2o.ptcl.xml ./ # edit optm.xml to include the correct ptcl.xml and wfs.xml jobrun_vesta qmcpack optm.xml Use the analysis tool ``qmca`` to analyze the results of the calculation. Obtain the VMC energy and variance for each step in the optimization and plot it using your favorite program. Remember that ``qmca`` has built-in functions to plot the analyzed data. :: qmca -q e *scalar.dat -p The resulting energy as a function of the optimization step should look qualitatively similar to :numref:`fig17`. The energy should decrease quickly as a function of the number of optimization steps. After 6–8 steps, the energy should be converged to :math:`\sim`\ 2–3 mHa. To improve convergence, we would need to increase the number of samples used during optimization (You can check this for yourself later.). With optimized wavefunctions, we are in a position to perform VMC and DMC calculations. The modified wavefunction files after each step are written in a file named ``ID.sNNN.opt.xml``, where ID is the identifier of the calculation defined in the input file (this is defined in the project XML block with parameter “id”) and NNN is a series number that increases with every executable xml block in the input file. .. _fig17: .. figure:: /figs/lab_advanced_molecules_opt_conv.png :width: 500 :align: center VMC energy as a function of optimization step. Time-step study ~~~~~~~~~~~~~~~ Now we will study the dependence of the DMC energy with time step. From the top directory, go to “``ex1_first-run-hartree-fock/dmc_timestep``.” This folder contains a basic XML input file (``dmc_ts.xml``) that performs a short VMC calculation and three DMC calculations with varying time steps (0.1, 0.05, 0.01). Link the ``particleset`` and the last ``optimization`` file from the previous folder (the file called ``jopt-h2o.sNNN.opt.xml`` with the largest value of NNN). Rename the optimized ``wavefunction`` file to any suitable name if you wish (for example, ``h2o.opt.xml``) and change the name of the ``particleset`` and ``wavefunction`` files in the input file. An optimized wavefunction can be found in the reference files (same location) in case it is needed. The main steps needed to perform this exercise are: :: cd \$\{TRAINING TOP\}/ex1_first-run-hartree-fock/dmc_timestep cp ../opt/h2o.ptcl.xml ./ cp ../opt/jopt-h2o.s007.opt.xml h2o.opt.wfs.xml # edit dmc_ts.xml to include the correct ptcl.xml and wfs.xml jobrun_vesta qmcpack dmc_ts.xml While these runs complete, go to :ref:`lab-adv-mol-vmcdmc-appendix` and review the basic VMC and DMC input blocks. Notice that in the current DMC blocks the time step is decreased as the number of blocks is increased. Why is this? When the simulations are finished, use ``qmca`` to analyze the output files and plot the DMC energy as a function of time step. Results should be qualitatively similar to those presented in :numref:`fig18`; in this case we present more time steps with well converged results to better illustrate the time step dependence. In realistic calculations, the time step must be chosen small enough so that the resulting error is below the desired accuracy. Alternatively, various calculations can be performed and the results extrapolated to the zero time-step limit. .. _fig18: .. figure:: /figs/lab_advanced_molecules_dmc_timestep.png :width: 500 :align: center DMC energy as a function of time step. Walker population study ~~~~~~~~~~~~~~~~~~~~~~~ Now we will study the dependence of the DMC energy with the number of walkers in the simulation. Remember that, in principle, the DMC distribution is reached in the limit of an infinite number of walkers. In practice, the energy and most properties converge to high accuracy with :math:`\sim`\ 100–1,000 walkers. The actual number of walkers needed in a calculation will depend on the accuracy of the VMC wavefunction and on the complexity and size of the system. Also notice that using too many walkers is not a problem; at worse it will be inefficient since it will cost more computer time than necessary. In fact, this is the strategy used when running QMC calculations on large parallel computers since we can reduce the statistical error bars efficiently by running with large walker populations distributed across all processors. From the top directory, go to “``ex1_first-run-hartree-fock/dmc_walkers``.” Copy the optimized ``wavefunction`` and ``particleset`` files used in the previous calculations to the current folder; these are the files generated during step 2 of this exercise. An optimized ``wavefunction`` file can be found in the reference files (same location) in case it is needed. The directory contains a sample DMC input file and submission script. Create three directories named NWx, with x values of 120,240,480, and copy the input file to each one. Go to “NW120,” and, in the input file, change the name of the ``wavefunction`` and ``particleset`` files (in this case they will be located one directory above, so use “``../dmc_timestep/h2.opt.xml``,” for example); change the PP directory so that it points to one directory above; change “targetWalkers” to 120; and change the number of steps to 100, the time step to 0.01, and the number of blocks to 400. Notice that “targetWalkers” is one way to set the desired (average) number of walkers in a DMC calculation. One can alternatively set “samples” in the `` 10 25 1 20 0.5 10240 0.95 0.0 0.05 yes 10.0 yes quartic -6 1.0e-5 0.15 1 Options: - bigchange: (default 50.0) Largest parameter change allowed - usebuffer: (default no) Save useful information during VMC - MinMethod: (default quartic) Method to calculate magnitude of parameter change quartic: fit quartic polynomial to four values of the cost function obtained using reweighting along chosen direction linemin: direct line minimization using reweighting rescale: no 1-D minimization. Uses Umrigars suggestions. - stepsize: (default 0.25) Step size in either quartic or linemin methods. - alloweddifference: (default 1e-4) Allowed increase in energy - exp0: (default -16.0) Initial value for stabilizer (shift to diagonal of H). Actual value of stabilizer is 10 exp0 - nstabilizers: (default 3) Number of stabilizers to try - stabilizaterScale: (default 2.0) Increase in value of exp0 between iterations. - max its: (default 1) Number of inner loops with same sample - minwalkers: (default 0.3) Minimum value allowed for the ratio of effective samples to actual number of walkers in a reweighting step. The optimization will stop if the effective number of walkers in any reweighting calculation drops below this value. Last set of acceptable parameters are kept. - maxWeight: (defaul 1e6) Maximum weight allowed in reweighting. Any weight above this value will be reset to this value. Recommendations: - Set samples to equal to (#threads)*blocks. - Set steps to 1. Use substeps to control correlation between samples. - For cases where equilibration is slow, increase both substeps and warmupsteps. - For hard cases (e.g., simultaneous optimization of long MSD and 3-Body J), set exp0 to 0 and do a single inner iteration (max its=1) per sample of configurations. .. _lab-adv-mol-vmcdmc-appendix: Appendix D: VMC and DMC XML block --------------------------------- .. code-block:: xml :caption: Sample XML blocks for VMC and DMC calculations. :name: Listing 61 yes 100 100 1 20 30 0.3 yes 1920 100 100 0.1 General Options: - **move**: (default “walker”) Type of electron move. Options: “pbyp” and “walker.” - **checkpoint**: (default “-1”) (If > 0) Generate checkpoint files with given frequency. The calculations can be restarted/continued with the produced checkpoint files. - **useDrift**: (default “yes”) Defines the sampling mode. useDrift = “yes” will use Langevin acceleration to sample the VMC and DMC distributions, while useDrift=“no” will use random displacements in a box. - **warmupSteps**: (default 0) Number of steps warmup steps at the beginning of the calculation. No output is produced for these steps. - **blocks**: (default 1) Number of blocks (outer loop). - **steps**: (default 1) Number of steps per blocks (middle loop). - **sub steps**: (default 1) Number of substeps per step (inner loop). During substeps, the local energy is not evaluated in VMC calculations, which leads to faster execution. In VMC calculations, set substeps to the average autocorrelation time of the desired quantity. - **time step**: (default 0.1) Electronic time step in bohr. - **samples**: (default 0) Number of walker configurations saved during the current calculation. - **walkers**: (default #threads) In VMC, sets the number of walkers per node. The total number of walkers in the calculation will be equal to walkers*(# nodes). Options unique to DMC: - **targetWalkers**: (default #walkers from previous calculation, e.g., VMC). Sets the target number of walkers. The actual population of walkers will fluctuate around this value. The walkers will be distributed across all the nodes in the calculation. On a given node, the walkers are split across all the threads in the system. - **nonlocalmoves**: (default “no”) Set to “yes” to turns on the use of Casula’s T-moves. .. _lab-adv-mol-wf-appendix: Appendix E: Wavefunction XML block ---------------------------------- .. code-block:: xml :caption: Basic framework for a single-determinant determinantset XML block. :name: Listing 62 ... ... ... ... In this section we describe the basic format of a QMCPACK wavefunction XML block. Everything listed in this section is generated by the appropriate converter tools. Little to no modification is needed when performing standard QMC calculations. As a result, this section is meant mainly for illustration purposes. Only experts should attempt to modify these files (with very few exceptions like the cutoff of CI coefficients and the cutoff in Jastrow functions) since changes can lead to unexpected results. A QMCPACK wavefunction XML block is a combination of a determinantset, which contains the antisymmetric part of the wavefunction and one or more Jastrow blocks. The syntax of the antisymmetric block depends on whether the wavefunction is a single determinant or a multideterminant expansion. :ref:`Listing 62 ` shows the general structure of the single-determinant case. The determinantset block is composed of a basisset block, which defines the atomic orbital basis set, and a slaterdeterminant block, which defines the SPOs and occupation numbers of the Slater determinant. :ref:`Listing 63 ` shows a (piece of a) sample of a slaterdeterminant block. The slaterdeterminant block consists of two determinant blocks, one for each electron spin. The parameter “size” in the determinant block refers to the number of SPOs present while the “size” parameter in the coefficient block refers to the number of atomic basis functions per SPO. .. code-block:: xml :caption: Sample XML block for the single Slater determinant case. :name: Listing 63 9.55471000000000e-01 -3.87000000000000e-04 6.51140000000000e-02 2.17700000000000e-03 1.43900000000000e-03 4.00000000000000e-06 -4.58000000000000e-04 -5.20000000000000e-05 -2.40000000000000e-05 6.00000000000000e-06 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -5.26000000000000e-04 2.63000000000000e-04 2.63000000000000e-04 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -1.27000000000000e-04 6.30000000000000e-05 6.30000000000000e-05 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 -3.20000000000000e-05 1.60000000000000e-05 1.60000000000000e-05 -0.00000000000000e+00 -0.00000000000000e+00 -0.00000000000000e+00 7.00000000000000e-06 :ref:`Listing 64 ` shows the general structure of the multideterminant case. Similar to the single-determinant case, the determinantset must contain a basisset block. This definition is identical to the one described previously. In this case, the definition of the SPOs must be done independently from the definition of the determinant configurations; the latter is done in the sposet block, while the former is done on the multideterminant block. Notice that two sposet sets must be defined, one for each electron spin. The name of each sposet set is required in the definition of the multideterminant block. The determinants are defined in terms of occupation numbers based on these orbitals. .. code-block:: xml :caption: Basic framework for a multideterminant determinantset XML block. :name: Listing 64 ... ... ... ... ... ... There are various options in the multideterminant block that users should be aware of. - cutoff: (IMPORTANT! ) Only configurations with (absolute value) “qchem coeff” larger than this value will be read by QMCPACK. - optimize: Turn on/off the optimization of linear CI coefficients. - coeff: (in csf ) Current coefficient of given configuration. Gets updated during wavefunction optimization. - qchem coeff: (in csf ) Original coefficient of given configuration from GAMESS calculation. This is used when applying a cutoff to the configurations read from the file. The cutoff is applied on this parameter and not on the optimized coefficient. - nca and nab: Number of core orbitals for up/down electrons. A core orbital is an orbital that is doubly occupied in all determinant configurations, not to be confused with core electrons. These are not explicitly listed on the definition of configurations. - nea and neb: Number of up/down active electrons (those being explicitly correlated). - nstates: Number of correlated orbitals. - size (in detlist ): Contains the number of configurations in the list. The remaining part of the determinantset block is the definition of Jastrow factor. Any number of these can be defined. :ref:`Listing 65 ` shows a sample Jastrow block including 1-, 2- and 3-body terms. This is the standard block produced by ``convert4qmc`` with the option -add3BodyJ (this particular example is for a water molecule). Optimization of individual radial functions can be turned on/off using the “optimize” parameter. It can be added to any coefficients block, even though it is currently not present in the J1 and J2 blocks. .. code-block:: xml :caption: Sample Jastrow XML block. :name: Listing 65 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 This training assumes basic familiarity with the UNIX operating system. In particular, we use simple scripts written in “csh.” In addition, we assume you have obtained all the necessary files and executables and that the training files are located at ${TRAINING TOP}. The goal of this training is not only to familiarize you with the execution and options in QMCPACK but also to introduce you to important concepts in QMC calculations and many-body electronic structure calculations.