Development Guide

The section gives guidance on how to extend the functionality of QMCPACK. Future examples will likely include topics such as the addition of a Jastrow function or a new QMC method.

QMCPACK coding standards

This chapter presents what we collectively have agreed are best practices for the code. This includes formatting style, naming conventions, documentation conventions, and certain prescriptions for C++ language use. At the moment only the formatting can be enforced in an objective fashion.

New development should follow these guidelines, and contributors are expected to adhere to them as they represent an integral part of our effort to continue QMCPACK as a world-class, sustainable QMC code. Although some of the source code has a ways to go to live up to these ideas, new code, even in old files, should follow the new conventions not the local conventions of the file whenever possible. Work on the code with continuous improvement in mind rather than a commitment to stasis.

The current workflow conventions for the project are described in the wiki on the GitHub repository. It will save you and all the maintainers considerable time if you read these and ask questions up front.

A PR should follow these standards before inclusion in the mainline. You can be sure of properly following the formatting conventions if you use clang-format. The mechanics of clang-format setup and use can be found at https://github.com/QMCPACK/qmcpack/wiki/Source-formatting.

The clang-format file found at qmcpack/src/.clang-format should be run over all code touched in a PR before a pull request is prepared. We also encourage developers to run clang-tidy with the qmcpack/src/.clang-tidy configuration over all new code.

As much as possible, try to break up refactoring, reformatting, feature, and bugs into separate, small PRs. Aim for something that would take a reviewer no more than an hour. In this way we can maintain a good collective development velocity.

Files

Each file should start with the header.

//////////////////////////////////////////////////////////////////////////////////////
// This file is distributed under the University of Illinois/NCSA Open Source License.
// See LICENSE file in top directory for details.
//
// Copyright (c) 2020 QMCPACK developers
//
// File developed by: Name, email, affiliation
//
// File created by: Name, email, affiliation
//////////////////////////////////////////////////////////////////////////////////////

If you make significant changes to an existing file, add yourself to the list of “developed by” authors.

File organization

Header files should be placed in the same directory as their implementations. Unit tests should be written for all new functionality. These tests should be placed in a tests subdirectory below the implementations.

File names

Each class should be defined in a separate file with the same name as the class name. Use separate .cpp implementation files whenever possible to aid in incremental compilation.

The filenames of tests are composed by the filename of the object tested and the prefix test_. The filenames of fake and mock objects used in tests are composed by the prefixes fake_ and mock_, respectively, and the filename of the object that is imitated.

Header files

All header files should be self-contained (i.e., not dependent on following any other header when it is included). Nor should they include files that are not necessary for their use (i.e., headers needed only by the implementation). Implementation files should not include files only for the benefit of files they include.

There are many header files that currently violate this. Each header must use #define guards to prevent multiple inclusion. The symbol name of the #define guards should be NAMESPACE(s)_CLASSNAME_H.

Includes

Header files should be included with the full path based on the src directory. For example, the file qmcpack/src/QMCWaveFunctions/SPOSet.h should be included as

#include "QMCWaveFunctions/SPOSet.h"

Even if the included file is located in the same directory as the including file, this rule should be obeyed. Header files from external projects and standard libraries should be includes using the <iostream> convention, while headers that are part of the QMCPACK project should be included using the "our_header.h" convention.

For readability, we suggest using the following standard order of includes:

  1. related header

  2. std C library headers

  3. std C++ library headers

  4. Other libraries’ headers

  5. QMCPACK headers

In each section the included files should be sorted in alphabetical order.

Naming

The balance between description and ease of implementation should be balanced such that the code remains self-documenting within a single terminal window. If an extremely short variable name is used, its scope must be shorter than \(\sim 40\) lines. An exception is made for template parameters, which must be in all CAPS.

Namespace names

Namespace names should be one word, lowercase.

Type and class names

Type and class names should start with a capital letter and have a capital letter for each new word. Underscores (_) are not allowed.

Variable names

Variable names should not begin with a capital letter, which is reserved for type and class names. Underscores (_) should be used to separate words.

Class data members

Class private/protected data members names should follow the convention of variable names with a trailing underscore (_).

(Member) function names

Function names should start with a lowercase character and have a capital letter for each new word.

Lambda expressions

Named lambda expressions follow the naming convention for functions:

auto myWhatever = [](int i) { return i + 4; };

Macro names

Macro names should be all uppercase and can include underscores (_). The underscore is not allowed as first or last character.

Test case and test names

Test code files should be named as follows:

class DiracMatrix;
//leads to
test_dirac_matrix.cpp
//which contains test cases named
TEST_CASE("DiracMatrix_update_row","[wavefunction][fermion]")

where the test case covers the updateRow and [wavefunction][fermion] indicates the test belongs to the fermion wavefunction functionality.

Comments

Comment style

Use the // Comment syntax for actual comments.

Use

/** base class for Single-particle orbital sets
 *
 * SPOSet stands for S(ingle)P(article)O(rbital)Set which contains
 * a number of single-particle orbitals with capabilities of
 * evaluating \f$ \psi_j({\bf r}_i)\f$
 */

or

///index in the builder list of sposets
int builder_index;

Documentation

Doxygen will be used for source documentation. Doxygen commands should be used when appropriate guidance on this has been decided.

File docs

Do not put the file name after the \file Doxygen command. Doxygen will fill it in for the file the tag appears in.

/** \file
 *  File level documentation
 */

Class docs

Every class should have a short description (in the header of the file) of what it is and what is does. Comments for public class member functions follow the same rules as general function comments. Comments for private members are allowed but are not mandatory.

Function docs

For function parameters whose type is non-const reference or pointer to non-const memory, it should be specified if they are input (In:), output (Out:) or input-output parameters (InOut:).

Example:

/** Updates foo and computes bar using in_1 .. in_5.
 * \param[in] in_3
 * \param[in] in_5
 * \param[in,out] foo
 * \param[out] bar
 */

//This is probably not what our clang-format would do
void computeFooBar(Type in_1, const Type& in_2, Type& in_3,
                   const Type* in_4, Type* in_5, Type& foo,
                   Type& bar);

Variable documentation

Name should be self-descriptive. If you need documentation consider renaming first.

Golden rule of comments

If you modify a piece of code, also adapt the comments that belong to it if necessary.

Formatting and “style”

Use the provided clang-format style in src/.clang-format to format .h, .hpp, .cu, and .cpp files. Many of the following rules will be applied to the code by clang-format, which should allow you to ignore most of them if you always run it on your modified code.

You should use clang-format support and the .clangformat file with your editor, use a Git precommit hook to run clang-format or run clang-format manually on every file you modify. However, if you see numerous formatting updates outside of the code you have modified, first commit the formatting changes in a separate PR.

Indentation

Indentation consists of two spaces. Do not use tabs in the code.

Line length

The length of each line of your code should be at most 120 characters.

Horizontal spacing

No trailing white spaces should be added to any line. Use no space before a comma (,) and a semicolon (;), and add a space after them if they are not at the end of a line.

Preprocessor directives

The preprocessor directives are not indented. The hash is the first character of the line.

Binary operators

The assignment operators should always have spaces around them.

Unary operators

Do not put any space between an unary operator and its argument.

Types

The using syntax is preferred to typedef for type aliases. If the actual type is not excessively long or complex, simply use it; renaming simple types makes code less understandable.

Pointers and references

Pointer or reference operators should go with the type. But understand the compiler reads them from right to left.

Type* var;
Type& var;

//Understand this is incompatible with multiple declarations
Type* var1, var2; // var1 is a pointer to Type but var2 is a Type.

Templates

The angle brackets of templates should not have any external or internal padding.

template<class C>
class Class1;

Class1<Class2<type1>> object;

Vertical spacing

Use empty lines when it helps to improve the readability of the code, but do not use too many. Do not use empty lines after a brace that opens a scope or before a brace that closes a scope. Each file should contain an empty line at the end of the file. Some editors add an empty line automatically, some do not.

Variable declarations and definitions

  • Avoid declaring multiple variables in the same declaration, especially if they are not fundamental types:

    int x, y;                        // Not recommended
    Matrix a("my-matrix"), b(size);  // Not allowed
    
    // Preferred
    int x;
    int y;
    Matrix a("my-matrix");
    Matrix b(10);
    
  • Use the following order for keywords and modifiers in variable declarations:

    // General type
    [static] [const/constexpr] Type variable_name;
    
    // Pointer
    [static] [const] Type* [const] variable_name;
    
    // Integer
    // the int is not optional not all platforms support long, etc.
    [static] [const/constexpr] [signedness] [size] int variable_name;
    
    // Examples:
    static const Matrix a(10);
    
    
    const double* const d(3.14);
    constexpr unsigned long l(42);
    

Function declarations and definitions

The return type should be on the same line as the function name. Parameters should also be on the same line unless they do not fit on it, in which case one parameter per line aligned with the first parameter should be used.

Also include the parameter names in the declaration of a function, that is,

// calculates a*b+c
double function(double a, double b, double c);

// avoid
double function(double, double, double);

//dont do this
double function(BigTemplatedSomething<double> a, BigTemplatedSomething<double> b,
                BigTemplatedSomething<double> c);

//do this
double function(BigTemplatedSomething<double> a,
                BigTemplatedSomething<double> b,
                BigTemplatedSomething<double> c);

Conditionals

Examples:

if (condition)
  statement;
else
  statement;

if (condition)
{
  statement;
}
else if (condition2)
{
  statement;
}
else
{
  statement;
}

Switch statement

Switch statements should always have a default case.

Example:

switch (var)
{
  case 0:
    statement1;
    statement2;
    break;

  case 1:
    statement1;
    statement2;
    break;

  default:
    statement1;
    statement2;
}

Loops

Examples:

for (statement; condition; statement)
  statement;

for (statement; condition; statement)
{
  statement1;
  statement2;
}

while (condition)
  statement;

while (condition)
{
  statement1;
  statement2;
}

do
{
  statement;
}
while (condition);

Class format

public, protected, and private keywords are not indented.

Example:

class Foo : public Bar
{
public:
  Foo();
  explicit Foo(int var);

  void function();
  void emptyFunction() {}

  void setVar(const int var)
  {
    var_ = var;
  }
  int getVar() const
  {
    return var_;
  }

private:
  bool privateFunction();

  int var_;
  int var2_;
};

Constructor initializer lists

Examples:

// When everything fits on one line:
Foo::Foo(int var) : var_(var)
{
  statement;
}

// If the signature and the initializer list do not
// fit on one line, the colon is indented by 4 spaces:
Foo::Foo(int var)
    : var_(var), var2_(var + 1)
{
  statement;
}

// If the initializer list occupies more lines,
// they are aligned in the following way:
Foo::Foo(int var)
    : some_var_(var),
      some_other_var_(var + 1)
{
  statement;
}

// No statements:
Foo::Foo(int var)
    : some_var_(var) {}

Namespace formatting

The content of namespaces is not indented. A comment should indicate when a namespace is closed. (clang-format will add these if absent). If nested namespaces are used, a comment with the full namespace is required after opening a set of namespaces or an inner namespace.

Examples:

namespace ns
{
void foo();
}  // ns
namespace ns1
{
namespace ns2
{
// ns1::ns2::
void foo();

namespace ns3
{
// ns1::ns2::ns3::
void bar();
}  // ns3
}  // ns2

namespace ns4
{
namespace ns5
{
// ns1::ns4::ns5::
void foo();
}  // ns5
}  // ns4
}  // ns1

QMCPACK C++ guidance

The guidance here, like any advice on how to program, should not be treated as a set of rules but rather the hard-won wisdom of many hours of suffering development. In the past, many rules were ignored, and the absolute worst results of that will affect whatever code you need to work with. Your PR should go much smoother if you do not ignore them.

Encapsulation

A class is not just a naming scheme for a set of variables and functions. It should provide a logical set of methods, could contain the state of a logical object, and might allow access to object data through a well-defined interface related variables, while preserving maximally ability to change internal implementation of the class.

Do not use struct as a way to avoid controlling access to the class. Only in rare cases where a class is a fully public data structure struct is this appropriate. Ignore (or fix one) the many examples of this in QMCPACK.

Do not use inheritance primarily as a means to break encapsulation. If your class could aggregate or compose another class, do that, and access it solely through its public interface. This will reduce dependencies.

Casting

In C++ source, avoid C style casts; they are difficult to search for and imprecise in function. An exception is made for controlling implicit conversion of simple numerical types.

Explicit C++ style casts make it clear what the safety of the cast is and what sort of conversion is expected to be possible.

int c = 2;
int d = 3;
double a;
a = (double)c / d;  // Ok

const class1 c1;
class2* c2;
c2 = (class2*)&c1; // NO
SPOSetAdvanced* spo_advanced = new SPOSetAdvanced();

SPOSet* spo = (SPOSet*)spo_advanced; // NO
SPOSet* spo = static_cast<SPOSet*>(spo_advanced); // OK if upcast, dangerous if downcast

Pre-increment and pre-decrement

Use the pre-increment (pre-decrement) operator when a variable is incremented (decremented) and the value of the expression is not used. In particular, use the pre-increment (pre-decrement) operator for loop counters where i is not used:

for (int i = 0; i < N; ++i)
{
  doSomething();
}

for (int i = 0; i < N; i++)
{
  doSomething(i);
}

The post-increment and post-decrement operators create an unnecessary copy that the compiler cannot optimize away in the case of iterators or other classes with overloaded increment and decrement operators.

Alternative operator representations

Alternative representations of operators and other tokens such as and, or, and not instead of &&, ||, and ! are not allowed. For the reason of consistency, the far more common primary tokens should always be used.

Use of const

  • Add the const qualifier to all function parameters that are not modified in the function body.

  • For parameters passed by value, add only the keyword in the function definition.

  • Member functions should be specified const whenever possible.

    // Declaration
    int computeFoo(int bar, const Matrix& m)
    
    // Definition
    int computeFoo(const int bar, const Matrix& m)
    {
      int foo = 42;
    
      // Compute foo without changing bar or m.
      // ...
    
      return foo;
    }
    
    class MyClass
    {
      int count_
      ...
      int getCount() const { return count_;}
    }
    

Scalar estimator implementation

Introduction: Life of a specialized OperatorBase

Almost all observables in QMCPACK are implemented as specialized derived classes of the OperatorBase base class. Each observable is instantiated in HamiltonianFactory and added to QMCHamiltonian for tracking. QMCHamiltonian tracks two types of observables: main and auxiliary. Main observables contribute to the local energy. These observables are elements of the simulated Hamiltonian such as kinetic or potential energy. Auxiliary observables are expectation values of matrix elements that do not contribute to the local energy. These Hamiltonians do not affect the dynamics of the simulation. In the code, the main observables are labeled by “physical” flag; the auxiliary observables have “physical” set to false.

Initialization

When an <estimator type="est_type" name="est_name" other_stuff="value"/> tag is present in the <hamiltonian/> section, it is first read by HamiltonianFactory. In general, the type of the estimator will determine which specialization of OperatorBase should be instantiated, and a derived class with myName="est_name" will be constructed. Then, the put() method of this specific class will be called to read any other parameters in the <estimator/> XML node. Sometimes these parameters will instead be read by HamiltonianFactory because it can access more objects than OperatorBase.

Cloning

When OpenMP threads are spawned, the estimator will be cloned by the CloneManager, which is a parent class of many QMC drivers.

// In CloneManager.cpp
#pragma omp parallel for shared(w,psi,ham)
for(int ip=1; ip<NumThreads; ++ip)
{
  wClones[ip]=new MCWalkerConfiguration(w);
  psiClones[ip]=psi.makeClone(*wClones[ip]);
  hClones[ip]=ham.makeClone(*wClones[ip],*psiClones[ip]);
}

In the preceding snippet, ham is the reference to the estimator on the master thread. If the implemented estimator does not allocate memory for any array, then the default constructor should suffice for the makeClone method.

// In SpeciesKineticEnergy.cpp
OperatorBase* SpeciesKineticEnergy::makeClone(ParticleSet& qp, TrialWaveFunction& psi)
{
  return new SpeciesKineticEnergy(*this);
}

If memory is allocated during estimator construction (usually when parsing the XML node in the put method), then the makeClone method should perform the same initialization on each thread.

OperatorBase* LatticeDeviationEstimator::makeClone(ParticleSet& qp, TrialWaveFunction& psi)
{
  LatticeDeviationEstimator* myclone = new LatticeDeviationEstimator(qp,spset,tgroup,sgroup);
  myclone->put(input_xml);
  return myclone;
}

Evaluate

After the observable class (derived class of OperatorBase) is constructed and prepared (by the put() method), it is ready to be used in a QMCDriver. A QMCDriver will call H.auxHevaluate(W,thisWalker) after every accepted move, where H is the QMCHamiltonian that holds all main and auxiliary Hamiltonian elements, W is a MCWalkerConfiguration, and thisWalker is a pointer to the current walker being worked on. As shown in the following, this function goes through each auxiliary Hamiltonian element and evaluates it using the current walker configuration. Under the hood, observables are calculated and dumped to the main particle set’s property list for later collection.

// In QMCHamiltonian.cpp
// This is more efficient.
// Only calculate auxH elements if moves are accepted.
void QMCHamiltonian::auxHevaluate(ParticleSet& P, Walker_t& ThisWalker)
{
#if !defined(REMOVE_TRACEMANAGER)
  collect_walker_traces(ThisWalker,P.current_step);
#endif
  for(int i=0; i<auxH.size(); ++i)
  {
    auxH[i]->setHistories(ThisWalker);
    RealType sink = auxH[i]->evaluate(P);
    auxH[i]->setObservables(Observables);
#if !defined(REMOVE_TRACEMANAGER)
    auxH[i]->collect_scalar_traces();
#endif
    auxH[i]->setParticlePropertyList(P.PropertyList,myIndex);
  }
}

For estimators that contribute to the local energy (main observables), the return value of evaluate() is used in accumulating the local energy. For auxiliary estimators, the return value is not used (sink local variable above); only the value of Value is recorded property lists by the setObservables() method as shown in the preceding code snippet. By default, the setObservables() method will transfer auxH[i]->Value to P.PropertyList[auxH[i]->myIndex]. The same property list is also kept by the particle set being moved by QMCDriver. This list is updated by auxH[i]->setParticlePropertyList(P.PropertyList,myIndex), where myIndex is the starting index of space allocated to this specific auxiliary Hamiltonian in the property list kept by the target particle set P.

Collection

The actual statistics are collected within the QMCDriver, which owns an EstimatorManager object. This object (or a clone in the case of multithreading) will be registered with each mover it owns. For each mover (such as VMCUpdatePbyP derived from QMCUpdateBase), an accumulate() call is made, which by default, makes an accumulate(walkerset) call to the EstimatorManager it owns. Since each walker has a property set, EstimatorManager uses that local copy to calculate statistics. The EstimatorManager performs block averaging and file I/O.

Single scalar estimator implementation guide

Almost all of the defaults can be used for a single scalar observable. With any luck, only the put() and evaluate() methods need to be implemented. As an example, this section presents a step-by-step guide for implementing a verb|SpeciesKineticEnergy| estimator that calculates the kinetic energy of a specific species instead of the entire particle set. For example, a possible input to this estimator can be:

<estimator type="specieskinetic" name="ukinetic" group="u"/>

<estimator type="specieskinetic" name="dkinetic" group="d"/>.

This should create two extra columns in the scalar.dat file that contains the kinetic energy of the up and down electrons in two separate columns. If the estimator is properly implemented, then the sum of these two columns should be equal to the default Kinetic column.

Barebone

The first step is to create a barebone class structure for this simple scalar estimator. The goal is to be able to instantiate this scalar estimator with an XML node and have it print out a column of zeros in the scalar.dat file.

To achieve this, first create a header file “SpeciesKineticEnergy.h” in the QMCHamiltonians folder, with only the required functions declared as follows:

// In SpeciesKineticEnergy.h
#ifndef QMCPLUSPLUS_SPECIESKINETICENERGY_H
#define QMCPLUSPLUS_SPECIESKINETICENERGY_H

#include <Particle/WalkerSetRef.h>
#include <QMCHamiltonians/OperatorBase.h>

namespace qmcplusplus
{

class SpeciesKineticEnergy: public OperatorBase
{
public:

  SpeciesKineticEnergy(ParticleSet& P):tpset(P){ };

  bool put(xmlNodePtr cur);         // read input xml node, required
  bool get(std::ostream& os) const; // class description, required

  Return_t evaluate(ParticleSet& P);
  inline Return_t evaluate(ParticleSet& P, std::vector<NonLocalData>& Txy)
  { // delegate responsity inline for speed
    return evaluate(P);
  }

  // pure virtual functions require overrider
  void resetTargetParticleSet(ParticleSet& P) { }                         // required
  OperatorBase* makeClone(ParticleSet& qp, TrialWaveFunction& psi); // required

private:
  ParticleSet& tpset;

}; // SpeciesKineticEnergy

} // namespace qmcplusplus
#endif

Notice that a local reference tpset to the target particle set P is saved in the constructor. The target particle set carries much information useful for calculating observables. Next, make “SpeciesKineticEnergy.cpp,” and make vacuous definitions.

// In SpeciesKineticEnergy.cpp
#include <QMCHamiltonians/SpeciesKineticEnergy.h>
namespace qmcplusplus
{

bool SpeciesKineticEnergy::put(xmlNodePtr cur)
{
  return true;
}

bool SpeciesKineticEnergy::get(std::ostream& os) const
{
  return true;
}

SpeciesKineticEnergy::Return_t SpeciesKineticEnergy::evaluate(ParticleSet& P)
{
  Value = 0.0;
  return Value;
}

OperatorBase* SpeciesKineticEnergy::makeClone(ParticleSet& qp, TrialWaveFunction& psi)
{
  // no local array allocated, default constructor should be enough
  return new SpeciesKineticEnergy(*this);
}

} // namespace qmcplusplus

Now, head over to HamiltonianFactory and instantiate this observable if an XML node is found requesting it. Look for “gofr” in HamiltonianFactory.cpp, for example, and follow the if block.

// In HamiltonianFactory.cpp
#include <QMCHamiltonians/SpeciesKineticEnergy.h>
else if(potType =="specieskinetic")
{
  SpeciesKineticEnergy* apot = new SpeciesKineticEnergy(*target_particle_set);
  apot->put(cur);
  targetH->addOperator(apot,potName,false);
}

The last argument of addOperator() (i.e., the false flag) is crucial. This tells QMCPACK that the observable we implemented is not a physical Hamiltonian; thus, it will not contribute to the local energy. Changes to the local energy will alter the dynamics of the simulation. Finally, add “SpeciesKineticEnergy.cpp” to HAMSRCS in “CMakeLists.txt” located in the QMCHamiltonians folder. Now, recompile QMCPACK and run it on an input that requests <estimator type="specieskinetic" name="ukinetic"/> in the hamiltonian block. A column of zeros should appear in the scalar.dat file under the name “ukinetic.”

Evaluate

The evaluate() method is where we perform the calculation of the desired observable. In a first iteration, we will simply hard-code the name and mass of the particles.

// In SpeciesKineticEnergy.cpp
#include <QMCHamiltonians/BareKineticEnergy.h> // laplaician() defined here
SpeciesKineticEnergy::Return_t SpeciesKineticEnergy::evaluate(ParticleSet& P)
{
  std::string group="u";
  RealType minus_over_2m = -0.5;

  SpeciesSet& tspecies(P.getSpeciesSet());

  Value = 0.0;
  for (int iat=0; iat<P.getTotalNum(); iat++)
  {
    if (tspecies.speciesName[ P.GroupID(iat) ] == group)
    {
      Value += minus_over_2m*laplacian(P.G[iat],P.L[iat]);
    }
  }
  return Value;

  // Kinetic column has:
  // Value = -0.5*( Dot(P.G,P.G) + Sum(P.L) );
}

Voila—you should now be able to compile QMCPACK, rerun, and see that the values in the “ukinetic” column are no longer zero. Now, the only task left to make this basic observable complete is to read in the extra parameters instead of hard-coding them.

Parse extra input

The preferred method to parse extra input parameters in the XML node is to implement the put() function of our specific observable. Suppose we wish to read in a single string that tells us whether to record the kinetic energy of the up electron (group=“u”) or the down electron (group=“d”). This is easily achievable using the OhmmsAttributeSet class,

// In SpeciesKineticEnergy.cpp
#include <OhmmsData/AttributeSet.h>
bool SpeciesKineticEnergy::put(xmlNodePtr cur)
{
  // read in extra parameter "group"
  OhmmsAttributeSet attrib;
  attrib.add(group,"group");
  attrib.put(cur);

  // save mass of specified group of particles
  SpeciesSet& tspecies(tpset.getSpeciesSet());
  int group_id  = tspecies.findSpecies(group);
  int massind   = tspecies.getAttribute("mass");
  minus_over_2m = -1./(2.*tspecies(massind,group_id));

  return true;
}

where we may keep “group” and “minus_over_2m” as local variables to our specific class.

// In SpeciesKineticEnergy.h
private:
  ParticleSet& tpset;
  std::string  group;
  RealType minus_over_2m;

Notice that the previous operations are made possible by the saved reference to the target particle set. Last but not least, compile and run a full example (i.e., a short DMC calculation) with the following XML nodes in your input:

<estimator type="specieskinetic" name="ukinetic" group="u"/>

<estimator type="specieskinetic" name="dkinetic" group="d"/>

Make sure the sum of the “ukinetic” and “dkinetic” columns is exactly the same as the Kinetic columns at every block.

For easy reference, a summary of the complete list of changes follows:

// In HamiltonianFactory.cpp
#include "QMCHamiltonians/SpeciesKineticEnergy.h"
else if(potType =="specieskinetic")
{
      SpeciesKineticEnergy* apot = new SpeciesKineticEnergy(*targetPtcl);
      apot->put(cur);
      targetH->addOperator(apot,potName,false);
}
// In SpeciesKineticEnergy.h
#include <Particle/WalkerSetRef.h>
#include <QMCHamiltonians/OperatorBase.h>

namespace qmcplusplus
{

class SpeciesKineticEnergy: public OperatorBase
{
public:

  SpeciesKineticEnergy(ParticleSet& P):tpset(P){ };

  // xml node is read by HamiltonianFactory, eg. the sum of following should be equivalent to Kinetic
  // <estimator name="ukinetic" type="specieskinetic" target="e" group="u"/>
  // <estimator name="dkinetic" type="specieskinetic" target="e" group="d"/>
  bool put(xmlNodePtr cur);         // read input xml node, required
  bool get(std::ostream& os) const; // class description, required

  Return_t evaluate(ParticleSet& P);
  inline Return_t evaluate(ParticleSet& P, std::vector<NonLocalData>& Txy)
  { // delegate responsity inline for speed
    return evaluate(P);
  }

  // pure virtual functions require overrider
  void resetTargetParticleSet(ParticleSet& P) { }                         // required
  OperatorBase* makeClone(ParticleSet& qp, TrialWaveFunction& psi); // required

private:
  ParticleSet&       tpset; // reference to target particle set
  std::string        group; // name of species to track
  RealType   minus_over_2m; // mass of the species !! assume same mass
  // for multiple species, simply initialize multiple estimators

}; // SpeciesKineticEnergy

} // namespace qmcplusplus
#endif
// In SpeciesKineticEnergy.cpp
#include <QMCHamiltonians/SpeciesKineticEnergy.h>
#include <QMCHamiltonians/BareKineticEnergy.h> // laplaician() defined here
#include <OhmmsData/AttributeSet.h>

namespace qmcplusplus
{

bool SpeciesKineticEnergy::put(xmlNodePtr cur)
{
  // read in extra parameter "group"
  OhmmsAttributeSet attrib;
  attrib.add(group,"group");
  attrib.put(cur);

  // save mass of specified group of particles
  int group_id  = tspecies.findSpecies(group);
  int massind   = tspecies.getAttribute("mass");
  minus_over_2m = -1./(2.*tspecies(massind,group_id));

  return true;
}

bool SpeciesKineticEnergy::get(std::ostream& os) const
{ // class description
  os << "SpeciesKineticEnergy: " << myName << " for species " << group;
  return true;
}

SpeciesKineticEnergy::Return_t SpeciesKineticEnergy::evaluate(ParticleSet& P)
{
  Value = 0.0;
  for (int iat=0; iat<P.getTotalNum(); iat++)
  {
    if (tspecies.speciesName[ P.GroupID(iat) ] == group)
    {
      Value += minus_over_2m*laplacian(P.G[iat],P.L[iat]);
    }
  }
  return Value;
}

OperatorBase* SpeciesKineticEnergy::makeClone(ParticleSet& qp, TrialWaveFunction& psi)
{ //default constructor
  return new SpeciesKineticEnergy(*this);
}

} // namespace qmcplusplus

Multiple scalars

It is fairly straightforward to create more than one column in the scalar.dat file with a single observable class. For example, if we want a single SpeciesKineticEnergy estimator to simultaneously record the kinetic energies of all species in the target particle set, we only have to write two new methods: addObservables() and setObservables(), then tweak the behavior of evaluate(). First, we will have to override the default behavior of addObservables() to make room for more than one column in the scalar.dat file as follows,

// In SpeciesKineticEnergy.cpp
void SpeciesKineticEnergy::addObservables(PropertySetType& plist, BufferType& collectables)
{
  myIndex = plist.size();
  for (int ispec=0; ispec<num_species; ispec++)
  { // make columns named "$myName_u", "$myName_d" etc.
    plist.add(myName + "_" + species_names[ispec]);
  }
}

where “num_species” and “species_name” can be local variables initialized in the constructor. We should also initialize some local arrays to hold temporary data.

// In SpeciesKineticEnergy.h
private:
  int num_species;
  std::vector<std::string> species_names;
  std::vector<RealType> species_kinetic,vec_minus_over_2m;

// In SpeciesKineticEnergy.cpp
SpeciesKineticEnergy::SpeciesKineticEnergy(ParticleSet& P):tpset(P)
{
  SpeciesSet& tspecies(P.getSpeciesSet());
  int massind = tspecies.getAttribute("mass");

  num_species = tspecies.size();
  species_kinetic.resize(num_species);
  vec_minus_over_2m.resize(num_species);
  species_names.resize(num_species);
  for (int ispec=0; ispec<num_species; ispec++)
  {
    species_names[ispec] = tspecies.speciesName[ispec];
    vec_minus_over_2m[ispec] = -1./(2.*tspecies(massind,ispec));
  }
}

Next, we need to override the default behavior of setObservables() to transfer multiple values to the property list kept by the main particle set, which eventually goes into the scalar.dat file.

// In SpeciesKineticEnergy.cpp
void SpeciesKineticEnergy::setObservables(PropertySetType& plist)
{ // slots in plist must be allocated by addObservables() first
  copy(species_kinetic.begin(),species_kinetic.end(),plist.begin()+myIndex);
}

Finally, we need to change the behavior of evaluate() to fill the local vector “species_kinetic” with appropriate observable values.

SpeciesKineticEnergy::Return_t SpeciesKineticEnergy::evaluate(ParticleSet& P)
{
  std::fill(species_kinetic.begin(),species_kinetic.end(),0.0);

  for (int iat=0; iat<P.getTotalNum(); iat++)
  {
    int ispec = P.GroupID(iat);
    species_kinetic[ispec] += vec_minus_over_2m[ispec]*laplacian(P.G[iat],P.L[iat]);
  }

  Value = 0.0; // Value is no longer used
  return Value;
}

That’s it! The SpeciesKineticEnergy estimator no longer needs the “group” input and will automatically output the kinetic energy of every species in the target particle set in multiple columns. You should now be able to run with <estimator type="specieskinetic" name="skinetic"/> and check that the sum of all columns that start with “skinetic” is equal to the default “Kinetic” column.

HDF5 output

If we desire an observable that will output hundreds of scalars per simulation step (e.g., SkEstimator), then it is preferred to output to the stat.h5 file instead of the scalar.dat file for better organization. A large chunk of data to be registered in the stat.h5 file is called a “Collectable” in QMCPACK. In particular, if a OperatorBase object is initialized with UpdateMode.set(COLLECTABLE,1), then the “Collectables” object carried by the main particle set will be processed and written to the stat.h5 file, where “UpdateMode” is a bit set (i.e., a collection of flags) with the following enumeration:

// In OperatorBase.h
///enum for UpdateMode
enum {PRIMARY=0,
  OPTIMIZABLE=1,
  RATIOUPDATE=2,
  PHYSICAL=3,
  COLLECTABLE=4,
  NONLOCAL=5,
  VIRTUALMOVES=6
};

As a simple example, to put the two columns we produced in the previous section into the stat.h5 file, we will first need to declare that our observable uses “Collectables.”

// In constructor add:
hdf5_out = true;
UpdateMode.set(COLLECTABLE,1);

Then make some room in the “Collectables” object carried by the target particle set.

// In addObservables(PropertySetType& plist, BufferType& collectables) add:
if (hdf5_out)
{
  h5_index = collectables.size();
  std::vector<RealType> tmp(num_species);
  collectables.add(tmp.begin(),tmp.end());
}

Next, make some room in the stat.h5 file by overriding the registerCollectables() method.

// In SpeciesKineticEnergy.cpp
void SpeciesKineticEnergy::registerCollectables(std::vector<observable_helper*>& h5desc, hid_t gid) const
{
  if (hdf5_out)
  {
    std::vector<int> ndim(1,num_species);
    observable_helper* h5o=new observable_helper(myName);
    h5o->set_dimensions(ndim,h5_index);
    h5o->open(gid);
    h5desc.push_back(h5o);
  }
}

Finally, edit evaluate() to use the space in the “Collectables” object.

// In SpeciesKineticEnergy.cpp
SpeciesKineticEnergy::Return_t SpeciesKineticEnergy::evaluate(ParticleSet& P)
{
  RealType wgt = tWalker->Weight; // MUST explicitly use DMC weights in Collectables!
  std::fill(species_kinetic.begin(),species_kinetic.end(),0.0);

  for (int iat=0; iat<P.getTotalNum(); iat++)
  {
    int ispec = P.GroupID(iat);
    species_kinetic[ispec] += vec_minus_over_2m[ispec]*laplacian(P.G[iat],P.L[iat]);
    P.Collectables[h5_index + ispec] += vec_minus_over_2m[ispec]*laplacian(P.G[iat],P.L[iat])*wgt;
  }

  Value = 0.0; // Value is no longer used
  return Value;
}

There should now be a new entry in the stat.h5 file containing the same columns of data as the stat.h5 file. After this check, we should clean up the code by

  • making “hdf5_out” and input flag by editing the put() method and

  • disabling output to scalar.dat when the “hdf5_out” flag is on.

Estimator output

Estimator definition

For simplicity, consider a local property \(O(\bf{R})\), where \(\bf{R}\) is the collection of all particle coordinates. An estimator for \(O(\bf{R})\) is a weighted average over walkers:

(240)\[ \begin{aligned} E[O] = \left(\sum\limits_{i=1}^{N^{tot}_{walker}} w_i O(\bf{R}_i) \right) / \left( \sum \limits_{i=1}^{N^{tot}_{walker}} w_i \right). \end{aligned}\]

\(N^{tot}_{walker}\) is the total number of walkers collected in the entire simulation. Notice that \(N^{tot}_{walker}\) is typically far larger than the number of walkers held in memory at any given simulation step. \(w_i\) is the weight of walker \(i\).

In a VMC simulation, the weight of every walker is 1.0. Further, the number of walkers is constant at each step. Therefore, (240) simplifies to

(241)\[ \begin{aligned} E_{VMC}[O] = \frac{1}{N_{step}N_{walker}^{ensemble}} \sum_{s,e} O(\bf{R}_{s,e})\:.\end{aligned}\]

Each walker \(\bf{R}_{s,e}\) is labeled by step index s and ensemble index e.

In a DMC simulation, the weight of each walker is different and may change from step to step. Further, the ensemble size varies from step to step. Therefore, (240) simplifies to

(242)\[ \begin{aligned} E_{DMC}[O] = \frac{1}{N_{step}} \sum_{s} \left\{ \left(\sum_e w_{s,e} O(\bf{R}_{s,e}) \right) / \left( \sum \limits_{e} w_{s,e} \right) \right\}\:.\end{aligned}\]

We will refer to the average in the \(\{\}\) as ensemble average and to the remaining averages as block average. The process of calculating \(O(\mathbf{R})\) is evaluate.

Class relations

A large number of classes are involved in the estimator collection process. They often have misleading class or method names. Check out the document gotchas in the following list:

  1. EstimatorManager is an unused copy of EstimatorManagerBase. EstimatorManagerBase is the class used in the QMC drivers. (PR #371 explains this.)

  2. EstimatorManagerBase::Estimators is completely different from QMCDriver::Estimators, which is subtly different from OperatorBase::Estimators. The first is a list of pointers to ScalarEstimatorBase. The second is the master estimator (one per MPI group). The third is the slave estimator that exists one per OpenMP thread.

  3. QMCHamiltonian is NOT a parent class of OperatorBase. Instead, QMCHamiltonian owns two lists of OperatorBase named H and auxH.

  4. QMCDriver::H is NOT the same as QMCHamiltonian::H. The first is a pointer to a QMCHamiltonian. QMCHamiltonian::H is a list.

  5. EstimatorManager::stopBlock(std::vector) is completely different from EstimatorManager:: stopBlock(RealType), which is the same as stopBlock(RealType, true) but that is subtly different from stopBlock(RealType, false). The first three methods are intended to be called by the master estimator, which exists one per MPI group. The last method is intended to be called by the slave estimator, which exists one per OpenMP thread.

Estimator output stages

Estimators take four conceptual stages to propagate to the output files: evaluate, load ensemble, unload ensemble, and collect. They are easier to understand in reverse order.

Collect stage

File output is performed by the master EstimatorManager owned by QMCDriver. The first 8+ entries in EstimatorManagerBase::AverageCache will be written to scalar.dat. The remaining entries in AverageCache will be written to stat.h5. File writing is triggered by EstimatorManagerBase ::collectBlockAverages inside EstimatorManagerBase::stopBlock.

// In EstimatorManagerBase.cpp::collectBlockAverages
  if(Archive)
  {
    *Archive << std::setw(10) << RecordCount;
    int maxobjs=std::min(BlockAverages.size(),max4ascii);
    for(int j=0; j<maxobjs; j++)
      *Archive << std::setw(FieldWidth) << AverageCache[j];
    for(int j=0; j<PropertyCache.size(); j++)
      *Archive << std::setw(FieldWidth) << PropertyCache[j];
    *Archive << std::endl;
    for(int o=0; o<h5desc.size(); ++o)
      h5desc[o]->write(AverageCache.data(),SquaredAverageCache.data());
    H5Fflush(h_file,H5F_SCOPE_LOCAL);
  }

EstimatorManagerBase::collectBlockAverages is triggered from the master-thread estimator via either stopBlock(std::vector) or stopBlock(RealType, true). Notice that file writing is NOT triggered by the slave-thread estimator method stopBlock(RealType, false).

// In EstimatorManagerBase.cpp
void EstimatorManagerBase::stopBlock(RealType accept, bool collectall)
{
  //take block averages and update properties per block
  PropertyCache[weightInd]=BlockWeight;
  PropertyCache[cpuInd] = MyTimer.elapsed();
  PropertyCache[acceptInd] = accept;
  for(int i=0; i<Estimators.size(); i++)
    Estimators[i]->takeBlockAverage(AverageCache.begin(),SquaredAverageCache.begin());
  if(Collectables)
  {
    Collectables->takeBlockAverage(AverageCache.begin(),SquaredAverageCache.begin());
  }
  if(collectall)
    collectBlockAverages(1);
}
// In ScalarEstimatorBase.h
template<typename IT>
inline void takeBlockAverage(IT first, IT first_sq)
{
  first += FirstIndex;
  first_sq += FirstIndex;
  for(int i=0; i<scalars.size(); i++)
  {
    *first++ = scalars[i].mean();
    *first_sq++ = scalars[i].mean2();
    scalars_saved[i]=scalars[i]; //save current block
    scalars[i].clear();
  }
}

At the collect stage, calarEstimatorBase::scalars must be populated with ensemble-averaged data. Two derived classes of ScalarEstimatorBase are crucial: LocalEnergyEstimator will carry Properties, where as CollectablesEstimator will carry Collectables.

Unload ensemble stage

LocalEnergyEstimator::scalars are populated by ScalarEstimatorBase::accumulate, whereas CollectablesEstimator::scalars are populated by CollectablesEstimator:: accumulate_all. Both accumulate methods are triggered by EstimatorManagerBase::accumulate. One confusing aspect about the unload stage is that EstimatorManagerBase::accumulate has a master and a slave call signature. A slave estimator such as QMCUpdateBase::Estimators should unload a subset of walkers. Thus, the slave estimator should call accumulate(W,it,it_end). However, the master estimator, such as SimpleFixedNodeBranch::myEstimator, should unload data from the entire walker ensemble. This is achieved by calling accumulate(W).

void EstimatorManagerBase::accumulate(MCWalkerConfiguration& W)
{ // intended to be called by master estimator only
  BlockWeight += W.getActiveWalkers();
  RealType norm=1.0/W.getGlobalNumWalkers();
  for(int i=0; i< Estimators.size(); i++)
    Estimators[i]->accumulate(W,W.begin(),W.end(),norm);
  if(Collectables)//collectables are normalized by QMC drivers
    Collectables->accumulate_all(W.Collectables,1.0);
}
void EstimatorManagerBase::accumulate(MCWalkerConfiguration& W
 , MCWalkerConfiguration::iterator it
 , MCWalkerConfiguration::iterator it_end)
{ // intended to be called slaveEstimator only
  BlockWeight += it_end-it;
  RealType norm=1.0/W.getGlobalNumWalkers();
  for(int i=0; i< Estimators.size(); i++)
    Estimators[i]->accumulate(W,it,it_end,norm);
  if(Collectables)
    Collectables->accumulate_all(W.Collectables,1.0);
}
// In LocalEnergyEstimator.h
inline void accumulate(const Walker_t& awalker, RealType wgt)
{ // ensemble average W.Properties
  // expect ePtr to be W.Properties; expect wgt = 1/GlobalNumberOfWalkers
  const RealType* restrict ePtr = awalker.getPropertyBase();
  RealType wwght= wgt* awalker.Weight;
  scalars[0](ePtr[WP::LOCALENERGY],wwght);
  scalars[1](ePtr[WP::LOCALENERGY]*ePtr[WP::LOCALENERGY],wwght);
  scalars[2](ePtr[LOCALPOTENTIAL],wwght);
  for(int target=3, source=FirstHamiltonian; target<scalars.size(); ++target, ++source)
    scalars[target](ePtr[source],wwght);
}
// In CollectablesEstimator.h
inline void accumulate_all(const MCWalkerConfiguration::Buffer_t& data, RealType wgt)
{ // ensemble average W.Collectables
  // expect data to be W.Collectables; expect wgt = 1.0
  for(int i=0; i<data.size(); ++i)
    scalars[i](data[i], wgt);
}

At the unload ensemble stage, the data structures Properties and Collectables must be populated by appropriately normalized values so that the ensemble average can be correctly taken. QMCDriver is responsible for the correct loading of data onto the walker ensemble.

Load ensemble stage

Properties in the MC ensemble of walkers QMCDriver::W is populated by QMCHamiltonian
::saveProperties. The master QMCHamiltonian::LocalEnergy, ::KineticEnergy, and ::Observables must be properly populated at the end of the evaluate stage.
// In QMCHamiltonian.h
  template<class IT>
  inline
  void saveProperty(IT first)
  { // expect first to be W.Properties
    first[LOCALPOTENTIAL]= LocalEnergy-KineticEnergy;
    copy(Observables.begin(),Observables.end(),first+myIndex);
  }

Collectables’s load stage is combined with its evaluate stage.

Evaluate stage

The master QMCHamiltonian::Observables is populated by slave OperatorBase ::setObservables. However, the call signature must be OperatorBase::setObservables (QMCHamiltonian::
Observables). This call signature is enforced by QMCHamiltonian::evaluate and QMCHamiltonian::
auxHevaluate.
// In QMCHamiltonian.cpp
QMCHamiltonian::Return_t
QMCHamiltonian::evaluate(ParticleSet& P)
{
  LocalEnergy = 0.0;
  for(int i=0; i<H.size(); ++i)
  {
    myTimers[i]->start();
    LocalEnergy += H[i]->evaluate(P);
    H[i]->setObservables(Observables);
#if !defined(REMOVE_TRACEMANAGER)
    H[i]->collect_scalar_traces();
#endif
    myTimers[i]->stop();
    H[i]->setParticlePropertyList(P.PropertyList,myIndex);
  }
  KineticEnergy=H[0]->Value;
  P.PropertyList[WP::LOCALENERGY]=LocalEnergy;
  P.PropertyList[LOCALPOTENTIAL]=LocalEnergy-KineticEnergy;
  // auxHevaluate(P);
  return LocalEnergy;
}
// In QMCHamiltonian.cpp
void QMCHamiltonian::auxHevaluate(ParticleSet& P, Walker_t& ThisWalker)
{
#if !defined(REMOVE_TRACEMANAGER)
  collect_walker_traces(ThisWalker,P.current_step);
#endif
  for(int i=0; i<auxH.size(); ++i)
  {
    auxH[i]->setHistories(ThisWalker);
    RealType sink = auxH[i]->evaluate(P);
    auxH[i]->setObservables(Observables);
#if !defined(REMOVE_TRACEMANAGER)
    auxH[i]->collect_scalar_traces();
#endif
    auxH[i]->setParticlePropertyList(P.PropertyList,myIndex);
  }
}

Estimator use cases

VMCSingleOMP pseudo code

bool VMCSingleOMP::run()
{
  masterEstimator->start(nBlocks);
  for (int ip=0; ip<NumThreads; ++ip)
    Movers[ip]->startRun(nBlocks,false);  // slaveEstimator->start(blocks, record)

  do // block
  {
    #pragma omp parallel
    {
      Movers[ip]->startBlock(nSteps);  // slaveEstimator->startBlock(steps)
      RealType cnorm = 1.0/static_cast<RealType>(wPerNode[ip+1]-wPerNode[ip]);
      do // step
      {
        wClones[ip]->resetCollectables();
        Movers[ip]->advanceWalkers(wit, wit_end, recompute);
        wClones[ip]->Collectables *= cnorm;
        Movers[ip]->accumulate(wit, wit_end);
      } // end step
      Movers[ip]->stopBlock(false);  // slaveEstimator->stopBlock(acc, false)
    } // end omp
    masterEstimator->stopBlock(estimatorClones);  // write files
  } // end block
  masterEstimator->stop(estimatorClones);
}

DMCOMP pseudo code

bool DMCOMP::run()
{
  masterEstimator->setCollectionMode(true);

  masterEstimator->start(nBlocks);
  for(int ip=0; ip<NumThreads; ip++)
    Movers[ip]->startRun(nBlocks,false);  // slaveEstimator->start(blocks, record)

  do // block
  {
    masterEstimator->startBlock(nSteps);
    for(int ip=0; ip<NumThreads; ip++)
      Movers[ip]->startBlock(nSteps);  // slaveEstimator->startBlock(steps)

    do // step
    {
      #pragma omp parallel
      {
      wClones[ip]->resetCollectables();
      // advanceWalkers
      } // end omp

      //branchEngine->branch
      { // In WalkerControlMPI.cpp::branch
      wgt_inv=WalkerController->NumContexts/WalkerController->EnsembleProperty.Weight;
      walkers.Collectables *= wgt_inv;
      slaveEstimator->accumulate(walkers);
      }
      masterEstimator->stopBlock(acc)  // write files
    }  // end for step
  }  // end for block

  masterEstimator->stop();
}

Summary

Two ensemble-level data structures, ParticleSet::Properties and ::Collectables, serve as intermediaries between evaluate classes and output classes to scalar.dat and stat.h5. Properties appears in both scalar.dat and stat.h5, whereas Collectables appears only in stat.h5. Properties is overwritten by QMCHamiltonian::Observables at the end of each step. QMCHamiltonian::Observables is filled upon call to QMCHamiltonian::evaluate and ::auxHevaluate. Collectables is zeroed at the beginning of each step and accumulated upon call to ::auxHevaluate.

Data are output to scalar.dat in four stages: evaluate, load, unload, and collect. In the evaluate stage, QMCHamiltonian::Observables is populated by a list of OperatorBase. In the load stage, QMCHamiltonian::Observables is transfered to Properties by QMCDriver. In the unload stage, Properties is copied to LocalEnergyEstimator::scalars. In the collect stage, LocalEnergyEstimator::scalars is block-averaged to EstimatorManagerBase
::AverageCache and dumped to file. For Collectables, the evaluate and load stages are combined in a call to QMCHamiltonian::auxHevaluate. In the unload stage, Collectables is copied to CollectablesEstimator::scalars. In the collect stage, CollectablesEstimator
::scalars is block-averaged to EstimatorManagerBase::AverageCache and dumped to file.

Appendix: dmc.dat

There is an additional data structure, ParticleSet::EnsembleProperty, that is managed by WalkerControlBase::EnsembleProperty and directly dumped to dmc.dat via its own averaging procedure. dmc.dat is written by WalkerControlBase::measureProperties, which is called by WalkerControlBase::branch, which is called by SimpleFixedNodeBranch
::branch, for example.

Slater-backflow wavefunction implementation details

For simplicity, consider \(N\) identical fermions of the same spin (e.g., up electrons) at spatial locations \(\{\mathbf{r}_1,\mathbf{r}_2,\dots,\mathbf{r}_{N}\}\). Then the Slater determinant can be written as

(243)\[ S=\det M\:,\]

where each entry in the determinant is an SPO evaluated at a particle position

(244)\[ \begin{aligned} M_{ij} = \phi_i(\mathbf{r}_j)\:.\end{aligned}\]

When backflow transformation is applied to the determinant, the particle coordinates \(\mathbf{r}_i\) that go into the SPOs are replaced by quasi-particle coordinates \(\mathbf{x}_i\):

(245)\[ \begin{aligned} M_{ij} = \phi_i(\mathbf{x}_j)\:, \end{aligned}\]

where

(246)\[ \begin{aligned} \mathbf{x}_i=\mathbf{r}_i+\sum\limits_{j=1,j\neq i}^N\eta(r_{ij})(\mathbf{r}_i-\mathbf{r}_j)\:. \end{aligned}\]

\(r_{ij}=\vert\mathbf{r}_i-\mathbf{r}_j\vert\). The integers i,j label the particle/quasi-particle. There is a one-to-one correspondence between the particles and the quasi-particles, which is simplest when \(\eta=0\).

Value

The evaluation of the Slater-backflow wavefunction is almost identical to that of a Slater wavefunction. The only difference is that the quasi-particle coordinates are used to evaluate the SPOs. The actual value of the determinant is stored during the inversion of the matrix \(M\) (cgetrf\(\rightarrow\)cgetri). Suppose \(M=LU\), then \(S=\prod\limits_{i=1}^N L_{ii} U_{ii}\).

// In DiracDeterminantWithBackflow::evaluateLog(P,G,L)
Phi->evaluate(BFTrans->QP, FirstIndex, LastIndex, psiM,dpsiM,grad_grad_psiM);
psiMinv = psiM;
LogValue=InvertWithLog(psiMinv.data(),NumPtcls,NumOrbitals
  ,WorkSpace.data(),Pivot.data(),PhaseValue);

QMCPACK represents the complex value of the wavefunction in polar coordinates \(S=e^Ue^{i\theta}\). Specifically, LogValue \(U\) and PhaseValue \(\theta\) are handled separately. In the following, we will consider derivatives of the log value only.

Gradient

To evaluate particle gradient of the log value of the Slater-backflow wavefunction, we can use the \(\log\det\) identity in (247). This identity maps the derivative of \(\log\det M\) with respect to a real variable \(p\) to a trace over \(M^{-1}dM\):

(247)\[ \begin{aligned} \frac{\partial}{\partial p}\log\det M = \text{tr}\left( M^{-1} \frac{\partial M}{\partial p} \right) .\end{aligned}\]

Following Kwon, Ceperley, and Martin [KCM93], the particle gradient

(248)\[ \begin{aligned} G_i^\alpha \equiv \frac{\partial}{\partial r_i^\alpha} \log\det M = \sum\limits_{j=1}^N \sum\limits_{\beta=1}^3 F_{jj}^\beta A_{jj}^{\alpha\beta}\:, \end{aligned}\]

where the quasi-particle gradient matrix

(249)\[ \begin{aligned} A_{ij}^{\alpha\beta} \equiv \frac{\partial x_j^\beta}{\partial r_i^\alpha}\:,\end{aligned}\]

and the intermediate matrix

(250)\[ \begin{aligned} F_{ij}^\alpha\equiv\sum\limits_k M^{-1}_{ik} dM_{kj}^\alpha\:,\end{aligned}\]

with the SPO derivatives (w.r. to quasi-particle coordinates)

(251)\[ \begin{aligned} dM_{ij}^\alpha \equiv \frac{\partial M_{ij}}{\partial x_j^\alpha}\:.\end{aligned}\]

Notice that we have made the name change of \(\phi\rightarrow M\) from the notations of ref. [KCM93]. This name change is intended to help the reader associate M with the QMCPACK variable psiM.

// In DiracDeterminantWithBackflow::evaluateLog(P,G,L)
for(int i=0; i<num; i++) // k in above formula
{
  for(int j=0; j<NumPtcls; j++)
  {
    for(int k=0; k<OHMMS_DIM; k++) // alpha in above formula
    {
      myG(i) += dot(BFTrans->Amat(i,FirstIndex+j),Fmat(j,j));
    }
  }
}

(248) is still relatively simple to understand. The \(A\) matrix maps changes in particle coordinates \(d\mathbf{r}\) to changes in quasi-particle coordinates \(d\mathbf{x}\). Dotting A into F propagates \(d\mathbf{x}\) to \(dM\). Thus \(F\cdot A\) is the term inside the trace operator of (247). Finally, performing the trace completes the evaluation of the derivative.

Laplacian

The particle Laplacian is given in [KCM93] as

(252)\[ \begin{aligned} L_i \equiv \sum\limits_{\beta} \frac{\partial^2}{\partial (r_i^\beta)^2} \log\det M = \sum\limits_{j\alpha} B_{ij}^\alpha F_{jj}^\alpha - \sum\limits_{jk}\sum\limits_{\alpha\beta\gamma} A_{ij}^{\alpha\beta}A_{ik}^{\alpha\gamma}\times\left(F_{kj}^\alpha F_{jk}^\gamma -\delta_{jk}\sum\limits_m M^{-1}_{jm} d2M_{mj}^{\beta\gamma}\right), \end{aligned}\]

where the quasi-particle Laplacian matrix

(253)\[ \begin{aligned} B_{ij}^{\alpha} \equiv \sum\limits_\beta \frac{\partial^2 x_j^\alpha}{\partial (r_i^\beta)^2}\:,\end{aligned}\]

with the second derivatives of the single-particles orbitals being

(254)\[ \begin{aligned} d2M_{ij}^{\alpha\beta} \equiv \frac{\partial^2 M_{ij}}{\partial x_j^\alpha\partial x_j^\beta}\:.\end{aligned}\]

Schematically, \(L_i\) has contributions from three terms of the form \(BF, AAFF, and tr(AA,Md2M)\), respectively. \(A, B, M ,d2M,\) and \(F\) can be calculated and stored before the calculations of \(L_i\). The first \(BF\) term can be directly calculated in a loop over quasi-particle coordinates \(j\alpha\).

// In DiracDeterminantWithBackflow::evaluateLog(P,G,L)
for(int j=0; j<NumPtcls; j++)
  for(int a=0; a<OHMMS_DIM; k++)
    myL(i) += BFTrans->Bmat_full(i,FirstIndex+j)[a]*Fmat(j,j)[a];

Notice that \(B_{ij}^\alpha\) is stored in Bmat_full, NOT Bmat.

The remaining two terms both involve \(AA\). Thus, it is best to define a temporary tensor \(AA\):

(255)\[ \begin{aligned} {}_iAA_{jk}^{\beta\gamma} \equiv \sum\limits_\alpha A_{ij}^{\alpha\beta} A_{ij}^{\alpha\gamma}\:,\end{aligned}\]

which we will overwrite for each particle \(i\). Similarly, define \(FF\):

(256)\[ \begin{aligned} FF_{jk}^{\alpha\gamma} \equiv F_{kj}^\alpha F_{jk}^\gamma\:,\end{aligned}\]

which is simply the outer product of \(F\otimes F\). Then the \(AAFF\) term can be calculated by fully contracting \(AA\) with \(FF\).

// In DiracDeterminantWithBackflow::evaluateLog(P,G,L)
for(int j=0; j<NumPtcls; j++)
  for(int k=0; k<NumPtcls; k++)
    for(int i=0; i<num; i++)
    {
      Tensor<RealType,OHMMS_DIM> AA = dot(transpose(BFTrans->Amat(i,FirstIndex+j)),BFTrans->Amat(i,FirstIndex+k));
      HessType FF = outerProduct(Fmat(k,j),Fmat(j,k));
      myL(i) -= traceAtB(AA,FF);
    }

Finally, define the SPO derivative term:

(257)\[ \begin{aligned} Md2M_j^{\beta\gamma} \equiv \sum\limits_m M^{-1}_{jm} d2M_{mj}^\beta\:,\end{aligned}\]

then the last term is given by the contraction of \(Md2M\) (q_j) with the diagonal of \(AA\).

for(int j=0; j<NumPtcls; j++)
{
  HessType q_j;
  q_j=0.0;
  for(int k=0; k<NumPtcls; k++)
    q_j += psiMinv(j,k)*grad_grad_psiM(j,k);
  for(int i=0; i<num; i++)
  {
    Tensor<RealType,OHMMS_DIM> AA = dot(
      transpose(BFTrans->Amat(i,FirstIndex+j)),
      BFTrans->Amat(i,FirstIndex+j)
    );
    myL(i) += traceAtB(AA,q_j);
  }
}

Wavefunction parameter derivative

To use the robust linear optimization method of [TU07], the trial wavefunction needs to know its contributions to the overlap and hamiltonian matrices. In particular, we need derivatives of these matrices with respect to wavefunction parameters. As a consequence, the wavefunction \(\psi\) needs to be able to evaluate \(\frac{\partial}{\partial p} \ln \psi\) and \(\frac{\partial}{\partial p} \frac{\mathcal{H}\psi}{\psi}\), where \(p\) is a parameter.

When 2-body backflow is considered, a wavefunction parameter \(p\) enters the \(\eta\) function only (equation (246)). \(\mathbf{r}\), \(\phi\), and \(M\) do not explicitly dependent on \(p\). Derivative of the log value is almost identical to particle gradient. Namely, (248) applies upon the substitution \(r_i^\alpha\rightarrow p\).

(258)\[ \begin{aligned} \frac{\partial}{\partial p} \ln\det M = \sum\limits_{j=1}^N \sum\limits_{\beta=1}^3 F_{jj}^\beta \left({}_pC_{j}^{\beta}\right)\:,\end{aligned}\]

where the quasi-particle derivatives are stored in Cmat

(259)\[ \begin{aligned} {}_pC_{i}^{\alpha} \equiv \frac{\partial}{\partial p} x_{i}^{\alpha}\:.\end{aligned}\]

The change in local kinetic energy is a lot more difficult to calculate

(260)\[ \begin{aligned} \frac{\partial T_{\text{local}}}{\partial p} = \frac{\partial}{\partial p} \left\{ \left( \sum\limits_{i=1}^N \frac{1}{2m_i} \nabla^2_i \right) \ln \det M \right\} = \sum\limits_{i=1}^N \frac{1}{2m_i} \frac{\partial}{\partial p} L_i\:, \end{aligned}\]

where \(L_i\) is the particle Laplacian defined in (252) To evaluate (260), we need to calculate parameter derivatives of all three terms defined in the Laplacian evaluation. Namely \((B)(F)\), \((AA)(FF)\), and \(\text{tr}(AA,Md2M)\), where we have put parentheses around previously identified data structures. After \(\frac{\partial}{\partial p}\) hits, each of the three terms will split into two terms by the product rule. Each smaller term will contain a contraction of two data structures. Therefore, we will need to calculate the parameter derivatives of each data structure defined in the Laplacian evaluation:

(261)\[\begin{split} \begin{aligned} {}_pX_{ij}^{\alpha\beta} \equiv \frac{\partial}{\partial p} A_{ij}^{\alpha\beta}\:, \\ {}_pY_{ij}^{\alpha} \equiv \frac{\partial}{\partial p} B_{ij}^{\alpha}\:, \\ {}_pdF_{ij}^{\alpha} \equiv \frac{\partial}{\partial p} F_{ij}^{\alpha}\:, \\ {}_{pi}{AA'}_{jk}^{\beta\gamma} \equiv \frac{\partial}{\partial p} {}_iAA_{jk}^{\beta\gamma}\:, \\ {}_p {FF'}_{jk}^{\alpha\gamma} \equiv \frac{\partial}{\partial p} FF_{jk}^{\alpha\gamma}\:, \\ {}_p {Md2M'}_{j}^{\beta\gamma} \equiv \frac{\partial}{\partial p} Md2M_j^{\beta\gamma}\:.\end{aligned}\end{split}\]

X and Y are stored as Xmat and Ymat_full (NOT Ymat) in the code. dF is dFa. \(AA'\) is not fully stored; intermediate values are stored in Aij_sum and a_j_sum. \(FF'\) is calculated on the fly as \(dF\otimes F+F\otimes dF\). \(Md2M'\) is not stored; intermediate values are stored in q_j_prime.

Particles and distance tables

ParticleSets

The ParticleSet class stores particle positions and attributes (charge, mass, etc).

The R member stores positions. For calculations, the R variable needs to be transferred to the structure-of-arrays (SoA) storage in RSoA. This is done by the update method. In the future the interface may change to use functions to set and retrieve positions so the SoA transformation of the particle data can happen automatically.

A particular distance table is retrieved with getDistTable. Use addTable to add a ParticleSet and return the index of the distance table. If the table already exists the index of the existing table will be returned.

The mass and charge of each particle is stored in Mass and Z. The flag, SameMass, indicates if all the particles have the same mass (true for electrons).

Groups

Particles can belong to different groups. For electrons, the groups are up and down spins. For ions, the groups are the atomic element. The group type for each particle can be accessed through the GroupID member. The number of groups is returned from groups(). The total number particles is accessed with getTotalNum(). The number of particles in a group is groupsize(int igroup).

The particle indices for each group are found with first(int igroup) and last(int igroup). These functions only work correctly if the particles are packed according to group. The flag, IsGrouped, indicates if the particles are grouped or not. The particles will not be grouped if the elements are not grouped together in the input file. This ordering is usually the responsibility of the converters.

Code can be written to only handle the grouped case, but put an assert or failure check if the particles are not grouped. Otherwise the code will give wrong answers and it can be time-consuming to debug.

Distance tables

Distance tables store distances between particles. There are symmetric (AA) tables for distance between like particles (electron-electron or ion-ion) and asymmetric (BA) tables for distance between unlike particles (electron-ion)

The Distances and Displacements members contain the data. The indexing order is target index first, then source. For electron-ion tables, the sources are the ions and the targets are the electrons.

Looping over particles

Some sample code on how to loop over all the particles in an electron-ion distance table:

// d_table is an electron-ion distance table

for (int jat = 0; j < d_table.targets(); jat++) { // Loop over electrons
  for (int iat = 0; i < d_table.sources(); iat++) { // Loop over ions
     d_table.Distances[jat][iat];
  }
}

Interactions sometimes depend on the type of group of the particles. The code can loop over all particles and use GroupID[idx] to choose the interaction. Alternately, the code can loop over the number of groups and then loop from the first to last index for those groups. This method can attain higher performance by effectively hoisting tests for group ID out of the loop.

An example of the first approach is

// P is a ParticleSet

for (int iat = 0; iat < P.getTotalNum(); iat++) {
  int group_idx = P.GroupID[iat];
  // Code that depends on the group index
}

An example of the second approach is

// P is a ParticleSet
assert(P.IsGrouped == true); // ensure particles are grouped

for (int ig = 0; ig < P.groups(); ig++) { // loop over groups
  for (int iat = P.first(ig); iat < P.last(ig); iat++) { // loop over elements in each group
     // Code that depends on group
  }
}

Adding a wavefunction

The total wavefunction is stored in TrialWaveFunction as a product of all the components. Each component derives from WaveFunctionComponent. The code contains an example of a wavefunction component for a Helium atom using a simple form and is described in Helium Wavefunction Example

Mathematical preliminaries

The wavefunction evaluation functions compute the log of the wavefunction, the gradient and the Laplacian of the log of the wavefunction. Expanded, the gradient and Laplacian are

(262)\[\begin{split} \begin{aligned} G &=& \nabla \log(\psi) = \frac{\nabla \psi}{\psi} \\ L &=& {\nabla ^2} \log(\psi) = \frac{{\nabla ^2}\psi}{\psi} - \frac{\nabla \psi}{\psi} \cdot \frac{\nabla \psi}{\psi} \\ &=& \frac{{\nabla ^2} \psi}{\psi} - G \cdot G\end{aligned}\end{split}\]

However, the local energy formula needs \(\frac{{\nabla ^2} \psi}{\psi}\). The conversion from the Laplacian of the log of the wavefunction to the local energy value is performed in QMCHamiltonians/BareKineticEnergy.h (i.e. \(L + G \cdot G\).)

Wavefunction evaluation

The process for creating a new wavefunction component class is to derive from WaveFunctionComponent and implement a number pure virtual functions. To start most of them can be empty.

The following four functions evaluate the wavefunction values and spatial derivatives:

evaluateLog Computes the log of the wavefunction and the gradient and Laplacian (of the log of the wavefunction) for all particles. The input is theParticleSet(P) (of the electrons). The return value is the log of wavefunction, and the gradient is in G and Laplacian in L.

ratio Computes the wavefunction ratio (not the log) for a single particle move (\(\psi_{new}/\psi_{old}\)). The inputs are the ParticleSet(P) and the particle index (iat).

evalGrad Computes the gradient for a given particle. The inputs are the ParticleSet(P) and the particle index (iat).

ratioGrad Computes the wavefunction ratio and the gradient at the new position for a single particle move. The inputs are the ParticleSet(P) and the particle index (iat). The output gradient is in grad_iat;

The updateBuffer function needs to be implemented, but to start it can simply call evaluateLog.

The put function should be implemented to read parameter specifics from the input XML file.

Function use

For debugging it can be helpful to know the under what conditions the various routines are called.

The VMC and DMC loops initialize the walkers by calling evaluateLog. For all-electron moves, each timestep advance calls evaluateLog. If the use_drift parameter is no, then only the wavefunction value is used for sampling. The gradient and Laplacian are used for computing the local energy.

For particle-by-particle moves, each timestep advance

  1. calls evalGrad

  2. computes a trial move

  3. calls ratioGrad for the wavefunction ratio and the gradient at the trial position. (If the use_drift parameter is no, the ratio function is called instead.)

The following example shows part of an input block for VMC with all-electron moves and drift.

<qmc method="vmc" target="e" move="alle">
  <parameter name="use_drift">yes</parameter>
</qmc>

Particle distances

The ParticleSet parameter in these functions refers to the electrons. The distance tables that store the inter-particle distances are stored as an array.

To get the electron-ion distances, add the ion ParticleSet using addTable and save the returned index. Use that index to get the ion-electron distance table.

const int ei_id = elecs.addTable(ions, DT_SOA); // in the constructor only
const auto& ei_table = elecs.getDistTable(ei_id); // when consuming a distance table

Getting the electron-electron distances is very similar, just add the electron ParticleSet using addTable.

Only the lower triangle for the electron-electron table should be used. It is the only part of the distance table valid throughout the run. During particle-by-particle move, there are extra restrictions. When a move of electron iel is proposed, only the lower triangle parts [0,iel)[0,iel) [iel, Nelec)[iel, Nelec) and the row [iel][0:Nelec) are valid. In fact, the current implementation of distance based two and three body Jastrow factors in QMCPACK only needs the row [iel][0:Nelec).

In ratioGrad, the new distances are stored in the Temp_r and Temp_dr members of the distance tables.

Setup

A builder processes XML input, creates the wavefunction, and adds it to targetPsi. Builders derive from WaveFunctionComponentBuilder.

The new builder hooks into the XML processing in WaveFunctionFactory.cpp in the build function.

Caching values

The acceptMove and restore methods are called on accepted and rejected moves for the component to update cached values.

Threading

The makeClone function needs to be implemented to work correctly with OpenMP threading. There will be one copy of the component created for each thread. If there is no extra storage, calling the copy constructor will be sufficient. If there are cached values, the clone call may need to create space.

Parameter optimization

The checkInVariables, checkOutVariables, and resetParameters functions manage the variational parameters. Optimizable variables also need to be registered when the XML is processed.

Variational parameter derivatives are computed in the evaluateDerivatives function. The first output value is an array with parameter derivatives of log of the wavefunction. The second output values is an array with parameter derivatives of the Laplacian divided by the wavefunction (and not the Laplacian of the log of the wavefunction) The kinetic energy term contains a \(-1/2m\) factor. The \(1/m\) factor is applied in TrialWaveFunction.cpp, but the \(-1/2\) is not and must be accounted for in this function.

Helium Wavefunction Example

The code contains an example of a wavefunction component for a Helium atom using STO orbitals and a Pade Jastrow.

to The wavefunction is

(263)\[\psi = \frac{1}{\sqrt{\pi}} \exp(-Z r_1) \exp(-Z r_2) \exp(A / (1 + B r_{12}))\]

where \(Z = 2\) is the nuclear charge, \(A=1/2\) is the electron-electron cusp, and \(B\) is a variational parameter. The electron-ion distances are \(r_1\) and \(r_2\), and \(r_{12}\) is the electron-electron distance. The wavefunction is the same as the one expressed with built-in components in examples/molecules/He/he_simple_opt.xml.

The code is in src/QMCWaveFunctions/ExampleHeComponent.cpp. The builder is in src/QMCWaveFunctions/ExampleHeBuilder.cpp. The input file is in examples/molecules/He/he_example_wf.xml. A unit test compares results from the wavefunction evaluation functions for consistency in src/QMCWaveFunctions/tests/test_example_he.cpp.

The recommended approach for creating a new wavefunction component is to copy the example and the unit test. Implement the evaluation functions and ensure the unit test passes.

Linear Algebra

Like in many methods which solve the Schrödinger equation, linear algebra plays a critical role in QMC algorithms and thus is crucial to the performance of QMCPACK. There are a few components in QMCPACK use BLAS/LAPACK with their own characteristics.

Real space QMC

Single particle orbitals

Spline evaluation as commonly used in solid-state simulations does not use any dense linear algebra library calls. LCAO evaluation as commonly used in molecular calculations relies on BLAS2 GEMV to compute SPOs from a basis set.

Slater determinants

Slater determinants are calculated on \(N \times N\) Slater matrices. \(N\) is the number of electrons for a given spin. In the actually implementation, operations on the inverse matrix of Slater matrix for each walker dominate the computation. To initialize it, DGETRF and DGETRI from LAPACK are called. The inverse matrix can be stored out of place. During random walking, inverse matrices are updated by either Sherman-Morrison rank-1 update or delayed update. Update algorithms heavily relies on BLAS. All the BLAS operations require S,C,D,Z cases.

Sherman-Morrison rank-1 update uses BLAS2 GEMV and GER on \(N \times N\) matrices.

Delayed rank-K update uses

  • BLAS1 SCOPY on \(N\) array.

  • BLAS2 GEMV, GER on \(k \times N\) and \(k \times k\) matrices. \(k\) ranges from 1 to \(K\) when updates are delayed and accumulated.

  • BLAS3 GEMM at the final update.

    • ’T’, ’N’, K, N, N

    • ’N’, ’N’, N, K, K

    • ’N’, ’N’, N, N, K

The optimal K depends on the hardware but it usually ranges from 32 to 256.

QMCPACK solves systems with a few to thousands of electrons. To make all the BLAS/LAPACK operation efficient on accelerators. Batching is needed and optimized for \(N < 2000\). Non-batched functions needs to be optimized for \(N > 500\). Note: 2000 and 500 are only rough estimates.

Wavefunction optimizer

to be added.

Auxiliary field QMC

The AFQMC implementation in QMCPACK relies heavily on linear algebra operations from BLAS/LAPACK. The performance of the code is netirely dependent on the performance of these libraries. See below for a detailed list of the main routines used from BLAS/LAPACK. Since the AFQMC code can work with both single and double precision builds, all 4 versions of these routines (S,C,D,Z) are generally needed, for this reason we omit the data type label.

  • BLAS1: SCAL, COPY, DOT, AXPY

  • BLAS2: GEMV, GER

  • BLAS3: GEMM

  • LAPACK: GETRF, GETRI, GELQF, UNGLQ, ORGLQ, GESVD, HEEVR, HEGVX

While the dimensions of the matrix operations will depend entirely on the details of the calculation, typical matrix dimensions range from the 100s, for small system sizes, to over 20000 for the largest calculations attempted so far. For builds with GPU accelerators, we make use of batched and strided implementations of these routines. Batched implementations of GEMM, GETRF, GETRI, GELQF and UNGLQ are particularly important for the performance of the GPU build on small to medium size problems. Batched implementations of DOT, AXPY and GEMV would also be quite useful, but they are not yet generally available. On GPU builds, the code uses batched implementations of these routines when available by default.

KCM93(1,2,3)

Yongkyung Kwon, D. M. Ceperley, and Richard M. Martin. Effects of three-body and backflow correlations in the two-dimensional electron gas. Phys. Rev. B, 48:12037–12046, October 1993. doi:10.1103/PhysRevB.48.12037.

TU07

Julien Toulouse and C. J. Umrigar. Optimization of quantum monte carlo wave functions by energy minimization. The Journal of Chemical Physics, 126(8):084102, 2007. arXiv:http://dx.doi.org/10.1063/1.2437215, doi:10.1063/1.2437215.