Use DeePMD-kit

In this text, we call the deep neural network that represents the interatomic interactions (the Deep Potential) the model. The typical procedure for using DeePMD-kit is:

  1. Prepare data
  2. Train a model
  3. Freeze the model
  4. Test the model
  5. Run inference with the model

Prepare data

To train a model, one needs to provide the following information: the atom types, the simulation box, the atom coordinates, the atom forces, the system energy and the virial. A snapshot of a system that contains this information is called a frame. We use the following convention of units:

| Property | Unit |
| --- | :---: |
| Time | ps |
| Length | Å |
| Energy | eV |
| Force | eV/Å |
| Pressure | Bar |

The frames of the system are stored in two formats. The first is the raw format: a raw file is a plain text file, where each kind of information is written in its own file and each frame is written on one line. The default files that provide box, coordinate, force, energy and virial are box.raw, coord.raw, force.raw, energy.raw and virial.raw, respectively. We recommend that you use these file names. Here is an example of force.raw:

$ cat force.raw
-0.724  2.039 -0.951  0.841 -0.464  0.363
 6.737  1.554 -5.587 -2.803  0.062  2.222
-1.968 -0.163  1.020 -0.225 -0.789  0.343

This force.raw contains 3 frames with each frame having the forces of 2 atoms, thus it has 3 lines and 6 columns. Each line provides all the 3 force components of 2 atoms in 1 frame. The first three numbers are the 3 force components of the first atom, while the second three numbers are the 3 force components of the second atom. The coordinate file coord.raw is organized similarly. In box.raw, the 9 components of the box vectors should be provided on each line. In virial.raw, the 9 components of the virial tensor should be provided on each line. The number of lines of all raw files should be identical.
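
The layout described above can be illustrated with a minimal Python sketch. It is not part of DeePMD-kit; the helper name parse_per_atom_raw is hypothetical, and the sample text is copied from the force.raw example above.

```python
# Sample text copied from the force.raw example above.
FORCE_RAW = """\
-0.724  2.039 -0.951  0.841 -0.464  0.363
 6.737  1.554 -5.587 -2.803  0.062  2.222
-1.968 -0.163  1.020 -0.225 -0.789  0.343
"""


def parse_per_atom_raw(text):
    """Return frames[frame][atom] = (x, y, z) from raw-file text.

    Hypothetical helper for illustration only.
    """
    frames = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        values = [float(tok) for tok in line.split()]
        if len(values) % 3 != 0:
            raise ValueError("each line must hold 3 numbers per atom")
        # Group the flat row into one (x, y, z) triple per atom.
        frames.append([tuple(values[i:i + 3]) for i in range(0, len(values), 3)])
    return frames


frames = parse_per_atom_raw(FORCE_RAW)
print(len(frames), "frames,", len(frames[0]), "atoms per frame")
print("force on the second atom in the first frame:", frames[0][1])
```

The same grouping applies to coord.raw, since it is organized in the same way.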

We assume that the atom types do not change across frames. They are provided by type.raw, which has one line with the types of the atoms written one by one. The atom types should be integers. For example, the type.raw of a system that has 2 atoms with types 0 and 1 is:

$ cat type.raw
0 1
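
As a rough consistency check, one could read type.raw and derive how many columns each line of coord.raw and force.raw should have. The sketch below is only an illustration; parse_type_raw and expected_columns are hypothetical names, not DeePMD-kit functions.

```python
from collections import Counter


def parse_type_raw(text):
    """Read the single line of type.raw into a list of integer atom types."""
    return [int(tok) for tok in text.split()]


def expected_columns(atom_types):
    """Number of columns per line expected in coord.raw and force.raw."""
    return 3 * len(atom_types)


# Contents of the type.raw example above.
atom_types = parse_type_raw("0 1\n")
atoms_per_type = Counter(atom_types)

print("atom types:", atom_types)
print("atoms per type:", dict(atoms_per_type))
print("columns per frame line:", expected_columns(atom_types))
```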

The second format is a data set of numpy binary data that is used directly by the training program. Users can use the script $deepmd_source_dir/data/raw/raw_to_set.sh to convert the prepared raw files to data sets. For example, if we have raw files that contain 6000 frames,

$ ls 
box.raw  coord.raw  energy.raw  force.raw  type.raw  virial.raw
$ $deepmd_source_dir/data/raw/raw_to_set.sh 2000
nframe is 6000
nline per set is 2000
will make 3 sets
making set 0 ...
making set 1 ...
making set 2 ...
$ ls 
box.raw  coord.raw  energy.raw  force.raw  set.000  set.001  set.002  type.raw  virial.raw

It generates three sets, set.000, set.001 and set.002, each containing 2000 frames. The last set (set.002) is used as the testing set, while the remaining sets (set.000 and set.001) are used as training sets. One does not need to take care of the binary data files in each of the set.* directories. The path containing set.* and type.raw is called a system.
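
The arithmetic of the split in this example can be sketched as follows. This is not the logic of raw_to_set.sh itself; plan_sets is a hypothetical helper, and keeping a shorter final set when the frame count is not a multiple of the set size is an assumption of the sketch.

```python
def plan_sets(nframes, frames_per_set):
    """Return (set_name, first_frame, last_frame_exclusive) for each set.

    Assumption: a trailing partial set is kept as a shorter set.
    """
    sets = []
    for index, start in enumerate(range(0, nframes, frames_per_set)):
        stop = min(start + frames_per_set, nframes)
        sets.append(("set.%03d" % index, start, stop))
    return sets


sets = plan_sets(6000, 2000)
# The last set is used for testing, the others for training.
training_sets = [name for name, _, _ in sets[:-1]]
testing_set = sets[-1][0]

for name, start, stop in sets:
    print(name, "holds frames", start, "to", stop - 1)
print("training:", training_sets, "testing:", testing_set)
```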

Train a model

Write the input script

The method of training is explained in our [DeePMD][2] and [DeepPot-SE][3] papers. With the source code we provide a small training dataset of 400 frames taken from an NVT ab-initio water MD trajectory, with 300 frames for training and 100 for testing. An example training parameter file is provided. One can try the training with

$ cd $deepmd_source_dir/examples/water/train/
$ dp train water_se_a.json

where water_se_a.json is the json format parameter file that controls the training. It is also possible to use a yaml format file with the same keys as the json file (see the water_se_a.yaml example). You can use the script json2yaml.py in the data/json/ directory to convert your json files to yaml. The water_se_a.json file contains four parts: model, learning_rate, loss and training.
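
Before starting a long training run, it can be handy to confirm that a parameter file has all four parts. The following sketch uses only the Python standard library; missing_sections is a hypothetical helper, and the stub JSON text is made up for the illustration.

```python
import json

# The four top-level parts named in the text above.
EXPECTED_SECTIONS = ("model", "learning_rate", "loss", "training")


def missing_sections(params):
    """Return the expected top-level parts that are absent from params."""
    return [name for name in EXPECTED_SECTIONS if name not in params]


# A deliberately incomplete stub; a real file would be read with json.load.
stub_text = '{"model": {}, "learning_rate": {}, "loss": {}}'
params = json.loads(stub_text)
missing = missing_sections(params)
print("missing parts:", missing)
```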

The model section specifies how the deep potential model is built. An example of the smooth-edition is provided as follows

    "model": {
	"type_map":	["O", "H"],
	"descriptor" :{
	    "type":		"se_a",
	    "rcut_smth":	5.80,
	    "rcut":		6.00,
	    "sel":		[46, 92],
	    "neuron":		[25, 50, 100],
	    "axis_neuron":	16,
	    "resnet_dt":	false,
	    "seed":		1,
	    "_comment":		" that's all"
	},
	"fitting_net" : {
	    "neuron":		[240, 240, 240],
	    "resnet_dt":	true,	    
	    "seed":		1,
	    "_comment":		" that's all"
	},
	"_comment":	" that's all"
    }

The type_map is optional. It provides names for the corresponding atom types; these are typically element names, but are not restricted to them.

The construction of the descriptor is given by the option descriptor. The type of the descriptor is set to "se_a", which means smooth-edition with angular information. The rcut is the cut-off radius for neighbor searching, and the rcut_smth gives where the smoothing starts. sel gives the maximum possible number of neighbors within the cut-off radius. It is a list whose length is the same as the number of atom types in the system, and sel[i] denotes the maximum possible number of neighbors of type i. The neuron specifies the size of the embedding net. From left to right, the members denote the sizes of the hidden layers from the input end to the output end. The axis_neuron specifies the size of a submatrix of the embedding matrix, the axis matrix as explained in the [DeepPot-SE paper][3]. If the outer layer is twice the size of the inner layer, then the inner layer is copied and concatenated, and a ResNet architecture is built between them. If the option resnet_dt is set to true, then a timestep is used in the ResNet. seed gives the random seed that is used to generate random numbers when initializing the model parameters.

The construction of the fitting net is given by fitting_net. The key neuron specifies the size of the fitting net. If two neighboring layers are of the same size, then a ResNet architecture is built between them. If the option resnet_dt is set to true, then a timestep is used in the ResNet. seed gives the random seed that is used to generate random numbers when initializing the model parameters.
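
To see which layers of the example networks receive a ResNet connection under the two rules stated above, one can apply the rules to the neuron lists directly. This is only a reading of the text, not the DeePMD-kit implementation; both function names are hypothetical, and only pairs of layers inside the neuron list are considered (the input layer is ignored, which is an assumption of the sketch).

```python
def embedding_shortcuts(neuron):
    """Pairs (inner, outer) of consecutive layers where outer == 2 * inner."""
    return [(a, b) for a, b in zip(neuron, neuron[1:]) if b == 2 * a]


def fitting_shortcuts(neuron):
    """Pairs of neighboring layers that have the same size."""
    return [(a, b) for a, b in zip(neuron, neuron[1:]) if a == b]


# Layer sizes taken from the example model section.
embedding_pairs = embedding_shortcuts([25, 50, 100])
fitting_pairs = fitting_shortcuts([240, 240, 240])

print("embedding net ResNet pairs:", embedding_pairs)
print("fitting net ResNet pairs:", fitting_pairs)
```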

An example of the learning_rate is given as follows

    "learning_rate" :{
	"type":		"exp",
	"start_lr":	0.005,
	"decay_steps":	5000,
	"decay_rate":	0.95,
	"_comment":	"that's all"
    }

The options start_lr, decay_rate and decay_steps specify how the learning rate changes. For example, the t-th batch will be trained with learning rate:

lr(t) = start_lr * decay_rate ^ ( t / decay_steps )
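
The formula can be evaluated directly, for instance with the values from the example learning_rate section. The sketch below implements the expression exactly as written above; whether the training program rounds the exponent t / decay_steps down to an integer is not stated here, so no rounding is applied.

```python
def learning_rate(t, start_lr=0.005, decay_rate=0.95, decay_steps=5000):
    """lr(t) = start_lr * decay_rate ^ (t / decay_steps), as written above."""
    return start_lr * decay_rate ** (t / decay_steps)


# Defaults are the values of the example learning_rate section.
for batch in (0, 5000, 100000, 1000000):
    print(batch, learning_rate(batch))
```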

An example of the loss is

    "loss" : {
	"start_pref_e":	0.02,
	"limit_pref_e":	1,
	"start_pref_f":	1000,
	"limit_pref_f":	1,
	"start_pref_v":	0,
	"limit_pref_v":	0,
	"_comment":	" that's all"
    }

The options start_pref_e, limit_pref_e, start_pref_f, limit_pref_f, start_pref_v and limit_pref_v determine how the prefactors of the energy error, force error and virial error change in the loss function (see the appendix of the [DeePMD paper][2] for details). Taking the prefactor of the force error as an example, the prefactor at batch t is

w_f(t) = start_pref_f * ( lr(t) / start_lr ) + limit_pref_f * ( 1 - lr(t) / start_lr )

Since we do not have virial data, the virial prefactors start_pref_v and limit_pref_v are set to 0.
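
The interpolation between the start and limit prefactors can be sketched in a few lines. The function name prefactor is hypothetical; the expression follows the formula for w_f(t) above, and the numbers come from the example loss section.

```python
def prefactor(lr, start_lr, start_pref, limit_pref):
    """Interpolate between start_pref (lr == start_lr) and limit_pref (lr -> 0)."""
    ratio = lr / start_lr
    return start_pref * ratio + limit_pref * (1.0 - ratio)


START_LR = 0.005

# Force prefactor at the start of training and in the limit of vanishing lr.
w_f_start = prefactor(START_LR, START_LR, 1000, 1)
w_f_limit = prefactor(0.0, START_LR, 1000, 1)
# Energy prefactor at the start of training.
w_e_start = prefactor(START_LR, START_LR, 0.02, 1)
# Virial prefactor stays zero because both of its options are zero.
w_v = prefactor(START_LR / 2, START_LR, 0, 0)

print(w_f_start, w_f_limit, w_e_start, w_v)
```

With these settings the force error dominates the loss early in training, and the energy and force errors are weighted equally as the learning rate decays.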

An example of training is

    "training" : {
	"systems":	["../data1/", "../data2/"],
	"set_prefix":	"set",    
	"stop_batch":	1000000,
	"_comment": " batch_size can be supplied with, e.g. 1, or auto (string) or [10, 20]",
	"batch_size":	1,

	"seed":		1,

	"_comment": " display and restart",
	"_comment": " frequencies counted in batch",
	"disp_file":	"lcurve.out",
	"disp_freq":	100,
	"_comment": " numb_test can be supplied with, e.g. 1, or XX% (string) or [10, 20]",
	"numb_test":	10,
	"save_freq":	1000,
	"save_ckpt":	"model.ckpt",

	"disp_training":true,
	"time_training":true,
	"profiling":	false,
	"profiling_file":"timeline.json",
	"_comment":	"that's all"
    }

The option systems provides the locations of the systems (the paths to set.* and type.raw). It is a vector, thus DeePMD-kit allows you to provide multiple systems. DeePMD-kit will train the model with the systems in the vector one by one in a cyclic manner. Note that the example water data (in folder examples/data/water) is very limited in amount, is provided only for testing purposes, and should not be used to train a production model.

The option batch_size specifies the number of frames in each batch. It can be set to "auto" to enable an automatic batch size, or it can be given as a list that sets the batch size individually for each system. The option stop_batch specifies the total number of batches that will be used in the training.

The option numb_test specifies the number of tests that will be used for each system. If it is an integer, each system will be tested with the same number of tests. It can be set to a percentage "XX%" to use XX% of the frames of each system for its testing, or it can be given as a list that sets the number of tests individually for each system (the order should correspond to the ordering of the systems key in the json file).
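
The three accepted forms of numb_test can be pictured with a small resolver that turns each form into one test count per system. This is an illustration of the description above, not DeePMD-kit code; resolve_numb_test is a hypothetical name, and both the integer-only percentage and the rounding down are assumptions of the sketch.

```python
def resolve_numb_test(spec, frames_per_system):
    """Turn an int, an "XX%" string or a list into per-system test counts."""
    n_systems = len(frames_per_system)
    if isinstance(spec, int):
        # The same number of tests for every system.
        return [spec] * n_systems
    if isinstance(spec, str) and spec.endswith("%"):
        percent = int(spec[:-1])  # assumption: whole-number percentage
        # Assumption: fractional frame counts are rounded down.
        return [n * percent // 100 for n in frames_per_system]
    if isinstance(spec, list):
        if len(spec) != n_systems:
            raise ValueError("need one entry per system")
        return list(spec)
    raise TypeError("unsupported numb_test value: %r" % (spec,))


# Two hypothetical systems holding 300 and 400 frames.
frames = [300, 400]
print(resolve_numb_test(10, frames))
print(resolve_numb_test("10%", frames))
print(resolve_numb_test([10, 20], frames))
```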

Training

The training can be invoked by

$ dp train water_se_a.json

During the training, the error of the model is tested every disp_freq batches with numb_test frames from the last set in the systems directory on the fly, and the results are output to disp_file. A typical disp_file looks like

# batch      l2_tst    l2_trn    l2_e_tst  l2_e_trn    l2_f_tst  l2_f_trn         lr
      0    2.67e+01  2.57e+01    2.21e-01  2.22e-01    8.44e-01  8.12e-01    1.0e-03
    100    6.14e+00  5.40e+00    3.01e-01  2.99e-01    1.93e-01  1.70e-01    1.0e-03
    200    5.02e+00  4.49e+00    1.53e-01  1.53e-01    1.58e-01  1.42e-01    1.0e-03
    300    4.36e+00  3.71e+00    7.32e-02  7.27e-02    1.38e-01  1.17e-01    1.0e-03
    400    4.04e+00  3.29e+00    3.16e-02  3.22e-02    1.28e-01  1.04e-01    1.0e-03

The first column displays the number of batches. The second and third columns display the loss function evaluated by numb_test frames randomly chosen from the test set and that evaluated by the current training batch, respectively. The fourth and fifth columns display the RMS energy error (normalized by number of atoms) evaluated by numb_test frames randomly chosen from the test set and that evaluated by the current training batch, respectively. The sixth and seventh columns display the RMS force error (component-wise) evaluated by numb_test frames randomly chosen from the test set and that evaluated by the current training batch, respectively. The last column displays the current learning rate.
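
Because disp_file is a plain text table, it is easy to load for plotting or monitoring. The sketch below parses two rows copied from the sample output above using only the standard library; parse_lcurve is a hypothetical helper and assumes that the first line is the commented header.

```python
# Header and two rows copied from the sample disp_file above.
LCURVE = """\
# batch      l2_tst    l2_trn    l2_e_tst  l2_e_trn    l2_f_tst  l2_f_trn         lr
      0    2.67e+01  2.57e+01    2.21e-01  2.22e-01    8.44e-01  8.12e-01    1.0e-03
    400    4.04e+00  3.29e+00    3.16e-02  3.22e-02    1.28e-01  1.04e-01    1.0e-03
"""


def parse_lcurve(text):
    """Return one dict per row, keyed by the column names in the header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].lstrip("#").split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, (float(v) for v in line.split())))
        row["batch"] = int(row["batch"])  # batch index is an integer
        rows.append(row)
    return rows


rows = parse_lcurve(LCURVE)
last = rows[-1]
print("batch", last["batch"], "test force RMSE", last["l2_f_tst"])
```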

Checkpoints will be written to files with prefix save_ckpt every save_freq batches.

Several command line options can be passed to dp train, which can be checked with

$ dp train --help

An explanation will be printed:

positional arguments:
  INPUT                 the input json database

optional arguments:
  -h, --help            show this help message and exit
  --init-model INIT_MODEL
                        Initialize a model by the provided checkpoint
  --restart RESTART     Restart the training from the provided checkpoint

The keys intra_op_parallelism_threads and inter_op_parallelism_threads are TensorFlow configurations for multithreading, which are explained in the TensorFlow documentation. If both -t and OMP_NUM_THREADS are omitted, the default TensorFlow settings for these keys are used.

--init-model model.ckpt, for example, initializes the model training with an existing model that is stored in the checkpoint model.ckpt. The network architectures should match.

--restart model.ckpt continues the training from the checkpoint model.ckpt.

On some resource-limited machines, one may want to control the number of threads used by DeePMD-kit. This is achieved by three environment variables: OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS and TF_INTER_OP_PARALLELISM_THREADS. OMP_NUM_THREADS controls the multithreading of the operations implemented by DeePMD-kit. TF_INTRA_OP_PARALLELISM_THREADS and TF_INTER_OP_PARALLELISM_THREADS control intra_op_parallelism_threads and inter_op_parallelism_threads, which are TensorFlow configurations for multithreading. An explanation can be found in the TensorFlow documentation.

For example, if you wish to use 3 cores of 2 CPUs on one node, you may set the environment variables and run DeePMD-kit as follows:

export OMP_NUM_THREADS=6
export TF_INTRA_OP_PARALLELISM_THREADS=3
export TF_INTER_OP_PARALLELISM_THREADS=2
dp train input.json
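
When training is launched from a Python script rather than a shell, the same three variables can be placed in the environment of the child process. The sketch below only builds the environment and the command; thread_env is a hypothetical helper, and the mapping from cores and CPUs to the three values is merely a reading of the single example above, not a documented rule.

```python
import os


def thread_env(cores_per_cpu, n_cpus):
    """Build the three variables the way the example above sets them.

    Assumption: OMP threads = cores * CPUs, intra-op = cores, inter-op = CPUs.
    """
    return {
        "OMP_NUM_THREADS": str(cores_per_cpu * n_cpus),
        "TF_INTRA_OP_PARALLELISM_THREADS": str(cores_per_cpu),
        "TF_INTER_OP_PARALLELISM_THREADS": str(n_cpus),
    }


settings = thread_env(3, 2)

# Copy the current environment and override only the thread settings.
env = dict(os.environ)
env.update(settings)

command = ["dp", "train", "input.json"]
# To launch the training: subprocess.run(command, env=env, check=True)
print(settings)
```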

Freeze a model

The trained neural network is extracted from a checkpoint and dumped into a database. This process is called “freezing” a model. The idea and part of our code are from Morgan. To freeze a model, typically one does

$ dp freeze -o graph.pb

in the folder where the model is trained. The output database is called graph.pb.

Test a model

The frozen model can be used in many ways. The most straightforward test can be performed using dp test. A typical usage of dp test is

dp test -m graph.pb -s /path/to/system -n 30

where -m gives the tested model, -s the path to the tested system and -n the number of tested frames. Several other command line options can be passed to dp test, which can be checked with

$ dp test --help

An explanation will be printed:

usage: dp test [-h] [-m MODEL] [-s SYSTEM] [-S SET_PREFIX] [-n NUMB_TEST]
               [-r RAND_SEED] [--shuffle-test] [-d DETAIL_FILE]

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Frozen model file to import
  -s SYSTEM, --system SYSTEM
                        The system dir
  -S SET_PREFIX, --set-prefix SET_PREFIX
                        The set prefix
  -n NUMB_TEST, --numb-test NUMB_TEST
                        The number of data for test
  -r RAND_SEED, --rand-seed RAND_SEED
                        The random seed
  --shuffle-test        Shuffle test data
  -d DETAIL_FILE, --detail-file DETAIL_FILE
                        The file containing details of energy force and virial
                        accuracy

Model inference

One may use the python interface of DeePMD-kit for model inference. An example is given as follows

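import deepmd.DeepPot as DP
import numpy as np
# load the frozen model
dp = DP('graph.pb')
# coordinates of 3 atoms, flattened so that one frame occupies one row
coord = np.array([[1,0,0], [0,0,1.5], [1,0,3]]).reshape([1, -1])
# the 9 components of the box vectors, one frame per row
cell = np.diag(10 * np.ones(3)).reshape([1, -1])
# atom types, one integer per atom
atype = [1,0,1]
e, f, v = dp.eval(coord, cell, atype)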

where e, f and v are predicted energy, force and virial of the system, respectively.

Run MD with LAMMPS

Include deepmd in the pair style

Running an MD simulation with LAMMPS is simpler. In the LAMMPS input file, one needs to specify the pair style as follows

pair_style     deepmd graph.pb
pair_coeff     

where graph.pb is the file name of the frozen model. The pair_coeff should be left blank. Note that LAMMPS counts atom types starting from 1; therefore, 1 is first subtracted from every LAMMPS atom type, and the result is then passed to the DeePMD-kit engine to compute the interactions. Detailed documentation of this pair style is available.
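
The offset between the two numbering schemes is easy to get wrong when preparing a LAMMPS data file, so a tiny sketch may help. lammps_to_deepmd_type is a hypothetical helper, and TYPE_MAP repeats the type_map of the example model section purely for illustration.

```python
# Same ordering as the type_map in the example model section.
TYPE_MAP = ["O", "H"]


def lammps_to_deepmd_type(lammps_type):
    """LAMMPS types start at 1, DeePMD-kit types start at 0."""
    if lammps_type < 1:
        raise ValueError("LAMMPS atom types start from 1")
    return lammps_type - 1


for lammps_type in (1, 2):
    deepmd_type = lammps_to_deepmd_type(lammps_type)
    print("LAMMPS type", lammps_type, "-> DeePMD-kit type", deepmd_type,
          "->", TYPE_MAP[deepmd_type])
```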

Long-range interaction

The reciprocal space part of the long-range interaction can be calculated by LAMMPS command kspace_style. To use it with DeePMD-kit, one writes

pair_style	deepmd graph.pb
pair_coeff
kspace_style	pppm 1.0e-5
kspace_modify	gewald 0.45

Note that DeePMD does nothing to the direct space part of the electrostatic interaction, because this part is assumed to be fitted in the DeePMD model (the direct space cut-off is thus the cut-off of the DeePMD model). The splitting parameter gewald is modified by the kspace_modify command.

Run path-integral MD with i-PI

i-PI works in a client-server model. i-PI provides the server for integrating the replica positions of atoms, while DeePMD-kit provides a client named dp_ipi that computes the interactions (including energy, force and virial). The server and client communicate via the Unix domain socket or the Internet socket. The client can be started by

$ dp_ipi water.json

Note that multiple instances of the client are allowed, so that the interactions of multiple replicas of the path-integral MD can be computed in parallel.

water.json is the parameter file for the client dp_ipi, and an example is provided:

{
    "verbose":		false,
    "use_unix":		true,
    "port":		31415,
    "host":		"localhost",
    "graph_file":	"graph.pb",
    "coord_file":	"conf.xyz",
    "atom_type" : {
	"OW":		0, 
	"HW1":		1,
	"HW2":		1
    }
}

The option use_unix is set to true to activate the Unix domain socket; otherwise, the Internet socket is used.

The option graph_file provides the file name of the frozen model.

dp_ipi gets the atom names from an XYZ file provided by coord_file (ignoring all coordinates in it), and translates the names to atom types using the rules provided by atom_type.
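
The translation from atom names to atom types can be pictured with the following sketch. It is not the dp_ipi implementation; types_from_xyz is a hypothetical helper, the XYZ text is made up, and the usual XYZ layout (atom count, comment line, then one "name x y z" line per atom) is assumed.

```python
# Mapping copied from the atom_type entry of the example water.json.
ATOM_TYPE = {"OW": 0, "HW1": 1, "HW2": 1}

# Made-up XYZ content; only the atom names matter here.
XYZ_TEXT = """\
3
one water molecule, coordinates are ignored
OW   0.0 0.0 0.0
HW1  0.0 0.0 0.0
HW2  0.0 0.0 0.0
"""


def types_from_xyz(text, atom_type):
    """Read atom names from XYZ text and translate them to integer types."""
    lines = text.splitlines()
    natoms = int(lines[0].split()[0])
    # Skip the count line and the comment line, then take the first token.
    names = [line.split()[0] for line in lines[2:2 + natoms]]
    return [atom_type[name] for name in names]


types = types_from_xyz(XYZ_TEXT, ATOM_TYPE)
print(types)
```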

Use deep potential with ASE

Deep potential can be set up as a calculator with ASE to obtain potential energies and forces.

from ase import Atoms
from deepmd.calculator import DP

water = Atoms('H2O',
              positions=[(0.7601, 1.9270, 1),
                         (1.9575, 1, 1),
                         (1., 1., 1.)],
              cell=[100, 100, 100],
              calculator=DP(model="frozen_model.pb"))
print(water.get_potential_energy())
print(water.get_forces())

Optimization is also available:

from ase.optimize import BFGS
dyn = BFGS(water)
dyn.run(fmax=1e-6)
print(water.get_positions())