# Train a Deep Potential model using descriptor "se_e2_a"
The notation `se_e2_a` is short for the Deep Potential Smooth Edition (DeepPot-SE) constructed from all information (both angular and radial) of atomic configurations. The `e2` stands for the embedding with two-atom information. This descriptor was described in detail in the DeepPot-SE paper.

In this example, we will train a DeepPot-SE model for a water system. A complete training input script of this example can be found at

```
$deepmd_source_dir/examples/water/se_e2_a/input.json
```

With the training input script, data (please read the warning) are also provided in the example directory. One may train the model with DeePMD-kit from that directory.
The contents of the example are explained in the following sections.
## The training input script

A working training script using the descriptor `se_e2_a` is provided as `input.json` in the same directory as this README. The `input.json` is divided into several sections: `model`, `learning_rate`, `loss` and `training`. For more information, one can refer to the full documentation on the training input script.
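As a quick orientation, the following short Python sketch (not part of DeePMD-kit) loads `input.json` with the standard `json` module and lists its top-level sections; the exact keys printed depend on the script, so the output in the comment is only what one would typically expect for this example.

```python
import json

# Load the training input script and list its top-level sections.
with open("input.json") as f:
    script = json.load(f)

print(list(script))
# typically ['model', 'learning_rate', 'loss', 'training'],
# possibly with additional "_comment" keys
```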
## Model

The `model` section defines how the model is constructed, for example:

```json
"model": {
    "type_map": ["O", "H"],
    "descriptor": {
        ...
    },
    "fitting_net": {
        ...
    }
}
```
We are looking for a model for water, so we have two types of atoms. The atom types are recorded as integers; in this example, we denote 0 for oxygen and 1 for hydrogen. A mapping from the atom types to their names is provided by `type_map`.

The model has two subsections, `descriptor` and `fitting_net`, which define the descriptor and the fitting net, respectively. The `type_map` is optional; it provides names (not necessarily the element symbols) for the corresponding atom types.
## Descriptor

The construction of the descriptor is given by the section `descriptor`. An example of the descriptor is provided as follows:

```json
"descriptor": {
    "type": "se_e2_a",
    "rcut_smth": 0.50,
    "rcut": 6.00,
    "sel": [46, 92],
    "neuron": [25, 50, 100],
    "type_one_side": true,
    "axis_neuron": 16,
    "resnet_dt": false,
    "seed": 1
}
```
- The `type` of the descriptor is set to `"se_e2_a"`.
- `rcut` is the cut-off radius for neighbor searching, and `rcut_smth` gives where the smoothing starts.
- `sel` gives the maximum possible number of neighbors within the cut-off radius. It is a list, the length of which is the same as the number of atom types in the system, and `sel[i]` denotes the maximum possible number of neighbors of type `i`.
- The `neuron` specifies the size of the embedding net. From left to right, the members denote the sizes of each hidden layer from the input end to the output end, respectively. If the outer layer is twice the size of the inner layer, the output of the inner layer is copied and concatenated, and a ResNet architecture is built between them (see the sketch after this list).
- If the option `type_one_side` is set to `true`, the descriptor considers only the types of the neighbor atoms. Otherwise, the types of both the centric and the neighbor atoms are considered.
- The `axis_neuron` specifies the size of the submatrix of the embedding matrix, the axis matrix as explained in the DeepPot-SE paper.
- If the option `resnet_dt` is set to `true`, a timestep is used in the ResNet.
- `seed` gives the random seed that is used to generate random numbers when initializing the model parameters.
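To make the ResNet rule for the embedding net concrete, here is a small Python sketch (not DeePMD-kit code) that walks the `neuron` list from the example above and reports where a skip connection would be built; it assumes the embedding net takes the single scalar s(r_ij) as input, as in the DeepPot-SE construction.

```python
# Report, for each embedding-net layer, whether the "outer layer is twice
# the inner layer" rule applies and hence a ResNet block is built.
neuron = [25, 50, 100]   # layer sizes from the descriptor example

prev = 1                 # assumed scalar input s(r_ij)
for size in neuron:
    if size == 2 * prev:
        print(f"{prev} -> {size}: ResNet block (inner output copied and concatenated)")
    else:
        print(f"{prev} -> {size}: plain layer")
    prev = size
# Output: 1 -> 25 plain, 25 -> 50 ResNet, 50 -> 100 ResNet
```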
## Fitting

The construction of the fitting net is given by the section `fitting_net`:

```json
"fitting_net": {
    "neuron": [240, 240, 240],
    "resnet_dt": true,
    "seed": 1
},
```
- `neuron` specifies the size of the fitting net. If two neighboring layers are of the same size, a ResNet architecture is built between them.
- If the option `resnet_dt` is set to `true`, a timestep is used in the ResNet.
- `seed` gives the random seed that is used to generate random numbers when initializing the model parameters.
## Learning rate

The `learning_rate` section in `input.json` is given as follows:

```json
"learning_rate": {
    "type": "exp",
    "start_lr": 0.001,
    "stop_lr": 3.51e-8,
    "decay_steps": 5000,
    "_comment": "that's all"
}
```
- `start_lr` gives the learning rate at the beginning of the training.
- `stop_lr` gives the learning rate at the end of the training. It should be small enough to ensure that the network parameters satisfactorily converge.

During the training, the learning rate decays exponentially from `start_lr` to `stop_lr` following the formula

```
lr(t) = start_lr * decay_rate ^ (t / decay_steps)
```

where `t` is the training step.
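The following Python sketch (not DeePMD-kit code) evaluates this schedule for the values in this example. The way `decay_rate` is obtained from `stop_lr` is an assumption made here for illustration: it is chosen so that the rate reaches `stop_lr` at the final training step.

```python
# Evaluate the exponential learning-rate schedule quoted above with the
# values from this example's input.json.
start_lr = 0.001
stop_lr = 3.51e-8
decay_steps = 5000
numb_steps = 1_000_000   # total training steps, from the "training" section

# Assumption: decay_rate chosen so that lr(numb_steps) ~= stop_lr.
decay_rate = (stop_lr / start_lr) ** (decay_steps / numb_steps)   # ~0.95

def lr(t):
    """Learning rate at training step t."""
    return start_lr * decay_rate ** (t / decay_steps)

print(lr(0))            # 0.001 (start_lr)
print(lr(numb_steps))   # ~3.5e-8 (close to stop_lr)
```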
## Loss

The loss function of DeePMD-kit is given by

```
loss = pref_e * loss_e + pref_f * loss_f + pref_v * loss_v
```

where `loss_e`, `loss_f` and `loss_v` denote the losses in energy, force and virial, respectively. `pref_e`, `pref_f` and `pref_v` give the prefactors of the energy, force and virial losses. The prefactors need not be constants; rather, they change linearly with the learning rate. Taking the force prefactor as an example, at training step `t` it is given by

```
pref_f(t) = start_pref_f * ( lr(t) / start_lr ) + limit_pref_f * ( 1 - lr(t) / start_lr )
```

where `lr(t)` denotes the learning rate at step `t`. `start_pref_f` and `limit_pref_f` specify `pref_f` at the start of the training and in the limit of `t -> inf`, respectively.
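As an illustration, the sketch below (plain Python, not DeePMD-kit code) evaluates the force prefactor formula above; it reuses the learning-rate schedule from the previous section, including the same assumption about how `decay_rate` is derived, and takes the prefactor values from the `loss` section shown below.

```python
# Force prefactor as a function of the training step, per the formula above.
start_lr, stop_lr, decay_steps, numb_steps = 0.001, 3.51e-8, 5000, 1_000_000
decay_rate = (stop_lr / start_lr) ** (decay_steps / numb_steps)   # assumption, see above
start_pref_f, limit_pref_f = 1000.0, 1.0                          # from the loss section below

def lr(t):
    return start_lr * decay_rate ** (t / decay_steps)

def pref_f(t):
    # Linear interpolation in lr between the start and limit prefactors.
    return start_pref_f * (lr(t) / start_lr) + limit_pref_f * (1.0 - lr(t) / start_lr)

print(pref_f(0))           # 1000.0 at the beginning of training
print(pref_f(numb_steps))  # ~1.0 once lr has decayed to stop_lr
```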
The `loss` section in the `input.json` is:

```json
"loss": {
    "start_pref_e": 0.02,
    "limit_pref_e": 1,
    "start_pref_f": 1000,
    "limit_pref_f": 1,
    "start_pref_v": 0,
    "limit_pref_v": 0
}
```
The options `start_pref_e`, `limit_pref_e`, `start_pref_f`, `limit_pref_f`, `start_pref_v` and `limit_pref_v` determine the start and limit prefactors of the energy, force and virial, respectively. If one does not want to train with the virial, the virial prefactors `start_pref_v` and `limit_pref_v` can be set to 0.
## Training parameters

Other training parameters are given in the `training` section:

```json
"training": {
    "training_data": {
        "systems": ["../data_water/data_0/", "../data_water/data_1/", "../data_water/data_2/"],
        "batch_size": "auto"
    },
    "validation_data": {
        "systems": ["../data_water/data_3"],
        "batch_size": 1,
        "numb_btch": 3
    },
    "numb_step": 1000000,
    "seed": 1,
    "disp_file": "lcurve.out",
    "disp_freq": 100,
    "save_freq": 1000
}
```
The sections `"training_data"` and `"validation_data"` give the training data set and the validation data set, respectively. Taking the training data set as an example, the keys are explained below:

- `systems` provides the paths of the training data systems. DeePMD-kit allows you to provide multiple systems. This key can be a `list` or a `str`.
  - `list`: `systems` gives the list of training data systems.
  - `str`: `systems` should be a valid path. DeePMD-kit will recursively search all data systems in this path (see the sketch after this list for a rough picture).
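The sketch below is not DeePMD-kit's internal implementation and `find_systems` is a hypothetical helper; it is only meant to convey what recursively searching for data systems amounts to, under the assumption that a data system is a directory containing a `type.raw` file, as in the DeePMD-kit data format.

```python
import os

def find_systems(root):
    """Collect directories under root that look like DeePMD-kit data systems."""
    return sorted(
        dirpath
        for dirpath, dirnames, filenames in os.walk(root)
        if "type.raw" in filenames
    )

print(find_systems("../data_water"))
# e.g. ['../data_water/data_0', '../data_water/data_1', ...]
```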
At each training step, DeePMD-kit randomly picks `batch_size` frame(s) from one of the systems. The probability of using a system is by default proportional to the number of batches in the system. More options are available for automatically determining the probability of using systems. One can set the key `auto_prob` to:

- `"prob_uniform"`: all systems are used with the same probability.
- `"prob_sys_size"`: the probability of using a system is proportional to its size (number of frames).
- `"prob_sys_size; sidx_0:eidx_0:w_0; sidx_1:eidx_1:w_1;..."`: the `list` of systems is divided into blocks. Block `i` contains the systems ranging from `sidx_i` to `eidx_i`. The probability of using a system from block `i` is proportional to `w_i`. Within one block, the probability of using a system is proportional to its size.
An example of using `"auto_prob"` is given below. The probability of using `systems[2]` is 0.4, and the sum of the probabilities of using `systems[0]` and `systems[1]` is 0.6. If the number of frames in `systems[1]` is twice that of `systems[0]`, then the probability of using `systems[1]` is 0.4 and that of `systems[0]` is 0.2.
"training_data": {
"systems": ["../data_water/data_0/", "../data_water/data_1/", "../data_water/data_2/"],
"auto_prob": "prob_sys_size; 0:2:0.6; 2:3:0.4",
"batch_size": "auto"
}
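To make the block scheme concrete, here is a small Python sketch (not DeePMD-kit code; `block_probabilities` is a hypothetical helper) that parses a `"prob_sys_size; ..."` string as described above and reproduces the probabilities of the example, assuming hypothetical frame counts in which `systems[1]` has twice as many frames as `systems[0]`.

```python
def block_probabilities(n_frames, spec):
    """Per-system probabilities for a "prob_sys_size; sidx:eidx:w; ..." spec.

    n_frames : list with the number of frames in each system
    spec     : string such as "prob_sys_size; 0:2:0.6; 2:3:0.4"
    """
    probs = [0.0] * len(n_frames)
    for block in spec.split(";")[1:]:
        sidx, eidx, w = block.split(":")
        sidx, eidx, w = int(sidx), int(eidx), float(w)
        total = sum(n_frames[sidx:eidx])
        for i in range(sidx, eidx):
            # Within one block, probability is proportional to the system size.
            probs[i] = w * n_frames[i] / total
    return probs

# Hypothetical frame counts: systems[1] has twice as many frames as systems[0].
print(block_probabilities([100, 200, 500], "prob_sys_size; 0:2:0.6; 2:3:0.4"))
# -> [0.2, 0.4, 0.4], matching the example discussed above
```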
The probability of using systems can also be specified explicitly with the key `"sys_prob"`, which is a list whose length equals the number of systems. For example:
"training_data": {
"systems": ["../data_water/data_0/", "../data_water/data_1/", "../data_water/data_2/"],
"sys_prob": [0.5, 0.3, 0.2],
"batch_size": "auto:32"
}
The key `batch_size` specifies the number of frames used to train or validate the model in a training step. It can be set to:

- `list`: the length of the list is the same as that of `systems`. The batch size of each system is given by the corresponding element of the list.
- `int`: all systems use the same batch size.
- `"auto"`: the same as `"auto:32"`, see `"auto:N"`.
- `"auto:N"`: automatically determines the batch size so that `batch_size` times the number of atoms in the system is no less than `N` (see the sketch after this list).
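The following short sketch (plain Python, not DeePMD-kit code; `auto_batch_size` is a hypothetical helper) spells out the `"auto:N"` rule above, assuming the smallest batch size satisfying the condition is taken.

```python
import math

def auto_batch_size(natoms, n=32):
    """Smallest batch size such that batch_size * natoms >= n (assumed rounding)."""
    return max(1, math.ceil(n / natoms))

print(auto_batch_size(192))   # water example: 192 atoms per frame -> batch size 1
print(auto_batch_size(10))    # 10 atoms per frame -> batch size 4 (4 * 10 >= 32)
```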
The key `numb_btch` in `validation_data` gives the number of batches used in model validation. Note that the batches may not be from the same system.
Other keys in the `training` section are explained below:

- `numb_step`: the number of training steps.
- `seed`: the random seed for picking frames from the training data set.
- `disp_file`: the file for printing the learning curve.
- `disp_freq`: the frequency of printing the learning curve, in units of training steps.
- `save_freq`: the frequency of saving checkpoints.
## Train a Deep Potential model

When the input script is prepared, one may start the training by

```bash
dp train input.json
```

By default, the verbosity level of DeePMD-kit is `INFO`, and a lot of important information on the code and environment is shown on the screen. Among them, two pieces of information regarding the data systems are worth special notice.
```
DEEPMD INFO    ---Summary of DataSystem: training    ------------------------------------------------
DEEPMD INFO    found 3 system(s):
DEEPMD INFO                              system  natoms  bch_sz  n_bch   prob  pbc
DEEPMD INFO               ../data_water/data_0/     192       1     80  0.250    T
DEEPMD INFO               ../data_water/data_1/     192       1    160  0.500    T
DEEPMD INFO               ../data_water/data_2/     192       1     80  0.250    T
DEEPMD INFO    --------------------------------------------------------------------------------------
DEEPMD INFO    ---Summary of DataSystem: validation  ------------------------------------------------
DEEPMD INFO    found 1 system(s):
DEEPMD INFO                              system  natoms  bch_sz  n_bch   prob  pbc
DEEPMD INFO                ../data_water/data_3     192       1     80  1.000    T
DEEPMD INFO    --------------------------------------------------------------------------------------
```
DeePMD-kit prints detailed information on the training and validation data sets. The data sets are defined by `"training_data"` and `"validation_data"` in the `"training"` section of the input script. The training data set is composed of three data systems, while the validation data set is composed of one data system. The number of atoms, the batch size, the number of batches in the system and the probability of using the system are all shown on the screen. The last column indicates whether the periodic boundary condition is assumed for the system.
During the training, the error of the model is tested every `disp_freq` training steps with the batch used to train the model and with `numb_btch` batches from the validation data. The training error and validation error are printed correspondingly to the file `disp_file`. The batch size can be set in the input script by the key `batch_size` in the corresponding sections for the training and validation data sets. An example of the output:
```
#  step      rmse_val    rmse_trn    rmse_e_val  rmse_e_trn    rmse_f_val  rmse_f_trn         lr
      0      3.33e+01    3.41e+01      1.03e+01    1.03e+01      8.39e-01    8.72e-01    1.0e-03
    100      2.57e+01    2.56e+01      1.87e+00    1.88e+00      8.03e-01    8.02e-01    1.0e-03
    200      2.45e+01    2.56e+01      2.26e-01    2.21e-01      7.73e-01    8.10e-01    1.0e-03
    300      1.62e+01    1.66e+01      5.01e-02    4.46e-02      5.11e-01    5.26e-01    1.0e-03
    400      1.36e+01    1.32e+01      1.07e-02    2.07e-03      4.29e-01    4.19e-01    1.0e-03
    500      1.07e+01    1.05e+01      2.45e-03    4.11e-03      3.38e-01    3.31e-01    1.0e-03
```
The file contains 8 columns; from left to right, they are the training step, the validation loss, the training loss, the root mean square (RMS) validation error of the energy, the RMS training error of the energy, the RMS validation error of the force, the RMS training error of the force, and the learning rate. The RMS error (RMSE) of the energy is normalized by the number of atoms in the system.
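Since `lcurve.out` is a plain whitespace-separated text file with a commented header, it can be inspected with standard tools. The sketch below (not part of DeePMD-kit; `matplotlib` is only used here for illustration) loads the file with NumPy and plots every error column against the training step.

```python
import numpy as np
import matplotlib.pyplot as plt

# Column names are taken from the commented header line of lcurve.out.
data = np.genfromtxt("lcurve.out", names=True)

for name in data.dtype.names[1:-1]:     # skip the "step" and "lr" columns
    plt.plot(data["step"], data[name], label=name)

plt.xlabel("training step")
plt.ylabel("RMSE")
plt.yscale("log")
plt.legend()
plt.show()
```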
## Warning

Be warned that the example water data (in the folder `examples/water/data`) is of very limited amount, is provided only for testing purposes, and should not be used to train a production model.