Calculate Model Deviation

7.2. Calculate Model Deviation

7.2.1. Theory

Model deviation \(\epsilon_y\) is the standard deviation of a property \(\boldsymbol y\) predicted by an ensemble of models \(\mathcal{M}_1, \dots, \mathcal{M}_{n_m}\) that are trained on the same dataset(s) with independently initialized model parameters. DeePMD-kit supports two choices of \(\boldsymbol y\): the atomic force \(\boldsymbol F_i\) and the virial tensor \(\boldsymbol \Xi\). The model deviation is used to estimate the error of a model at a given data frame, denoted by \(\boldsymbol x\), which contains the coordinates and chemical species of all atoms. The model deviations of the atomic force and of the virial tensor are

\[ \epsilon_{\boldsymbol{F},i} (\boldsymbol x)= \sqrt{\langle \lVert \boldsymbol F_i(\boldsymbol x; \boldsymbol \theta_k)-\langle \boldsymbol F_i(\boldsymbol x; \boldsymbol \theta_k) \rangle \rVert^2 \rangle},\]

\[ \epsilon_{\boldsymbol{\Xi},{\alpha \beta}} (\boldsymbol x)= \frac{1}{N} \sqrt{\langle ( {\Xi}_{\alpha \beta}(\boldsymbol x; \boldsymbol \theta_k)-\langle {\Xi}_{\alpha \beta}(\boldsymbol x; \boldsymbol \theta_k) \rangle )^2 \rangle},\]

where \(\boldsymbol \theta_k\) denotes the parameters of the model \(\mathcal M_k\), \(N\) is the number of atoms in the frame, and the ensemble average \(\langle\cdot\rangle\) is estimated by

\[ \langle \boldsymbol y(\boldsymbol x; \boldsymbol \theta_k) \rangle = \frac{1}{n_m} \sum_{k=1}^{n_m} \boldsymbol y(\boldsymbol x; \boldsymbol \theta_k).\]
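The definitions above can be sketched in NumPy as follows. This is a minimal illustration, not DeePMD-kit's own implementation: the function names and array layouts (`forces` of shape `(n_models, n_atoms, 3)`, `virials` of shape `(n_models, 3, 3)`) are assumptions made for the example.

```python
import numpy as np


def force_model_deviation(forces):
    """Per-atom force model deviation.

    forces: array of shape (n_models, n_atoms, 3) (assumed layout),
            one force prediction per model for a single frame.
    Returns an array of shape (n_atoms,).
    """
    forces = np.asarray(forces, dtype=float)
    mean_force = forces.mean(axis=0)  # ensemble average, shape (n_atoms, 3)
    # Squared Euclidean distance of each model's force from the ensemble mean.
    sq_dist = np.sum((forces - mean_force) ** 2, axis=-1)  # (n_models, n_atoms)
    return np.sqrt(sq_dist.mean(axis=0))


def virial_model_deviation(virials, n_atoms):
    """Per-component virial model deviation, normalized by the atom count.

    virials: array of shape (n_models, 3, 3) (assumed layout).
    Returns an array of shape (3, 3).
    """
    virials = np.asarray(virials, dtype=float)
    mean_virial = virials.mean(axis=0)  # ensemble average, shape (3, 3)
    std = np.sqrt(((virials - mean_virial) ** 2).mean(axis=0))
    return std / n_atoms


# Toy data: two models, two atoms. The models disagree only on atom 0.
forces = [
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[-1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
virials = [2.0 * np.eye(3), np.zeros((3, 3))]

eps_f = force_model_deviation(forces)
eps_v = virial_model_deviation(virials, n_atoms=2)
```

Note that the ensemble average uses the prefactor \(1/n_m\), so the sketch takes a plain mean over models rather than an unbiased \(1/(n_m-1)\) estimate.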

A small \(\epsilon_{\boldsymbol{F},i}\) indicates that the models agree on the frame, that is, the configuration is well covered by the training data; a large value indicates that the configuration is not covered and that the training data needs to be expanded. If the magnitude of \(\boldsymbol F_i\) or \(\boldsymbol \Xi\) is quite large, a relative model deviation \(\epsilon_{\boldsymbol{F},i,\text{rel}}\) or \(\epsilon_{\boldsymbol{\Xi},\alpha\beta,\text{rel}}\) can be used instead of the absolute model deviation:

\[ \epsilon_{\boldsymbol{F},i,\text{rel}} (\boldsymbol x) = \frac{\lvert \epsilon_{\boldsymbol{F},i} (\boldsymbol x) \rvert} {\lvert \langle \boldsymbol F_i (\boldsymbol x; \boldsymbol \theta_k) \rangle \rvert + \nu},\]

\[ \epsilon_{\boldsymbol{\Xi},\alpha\beta,\text{rel}} (\boldsymbol x) = \frac{ \epsilon_{\boldsymbol{\Xi},\alpha\beta} (\boldsymbol x) } {\lvert \langle \boldsymbol \Xi (\boldsymbol x; \boldsymbol \theta_k) \rangle \rvert + \nu},\]

where \(\nu\) is a small constant that prevents an atom (or frame) whose \(\boldsymbol{F}_i\) or \(\boldsymbol{\Xi}\) has a small magnitude from being assigned a large relative model deviation.
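A minimal NumPy sketch of the relative force deviation is given below, to show how \(\nu\) behaves for atoms with small mean forces. The function name, the array layout `(n_models, n_atoms, 3)`, and the value \(\nu = 1\) are assumptions for illustration only.

```python
import numpy as np


def relative_force_deviation(forces, nu):
    """Relative per-atom force model deviation.

    forces: array of shape (n_models, n_atoms, 3) (assumed layout).
    nu:     constant added to the mean-force norm in the denominator.
    Returns an array of shape (n_atoms,).
    """
    forces = np.asarray(forces, dtype=float)
    mean_force = forces.mean(axis=0)  # (n_atoms, 3)
    sq_dist = np.sum((forces - mean_force) ** 2, axis=-1)
    eps_abs = np.sqrt(sq_dist.mean(axis=0))  # absolute deviation
    mean_norm = np.linalg.norm(mean_force, axis=-1)  # |<F_i>|
    return np.abs(eps_abs) / (mean_norm + nu)


# Atom 0 carries a large mean force, atom 1 a vanishing one.
forces = [
    [[3.0, 0.0, 0.0], [0.1, 0.0, 0.0]],
    [[5.0, 0.0, 0.0], [-0.1, 0.0, 0.0]],
]
eps_rel = relative_force_deviation(forces, nu=1.0)
```

In this toy case the mean force on atom 1 is exactly zero, so without \(\nu\) the denominator would vanish; with \(\nu = 1\) the relative deviation stays finite.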

Statistics of \(\epsilon_{\boldsymbol{F},i}\) and \(\epsilon_{\boldsymbol{\Xi},{\alpha \beta}}\) can be provided, including the maximum, average, and minimum model deviation over the atom index \(i\) and over the component indices \(\alpha,\beta\), respectively. The maximum model deviation of forces \(\epsilon_{\boldsymbol F,\text{max}}\) in a frame was found to be the best error indicator in a concurrent or active learning algorithm.[1]
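The per-frame statistics can be sketched as below. This is an illustrative example only: the input layout `(n_frames, n_atoms)`, the function name, and the `threshold` used to flag frames are hypothetical and are not values or interfaces prescribed by DeePMD-kit.

```python
import numpy as np


def frame_statistics(eps_f):
    """Max, min, and average force model deviation for each frame.

    eps_f: array of shape (n_frames, n_atoms) holding the per-atom
           force model deviation (assumed layout).
    Returns a dict of arrays, each of shape (n_frames,).
    """
    eps_f = np.asarray(eps_f, dtype=float)
    return {
        "max": eps_f.max(axis=1),
        "min": eps_f.min(axis=1),
        "avg": eps_f.mean(axis=1),
    }


# Two frames with three atoms each (toy numbers).
eps_f = [
    [0.01, 0.02, 0.03],
    [0.10, 0.40, 0.10],
]
stats = frame_statistics(eps_f)

# Hypothetical threshold: flag frames whose maximum deviation exceeds it
# as candidates for extending the training data.
threshold = 0.2
needs_labeling = stats["max"] > threshold
```

Using the maximum rather than the average means that a single poorly described atom is enough to flag a frame.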

7.2.2. Instructions

The dp model-devi subcommand calculates the deviation of predicted forces or virials across a set of models:

dp model-devi -m graph.000.pb graph.001.pb graph.002.pb graph.003.pb -s ./data -o model_devi.out

where -m specifies the model files to be evaluated, -s gives the data to evaluate them on, and -o names the file to which the model deviation results are written. The full help output of this subcommand is:

usage: dp model-devi [-h] [-v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}]
                     [-l LOG_PATH] [-m MODELS [MODELS ...]] [-s SYSTEM]
                     [-S SET_PREFIX] [-o OUTPUT] [-f FREQUENCY] [--real_error]
                     [--atomic] [--relative RELATIVE]
                     [--relative_v RELATIVE_V]

options:
  -h, --help            show this help message and exit
  -v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}, --log-level {DEBUG,3,INFO,2,WARNING,1,ERROR,0}
                        set verbosity level by string or number, 0=ERROR, 1=WARNING, 2=INFO and 3=DEBUG (default: INFO)
  -l LOG_PATH, --log-path LOG_PATH
                        set log file to log messages to disk, if not specified, the logs will only be output to console (default: None)
  -m MODELS [MODELS ...], --models MODELS [MODELS ...]
                        Frozen models file (prefix) to import. TensorFlow backend: suffix is .pb; PyTorch backend: suffix is .pth. (default: ['graph.000', 'graph.001', 'graph.002', 'graph.003'])
  -s SYSTEM, --system SYSTEM
                        The system directory. Recursively detect systems in this directory. (default: .)
  -S SET_PREFIX, --set-prefix SET_PREFIX
                        [DEPRECATED] Deprecated argument. (default: None)
  -o OUTPUT, --output OUTPUT
                        The output file for results of model deviation (default: model_devi.out)
  -f FREQUENCY, --frequency FREQUENCY
                        The trajectory frequency of the system (default: 1)
  --real_error          Calculate the RMS real error of the model. The real data should be given in the systems. (default: False)
  --atomic              Print the force model deviation of each atom. (default: False)
  --relative RELATIVE   Calculate the relative model deviation of force. The level parameter for computing the relative model deviation of the force should be given. (default: None)
  --relative_v RELATIVE_V
                        Calculate the relative model deviation of virial. The level parameter for computing the relative model deviation of the virial should be given. (default: None)

examples:
    dp model-devi -m graph.000.pb graph.001.pb graph.002.pb graph.003.pb -s ./data -o model_devi.out

For more details concerning the definition of model deviation and its application, please refer to Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E, DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models, Computer Physics Communications, 2020, 253, 107206.

7.2.3. Relative model deviation

By default, the model deviation is output in absolute value. If the argument --relative is passed, then the relative model deviation of the force will be output, including values output by the argument --atomic. The relative model deviation of the force on atom \(i\) is defined by

\[E_{f_i}=\frac{\left|D_{f_i}\right|}{\left|f_i\right|+l}\]

where \(D_{f_i}\) is the absolute model deviation of the force on atom \(i\), \(\left|f_i\right|\) is the norm of the force, and \(l\) is the level parameter passed to the argument --relative. If the argument --relative_v is set, the relative model deviation of the virial is output instead of the absolute value. It is defined analogously to that of the force:

\[E_{v_i}=\frac{\left|D_{v_i}\right|}{\left|v_i\right|+l}\]
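
As a hedged numerical illustration of the role of \(l\) (the values are hypothetical and chosen only for easy arithmetic), take an atom with \(\left|D_{f_i}\right| = 0.1\) and \(\left|f_i\right| = 0.05\) in the same force units. With \(l = 1\):

\[E_{f_i}=\frac{0.1}{0.05+1}\approx 0.095,\]

whereas with \(l = 0\) the same atom would give \(E_{f_i} = 0.1/0.05 = 2\). A larger \(l\) therefore damps the relative deviation of atoms that carry small forces.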
