DPGEN2’s documentation
DPGEN2 is the 2nd generation of the Deep Potential GENerator.
Important
The project DeePMD-kit is licensed under GNU LGPLv3.0.
Guide on dpgen2 commands
One may use dpgen2 through its command line interface. Full documentation of the CLI is found here
Submit a workflow
The dpgen2 workflow can be submitted via the submit
command
dpgen2 submit input.json
where input.json
is the input script. A guide to writing the script is found here. When a workflow is submitted, an ID (WFID) of the workflow will be printed for later reference.
Check the convergence of a workflow
The convergence of stages of the workflow can be checked by the status
command. It prints the indexes of the finished stages and iterations, and the accurate, candidate, and failed ratios of the explored configurations in each iteration.
$ dpgen2 status input.json WFID
# stage id_stg. iter. accu. cand. fail.
# Stage 0 --------------------
0 0 0 0.8333 0.1667 0.0000
0 1 1 0.7593 0.2407 0.0000
0 2 2 0.7778 0.2222 0.0000
0 3 3 1.0000 0.0000 0.0000
# Stage 0 converged YES reached max numb iterations NO
# All stages converged
Watch the progress of a workflow
The progress of a workflow can be watched on-the-fly
$ dpgen2 watch input.json WFID
INFO:root:steps iter-000000--prep-run-train----------------------- finished
INFO:root:steps iter-000000--prep-run-lmp------------------------- finished
INFO:root:steps iter-000000--prep-run-fp-------------------------- finished
INFO:root:steps iter-000000--collect-data------------------------- finished
INFO:root:steps iter-000001--prep-run-train----------------------- finished
INFO:root:steps iter-000001--prep-run-lmp------------------------- finished
...
The artifacts can be downloaded on-the-fly with the -d
flag. Note that existing files are automatically skipped if one sets dflow_config["archive_mode"] = None
.
Show the keys of steps
Each dpgen2 step is assigned a unique key. The keys of the finished steps can be checked with showkey
command
0 : iter-000000--prep-train
1 -> 4 : iter-000000--run-train-0000 -> iter-000000--run-train-0003
5 : iter-000000--prep-lmp
6 -> 14 : iter-000000--run-lmp-000000 -> iter-000000--run-lmp-000008
15 : iter-000000--select-confs
16 : iter-000000--prep-fp
17 -> 20 : iter-000000--run-fp-000000 -> iter-000000--run-fp-000003
21 : iter-000000--collect-data
22 : iter-000000--scheduler
23 : iter-000000--id
24 : iter-000001--prep-train
25 -> 28 : iter-000001--run-train-0000 -> iter-000001--run-train-0003
29 : iter-000001--prep-lmp
30 -> 38 : iter-000001--run-lmp-000000 -> iter-000001--run-lmp-000008
39 : iter-000001--select-confs
40 : iter-000001--prep-fp
41 -> 44 : iter-000001--run-fp-000000 -> iter-000001--run-fp-000003
45 : iter-000001--collect-data
46 : iter-000001--scheduler
47 : iter-000001--id
Resubmit a workflow
If a workflow stopped abnormally, one may submit a new workflow with some steps of the old workflow reused.
dpgen2 resubmit input.json WFID --reuse 0-41
The steps 0-41 (0<=id<41, note that 41 is not included) of workflow WFID will be reused in the new workflow. The indexes of the steps are printed by dpgen2 showkey
. In this example, all steps before iter-000001--run-fp-000000
will be reused in the new workflow.
Command line interface
DPGEN2: concurrent learning workflow generating the machine learning potential energy models.
usage: dpgen2 [-h] [-v]
{submit,resubmit,showkey,status,download,watch,terminate,stop,suspend,delete,retry,resume}
...
Named Arguments
- -v, --version
show program’s version number and exit
Valid subcommands
- command
Possible choices: submit, resubmit, showkey, status, download, watch, terminate, stop, suspend, delete, retry, resume
Sub-commands
submit
Submit DPGEN2 workflow
dpgen2 submit [-h] [-o] CONFIG
Positional Arguments
- CONFIG
the config file in json format defining the workflow.
Named Arguments
- -o, --old-compatible
compatible with old-style input script used in dpgen2 < 0.0.6.
Default: False
resubmit
Submit a DPGEN2 workflow, reusing steps from an existing workflow
dpgen2 resubmit [-h] [-l] [-u REUSE [REUSE ...]] [-k] [-o] CONFIG ID
Positional Arguments
- CONFIG
the config file in json format defining the workflow.
- ID
the ID of the existing workflow.
Named Arguments
- -l, --list
list the Steps of the existing workflow.
Default: False
- -u, --reuse
specify which Steps to reuse.
- -k, --keep-schedule
if set, keep the schedule of the old workflow; otherwise use the schedule defined in the input file
Default: False
- -o, --old-compatible
compatible with old-style input script used in dpgen2 < 0.0.6.
Default: False
showkey
Print the keys of the successful DPGEN2 steps
dpgen2 showkey [-h] CONFIG ID
Positional Arguments
- CONFIG
the config file in json format.
- ID
the ID of the existing workflow.
status
Print the status (stage, iteration, convergence) of the DPGEN2 workflow
dpgen2 status [-h] CONFIG ID
Positional Arguments
- CONFIG
the config file in json format.
- ID
the ID of the existing workflow.
download
Download the artifacts of DPGEN2 steps
dpgen2 download [-h] [-k KEYS [KEYS ...]] [-p PREFIX] [-n] CONFIG ID
Positional Arguments
- CONFIG
the config file in json format.
- ID
the ID of the existing workflow.
Named Arguments
- -k, --keys
the keys of the downloaded steps. If not provided, all artifacts are downloaded
- -p, --prefix
the prefix of the path storing the download artifacts
- -n, --no-check-point
if specified, download regardless of whether checkpoints exist.
Default: True
watch
Watch a DPGEN2 workflow
dpgen2 watch [-h] [-k KEYS [KEYS ...]] [-f FREQUENCY] [-d] [-p PREFIX] [-n]
CONFIG ID
Positional Arguments
- CONFIG
the config file in json format.
- ID
the ID of the existing workflow.
Named Arguments
- -k, --keys
the subkey to watch. For example, ‘prep-run-train’ ‘prep-run-lmp’
Default: [‘prep-run-train’, ‘prep-run-lmp’, ‘prep-run-fp’, ‘collect-data’]
- -f, --frequency
the frequency of workflow status queries, in units of seconds
Default: 600.0
- -d, --download
whether to download artifacts of a step when it finishes
Default: False
- -p, --prefix
the prefix of the path storing the download artifacts
- -n, --no-check-point
if specified, download regardless of whether checkpoints exist.
Default: True
terminate
Terminate a DPGEN2 workflow.
dpgen2 terminate [-h] CONFIG ID
Positional Arguments
- CONFIG
the config file in json format.
- ID
the ID of the workflow.
stop
Stop a DPGEN2 workflow.
dpgen2 stop [-h] CONFIG ID
Positional Arguments
- CONFIG
the config file in json format.
- ID
the ID of the workflow.
suspend
Suspend a DPGEN2 workflow.
dpgen2 suspend [-h] CONFIG ID
Positional Arguments
- CONFIG
the config file in json format.
- ID
the ID of the workflow.
delete
Delete a DPGEN2 workflow.
dpgen2 delete [-h] CONFIG ID
Positional Arguments
- CONFIG
the config file in json format.
- ID
the ID of the workflow.
retry
Retry a DPGEN2 workflow.
dpgen2 retry [-h] CONFIG ID
Positional Arguments
- CONFIG
the config file in json format.
- ID
the ID of the workflow.
resume
Resume a DPGEN2 workflow.
dpgen2 resume [-h] CONFIG ID
Positional Arguments
- CONFIG
the config file in json format.
- ID
the ID of the workflow.
Guide on writing input scripts for dpgen2 commands
Preliminaries
The reader of this doc is assumed to be familiar with the concurrent learning algorithm that dpgen2 implements. If not, one may check this paper.
The input script for all dpgen2 commands
For all dpgen2 commands, one needs to provide the dflow
global configurations. For example,
"dflow_config" : {
"host" : "http://address.of.the.host:port"
},
"dflow_s3_config" : {
"endpoint" : "address.of.the.s3.server:port"
},
dpgen2 simply passes all keys of "dflow_config"
to dflow.config
and all keys of "dflow_s3_config"
to dflow.s3_config
.
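As a sketch of this forwarding (using plain dicts to stand in for dflow's global config objects; apply_global_configs is a hypothetical helper, not part of the dpgen2 API):

```python
# Stand-ins for dflow.config and dflow.s3_config, which behave like dicts.
dflow_config: dict = {}
dflow_s3_config: dict = {}

def apply_global_configs(input_script: dict) -> None:
    """Forward user-supplied global keys, mimicking what dpgen2 does."""
    dflow_config.update(input_script.get("dflow_config", {}))
    dflow_s3_config.update(input_script.get("dflow_s3_config", {}))

script = {
    "dflow_config": {"host": "http://address.of.the.host:port"},
    "dflow_s3_config": {"endpoint": "address.of.the.s3.server:port"},
}
apply_global_configs(script)
```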
The input script for submit
and resubmit
The full documentation of the submit
and resubmit
script can be found here. This documentation provides a fast guide on how to write the input script.
In the input script of dpgen2 submit
and dpgen2 resubmit
, one needs to define the workflow and how it is executed. One may find an example input script in the dpgen2 Al-Mg alloy example.
The definition of the workflow can be provided by the following sections:
Inputs
This section provides the inputs to start a dpgen2 workflow. An example for the Al-Mg alloy
"inputs": {
"type_map": ["Al", "Mg"],
"mass_map": [27, 24],
"init_data_sys": [
"path/to/init/data/system/0",
"path/to/init/data/system/1"
]
}
The key "init_data_sys"
provides the initial training data to kick off the training of deep potential (DP) models.
Training
This section defines how a model is trained.
"train" : {
"type" : "dp",
"numb_models" : 4,
"config" : {},
"template_script" : {
"_comment" : "omitted content of template script"
},
"_comment" : "all"
}
The "type" : "dp"
tells that the training method is "dp"
, i.e. DeePMD-kit is called to train DP models. The "config"
key defines the training configs, see the full documentation. The "template_script"
provides the template training script in json
format.
Exploration
This section defines how the configuration space is explored.
"explore" : {
"type" : "lmp",
"config" : {
"command": "lmp -var restart 0"
},
"max_numb_iter" : 5,
"conv_accuracy" : 0.9,
"fatal_at_max" : false,
"f_trust_lo": 0.05,
"f_trust_hi": 0.50,
"configurations": [
{
"lattice" : ["fcc", 4.57],
"replicate" : [2, 2, 2],
"numb_confs" : 30,
"concentration" : [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
},
{
"lattice" : ["fcc", 4.57],
"replicate" : [3, 3, 3],
"numb_confs" : 30,
"concentration" : [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
}
],
"stages": [
[
{
"_comment" : "stage 0, task group 0",
"type" : "lmp-md",
"ensemble": "nvt", "nsteps": 50, "temps": [50, 100], "trj_freq": 10,
"conf_idx": [0], "n_sample" : 3
},
{
"_comment" : "stage 1, task group 0",
"type" : "lmp-template",
"lmp" : "template.lammps", "plm" : "template.plumed",
"trj_freq" : 10, "revisions" : {"V_NSTEPS" : [40], "V_TEMP" : [150, 200]},
"conf_idx": [0], "n_sample" : 3
}
],
[
{
"_comment" : "stage 1, task group 0",
"type" : "lmp-md",
"ensemble": "npt", "nsteps": 50, "press": [1e0], "temps": [50, 100, 200], "trj_freq": 10,
"conf_idx": [1], "n_sample" : 3
}
],
]
}
The "type" : "lmp"
means that configurations are explored by LAMMPS DPMD runs. The "config"
key defines the lmp configs, see the full documentation. The "configurations"
provides the initial configurations (coordinates of atoms and the simulation cell) of the DPMD simulations. It is a list. The elements of the list can be
- list[str]: the strings provide the paths to the configuration files.
- dict: automatic alloy configuration generator. See the detailed doc of the allowed keys.
The "stages"
defines the exploration stages. It is of type list[list[dict]]
. The outer list
enumerates the exploration stages, and the inner list enumerates the task groups of each stage. Each dict
defines a task group. See the full documentation of the task group for writing task groups.
"n_sample"
specifies the number of configurations randomly sampled from the set picked by "conf_idx"
from configurations
for each exploration task. All picked configurations have equal probability of being sampled. The default value of "n_sample"
is null
; in this case all picked configurations are sampled. In the example, we have 3 samples for stage 0 task group 0 and 2 thermodynamic states (NVT, T=50 and 100 K), so the task group has 3x2=6 NVT DPMD tasks.
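The task count arithmetic can be sketched as follows (make_nvt_tasks is a toy helper for illustration, not a dpgen2 function):

```python
import itertools
import random

def make_nvt_tasks(configurations, conf_idx, n_sample, temps, seed=0):
    """Toy sketch: sample configurations picked by conf_idx, then pair
    each sample with every thermodynamic state (here, every temperature)."""
    rng = random.Random(seed)
    picked = [configurations[i] for i in conf_idx]
    if n_sample is None:
        # null n_sample: use all picked configurations
        sampled = picked
    else:
        sampled = [rng.choice(picked) for _ in range(n_sample)]
    return list(itertools.product(sampled, temps))

# 3 samples x 2 temperatures -> 6 NVT DPMD tasks
tasks = make_nvt_tasks(["conf-a", "conf-b"], [0, 1], 3, [50, 100])
```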
FP
This section defines the first-principles (FP) calculation.
"fp" : {
"type" : "vasp",
"config" : {
"command": "source /opt/intel/oneapi/setvars.sh && mpirun -n 16 vasp_std"
},
"task_max": 2,
"pp_files": {"Al" : "vasp/POTCAR.Al", "Mg" : "vasp/POTCAR.Mg"},
"incar": "vasp/INCAR",
"_comment" : "all"
}
The "type" : "vasp"
means that the first-principles calculations are performed by VASP. The "config"
key defines the VASP configs, see the full documentation. The "task_max"
key defines the maximum number of VASP calculations in each dpgen2 iteration. The "pp_files"
and "incar"
keys provide the pseudopotential files and the template INCAR file.
Configuration of dflow step
The execution units of dpgen2 are the dflow Step
s. How each step is executed is defined by the "step_configs"
.
"step_configs":{
"prep_train_config" : {
"_comment" : "content omitted"
},
"run_train_config" : {
"_comment" : "content omitted"
},
"prep_explore_config" : {
"_comment" : "content omitted"
},
"run_explore_config" : {
"_comment" : "content omitted"
},
"prep_fp_config" : {
"_comment" : "content omitted"
},
"run_fp_config" : {
"_comment" : "content omitted"
},
"select_confs_config" : {
"_comment" : "content omitted"
},
"collect_data_config" : {
"_comment" : "content omitted"
},
"cl_step_config" : {
"_comment" : "content omitted"
},
"_comment" : "all"
},
The configs for prepare training, run training, prepare exploration, run exploration, prepare fp, run fp, select configurations, collect data and concurrent learning steps are given correspondingly.
The readers are referred to this page for full documentation of the step configs.
Any config in step_configs
can be omitted. If omitted, the config of the step falls back to the default step config, which is provided by the following section, for example,
"default_step_config" : {
"template_config" : {
"image" : "dpgen2:x.x.x"
}
},
The way of writing the default_step_config
is the same as any step config in the step_configs
. One may refer to this page for full documentation.
Arguments of the submit script
DPGEN2 configurations
Op configs
RunDPTrain
- init_model_policy:
- type:
str
, optional, default:no
argument path:init_model_policy
The policy of init-model training. It can be
‘no’: no init-model training; train from scratch.
‘yes’: do init-model training.
‘old_data_larger_than:XXX’: do init-model training if the training data size of the previous model is larger than XXX, where XXX is an integer.
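The policy string can be interpreted as in the following sketch (do_init_model is a hypothetical helper illustrating the three cases, not the actual dpgen2 code):

```python
def do_init_model(policy: str, old_data_size: int) -> bool:
    """Decide whether to do init-model training for a given policy string."""
    if policy == "no":
        return False
    if policy == "yes":
        return True
    if policy.startswith("old_data_larger_than:"):
        threshold = int(policy.split(":", 1)[1])
        return old_data_size > threshold
    raise ValueError(f"unknown init_model_policy: {policy}")
```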
- init_model_old_ratio:
- type:
float
, optional, default:0.9
argument path:init_model_old_ratio
The frequency ratio of old data over new data
- init_model_numb_steps:
- type:
int
, optional, default:400000
, alias: init_model_stop_batchargument path:init_model_numb_steps
The number of training steps when init-model
- init_model_start_lr:
- type:
float
, optional, default:0.0001
argument path:init_model_start_lr
The start learning rate when init-model
- init_model_start_pref_e:
- type:
float
, optional, default:0.1
argument path:init_model_start_pref_e
The start energy prefactor in loss when init-model
- init_model_start_pref_f:
- type:
int
|float
, optional, default:100
argument path:init_model_start_pref_f
The start force prefactor in loss when init-model
- init_model_start_pref_v:
- type:
float
, optional, default:0.0
argument path:init_model_start_pref_v
The start virial prefactor in loss when init-model
RunLmp
- command:
- type:
str
, optional, default:lmp
argument path:command
The command of LAMMPS
RunVasp
Alloy configs
Task group configs
- task_group_configs:
- type:
dict
argument path:task_group_configs
Depending on the value of type, different sub args are accepted.
- type:
the type of the task group
When type is set to
lmp-md
(or its aliaslmp-npt
):- temps:
- type:
list
, alias: Tsargument path:task_group_configs[lmp-md]/temps
A list of temperatures in K. Also used to initialize the temperature
- press:
- type:
list
, optional, alias: Psargument path:task_group_configs[lmp-md]/press
A list of pressures in bar.
- ens:
- type:
str
, optional, default:nve
, alias: ensembleargument path:task_group_configs[lmp-md]/ens
The ensemble. Allowed options are ‘nve’, ‘nvt’, ‘npt’, ‘npt-a’, and ‘npt-t’. ‘npt-a’ stands for anisotropic box sampling and ‘npt-t’ stands for triclinic box sampling.
- dt:
- type:
float
, optional, default:0.001
argument path:task_group_configs[lmp-md]/dt
The time step
- nsteps:
- type:
int
, optional, default:100
argument path:task_group_configs[lmp-md]/nsteps
The number of steps
- trj_freq:
- type:
int
, optional, default:10
, aliases: t_freq, trj_freq, traj_freqargument path:task_group_configs[lmp-md]/trj_freq
The frequency of dumping configurations and thermodynamic states
- tau_t:
- type:
float
, optional, default:0.05
argument path:task_group_configs[lmp-md]/tau_t
The time scale of thermostat
- tau_p:
- type:
float
, optional, default:0.5
argument path:task_group_configs[lmp-md]/tau_p
The time scale of barostat
- pka_e:
- type:
NoneType
|float
, optional, default:None
argument path:task_group_configs[lmp-md]/pka_e
The energy of primary knock-on atom
- neidelay:
- type:
int
|NoneType
, optional, default:None
argument path:task_group_configs[lmp-md]/neidelay
The delay of updating the neighbor list
- no_pbc:
- type:
bool
, optional, default:False
argument path:task_group_configs[lmp-md]/no_pbc
Not using the periodic boundary condition
- use_clusters:
- type:
bool
, optional, default:False
argument path:task_group_configs[lmp-md]/use_clusters
Calculate atomic model deviation
- relative_f_epsilon:
- type:
NoneType
|float
, optional, default:None
argument path:task_group_configs[lmp-md]/relative_f_epsilon
Calculate relative force model deviation
- relative_v_epsilon:
- type:
NoneType
|float
, optional, default:None
argument path:task_group_configs[lmp-md]/relative_v_epsilon
Calculate relative virial model deviation
When type is set to
lmp-template
:- lmp_template_fname:
- type:
str
, aliases: lmp_template, lmpargument path:task_group_configs[lmp-template]/lmp_template_fname
The file name of lammps input template
- plm_template_fname:
- type:
NoneType
|str
, optional, default:None
, aliases: plm_template, plmargument path:task_group_configs[lmp-template]/plm_template_fname
The file name of plumed input template
- revisions:
- type:
dict
, optional, default:{}
argument path:task_group_configs[lmp-template]/revisions
- traj_freq:
- type:
int
, optional, default:10
, aliases: t_freq, trj_freqargument path:task_group_configs[lmp-template]/traj_freq
The frequency of dumping configurations and thermodynamic states
Step configs
- template_config:
- type:
dict
, optional, default:{'image': 'dptechnology/dpgen2:latest'}
argument path:template_config
The configs passed to the PythonOPTemplate.
- image:
- type:
str
, optional, default:dptechnology/dpgen2:latest
argument path:template_config/image
The image to run the step.
- timeout:
- type:
int
|NoneType
, optional, default:None
argument path:template_config/timeout
The time limit of the OP. Unit is second.
- retry_on_transient_error:
- type:
int
|NoneType
, optional, default:None
argument path:template_config/retry_on_transient_error
The number of retry times if a TransientError is raised.
- timeout_as_transient_error:
- type:
bool
, optional, default:False
argument path:template_config/timeout_as_transient_error
Treat the timeout as TransientError.
- envs:
- type:
dict
|NoneType
, optional, default:None
argument path:template_config/envs
The environmental variables.
- continue_on_failed:
- type:
bool
, optional, default:False
argument path:continue_on_failed
If set, continue the workflow even if the step fails (FatalError, TransientError, a certain number of retries reached, etc.).
- continue_on_num_success:
- type:
int
|NoneType
, optional, default:None
argument path:continue_on_num_success
Only in the sliced OP case. Continue the workflow if a certain number of the sliced jobs are successful.
- continue_on_success_ratio:
- type:
NoneType
|float
, optional, default:None
argument path:continue_on_success_ratio
Only in the sliced OP case. Continue the workflow if a certain ratio of the sliced jobs are successful.
- parallelism:
- type:
int
|NoneType
, optional, default:None
argument path:parallelism
The parallelism for the step
- executor:
- type:
dict
|NoneType
, optional, default:None
argument path:executor
The executor of the step.
Depending on the value of type, different sub args are accepted.
- type:
The type of the executor.
When type is set to
lebesgue_v2
:- extra:
- type:
dict
, optionalargument path:executor[lebesgue_v2]/extra
The ‘extra’ key in the lebesgue executor. Note that we do not check whether the dict provided to the ‘extra’ key is valid.
- scass_type:
- type:
str
, optionalargument path:executor[lebesgue_v2]/extra/scass_type
The machine configuration.
- program_id:
- type:
str
, optionalargument path:executor[lebesgue_v2]/extra/program_id
The ID of the program.
- job_type:
- type:
str
, optional, default:container
argument path:executor[lebesgue_v2]/extra/job_type
The type of job.
- template_cover_cmd_escape_bug:
- type:
bool
, optional, default:True
argument path:executor[lebesgue_v2]/extra/template_cover_cmd_escape_bug
The key for hacking around a bug in Lebesgue.
When type is set to
dispatcher
:
Developers’ guide
The concurrent learning algorithm
Overview of the DPGEN2 implementation
The DPGEN2 workflow
How to contribute
The concurrent learning algorithm
DPGEN2 implements the concurrent learning algorithm named DP-GEN, described in this paper. It is noted that other types of workflows, like active learning, should be easily implemented within the infrastructure of DPGEN2.
The DP-GEN algorithm is iterative. In each iteration, four steps are consecutively executed: training, exploration, selection, and labeling.
Training. A set of DP models are trained with the same dataset and the same hyperparameters. The only difference is the random seed initializing the model parameters.
Exploration. One of the DP models is used to explore the configuration space. The strategy of exploration highly depends on the purpose of the application case of the model. The simulation technique for exploration can be molecular dynamics, Monte Carlo, structure search/optimization, enhanced sampling, or any combination of them. Currently DPGEN2 only supports exploration based on the molecular simulation platform LAMMPS.
Selection. Not all the explored configurations are labeled, rather, the model prediction errors on the configurations are estimated by the model deviation, which is defined as the standard deviation in predictions of the set of the models. The critical configurations with large and not-that-large errors are selected for labeling. The configurations with very large errors are not selected because the large error is usually caused by non-physical configurations, e.g. overlapping atoms.
Labeling. The selected configurations are labeled with energies, forces, and virials calculated by a method of first-principles accuracy. The most commonly used method is density functional theory, as implemented in VASP, Quantum Espresso, CP2K, etc. The labeled data are finally added to the training dataset to start the next iteration.
In each iteration, the quality of the model is improved by selecting and labeling more critical data and adding them to the training dataset. The DP-GEN iteration is converged when no more critical data can be selected.
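The four steps and the convergence check above can be sketched as a plain Python loop (all callables are placeholders standing in for the corresponding DPGEN2 operators):

```python
def dpgen_loop(train, explore, select, label, dataset, max_iter=5):
    """Toy sketch of the DP-GEN concurrent-learning iteration."""
    for _ in range(max_iter):
        models = train(dataset)                      # training
        confs = explore(models)                      # exploration
        candidates, report = select(models, confs)   # selection
        if report["converged"]:
            # no more critical data can be selected
            break
        dataset = dataset + label(candidates)        # labeling
    return dataset
```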
Overview of the DPGEN2 Implementation
The implementation DPGEN2 is based on the workflow platform dflow, which is a python wrapper of the Argo Workflows, an open-source container-native workflow engine on Kubernetes.
The DP-GEN algorithm is conceptually modeled as a computational graph. The implementation then proceeds along two lines: the operators and the workflow.
Operators. Operators are implemented in Python v3. The operators should be implemented and tested without the workflow.
Workflow. Workflow is implemented on dflow. Ideally, the workflow is implemented and tested with all operators mocked.
The DPGEN2 workflow
The workflow of DPGEN2 is illustrated in the following figure
In the center is the block
operator, which is a super-OP (an OP composed of several OPs) for one DP-GEN iteration, i.e. the super-OP of the training, exploration, selection, and labeling steps. The inputs of the block
OP are lmp_task_group
, conf_selector
and dataset
.
lmp_task_group
: definition of a group of LAMMPS tasks that explore the configuration space.conf_selector
: defines the rule by which the configurations are selected for labeling.dataset
: the training dataset.
The outputs of the block
OP are
exploration_report
: a report recording the result of the exploration. For example, how many configurations are accurate enough and how many are selected as candidates for labeling.dataset_incr
: the increment of the training dataset.
The dataset_incr
is added to the training dataset
.
The exploration_report
is passed to the exploration_strategy
OP. The exploration_strategy
implements the strategy of exploration. It reads the exploration_report
generated by each iteration (block
), then tells if the iteration is converged. If not, it generates a group of LAMMPS tasks (lmp_task_group
) and the criteria of selecting configurations (conf_selector
). The lmp_task_group
and conf_selector
are then used by block
of the next iteration. This closes the iteration loop.
Inside the block
operator
The inside of the super-OP block
is displayed on the right-hand side of the figure. It contains the following steps to finish one DPGEN iteration
prep_run_dp_train
: prepares training tasks of DP models and runs them.prep_run_lmp
: prepares the LAMMPS exploration tasks and runs them.select_confs
: selects configurations for labeling from the explored configurations.prep_run_fp
: prepares and runs first-principles tasks.collect_data
: collects thedataset_incr
and adds it to thedataset
.
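The composition of the block super-OP can be sketched as sequential calls (ops bundles hypothetical stand-ins for the five inner steps; these are not the actual dflow Step objects):

```python
def block(dataset, lmp_task_group, conf_selector, ops):
    """Toy sketch of one DP-GEN iteration inside the block super-OP."""
    models = ops["prep_run_dp_train"](dataset)
    trajs = ops["prep_run_lmp"](lmp_task_group, models)
    report, candidates = ops["select_confs"](conf_selector, trajs)
    dataset_incr = ops["prep_run_fp"](candidates)
    new_dataset = ops["collect_data"](dataset, dataset_incr)
    return report, dataset_incr, new_dataset
```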
The exploration strategy
The exploration strategy defines how the configuration space is explored by the concurrent learning algorithm. The design of the exploration strategy is graphically illustrated in the following figure. The exploration is composed of stages. Only when the DP-GEN exploration converges at one stage (no configuration with a large error is explored) does the exploration proceed to the next stage. The whole procedure is controlled by the exploration_scheduler
. Each stage has its own scheduler, which talks to the exploration_scheduler
to generate the schedule for the DP-GEN algorithm.
Some concepts are explained below:
Exploration group. A group of LAMMPS tasks that share similar settings, for example, a group of NPT MD simulations in a certain thermodynamic space.
Exploration stage. The
exploration_stage
contains a list of exploration groups. It contains all information needed to define thelmp_task_group
used by theblock
in the DP-GEN iteration.Stage scheduler. It guarantees the convergence of the DP-GEN algorithm in each
exploration_stage
. If the exploration is not converged, thestage_scheduler
generateslmp_task_group
andconf_selector
from theexploration_stage
for the next iteration (probably with a different initial condition, i.e. different initial configurations and randomly generated initial velocity).Exploration scheduler. The scheduler for the DP-GEN algorithm. When DP-GEN is converged in one of the stages, it goes to the next stage until all planned stages are used.
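A minimal sketch of the scheduler interplay described above (class and attribute names are hypothetical, chosen only to mirror the concepts):

```python
class StageScheduler:
    """Toy stage scheduler: a stage is considered converged once the
    reported accurate ratio reaches conv_accuracy."""
    def __init__(self, conv_accuracy=0.9):
        self.conv_accuracy = conv_accuracy

    def converged(self, report):
        return report["accurate_ratio"] >= self.conv_accuracy

class ExplorationScheduler:
    """Advances to the next stage when the current stage converges;
    returns None when all planned stages are exhausted."""
    def __init__(self, stage_schedulers):
        self.stages = stage_schedulers
        self.current = 0

    def next_stage(self, report):
        if self.stages[self.current].converged(report):
            self.current += 1
        if self.current >= len(self.stages):
            return None  # the whole DP-GEN workflow is converged
        return self.current
```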
How to contribute
Anyone interested in the DPGEN2 project may contribute OPs, workflows, and exploration strategies.
To contribute OPs, one may check the guide on writing operators
To contribute workflows, one may take the DP-GEN workflow as an example. It is implemented in dpgen2/flow/dpgen_loop.py and tested with all operators mocked in test/test_dpgen_loop.py
To contribute the exploration strategy, one may check the guide on writing exploration strategies
Operators
There are two types of OPs in DPGEN2
OP. An execution unit of the workflow. It can be roughly viewed as a piece of Python script that takes some inputs and gives some outputs. An OP cannot be used in the
dflow
until it is embedded in a super-OP.Super-OP. An execution unit that is composed of one or more OPs and/or super-OPs.
Technically, an OP is a Python class derived from dflow.python.OP
. It serves as the PythonOPTemplate
of dflow.Step
.
The super-OP is a Python class derived from dflow.Steps
. It contains dflow.Step
s as building blocks and can be used as an OP template to generate a dflow.Step
. For an explanation of the concepts dflow.Step
and dflow.Steps
, one may refer to the manual of dflow.
The super-OP PrepRunDPTrain
In the following we will take the PrepRunDPTrain
super-OP as an example to illustrate how to write OPs in DPGEN2.
PrepRunDPTrain
is a super-OP that prepares several DeePMD-kit training tasks and submits all of them. This super-OP is composed of two dflow.Step
s built from dflow.python.OP
s PrepDPTrain
and RunDPTrain
.
from dflow import (
Step,
Steps,
)
from dflow.python import (
PythonOPTemplate,
OP,
Slices,
)
class PrepRunDPTrain(Steps):
def __init__(
self,
name : str,
prep_train_op : OP,
run_train_op : OP,
prep_train_image : str = "dflow:v1.0",
run_train_image : str = "dflow:v1.0",
):
...
self = _prep_run_dp_train(
self,
self.step_keys,
prep_train_op,
run_train_op,
prep_train_image = prep_train_image,
run_train_image = run_train_image,
)
The construction of the PrepRunDPTrain
takes prepare-training OP
and run-training OP
and their docker images as input; the construction is implemented in the internal method _prep_run_dp_train
.
def _prep_run_dp_train(
train_steps,
step_keys,
prep_train_op : OP = PrepDPTrain,
run_train_op : OP = RunDPTrain,
prep_train_image : str = "dflow:v1.0",
run_train_image : str = "dflow:v1.0",
):
prep_train = Step(
...
template=PythonOPTemplate(
prep_train_op,
image=prep_train_image,
...
),
...
)
train_steps.add(prep_train)
run_train = Step(
...
template=PythonOPTemplate(
run_train_op,
image=run_train_image,
...
),
...
)
train_steps.add(run_train)
train_steps.outputs.artifacts["scripts"]._from = run_train.outputs.artifacts["script"]
train_steps.outputs.artifacts["models"]._from = run_train.outputs.artifacts["model"]
train_steps.outputs.artifacts["logs"]._from = run_train.outputs.artifacts["log"]
train_steps.outputs.artifacts["lcurves"]._from = run_train.outputs.artifacts["lcurve"]
return train_steps
In _prep_run_dp_train
, two instances of dflow.Step
, i.e. prep_train
and run_train
, generated from prep_train_op
and run_train_op
, respectively, are added to train_steps
. Both of prep_train_op
and run_train_op
are OPs (python classes derived from dflow.python.OP
s) that will be illustrated later. train_steps
is an instance of dflow.Steps
. The outputs of the second OP run_train
are assigned to the outputs of the train_steps
.
The prep_train
prepares a list of paths, each of which contains all necessary files to start a DeePMD-kit training task.
The run_train
slices the list of paths and assigns each item in the list to a DeePMD-kit task. The task is executed by run_train_op
. This is a very nice feature of dflow
, because the developer only needs to implement how one DeePMD-kit task is executed, and then all items in the task list will be executed in parallel. The following code shows how it works
run_train = Step(
'run-train',
template=PythonOPTemplate(
run_train_op,
image=run_train_image,
slices = Slices(
"int('{{item}}')",
input_parameter = ["task_name"],
input_artifact = ["task_path", "init_model"],
output_artifact = ["model", "lcurve", "log", "script"],
),
),
parameters={
"config" : train_steps.inputs.parameters["train_config"],
"task_name" : prep_train.outputs.parameters["task_names"],
},
artifacts={
'task_path' : prep_train.outputs.artifacts['task_paths'],
"init_model" : train_steps.inputs.artifacts['init_models'],
"init_data": train_steps.inputs.artifacts['init_data'],
"iter_data": train_steps.inputs.artifacts['iter_data'],
},
with_sequence=argo_sequence(argo_len(prep_train.outputs.parameters["task_names"]), format=train_index_pattern),
key = step_keys['run-train'],
)
The input parameter "task_names"
and artifacts "task_paths"
and "init_model"
are sliced and supplied to each DeePMD-kit task. The output artifacts of the tasks ("model"
, "lcurve"
, "log"
and "script"
) are stacked in the same order as the input lists. These lists are assigned as the outputs of train_steps
by
train_steps.outputs.artifacts["scripts"]._from = run_train.outputs.artifacts["script"]
train_steps.outputs.artifacts["models"]._from = run_train.outputs.artifacts["model"]
train_steps.outputs.artifacts["logs"]._from = run_train.outputs.artifacts["log"]
train_steps.outputs.artifacts["lcurves"]._from = run_train.outputs.artifacts["lcurve"]
The OP RunDPTrain
We will take RunDPTrain as an example to illustrate how to implement an OP in DPGEN2. The source code of this OP is found here.
First of all, an OP should be implemented as a derived class of dflow.python.OP.
The dflow.python.OP requires static type definitions for the input and output variables, i.e. the signatures of an OP. The input and output signatures of a dflow.python.OP are given by the classmethods get_input_sign and get_output_sign.
from dflow.python import (
    OP,
    OPIO,
    OPIOSign,
    Artifact,
)

class RunDPTrain(OP):
    @classmethod
    def get_input_sign(cls):
        return OPIOSign({
            "config" : dict,
            "task_name" : str,
            "task_path" : Artifact(Path),
            "init_model" : Artifact(Path),
            "init_data" : Artifact(List[Path]),
            "iter_data" : Artifact(List[Path]),
        })

    @classmethod
    def get_output_sign(cls):
        return OPIOSign({
            "script" : Artifact(Path),
            "model" : Artifact(Path),
            "lcurve" : Artifact(Path),
            "log" : Artifact(Path),
        })
All items not defined as Artifact are treated as parameters of the OP. The concepts of parameter and artifact are explained in the dflow documentation. In short, an artifact can be a pathlib.Path or a list of pathlib.Path, and artifacts are passed via the file system. All other data structures are treated as parameters, and they are passed as variables encoded in str. Therefore, large amounts of information should be stored in artifacts, while small pieces of information can be passed as parameters.
The operation of the OP is implemented in the method execute, which is run in a docker container. Again taking the execute method of RunDPTrain as an example
@OP.exec_sign_check
def execute(
        self,
        ip : OPIO,
) -> OPIO:
    ...
    task_name = ip['task_name']
    task_path = ip['task_path']
    init_model = ip['init_model']
    init_data = ip['init_data']
    iter_data = ip['iter_data']
    ...
    work_dir = Path(task_name)
    ...
    # here copy all files in task_path to work_dir
    ...
    with set_directory(work_dir):
        fplog = open('train.log', 'w')
        def clean_before_quit():
            fplog.close()
        # train model
        command = ['dp', 'train', train_script_name]
        ret, out, err = run_command(command)
        if ret != 0:
            clean_before_quit()
            raise FatalError('dp train failed')
        fplog.write(out)
        # freeze model
        ret, out, err = run_command(['dp', 'freeze', '-o', 'frozen_model.pb'])
        if ret != 0:
            clean_before_quit()
            raise FatalError('dp freeze failed')
        fplog.write(out)
        clean_before_quit()
    return OPIO({
        "script" : work_dir / train_script_name,
        "model" : work_dir / "frozen_model.pb",
        "lcurve" : work_dir / "lcurve.out",
        "log" : work_dir / "train.log",
    })
The input and output variables are recorded in the data structure dflow.python.OPIO, which is initialized by a Python dict. The keys of the input/output dict and the types of the input/output variables are checked against their signatures by the decorator OP.exec_sign_check. If any key or type does not match, an exception is raised.
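The idea behind the key check can be pictured with a small stand-alone sketch (plain Python written for this document; the real decorator is dflow's OP.exec_sign_check, which also validates the declared types):

```python
# Illustrative sketch only: a decorator that rejects an input dict whose
# keys do not match a declared signature, mimicking the OP.exec_sign_check idea.
def exec_sign_check(sign):
    def deco(fn):
        def wrapped(ip: dict):
            unexpected = set(ip) - set(sign)
            missing = set(sign) - set(ip)
            if unexpected or missing:
                raise TypeError(
                    f"signature mismatch: unexpected {unexpected}, missing {missing}"
                )
            return fn(ip)
        return wrapped
    return deco

# hypothetical OP body used only to demonstrate the check
@exec_sign_check({"task_name", "task_path"})
def execute(ip):
    return {"log": f"ran {ip['task_name']}"}
```

Calling execute with a dict missing "task_path" raises TypeError before the body runs, which is the behavior the decorator guarantees for an OP.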
It is noted that all input artifacts of the OP are read-only. Therefore, the first step of RunDPTrain.execute is to copy all necessary input files from the directory task_path, prepared by PrepDPTrain, to the working directory work_dir.
The set_directory method creates work_dir, switches to that directory before the execution, and exits the directory when the task finishes or an error is raised.
In what follows, the training and model-freezing commands are executed consecutively. The return code is checked, and a FatalError is raised if a non-zero code is detected.
Finally, the trained model file, the input script, the learning curve file and the log file are recorded in a dflow.python.OPIO and returned.
Exploration
DPGEN2 allows developers to contribute exploration strategies. The exploration strategy defines how the configuration space is explored by molecular simulations in each DPGEN iteration. Notice that we are not restricted to molecular dynamics: any molecular simulation is, in principle, allowed, for example Monte Carlo, enhanced sampling, or structure optimization.
An exploration strategy takes the history of exploration as input, and gives back DPGEN the exploration tasks (we call it task group) and the rule to select configurations from the trajectories generated by the tasks (we call it configuration selector).
One can contribute from three aspects:
The stage scheduler
The exploration task groups
Configuration selector
Stage scheduler
The stage scheduler takes an exploration report passed from the exploration scheduler as input and tells the exploration scheduler whether the exploration in the stage has converged. If not, it returns a group of exploration tasks and a configuration selector to be used in the next DPGEN iteration.
Detailed explanation of the concepts are found here.
All the stage schedulers are derived from the abstract base class StageScheduler. The only interface to be implemented is StageScheduler.plan_next_iteration. One may check the doc string for the explanation of the interface.
class StageScheduler(ABC):
    """
    The scheduler for an exploration stage.
    """

    @abstractmethod
    def plan_next_iteration(
            self,
            hist_reports : List[ExplorationReport],
            report : ExplorationReport,
            confs : List[Path],
    ) -> Tuple[bool, ExplorationTaskGroup, ConfSelector] :
        """
        Make the plan for the next iteration of the stage.

        It checks the report of the current and all historical iterations of the stage,
        and tells if the iterations are converged.
        If not converged, it will plan the next iteration for the stage.

        Parameters
        ----------
        hist_reports : List[ExplorationReport]
            The historical exploration reports of the stage. If this is the first iteration of the stage, this list is empty.
        report : ExplorationReport
            The exploration report of this iteration.
        confs : List[Path]
            A list of configurations generated during the exploration. May be used to generate new configurations for the next iteration.

        Returns
        -------
        converged : bool
            If the stage converged.
        task : ExplorationTaskGroup
            A `ExplorationTaskGroup` defining the exploration of the next iteration. Should be `None` if the stage is converged.
        conf_selector : ConfSelector
            The configuration selector for the next iteration. Should be `None` if the stage is converged.
        """
One may check more details on the exploration task group and the configuration selector.
Exploration task groups
DPGEN2 defines a python class ExplorationTask to manage all necessary files needed to run an exploration task. It can be used as in the example provided in the doc string.
class ExplorationTask():
    """Define the files needed by an exploration task.

    Examples
    --------
    >>> # this example dumps all files needed by the task.
    >>> files = exploration_task.files()
    ... for file_name, file_content in files.items():
    ...     with open(file_name, 'w') as fp:
    ...         fp.write(file_content)
    """
A collection of exploration tasks is called an exploration task group. All task groups are derived from the base class ExplorationTaskGroup. An exploration task group can be viewed as a list of ExplorationTasks; one may get the list via the property ExplorationTaskGroup.task_list. One may add tasks or another ExplorationTaskGroup to a group with the methods ExplorationTaskGroup.add_task and ExplorationTaskGroup.add_group, respectively.
class ExplorationTaskGroup(Sequence):
    @property
    def task_list(self) -> List[ExplorationTask]:
        """Get the `list` of `ExplorationTask`"""
        ...

    def add_task(self, task: ExplorationTask):
        """Add one task to the group."""
        ...

    def add_group(
            self,
            group : 'ExplorationTaskGroup',
    ):
        """Add another group to the group."""
        ...
An example of generating a group of NPT MD simulations may illustrate how to implement an ExplorationTaskGroup.
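Before reading that example, the task/group interplay can be sketched with self-contained stand-in classes that only mimic the documented interface (Task, TaskGroup and add_file are illustrative names invented here, not dpgen2 classes):

```python
# Stand-ins mimicking the documented ExplorationTask/ExplorationTaskGroup
# interface: a task holds its files as name -> content, and a group is a
# list of tasks.
class Task:
    def __init__(self):
        self._files = {}
    def add_file(self, name, content):
        # record one file needed by the task; return self for chaining
        self._files[name] = content
        return self
    def files(self):
        return dict(self._files)

class TaskGroup:
    def __init__(self):
        self._tasks = []
    @property
    def task_list(self):
        return self._tasks
    def add_task(self, task):
        self._tasks.append(task)

# build a tiny NPT-like group: one task per temperature
group = TaskGroup()
for temp in (300, 600):
    group.add_task(Task().add_file("in.lammps", f"# toy input, T={temp}\n"))
```

Each task carries everything needed to run one simulation, and the group enumerates the tasks of one iteration, which matches how dpgen2 later dumps the files and dispatches the tasks.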
Configuration selector
The configuration selectors are derived from the abstract base class ConfSelector
class ConfSelector(ABC):
    """Select configurations from trajectory and model deviation files.
    """
    @abstractmethod
    def select(
            self,
            trajs : List[Path],
            model_devis : List[Path],
            traj_fmt : str = 'deepmd/npy',
            type_map : List[str] = None,
    ) -> Tuple[List[Path], ExplorationReport]:
The abstract method to implement is ConfSelector.select. trajs and model_devis are lists of files recording the simulation trajectories and model deviations, respectively. traj_fmt and type_map are parameters that may be needed for loading the trajectories by dpdata.
ConfSelector.select returns a list of Paths, each of which can be treated as a dpdata.MultiSystems, and an ExplorationReport.
An example of selecting configurations from LAMMPS trajectories may illustrate how to implement a ConfSelector.
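The selection rule at the heart of such a selector can be sketched in a few lines (a toy rule with assumed thresholds lo and hi written for this document, not dpgen2's actual implementation): a frame is a candidate when its maximal force model deviation falls between the trust levels.

```python
# Toy frame-selection rule: frames with max force deviation below `lo` are
# accurate, above `hi` are failed, and in between are candidates for
# first-principles labeling.
def candidate_ids(md_f, lo=0.05, hi=0.30):
    """md_f: one list of per-frame max force deviations per trajectory.
    Returns List[List[int]]: per-trajectory indices of candidate frames."""
    return [
        [ii for ii, dd in enumerate(traj) if lo <= dd < hi]
        for traj in md_f
    ]
```

The nested-list return shape mirrors the idx[ii][jj] convention used by get_candidate_ids in the API reference below.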
DPGEN2 API
dpgen2 package
Subpackages
dpgen2.conf package
Submodules
dpgen2.conf.alloy_conf module
- class dpgen2.conf.alloy_conf.AlloyConf(lattice: Union[System, Tuple[str, float]], type_map: List[str], replicate: Optional[Union[List[int], Tuple[int], int]] = None)[source]
Bases:
object
- Parameters
- lattice Union[dpdata.System, Tuple[str, float]]
Lattice of the alloy confs. Can be a dpdata.System (the lattice in a dpdata.System) or a Tuple[str, float] (a pair of lattice type and lattice constant). The lattice type can be "bcc", "fcc", "hcp", "sc" or "diamond"
- replicate Union[List[int], Tuple[int], int]
replicate of the lattice
- type_map List[str]
The type map
Methods
generate_file_content(numb_confs[, ...])
generate_systems(numb_confs[, ...])
- generate_file_content(numb_confs, concentration: Optional[Union[List[List[float]], List[float]]] = None, cell_pert_frac: float = 0.0, atom_pert_dist: float = 0.0, fmt: str = 'lammps/lmp') List[str] [source]
- Parameters
- numb_confs int
Number of configurations to generate
- concentration List[List[float]] or List[float] or None
If List[float], the concentrations of each element. The length of the list should be the same as the type_map. If List[List[float]], a list of concentrations (List[float]) is randomly picked from the List. If None, the elements are assumed to be of equal concentration.
- cell_pert_frac float
fraction of cell perturbation
- atom_pert_dist float
the atom perturbation distance (unit angstrom).
- fmt str
the format of the returned conf strings. Should be one of the formats supported by dpdata
- Returns
- conf_list List[str]
A list of file content of configurations.
- generate_systems(numb_confs, concentration: Optional[Union[List[List[float]], List[float]]] = None, cell_pert_frac: float = 0.0, atom_pert_dist: float = 0.0) List[str] [source]
- Parameters
- numb_confs int
Number of configurations to generate
- concentration List[List[float]] or List[float] or None
If List[float], the concentrations of each element. The length of the list should be the same as the type_map. If List[List[float]], a list of concentrations (List[float]) is randomly picked from the List. If None, the elements are assumed to be of equal concentration.
- cell_pert_frac float
fraction of cell perturbation
- atom_pert_dist float
the atom perturbation distance (unit angstrom).
- Returns
- conf_list List[dpdata.System]
A list of generated confs in dpdata.System.
- class dpgen2.conf.alloy_conf.AlloyConfGenerator(numb_confs, lattice: Union[System, Tuple[str, float]], replicate: Optional[Union[List[int], Tuple[int], int]] = None, concentration: Optional[Union[List[List[float]], List[float]]] = None, cell_pert_frac: float = 0.0, atom_pert_dist: float = 0.0)[source]
Bases:
ConfGenerator
- Parameters
- numb_confs int
Number of configurations to generate
- lattice Union[dpdata.System, Tuple[str, float]]
Lattice of the alloy confs. Can be a dpdata.System (the lattice in a dpdata.System) or a Tuple[str, float] (a pair of lattice type and lattice constant). The lattice type can be "bcc", "fcc", "hcp", "sc" or "diamond"
- replicate Union[List[int], Tuple[int], int]
replicate of the lattice
- concentration List[List[float]] or List[float] or None
If List[float], the concentrations of each element. The length of the list should be the same as the type_map. If List[List[float]], a list of concentrations (List[float]) is randomly picked from the List. If None, the elements are assumed to be of equal concentration.
- cell_pert_frac float
fraction of cell perturbation
- atom_pert_dist float
the atom perturbation distance (unit angstrom).
Methods
generate(type_map)
    Method of generating configurations.
get_file_content(type_map[, fmt])
    Get the file content of configurations
normalize_config([data, strict])
    Normalize the argument.
args
- generate(type_map) MultiSystems [source]
Method of generating configurations.
- Parameters
- type_map: List[str]
The type map.
- Returns
- confs: dpdata.MultiSystems
The returned configurations in dpdata.MultiSystems format
- dpgen2.conf.alloy_conf.generate_alloy_conf_file_content(lattice: Union[System, Tuple[str, float]], type_map: List[str], numb_confs, replicate: Optional[Union[List[int], Tuple[int], int]] = None, concentration: Optional[Union[List[List[float]], List[float]]] = None, cell_pert_frac: float = 0.0, atom_pert_dist: float = 0.0, fmt: str = 'lammps/lmp')[source]
dpgen2.conf.conf_generator module
- class dpgen2.conf.conf_generator.ConfGenerator[source]
Bases:
ABC
Methods
generate(type_map)
    Method of generating configurations.
get_file_content(type_map[, fmt])
    Get the file content of configurations
normalize_config([data, strict])
    Normalize the argument.
args
- abstract generate(type_map) MultiSystems [source]
Method of generating configurations.
- Parameters
- type_map: List[str]
The type map.
- Returns
- confs: dpdata.MultiSystems
The returned configurations in dpdata.MultiSystems format
dpgen2.conf.file_conf module
- class dpgen2.conf.file_conf.FileConfGenerator(files: Union[str, List[str]], fmt: str = 'auto', prefix: Optional[str] = None, remove_pbc: Optional[bool] = False)[source]
Bases:
ConfGenerator
Methods
generate(type_map)
    Method of generating configurations.
get_file_content(type_map[, fmt])
    Get the file content of configurations
normalize_config([data, strict])
    Normalize the argument.
args
- generate(type_map) MultiSystems [source]
Method of generating configurations.
- Parameters
- type_map: List[str]
The type map.
- Returns
- confs: dpdata.MultiSystems
The returned configurations in dpdata.MultiSystems format
dpgen2.conf.unit_cells module
dpgen2.entrypoint package
Submodules
dpgen2.entrypoint.args module
- dpgen2.entrypoint.args.submit_args(default_step_config={'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}})[source]
dpgen2.entrypoint.common module
dpgen2.entrypoint.download module
dpgen2.entrypoint.main module
- dpgen2.entrypoint.main.main_parser() ArgumentParser [source]
DPGEN2 commandline options argument parser.
- Returns
- argparse.ArgumentParser
the argument parser
Notes
This function is used by documentation.
dpgen2.entrypoint.showkey module
dpgen2.entrypoint.status module
dpgen2.entrypoint.submit module
- dpgen2.entrypoint.submit.make_concurrent_learning_op(train_style: str = 'dp', explore_style: str = 'lmp', fp_style: str = 'vasp', prep_train_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_train_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, prep_explore_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_explore_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, prep_fp_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_fp_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 
'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, select_confs_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, collect_data_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, cl_step_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: Optional[List[PathLike]] = None)[source]
- dpgen2.entrypoint.submit.resubmit_concurrent_learning(wf_config, wfid, list_steps=False, reuse=None, old_style=False, replace_scheduler=False)[source]
dpgen2.entrypoint.watch module
dpgen2.entrypoint.workflow module
dpgen2.exploration package
Subpackages
- class dpgen2.exploration.render.traj_render.TrajRender[source]
Bases:
ABC
Methods
get_confs(traj, id_selected[, type_map, ...])
    Get configurations from trajectory by selection.
get_model_devi(files)
    Get model deviations from recording files.
- abstract get_confs(traj: List[Path], id_selected: List[List[int]], type_map: Optional[List[str]] = None, conf_filters: Optional[ConfFilters] = None) MultiSystems [source]
Get configurations from trajectory by selection.
- Parameters
- traj: List[Path]
Trajectory files
- id_selected: List[List[int]]
The selected frames. id_selected[ii][jj] is the jj-th selected frame from the ii-th trajectory. id_selected[ii] may be an empty list.
- type_map: List[str]
The type map.
- Returns
- ms: dpdata.MultiSystems
The configurations in dpdata.MultiSystems format
- abstract get_model_devi(files: List[Path]) Tuple[List[ndarray], Optional[List[ndarray]]] [source]
Get model deviations from recording files.
- Parameters
- files: List[Path]
The paths to the model deviation recording files
- Returns
- model_devis: Tuple[List[np.array], Union[List[np.array],None]]
A tuple. model_devis[0] is the force model deviations, model_devis[1] is the virial model deviations. The model_devis[1] can be None. If not None, model_devis[i] is List[np.array], where np.array is a one-dimensional array. The first dimension of model_devis[i] is the trajectory (same size as len(files)), while the second dimension is the frame.
- class dpgen2.exploration.render.traj_render_lammps.TrajRenderLammps(nopbc: bool = False)[source]
Bases:
TrajRender
Methods
get_confs(trajs, id_selected[, type_map, ...])
    Get configurations from trajectory by selection.
get_model_devi(files)
    Get model deviations from recording files.
- get_confs(trajs: List[Path], id_selected: List[List[int]], type_map: Optional[List[str]] = None, conf_filters: Optional[ConfFilters] = None) MultiSystems [source]
Get configurations from trajectory by selection.
- Parameters
- traj: List[Path]
Trajectory files
- id_selected: List[List[int]]
The selected frames. id_selected[ii][jj] is the jj-th selected frame from the ii-th trajectory. id_selected[ii] may be an empty list.
- type_map: List[str]
The type map.
- Returns
- ms: dpdata.MultiSystems
The configurations in dpdata.MultiSystems format
- get_model_devi(files: List[Path]) Tuple[List[ndarray], Optional[List[ndarray]]] [source]
Get model deviations from recording files.
- Parameters
- files: List[Path]
The paths to the model deviation recording files
- Returns
- model_devis: Tuple[List[np.array], Union[List[np.array],None]]
A tuple. model_devis[0] is the force model deviations, model_devis[1] is the virial model deviations. The model_devis[1] can be None. If not None, model_devis[i] is List[np.array], where np.array is a one-dimensional array. The first dimension of model_devis[i] is the trajectory (same size as len(files)), while the second dimension is the frame.
- class dpgen2.exploration.report.report.ExplorationReport[source]
Bases:
ABC
Methods
clear()
    Clear the report
converged()
    If the exploration is converged
get_candidate_ids([max_nframes])
    Get indexes of candidate configurations
no_candidate()
    If no candidate configuration is found
print(stage_idx, idx_in_stage, iter_idx)
    Print the report
print_header()
    Print the header of report
record(md_f[, md_v])
    Record the model deviations of the trajectories
- abstract get_candidate_ids(max_nframes: Optional[int] = None) List[List[int]] [source]
Get indexes of candidate configurations
- Parameters
- max_nframes int
The maximal number of frames of candidates.
- Returns
- idx: List[List[int]]
The frame indices of candidate configurations. idx[ii][jj] is the frame index of the jj-th candidate of the ii-th trajectory.
- abstract record(md_f: List[ndarray], md_v: Optional[List[ndarray]] = None)[source]
Record the model deviations of the trajectories
- Parameters
- md_f List[np.ndarray]
The force model deviations. md_f[ii][jj] is the force model deviation of the jj-th frame of the ii-th trajectory.
- md_v Optional[List[np.ndarray]]
The virial model deviations. md_v[ii][jj] is the virial model deviation of the jj-th frame of the ii-th trajectory.
- class dpgen2.exploration.report.report_trust_levels.ExplorationReportTrustLevels(trust_level, conv_accuracy)[source]
Bases:
ExplorationReport
Methods
clear()
    Clear the report
converged()
    If the exploration is converged
get_candidate_ids([max_nframes])
    Get indexes of candidate configurations
no_candidate()
    If no candidate configuration is found
print(stage_idx, idx_in_stage, iter_idx)
    Print the report
print_header()
    Print the header of report
record(md_f[, md_v_])
    Record the model deviations of the trajectories
accurate_ratio
candidate_ratio
failed_ratio
- get_candidate_ids(max_nframes: Optional[int] = None) List[List[int]] [source]
Get indexes of candidate configurations
- Parameters
- max_nframes int
The maximal number of frames of candidates.
- Returns
- idx: List[List[int]]
The frame indices of candidate configurations. idx[ii][jj] is the frame index of the jj-th candidate of the ii-th trajectory.
- record(md_f: List[ndarray], md_v_: Optional[List[ndarray]] = None)[source]
Record the model deviations of the trajectories
- Parameters
- md_f List[np.ndarray]
The force model deviations. md_f[ii][jj] is the force model deviation of the jj-th frame of the ii-th trajectory.
- md_v_ Optional[List[np.ndarray]]
The virial model deviations. md_v_[ii][jj] is the virial model deviation of the jj-th frame of the ii-th trajectory.
- class dpgen2.exploration.scheduler.convergence_check_stage_scheduler.ConvergenceCheckStageScheduler(stage: ExplorationStage, selector: ConfSelector, max_numb_iter: Optional[int] = None, fatal_at_max: bool = True)[source]
Bases:
StageScheduler
Methods
complete()
    Tell if the stage is complete
converged()
    Tell if the stage is converged
force_complete()
    Force complete the stage
get_reports()
    Return all exploration reports
next_iteration()
    Return the index of the next iteration
plan_next_iteration([report, trajs])
    Make the plan for the next iteration of the stage.
reached_max_iteration
- get_reports()[source]
Return all exploration reports
- Returns
- reports List[ExplorationReport]
the reports
- next_iteration()[source]
Return the index of the next iteration
- Returns
- index int
the index of the next iteration
- plan_next_iteration(report: Optional[ExplorationReport] = None, trajs: Optional[List[Path]] = None) Tuple[bool, Optional[ExplorationTaskGroup], Optional[ConfSelector]] [source]
Make the plan for the next iteration of the stage.
It checks the report of the current and all historical iterations of the stage, and tells if the iterations are converged. If not converged, it will plan the next iteration for the stage.
- Parameters
- hist_reports: List[ExplorationReport]
The historical exploration report of the stage. If this is the first iteration of the stage, this list is empty.
- report ExplorationReport
The exploration report of this iteration.
- confs: List[Path]
A list of configurations generated during the exploration. May be used to generate new configurations for the next iteration.
- Returns
- stg_complete: bool
If the stage completed. Two cases may happen: 1. converged. 2. when not fatal_at_max, not converged but reached max number of iterations.
- task: ExplorationTaskGroup
A ExplorationTaskGroup defining the exploration of the next iteration. Should be None if the stage is converged.
- conf_selector: ConfSelector
The configuration selector for the next iteration. Should be None if the stage is converged.
- class dpgen2.exploration.scheduler.scheduler.ExplorationScheduler[source]
Bases:
object
The exploration scheduler.
Methods
add_stage_scheduler(stage_scheduler)
    Add stage scheduler.
complete()
    Tell if all stages are converged.
force_stage_complete()
    Force complete the current stage
get_convergence_ratio()
    Get the accurate, candidate and failed ratios of the iterations
get_iteration()
    Get the index of the current iteration.
get_stage()
    Get the index of current stage.
get_stage_of_iterations()
    Get the stage index and the index in the stage of iterations.
plan_next_iteration([report, trajs])
    Make the plan for the next DPGEN iteration.
print_convergence
print_last_iteration
- add_stage_scheduler(stage_scheduler: StageScheduler)[source]
Add stage scheduler.
All added schedulers can be treated as a list (order matters). Once one stage is converged, the exploration proceeds to the next stage.
- Parameters
- stage_scheduler: StageScheduler
The added stage scheduler
- get_convergence_ratio()[source]
Get the accurate, candidate and failed ratios of the iterations
- Returns
- accu np.ndarray
The accurate ratio. The length of the array equals the number of iterations.
- cand np.ndarray
The candidate ratio. The length of the array equals the number of iterations.
- fail np.ndarray
The failed ratio. The length of the array equals the number of iterations.
- get_iteration()[source]
Get the index of the current iteration.
The iteration index increases when self.plan_next_iteration returns a valid lmp_task_grp and conf_selector for the next iteration.
- get_stage()[source]
Get the index of current stage.
Stage index increases when the previous stage converges. Usually called after self.plan_next_iteration.
- plan_next_iteration(report: Optional[ExplorationReport] = None, trajs: Optional[List[Path]] = None) Tuple[bool, Optional[ExplorationTaskGroup], Optional[ConfSelector]] [source]
Make the plan for the next DPGEN iteration.
- Parameters
- report ExplorationReport
The exploration report of this iteration.
- confs: List[Path]
A list of configurations generated during the exploration. May be used to generate new configurations for the next iteration.
- Returns
- complete: bool
If all the DPGEN stages complete.
- task: ExplorationTaskGroup
A ExplorationTaskGroup defining the exploration of the next iteration. Should be None if converged.
- conf_selector: ConfSelector
The configuration selector for the next iteration. Should be None if converged.
- class dpgen2.exploration.scheduler.stage_scheduler.StageScheduler[source]
Bases:
ABC
The scheduler for an exploration stage.
Methods
complete()
    Tell if the stage is complete
converged()
    Tell if the stage is converged
force_complete()
    Force complete the stage
get_reports()
    Return all exploration reports
next_iteration()
    Return the index of the next iteration
plan_next_iteration(report, trajs)
    Make the plan for the next iteration of the stage.
- abstract complete() bool [source]
Tell if the stage is complete
- Returns
- converged bool
if the stage is complete
- abstract converged() bool [source]
Tell if the stage is converged
- Returns
- converged bool
the convergence
- abstract get_reports() List[ExplorationReport] [source]
Return all exploration reports
- Returns
- reports List[ExplorationReport]
the reports
- abstract next_iteration() int [source]
Return the index of the next iteration
- Returns
- index int
the index of the next iteration
- abstract plan_next_iteration(report: ExplorationReport, trajs: List[Path]) Tuple[bool, ExplorationTaskGroup, ConfSelector] [source]
Make the plan for the next iteration of the stage.
It checks the report of the current and all historical iterations of the stage, and tells if the iterations are converged. If not converged, it will plan the next iteration for the stage.
- Parameters
- hist_reports: List[ExplorationReport]
The historical exploration report of the stage. If this is the first iteration of the stage, this list is empty.
- report ExplorationReport
The exploration report of this iteration.
- confs: List[Path]
A list of configurations generated during the exploration. May be used to generate new configurations for the next iteration.
- Returns
- stg_complete: bool
If the stage completed. Two cases may happen: 1. converged. 2. when not fatal_at_max, not converged but reached max number of iterations.
- task: ExplorationTaskGroup
A ExplorationTaskGroup defining the exploration of the next iteration. Should be None if the stage is converged.
- conf_selector: ConfSelector
The configuration selector for the next iteration. Should be None if the stage is converged.
- class dpgen2.exploration.selector.conf_filter.ConfFilter[source]
Bases:
ABC
Methods
check(coords, cell, atom_types, nopbc)
    Check if the configuration is valid.
- abstract check(coords: ndarray, cell: ndarray, atom_types: ndarray, nopbc: bool) bool [source]
Check if the configuration is valid.
- Parameters
- coords numpy.array
The coordinates, numpy array of shape natoms x 3
- cell numpy.array
The cell tensor, numpy array of shape 3 x 3
- atom_types numpy.array
The atom types, numpy array of shape natoms
- nopbc bool
If no periodic boundary condition.
- Returns
- valid bool
True if the configuration is a valid configuration, else False.
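As an illustration, a check of this shape could reject configurations with unphysically close atoms (a toy rule written for this document that ignores the cell and periodic boundary conditions; not a dpgen2 built-in filter):

```python
import numpy as np

# Toy ConfFilter-style check: a configuration is valid when every pair of
# atoms is at least `dmin` angstrom apart (cell and pbc deliberately ignored).
def check_min_dist(coords: np.ndarray, dmin: float = 0.5) -> bool:
    diff = coords[:, None, :] - coords[None, :, :]   # pairwise displacement vectors
    dist = np.linalg.norm(diff, axis=-1)             # pairwise distances
    iu = np.triu_indices(len(coords), k=1)           # unique atom pairs
    return bool((dist[iu] >= dmin).all())
```

A real ConfFilter.check additionally receives cell, atom_types and nopbc, so distances under periodic boundary conditions would use the minimum-image convention.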
- class dpgen2.exploration.selector.conf_selector_frame.ConfSelectorFrames(traj_render: TrajRender, report: ExplorationReport, max_numb_sel: Optional[int] = None, conf_filters: Optional[ConfFilters] = None)[source]
Bases:
ConfSelector
Select frames from trajectories as confs.
- Parameters
- trust_level TrustLevel
The trust level
- conf_filter ConfFilters
The configuration filter
Methods
select
(trajs, model_devis[, type_map])Select configurations
- select(trajs: List[Path], model_devis: List[Path], type_map: Optional[List[str]] = None) Tuple[List[Path], ExplorationReport] [source]
Select configurations
- Parameters
- trajs List[Path]
A list of Paths to trajectory files generated by LAMMPS
- model_devis List[Path]
A list of Paths to model deviation files generated by LAMMPS. Each line of a file has 7 numbers: frame_id md_v_max md_v_min md_v_mean md_f_max md_f_min md_f_mean, where md stands for model deviation, v for virial and f for force
- type_map List[str]
The type_map of the systems
- Returns
- confs List[Path]
The selected configurations, stored in a folder in deepmd/npy format; can be parsed as dpdata.MultiSystems. The list only has one item.
- report ExplorationReport
The exploration report recording the status of the exploration.
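Given the 7-column layout documented above, a model deviation file can be read with numpy (an illustrative helper; load_model_devi is a made-up name, and comment lines starting with # are skipped by np.loadtxt):

```python
import numpy as np

# Read a LAMMPS model deviation file with the documented column order:
# frame_id md_v_max md_v_min md_v_mean md_f_max md_f_min md_f_mean
def load_model_devi(fname):
    data = np.loadtxt(fname, ndmin=2)
    md_f_max = data[:, 4]   # column 4: per-frame max force model deviation
    md_v_max = data[:, 1]   # column 1: per-frame max virial model deviation
    return md_f_max, md_v_max
```

The per-frame max force deviation is the quantity usually compared against the trust levels when selecting candidate frames.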
- dpgen2.exploration.task.lmp.lmp_input.make_lmp_input(conf_file: str, ensemble: str, graphs: List[str], nsteps: int, dt: float, neidelay: Optional[int], trj_freq: int, mass_map: List[float], temp: float, tau_t: float = 0.1, pres: Optional[float] = None, tau_p: float = 0.5, use_clusters: bool = False, relative_f_epsilon: Optional[float] = None, relative_v_epsilon: Optional[float] = None, pka_e: Optional[float] = None, ele_temp_f: Optional[float] = None, ele_temp_a: Optional[float] = None, nopbc: bool = False, max_seed: int = 1000000, deepmd_version='2.0', trj_seperate_files=True)[source]
- class dpgen2.exploration.task.conf_sampling_task_group.ConfSamplingTaskGroup[source]
Bases:
ExplorationTaskGroup
- Attributes
task_list
Get the list of ExplorationTask
Methods
add_group
(group)Add another group to the group.
add_task
(task)Add one task to the group.
count
(value)index
(value, [start, [stop]])Raises ValueError if the value is not present.
set_conf
(conf_list[, n_sample, random_sample])Set the configurations of exploration
clear
- set_conf(conf_list: List[str], n_sample: Optional[int] = None, random_sample: bool = False)[source]
Set the configurations of exploration
- Parameters
- conf_list: List[str]
A list of file contents
- n_sample: Optional[int]
Number of samples drawn from the conf_list each time make_task is called. If set to None, n_sample is set to the length of the conf_list.
- random_sample: bool
If True the confs are randomly sampled; otherwise they are consecutively sampled from the conf_list
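The n_sample / random_sample semantics described above can be sketched as a standalone helper (a hypothetical function illustrating the documented behavior, not the dpgen2 implementation):

```python
import random
from typing import List, Optional


def sample_confs(
    conf_list: List[str],
    n_sample: Optional[int] = None,
    random_sample: bool = False,
    start: int = 0,
) -> List[str]:
    """Draw n_sample confs, either at random or consecutively (wrapping)."""
    if n_sample is None:
        # Default: take as many samples as there are configurations.
        n_sample = len(conf_list)
    if random_sample:
        return [random.choice(conf_list) for _ in range(n_sample)]
    # Consecutive sampling wraps around the end of the list.
    return [conf_list[(start + i) % len(conf_list)] for i in range(n_sample)]
```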
- class dpgen2.exploration.task.lmp_template_task_group.LmpTemplateTaskGroup[source]
Bases:
ConfSamplingTaskGroup
- Attributes
task_list
Get the list of ExplorationTask
Methods
add_group
(group)Add another group to the group.
add_task
(task)Add one task to the group.
count
(value)index
(value, [start, [stop]])Raises ValueError if the value is not present.
set_conf
(conf_list[, n_sample, random_sample])Set the configurations of exploration
clear
make_cont
make_task
set_lmp
- make_task() ExplorationTaskGroup [source]
- class dpgen2.exploration.task.npt_task_group.NPTTaskGroup[source]
Bases:
ConfSamplingTaskGroup
- Attributes
task_list
Get the list of ExplorationTask
Methods
add_group
(group)Add another group to the group.
add_task
(task)Add one task to the group.
count
(value)index
(value, [start, [stop]])Raises ValueError if the value is not present.
make_task
()Make the LAMMPS task group.
set_conf
(conf_list[, n_sample, random_sample])Set the configurations of exploration
set_md
(numb_models, mass_map, temps[, ...])Set MD parameters
clear
- make_task() ExplorationTaskGroup [source]
Make the LAMMPS task group.
- Returns
- task_grp: ExplorationTaskGroup
The returned lammps task group. The number of tasks is nconf*nT*nP. nconf is set by the n_sample parameter of set_conf. nT and nP are the lengths of the temps and press parameters of set_md.
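For example, the task count of an NPT group follows directly from the rule above (the numbers here are hypothetical):

```python
# Suppose set_conf drew 5 configurations (n_sample=5) and set_md was
# given 3 temperatures and 2 pressures.
n_conf, n_temps, n_press = 5, 3, 2

# Per the make_task docstring, the group contains nconf * nT * nP tasks.
n_tasks = n_conf * n_temps * n_press
print(n_tasks)  # 30
```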
- set_md(numb_models, mass_map, temps: List[float], press: Optional[List[float]] = None, ens: str = 'npt', dt: float = 0.001, nsteps: int = 1000, trj_freq: int = 10, tau_t: float = 0.1, tau_p: float = 0.5, pka_e: Optional[float] = None, neidelay: Optional[int] = None, no_pbc: bool = False, use_clusters: bool = False, relative_f_epsilon: Optional[float] = None, relative_v_epsilon: Optional[float] = None, ele_temp_f: Optional[float] = None, ele_temp_a: Optional[float] = None)[source]
Set MD parameters
- class dpgen2.exploration.task.stage.ExplorationStage[source]
Bases:
object
The exploration stage.
Methods
add_task_group
(grp)Add an exploration group
clear
()Clear all exploration groups.
make_task
()Make the LAMMPS task group.
- add_task_group(grp: ExplorationTaskGroup)[source]
Add an exploration group
- Parameters
- grp: ExplorationTaskGroup
The added exploration task group
- make_task() ExplorationTaskGroup [source]
Make the LAMMPS task group.
- Returns
- task_grp: ExplorationTaskGroup
The returned lammps task group. The number of tasks equals the sum of the numbers of tasks in all the exploration task groups added to the stage.
- class dpgen2.exploration.task.task.ExplorationTask[source]
Bases:
object
Define the files needed by an exploration task.
Examples
>>> # This example dumps all files needed by the task.
>>> files = exploration_task.files()
>>> for file_name, file_content in files.items():
...     with open(file_name, 'w') as fp:
...         fp.write(file_content)
Methods
add_file
(fname, fcont)Add file to the task
files
()Get all files for the task.
- class dpgen2.exploration.task.task.ExplorationTaskGroup[source]
Bases:
Sequence
A group of exploration tasks. Implemented as a list of ExplorationTask.
- Attributes
task_list
Get the list of ExplorationTask
Methods
add_group
(group)Add another group to the group.
add_task
(task)Add one task to the group.
count
(value)index
(value, [start, [stop]])Raises ValueError if the value is not present.
clear
- add_group(group: ExplorationTaskGroup)[source]
Add another group to the group.
- add_task(task: ExplorationTask)[source]
Add one task to the group.
- property task_list: List[ExplorationTask]
Get the list of ExplorationTask
- class dpgen2.exploration.task.task.FooTask(conf_name='conf.lmp', conf_cont='', inpu_name='in.lammps', inpu_cont='')[source]
Bases:
ExplorationTask
Methods
add_file
(fname, fcont)Add file to the task
files
()Get all files for the task.
- class dpgen2.exploration.task.task.FooTaskGroup(numb_task)[source]
Bases:
ExplorationTaskGroup
- Attributes
task_list
Get the list of ExplorationTask
Methods
add_group
(group)Add another group to the group.
add_task
(task)Add one task to the group.
count
(value)index
(value, [start, [stop]])Raises ValueError if the value is not present.
clear
- property task_list
Get the list of ExplorationTask
dpgen2.flow package
Submodules
dpgen2.flow.dpgen_loop module
- class dpgen2.flow.dpgen_loop.ConcurrentLearning(name: str, block_op: OPTemplate, step_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: Optional[List[PathLike]] = None)[source]
Bases:
Steps
- Attributes
- init_keys
- input_artifacts
- input_parameters
- loop_keys
- output_artifacts
- output_parameters
Methods
add
(step)Add a step or a list of parallel steps to the steps
convert_to_argo
handle_key
run
- property init_keys
- property input_artifacts
- property input_parameters
- property loop_keys
- property output_artifacts
- property output_parameters
- class dpgen2.flow.dpgen_loop.ConcurrentLearningLoop(name: str, block_op: OPTemplate, step_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: Optional[List[PathLike]] = None)[source]
Bases:
Steps
- Attributes
- input_artifacts
- input_parameters
- keys
- output_artifacts
- output_parameters
Methods
add
(step)Add a step or a list of parallel steps to the steps
convert_to_argo
handle_key
run
- property input_artifacts
- property input_parameters
- property keys
- property output_artifacts
- property output_parameters
- class dpgen2.flow.dpgen_loop.MakeBlockId(*args, **kwargs)[source]
Bases:
OP
Methods
execute
(ip)Run the OP
get_input_sign
()Get the signature of the inputs
get_output_sign
()Get the signature of the outputs
exec_sign_check
function
get_info
get_input_artifact_link
get_input_artifact_storage_key
get_opio_info
get_output_artifact_link
get_output_artifact_storage_key
- class dpgen2.flow.dpgen_loop.SchedulerWrapper(*args, **kwargs)[source]
Bases:
OP
Methods
execute
(ip)Run the OP
get_input_sign
()Get the signature of the inputs
get_output_sign
()Get the signature of the outputs
exec_sign_check
function
get_info
get_input_artifact_link
get_input_artifact_storage_key
get_opio_info
get_output_artifact_link
get_output_artifact_storage_key
dpgen2.fp package
Submodules
dpgen2.fp.gaussian module
Prep and Run Gaussian tasks.
- class dpgen2.fp.gaussian.GaussianInputs(**kwargs: Any)[source]
Bases:
object
Methods
args
()The arguments of the GaussianInputs class.
- class dpgen2.fp.gaussian.PrepGaussian(*args, **kwargs)[source]
Bases:
PrepFp
Methods
execute
(ip)Execute the OP.
get_input_sign
()Get the signature of the inputs
get_output_sign
()Get the signature of the outputs
prep_task
(conf_frame, inputs)Define how one Gaussian task is prepared.
exec_sign_check
function
get_info
get_input_artifact_link
get_input_artifact_storage_key
get_opio_info
get_output_artifact_link
get_output_artifact_storage_key
- prep_task(conf_frame: System, inputs: GaussianInputs)[source]
Define how one Gaussian task is prepared.
- Parameters
- conf_framedpdata.System
One frame of configuration in the dpdata format.
- inputs: GaussianInputs
The GaussianInputs object handles all other input files of the task.
- class dpgen2.fp.gaussian.RunGaussian(*args, **kwargs)[source]
Bases:
RunFp
Methods
args
()The argument definition of the run_task method.
execute
(ip)Execute the OP.
get_input_sign
()Get the signature of the inputs
get_output_sign
()Get the signature of the outputs
input_files
()The mandatory input files to run a Gaussian task.
normalize_config
([data, strict])Normalize the argument.
optional_input_files
()The optional input files to run a Gaussian task.
run_task
(command, out)Defines how one FP task runs
exec_sign_check
function
get_info
get_input_artifact_link
get_input_artifact_storage_key
get_opio_info
get_output_artifact_link
get_output_artifact_storage_key
- static args() List[Argument] [source]
The argument definition of the run_task method.
- Returns
- arguments: List[dargs.Argument]
List of dargs.Argument defines the arguments of run_task method.
- input_files() List[str] [source]
The mandatory input files to run a Gaussian task.
- Returns
- files: List[str]
A list of mandatory input file names.
- optional_input_files() List[str] [source]
The optional input files to run a Gaussian task.
- Returns
- files: List[str]
A list of optional input file names.
- run_task(command: str, out: str) Tuple[str, str] [source]
Defines how one FP task runs
- Parameters
- command: str
The command for running the Gaussian task
- out: str
The name of the output data file.
- Returns
- out_name: str
The file name of the output data in the dpdata.LabeledSystem format.
- log_name: str
The file name of the log.
dpgen2.fp.prep_fp module
- class dpgen2.fp.prep_fp.PrepFp(*args, **kwargs)[source]
Bases:
OP
Prepares the working directories for first-principles (FP) tasks.
A list of working directories (same length as ip[“confs”]) containing all files needed to start FP tasks will be created. The paths of the directories will be returned as op[“task_paths”]. The identities of the tasks are returned as op[“task_names”].
Methods
execute
(ip)Execute the OP.
get_input_sign
()Get the signature of the inputs
get_output_sign
()Get the signature of the outputs
prep_task
(conf_frame, inputs)Define how one FP task is prepared.
exec_sign_check
function
get_info
get_input_artifact_link
get_input_artifact_storage_key
get_opio_info
get_output_artifact_link
get_output_artifact_storage_key
- execute(ip: OPIO) OPIO [source]
Execute the OP.
- Parameters
- ipdict
Input dict with components:
config : (dict) Should have config[‘inputs’], which defines the input files of the FP task.
confs : (Artifact(List[Path])) Configurations for the FP tasks. Stored in folders as deepmd/npy format. Can be parsed as dpdata.MultiSystems.
- Returns
- opdict
Output dict with components:
task_names: (List[str]) The name of tasks. Will be used as the identities of the tasks. The names of different tasks are different.
task_paths: (Artifact(List[Path])) The prepared working paths of the tasks. Contains all input files needed to start the FP. The order of the Paths should be consistent with op[“task_names”]
- abstract prep_task(conf_frame: System, inputs: Any)[source]
Define how one FP task is prepared.
- Parameters
- conf_framedpdata.System
One frame of configuration in the dpdata format.
- inputs: Any
The class object handles all other input files of the task. For example, pseudopotential file, k-point file and so on.
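The PrepFp pattern described above (one working directory per configuration frame, each populated by prep_task) can be sketched without importing dpgen2. In this hypothetical sketch, plain strings stand in for dpdata configurations and a dict stands in for the inputs object, and the file names are illustrative:

```python
from pathlib import Path
from typing import Any, List


def prep_all(frames: List[str], inputs: Any, root: Path) -> List[Path]:
    """Create one task directory per frame, mimicking PrepFp.execute."""
    task_paths = []
    for idx, frame in enumerate(frames):
        task_path = root / f"task.{idx:06d}"
        task_path.mkdir(parents=True, exist_ok=True)
        # prep_task would write code-specific inputs (POSCAR, INCAR, ...);
        # here we simply dump the frame text and the shared settings.
        (task_path / "conf.txt").write_text(frame)
        (task_path / "settings.txt").write_text(str(inputs))
        task_paths.append(task_path)
    return task_paths
```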
dpgen2.fp.run_fp module
- class dpgen2.fp.run_fp.RunFp(*args, **kwargs)[source]
Bases:
OP
Execute a first-principles (FP) task.
A working directory named task_name is created. All input files are copied or symbolically linked to the directory task_name. The FP command is executed from the directory task_name. The op[“labeled_data”] in “deepmd/npy” format (HDF5 in the future) provided by dpdata will be created.
Methods
args
()The argument definition of the run_task method.
execute
(ip)Execute the OP.
get_input_sign
()Get the signature of the inputs
get_output_sign
()Get the signature of the outputs
input_files
()The mandatory input files to run a FP task.
normalize_config
([data, strict])Normalize the argument.
optional_input_files
()The optional input files to run a FP task.
run_task
(**kwargs)Defines how one FP task runs
exec_sign_check
function
get_info
get_input_artifact_link
get_input_artifact_storage_key
get_opio_info
get_output_artifact_link
get_output_artifact_storage_key
- abstract static args() List[Argument] [source]
The argument definition of the run_task method.
- Returns
- arguments: List[dargs.Argument]
List of dargs.Argument defines the arguments of run_task method.
- execute(ip: OPIO) OPIO [source]
Execute the OP.
- Parameters
- ipdict
Input dict with components:
config: (dict) The config of FP task. Should have config[‘run’], which defines the runtime configuration of the FP task.
task_name: (str) The name of task.
task_path: (Artifact(Path)) The path that contains all input files prepared by PrepFp.
- Returns
- Output dict with components:
- log: (Artifact(Path)) The log file of FP.
- labeled_data: (Artifact(Path)) The path to the labeled data in “deepmd/npy” format provided by dpdata.
- abstract input_files() List[str] [source]
The mandatory input files to run a FP task.
- Returns
- files: List[str]
A list of mandatory input file names.
- classmethod normalize_config(data: Dict = {}, strict: bool = True) Dict [source]
Normalize the argument.
- Parameters
- data: Dict
The input dict of arguments.
- strict: bool
Strictly check the arguments.
- Returns
- data: Dict
The normalized arguments.
- abstract optional_input_files() List[str] [source]
The optional input files to run a FP task.
- Returns
- files: List[str]
A list of optional input file names.
- abstract run_task(**kwargs) Tuple[str, str] [source]
Defines how one FP task runs
- Parameters
- kwargs
Keyword args defined by the developer. The fp/run_config section of the input file will be passed to this function.
- Returns
- out_name: str
The file name of the output data. Should be in dpdata.LabeledSystem format.
- log_name: str
The file name of the log.
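A run_task implementation typically shells out to the FP code, captures its log, and converts the raw output to labeled data. The following is a minimal sketch of that pattern, not the dpgen2 implementation: the file names are hypothetical and the dpdata.LabeledSystem conversion step is elided.

```python
import subprocess
from typing import Tuple


def run_task(command: str, out: str, log: str = "task.log") -> Tuple[str, str]:
    """Run the FP command, capture its log, and name the output data file.

    A real RunFp subclass would additionally parse the raw FP output with
    dpdata.LabeledSystem and dump it to `out` in deepmd/npy format.
    """
    with open(log, "w") as fp:
        ret = subprocess.run(
            command, shell=True, stdout=fp, stderr=subprocess.STDOUT
        )
    ret.check_returncode()  # Fail loudly if the FP command failed.
    return out, log
```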
dpgen2.fp.vasp module
- class dpgen2.fp.vasp.PrepVasp(*args, **kwargs)[source]
Bases:
PrepFp
Methods
execute
(ip)Execute the OP.
get_input_sign
()Get the signature of the inputs
get_output_sign
()Get the signature of the outputs
prep_task
(conf_frame, vasp_inputs)Define how one Vasp task is prepared.
exec_sign_check
function
get_info
get_input_artifact_link
get_input_artifact_storage_key
get_opio_info
get_output_artifact_link
get_output_artifact_storage_key
- prep_task(conf_frame: System, vasp_inputs: VaspInputs)[source]
Define how one Vasp task is prepared.
- Parameters
- conf_framedpdata.System
One frame of configuration in the dpdata format.
- vasp_inputs: VaspInputs
The VaspInputs object handles all other input files of the task.
- class dpgen2.fp.vasp.RunVasp(*args, **kwargs)[source]
Bases:
RunFp
Methods
args
()The argument definition of the run_task method.
execute
(ip)Execute the OP.
get_input_sign
()Get the signature of the inputs
get_output_sign
()Get the signature of the outputs
input_files
()The mandatory input files to run a vasp task.
normalize_config
([data, strict])Normalize the argument.
optional_input_files
()The optional input files to run a vasp task.
run_task
(command, out, log)Defines how one FP task runs
exec_sign_check
function
get_info
get_input_artifact_link
get_input_artifact_storage_key
get_opio_info
get_output_artifact_link
get_output_artifact_storage_key
- static args()[source]
The argument definition of the run_task method.
- Returns
- arguments: List[dargs.Argument]
List of dargs.Argument defines the arguments of run_task method.
- input_files() List[str] [source]
The mandatory input files to run a vasp task.
- Returns
- files: List[str]
A list of mandatory input file names.
- optional_input_files() List[str] [source]
The optional input files to run a vasp task.
- Returns
- files: List[str]
A list of optional input file names.
- run_task(command: str, out: str, log: str) Tuple[str, str] [source]
Defines how one FP task runs
- Parameters
- command: str
The command for running the VASP task
- out: str
The name of the output data file.
- log: str
The name of the log file
- Returns
- out_name: str
The file name of the output data in the dpdata.LabeledSystem format.
- log_name: str
The file name of the log.
dpgen2.fp.vasp_input module
- class dpgen2.fp.vasp_input.VaspInputs(kspacing: Union[float, List[float]], incar: str, pp_files: Dict[str, str], kgamma: bool = True)[source]
Bases:
object
- Attributes
- incar_template
- potcars
Methods
args
incar_from_file
make_kpoints
make_potcar
normalize_config
potcars_from_file
- property incar_template
- property potcars
dpgen2.op package
Submodules
dpgen2.op.collect_data module
- class dpgen2.op.collect_data.CollectData(*args, **kwargs)[source]
Bases:
OP
Collect labeled data and add to the iteration dataset.
After running FP tasks, the labeled data are scattered in the task directories. This OP collects the labeled data into one data directory and adds it to the iteration data. The data generated by this iteration will be placed in the ip[“name”] subdirectory of the iteration data directory.
Methods
execute
(ip)Execute the OP.
get_input_sign
()Get the signature of the inputs
get_output_sign
()Get the signature of the outputs
exec_sign_check
function
get_info
get_input_artifact_link
get_input_artifact_storage_key
get_opio_info
get_output_artifact_link
get_output_artifact_storage_key
- execute(ip: OPIO) OPIO [source]
Execute the OP. This OP collects the data scattered in the directories given by ip[‘labeled_data’] into one dpdata.MultiSystems and stores it in a directory named name. This directory is appended to the list iter_data.
- Parameters
- ipdict
Input dict with components:
name: (str) The name of this iteration. The data generated by this iteration will be placed in a sub-directory of name.
labeled_data: (Artifact(List[Path])) The paths of labeled data generated by FP tasks of the current iteration.
iter_data: (Artifact(List[Path])) The data paths of previous iterations.
- Returns
- Output dict with components:
- iter_data: (Artifact(List[Path])) The data paths of previous and the current iteration data.
dpgen2.op.md_settings module
- class dpgen2.op.md_settings.MDSettings(ens: str, dt: float, nsteps: int, trj_freq: int, temps: Optional[List[float]] = None, press: Optional[List[float]] = None, tau_t: float = 0.1, tau_p: float = 0.5, pka_e: Optional[float] = None, neidelay: Optional[int] = None, no_pbc: bool = False, use_clusters: bool = False, relative_epsilon: Optional[float] = None, relative_v_epsilon: Optional[float] = None, ele_temp_f: Optional[float] = None, ele_temp_a: Optional[float] = None)[source]
Bases:
object
Methods
to_str
dpgen2.op.prep_dp_train module
- class dpgen2.op.prep_dp_train.PrepDPTrain(*args, **kwargs)[source]
Bases:
OP
Prepares the working directories for DP training tasks.
A list of numb_models working directories containing all files needed to start the training tasks will be created. The paths of the directories will be returned as op[“task_paths”]. The identities of the tasks are returned as op[“task_names”].
Methods
execute
(ip)Execute the OP.
get_input_sign
()Get the signature of the inputs
get_output_sign
()Get the signature of the outputs
exec_sign_check
function
get_info
get_input_artifact_link
get_input_artifact_storage_key
get_opio_info
get_output_artifact_link
get_output_artifact_storage_key
- execute(ip: OPIO) OPIO [source]
Execute the OP.
- Parameters
- ipdict
Input dict with components:
template_script: (str or List[str]) A template of the training script. Can be a str or List[str]. In the case of str, all training tasks share the same training input template, and the only difference is the random number used to initialize the network parameters. In the case of List[str], one training task uses one template from the list. The random numbers used to initialize the network parameters are different. The length of the list should be the same as numb_models.
numb_models: (int) Number of DP models to train.
- Returns
- opdict
Output dict with components:
task_names: (List[str]) The name of tasks. Will be used as the identities of the tasks. The names of different tasks are different.
task_paths: (Artifact(List[Path])) The prepared working paths of the tasks. The order of the Paths should be consistent with op[“task_names”]
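The template handling described above (one shared template versus one template per model, differing only in the random seeds) can be sketched as follows. This is a hypothetical helper, not the dpgen2 implementation: real DeePMD-kit scripts carry seeds at version-dependent nested keys, while this sketch uses a flat top-level "seed" key for illustration.

```python
import random
from typing import List, Union


def expand_templates(
    template_script: Union[dict, List[dict]], numb_models: int
) -> List[dict]:
    """Produce one training script per model, each with its own seed."""
    if isinstance(template_script, list):
        # One template per model; lengths must match numb_models.
        assert len(template_script) == numb_models
        scripts = [dict(t) for t in template_script]
    else:
        # Shared template: copy it numb_models times.
        scripts = [dict(template_script) for _ in range(numb_models)]
    for script in scripts:
        # Randomize the network-initialization seed (placement simplified).
        script["seed"] = random.randrange(2**31)
    return scripts
```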
dpgen2.op.prep_lmp module
- class dpgen2.op.prep_lmp.PrepLmp(*args, **kwargs)[source]
Bases:
OP
Prepare the working directories for LAMMPS tasks.
A list of working directories (defined by ip[“task”]) containing all files needed to start LAMMPS tasks will be created. The paths of the directories will be returned as op[“task_paths”]. The identities of the tasks are returned as op[“task_names”].
Methods
execute
(ip)Execute the OP.
get_input_sign
()Get the signature of the inputs
get_output_sign
()Get the signature of the outputs
exec_sign_check
function
get_info
get_input_artifact_link
get_input_artifact_storage_key
get_opio_info
get_output_artifact_link
get_output_artifact_storage_key
- execute(ip: OPIO) OPIO [source]
Execute the OP.
- Parameters
- ipdict
Input dict with components:
lmp_task_grp: (Artifact(Path)) Can be pickle-loaded as an ExplorationTaskGroup, which defines the LAMMPS tasks.
- Returns
- opdict
Output dict with components:
task_names: (List[str]) The name of tasks. Will be used as the identities of the tasks. The names of different tasks are different.
task_paths: (Artifact(List[Path])) The prepared working paths of the tasks. Contains all input files needed to start the LAMMPS simulation. The order of the Paths should be consistent with op[“task_names”]
dpgen2.op.run_dp_train module
- class dpgen2.op.run_dp_train.RunDPTrain(*args, **kwargs)[source]
Bases:
OP
Execute a DP training task. Train and freeze a DP model.
A working directory named task_name is created. All input files are copied or symbolically linked to the directory task_name. The DeePMD-kit training and freezing commands are executed from the directory task_name.
Methods
execute
(ip)Execute the OP.
get_input_sign
()Get the signature of the inputs
get_output_sign
()Get the signature of the outputs
decide_init_model
exec_sign_check
function
get_info
get_input_artifact_link
get_input_artifact_storage_key
get_opio_info
get_output_artifact_link
get_output_artifact_storage_key
normalize_config
skip_training
training_args
write_data_to_input_script
write_other_to_input_script
- execute(ip: OPIO) OPIO [source]
Execute the OP.
- Parameters
- ipdict
Input dict with components:
config: (dict) The config of training task. Check RunDPTrain.training_args for definitions.
task_name: (str) The name of training task.
task_path: (Artifact(Path)) The path that contains all input files prepared by PrepDPTrain.
init_model: (Artifact(Path)) A frozen model to initialize the training.
init_data: (Artifact(List[Path])) Initial training data.
iter_data: (Artifact(List[Path])) Training data generated in the DPGEN iterations.
- Returns
- Output dict with components:
- script: (Artifact(Path)) The training script.
- model: (Artifact(Path)) The trained frozen model.
- lcurve: (Artifact(Path)) The learning curve file.
- log: (Artifact(Path)) The log file of training.
- dpgen2.op.run_dp_train.config_args()
dpgen2.op.run_lmp module
- class dpgen2.op.run_lmp.RunLmp(*args, **kwargs)[source]
Bases:
OP
Execute a LAMMPS task.
A working directory named task_name is created. All input files are copied or symbolically linked to the directory task_name. The LAMMPS command is executed from the directory task_name. The trajectory and the model deviation will be stored in the files op[“traj”] and op[“model_devi”], respectively.
Methods
execute
(ip)Execute the OP.
get_input_sign
()Get the signature of the inputs
get_output_sign
()Get the signature of the outputs
exec_sign_check
function
get_info
get_input_artifact_link
get_input_artifact_storage_key
get_opio_info
get_output_artifact_link
get_output_artifact_storage_key
lmp_args
normalize_config
- execute(ip: OPIO) OPIO [source]
Execute the OP.
- Parameters
- ipdict
Input dict with components:
config: (dict) The config of lmp task. Check RunLmp.lmp_args for definitions.
task_name: (str) The name of the task.
task_path: (Artifact(Path)) The path that contains all input files prepared by PrepLmp.
models: (Artifact(List[Path])) The frozen models used to estimate the model deviation. The first model will be used to drive the molecular dynamics simulation.
- Returns
- Output dict with components:
- log: (Artifact(Path)) The log file of LAMMPS.
- traj: (Artifact(Path)) The output trajectory.
- model_devi: (Artifact(Path)) The model deviation. The order of recorded model deviations should be consistent with the order of frames in traj.
- dpgen2.op.run_lmp.config_args()
dpgen2.op.select_confs module
- class dpgen2.op.select_confs.SelectConfs(*args, **kwargs)[source]
Bases:
OP
Select configurations from exploration trajectories for labeling.
Methods
execute
(ip)Execute the OP.
get_input_sign
()Get the signature of the inputs
get_output_sign
()Get the signature of the outputs
exec_sign_check
function
get_info
get_input_artifact_link
get_input_artifact_storage_key
get_opio_info
get_output_artifact_link
get_output_artifact_storage_key
- execute(ip: OPIO) OPIO [source]
Execute the OP.
- Parameters
- ipdict
Input dict with components:
conf_selector: (ConfSelector) Configuration selector.
type_map: (List[str]) The type map.
trajs: (Artifact(List[Path])) The trajectories generated in the exploration.
model_devis: (Artifact(List[Path])) The file storing the model deviation of the trajectory. The order of model deviation storage is consistent with that of the trajectories. The order of frames of one model deviation storage is also consistent with that of the corresponding trajectory.
- Returns
- Output dict with components:
- report: (ExplorationReport) The report on the exploration.
- conf: (Artifact(List[Path])) The selected configurations.
dpgen2.superop package
Submodules
dpgen2.superop.block module
- class dpgen2.superop.block.ConcurrentLearningBlock(name: str, prep_run_dp_train_op: OPTemplate, prep_run_lmp_op: OPTemplate, select_confs_op: OP, prep_run_fp_op: OPTemplate, collect_data_op: OP, select_confs_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, collect_data_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: Optional[List[PathLike]] = None)[source]
Bases:
Steps
- Attributes
- input_artifacts
- input_parameters
- keys
- output_artifacts
- output_parameters
Methods
add
(step)Add a step or a list of parallel steps to the steps
convert_to_argo
handle_key
run
- property input_artifacts
- property input_parameters
- property keys
- property output_artifacts
- property output_parameters
dpgen2.superop.prep_run_dp_train module
- class dpgen2.superop.prep_run_dp_train.PrepRunDPTrain(name: str, prep_train_op: OP, run_train_op: OP, prep_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: Optional[List[PathLike]] = None)[source]
Bases:
Steps
- Attributes
- input_artifacts
- input_parameters
- keys
- output_artifacts
- output_parameters
Methods
add
(step)Add a step or a list of parallel steps to the steps
convert_to_argo
handle_key
run
- property input_artifacts
- property input_parameters
- property keys
- property output_artifacts
- property output_parameters
dpgen2.superop.prep_run_fp module
- class dpgen2.superop.prep_run_fp.PrepRunFp(name: str, prep_op: OP, run_op: OP, prep_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: Optional[List[PathLike]] = None)[source]
Bases:
Steps
- Attributes
- input_artifacts
- input_parameters
- keys
- output_artifacts
- output_parameters
Methods
add
(step)Add a step or a list of parallel steps to the steps
convert_to_argo
handle_key
run
- property input_artifacts
- property input_parameters
- property keys
- property output_artifacts
- property output_parameters
dpgen2.superop.prep_run_lmp module
- class dpgen2.superop.prep_run_lmp.PrepRunLmp(name: str, prep_op: OP, run_op: OP, prep_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: Optional[List[PathLike]] = None)[source]
Bases:
Steps
- Attributes
- input_artifacts
- input_parameters
- keys
- output_artifacts
- output_parameters
Methods
add
(step)Add a step or a list of parallel steps to the steps
convert_to_argo
handle_key
run
- property input_artifacts
- property input_parameters
- property keys
- property output_artifacts
- property output_parameters
dpgen2.utils package
Submodules
dpgen2.utils.bohrium_config module
dpgen2.utils.chdir module
dpgen2.utils.dflow_config module
dpgen2.utils.dflow_query module
- dpgen2.utils.dflow_query.find_slice_ranges(keys: List[str], sliced_subkey: str)[source]
Find the ranges of sliced OPs that match the pattern ‘iter-[0-9]*--{sliced_subkey}-[0-9]*’.
- dpgen2.utils.dflow_query.get_all_schedulers(wf: Any, keys: List[str])[source]
Get the output Scheduler of all the iterations.
- dpgen2.utils.dflow_query.get_last_iteration(keys: List[str])[source]
Get the index of the last iteration from a list of step keys.
- dpgen2.utils.dflow_query.get_last_scheduler(wf: Any, keys: List[str])[source]
Get the output Scheduler of the last successful iteration.
- dpgen2.utils.dflow_query.matched_step_key(all_keys: List[str], step_keys: Optional[List[str]] = None)[source]
Return the keys in all_keys that match any of the step_keys.
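The step keys these helpers operate on follow the ‘iter-xxxxxx--subkey’ convention shown elsewhere in this documentation, so e.g. the last iteration index can be recovered with a small parser. The sketch below is hypothetical, not the dpgen2 implementation; keys that do not start with an iteration prefix (such as scheduler steps) are simply skipped:

```python
import re
from typing import List


def last_iteration(keys: List[str]) -> int:
    """Return the largest iteration index found in a list of step keys."""
    pat = re.compile(r"iter-(\d+)--")
    iters = []
    for key in keys:
        m = pat.match(key)
        if m:
            iters.append(int(m.group(1)))
    return max(iters)
```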
dpgen2.utils.download_dpgen2_artifacts module
- class dpgen2.utils.download_dpgen2_artifacts.DownloadDefinition[source]
Bases:
object
Methods
add_def
add_input
add_output
- dpgen2.utils.download_dpgen2_artifacts.download_dpgen2_artifacts(wf: Workflow, key: str, prefix: Optional[str] = None, chk_pnt: bool = False)[source]
Download the artifacts of a step. The key should be of the format ‘iter-xxxxxx--subkey-of-step-xxxxxx’. The input and output artifacts will be downloaded to prefix/iter-xxxxxx/key-of-step/inputs/ and prefix/iter-xxxxxx/key-of-step/outputs/, respectively.
The downloaded input and output artifacts of steps are defined by op_download_setting.