DPGEN2’s documentation

DPGEN2 is the 2nd generation of the Deep Potential GENerator.

Important

The project DPGEN2 is licensed under GNU LGPLv3.0.

Guide on dpgen2 commands

One may use dpgen2 through its command line interface (CLI). Full documentation of the CLI is found here

Submit a workflow

The dpgen2 workflow can be submitted via the submit command

dpgen2 submit input.json

where input.json is the input script. A guide to writing the script is found here. When a workflow is submitted, an ID (WFID) of the workflow will be printed for later reference.

Check the convergence of a workflow

The convergence of the stages of a workflow can be checked with the status command. It prints the indexes of the finished stages and iterations, and the accurate, candidate, and failed ratios of the explored configurations in each iteration.

$ dpgen2 status input.json WFID
#   stage  id_stg.    iter.      accu.      cand.      fail.
# Stage    0  --------------------
        0        0        0     0.8333     0.1667     0.0000
        0        1        1     0.7593     0.2407     0.0000
        0        2        2     0.7778     0.2222     0.0000
        0        3        3     1.0000     0.0000     0.0000
# Stage    0  converged YES  reached max numb iterations NO 
# All stages converged
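
The accurate/candidate/failed ratios above come from comparing the model deviation of each explored configuration against the trust levels (f_trust_lo and f_trust_hi, defined later in the exploration section of the input script). The following is an illustrative sketch of that classification, not dpgen2's actual code; the function name is made up:

```python
# Illustrative sketch (not dpgen2's actual implementation): classify explored
# frames by their max force model deviation against f_trust_lo / f_trust_hi.
def classify(max_devis, f_trust_lo, f_trust_hi):
    counts = {"accu": 0, "cand": 0, "fail": 0}
    for d in max_devis:
        if d < f_trust_lo:
            counts["accu"] += 1   # accurate: models agree, no labeling needed
        elif d < f_trust_hi:
            counts["cand"] += 1   # candidate: selected for FP labeling
        else:
            counts["fail"] += 1   # failed: likely a non-physical frame
    n = len(max_devis)
    return {k: v / n for k, v in counts.items()}
```

For example, classify([0.01, 0.03, 0.10, 0.70], 0.05, 0.50) gives accu=0.5, cand=0.25, fail=0.25, the kind of ratios printed in the table above.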

Watch the progress of a workflow

The progress of a workflow can be watched on-the-fly

$ dpgen2 watch input.json WFID
INFO:root:steps iter-000000--prep-run-train----------------------- finished
INFO:root:steps iter-000000--prep-run-lmp------------------------- finished
INFO:root:steps iter-000000--prep-run-fp-------------------------- finished
INFO:root:steps iter-000000--collect-data------------------------- finished
INFO:root:steps iter-000001--prep-run-train----------------------- finished
INFO:root:steps iter-000001--prep-run-lmp------------------------- finished
...

The artifacts can be downloaded on-the-fly with the -d flag. Note that existing files are automatically skipped if one sets dflow_config["archive_mode"] = None.

Show the keys of steps

Each dpgen2 step is assigned a unique key. The keys of the finished steps can be checked with the showkey command

$ dpgen2 showkey input.json WFID

                   0 : iter-000000--prep-train
              1 -> 4 : iter-000000--run-train-0000 -> iter-000000--run-train-0003
                   5 : iter-000000--prep-lmp
             6 -> 14 : iter-000000--run-lmp-000000 -> iter-000000--run-lmp-000008
                  15 : iter-000000--select-confs
                  16 : iter-000000--prep-fp
            17 -> 20 : iter-000000--run-fp-000000 -> iter-000000--run-fp-000003
                  21 : iter-000000--collect-data
                  22 : iter-000000--scheduler
                  23 : iter-000000--id
                  24 : iter-000001--prep-train
            25 -> 28 : iter-000001--run-train-0000 -> iter-000001--run-train-0003
                  29 : iter-000001--prep-lmp
            30 -> 38 : iter-000001--run-lmp-000000 -> iter-000001--run-lmp-000008
                  39 : iter-000001--select-confs
                  40 : iter-000001--prep-fp
            41 -> 44 : iter-000001--run-fp-000000 -> iter-000001--run-fp-000003
                  45 : iter-000001--collect-data
                  46 : iter-000001--scheduler
                  47 : iter-000001--id

Resubmit a workflow

If a workflow stopped abnormally, one may submit a new workflow that reuses some steps of the old workflow.

dpgen2 resubmit input.json WFID --reuse 0-41

The steps of workflow WFID with indexes 0-41 (0<=id<41; note that 41 is not included) will be reused in the new workflow. The indexes of the steps are printed by dpgen2 showkey. In this example, all the steps before iter-000001--run-fp-000000 will be reused in the new workflow.
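
The --reuse argument accepts step-index ranges. A small sketch of how a spec like "0-41" can be expanded into step indexes, with the half-open semantics described above (the upper bound is excluded); the helper name is made up:

```python
# Hypothetical helper: expand reuse specs like "0-41" or "7" into step indexes.
# Ranges are half-open: "0-41" means 0 <= id < 41, so 41 itself is excluded.
def expand_reuse(specs):
    ids = []
    for spec in specs:
        if "-" in spec:
            lo, hi = spec.split("-")
            ids.extend(range(int(lo), int(hi)))  # upper bound excluded
        else:
            ids.append(int(spec))
    return ids
```

Thus expand_reuse(["0-41"]) yields 41 indexes (0 through 40), matching the example above where step 41 (iter-000001--run-fp-000000) is not reused.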

Command line interface

DPGEN2: concurrent learning workflow generating the machine learning potential energy models.

usage: dpgen2 [-h] [-v]
              {submit,resubmit,showkey,status,download,watch,terminate,stop,suspend,delete,retry,resume}
              ...

Named Arguments

-v, --version

show program’s version number and exit

Valid subcommands

command

Possible choices: submit, resubmit, showkey, status, download, watch, terminate, stop, suspend, delete, retry, resume

Sub-commands

submit

Submit DPGEN2 workflow

dpgen2 submit [-h] [-o] CONFIG
Positional Arguments
CONFIG

the config file in json format defining the workflow.

Named Arguments
-o, --old-compatible

compatible with old-style input script used in dpgen2 < 0.0.6.

Default: False

resubmit

Submit a DPGEN2 workflow reusing steps from an existing workflow

dpgen2 resubmit [-h] [-l] [-u REUSE [REUSE ...]] [-k] [-o] CONFIG ID
Positional Arguments
CONFIG

the config file in json format defining the workflow.

ID

the ID of the existing workflow.

Named Arguments
-l, --list

list the Steps of the existing workflow.

Default: False

-u, --reuse

specify which Steps to reuse.

-k, --keep-schedule

if set, keep the schedule of the old workflow; otherwise use the schedule defined in the input file

Default: False

-o, --old-compatible

compatible with old-style input script used in dpgen2 < 0.0.6.

Default: False

showkey

Print the keys of the successful DPGEN2 steps

dpgen2 showkey [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the existing workflow.

status

Print the status (stage, iteration, convergence) of the DPGEN2 workflow

dpgen2 status [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the existing workflow.

download

Download the artifacts of DPGEN2 steps

dpgen2 download [-h] [-k KEYS [KEYS ...]] [-p PREFIX] [-n] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the existing workflow.

Named Arguments
-k, --keys

the keys of the steps to download. If not provided, all artifacts are downloaded

-p, --prefix

the prefix of the path storing the downloaded artifacts

-n, --no-check-point

if specified, download regardless of whether checkpoints exist.

Default: True

watch

Watch a DPGEN2 workflow

dpgen2 watch [-h] [-k KEYS [KEYS ...]] [-f FREQUENCY] [-d] [-p PREFIX] [-n]
             CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the existing workflow.

Named Arguments
-k, --keys

the subkeys to watch, for example ‘prep-run-train’ and ‘prep-run-lmp’

Default: [‘prep-run-train’, ‘prep-run-lmp’, ‘prep-run-fp’, ‘collect-data’]

-f, --frequency

the frequency of workflow status queries, in units of seconds

Default: 600.0

-d, --download

whether to download artifacts of a step when it finishes

Default: False

-p, --prefix

the prefix of the path storing the downloaded artifacts

-n, --no-check-point

if specified, download regardless of whether checkpoints exist.

Default: True

terminate

Terminate a DPGEN2 workflow.

dpgen2 terminate [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the workflow.

stop

Stop a DPGEN2 workflow.

dpgen2 stop [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the workflow.

suspend

Suspend a DPGEN2 workflow.

dpgen2 suspend [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the workflow.

delete

Delete a DPGEN2 workflow.

dpgen2 delete [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the workflow.

retry

Retry a DPGEN2 workflow.

dpgen2 retry [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the workflow.

resume

Resume a DPGEN2 workflow.

dpgen2 resume [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the workflow.

Guide on writing input scripts for dpgen2 commands

Preliminaries

The reader of this document is assumed to be familiar with the concurrent learning algorithm that dpgen2 implements. If not, one may check this paper.

The input script for all dpgen2 commands

For all dpgen2 commands, one needs to provide the dflow global configurations. For example,

    "dflow_config" : {
	"host" : "http://address.of.the.host:port"
    },
    "dflow_s3_config" : {
	"endpoint" : "address.of.the.s3.sever:port"
    },

dpgen2 simply passes all keys of "dflow_config" to dflow.config and all keys of "dflow_s3_config" to dflow.s3_config.
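
The pass-through can be sketched as follows. The dicts below are plain-Python stand-ins for dflow.config and dflow.s3_config (so the snippet runs without dflow installed); the real globals live in the dflow package:

```python
import json

# Stand-ins for dflow.config and dflow.s3_config (both behave like dicts).
dflow_config = {}
dflow_s3_config = {}

input_script = json.loads("""
{
    "dflow_config": {"host": "http://address.of.the.host:port"},
    "dflow_s3_config": {"endpoint": "address.of.the.s3.server:port"}
}
""")

# The pass-through: every key of "dflow_config" lands in dflow.config,
# and every key of "dflow_s3_config" lands in dflow.s3_config.
dflow_config.update(input_script.get("dflow_config", {}))
dflow_s3_config.update(input_script.get("dflow_s3_config", {}))
```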

The input script for submit and resubmit

The full documentation of the submit and resubmit script can be found here. This documentation provides a fast guide on how to write the input script.

In the input script of dpgen2 submit and dpgen2 resubmit, one needs to define the workflow and how it is executed. One may find an example input script in the dpgen2 Al-Mg alloy example.

The definition of the workflow can be provided by the following sections:

Inputs

This section provides the inputs to start a dpgen2 workflow. An example for the Al-Mg alloy

"inputs": {
	"type_map":		["Al", "Mg"],
	"mass_map":		[27, 24],
	"init_data_sys":	[
		"path/to/init/data/system/0",
		"path/to/init/data/system/1"
	]
}

The key "init_data_sys" provides the initial training data to kick-off the training of deep potential (DP) models.

Training

This section defines how a model is trained.

"train" : {
	"type" : "dp",
	"numb_models" : 4,
	"config" : {},
	"template_script" : {
		"_comment" : "omitted content of tempalte script"
	},
	"_comment" : "all"
}

The "type" : "dp" tell the traning method is "dp", i.e. calling DeePMD-kit to train DP models. The "config" key defines the training configs, see the full documentation. The "template_script" provides the template training script in json format.

Exploration

This section defines how the configuration space is explored.

"explore" : {
	"type" : "lmp",
	"config" : {
		"command": "lmp -var restart 0"
	},
	"max_numb_iter" :	5,
	"conv_accuracy" :	0.9,
	"fatal_at_max" :	false,
	"f_trust_lo":		0.05,
	"f_trust_hi":		0.50,
	"configurations":	[
		{
		"lattice" : ["fcc", 4.57],
		"replicate" : [2, 2, 2],
		"numb_confs" : 30,
		"concentration" : [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
		},
		{
		"lattice" : ["fcc", 4.57],
		"replicate" : [3, 3, 3],
		"numb_confs" : 30,
		"concentration" : [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
		}
	],
	"stages":	[
	    [
		{
		    "_comment" : "stage 0, task group 0",
		    "type" : "lmp-md",
		    "ensemble": "nvt", "nsteps":  50, "temps": [50, 100], "trj_freq": 10,
		    "conf_idx": [0], "n_sample" : 3
		},
		{
		    "_comment" : "stage 1, task group 0",
		    "type" : "lmp-template",
		    "lmp" : "template.lammps", "plm" : "template.plumed", 
		    "trj_freq" : 10, "revisions" : {"V_NSTEPS" : [40], "V_TEMP" : [150, 200]},
		    "conf_idx": [0], "n_sample" : 3
		}
	    ],
	    [
		{
		    "_comment" : "stage 1, task group 0",
		    "type" : "lmp-md",
		    "ensemble": "npt", "nsteps":  50, "press": [1e0], "temps": [50, 100, 200], "trj_freq": 10,
		    "conf_idx": [1], "n_sample" : 3
		}
	    ]
	]
}

The "type" : "lmp" means that configurations are explored by LAMMPS DPMD runs. The "config" key defines the lmp configs, see the full documentation. The "configurations" provides the initial configurations (coordinates of atoms and the simulation cell) of the DPMD simulations. It is a list. The elements of the list can be

  • list[str]: each string provides the path to a configuration file.

  • dict: Automatic alloy configuration generator. See the detailed doc of the allowed keys.

The "stages" defines the exploration stages. It is of type list[list[dict]]. The outer list enumerate the exploration stages, the inner list enumerate the task groups of the stage. Each dict defines a stage. See the full documentation of the target group for writting task groups.

"n_sample" tells the number of confgiruations randomly sampled from the set picked by "conf_idx" from configurations for each exploration task. All configurations has the equal possibility to be sampled. The default value of "n_sample" is null, in this case all picked configurations are sampled. In the example, we have 3 samples for stage 0 task group 0 and 2 thermodynamic states (NVT, T=50 and 100K), then the task group has 3x2=6 NVT DPMD tasks.

FP

This section defines the first-principles (FP) calculation.

"fp" : {
	"type" :	"vasp",
	"config" : {
		"command": "source /opt/intel/oneapi/setvars.sh && mpirun -n 16 vasp_std"
	},
	"task_max":	2,
	"pp_files":	{"Al" : "vasp/POTCAR.Al", "Mg" : "vasp/POTCAR.Mg"},
	"incar":         "vasp/INCAR",
	"_comment" : "all"
}

The "type" : "vasp" means that first-principles are VASP calculations. The "config" key defines the vasp configs, see the full documentation. The "task_max" key defines the maximal number of vasp calculations in each dpgen2 iteration. The "pp_files" and "incar" keys provides the pseudopotential files and the template incar file.

Configuration of dflow step

The execution units of dpgen2 are the dflow Steps. How each step is executed is defined by "step_configs".

"step_configs":{
	"prep_train_config" : {
		"_comment" : "content omitted"
	},
	"run_train_config" : {
		"_comment" : "content omitted"
	},
	"prep_explore_config" : {
		"_comment" : "content omitted"
	},
	"run_explore_config" : {
		"_comment" : "content omitted"
	},
	"prep_fp_config" : {
		"_comment" : "content omitted"
	},
	"run_fp_config" : {
		"_comment" : "content omitted"
	},
	"select_confs_config" : {
		"_comment" : "content omitted"
	},
	"collect_data_config" : {
		"_comment" : "content omitted"
	},
	"cl_step_config" : {
		"_comment" : "content omitted"
	},
	"_comment" : "all"
},

The configs for the prepare-training, run-training, prepare-exploration, run-exploration, prepare-fp, run-fp, select-configurations, collect-data and concurrent-learning steps are given correspondingly.

Readers are referred to this page for full documentation of the step configs.

Any of the configs in step_configs can be omitted. If so, the config of that step is set to the default step config, which is provided by the following section. For example,

"default_step_config" : {
	"template_config" : {
	    "image" : "dpgen2:x.x.x"
	}
},

The default_step_config is written in the same way as any step config in step_configs. One may refer to this page for full documentation.
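
The fallback described above can be sketched in a few lines; the function name is made up and dpgen2's actual resolution logic may differ (e.g. it may merge rather than replace):

```python
# Sketch of the fallback: a step whose config is omitted from "step_configs"
# uses "default_step_config" instead.
def resolve_step_config(name, step_configs, default_step_config):
    return step_configs.get(name, default_step_config)
```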

Arguments of the submit script

DPGEN2 configurations

Op configs

RunDPTrain

init_model_policy:
type: str, optional, default: no
argument path: init_model_policy

The policy of init-model training. It can be

  • ‘no’: No init-model training. Train from scratch.

  • ‘yes’: Do init-model training.

  • ‘old_data_larger_than:XXX’: Do init-model training if the training data size of the previous model is larger than XXX, where XXX is an integer.
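
The three policy forms can be decided as sketched below; this is an illustrative reimplementation, not dpgen2's source:

```python
# Illustrative decision logic for init_model_policy: returns True when
# init-model training should be performed instead of training from scratch.
def do_init_model(policy, old_data_size):
    if policy == "no":
        return False
    if policy == "yes":
        return True
    if policy.startswith("old_data_larger_than:"):
        threshold = int(policy.split(":", 1)[1])
        return old_data_size > threshold
    raise ValueError(f"unknown init_model_policy {policy!r}")
```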

init_model_old_ratio:
type: float, optional, default: 0.9
argument path: init_model_old_ratio

The frequency ratio of old data over new data

init_model_numb_steps:
type: int, optional, default: 400000, alias: init_model_stop_batch
argument path: init_model_numb_steps

The number of training steps when init-model

init_model_start_lr:
type: float, optional, default: 0.0001
argument path: init_model_start_lr

The start learning rate when init-model

init_model_start_pref_e:
type: float, optional, default: 0.1
argument path: init_model_start_pref_e

The start energy prefactor in loss when init-model

init_model_start_pref_f:
type: int | float, optional, default: 100
argument path: init_model_start_pref_f

The start force prefactor in loss when init-model

init_model_start_pref_v:
type: float, optional, default: 0.0
argument path: init_model_start_pref_v

The start virial prefactor in loss when init-model

RunLmp

command:
type: str, optional, default: lmp
argument path: command

The command of LAMMPS

RunVasp

Alloy configs

Task group configs

task_group_configs:
type: dict
argument path: task_group_configs

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: task_group_configs/type
possible choices: lmp-md, lmp-template

the type of the task group

When type is set to lmp-md (or its alias lmp-npt):

temps:
type: list, alias: Ts
argument path: task_group_configs[lmp-md]/temps

A list of temperatures in K. Also used to initialize the temperature

press:
type: list, optional, alias: Ps
argument path: task_group_configs[lmp-md]/press

A list of pressures in bar.

ens:
type: str, optional, default: nve, alias: ensemble
argument path: task_group_configs[lmp-md]/ens

The ensemble. Allowed options are ‘nve’, ‘nvt’, ‘npt’, ‘npt-a’, ‘npt-t’. ‘npt-a’ stands for anisotropic box sampling and ‘npt-t’ stands for triclinic box sampling.

dt:
type: float, optional, default: 0.001
argument path: task_group_configs[lmp-md]/dt

The time step

nsteps:
type: int, optional, default: 100
argument path: task_group_configs[lmp-md]/nsteps

The number of steps

trj_freq:
type: int, optional, default: 10, aliases: t_freq, trj_freq, traj_freq
argument path: task_group_configs[lmp-md]/trj_freq

The frequency of dumping configurations and thermodynamic states

tau_t:
type: float, optional, default: 0.05
argument path: task_group_configs[lmp-md]/tau_t

The time scale of thermostat

tau_p:
type: float, optional, default: 0.5
argument path: task_group_configs[lmp-md]/tau_p

The time scale of barostat

pka_e:
type: NoneType | float, optional, default: None
argument path: task_group_configs[lmp-md]/pka_e

The energy of primary knock-on atom

neidelay:
type: int | NoneType, optional, default: None
argument path: task_group_configs[lmp-md]/neidelay

The delay of updating the neighbor list

no_pbc:
type: bool, optional, default: False
argument path: task_group_configs[lmp-md]/no_pbc

Not using the periodic boundary condition

use_clusters:
type: bool, optional, default: False
argument path: task_group_configs[lmp-md]/use_clusters

Calculate atomic model deviation

relative_f_epsilon:
type: NoneType | float, optional, default: None
argument path: task_group_configs[lmp-md]/relative_f_epsilon

Calculate relative force model deviation

relative_v_epsilon:
type: NoneType | float, optional, default: None
argument path: task_group_configs[lmp-md]/relative_v_epsilon

Calculate relative virial model deviation

When type is set to lmp-template:

lmp_template_fname:
type: str, aliases: lmp_template, lmp
argument path: task_group_configs[lmp-template]/lmp_template_fname

The file name of lammps input template

plm_template_fname:
type: NoneType | str, optional, default: None, aliases: plm_template, plm
argument path: task_group_configs[lmp-template]/plm_template_fname

The file name of plumed input template

revisions:
type: dict, optional, default: {}
argument path: task_group_configs[lmp-template]/revisions
traj_freq:
type: int, optional, default: 10, aliases: t_freq, trj_freq
argument path: task_group_configs[lmp-template]/traj_freq

The frequency of dumping configurations and thermodynamic states

Step configs

template_config:
type: dict, optional, default: {'image': 'dptechnology/dpgen2:latest'}
argument path: template_config

The configs passed to the PythonOPTemplate.

image:
type: str, optional, default: dptechnology/dpgen2:latest
argument path: template_config/image

The image to run the step.

timeout:
type: int | NoneType, optional, default: None
argument path: template_config/timeout

The time limit of the OP. Unit is second.

retry_on_transient_error:
type: int | NoneType, optional, default: None
argument path: template_config/retry_on_transient_error

The number of retry times if a TransientError is raised.

timeout_as_transient_error:
type: bool, optional, default: False
argument path: template_config/timeout_as_transient_error

Treat the timeout as TransientError.

envs:
type: dict | NoneType, optional, default: None
argument path: template_config/envs

The environmental variables.

continue_on_failed:
type: bool, optional, default: False
argument path: continue_on_failed

Whether to continue the workflow when the step fails (FatalError, TransientError, a certain number of retries reached…).

continue_on_num_success:
type: int | NoneType, optional, default: None
argument path: continue_on_num_success

Only in the sliced OP case. Continue the workflow if a certain number of the sliced jobs are successful.

continue_on_success_ratio:
type: NoneType | float, optional, default: None
argument path: continue_on_success_ratio

Only in the sliced OP case. Continue the workflow if a certain ratio of the sliced jobs are successful.

parallelism:
type: int | NoneType, optional, default: None
argument path: parallelism

The parallelism for the step

executor:
type: dict | NoneType, optional, default: None
argument path: executor

The executor of the step.

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: executor/type
possible choices: lebesgue_v2, dispatcher

The type of the executor.

When type is set to lebesgue_v2:

extra:
type: dict, optional
argument path: executor[lebesgue_v2]/extra

The ‘extra’ key in the lebesgue executor. Note that we do not check whether the dict provided to the ‘extra’ key is valid.

scass_type:
type: str, optional
argument path: executor[lebesgue_v2]/extra/scass_type

The machine configuration.

program_id:
type: str, optional
argument path: executor[lebesgue_v2]/extra/program_id

The ID of the program.

job_type:
type: str, optional, default: container
argument path: executor[lebesgue_v2]/extra/job_type

The type of job.

template_cover_cmd_escape_bug:
type: bool, optional, default: True
argument path: executor[lebesgue_v2]/extra/template_cover_cmd_escape_bug

The key for hacking around a bug in Lebesgue.

When type is set to dispatcher:

Developers’ guide

  • The concurrent learning algorithm

  • Overview of the DPGEN2 implementation

  • The DPGEN2 workflow

  • How to contribute

The concurrent learning algorithm

DPGEN2 implements the concurrent learning algorithm named DP-GEN, described in this paper. Note that other types of workflows, like active learning, should be easy to implement within the infrastructure of DPGEN2.

The DP-GEN algorithm is iterative. In each iteration, four steps are consecutively executed: training, exploration, selection, and labeling.

  1. Training. A set of DP models are trained with the same dataset and the same hyperparameters. The only difference is the random seed initializing the model parameters.

  2. Exploration. One of the DP models is used to explore the configuration space. The strategy of exploration highly depends on the purpose of the application case of the model. The simulation technique for exploration can be molecular dynamics, Monte Carlo, structure search/optimization, enhanced sampling, or any combination of them. Current DPGEN2 only supports exploration based on molecular simulation platform LAMMPS.

  3. Selection. Not all the explored configurations are labeled, rather, the model prediction errors on the configurations are estimated by the model deviation, which is defined as the standard deviation in predictions of the set of the models. The critical configurations with large and not-that-large errors are selected for labeling. The configurations with very large errors are not selected because the large error is usually caused by non-physical configurations, e.g. overlapping atoms.

  4. Labeling. The selected configurations are labeled with the energy, forces, and virial calculated by a method of first-principles accuracy, typically density functional theory as implemented in VASP, Quantum Espresso, CP2K, etc. The labeled data are finally added to the training dataset to start the next iteration.

In each iteration, the quality of the model is improved by selecting and labeling more critical data and adding them to the training dataset. The DP-GEN iteration is converged when no more critical data can be selected.
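The four steps and the convergence criterion can be sketched as the following loop. All names are illustrative stand-ins; in DPGEN2 the loop is orchestrated by dflow steps, not a plain Python function:

```python
# Schematic sketch of the DP-GEN concurrent-learning iteration described above.
def dp_gen(train, explore, select, label, dataset, max_iter=10):
    for it in range(max_iter):
        models = train(dataset)              # several models, different seeds
        confs = explore(models[0])           # e.g. LAMMPS DPMD runs
        candidates = select(models, confs)   # critical frames, by model deviation
        if not candidates:                   # converged: no critical data left
            return models, it
        dataset = dataset + label(candidates)  # FP labeling extends the dataset
    return models, max_iter
```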

Overview of the DPGEN2 implementation

The implementation of DPGEN2 is based on the workflow platform dflow, a Python wrapper of Argo Workflows, an open-source container-native workflow engine on Kubernetes.

The DP-GEN algorithm is conceptually modeled as a computational graph. The implementation is then considered along two lines: the operators and the workflow.

  1. Operators. Operators are implemented in Python 3. The operators should be implemented and tested without the workflow.

  2. Workflow. Workflow is implemented on dflow. Ideally, the workflow is implemented and tested with all operators mocked.

The DPGEN2 workflow

The workflow of DPGEN2 is illustrated in the following figure

dpgen flowchart

In the center is the block operator, a super-OP (an OP composed of several OPs) for one DP-GEN iteration, i.e. the super-OP of the training, exploration, selection, and labeling steps. The inputs of the block OP are lmp_task_group, conf_selector and dataset.

  • lmp_task_group: definition of a group of LAMMPS tasks that explore the configuration space.

  • conf_selector: defines the rule by which the configurations are selected for labeling.

  • dataset: the training dataset.

The outputs of the block OP are

  • exploration_report: a report recording the results of the exploration, for example, how many configurations are accurate enough and how many are selected as candidates for labeling.

  • dataset_incr: the increment of the training dataset.

The dataset_incr is added to the training dataset.

The exploration_report is passed to the exploration_strategy OP, which implements the strategy of exploration. It reads the exploration_report generated by each iteration (block), then decides whether the iteration is converged. If not, it generates a group of LAMMPS tasks (lmp_task_group) and the criteria for selecting configurations (conf_selector). The lmp_task_group and conf_selector are then used by the block of the next iteration. This closes the iteration.

Inside the block operator

The inside of the super-OP block is displayed on the right-hand side of the figure. It contains the following steps, which together finish one DPGEN2 iteration:

  • prep_run_dp_train: prepares training tasks of DP models and runs them.

  • prep_run_lmp: prepares the LAMMPS exploration tasks and runs them.

  • select_confs: selects configurations for labeling from the explored configurations.

  • prep_run_fp: prepares and runs first-principles tasks.

  • collect_data: collects the dataset_incr and adds it to the dataset.

The exploration strategy

The exploration strategy defines how the configuration space is explored by the concurrent learning algorithm. The design of the exploration strategy is graphically illustrated in the following figure. The exploration is composed of stages. Only when the DP-GEN exploration has converged at one stage (no configuration with a large error is explored) does the exploration proceed to the next stage. The whole procedure is controlled by the exploration_scheduler. Each stage has its own stage scheduler, which talks to the exploration_scheduler to generate the schedule for the DP-GEN algorithm.

exploration strategy

Some concepts are explained below:

  • Exploration group. A group of LAMMPS tasks that share similar settings. For example, a group of NPT MD simulations in a certain thermodynamic space.

  • Exploration stage. The exploration_stage contains a list of exploration groups. It contains all information needed to define the lmp_task_group used by the block in the DP-GEN iteration.

  • Stage scheduler. It guarantees the convergence of the DP-GEN algorithm in each exploration_stage. If the exploration is not converged, the stage_scheduler generates lmp_task_group and conf_selector from the exploration_stage for the next iteration (probably with a different initial condition, i.e. different initial configurations and randomly generated initial velocity).

  • Exploration scheduler. The scheduler of the DP-GEN algorithm. When DP-GEN converges in one stage, it proceeds to the next stage until all planned stages are finished.
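
The interplay of stage scheduler and exploration scheduler can be sketched as follows. The class and method names are made up for illustration; the real classes live in dpgen2's exploration module:

```python
# Illustrative sketch of the stage-advancing logic: advance to the next
# exploration stage on convergence, otherwise run another DP-GEN iteration
# within the current stage (up to a maximum number of iterations).
class Scheduler:
    def __init__(self, n_stages, max_iter):
        self.stage, self.iteration = 0, 0
        self.n_stages, self.max_iter = n_stages, max_iter

    def report(self, converged):
        self.iteration += 1
        if converged:
            self.stage += 1      # stage done, move on
            self.iteration = 0   # iteration count restarts per stage
        elif self.iteration >= self.max_iter:
            raise RuntimeError("max number of iterations reached")
        return self.stage < self.n_stages  # True while stages remain
```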

How to contribute

Anyone interested in the DPGEN2 project may contribute OPs, workflows, and exploration strategies.

Operators

There are two types of OPs in DPGEN2

  • OP. An execution unit of the workflow. It can be roughly viewed as a piece of Python script that takes some inputs and gives some outputs. An OP cannot be used in dflow until it is embedded in a super-OP.

  • Super-OP. An execution unit that is composed of one or more OPs and/or super-OPs.

Technically, an OP is a Python class derived from dflow.python.OP. It serves as the PythonOPTemplate of a dflow.Step.

A super-OP is a Python class derived from dflow.Steps. It contains dflow.Step instances as building blocks and can be used as an OP template to generate a dflow.Step. For an explanation of the concepts dflow.Step and dflow.Steps, one may refer to the manual of dflow.

The super-OP PrepRunDPTrain

In the following, we take the PrepRunDPTrain super-OP as an example to illustrate how to write OPs in DPGEN2.

PrepRunDPTrain is a super-OP that prepares several DeePMD-kit training tasks and submits all of them. This super-OP is composed of two dflow.Steps built from the dflow.python.OPs PrepDPTrain and RunDPTrain.

from dflow import (
    Step,
    Steps,
)
from dflow.python import (
    PythonOPTemplate,
    OP,
    Slices,
)

class PrepRunDPTrain(Steps):
    def __init__(
            self,
            name : str,
            prep_train_op : OP,
            run_train_op : OP,
            prep_train_image : str = "dflow:v1.0",
            run_train_image : str = "dflow:v1.0",
    ):
		...
        self = _prep_run_dp_train(
            self, 
            self.step_keys,
            prep_train_op,
            run_train_op,
            prep_train_image = prep_train_image,
            run_train_image = run_train_image,
        )            

The constructor of PrepRunDPTrain takes the prepare-training OP, the run-training OP, and their Docker images as input; the construction is implemented in the internal method _prep_run_dp_train.

def _prep_run_dp_train(
        train_steps,
        step_keys,
        prep_train_op : OP = PrepDPTrain,
        run_train_op : OP = RunDPTrain,
        prep_train_image : str = "dflow:v1.0",
        run_train_image : str = "dflow:v1.0",
):
    prep_train = Step(
        ...
        template=PythonOPTemplate(
            prep_train_op,
            image=prep_train_image,
            ...
        ),
        ...
    )
    train_steps.add(prep_train)

    run_train = Step(
        ...
        template=PythonOPTemplate(
            run_train_op,
            image=run_train_image,
            ...
        ),
        ...
    )
    train_steps.add(run_train)

    train_steps.outputs.artifacts["scripts"]._from = run_train.outputs.artifacts["script"]
    train_steps.outputs.artifacts["models"]._from = run_train.outputs.artifacts["model"]
    train_steps.outputs.artifacts["logs"]._from = run_train.outputs.artifacts["log"]
    train_steps.outputs.artifacts["lcurves"]._from = run_train.outputs.artifacts["lcurve"]

    return train_steps	

In _prep_run_dp_train, two instances of dflow.Step, i.e. prep_train and run_train, generated from prep_train_op and run_train_op respectively, are added to train_steps. Both prep_train_op and run_train_op are OPs (Python classes derived from dflow.python.OP) that will be illustrated later. train_steps is an instance of dflow.Steps. The outputs of the second OP, run_train, are assigned to the outputs of train_steps.

The prep_train step prepares a list of paths, each of which contains all the files necessary to start a DeePMD-kit training task.

The run_train step slices the list of paths and assigns each item in the list to a DeePMD-kit task, executed by run_train_op. This is a very nice feature of dflow: the developer only needs to implement how one DeePMD-kit task is executed, and then all the items in the task list are executed in parallel. The following code shows how it works

    run_train = Step(
        'run-train',
        template=PythonOPTemplate(
            run_train_op,
            image=run_train_image,
            slices = Slices(
                "int('{{item}}')",
                input_parameter = ["task_name"],
                input_artifact = ["task_path", "init_model"],
                output_artifact = ["model", "lcurve", "log", "script"],
            ),
        ),
        parameters={
            "config" : train_steps.inputs.parameters["train_config"],
            "task_name" : prep_train.outputs.parameters["task_names"],
        },
        artifacts={
            'task_path' : prep_train.outputs.artifacts['task_paths'],
            "init_model" : train_steps.inputs.artifacts['init_models'],
            "init_data": train_steps.inputs.artifacts['init_data'],
            "iter_data": train_steps.inputs.artifacts['iter_data'],
        },
        with_sequence=argo_sequence(argo_len(prep_train.outputs.parameters["task_names"]), format=train_index_pattern),
        key = step_keys['run-train'],
    )

The input parameter "task_name" and the input artifacts "task_path" and "init_model" are sliced and supplied to each DeePMD-kit task. The output artifacts of the tasks ("model", "lcurve", "log" and "script") are stacked in the same order as the input lists. These lists are assigned as the outputs of train_steps by

    train_steps.outputs.artifacts["scripts"]._from = run_train.outputs.artifacts["script"]
    train_steps.outputs.artifacts["models"]._from = run_train.outputs.artifacts["model"]
    train_steps.outputs.artifacts["logs"]._from = run_train.outputs.artifacts["log"]
    train_steps.outputs.artifacts["lcurves"]._from = run_train.outputs.artifacts["lcurve"]

The OP RunDPTrain

We will take RunDPTrain as an example to illustrate how to implement an OP in DPGEN2. The source code of this OP is found here

First of all, an OP should be implemented as a class derived from dflow.python.OP.

The dflow.python.OP requires static type definitions for the input and output variables, i.e. the signatures of an OP. The input and output signatures of a dflow.python.OP are given by the classmethods get_input_sign and get_output_sign.

from dflow.python import (
    OP,
    OPIO,
    OPIOSign,
    Artifact,
)
class RunDPTrain(OP):
    @classmethod
    def get_input_sign(cls):
        return OPIOSign({
            "config" : dict,
            "task_name" : str,
            "task_path" : Artifact(Path),
            "init_model" : Artifact(Path),
            "init_data" : Artifact(List[Path]),
            "iter_data" : Artifact(List[Path]),
        })
    
    @classmethod
    def get_output_sign(cls):
        return OPIOSign({
            "script" : Artifact(Path),
            "model" : Artifact(Path),
            "lcurve" : Artifact(Path),
            "log" : Artifact(Path),
        })

All items not defined as Artifact are treated as parameters of the OP. The concepts of parameter and artifact are explained in the dflow documentation. In short, artifacts can be a pathlib.Path or a list of pathlib.Path, and are passed via the file system. All other data structures are treated as parameters and are passed as variables encoded in str. Therefore, large amounts of data should be stored in artifacts, while small pieces of information can be passed as parameters.

The operation of the OP is implemented in the method execute, which is run in a docker container. Again taking the execute method of RunDPTrain as an example

    @OP.exec_sign_check
    def execute(
            self,
            ip : OPIO,
    ) -> OPIO:
        ...
        task_name = ip['task_name']
        task_path = ip['task_path']
        init_model = ip['init_model']
        init_data = ip['init_data']
        iter_data = ip['iter_data']
        ...
        work_dir = Path(task_name)
        ...
        # here copy all files in task_path to work_dir
        ...
        with set_directory(work_dir):
            fplog = open('train.log', 'w')
            def clean_before_quit():
                fplog.close()
            # train model
            command = ['dp', 'train', train_script_name]
            ret, out, err = run_command(command)
            if ret != 0:
                clean_before_quit()
                raise FatalError('dp train failed')
            fplog.write(out)
            # freeze model
            ret, out, err = run_command(['dp', 'freeze', '-o', 'frozen_model.pb'])
            if ret != 0:
                clean_before_quit()
                raise FatalError('dp freeze failed')
            fplog.write(out)
            clean_before_quit()

        return OPIO({
            "script" : work_dir / train_script_name,
            "model" : work_dir / "frozen_model.pb",
            "lcurve" : work_dir / "lcurve.out",
            "log" : work_dir / "train.log",
        })

The input and output variables are recorded in the data structure dflow.python.OPIO, which is initialized from a Python dict. The keys of the input/output dict and the types of the input/output variables are checked against the signatures by the decorator OP.exec_sign_check. If any key or type does not match, an exception is raised.
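A minimal sketch of what such a signature check does (illustrative only; the actual OP.exec_sign_check in dflow is more elaborate):

```python
# Illustrative signature check in the spirit of OP.exec_sign_check;
# NOT the dflow implementation.
def check_sign(io: dict, sign: dict) -> None:
    """Raise if a key is missing or a value has the wrong type."""
    for key, typ in sign.items():
        if key not in io:
            raise KeyError(f"missing key: {key!r}")
        if not isinstance(io[key], typ):
            raise TypeError(
                f"key {key!r} expects {typ.__name__}, "
                f"got {type(io[key]).__name__}")

sign = {"task_name": str, "config": dict}
check_sign({"task_name": "task.000000", "config": {}}, sign)  # passes
```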

Note that all input artifacts of the OP are read-only; therefore, the first step of RunDPTrain.execute is to copy all necessary input files from the directory task_path, prepared by PrepDPTrain, to the working directory work_dir.

The set_directory method creates work_dir and switches to it before the execution, and then exits the directory when the task finishes or an error is raised.

In what follows, the training and model freezing commands are executed consecutively. The return code is checked and a FatalError is raised if a non-zero code is detected.

Finally, the trained model file, the input script, the learning curve file and the log file are recorded in a dflow.python.OPIO and returned.

Exploration

DPGEN2 allows developers to contribute exploration strategies. The exploration strategy defines how the configuration space is explored by molecular simulations in each DPGEN iteration. Notice that we are not restricted to molecular dynamics; any molecular simulation is, in principle, allowed, for example Monte Carlo, enhanced sampling, structure optimization, and so on.

An exploration strategy takes the history of exploration as input, and returns to DPGEN the exploration tasks (called a task group) and the rule to select configurations from the trajectories generated by the tasks (called a configuration selector).

One can contribute from three aspects:

  • The stage scheduler

  • The exploration task groups

  • Configuration selector

Stage scheduler

The stage scheduler takes an exploration report passed from the exploration scheduler as input, and tells the exploration scheduler whether the exploration in the stage has converged. If not, it returns a group of exploration tasks and a configuration selector to be used in the next DPGEN iteration.

Detailed explanation of the concepts are found here.

All the stage schedulers are derived from the abstract base class StageScheduler. The only interface to be implemented is StageScheduler.plan_next_iteration. One may check the doc string for the explanation of the interface.

class StageScheduler(ABC):
    """
    The scheduler for an exploration stage.
    """

    @abstractmethod
    def plan_next_iteration(
            self,
            hist_reports : List[ExplorationReport],
            report : ExplorationReport,
            confs : List[Path],
    ) -> Tuple[bool, ExplorationTaskGroup, ConfSelector] :
        """
        Make the plan for the next iteration of the stage.

        It checks the report of the current and all historical iterations of the stage, 
        and tells if the iterations are converged. 
        If not converged, it will plan the next iteration for the stage. 

        Parameters
        ----------
        hist_reports: List[ExplorationReport]
            The historical exploration report of the stage. If this is the first iteration of the stage, this list is empty.
        report : ExplorationReport
            The exploration report of this iteration.
        confs: List[Path]
            A list of configurations generated during the exploration. May be used to generate new configurations for the next iteration. 

        Returns
        -------
        converged: bool
            If the stage converged.
        task: ExplorationTaskGroup
            A `ExplorationTaskGroup` defining the exploration of the next iteration. Should be `None` if the stage is converged.
        conf_selector: ConfSelector
            The configuration selector for the next iteration. Should be `None` if the stage is converged.

        """

One may check more details on the exploration task group and the configuration selector.
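As an illustration, a naive scheduler might be implemented as follows (a sketch with assumed behavior and stand-in types; see ConvergenceCheckStageScheduler in DPGEN2 for a real implementation):

```python
# Naive stage scheduler sketch: declare convergence when the latest
# report says so, otherwise replan the same task group and selector,
# up to a maximum number of iterations. Illustrative only.
class NaiveStageScheduler:
    def __init__(self, stage_tasks, selector, max_numb_iter=10):
        self.stage_tasks = stage_tasks
        self.selector = selector
        self.max_numb_iter = max_numb_iter
        self.numb_iter = 0

    def plan_next_iteration(self, hist_reports, report, confs):
        if report is not None and report.converged():
            # stage converged: no more tasks, no selector
            return True, None, None
        if self.numb_iter >= self.max_numb_iter:
            raise RuntimeError("reached maximum number of iterations")
        self.numb_iter += 1
        return False, self.stage_tasks, self.selector
```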

Exploration task groups

DPGEN2 defines a Python class ExplorationTask to manage all necessary files needed to run an exploration task. It can be used as shown in the example provided in the doc string.

class ExplorationTask():
    """Define the files needed by an exploration task. 

    Examples
    --------
    >>> # this example dumps all files needed by the task.
    >>> files = exploration_task.files()
    ... for file_name, file_content in files.items():
    ...     with open(file_name, 'w') as fp:
    ...         fp.write(file_content)    

    """	

A collection of exploration tasks is called an exploration task group. All task groups are derived from the base class ExplorationTaskGroup. An exploration task group can be viewed as a list of ExplorationTasks; one may get the list via the property ExplorationTaskGroup.task_list. One may add a task or another ExplorationTaskGroup to the group by the methods ExplorationTaskGroup.add_task and ExplorationTaskGroup.add_group, respectively.

class ExplorationTaskGroup(Sequence):
    @property
    def task_list(self) -> List[ExplorationTask]:
        """Get the `list` of `ExplorationTask`""" 
        ...

    def add_task(self, task: ExplorationTask):
        """Add one task to the group."""
        ...

    def add_group(
            self,
            group : 'ExplorationTaskGroup',
    ):
        """Add another group to the group."""
        ...
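The list-like behavior can be illustrated with minimal stand-in classes (a sketch, not the dpgen2 implementation; the Toy* names are hypothetical):

```python
# Minimal stand-ins illustrating the task/group relationship
# described above; not the dpgen2 classes.
class ToyExplorationTask:
    def __init__(self):
        self._files = {}

    def add_file(self, name: str, content: str):
        self._files[name] = content
        return self

    def files(self) -> dict:
        return self._files

class ToyExplorationTaskGroup:
    def __init__(self):
        self._tasks = []

    @property
    def task_list(self):
        return self._tasks

    def add_task(self, task):
        self._tasks.append(task)

    def add_group(self, group):
        self._tasks.extend(group.task_list)

group = ToyExplorationTaskGroup()
group.add_task(ToyExplorationTask().add_file("in.lammps", "# a LAMMPS input"))
merged = ToyExplorationTaskGroup()
merged.add_group(group)  # merged.task_list now holds one task
```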

An example of generating a group of NPT MD simulations may illustrate how to implement an ExplorationTaskGroup.

Configuration selector

The configuration selectors are derived from the abstract base class ConfSelector

class ConfSelector(ABC):
    """Select configurations from trajectory and model deviation files.
    """
    @abstractmethod
    def select (
            self,
            trajs : List[Path],
            model_devis : List[Path],
            traj_fmt : str = 'deepmd/npy',
            type_map : List[str] = None,
    ) -> Tuple[List[ Path ], ExplorationReport]:

The abstract method to implement is ConfSelector.select. trajs and model_devis are lists of files recording the simulation trajectories and model deviations, respectively. traj_fmt and type_map are parameters that may be needed for loading the trajectories with dpdata.

ConfSelector.select returns a list of Paths, each of which can be treated as a dpdata.MultiSystems, and an ExplorationReport.
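For intuition, the trust-level selection logic at the heart of such a selector can be sketched in plain Python (the function name and thresholds are assumptions for illustration, not the dpgen2 API):

```python
from typing import List

# Sketch of trust-level candidate selection: a frame is a candidate
# if its force model deviation falls between the lower and upper
# trust levels. Illustrative; not the dpgen2 implementation.
def select_candidates(
    md_f: List[List[float]],
    f_trust_lo: float = 0.05,
    f_trust_hi: float = 0.30,
) -> List[List[int]]:
    """md_f[ii][jj] is the force model deviation of the jj-th frame
    of the ii-th trajectory; return candidate frame ids per trajectory."""
    return [
        [jj for jj, dev in enumerate(traj) if f_trust_lo <= dev < f_trust_hi]
        for traj in md_f
    ]

# e.g. select_candidates([[0.01, 0.10, 0.50], [0.40]]) gives [[1], []]
```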

An example of selecting configurations from LAMMPS trajectories may illustrate how to implement a ConfSelector.

DPGEN2 API

dpgen2 package

Subpackages

dpgen2.conf package
Submodules
dpgen2.conf.alloy_conf module
class dpgen2.conf.alloy_conf.AlloyConf(lattice: Union[System, Tuple[str, float]], type_map: List[str], replicate: Optional[Union[List[int], Tuple[int], int]] = None)[source]

Bases: object

Parameters
lattice Union[dpdata.System, Tuple[str,float]]

Lattice of the alloy confs. Can be either dpdata.System (the lattice is taken from the dpdata.System) or Tuple[str, float] (a pair of lattice type and lattice constant, where the lattice type can be “bcc”, “fcc”, “hcp”, “sc” or “diamond”).

replicate Union[List[int], Tuple[int], int]

replicate of the lattice

type_map List[str]

The type map

Methods

generate_file_content(numb_confs[, ...])

Parameters

generate_systems(numb_confs[, ...])

Parameters

generate_file_content(numb_confs, concentration: Optional[Union[List[List[float]], List[float]]] = None, cell_pert_frac: float = 0.0, atom_pert_dist: float = 0.0, fmt: str = 'lammps/lmp') List[str][source]
Parameters
numb_confs int

Number of configurations to generate

concentration List[List[float]] or List[float] or None

If List[float], the concentrations of each element. The length of the list should be the same as the type_map. If List[List[float]], a list of concentrations (List[float]) is randomly picked from the List. If None, the elements are assumed to be of equal concentration.

cell_pert_frac float

fraction of cell perturbation

atom_pert_dist float

the atom perturbation distance (unit angstrom).

fmt str

the format of the returned conf strings. Should be one of the formats supported by dpdata

Returns
conf_list List[str]

A list of file content of configurations.

generate_systems(numb_confs, concentration: Optional[Union[List[List[float]], List[float]]] = None, cell_pert_frac: float = 0.0, atom_pert_dist: float = 0.0) List[str][source]
Parameters
numb_confs int

Number of configurations to generate

concentration List[List[float]] or List[float] or None

If List[float], the concentrations of each element. The length of the list should be the same as the type_map. If List[List[float]], a list of concentrations (List[float]) is randomly picked from the List. If None, the elements are assumed to be of equal concentration.

cell_pert_frac float

fraction of cell perturbation

atom_pert_dist float

the atom perturbation distance (unit angstrom).

Returns
conf_list List[dpdata.System]

A list of generated confs in dpdata.System.

class dpgen2.conf.alloy_conf.AlloyConfGenerator(numb_confs, lattice: Union[System, Tuple[str, float]], replicate: Optional[Union[List[int], Tuple[int], int]] = None, concentration: Optional[Union[List[List[float]], List[float]]] = None, cell_pert_frac: float = 0.0, atom_pert_dist: float = 0.0)[source]

Bases: ConfGenerator

Parameters
numb_confs int

Number of configurations to generate

lattice Union[dpdata.System, Tuple[str,float]]

Lattice of the alloy confs. Can be either dpdata.System (the lattice is taken from the dpdata.System) or Tuple[str, float] (a pair of lattice type and lattice constant, where the lattice type can be “bcc”, “fcc”, “hcp”, “sc” or “diamond”).

replicate Union[List[int], Tuple[int], int]

replicate of the lattice

concentration List[List[float]] or List[float] or None

If List[float], the concentrations of each element. The length of the list should be the same as the type_map. If List[List[float]], a list of concentrations (List[float]) is randomly picked from the List. If None, the elements are assumed to be of equal concentration.

cell_pert_frac float

fraction of cell perturbation

atom_pert_dist float

the atom perturbation distance (unit angstrom).

Methods

generate(type_map)

Method of generating configurations.

get_file_content(type_map[, fmt])

Get the file content of configurations

normalize_config([data, strict])

Normalize the argument.

args

static args() List[Argument][source]
generate(type_map) MultiSystems[source]

Method of generating configurations.

Parameters
type_map: List[str]

The type map.

Returns
confs: dpdata.MultiSystems

The returned configurations in dpdata.MultiSystems format

dpgen2.conf.alloy_conf.gen_doc(*, make_anchor=True, make_link=True, **kwargs)[source]
dpgen2.conf.alloy_conf.generate_alloy_conf_args()[source]
dpgen2.conf.alloy_conf.generate_alloy_conf_file_content(lattice: Union[System, Tuple[str, float]], type_map: List[str], numb_confs, replicate: Optional[Union[List[int], Tuple[int], int]] = None, concentration: Optional[Union[List[List[float]], List[float]]] = None, cell_pert_frac: float = 0.0, atom_pert_dist: float = 0.0, fmt: str = 'lammps/lmp')[source]
dpgen2.conf.alloy_conf.normalize(data)[source]
dpgen2.conf.conf_generator module
class dpgen2.conf.conf_generator.ConfGenerator[source]

Bases: ABC

Methods

generate(type_map)

Method of generating configurations.

get_file_content(type_map[, fmt])

Get the file content of configurations

normalize_config([data, strict])

Normalize the argument.

args

abstract static args() List[Argument][source]
abstract generate(type_map) MultiSystems[source]

Method of generating configurations.

Parameters
type_map: List[str]

The type map.

Returns
confs: dpdata.MultiSystems

The returned configurations in dpdata.MultiSystems format

get_file_content(type_map, fmt='lammps/lmp') List[str][source]

Get the file content of configurations

Parameters
type_map: List[str]

The type map.

Returns
conf_list: List[str]

A list of file content of configurations.

classmethod normalize_config(data: Dict = {}, strict: bool = True) Dict[source]

Normalize the argument.

Parameters
data: Dict

The input dict of arguments.

strict: bool

Strictly check the arguments.

Returns
data: Dict

The normalized arguments.

dpgen2.conf.file_conf module
class dpgen2.conf.file_conf.FileConfGenerator(files: Union[str, List[str]], fmt: str = 'auto', prefix: Optional[str] = None, remove_pbc: Optional[bool] = False)[source]

Bases: ConfGenerator

Methods

generate(type_map)

Method of generating configurations.

get_file_content(type_map[, fmt])

Get the file content of configurations

normalize_config([data, strict])

Normalize the argument.

args

static args() List[Argument][source]
generate(type_map) MultiSystems[source]

Method of generating configurations.

Parameters
type_map: List[str]

The type map.

Returns
confs: dpdata.MultiSystems

The returned configurations in dpdata.MultiSystems format

dpgen2.conf.unit_cells module
class dpgen2.conf.unit_cells.BCC[source]

Bases: object

Methods

gen_box

numb_atoms

poscar_unit

gen_box()[source]
numb_atoms()[source]
poscar_unit(latt)[source]
class dpgen2.conf.unit_cells.DIAMOND[source]

Bases: object

Methods

gen_box

numb_atoms

poscar_unit

gen_box()[source]
numb_atoms()[source]
poscar_unit(latt)[source]
class dpgen2.conf.unit_cells.FCC[source]

Bases: object

Methods

gen_box

numb_atoms

poscar_unit

gen_box()[source]
numb_atoms()[source]
poscar_unit(latt)[source]
class dpgen2.conf.unit_cells.HCP[source]

Bases: object

Methods

gen_box

numb_atoms

poscar_unit

gen_box()[source]
numb_atoms()[source]
poscar_unit(latt)[source]
class dpgen2.conf.unit_cells.SC[source]

Bases: object

Methods

gen_box

numb_atoms

poscar_unit

gen_box()[source]
numb_atoms()[source]
poscar_unit(latt)[source]
dpgen2.conf.unit_cells.generate_unit_cell(crystal: str, latt: float = 1.0) System[source]
dpgen2.entrypoint package
Submodules
dpgen2.entrypoint.args module
dpgen2.entrypoint.args.bohrium_conf_args()[source]
dpgen2.entrypoint.args.default_step_config_args()[source]
dpgen2.entrypoint.args.dflow_conf_args()[source]
dpgen2.entrypoint.args.dp_train_args()[source]
dpgen2.entrypoint.args.dpgen_step_config_args(default_config)[source]
dpgen2.entrypoint.args.fp_args(inputs, run)[source]
dpgen2.entrypoint.args.gen_doc(*, make_anchor=True, make_link=True, **kwargs)[source]
dpgen2.entrypoint.args.input_args()[source]
dpgen2.entrypoint.args.lebesgue_conf_args()[source]
dpgen2.entrypoint.args.lmp_args()[source]
dpgen2.entrypoint.args.normalize(data)[source]
dpgen2.entrypoint.args.submit_args(default_step_config={'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}})[source]
dpgen2.entrypoint.args.variant_conf()[source]
dpgen2.entrypoint.args.variant_explore()[source]
dpgen2.entrypoint.args.variant_fp()[source]
dpgen2.entrypoint.args.variant_train()[source]
dpgen2.entrypoint.common module
dpgen2.entrypoint.common.expand_idx(in_list)[source]
dpgen2.entrypoint.common.expand_sys_str(root_dir: Union[str, Path]) List[str][source]
dpgen2.entrypoint.common.global_config_workflow(wf_config, do_lebesgue: bool = False)[source]
dpgen2.entrypoint.download module
dpgen2.entrypoint.download.download(workflow_id, wf_config: Optional[Dict] = {}, wf_keys: Optional[List] = None, prefix: Optional[str] = None, chk_pnt: bool = False)[source]
dpgen2.entrypoint.main module
dpgen2.entrypoint.main.main()[source]
dpgen2.entrypoint.main.main_parser() ArgumentParser[source]

DPGEN2 commandline options argument parser.

Returns
argparse.ArgumentParser

the argument parser

Notes

This function is used by documentation.

dpgen2.entrypoint.main.parse_args(args: Optional[List[str]] = None)[source]

DPGEN2 commandline options argument parsing.

Parameters
args: List[str]

list of command line arguments, main purpose is testing default option None takes arguments from sys.argv

dpgen2.entrypoint.showkey module
dpgen2.entrypoint.showkey.showkey(wf_id, wf_config)[source]
dpgen2.entrypoint.status module
dpgen2.entrypoint.status.status(workflow_id, wf_config: Optional[Dict] = {})[source]
dpgen2.entrypoint.submit module
dpgen2.entrypoint.submit.copy_scheduler_plans(scheduler_new, scheduler_old)[source]
dpgen2.entrypoint.submit.get_kspacing_kgamma_from_incar(fname)[source]
dpgen2.entrypoint.submit.get_resubmit_keys(wf)[source]
dpgen2.entrypoint.submit.get_scheduler_ids(reuse_step)[source]
dpgen2.entrypoint.submit.make_concurrent_learning_op(train_style: str = 'dp', explore_style: str = 'lmp', fp_style: str = 'vasp', prep_train_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_train_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, prep_explore_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_explore_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, prep_fp_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_fp_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 
'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, select_confs_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, collect_data_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, cl_step_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: Optional[List[PathLike]] = None)[source]
dpgen2.entrypoint.submit.make_naive_exploration_scheduler(config, old_style=False)[source]
dpgen2.entrypoint.submit.print_list_steps(steps)[source]
dpgen2.entrypoint.submit.resubmit_concurrent_learning(wf_config, wfid, list_steps=False, reuse=None, old_style=False, replace_scheduler=False)[source]
dpgen2.entrypoint.submit.submit_concurrent_learning(wf_config, reuse_step: Optional[List[Step]] = None, old_style: bool = False, replace_scheduler: bool = False)[source]
dpgen2.entrypoint.submit.successful_step_keys(wf)[source]
dpgen2.entrypoint.submit.update_reuse_step_scheduler(reuse_step, scheduler_new)[source]
dpgen2.entrypoint.submit.workflow_concurrent_learning(config: Dict, old_style: bool = False)[source]
dpgen2.entrypoint.watch module
dpgen2.entrypoint.watch.update_finished_steps(wf, finished_keys: Optional[List[str]] = None, download: Optional[bool] = False, watching_keys: Optional[List[str]] = None, prefix: Optional[str] = None, chk_pnt: bool = False)[source]
dpgen2.entrypoint.watch.watch(workflow_id, wf_config: Optional[Dict] = {}, watching_keys: Optional[List] = ['prep-run-train', 'prep-run-lmp', 'prep-run-fp', 'collect-data'], frequency: float = 600.0, download: bool = False, prefix: Optional[str] = None, chk_pnt: bool = False)[source]
dpgen2.entrypoint.workflow module
dpgen2.entrypoint.workflow.add_subparser_workflow_subcommand(subparsers, command: str)[source]
dpgen2.entrypoint.workflow.execute_workflow_subcommand(command: str, wfid: str, wf_config: Optional[dict] = {})[source]
dpgen2.exploration package
Subpackages
dpgen2.exploration.render package
Submodules
dpgen2.exploration.render.traj_render module
class dpgen2.exploration.render.traj_render.TrajRender[source]

Bases: ABC

Methods

get_confs(traj, id_selected[, type_map, ...])

Get configurations from trajectory by selection.

get_model_devi(files)

Get model deviations from recording files.

abstract get_confs(traj: List[Path], id_selected: List[List[int]], type_map: Optional[List[str]] = None, conf_filters: Optional[ConfFilters] = None) MultiSystems[source]

Get configurations from trajectory by selection.

Parameters
traj: List[Path]

Trajectory files

id_selected: List[List[int]]

The selected frames. id_selected[ii][jj] is the jj-th selected frame from the ii-th trajectory. id_selected[ii] may be an empty list.

type_map: List[str]

The type map.

Returns
ms: dpdata.MultiSystems

The configurations in dpdata.MultiSystems format

abstract get_model_devi(files: List[Path]) Tuple[List[ndarray], Optional[List[ndarray]]][source]

Get model deviations from recording files.

Parameters
files: List[Path]

The paths to the model deviation recording files

Returns
model_devis: Tuple[List[np.array], Union[List[np.array],None]]

A tuple. model_devis[0] is the force model deviations, model_devis[1] is the virial model deviations. The model_devis[1] can be None. If not None, model_devis[i] is List[np.array], where np.array is a one-dimensional array. The first dimension of model_devis[i] is the trajectory (same size as len(files)), while the second dimension is the frame.

dpgen2.exploration.render.traj_render_lammps module
class dpgen2.exploration.render.traj_render_lammps.TrajRenderLammps(nopbc: bool = False)[source]

Bases: TrajRender

Methods

get_confs(trajs, id_selected[, type_map, ...])

Get configurations from trajectory by selection.

get_model_devi(files)

Get model deviations from recording files.

get_confs(trajs: List[Path], id_selected: List[List[int]], type_map: Optional[List[str]] = None, conf_filters: Optional[ConfFilters] = None) MultiSystems[source]

Get configurations from trajectory by selection.

Parameters
traj: List[Path]

Trajectory files

id_selected: List[List[int]]

The selected frames. id_selected[ii][jj] is the jj-th selected frame from the ii-th trajectory. id_selected[ii] may be an empty list.

type_map: List[str]

The type map.

Returns
ms: dpdata.MultiSystems

The configurations in dpdata.MultiSystems format

get_model_devi(files: List[Path]) Tuple[List[ndarray], Optional[List[ndarray]]][source]

Get model deviations from recording files.

Parameters
files: List[Path]

The paths to the model deviation recording files

Returns
model_devis: Tuple[List[np.array], Union[List[np.array],None]]

A tuple. model_devis[0] is the force model deviations, model_devis[1] is the virial model deviations. The model_devis[1] can be None. If not None, model_devis[i] is List[np.array], where np.array is a one-dimensional array. The first dimension of model_devis[i] is the trajectory (same size as len(files)), while the second dimension is the frame.

dpgen2.exploration.report package
Submodules
dpgen2.exploration.report.report module
class dpgen2.exploration.report.report.ExplorationReport[source]

Bases: ABC

Methods

clear()

Clear the report

converged()

If the exploration is converged

get_candidate_ids([max_nframes])

Get indexes of candidate configurations

no_candidate()

If no candidate configuration is found

print(stage_idx, idx_in_stage, iter_idx)

Print the report

print_header()

Print the header of report

record(md_f[, md_v])

Record the model deviations of the trajectories

abstract clear()[source]

Clear the report

abstract converged() bool[source]

If the exploration is converged

abstract get_candidate_ids(max_nframes: Optional[int] = None) List[List[int]][source]

Get indexes of candidate configurations

Parameters
max_nframes int

The maximal number of frames of candidates.

Returns
idx: List[List[int]]

The frame indices of candidate configurations. idx[ii][jj] is the frame index of the jj-th candidate of the ii-th trajectory.

no_candidate() bool[source]

If no candidate configuration is found

abstract print(stage_idx: int, idx_in_stage: int, iter_idx: int) str[source]

Print the report

abstract print_header() str[source]

Print the header of report

abstract record(md_f: List[ndarray], md_v: Optional[List[ndarray]] = None)[source]

Record the model deviations of the trajectories

Parameters
md_f : List[np.ndarray]

The force model deviations. md_f[ii][jj] is the force model deviation of the jj-th frame of the ii-th trajectory.

md_v : Optional[List[np.ndarray]]

The virial model deviations. md_v[ii][jj] is the virial model deviation of the jj-th frame of the ii-th trajectory.

dpgen2.exploration.report.report_trust_levels module
class dpgen2.exploration.report.report_trust_levels.ExplorationReportTrustLevels(trust_level, conv_accuracy)[source]

Bases: ExplorationReport

Methods

clear()

Clear the report

converged()

If the exploration is converged

get_candidate_ids([max_nframes])

Get indexes of candidate configurations

no_candidate()

If no candidate configuration is found

print(stage_idx, idx_in_stage, iter_idx)

Print the report

print_header()

Print the header of report

record(md_f[, md_v_])

Record the model deviations of the trajectories

accurate_ratio

candidate_ratio

failed_ratio

accurate_ratio(tag=None)[source]
candidate_ratio(tag=None)[source]
clear()[source]

Clear the report

converged()[source]

If the exploration is converged

failed_ratio(tag=None)[source]
get_candidate_ids(max_nframes: Optional[int] = None) List[List[int]][source]

Get indexes of candidate configurations

Parameters
max_nframes int

The maximal number of frames of candidates.

Returns
idx: List[List[int]]

The frame indices of candidate configurations. idx[ii][jj] is the frame index of the jj-th candidate of the ii-th trajectory.

print(stage_idx: int, idx_in_stage: int, iter_idx: int) str[source]

Print the report

print_header() str[source]

Print the header of report

record(md_f: List[ndarray], md_v_: Optional[List[ndarray]] = None)[source]

Record the model deviations of the trajectories

Parameters
md_f : List[np.ndarray]

The force model deviations. md_f[ii][jj] is the force model deviation of the jj-th frame of the ii-th trajectory.

md_v_ : Optional[List[np.ndarray]]

The virial model deviations. md_v_[ii][jj] is the virial model deviation of the jj-th frame of the ii-th trajectory.
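The classification implied by these trust levels can be sketched in a few lines. The snippet below is an illustrative, self-contained sketch of the idea, not the actual ExplorationReportTrustLevels implementation (classify_frames is a hypothetical helper): each frame is counted as accurate, candidate, or failed by comparing its force model deviation with the lower and upper force trust levels.

```python
import numpy as np

def classify_frames(md_f, level_f_lo, level_f_hi):
    """Classify each frame by its force model deviation.

    Frames below level_f_lo are accurate, frames in [level_f_lo, level_f_hi)
    are candidates, and frames at or above level_f_hi are failed.
    Returns the three ratios and the candidate indices per trajectory.
    """
    accu = cand = fail = 0
    cand_idx = []
    for traj in md_f:
        idx = []
        for jj, dev in enumerate(traj):
            if dev < level_f_lo:
                accu += 1
            elif dev < level_f_hi:
                cand += 1
                idx.append(jj)
            else:
                fail += 1
        cand_idx.append(idx)
    ntot = sum(len(traj) for traj in md_f)
    return accu / ntot, cand / ntot, fail / ntot, cand_idx
```

The three ratios always sum to one, matching the accu., cand. and fail. columns printed by dpgen2 status.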

dpgen2.exploration.scheduler package
Submodules
dpgen2.exploration.scheduler.convergence_check_stage_scheduler module
class dpgen2.exploration.scheduler.convergence_check_stage_scheduler.ConvergenceCheckStageScheduler(stage: ExplorationStage, selector: ConfSelector, max_numb_iter: Optional[int] = None, fatal_at_max: bool = True)[source]

Bases: StageScheduler

Methods

complete()

Tell if the stage is complete

converged()

Tell if the stage is converged

force_complete()

Force complete the stage

get_reports()

Return all exploration reports

next_iteration()

Return the index of the next iteration

plan_next_iteration([report, trajs])

Make the plan for the next iteration of the stage.

reached_max_iteration

complete()[source]

Tell if the stage is complete

Returns
complete : bool

Whether the stage is complete

converged()[source]

Tell if the stage is converged

Returns
converged : bool

Whether the stage is converged

force_complete()[source]

Force complete the stage

get_reports()[source]

Return all exploration reports

Returns
reports List[ExplorationReport]

the reports

next_iteration()[source]

Return the index of the next iteration

Returns
index int

the index of the next iteration

plan_next_iteration(report: Optional[ExplorationReport] = None, trajs: Optional[List[Path]] = None) Tuple[bool, Optional[ExplorationTaskGroup], Optional[ConfSelector]][source]

Make the plan for the next iteration of the stage.

It checks the report of the current and all historical iterations of the stage, and tells if the iterations are converged. If not converged, it will plan the next iteration for the stage.

Parameters
hist_reports: List[ExplorationReport]

The historical exploration reports of the stage. If this is the first iteration of the stage, this list is empty.

report : ExplorationReport

The exploration report of this iteration.

confs: List[Path]

A list of configurations generated during the exploration. May be used to generate new configurations for the next iteration.

Returns
stg_complete: bool

Whether the stage has completed. Two cases are possible: 1. converged; 2. when not fatal_at_max, not converged but the maximal number of iterations was reached.

task: ExplorationTaskGroup

An ExplorationTaskGroup defining the exploration of the next iteration. Should be None if the stage is converged.

conf_selector: ConfSelector

The configuration selector for the next iteration. Should be None if the stage is converged.

reached_max_iteration()[source]
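The completion logic described above (convergence, max_numb_iter, fatal_at_max) can be summarized as a small decision function. This is an illustrative sketch with hypothetical names, not the actual ConvergenceCheckStageScheduler code:

```python
def stage_decision(converged, iter_count, max_numb_iter=None, fatal_at_max=True):
    """Return (stage_complete, fatal) after one iteration of a stage.

    The stage completes when it converges, or, when fatal_at_max is False,
    when the maximal number of iterations is reached without convergence.
    With fatal_at_max True, hitting the limit unconverged is fatal.
    """
    if converged:
        return True, False
    reached_max = max_numb_iter is not None and iter_count >= max_numb_iter
    if not reached_max:
        return False, False  # plan one more iteration
    if fatal_at_max:
        return False, True   # caller should raise an error
    return True, False       # complete, but not converged
```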
dpgen2.exploration.scheduler.scheduler module
class dpgen2.exploration.scheduler.scheduler.ExplorationScheduler[source]

Bases: object

The exploration scheduler.

Methods

add_stage_scheduler(stage_scheduler)

Add stage scheduler.

complete()

Tell if all stages are converged.

force_stage_complete()

Force complete the current stage

get_convergence_ratio()

Get the accurate, candidate and failed ratios of the iterations

get_iteration()

Get the index of the current iteration.

get_stage()

Get the index of current stage.

get_stage_of_iterations()

Get the stage index and the index in the stage of iterations.

plan_next_iteration([report, trajs])

Make the plan for the next DPGEN iteration.

print_convergence

print_last_iteration

add_stage_scheduler(stage_scheduler: StageScheduler)[source]

Add stage scheduler.

All added schedulers can be treated as a list (order matters). Once one stage converges, the exploration proceeds to the next stage.

Parameters
stage_scheduler: StageScheduler

The added stage scheduler

complete()[source]

Tell if all stages are converged.

force_stage_complete()[source]

Force complete the current stage

get_convergence_ratio()[source]

Get the accurate, candidate and failed ratios of the iterations

Returns
accu : np.ndarray

The accurate ratio. The length of the array equals the number of iterations.

cand : np.ndarray

The candidate ratio. The length of the array equals the number of iterations.

fail : np.ndarray

The failed ratio. The length of the array equals the number of iterations.

get_iteration()[source]

Get the index of the current iteration.

The iteration index increases when self.plan_next_iteration returns a valid lmp_task_grp and conf_selector for the next iteration.

get_stage()[source]

Get the index of current stage.

Stage index increases when the previous stage converges. Usually called after self.plan_next_iteration.

get_stage_of_iterations()[source]

Get the stage index and the index in the stage of iterations.

plan_next_iteration(report: Optional[ExplorationReport] = None, trajs: Optional[List[Path]] = None) Tuple[bool, Optional[ExplorationTaskGroup], Optional[ConfSelector]][source]

Make the plan for the next DPGEN iteration.

Parameters
report : ExplorationReport

The exploration report of this iteration.

confs: List[Path]

A list of configurations generated during the exploration. May be used to generate new configurations for the next iteration.

Returns
complete: bool

Whether all the DPGEN stages have completed.

task: ExplorationTaskGroup

An ExplorationTaskGroup defining the exploration of the next iteration. Should be None if converged.

conf_selector: ConfSelector

The configuration selector for the next iteration. Should be None if converged.

print_convergence()[source]
print_last_iteration(print_header=False)[source]
dpgen2.exploration.scheduler.stage_scheduler module
class dpgen2.exploration.scheduler.stage_scheduler.StageScheduler[source]

Bases: ABC

The scheduler for an exploration stage.

Methods

complete()

Tell if the stage is complete

converged()

Tell if the stage is converged

force_complete()

Force complete the stage

get_reports()

Return all exploration reports

next_iteration()

Return the index of the next iteration

plan_next_iteration(report, trajs)

Make the plan for the next iteration of the stage.

abstract complete() bool[source]

Tell if the stage is complete

Returns
complete : bool

Whether the stage is complete

abstract converged() bool[source]

Tell if the stage is converged

Returns
converged : bool

Whether the stage is converged

abstract force_complete()[source]

Force complete the stage

abstract get_reports() List[ExplorationReport][source]

Return all exploration reports

Returns
reports List[ExplorationReport]

the reports

abstract next_iteration() int[source]

Return the index of the next iteration

Returns
index int

the index of the next iteration

abstract plan_next_iteration(report: ExplorationReport, trajs: List[Path]) Tuple[bool, ExplorationTaskGroup, ConfSelector][source]

Make the plan for the next iteration of the stage.

It checks the report of the current and all historical iterations of the stage, and tells if the iterations are converged. If not converged, it will plan the next iteration for the stage.

Parameters
hist_reports: List[ExplorationReport]

The historical exploration reports of the stage. If this is the first iteration of the stage, this list is empty.

report : ExplorationReport

The exploration report of this iteration.

confs: List[Path]

A list of configurations generated during the exploration. May be used to generate new configurations for the next iteration.

Returns
stg_complete: bool

Whether the stage has completed. Two cases are possible: 1. converged; 2. when not fatal_at_max, not converged but the maximal number of iterations was reached.

task: ExplorationTaskGroup

An ExplorationTaskGroup defining the exploration of the next iteration. Should be None if the stage is converged.

conf_selector: ConfSelector

The configuration selector for the next iteration. Should be None if the stage is converged.

dpgen2.exploration.selector package
Submodules
dpgen2.exploration.selector.conf_filter module
class dpgen2.exploration.selector.conf_filter.ConfFilter[source]

Bases: ABC

Methods

check(coords, cell, atom_types, nopbc)

Check if the configuration is valid.

abstract check(coords: ndarray, cell: ndarray, atom_types: ndarray, nopbc: bool) bool[source]

Check if the configuration is valid.

Parameters
coords : numpy.ndarray

The coordinates, a numpy array of shape natoms x 3.

cell : numpy.ndarray

The cell tensor, a numpy array of shape 3 x 3.

atom_types : numpy.ndarray

The atom types, a numpy array of shape natoms.

nopbc : bool

If no periodic boundary condition is applied.

Returns
valid : bool

True if the configuration is valid, else False.
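A typical check implementation is a geometric sanity test. The function below is a minimal sketch with the same signature (an illustration only; for brevity it ignores periodic images, which a production filter honoring nopbc would not):

```python
import numpy as np

def check_min_distance(coords, cell, atom_types, nopbc, dmin=0.5):
    """Reject configurations containing any interatomic distance below dmin.

    Periodic images are ignored in this simplified sketch.
    """
    natoms = coords.shape[0]
    for ii in range(natoms):
        for jj in range(ii + 1, natoms):
            if np.linalg.norm(coords[ii] - coords[jj]) < dmin:
                return False
    return True
```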

class dpgen2.exploration.selector.conf_filter.ConfFilters[source]

Bases: object

Methods

add

check

add(conf_filter: ConfFilter) ConfFilters[source]
check(conf: System) bool[source]
dpgen2.exploration.selector.conf_selector module
class dpgen2.exploration.selector.conf_selector.ConfSelector[source]

Bases: ABC

Select configurations from trajectory and model deviation files.

Methods

select

abstract select(trajs: List[Path], model_devis: List[Path], type_map: Optional[List[str]] = None) Tuple[List[Path], ExplorationReport][source]
dpgen2.exploration.selector.conf_selector_frame module
class dpgen2.exploration.selector.conf_selector_frame.ConfSelectorFrames(traj_render: TrajRender, report: ExplorationReport, max_numb_sel: Optional[int] = None, conf_filters: Optional[ConfFilters] = None)[source]

Bases: ConfSelector

Select frames from trajectories as confs.

Parameters
trust_level: TrustLevel

The trust level

conf_filter: ConfFilters

The configuration filter

Methods

select(trajs, model_devis[, type_map])

Select configurations

select(trajs: List[Path], model_devis: List[Path], type_map: Optional[List[str]] = None) Tuple[List[Path], ExplorationReport][source]

Select configurations

Parameters
trajs : List[Path]

A list of Paths to trajectory files generated by LAMMPS.

model_devis : List[Path]

A list of Paths to model deviation files generated by LAMMPS. Format: each line has 7 numbers, used as # frame_id md_v_max md_v_min md_v_mean md_f_max md_f_min md_f_mean, where md stands for model deviation, v for virial and f for force.

type_map : List[str]

The type_map of the systems.

Returns
confs : List[Path]

The selected configurations, stored in a folder in deepmd/npy format, which can be parsed as dpdata.MultiSystems. The list only has one item.

report : ExplorationReport

The exploration report recording the status of the exploration.

dpgen2.exploration.selector.trust_level module
class dpgen2.exploration.selector.trust_level.TrustLevel(level_f_lo, level_f_hi, level_v_lo=None, level_v_hi=None)[source]

Bases: object

Attributes
level_f_hi
level_f_lo
level_v_hi
level_v_lo
property level_f_hi
property level_f_lo
property level_v_hi
property level_v_lo
dpgen2.exploration.task package
Subpackages
dpgen2.exploration.task.lmp package
Submodules
dpgen2.exploration.task.lmp.lmp_input module
dpgen2.exploration.task.lmp.lmp_input.make_lmp_input(conf_file: str, ensemble: str, graphs: List[str], nsteps: int, dt: float, neidelay: Optional[int], trj_freq: int, mass_map: List[float], temp: float, tau_t: float = 0.1, pres: Optional[float] = None, tau_p: float = 0.5, use_clusters: bool = False, relative_f_epsilon: Optional[float] = None, relative_v_epsilon: Optional[float] = None, pka_e: Optional[float] = None, ele_temp_f: Optional[float] = None, ele_temp_a: Optional[float] = None, nopbc: bool = False, max_seed: int = 1000000, deepmd_version='2.0', trj_seperate_files=True)[source]
Submodules
dpgen2.exploration.task.conf_sampling_task_group module
class dpgen2.exploration.task.conf_sampling_task_group.ConfSamplingTaskGroup[source]

Bases: ExplorationTaskGroup

Attributes
task_list

Get the list of ExplorationTask

Methods

add_group(group)

Add another group to the group.

add_task(task)

Add one task to the group.

count(value)

index(value, [start, [stop]])

Raises ValueError if the value is not present.

set_conf(conf_list[, n_sample, random_sample])

Set the configurations of exploration

clear

set_conf(conf_list: List[str], n_sample: Optional[int] = None, random_sample: bool = False)[source]

Set the configurations of exploration

Parameters
conf_list : List[str]

A list of file contents.

n_sample : int

The number of samples drawn from conf_list each time make_task is called. If set to None, n_sample is set to the length of conf_list.

random_sample : bool

If True the confs are sampled randomly, otherwise they are sampled consecutively from conf_list.
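The two sampling modes can be illustrated with a self-contained sketch (not the actual implementation; the start argument is a hypothetical stand-in for the internal cursor used by consecutive sampling):

```python
import random

def sample_confs(conf_list, n_sample=None, random_sample=False, start=0):
    """Draw n_sample confs, randomly or consecutively (wrapping around)."""
    if n_sample is None:
        n_sample = len(conf_list)
    if random_sample:
        return random.sample(conf_list, n_sample)
    return [conf_list[(start + ii) % len(conf_list)] for ii in range(n_sample)]
```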

dpgen2.exploration.task.lmp_template_task_group module
class dpgen2.exploration.task.lmp_template_task_group.LmpTemplateTaskGroup[source]

Bases: ConfSamplingTaskGroup

Attributes
task_list

Get the list of ExplorationTask

Methods

add_group(group)

Add another group to the group.

add_task(task)

Add one task to the group.

count(value)

index(value, [start, [stop]])

Raises ValueError if the value is not present.

set_conf(conf_list[, n_sample, random_sample])

Set the configurations of exploration

clear

make_cont

make_task

set_lmp

make_cont(templates: list, revisions: dict)[source]
make_task() ExplorationTaskGroup[source]
set_lmp(numb_models: int, lmp_template_fname: str, plm_template_fname: Optional[str] = None, revisions: dict = {}, traj_freq: int = 10) None[source]
dpgen2.exploration.task.lmp_template_task_group.find_only_one_key(lmp_lines, key)[source]
dpgen2.exploration.task.lmp_template_task_group.revise_by_keys(lmp_lines, keys, values)[source]
dpgen2.exploration.task.lmp_template_task_group.revise_lmp_input_dump(lmp_lines, trj_freq)[source]
dpgen2.exploration.task.lmp_template_task_group.revise_lmp_input_model(lmp_lines, task_model_list, trj_freq, deepmd_version='1')[source]
dpgen2.exploration.task.lmp_template_task_group.revise_lmp_input_plm(lmp_lines, in_plm, out_plm='output.plumed')[source]
dpgen2.exploration.task.make_task_group_from_config module
dpgen2.exploration.task.make_task_group_from_config.lmp_template_task_group_args()[source]
dpgen2.exploration.task.make_task_group_from_config.make_task_group_from_config(numb_models, mass_map, config)[source]
dpgen2.exploration.task.make_task_group_from_config.normalize(data)[source]
dpgen2.exploration.task.make_task_group_from_config.npt_task_group_args()[source]
dpgen2.exploration.task.make_task_group_from_config.task_group_args()[source]
dpgen2.exploration.task.make_task_group_from_config.variant_task_group()[source]
dpgen2.exploration.task.npt_task_group module
class dpgen2.exploration.task.npt_task_group.NPTTaskGroup[source]

Bases: ConfSamplingTaskGroup

Attributes
task_list

Get the list of ExplorationTask

Methods

add_group(group)

Add another group to the group.

add_task(task)

Add one task to the group.

count(value)

index(value, [start, [stop]])

Raises ValueError if the value is not present.

make_task()

Make the LAMMPS task group.

set_conf(conf_list[, n_sample, random_sample])

Set the configurations of exploration

set_md(numb_models, mass_map, temps[, ...])

Set MD parameters

clear

make_task() ExplorationTaskGroup[source]

Make the LAMMPS task group.

Returns
task_grp: ExplorationTaskGroup

The returned LAMMPS task group. The number of tasks is nconf*nT*nP, where nconf is set by the n_sample parameter of set_conf, and nT and nP are the lengths of the temps and press parameters of set_md.
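The nconf*nT*nP count follows from taking the Cartesian product of sampled configurations, temperatures, and pressures, with one LAMMPS task per combination (an illustrative sketch of the counting, not the actual make_task code):

```python
from itertools import product

def count_npt_tasks(confs, temps, press):
    """One LAMMPS task per (configuration, temperature, pressure) triple."""
    return sum(1 for _ in product(confs, temps, press))
```

For example, 2 sampled configurations, 2 temperatures and 3 pressures yield 12 tasks.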

set_md(numb_models, mass_map, temps: List[float], press: Optional[List[float]] = None, ens: str = 'npt', dt: float = 0.001, nsteps: int = 1000, trj_freq: int = 10, tau_t: float = 0.1, tau_p: float = 0.5, pka_e: Optional[float] = None, neidelay: Optional[int] = None, no_pbc: bool = False, use_clusters: bool = False, relative_f_epsilon: Optional[float] = None, relative_v_epsilon: Optional[float] = None, ele_temp_f: Optional[float] = None, ele_temp_a: Optional[float] = None)[source]

Set MD parameters

dpgen2.exploration.task.stage module
class dpgen2.exploration.task.stage.ExplorationStage[source]

Bases: object

The exploration stage.

Methods

add_task_group(grp)

Add an exploration group

clear()

Clear all exploration group.

make_task()

Make the LAMMPS task group.

add_task_group(grp: ExplorationTaskGroup)[source]

Add an exploration group

Parameters
grp: ExplorationTaskGroup

The added exploration task group

clear()[source]

Clear all exploration group.

make_task() ExplorationTaskGroup[source]

Make the LAMMPS task group.

Returns
task_grp: ExplorationTaskGroup

The returned LAMMPS task group. The number of tasks equals the sum of the numbers of tasks in all the exploration task groups added to the stage.

dpgen2.exploration.task.task module
class dpgen2.exploration.task.task.ExplorationTask[source]

Bases: object

Define the files needed by an exploration task.

Examples

>>> # this example dumps all files needed by the task.
>>> files = exploration_task.files()
>>> for file_name, file_content in files.items():
...     with open(file_name, 'w') as fp:
...         fp.write(file_content)

Methods

add_file(fname, fcont)

Add file to the task

files()

Get all files for the task.

add_file(fname: str, fcont: str)[source]

Add file to the task

Parameters
fname : str

The name of the file.

fcont : str

The content of the file.

files() Dict[source]

Get all files for the task.

Returns
files : dict

The dict storing all files for the task. The file name is a key of the dict, and the file content is the corresponding value.

class dpgen2.exploration.task.task.ExplorationTaskGroup[source]

Bases: Sequence

A group of exploration tasks. Implemented as a list of ExplorationTask.

Attributes
task_list

Get the list of ExplorationTask

Methods

add_group(group)

Add another group to the group.

add_task(task)

Add one task to the group.

count(value)

index(value, [start, [stop]])

Raises ValueError if the value is not present.

clear

add_group(group: ExplorationTaskGroup)[source]

Add another group to the group.

add_task(task: ExplorationTask)[source]

Add one task to the group.

clear() None[source]
property task_list: List[ExplorationTask]

Get the list of ExplorationTask

class dpgen2.exploration.task.task.FooTask(conf_name='conf.lmp', conf_cont='', inpu_name='in.lammps', inpu_cont='')[source]

Bases: ExplorationTask

Methods

add_file(fname, fcont)

Add file to the task

files()

Get all files for the task.

class dpgen2.exploration.task.task.FooTaskGroup(numb_task)[source]

Bases: ExplorationTaskGroup

Attributes
task_list

Get the list of ExplorationTask

Methods

add_group(group)

Add another group to the group.

add_task(task)

Add one task to the group.

count(value)

index(value, [start, [stop]])

Raises ValueError if the value is not present.

clear

property task_list

Get the list of ExplorationTask

dpgen2.flow package
Submodules
dpgen2.flow.dpgen_loop module
class dpgen2.flow.dpgen_loop.ConcurrentLearning(name: str, block_op: OPTemplate, step_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: Optional[List[PathLike]] = None)[source]

Bases: Steps

Attributes
init_keys
input_artifacts
input_parameters
loop_keys
output_artifacts
output_parameters

Methods

add(step)

Add a step or a list of parallel steps to the steps

convert_to_argo

handle_key

run

property init_keys
property input_artifacts
property input_parameters
property loop_keys
property output_artifacts
property output_parameters
class dpgen2.flow.dpgen_loop.ConcurrentLearningLoop(name: str, block_op: OPTemplate, step_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: Optional[List[PathLike]] = None)[source]

Bases: Steps

Attributes
input_artifacts
input_parameters
keys
output_artifacts
output_parameters

Methods

add(step)

Add a step or a list of parallel steps to the steps

convert_to_argo

handle_key

run

property input_artifacts
property input_parameters
property keys
property output_artifacts
property output_parameters
class dpgen2.flow.dpgen_loop.MakeBlockId(*args, **kwargs)[source]

Bases: OP

Methods

execute(ip)

Run the OP

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

exec_sign_check

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

execute(ip: OPIO) OPIO[source]

Run the OP

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

class dpgen2.flow.dpgen_loop.SchedulerWrapper(*args, **kwargs)[source]

Bases: OP

Methods

execute(ip)

Run the OP

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

exec_sign_check

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

execute(ip: OPIO) OPIO[source]

Run the OP

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

dpgen2.fp package
Submodules
dpgen2.fp.gaussian module

Prep and Run Gaussian tasks.

class dpgen2.fp.gaussian.GaussianInputs(**kwargs: Any)[source]

Bases: object

Methods

args()

The arguments of the GaussianInputs class.

static args() List[Argument][source]

The arguments of the GaussianInputs class.

class dpgen2.fp.gaussian.PrepGaussian(*args, **kwargs)[source]

Bases: PrepFp

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

prep_task(conf_frame, inputs)

Define how one Gaussian task is prepared.

exec_sign_check

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

prep_task(conf_frame: System, inputs: GaussianInputs)[source]

Define how one Gaussian task is prepared.

Parameters
conf_frame : dpdata.System

One frame of configuration in the dpdata format.

inputs : GaussianInputs

The GaussianInputs object handles all other input files of the task.

class dpgen2.fp.gaussian.RunGaussian(*args, **kwargs)[source]

Bases: RunFp

Methods

args()

The argument definition of the run_task method.

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

input_files()

The mandatory input files to run a Gaussian task.

normalize_config([data, strict])

Normalize the argument.

optional_input_files()

The optional input files to run a Gaussian task.

run_task(command, out)

Defines how one FP task runs

exec_sign_check

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

static args() List[Argument][source]

The argument definition of the run_task method.

Returns
arguments: List[dargs.Argument]

A list of dargs.Argument that defines the arguments of the run_task method.

input_files() List[str][source]

The mandatory input files to run a Gaussian task.

Returns
files: List[str]

A list of mandatory input file names.

optional_input_files() List[str][source]

The optional input files to run a Gaussian task.

Returns
files: List[str]

A list of optional input file names.

run_task(command: str, out: str) Tuple[str, str][source]

Defines how one FP task runs

Parameters
command: str

The command of running gaussian task

out: str

The name of the output data file.

Returns
out_name: str

The file name of the output data in the dpdata.LabeledSystem format.

log_name: str

The file name of the log.

dpgen2.fp.prep_fp module
class dpgen2.fp.prep_fp.PrepFp(*args, **kwargs)[source]

Bases: OP, ABC

Prepares the working directories for first-principles (FP) tasks.

A list of working directories (of the same length as ip[“confs”]), each containing all files needed to start an FP task, will be created. The paths of the directories will be returned as op[“task_paths”]. The identities of the tasks are returned as op[“task_names”].

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

prep_task(conf_frame, inputs)

Define how one FP task is prepared.

exec_sign_check

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters
ip : dict

Input dict with components:

  • config : (dict) Should have config[‘inputs’], which defines the input files of the FP task.

  • confs : (Artifact(List[Path])) Configurations for the FP tasks. Stored in folders as deepmd/npy format. Can be parsed as dpdata.MultiSystems.

Returns
op : dict

Output dict with components:

  • task_names: (List[str]) The name of tasks. Will be used as the identities of the tasks. The names of different tasks are different.

  • task_paths: (Artifact(List[Path])) The prepared working paths of the tasks. Contains all input files needed to start the FP. The order of the Paths should be consistent with op[“task_names”]

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

abstract prep_task(conf_frame: System, inputs: Any)[source]

Define how one FP task is prepared.

Parameters
conf_frame : dpdata.System

One frame of configuration in the dpdata format.

inputs : Any

The class object handles all other input files of the task. For example, the pseudopotential file, k-point file and so on.

dpgen2.fp.run_fp module
class dpgen2.fp.run_fp.RunFp(*args, **kwargs)[source]

Bases: OP, ABC

Execute a first-principles (FP) task.

A working directory named task_name is created. All input files are copied or symbolically linked to the directory task_name. The FP command is executed from the directory task_name. The op[“labeled_data”] in “deepmd/npy” format (HDF5 in the future) provided by dpdata will be created.

Methods

args()

The argument definition of the run_task method.

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

input_files()

The mandatory input files to run a FP task.

normalize_config([data, strict])

Normalize the argument.

optional_input_files()

The optional input files to run a FP task.

run_task(**kwargs)

Defines how one FP task runs

exec_sign_check

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

abstract static args() List[Argument][source]

The argument definition of the run_task method.

Returns
arguments: List[dargs.Argument]

A list of dargs.Argument that defines the arguments of the run_task method.

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters
ip : dict

Input dict with components:

  • config: (dict) The config of FP task. Should have config[‘run’], which defines the runtime configuration of the FP task.

  • task_name: (str) The name of task.

  • task_path: (Artifact(Path)) The path that contains all input files prepared by PrepFp.

Returns
Output dict with components:
  • log: (Artifact(Path)) The log file of FP.
  • labeled_data: (Artifact(Path)) The path to the labeled data in “deepmd/npy” format provided by dpdata.
classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

abstract input_files() List[str][source]

The mandatory input files to run a FP task.

Returns
files: List[str]

A list of mandatory input file names.

classmethod normalize_config(data: Dict = {}, strict: bool = True) Dict[source]

Normalize the argument.

Parameters
data: Dict

The input dict of arguments.

strict: bool

Strictly check the arguments.

Returns
data: Dict

The normalized arguments.

abstract optional_input_files() List[str][source]

The optional input files to run a FP task.

Returns
files: List[str]

A list of optional input file names.

abstract run_task(**kwargs) Tuple[str, str][source]

Defines how one FP task runs

Parameters
kwargs

Keyword args defined by the developer. The fp/run_config section of the input file will be passed to this function.

Returns
out_name: str

The file name of the output data. Should be in dpdata.LabeledSystem format.

log_name: str

The file name of the log.

dpgen2.fp.vasp module
class dpgen2.fp.vasp.PrepVasp(*args, **kwargs)[source]

Bases: PrepFp

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

prep_task(conf_frame, vasp_inputs)

Define how one Vasp task is prepared.

exec_sign_check

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

prep_task(conf_frame: System, vasp_inputs: VaspInputs)[source]

Define how one Vasp task is prepared.

Parameters
conf_frame : dpdata.System

One frame of configuration in the dpdata format.

vasp_inputs : VaspInputs

The VaspInputs object handles all other input files of the task.

class dpgen2.fp.vasp.RunVasp(*args, **kwargs)[source]

Bases: RunFp

Methods

args()

The argument definition of the run_task method.

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

input_files()

The mandatory input files to run a vasp task.

normalize_config([data, strict])

Normalize the argument.

optional_input_files()

The optional input files to run a vasp task.

run_task(command, out, log)

Defines how one FP task runs

exec_sign_check

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

static args()[source]

The argument definition of the run_task method.

Returns
arguments: List[dargs.Argument]

A list of dargs.Argument that defines the arguments of the run_task method.

input_files() List[str][source]

The mandatory input files to run a vasp task.

Returns
files: List[str]

A list of mandatory input file names.

optional_input_files() List[str][source]

The optional input files to run a vasp task.

Returns
files: List[str]

A list of optional input file names.

run_task(command: str, out: str, log: str) Tuple[str, str][source]

Defines how one FP task runs

Parameters
command: str

The command of running vasp task

out: str

The name of the output data file.

log: str

The name of the log file

Returns
out_name: str

The file name of the output data in the dpdata.LabeledSystem format.

log_name: str

The file name of the log.

dpgen2.fp.vasp_input module
class dpgen2.fp.vasp_input.VaspInputs(kspacing: Union[float, List[float]], incar: str, pp_files: Dict[str, str], kgamma: bool = True)[source]

Bases: object

Attributes
incar_template
potcars

Methods

args

incar_from_file

make_kpoints

make_potcar

normalize_config

potcars_from_file

static args()[source]
incar_from_file(fname: str)[source]
property incar_template
make_kpoints(box: ndarray) str[source]
make_potcar(atom_names) str[source]
static normalize_config(data={}, strict=True)[source]
property potcars
potcars_from_file(dict_fnames: Dict[str, str])[source]
dpgen2.fp.vasp_input.make_kspacing_kpoints(box, kspacing, kgamma)[source]
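A common convention for deriving a k-point grid from a kspacing value is to divide the length of each reciprocal lattice vector by kspacing and round up. The sketch below follows that convention; it is an assumption for illustration and not necessarily the exact formula used by make_kspacing_kpoints (which also handles the kgamma shift):

```python
import numpy as np

def kspacing_kpoints(box, kspacing):
    """Number of k-point divisions along each reciprocal axis.

    box is the 3x3 cell matrix (rows are lattice vectors); the reciprocal
    lattice is 2*pi * inv(box)^T, and each axis gets ceil(|b_i| / kspacing)
    divisions, with at least one division per axis.
    """
    rbox = 2.0 * np.pi * np.linalg.inv(box).T
    blen = np.linalg.norm(rbox, axis=1)
    return [max(1, int(np.ceil(b / kspacing))) for b in blen]
```

For a cubic 5 Å box, kspacing = 0.5 gives a 3 x 3 x 3 grid under this convention.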
dpgen2.op package
Submodules
dpgen2.op.collect_data module
class dpgen2.op.collect_data.CollectData(*args, **kwargs)[source]

Bases: OP

Collect labeled data and add to the iteration dataset.

After running FP tasks, the labeled data are scattered in the task directories. This OP collects the labeled data into one data directory and adds it to the iteration data. The data generated by this iteration will be placed in the ip[“name”] subdirectory of the iteration data directory.

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

exec_sign_check

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

execute(ip: OPIO) OPIO[source]

Execute the OP. This OP collects the data scattered in the directories given by ip[‘labeled_data’] into one dpdata.MultiSystems and stores it in a directory named name. This directory is appended to the list iter_data.

Parameters
ip: dict

Input dict with components:

  • name: (str) The name of this iteration. The data generated by this iteration will be placed in a sub-directory named name.

  • labeled_data: (Artifact(List[Path])) The paths of labeled data generated by FP tasks of the current iteration.

  • iter_data: (Artifact(List[Path])) The data paths of previous iterations.

Returns
Output dict with components:
  • iter_data: (Artifact(List[Path])) The data paths of the previous and the current iterations.
classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

dpgen2.op.md_settings module
class dpgen2.op.md_settings.MDSettings(ens: str, dt: float, nsteps: int, trj_freq: int, temps: Optional[List[float]] = None, press: Optional[List[float]] = None, tau_t: float = 0.1, tau_p: float = 0.5, pka_e: Optional[float] = None, neidelay: Optional[int] = None, no_pbc: bool = False, use_clusters: bool = False, relative_epsilon: Optional[float] = None, relative_v_epsilon: Optional[float] = None, ele_temp_f: Optional[float] = None, ele_temp_a: Optional[float] = None)[source]

Bases: object

Methods

to_str

to_str() str[source]
dpgen2.op.prep_dp_train module
class dpgen2.op.prep_dp_train.PrepDPTrain(*args, **kwargs)[source]

Bases: OP

Prepares the working directories for DP training tasks.

A list of (numb_models) working directories containing all files needed to start training tasks will be created. The paths of the directories will be returned as op[“task_paths”]. The identities of the tasks are returned as op[“task_names”].

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

exec_sign_check

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters
ip: dict

Input dict with components:

  • template_script: (str or List[str]) A template of the training script. In the case of str, all training tasks share the same training input template; the only difference is the random number used to initialize the network parameters. In the case of List[str], each training task uses one template from the list, and the random numbers used to initialize the network parameters are different. The length of the list should be the same as numb_models.

  • numb_models: (int) Number of DP models to train.

Returns
op: dict

Output dict with components:

  • task_names: (List[str]) The names of the tasks. Will be used as the identities of the tasks. The names of different tasks are different.

  • task_paths: (Artifact(List[Path])) The prepared working paths of the tasks. The order of the Paths should be consistent with op[“task_names”].

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

dpgen2.op.prep_lmp module
dpgen2.op.prep_lmp.PrepExplorationTaskGroup

alias of PrepLmp

class dpgen2.op.prep_lmp.PrepLmp(*args, **kwargs)[source]

Bases: OP

Prepare the working directories for LAMMPS tasks.

A list of working directories (defined by ip[“task”]) containing all files needed to start LAMMPS tasks will be created. The paths of the directories will be returned as op[“task_paths”]. The identities of the tasks are returned as op[“task_names”].

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

exec_sign_check

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters
ip: dict

Input dict with components:

  • lmp_task_grp: (Artifact(Path)) Can be pickle-loaded as an ExplorationTaskGroup. Definitions of the LAMMPS tasks.

Returns
op: dict

Output dict with components:

  • task_names: (List[str]) The names of the tasks. Will be used as the identities of the tasks. The names of different tasks are different.

  • task_paths: (Artifact(List[Path])) The prepared working paths of the tasks, containing all input files needed to start the LAMMPS simulations. The order of the Paths should be consistent with op[“task_names”].

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

dpgen2.op.run_dp_train module
class dpgen2.op.run_dp_train.RunDPTrain(*args, **kwargs)[source]

Bases: OP

Execute a DP training task. Train and freeze a DP model.

A working directory named task_name is created. All input files are copied or symbolically linked to the directory task_name. The DeePMD-kit training and freezing commands are executed from the directory task_name.

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

decide_init_model

exec_sign_check

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

normalize_config

skip_training

training_args

write_data_to_input_script

write_other_to_input_script

static decide_init_model(config, init_model, init_data, iter_data)[source]
execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters
ip: dict

Input dict with components:

  • config: (dict) The config of training task. Check RunDPTrain.training_args for definitions.

  • task_name: (str) The name of training task.

  • task_path: (Artifact(Path)) The path that contains all input files prepared by PrepDPTrain.

  • init_model: (Artifact(Path)) A frozen model to initialize the training.

  • init_data: (Artifact(List[Path])) Initial training data.

  • iter_data: (Artifact(List[Path])) Training data generated in the DPGEN iterations.

Returns
Output dict with components:
  • script: (Artifact(Path)) The training script.
  • model: (Artifact(Path)) The trained frozen model.
  • lcurve: (Artifact(Path)) The learning curve file.
  • log: (Artifact(Path)) The log file of training.
classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

static normalize_config(data={})[source]
static skip_training(work_dir, train_dict, init_model, iter_data)[source]
static training_args()[source]
static write_data_to_input_script(idict: dict, init_data: List[Path], iter_data: List[Path], auto_prob_str: str = 'prob_sys_size', major_version: str = '1')[source]
static write_other_to_input_script(idict, config, do_init_model, major_version: str = '1')[source]
dpgen2.op.run_dp_train.config_args()
dpgen2.op.run_lmp module
class dpgen2.op.run_lmp.RunLmp(*args, **kwargs)[source]

Bases: OP

Execute a LAMMPS task.

A working directory named task_name is created. All input files are copied or symbolically linked to the directory task_name. The LAMMPS command is executed from the directory task_name. The trajectory and the model deviation will be stored in the files op[“traj”] and op[“model_devi”], respectively.

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

exec_sign_check

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

lmp_args

normalize_config

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters
ip: dict

Input dict with components:

  • config: (dict) The config of lmp task. Check RunLmp.lmp_args for definitions.

  • task_name: (str) The name of the task.

  • task_path: (Artifact(Path)) The path that contains all input files prepared by PrepLmp.

  • models: (Artifact(List[Path])) The frozen models used to estimate the model deviation. The first model will be used to drive the molecular dynamics simulation.

Returns
Output dict with components:
  • log: (Artifact(Path)) The log file of LAMMPS.
  • traj: (Artifact(Path)) The output trajectory.
  • model_devi: (Artifact(Path)) The model deviation. The order of recorded model deviations should be consistent with the order of frames in traj.
classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

static lmp_args()[source]
static normalize_config(data={})[source]
dpgen2.op.run_lmp.config_args()
dpgen2.op.select_confs module
class dpgen2.op.select_confs.SelectConfs(*args, **kwargs)[source]

Bases: OP

Select configurations from exploration trajectories for labeling.

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

exec_sign_check

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters
ip: dict

Input dict with components:

  • conf_selector: (ConfSelector) Configuration selector.

  • type_map: (List[str]) The type map.

  • trajs: (Artifact(List[Path])) The trajectories generated in the exploration.

  • model_devis: (Artifact(List[Path])) The files storing the model deviations of the trajectories. The order of the model deviation files is consistent with that of the trajectories. The order of frames within one model deviation file is also consistent with that of the corresponding trajectory.

Returns
Output dict with components:
  • report: (ExplorationReport) The report on the exploration.
  • conf: (Artifact(List[Path])) The selected configurations.
classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

dpgen2.superop package
Submodules
dpgen2.superop.block module
class dpgen2.superop.block.ConcurrentLearningBlock(name: str, prep_run_dp_train_op: OPTemplate, prep_run_lmp_op: OPTemplate, select_confs_op: OP, prep_run_fp_op: OPTemplate, collect_data_op: OP, select_confs_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, collect_data_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: Optional[List[PathLike]] = None)[source]

Bases: Steps

Attributes
input_artifacts
input_parameters
keys
output_artifacts
output_parameters

Methods

add(step)

Add a step or a list of parallel steps to the steps

convert_to_argo

handle_key

run

property input_artifacts
property input_parameters
property keys
property output_artifacts
property output_parameters
dpgen2.superop.prep_run_dp_train module
class dpgen2.superop.prep_run_dp_train.PrepRunDPTrain(name: str, prep_train_op: OP, run_train_op: OP, prep_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: Optional[List[PathLike]] = None)[source]

Bases: Steps

Attributes
input_artifacts
input_parameters
keys
output_artifacts
output_parameters

Methods

add(step)

Add a step or a list of parallel steps to the steps

convert_to_argo

handle_key

run

property input_artifacts
property input_parameters
property keys
property output_artifacts
property output_parameters
dpgen2.superop.prep_run_fp module
class dpgen2.superop.prep_run_fp.PrepRunFp(name: str, prep_op: OP, run_op: OP, prep_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: Optional[List[PathLike]] = None)[source]

Bases: Steps

Attributes
input_artifacts
input_parameters
keys
output_artifacts
output_parameters

Methods

add(step)

Add a step or a list of parallel steps to the steps

convert_to_argo

handle_key

run

property input_artifacts
property input_parameters
property keys
property output_artifacts
property output_parameters
dpgen2.superop.prep_run_lmp module
class dpgen2.superop.prep_run_lmp.PrepRunLmp(name: str, prep_op: OP, run_op: OP, prep_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: Optional[List[PathLike]] = None)[source]

Bases: Steps

Attributes
input_artifacts
input_parameters
keys
output_artifacts
output_parameters

Methods

add(step)

Add a step or a list of parallel steps to the steps

convert_to_argo

handle_key

run

property input_artifacts
property input_parameters
property keys
property output_artifacts
property output_parameters
dpgen2.utils package
Submodules
dpgen2.utils.bohrium_config module
dpgen2.utils.bohrium_config.bohrium_config_from_dict(bohrium_config)[source]
dpgen2.utils.chdir module
dpgen2.utils.chdir.chdir(path_key: str)[source]

Returns a decorator that can change the current working path.

Parameters
path_key: str

key to OPIO

Examples

>>> class SomeOP(OP):
...     @chdir("path")
...     def execute(self, ip: OPIO):
...         do_something() 
dpgen2.utils.chdir.set_directory(path: Path)[source]

Sets the current working path within the context.

Parameters
path: Path

The path to the cwd

Yields
None

Examples

>>> with set_directory("some_path"):
...    do_something()
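The example above runs do_something() with "some_path" as the working directory and restores the original one afterwards. A minimal sketch of how such a context manager can be written with contextlib (whether dpgen2's version creates a missing directory, as done below, is an assumption):

```python
import os
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def set_directory(path):
    """Temporarily change the working directory within a with-block."""
    cwd = Path().absolute()
    path = Path(path)
    # Assumption: create the target directory if it does not exist yet.
    path.mkdir(parents=True, exist_ok=True)
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original working directory, even on error.
        os.chdir(cwd)
```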
dpgen2.utils.dflow_config module
dpgen2.utils.dflow_config.dflow_config(config_data)[source]

Set the dflow config by config_data.

The keys starting with “s3_” will be treated as s3 config keys; other keys are treated as config keys.

dpgen2.utils.dflow_config.dflow_config_lower(dflow_config)[source]
dpgen2.utils.dflow_config.dflow_s3_config(config_data)[source]

Set the s3 config by config_data.

dpgen2.utils.dflow_config.dflow_s3_config_lower(dflow_s3_config_data)[source]
dpgen2.utils.dflow_config.workflow_config_from_dict(wf_config)[source]
dpgen2.utils.dflow_query module
dpgen2.utils.dflow_query.find_slice_ranges(keys: List[str], sliced_subkey: str)[source]

Find the ranges of sliced OPs whose keys match the pattern ‘iter-[0-9]*--{sliced_subkey}-[0-9]*’.

dpgen2.utils.dflow_query.get_all_schedulers(wf: Any, keys: List[str])[source]

Get the output Scheduler of all the iterations.

dpgen2.utils.dflow_query.get_iteration(key: str)[source]
dpgen2.utils.dflow_query.get_last_iteration(keys: List[str])[source]

Get the index of the last iteration from a list of step keys.
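Step keys carry a zero-padded iteration prefix (e.g. iter-000001--prep-run-train, as seen in the watch output), so the last iteration index can be recovered with a simple regex scan. A hedged sketch, not dpgen2's implementation; the helper name and exact key grammar are assumptions:

```python
import re

def last_iteration_sketch(keys):
    """Return the largest iteration index among step keys shaped
    like 'iter-000001--prep-run-train' (assumed key format)."""
    indices = [int(m.group(1))
               for key in keys
               if (m := re.match(r"iter-(\d+)--", key))]
    return max(indices)
```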

dpgen2.utils.dflow_query.get_last_scheduler(wf: Any, keys: List[str])[source]

Get the output Scheduler of the last successful iteration.

dpgen2.utils.dflow_query.get_subkey(key: str, idx: int = -1)[source]
dpgen2.utils.dflow_query.matched_step_key(all_keys: List[str], step_keys: Optional[List[str]] = None)[source]

Returns the keys in all_keys that match any of the step_keys.

dpgen2.utils.dflow_query.print_keys_in_nice_format(keys: List[str], sliced_subkey: List[str], idx_fmt_len: int = 8)[source]
dpgen2.utils.dflow_query.sort_slice_ops(keys: List[str], sliced_subkey: List[str])[source]

Sort the keys of the sliced OPs. The keys of the sliced OPs contain sliced_subkey.

dpgen2.utils.download_dpgen2_artifacts module
class dpgen2.utils.download_dpgen2_artifacts.DownloadDefinition[source]

Bases: object

Methods

add_def

add_input

add_output

add_def(tdict, key, suffix=None)[source]
add_input(input_key, suffix=None)[source]
add_output(output_key, suffix=None)[source]
dpgen2.utils.download_dpgen2_artifacts.download_dpgen2_artifacts(wf: Workflow, key: str, prefix: Optional[str] = None, chk_pnt: bool = False)[source]

Download the artifacts of a step. The key should be of the format ‘iter-xxxxxx--subkey-of-step-xxxxxx’. The input and output artifacts will be downloaded to prefix/iter-xxxxxx/key-of-step/inputs/ and prefix/iter-xxxxxx/key-of-step/outputs/, respectively.

The downloaded input and output artifacts of the steps are defined by op_download_setting.

dpgen2.utils.obj_artifact module
dpgen2.utils.obj_artifact.dump_object_to_file(obj, fname)[source]

Pickle-dump an object to a file.

dpgen2.utils.obj_artifact.load_object_from_file(fname)[source]

Pickle-load an object from a file.
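Both helpers are thin wrappers around pickle. A plausible stdlib sketch (the return value of the dump helper is an assumption, not taken from dpgen2):

```python
import pickle
from pathlib import Path

def dump_object_to_file(obj, fname):
    """Pickle-dump an object to a file; return the file path for convenience."""
    with open(fname, "wb") as fp:
        pickle.dump(obj, fp)
    return Path(fname)

def load_object_from_file(fname):
    """Load a pickled object back from a file."""
    with open(fname, "rb") as fp:
        return pickle.load(fp)
```

This is how an ExplorationTaskGroup can travel as a dflow artifact: dumped to a file on one step, loaded back on the next.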

dpgen2.utils.run_command module
dpgen2.utils.run_command.run_command(cmd: Union[str, List[str]], shell: bool = False) Tuple[int, str, str][source]
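run_command returns the exit code together with the captured stdout and stderr. A stdlib sketch matching the documented signature (the error handling and decoding choices of dpgen2's version are assumptions):

```python
import subprocess
from typing import List, Tuple, Union

def run_command_sketch(
    cmd: Union[str, List[str]],
    shell: bool = False,
) -> Tuple[int, str, str]:
    """Run a command and return (return_code, stdout, stderr) as strings."""
    ret = subprocess.run(
        cmd,
        shell=shell,
        capture_output=True,  # collect stdout/stderr instead of streaming
        text=True,            # decode bytes to str
    )
    return ret.returncode, ret.stdout, ret.stderr
```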
dpgen2.utils.step_config module
dpgen2.utils.step_config.dispatcher_args()[source]

free style dispatcher args

dpgen2.utils.step_config.gen_doc(*, make_anchor=True, make_link=True, **kwargs)[source]
dpgen2.utils.step_config.init_executor(executor_dict)[source]
dpgen2.utils.step_config.lebesgue_executor_args()[source]
dpgen2.utils.step_config.lebesgue_extra_args()[source]
dpgen2.utils.step_config.normalize(data)[source]
dpgen2.utils.step_config.step_conf_args()[source]
dpgen2.utils.step_config.template_conf_args()[source]
dpgen2.utils.step_config.variant_executor()[source]

Submodules

dpgen2.constants module