DPGEN2’s documentation

DPGEN2 is the 2nd generation of the Deep Potential GENerator.

Important

The project DPGEN2 is licensed under GNU LGPLv3.0.

Guide on dpgen2 commands

One may use dpgen2 through the command line interface. Full documentation of the CLI can be found here.

Submit a workflow

The dpgen2 workflow can be submitted via the submit command

dpgen2 submit input.json

where input.json is the input script. A guide to writing the script can be found here. When a workflow is submitted, an ID (WFID) of the workflow will be printed for later reference.

Check the convergence of a workflow

The convergence of the stages of the workflow can be checked by the status command. It prints the indexes of the finished stages and iterations, and the accurate, candidate and failed ratios of the explored configurations in each iteration.

$ dpgen2 status input.json WFID
#   stage  id_stg.    iter.      accu.      cand.      fail.
# Stage    0  --------------------
        0        0        0     0.8333     0.1667     0.0000
        0        1        1     0.7593     0.2407     0.0000
        0        2        2     0.7778     0.2222     0.0000
        0        3        3     1.0000     0.0000     0.0000
# Stage    0  converged YES  reached max numb iterations NO
# All stages converged

Watch the progress of a workflow

The progress of a workflow can be watched on-the-fly

$ dpgen2 watch input.json WFID
INFO:root:steps iter-000000--prep-run-train----------------------- finished
INFO:root:steps iter-000000--prep-run-lmp------------------------- finished
INFO:root:steps iter-000000--prep-run-fp-------------------------- finished
INFO:root:steps iter-000000--collect-data------------------------- finished
INFO:root:steps iter-000001--prep-run-train----------------------- finished
INFO:root:steps iter-000001--prep-run-lmp------------------------- finished
...

The artifacts can be downloaded on-the-fly with the -d flag. Note that existing files are automatically skipped if one sets dflow_config["archive_mode"] = None.
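
For example, to also download the artifacts of each watched step as soon as it finishes:

$ dpgen2 watch input.json WFID -d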

Show the keys of steps

Each dpgen2 step is assigned a unique key. The keys of the finished steps can be checked with the showkey command.
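
For example, using the same input script and workflow ID as above:

$ dpgen2 showkey input.json WFID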

                   0 : iter-000000--prep-train
              1 -> 4 : iter-000000--run-train-0000 -> iter-000000--run-train-0003
                   5 : iter-000000--prep-lmp
             6 -> 14 : iter-000000--run-lmp-000000 -> iter-000000--run-lmp-000008
                  15 : iter-000000--select-confs
                  16 : iter-000000--prep-fp
            17 -> 20 : iter-000000--run-fp-000000 -> iter-000000--run-fp-000003
                  21 : iter-000000--collect-data
                  22 : iter-000000--scheduler
                  23 : iter-000000--id
                  24 : iter-000001--prep-train
            25 -> 28 : iter-000001--run-train-0000 -> iter-000001--run-train-0003
                  29 : iter-000001--prep-lmp
            30 -> 38 : iter-000001--run-lmp-000000 -> iter-000001--run-lmp-000008
                  39 : iter-000001--select-confs
                  40 : iter-000001--prep-fp
            41 -> 44 : iter-000001--run-fp-000000 -> iter-000001--run-fp-000003
                  45 : iter-000001--collect-data
                  46 : iter-000001--scheduler
                  47 : iter-000001--id

Resubmit a workflow

If a workflow stopped abnormally, one may submit a new workflow with some steps of the old workflow reused.

dpgen2 resubmit input.json WFID --reuse 0-41

The steps 0-41 of workflow WFID (0<=id<41, note that 41 is not included) will be reused in the new workflow. The indexes of the steps are printed by dpgen2 showkey. In this example, all the steps before iter-000001--run-fp-000000 will be reused in the new workflow.
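
The steps of the old workflow can also be listed with the -l (--list) flag of the resubmit command, documented in the CLI reference below:

$ dpgen2 resubmit input.json WFID --list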

Command line interface

DPGEN2: concurrent learning workflow generating the machine learning potential energy models.

usage: dpgen2 [-h] [-v]
              {submit,resubmit,showkey,status,download,watch,gui,terminate,stop,suspend,delete,retry,resume,restart}
              ...

Named Arguments

-v, --version

show program’s version number and exit

Valid subcommands

command

Possible choices: submit, resubmit, showkey, status, download, watch, gui, terminate, stop, suspend, delete, retry, resume, restart

Sub-commands

submit

Submit DPGEN2 workflow

dpgen2 submit [-h] CONFIG
Positional Arguments
CONFIG

the config file in json format defining the workflow.

resubmit

Submit a DPGEN2 workflow reusing steps from an existing workflow

dpgen2 resubmit [-h] [-l] [-u REUSE [REUSE ...]] [-k] [-f] CONFIG ID
Positional Arguments
CONFIG

the config file in json format defining the workflow.

ID

the ID of the existing workflow.

Named Arguments
-l, --list

list the Steps of the existing workflow.

Default: False

-u, --reuse

specify which Steps to reuse.

-k, --keep-schedule

if set, keep the schedule of the old workflow; otherwise use the schedule defined in the input file

Default: False

-f, --fold

if set, super OPs are folded to be reused in the new workflow

Default: False

showkey

Print the keys of the successful DPGEN2 steps

dpgen2 showkey [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the existing workflow.

status

Print the status (stage, iteration, convergence) of the DPGEN2 workflow

dpgen2 status [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the existing workflow.

download

Typically there are three ways of using the command

1. List all supported steps and their input/output artifacts:

   $ dpgen2 download CONFIG ID -l

2. Download all the input/output artifacts of all the steps:

   $ dpgen2 download CONFIG ID

3. Download specified input/output artifacts of certain steps, for example:

   $ dpgen2 download CONFIG ID -i 0-8 8 9 -d prep-run-train/input/init_data prep-run-lmp/output/trajs

The last command downloads the init_data of prep-run-train’s input and the trajs of prep-run-lmp’s output from iterations 0 to 9 (given by -i 0-8 8 9). The supported steps and the names of their input/output artifacts can be checked with the -l flag.

dpgen2 download [-h] [-l] [-k KEYS [KEYS ...]]
                [-i ITERATIONS [ITERATIONS ...]]
                [-d STEP_DEFINITIONS [STEP_DEFINITIONS ...]] [-p PREFIX] [-n]
                CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the existing workflow.

Named Arguments
-l, --list-supported

list all supported steps and their artifacts

Default: False

-k, --keys

the keys of the steps to download. If not provided, download all artifacts

-i, --iterations

the iterations to be downloaded; range expressions such as 0-10 are supported.

-d, --step-definitions

the definition for downloading step artifacts

-p, --prefix

the prefix of the path storing the downloaded artifacts

-n, --no-check-point

if specified, download regardless of whether checkpoints exist.

Default: True

watch

Watch a DPGEN2 workflow

dpgen2 watch [-h] [-k KEYS [KEYS ...]] [-f FREQUENCY] [-d] [-p PREFIX] [-n]
             CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the existing workflow.

Named Arguments
-k, --keys

the subkey to watch. For example, ‘prep-run-train’ ‘prep-run-lmp’

Default: [‘prep-run-train’, ‘prep-run-lmp’, ‘prep-run-fp’, ‘collect-data’]

-f, --frequency

the frequency of workflow status queries, in seconds

Default: 600.0

-d, --download

whether to download artifacts of a step when it finishes

Default: False

-p, --prefix

the prefix of the path storing the downloaded artifacts

-n, --no-check-point

if specified, download regardless of whether checkpoints exist.

Default: True

gui

Serve DP-GUI.

dpgen2 gui [-h] [-p PORT] [--bind_all]
Named Arguments
-p, --port

The port to serve DP-GUI on.

Default: 6042

--bind_all

Serve on all public interfaces. This will expose your DP-GUI instance to the network on both IPv4 and IPv6 (where available).

Default: False

terminate

Terminate a DPGEN2 workflow.

dpgen2 terminate [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the workflow.

stop

Stop a DPGEN2 workflow.

dpgen2 stop [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the workflow.

suspend

Suspend a DPGEN2 workflow.

dpgen2 suspend [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the workflow.

delete

Delete a DPGEN2 workflow.

dpgen2 delete [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the workflow.

retry

Retry a DPGEN2 workflow.

dpgen2 retry [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the workflow.

resume

Resume a DPGEN2 workflow.

dpgen2 resume [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the workflow.

restart

Restart a DPGEN2 workflow (for debug mode only).

dpgen2 restart [-h] CONFIG ID
Positional Arguments
CONFIG

the config file in json format.

ID

the ID of the workflow.

Guide on writing input scripts for dpgen2 commands

Preliminaries

The reader of this document is assumed to be familiar with the concurrent learning algorithm that dpgen2 implements. If not, one may check this paper.

The input script for all dpgen2 commands

For all dpgen2 commands, one needs to provide the dflow global configurations. For example,

    "dflow_config" : {
	"host" : "http://address.of.the.host:port"
    },
    "dflow_s3_config" : {
	"endpoint" : "address.of.the.s3.sever:port"
    },

dpgen2 simply passes all keys of "dflow_config" to dflow.config and all keys of "dflow_s3_config" to dflow.s3_config.
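
If the workflow is to be executed on the Bohrium platform, one may additionally provide the "bohrium_config" section documented in the arguments reference below. A minimal sketch (the credential values are placeholders):

    "bohrium_config" : {
	"username" : "your-bohrium-account",
	"password" : "your-password",
	"project_id" : 12345
    },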

The input script for submit and resubmit

The full documentation of the submit and resubmit script can be found here. This section provides a quick guide on how to write the input script.

In the input script of dpgen2 submit and dpgen2 resubmit, one needs to provide the definition of the workflow and specify how it is executed. One may find an example input script in the dpgen2 Al-Mg alloy example.

The definition of the workflow can be provided by the following sections:

Inputs

This section provides the inputs to start a dpgen2 workflow. An example for the Al-Mg alloy

"inputs": {
	"type_map":		["Al", "Mg"],
	"mass_map":		[27, 24],
	"init_data_sys":	[
		"path/to/init/data/system/0",
		"path/to/init/data/system/1"
	],
}

The key "init_data_sys" provides the initial training data to kick-off the training of deep potential (DP) models.

Training

This section defines how a model is trained.

"train" : {
	"type" : "dp",
	"numb_models" : 4,
	"config" : {},
	"template_script" : "/path/to/the/template/input.json",
	"_comment" : "all"
}

The "type" : "dp" tell the traning method is "dp", i.e. calling DeePMD-kit to train DP models. The "config" key defines the training configs, see the full documentation. The "template_script" provides the template training script in json format.

Exploration

This section defines how the configuration space is explored.

"explore" : {
	"type" : "lmp",
	"config" : {
		"command": "lmp -var restart 0"
	},
	"convergence": {
	    "type" :	"fixed-levels",
	    "conv_accuracy" :	0.9,
	    "level_f_lo":	0.05,
	    "level_f_hi":	0.50,
	    "_comment" : "all"
	},
	"max_numb_iter" :	5,
	"fatal_at_max" :	false,
	"configurations":	[
		{
		"type": "alloy",
		"lattice" : ["fcc", 4.57],
		"replicate" : [2, 2, 2],
		"numb_confs" : 30,
		"concentration" : [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
		},
		{
		"type" : "file",
		"prefix": "/file/prefix",
		"files" : ["relpath/to/confs/*"],
		"fmt" : "deepmd/npy"
		}
	],
	"stages":	[
	    [
		{
		    "_comment" : "stage 0, task group 0",
		    "type" : "lmp-md",
		    "ensemble": "nvt", "nsteps":  50, "temps": [50, 100], "trj_freq": 10,
		    "conf_idx": [0], "n_sample" : 3
		},
		{
		    "_comment" : "stage 0, task group 1",
		    "type" : "lmp-template",
		    "lmp" : "template.lammps", "plm" : "template.plumed",
		    "trj_freq" : 10, "revisions" : {"V_NSTEPS" : [40], "V_TEMP" : [150, 200]},
		    "conf_idx": [0], "n_sample" : 3
		}
	    ],
	    [
		{
		    "_comment" : "stage 1, task group 0",
		    "type" : "lmp-md",
		    "ensemble": "npt", "nsteps":  50, "press": [1e0], "temps": [50, 100, 200], "trj_freq": 10,
		    "conf_idx": [1], "n_sample" : 3
		}
	    ]
	]
}

The "type" : "lmp" means that configurations are explored by LAMMPS DPMD runs. The "config" key defines the lmp configs. The "configurations" provides the initial configurations (coordinates of atoms and the simulation cell) of the DPMD simulations. It is a list. The elements of the list are dicts that defines how the configurations are generated

  • Automatic alloy configuration generator. See the detailed doc for the allowed keys.

  • Configurations loaded from files. See the detailed doc for the allowed keys.

The "stages" defines the exploration stages. It is of type list[list[dict]]. The outer list enumerate the exploration stages, the inner list enumerate the task groups of the stage. Each dict defines a stage. See the full documentation of the task group for writting task groups.

The "n_sample" tells the number of confgiruations randomly sampled from the set picked by "conf_idx" from "configurations" for each exploration task. All configurations has the equal possibility to be sampled. The default value of "n_sample" is null, in this case all picked configurations are sampled. In the example, we have 3 samples for stage 0 task group 0 and 2 thermodynamic states (NVT, T=50 and 100K), then the task group has 3x2=6 NVT DPMD tasks.

FP

This section defines the first-principles (FP) calculation.

"fp" : {
	"type": "vasp",
	"task_max":	2,
	"run_config": {
		"command": "source /opt/intel/oneapi/setvars.sh && mpirun -n 16 vasp_std"
	},
	"inputs_config": {
		"pp_files":	{"Al" : "vasp/POTCAR.Al", "Mg" : "vasp/POTCAR.Mg"},
		"kspacing":	0.32,
		"incar": "vasp/INCAR"
	}
}

The "type" : "vasp" means that first-principles are VASP calculations. The "run_config" key defines the configs for running VASP tasks. The "task_max" key defines the maximal number of vasp calculations in each dpgen2 iteration. The "pp_files", "kspacing" and "incar" keys provides the pseudopotential files, spacing for kspace sampling and the template incar file, respectively.

Configuration of dflow step

The execution units of dpgen2 are the dflow Steps. How each step is executed is defined by "step_configs".

"step_configs":{
	"prep_train_config" : {
		"_comment" : "content omitted"
	},
	"run_train_config" : {
		"_comment" : "content omitted"
	},
	"prep_explore_config" : {
		"_comment" : "content omitted"
	},
	"run_explore_config" : {
		"_comment" : "content omitted"
	},
	"prep_fp_config" : {
		"_comment" : "content omitted"
	},
	"run_fp_config" : {
		"_comment" : "content omitted"
	},
	"select_confs_config" : {
		"_comment" : "content omitted"
	},
	"collect_data_config" : {
		"_comment" : "content omitted"
	},
	"cl_step_config" : {
		"_comment" : "content omitted"
	},
	"_comment" : "all"
},

The configs for prepare training, run training, prepare exploration, run exploration, prepare fp, run fp, select configurations, collect data and concurrent learning steps are given correspondingly.
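
For example, a "run_train_config" that only overrides the image, the environment, and the slices grouping might look like the following (a sketch; only keys documented in the arguments reference below are used, and the image name and environment variable are hypothetical):

"run_train_config" : {
	"template_config" : {
		"image" : "my-registry/deepmd-kit:x.y.z",
		"envs" : {"OMP_NUM_THREADS" : "4"}
	},
	"template_slice_config" : {
		"group_size" : 2,
		"pool_size" : 1
	},
	"parallelism" : 4
}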

Any of the configs in "step_configs" can be omitted. If so, the config of that step is set to the default step config, which is provided by the following section, for example,

"default_step_config" : {
	"template_config" : {
	    "image" : "dpgen2:x.x.x"
	}
},

The way of writing the "default_step_config" is the same as any step config in the "step_configs".

Arguments of the submit script

Note

One can load, modify, and export the input file by using our effective web-based tool DP-GUI online or hosted using the command line interface dpgen2 gui. All parameters below can be set in DP-GUI. By clicking “SAVE JSON”, one can download the input file.

dflow_config:
type: dict | NoneType, optional, default: None
argument path: dflow_config

The configuration passed to dflow

dflow_s3_config:
type: dict | NoneType, optional, default: None
argument path: dflow_s3_config

The S3 configuration passed to dflow

default_step_config:
type: dict, optional, default: {}
argument path: default_step_config

The default step configuration.

template_config:
type: dict, optional, default: {'image': 'dptechnology/dpgen2:latest'}
argument path: default_step_config/template_config

The configs passed to the PythonOPTemplate.

image:
type: str, optional, default: dptechnology/dpgen2:latest
argument path: default_step_config/template_config/image

The image to run the step.

timeout:
type: NoneType | int, optional, default: None
argument path: default_step_config/template_config/timeout

The time limit of the OP. Unit is second.

retry_on_transient_error:
type: NoneType | int, optional, default: None
argument path: default_step_config/template_config/retry_on_transient_error

The number of retry times if a TransientError is raised.

timeout_as_transient_error:
type: bool, optional, default: False
argument path: default_step_config/template_config/timeout_as_transient_error

Treat the timeout as TransientError.

envs:
type: dict | NoneType, optional, default: None
argument path: default_step_config/template_config/envs

The environmental variables.

template_slice_config:
type: dict, optional
argument path: default_step_config/template_slice_config

The configs passed to the Slices.

group_size:
type: NoneType | int, optional, default: None
argument path: default_step_config/template_slice_config/group_size

The number of tasks running on a single node. It is efficient for a large number of short tasks.

pool_size:
type: NoneType | int, optional, default: None
argument path: default_step_config/template_slice_config/pool_size

The number of tasks running at the same time on one node.

continue_on_failed:
type: bool, optional, default: False
argument path: default_step_config/continue_on_failed

Whether to continue if the step fails (FatalError, TransientError, a certain number of retries reached…).

continue_on_num_success:
type: NoneType | int, optional, default: None
argument path: default_step_config/continue_on_num_success

Only in the sliced OP case. Continue the workflow if a certain number of the sliced jobs are successful.

continue_on_success_ratio:
type: NoneType | float, optional, default: None
argument path: default_step_config/continue_on_success_ratio

Only in the sliced OP case. Continue the workflow if a certain ratio of the sliced jobs are successful.

parallelism:
type: NoneType | int, optional, default: None
argument path: default_step_config/parallelism

The parallelism for the step

executor:
type: dict | NoneType, optional, default: None
argument path: default_step_config/executor

The executor of the step.

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: default_step_config/executor/type
possible choices: dispatcher

The type of the executor.

When type is set to dispatcher:

bohrium_config:
type: dict | NoneType, optional, default: None
argument path: bohrium_config

Configurations for the Bohrium platform.

username:
type: str
argument path: bohrium_config/username

The username of the Bohrium platform

password:
type: str
argument path: bohrium_config/password

The password of the Bohrium platform

project_id:
type: int
argument path: bohrium_config/project_id

The project ID of the Bohrium platform

host:
type: str, optional, default: https://workflows.deepmodeling.com
argument path: bohrium_config/host

The host name of the Bohrium platform. Will overwrite dflow_config[‘host’]

k8s_api_server:
type: str, optional, default: https://workflows.deepmodeling.com
argument path: bohrium_config/k8s_api_server

The k8s server of the Bohrium platform. Will overwrite dflow_config[‘k8s_api_server’]

repo_key:
type: str, optional, default: oss-bohrium
argument path: bohrium_config/repo_key

The repo key of the Bohrium platform. Will overwrite dflow_s3_config[‘repo_key’]

storage_client:
type: str, optional, default: dflow.plugins.bohrium.TiefblueClient
argument path: bohrium_config/storage_client

The storage client of the Bohrium platform. Will overwrite dflow_s3_config[‘storage_client’]

step_configs:
type: dict, optional, default: {}
argument path: step_configs

Configurations for executing dflow steps

prep_train_config:
type: dict, optional, default: {'template_config': {'image': 'dptechnology/dpgen2:latest', 'timeout': None, 'retry_on_transient_error': None, 'timeout_as_transient_error': False, 'envs': None}, 'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'parallelism': None, 'executor': None}
argument path: step_configs/prep_train_config

Configuration for prepare train

template_config:
type: dict, optional, default: {'image': 'dptechnology/dpgen2:latest'}
argument path: step_configs/prep_train_config/template_config

The configs passed to the PythonOPTemplate.

image:
type: str, optional, default: dptechnology/dpgen2:latest
argument path: step_configs/prep_train_config/template_config/image

The image to run the step.

timeout:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_train_config/template_config/timeout

The time limit of the OP. Unit is second.

retry_on_transient_error:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_train_config/template_config/retry_on_transient_error

The number of retry times if a TransientError is raised.

timeout_as_transient_error:
type: bool, optional, default: False
argument path: step_configs/prep_train_config/template_config/timeout_as_transient_error

Treat the timeout as TransientError.

envs:
type: dict | NoneType, optional, default: None
argument path: step_configs/prep_train_config/template_config/envs

The environmental variables.

template_slice_config:
type: dict, optional
argument path: step_configs/prep_train_config/template_slice_config

The configs passed to the Slices.

group_size:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_train_config/template_slice_config/group_size

The number of tasks running on a single node. It is efficient for a large number of short tasks.

pool_size:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_train_config/template_slice_config/pool_size

The number of tasks running at the same time on one node.

continue_on_failed:
type: bool, optional, default: False
argument path: step_configs/prep_train_config/continue_on_failed

Whether to continue if the step fails (FatalError, TransientError, a certain number of retries reached…).

continue_on_num_success:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_train_config/continue_on_num_success

Only in the sliced OP case. Continue the workflow if a certain number of the sliced jobs are successful.

continue_on_success_ratio:
type: NoneType | float, optional, default: None
argument path: step_configs/prep_train_config/continue_on_success_ratio

Only in the sliced OP case. Continue the workflow if a certain ratio of the sliced jobs are successful.

parallelism:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_train_config/parallelism

The parallelism for the step

executor:
type: dict | NoneType, optional, default: None
argument path: step_configs/prep_train_config/executor

The executor of the step.

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: step_configs/prep_train_config/executor/type
possible choices: dispatcher

The type of the executor.

When type is set to dispatcher:

run_train_config:
type: dict, optional, default: {'template_config': {'image': 'dptechnology/dpgen2:latest', 'timeout': None, 'retry_on_transient_error': None, 'timeout_as_transient_error': False, 'envs': None}, 'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'parallelism': None, 'executor': None}
argument path: step_configs/run_train_config

Configuration for run train

template_config:
type: dict, optional, default: {'image': 'dptechnology/dpgen2:latest'}
argument path: step_configs/run_train_config/template_config

The configs passed to the PythonOPTemplate.

image:
type: str, optional, default: dptechnology/dpgen2:latest
argument path: step_configs/run_train_config/template_config/image

The image to run the step.

timeout:
type: NoneType | int, optional, default: None
argument path: step_configs/run_train_config/template_config/timeout

The time limit of the OP. Unit is second.

retry_on_transient_error:
type: NoneType | int, optional, default: None
argument path: step_configs/run_train_config/template_config/retry_on_transient_error

The number of retry times if a TransientError is raised.

timeout_as_transient_error:
type: bool, optional, default: False
argument path: step_configs/run_train_config/template_config/timeout_as_transient_error

Treat the timeout as TransientError.

envs:
type: dict | NoneType, optional, default: None
argument path: step_configs/run_train_config/template_config/envs

The environmental variables.

template_slice_config:
type: dict, optional
argument path: step_configs/run_train_config/template_slice_config

The configs passed to the Slices.

group_size:
type: NoneType | int, optional, default: None
argument path: step_configs/run_train_config/template_slice_config/group_size

The number of tasks running on a single node. It is efficient for a large number of short tasks.

pool_size:
type: NoneType | int, optional, default: None
argument path: step_configs/run_train_config/template_slice_config/pool_size

The number of tasks running at the same time on one node.

continue_on_failed:
type: bool, optional, default: False
argument path: step_configs/run_train_config/continue_on_failed

Whether to continue if the step fails (FatalError, TransientError, a certain number of retries reached…).

continue_on_num_success:
type: NoneType | int, optional, default: None
argument path: step_configs/run_train_config/continue_on_num_success

Only in the sliced OP case. Continue the workflow if a certain number of the sliced jobs are successful.

continue_on_success_ratio:
type: NoneType | float, optional, default: None
argument path: step_configs/run_train_config/continue_on_success_ratio

Only in the sliced OP case. Continue the workflow if a certain ratio of the sliced jobs are successful.

parallelism:
type: NoneType | int, optional, default: None
argument path: step_configs/run_train_config/parallelism

The parallelism for the step

executor:
type: dict | NoneType, optional, default: None
argument path: step_configs/run_train_config/executor

The executor of the step.

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: step_configs/run_train_config/executor/type
possible choices: dispatcher

The type of the executor.

When type is set to dispatcher:

prep_explore_config:
type: dict, optional, default: {'template_config': {'image': 'dptechnology/dpgen2:latest', 'timeout': None, 'retry_on_transient_error': None, 'timeout_as_transient_error': False, 'envs': None}, 'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'parallelism': None, 'executor': None}
argument path: step_configs/prep_explore_config

Configuration for prepare exploration

template_config:
type: dict, optional, default: {'image': 'dptechnology/dpgen2:latest'}
argument path: step_configs/prep_explore_config/template_config

The configs passed to the PythonOPTemplate.

image:
type: str, optional, default: dptechnology/dpgen2:latest
argument path: step_configs/prep_explore_config/template_config/image

The image to run the step.

timeout:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_explore_config/template_config/timeout

The time limit of the OP. Unit is second.

retry_on_transient_error:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_explore_config/template_config/retry_on_transient_error

The number of retry times if a TransientError is raised.

timeout_as_transient_error:
type: bool, optional, default: False
argument path: step_configs/prep_explore_config/template_config/timeout_as_transient_error

Treat the timeout as TransientError.

envs:
type: dict | NoneType, optional, default: None
argument path: step_configs/prep_explore_config/template_config/envs

The environmental variables.

template_slice_config:
type: dict, optional
argument path: step_configs/prep_explore_config/template_slice_config

The configs passed to the Slices.

group_size:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_explore_config/template_slice_config/group_size

The number of tasks running on a single node. It is efficient for a large number of short tasks.

pool_size:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_explore_config/template_slice_config/pool_size

The number of tasks running at the same time on one node.

continue_on_failed:
type: bool, optional, default: False
argument path: step_configs/prep_explore_config/continue_on_failed

Whether to continue if the step fails (FatalError, TransientError, a certain number of retries reached…).

continue_on_num_success:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_explore_config/continue_on_num_success

Only in the sliced OP case. Continue the workflow if a certain number of the sliced jobs are successful.

continue_on_success_ratio:
type: NoneType | float, optional, default: None
argument path: step_configs/prep_explore_config/continue_on_success_ratio

Only in the sliced OP case. Continue the workflow if a certain ratio of the sliced jobs are successful.

parallelism:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_explore_config/parallelism

The parallelism for the step

executor:
type: dict | NoneType, optional, default: None
argument path: step_configs/prep_explore_config/executor

The executor of the step.

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: step_configs/prep_explore_config/executor/type
possible choices: dispatcher

The type of the executor.

When type is set to dispatcher:

run_explore_config:
type: dict, optional, default: {'template_config': {'image': 'dptechnology/dpgen2:latest', 'timeout': None, 'retry_on_transient_error': None, 'timeout_as_transient_error': False, 'envs': None}, 'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'parallelism': None, 'executor': None}
argument path: step_configs/run_explore_config

Configuration for run exploration

template_config:
type: dict, optional, default: {'image': 'dptechnology/dpgen2:latest'}
argument path: step_configs/run_explore_config/template_config

The configs passed to the PythonOPTemplate.

image:
type: str, optional, default: dptechnology/dpgen2:latest
argument path: step_configs/run_explore_config/template_config/image

The image to run the step.

timeout:
type: NoneType | int, optional, default: None
argument path: step_configs/run_explore_config/template_config/timeout

The time limit of the OP. Unit is second.

retry_on_transient_error:
type: NoneType | int, optional, default: None
argument path: step_configs/run_explore_config/template_config/retry_on_transient_error

The number of retry times if a TransientError is raised.

timeout_as_transient_error:
type: bool, optional, default: False
argument path: step_configs/run_explore_config/template_config/timeout_as_transient_error

Treat the timeout as TransientError.

envs:
type: dict | NoneType, optional, default: None
argument path: step_configs/run_explore_config/template_config/envs

The environmental variables.

template_slice_config:
type: dict, optional
argument path: step_configs/run_explore_config/template_slice_config

The configs passed to the Slices.

group_size:
type: NoneType | int, optional, default: None
argument path: step_configs/run_explore_config/template_slice_config/group_size

The number of tasks running on a single node. It is efficient for a large number of short tasks.

pool_size:
type: NoneType | int, optional, default: None
argument path: step_configs/run_explore_config/template_slice_config/pool_size

The number of tasks running at the same time on one node.

continue_on_failed:
type: bool, optional, default: False
argument path: step_configs/run_explore_config/continue_on_failed

Whether to continue if the step fails (FatalError, TransientError, a certain number of retries reached…).

continue_on_num_success:
type: NoneType | int, optional, default: None
argument path: step_configs/run_explore_config/continue_on_num_success

Only in the sliced OP case. Continue the workflow if a certain number of the sliced jobs are successful.

continue_on_success_ratio:
type: NoneType | float, optional, default: None
argument path: step_configs/run_explore_config/continue_on_success_ratio

Only in the sliced OP case. Continue the workflow if a certain ratio of the sliced jobs are successful.

parallelism:
type: NoneType | int, optional, default: None
argument path: step_configs/run_explore_config/parallelism

The parallelism for the step

executor:
type: dict | NoneType, optional, default: None
argument path: step_configs/run_explore_config/executor

The executor of the step.

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: step_configs/run_explore_config/executor/type
possible choices: dispatcher

The type of the executor.

When type is set to dispatcher:

prep_fp_config:
type: dict, optional, default: {'template_config': {'image': 'dptechnology/dpgen2:latest', 'timeout': None, 'retry_on_transient_error': None, 'timeout_as_transient_error': False, 'envs': None}, 'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'parallelism': None, 'executor': None}
argument path: step_configs/prep_fp_config

Configuration for prepare fp

template_config:
type: dict, optional, default: {'image': 'dptechnology/dpgen2:latest'}
argument path: step_configs/prep_fp_config/template_config

The configs passed to the PythonOPTemplate.

image:
type: str, optional, default: dptechnology/dpgen2:latest
argument path: step_configs/prep_fp_config/template_config/image

The image to run the step.

timeout:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_fp_config/template_config/timeout

The time limit of the OP. Unit is second.

retry_on_transient_error:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_fp_config/template_config/retry_on_transient_error

The number of retry times if a TransientError is raised.

timeout_as_transient_error:
type: bool, optional, default: False
argument path: step_configs/prep_fp_config/template_config/timeout_as_transient_error

Treat the timeout as TransientError.

envs:
type: dict | NoneType, optional, default: None
argument path: step_configs/prep_fp_config/template_config/envs

The environmental variables.

template_slice_config:
type: dict, optional
argument path: step_configs/prep_fp_config/template_slice_config

The configs passed to the Slices.

group_size:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_fp_config/template_slice_config/group_size

The number of tasks running on a single node. It is efficient for a large number of short tasks.

pool_size:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_fp_config/template_slice_config/pool_size

The number of tasks running at the same time on one node.

continue_on_failed:
type: bool, optional, default: False
argument path: step_configs/prep_fp_config/continue_on_failed

Whether to continue if the step fails (FatalError, TransientError, a certain number of retries reached…).

continue_on_num_success:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_fp_config/continue_on_num_success

Only in the sliced OP case. Continue the workflow if a certain number of the sliced jobs are successful.

continue_on_success_ratio:
type: NoneType | float, optional, default: None
argument path: step_configs/prep_fp_config/continue_on_success_ratio

Only in the sliced OP case. Continue the workflow if a certain ratio of the sliced jobs are successful.

parallelism:
type: NoneType | int, optional, default: None
argument path: step_configs/prep_fp_config/parallelism

The parallelism for the step

executor:
type: dict | NoneType, optional, default: None
argument path: step_configs/prep_fp_config/executor

The executor of the step.

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: step_configs/prep_fp_config/executor/type
possible choices: dispatcher

The type of the executor.

When type is set to dispatcher:

run_fp_config:
type: dict, optional, default: {'template_config': {'image': 'dptechnology/dpgen2:latest', 'timeout': None, 'retry_on_transient_error': None, 'timeout_as_transient_error': False, 'envs': None}, 'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'parallelism': None, 'executor': None}
argument path: step_configs/run_fp_config

Configuration for run fp

template_config:
type: dict, optional, default: {'image': 'dptechnology/dpgen2:latest'}
argument path: step_configs/run_fp_config/template_config

The configs passed to the PythonOPTemplate.

image:
type: str, optional, default: dptechnology/dpgen2:latest
argument path: step_configs/run_fp_config/template_config/image

The image to run the step.

timeout:
type: NoneType | int, optional, default: None
argument path: step_configs/run_fp_config/template_config/timeout

The time limit of the OP. Unit is second.

retry_on_transient_error:
type: NoneType | int, optional, default: None
argument path: step_configs/run_fp_config/template_config/retry_on_transient_error

The number of retry times if a TransientError is raised.

timeout_as_transient_error:
type: bool, optional, default: False
argument path: step_configs/run_fp_config/template_config/timeout_as_transient_error

Treat the timeout as TransientError.

envs:
type: dict | NoneType, optional, default: None
argument path: step_configs/run_fp_config/template_config/envs

The environmental variables.

template_slice_config:
type: dict, optional
argument path: step_configs/run_fp_config/template_slice_config

The configs passed to the Slices.

group_size:
type: NoneType | int, optional, default: None
argument path: step_configs/run_fp_config/template_slice_config/group_size

The number of tasks running on a single node. It is efficient for a large number of short tasks.

pool_size:
type: NoneType | int, optional, default: None
argument path: step_configs/run_fp_config/template_slice_config/pool_size

The number of tasks running at the same time on one node.

continue_on_failed:
type: bool, optional, default: False
argument path: step_configs/run_fp_config/continue_on_failed

Whether to continue if the step fails (FatalError, TransientError, a certain number of retries reached…).

continue_on_num_success:
type: NoneType | int, optional, default: None
argument path: step_configs/run_fp_config/continue_on_num_success

Only in the sliced OP case. Continue the workflow if a certain number of the sliced jobs are successful.

continue_on_success_ratio:
type: NoneType | float, optional, default: None
argument path: step_configs/run_fp_config/continue_on_success_ratio

Only in the sliced OP case. Continue the workflow if a certain ratio of the sliced jobs are successful.

parallelism:
type: NoneType | int, optional, default: None
argument path: step_configs/run_fp_config/parallelism

The parallelism for the step

executor:
type: dict | NoneType, optional, default: None
argument path: step_configs/run_fp_config/executor

The executor of the step.

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: step_configs/run_fp_config/executor/type
possible choices: dispatcher

The type of the executor.

When type is set to dispatcher:

select_confs_config:
type: dict, optional, default: {'template_config': {'image': 'dptechnology/dpgen2:latest', 'timeout': None, 'retry_on_transient_error': None, 'timeout_as_transient_error': False, 'envs': None}, 'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'parallelism': None, 'executor': None}
argument path: step_configs/select_confs_config

Configuration for the select confs

template_config:
type: dict, optional, default: {'image': 'dptechnology/dpgen2:latest'}
argument path: step_configs/select_confs_config/template_config

The configs passed to the PythonOPTemplate.

image:
type: str, optional, default: dptechnology/dpgen2:latest
argument path: step_configs/select_confs_config/template_config/image

The image to run the step.

timeout:
type: NoneType | int, optional, default: None
argument path: step_configs/select_confs_config/template_config/timeout

The time limit of the OP. Unit is second.

retry_on_transient_error:
type: NoneType | int, optional, default: None
argument path: step_configs/select_confs_config/template_config/retry_on_transient_error

The number of retry times if a TransientError is raised.

timeout_as_transient_error:
type: bool, optional, default: False
argument path: step_configs/select_confs_config/template_config/timeout_as_transient_error

Treat the timeout as TransientError.

envs:
type: dict | NoneType, optional, default: None
argument path: step_configs/select_confs_config/template_config/envs

The environmental variables.

template_slice_config:
type: dict, optional
argument path: step_configs/select_confs_config/template_slice_config

The configs passed to the Slices.

group_size:
type: NoneType | int, optional, default: None
argument path: step_configs/select_confs_config/template_slice_config/group_size

The number of tasks running on a single node. It is efficient for a large number of short tasks.

pool_size:
type: NoneType | int, optional, default: None
argument path: step_configs/select_confs_config/template_slice_config/pool_size

The number of tasks running at the same time on one node.

continue_on_failed:
type: bool, optional, default: False
argument path: step_configs/select_confs_config/continue_on_failed

Whether to continue if the step fails (FatalError, TransientError, a certain number of retries reached…).

continue_on_num_success:
type: NoneType | int, optional, default: None
argument path: step_configs/select_confs_config/continue_on_num_success

Only in the sliced OP case. Continue the workflow if a certain number of the sliced jobs are successful.

continue_on_success_ratio:
type: NoneType | float, optional, default: None
argument path: step_configs/select_confs_config/continue_on_success_ratio

Only in the sliced OP case. Continue the workflow if a certain ratio of the sliced jobs are successful.

parallelism:
type: NoneType | int, optional, default: None
argument path: step_configs/select_confs_config/parallelism

The parallelism for the step

executor:
type: dict | NoneType, optional, default: None
argument path: step_configs/select_confs_config/executor

The executor of the step.

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: step_configs/select_confs_config/executor/type
possible choices: dispatcher

The type of the executor.

When type is set to dispatcher:

collect_data_config:
type: dict, optional, default: {'template_config': {'image': 'dptechnology/dpgen2:latest', 'timeout': None, 'retry_on_transient_error': None, 'timeout_as_transient_error': False, 'envs': None}, 'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'parallelism': None, 'executor': None}
argument path: step_configs/collect_data_config

Configuration for the collect data

template_config:
type: dict, optional, default: {'image': 'dptechnology/dpgen2:latest'}
argument path: step_configs/collect_data_config/template_config

The configs passed to the PythonOPTemplate.

image:
type: str, optional, default: dptechnology/dpgen2:latest
argument path: step_configs/collect_data_config/template_config/image

The image to run the step.

timeout:
type: NoneType | int, optional, default: None
argument path: step_configs/collect_data_config/template_config/timeout

The time limit of the OP. Unit is second.

retry_on_transient_error:
type: NoneType | int, optional, default: None
argument path: step_configs/collect_data_config/template_config/retry_on_transient_error

The number of retry times if a TransientError is raised.

timeout_as_transient_error:
type: bool, optional, default: False
argument path: step_configs/collect_data_config/template_config/timeout_as_transient_error

Treat the timeout as TransientError.

envs:
type: dict | NoneType, optional, default: None
argument path: step_configs/collect_data_config/template_config/envs

The environmental variables.

template_slice_config:
type: dict, optional
argument path: step_configs/collect_data_config/template_slice_config

The configs passed to the Slices.

group_size:
type: NoneType | int, optional, default: None
argument path: step_configs/collect_data_config/template_slice_config/group_size

The number of tasks running on a single node. It is efficient for a large number of short tasks.

pool_size:
type: NoneType | int, optional, default: None
argument path: step_configs/collect_data_config/template_slice_config/pool_size

The number of tasks running at the same time on one node.

continue_on_failed:
type: bool, optional, default: False
argument path: step_configs/collect_data_config/continue_on_failed

Whether to continue if the step fails (FatalError, TransientError, a certain number of retries reached…).

continue_on_num_success:
type: NoneType | int, optional, default: None
argument path: step_configs/collect_data_config/continue_on_num_success

Only in the sliced OP case. Continue the workflow if a certain number of the sliced jobs are successful.

continue_on_success_ratio:
type: NoneType | float, optional, default: None
argument path: step_configs/collect_data_config/continue_on_success_ratio

Only in the sliced OP case. Continue the workflow if a certain ratio of the sliced jobs are successful.

parallelism:
type: NoneType | int, optional, default: None
argument path: step_configs/collect_data_config/parallelism

The parallelism for the step

executor:
type: dict | NoneType, optional, default: None
argument path: step_configs/collect_data_config/executor

The executor of the step.

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: step_configs/collect_data_config/executor/type
possible choices: dispatcher

The type of the executor.

When type is set to dispatcher:

cl_step_config:
type: dict, optional, default: {'template_config': {'image': 'dptechnology/dpgen2:latest', 'timeout': None, 'retry_on_transient_error': None, 'timeout_as_transient_error': False, 'envs': None}, 'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'parallelism': None, 'executor': None}
argument path: step_configs/cl_step_config

Configuration for the concurrent learning step

template_config:
type: dict, optional, default: {'image': 'dptechnology/dpgen2:latest'}
argument path: step_configs/cl_step_config/template_config

The configs passed to the PythonOPTemplate.

image:
type: str, optional, default: dptechnology/dpgen2:latest
argument path: step_configs/cl_step_config/template_config/image

The image to run the step.

timeout:
type: NoneType | int, optional, default: None
argument path: step_configs/cl_step_config/template_config/timeout

The time limit of the OP. Unit is second.

retry_on_transient_error:
type: NoneType | int, optional, default: None
argument path: step_configs/cl_step_config/template_config/retry_on_transient_error

The number of retry times if a TransientError is raised.

timeout_as_transient_error:
type: bool, optional, default: False
argument path: step_configs/cl_step_config/template_config/timeout_as_transient_error

Treat the timeout as TransientError.

envs:
type: dict | NoneType, optional, default: None
argument path: step_configs/cl_step_config/template_config/envs

The environmental variables.

template_slice_config:
type: dict, optional
argument path: step_configs/cl_step_config/template_slice_config

The configs passed to the Slices.

group_size:
type: NoneType | int, optional, default: None
argument path: step_configs/cl_step_config/template_slice_config/group_size

The number of tasks running on a single node. It is efficient for a large number of short tasks.

pool_size:
type: NoneType | int, optional, default: None
argument path: step_configs/cl_step_config/template_slice_config/pool_size

The number of tasks running at the same time on one node.

continue_on_failed:
type: bool, optional, default: False
argument path: step_configs/cl_step_config/continue_on_failed

Whether to continue if the step fails (FatalError, TransientError, a certain number of retries reached…).

continue_on_num_success:
type: NoneType | int, optional, default: None
argument path: step_configs/cl_step_config/continue_on_num_success

Only in the sliced OP case. Continue the workflow if a certain number of the sliced jobs are successful.

continue_on_success_ratio:
type: NoneType | float, optional, default: None
argument path: step_configs/cl_step_config/continue_on_success_ratio

Only in the sliced OP case. Continue the workflow if a certain ratio of the sliced jobs are successful.

parallelism:
type: NoneType | int, optional, default: None
argument path: step_configs/cl_step_config/parallelism

The parallelism for the step

executor:
type: dict | NoneType, optional, default: None
argument path: step_configs/cl_step_config/executor

The executor of the step.

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: step_configs/cl_step_config/executor/type
possible choices: dispatcher

The type of the executor.

When type is set to dispatcher:

upload_python_packages:
type: str | typing.List[str] | NoneType, optional, default: None, alias: upload_python_package
argument path: upload_python_packages

Upload python packages, for debugging purposes

inputs:
type: dict
argument path: inputs

The input parameters and artifacts for dpgen2

type_map:
type: typing.List[str]
argument path: inputs/type_map

The type map. e.g. [“Al”, “Mg”]. Al and Mg will have type 0 and 1, respectively.

mass_map:
type: typing.List[float]
argument path: inputs/mass_map

The mass map. e.g. [27., 24.]. Al and Mg will be set with mass 27. and 24. amu, respectively.

init_data_prefix:
type: str | NoneType, optional, default: None
argument path: inputs/init_data_prefix

The prefix of initial data systems

mixed_type:
type: bool, optional, default: False
argument path: inputs/mixed_type

Use deepmd/npy/mixed format for storing training data.

do_finetune:
type: bool, optional, default: False
argument path: inputs/do_finetune

Finetune the pretrained model before the first iteration. If set to True, an additional step, finetune-step, which is based on a branch of “PrepRunDPTrain”, will be added before the dpgen_step. In the finetune-step, the internal flag finetune_mode is set to “finetune”, which means the SuperOP “PrepRunDPTrain” is now used as the “Finetune”. In this step, we finetune the pretrained model in the train step and modify the template after training. After that, in the normal dpgen-step, the internal flag finetune_mode is set to “train-init”, which means we use --init-frz-model to train based on models from the previous iteration. “do_finetune” is set to False by default, in which case the internal flag finetune_mode is set to “no” and nothing related to finetuning will be done.

init_data_sys:
type: str | typing.List[str]
argument path: inputs/init_data_sys

The initial data systems

train:
type: dict
argument path: train

The configuration for training

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: train/type
possible choices: dp, dp-dist

the type of the training

When type is set to dp:

config:
type: dict, optional, default: {'init_model_policy': 'no', 'init_model_old_ratio': 0.9, 'init_model_numb_steps': 400000, 'init_model_start_lr': 0.0001, 'init_model_start_pref_e': 0.1, 'init_model_start_pref_f': 100, 'init_model_start_pref_v': 0.0}
argument path: train[dp]/config

Configuration of training

init_model_policy:
type: str, optional, default: no
argument path: train[dp]/config/init_model_policy

The policy of init-model training. It can be

  • ‘no’: No init-model training. Train from scratch.

  • ‘yes’: Do init-model training.

  • ‘old_data_larger_than:XXX’: Do init-model if the training data size of the previous model is larger than XXX. XXX is an int number.

init_model_old_ratio:
type: float, optional, default: 0.9
argument path: train[dp]/config/init_model_old_ratio

The frequency ratio of old data over new data

init_model_numb_steps:
type: int, optional, default: 400000, alias: init_model_stop_batch
argument path: train[dp]/config/init_model_numb_steps

The number of training steps when init-model

init_model_start_lr:
type: float, optional, default: 0.0001
argument path: train[dp]/config/init_model_start_lr

The start learning rate when init-model

init_model_start_pref_e:
type: float, optional, default: 0.1
argument path: train[dp]/config/init_model_start_pref_e

The start energy prefactor in loss when init-model

init_model_start_pref_f:
type: float, optional, default: 100
argument path: train[dp]/config/init_model_start_pref_f

The start force prefactor in loss when init-model

init_model_start_pref_v:
type: float, optional, default: 0.0
argument path: train[dp]/config/init_model_start_pref_v

The start virial prefactor in loss when init-model

numb_models:
type: int, optional, default: 4
argument path: train[dp]/numb_models

Number of models trained for evaluating the model deviation

template_script:
type: str | typing.List[str]
argument path: train[dp]/template_script

File names of the template training script. It can be a List[str], the length of which is the same as numb_models; each template script in the list is used to train a model. It can also be a str, in which case all models share the same template training script.

init_models_paths:
type: typing.List[str] | NoneType, optional, default: None, alias: training_iter0_model_path
argument path: train[dp]/init_models_paths

the paths to initial models

When type is set to dp-dist:

config:
type: dict, optional, default: {'init_model_policy': 'no', 'init_model_old_ratio': 0.9, 'init_model_numb_steps': 400000, 'init_model_start_lr': 0.0001, 'init_model_start_pref_e': 0.1, 'init_model_start_pref_f': 100, 'init_model_start_pref_v': 0.0}
argument path: train[dp-dist]/config

Configuration of training

init_model_policy:
type: str, optional, default: no
argument path: train[dp-dist]/config/init_model_policy

The policy of init-model training. It can be

  • ‘no’: No init-model training. Train from scratch.

  • ‘yes’: Do init-model training.

  • ‘old_data_larger_than:XXX’: Do init-model if the training data size of the previous model is larger than XXX. XXX is an int number.

init_model_old_ratio:
type: float, optional, default: 0.9
argument path: train[dp-dist]/config/init_model_old_ratio

The frequency ratio of old data over new data

init_model_numb_steps:
type: int, optional, default: 400000, alias: init_model_stop_batch
argument path: train[dp-dist]/config/init_model_numb_steps

The number of training steps when init-model

init_model_start_lr:
type: float, optional, default: 0.0001
argument path: train[dp-dist]/config/init_model_start_lr

The start learning rate when init-model

init_model_start_pref_e:
type: float, optional, default: 0.1
argument path: train[dp-dist]/config/init_model_start_pref_e

The start energy prefactor in loss when init-model

init_model_start_pref_f:
type: float, optional, default: 100
argument path: train[dp-dist]/config/init_model_start_pref_f

The start force prefactor in loss when init-model

init_model_start_pref_v:
type: float, optional, default: 0.0
argument path: train[dp-dist]/config/init_model_start_pref_v

The start virial prefactor in loss when init-model

template_script:
type: str | typing.List[str]
argument path: train[dp-dist]/template_script

File names of the template training script. It can be a List[str] whose length is the same as numb_models; each template script in the list is used to train a model. It can also be a str, in which case all models share the same template training script.

student_model_path:
type: str, optional
argument path: train[dp-dist]/student_model_path

The path of student model

explore:
type: dict
argument path: explore

The configuration for exploration

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: explore/type
possible choices: lmp, calypso

The type of the exploration

When type is set to lmp:

The exploration by LAMMPS simulations

config:
type: dict, optional, default: {'command': 'lmp', 'teacher_model_path': None, 'shuffle_models': False}
argument path: explore[lmp]/config

Configuration of lmp exploration

command:
type: str, optional, default: lmp
argument path: explore[lmp]/config/command

The command of LAMMPS

teacher_model_path:
type: str | BinaryFileInput | NoneType, optional, default: None
argument path: explore[lmp]/config/teacher_model_path

The teacher model in Knowledge Distillation

shuffle_models:
type: bool, optional, default: False
argument path: explore[lmp]/config/shuffle_models

Randomly pick a model from the group of models to drive the exploration MD simulation

max_numb_iter:
type: int, optional, default: 10
argument path: explore[lmp]/max_numb_iter

Maximum number of iterations per stage

fatal_at_max:
type: bool, optional, default: True
argument path: explore[lmp]/fatal_at_max

Raise a fatal error when the number of iterations per stage reaches max_numb_iter

output_nopbc:
type: bool, optional, default: False
argument path: explore[lmp]/output_nopbc

Remove pbc of the output configurations

convergence:
type: dict
argument path: explore[lmp]/convergence

The method of convergence check.

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: explore[lmp]/convergence/type

the type of the candidate selection and convergence check method.

When type is set to fixed-levels:

The configurations with force model deviation between level_f_lo and level_f_hi, or with virial model deviation between level_v_lo and level_v_hi, are treated as candidates (the virial model deviation check is optional). The configurations sent for FP calculations are randomly sampled from the candidates. If the ratio of accurate configurations (force deviation below level_f_lo and virial deviation below level_v_lo) is higher than conv_accuracy, the stage is treated as converged.

level_f_lo:
type: float
argument path: explore[lmp]/convergence[fixed-levels]/level_f_lo

The lower trust level of force model deviation

level_f_hi:
type: float
argument path: explore[lmp]/convergence[fixed-levels]/level_f_hi

The higher trust level of force model deviation

level_v_lo:
type: NoneType | float, optional, default: None
argument path: explore[lmp]/convergence[fixed-levels]/level_v_lo

The lower trust level of virial model deviation

level_v_hi:
type: NoneType | float, optional, default: None
argument path: explore[lmp]/convergence[fixed-levels]/level_v_hi

The higher trust level of virial model deviation

conv_accuracy:
type: float, optional, default: 0.9
argument path: explore[lmp]/convergence[fixed-levels]/conv_accuracy

If the ratio of accurate frames is larger than this value, the stage is converged
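
For illustration only, a fixed-levels convergence block might look like the following sketch (Python-dict notation; the trust levels are placeholders and must be chosen for the system at hand):

convergence = {
    "type": "fixed-levels",
    "level_f_lo": 0.05,   # placeholder lower force trust level
    "level_f_hi": 0.50,   # placeholder higher force trust level
    "conv_accuracy": 0.90,
}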

When type is set to fixed-levels-max-select:

The configurations with force model deviation between level_f_lo and level_f_hi, or with virial model deviation between level_v_lo and level_v_hi, are treated as candidates (the virial model deviation check is optional). The configurations with the maximal model deviation among the candidates are sent for FP calculations. If the ratio of accurate configurations (force deviation below level_f_lo and virial deviation below level_v_lo) is higher than conv_accuracy, the stage is treated as converged.

level_f_lo:
type: float
argument path: explore[lmp]/convergence[fixed-levels-max-select]/level_f_lo

The lower trust level of force model deviation

level_f_hi:
type: float
argument path: explore[lmp]/convergence[fixed-levels-max-select]/level_f_hi

The higher trust level of force model deviation

level_v_lo:
type: NoneType | float, optional, default: None
argument path: explore[lmp]/convergence[fixed-levels-max-select]/level_v_lo

The lower trust level of virial model deviation

level_v_hi:
type: NoneType | float, optional, default: None
argument path: explore[lmp]/convergence[fixed-levels-max-select]/level_v_hi

The higher trust level of virial model deviation

conv_accuracy:
type: float, optional, default: 0.9
argument path: explore[lmp]/convergence[fixed-levels-max-select]/conv_accuracy

If the ratio of accurate frames is larger than this value, the stage is converged

When type is set to adaptive-lower:

The method that adaptively adjusts the lower trust levels. In each iteration, a number (set by numb_candi_f or numb_candi_v) or a ratio (set by rate_candi_f or rate_candi_v) of configurations with a model deviation lower than the higher trust level (level_f_hi, level_v_hi) are treated as candidates. The lowest model deviation among the candidates is taken as the lower trust level. If the lower trust level does not change significantly (controlled by conv_tolerance) within n_checked_steps, the stage is treated as converged.

level_f_hi:
type: float, optional, default: 0.5
argument path: explore[lmp]/convergence[adaptive-lower]/level_f_hi

The higher trust level of force model deviation

numb_candi_f:
type: int, optional, default: 200
argument path: explore[lmp]/convergence[adaptive-lower]/numb_candi_f

The number of force frames with a model deviation lower than level_f_hi that are treated as candidates.

rate_candi_f:
type: float, optional, default: 0.01
argument path: explore[lmp]/convergence[adaptive-lower]/rate_candi_f

The ratio of force frames with a model deviation lower than level_f_hi that are treated as candidates.

level_v_hi:
type: NoneType | float, optional, default: None
argument path: explore[lmp]/convergence[adaptive-lower]/level_v_hi

The higher trust level of virial model deviation

numb_candi_v:
type: int, optional, default: 0
argument path: explore[lmp]/convergence[adaptive-lower]/numb_candi_v

The number of virial frames with a model deviation lower than level_v_hi that are treated as candidates.

rate_candi_v:
type: float, optional, default: 0.0
argument path: explore[lmp]/convergence[adaptive-lower]/rate_candi_v

The ratio of virial frames with a model deviation lower than level_v_hi that are treated as candidates.

n_checked_steps:
type: int, optional, default: 2
argument path: explore[lmp]/convergence[adaptive-lower]/n_checked_steps

The number of steps to check the convergence.

conv_tolerance:
type: float, optional, default: 0.05
argument path: explore[lmp]/convergence[adaptive-lower]/conv_tolerance

The convergence tolerance.

candi_sel_prob:
type: str, optional, default: uniform
argument path: explore[lmp]/convergence[adaptive-lower]/candi_sel_prob

The method for selecting candidates. It can be ‘uniform’: all candidates have the same probability; or ‘inv_pop_f’ / ‘inv_pop_f:nhist’: the probability is inversely proportional to the population of a histogram between level_f_lo and level_f_hi. The number of bins in the histogram is set by nhist, which should be an integer; the default is 10.
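
For illustration only, an adaptive-lower convergence block might look like the following sketch (Python-dict notation; all values are placeholders):

convergence = {
    "type": "adaptive-lower",
    "level_f_hi": 0.50,
    "numb_candi_f": 200,
    "n_checked_steps": 2,
    "conv_tolerance": 0.05,
    "candi_sel_prob": "inv_pop_f:20",
}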

configurations:
type: list, alias: configuration
argument path: explore[lmp]/configurations

A list of initial configurations.

This argument takes a list with each element containing the following:

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: explore[lmp]/configurations/type
possible choices: alloy, file

the type of the initial configuration generator.

When type is set to alloy:

Generate alloys with a certain lattice or a user-provided structure, with the elements randomly occupying the lattice sites according to user-provided probabilities.

numb_confs:
type: int, optional, default: 1
argument path: explore[lmp]/configurations[alloy]/numb_confs

The number of configurations to generate

lattice:
type: list | tuple
argument path: explore[lmp]/configurations[alloy]/lattice

The lattice. Should be a list providing [ “lattice_type”, lattice_const ], or a list providing [ “/path/to/dpdata/system”, “fmt” ]. The two styles are distinguished by the type of the second element. Currently “lattice_type” can be “bcc”, “fcc”, “hcp”, “sc” or “diamond”.

replicate:
type: list | NoneType, optional, default: None
argument path: explore[lmp]/configurations[alloy]/replicate

The number of replicates in each direction

concentration:
type: list | NoneType, optional, default: None
argument path: explore[lmp]/configurations[alloy]/concentration

The concentration of each element. List[List[float]] or List[float] or None. If List[float], the concentrations of each element. The length of the list should be the same as the type_map. If List[List[float]], a list of concentrations (List[float]) is randomly picked from the List. If None, the elements are assumed to be of equal concentration.

cell_pert_frac:
type: float, optional, default: 0.0
argument path: explore[lmp]/configurations[alloy]/cell_pert_frac

The fraction of cell perturbation

atom_pert_dist:
type: float, optional, default: 0.0
argument path: explore[lmp]/configurations[alloy]/atom_pert_dist

The distance of atomic position perturbation
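
For illustration only, an alloy-type entry of the configurations list might look like the following sketch (Python-dict notation; all values are placeholders, and the length of concentration should match the type_map):

configurations = [
    {
        "type": "alloy",
        "lattice": ["fcc", 4.05],
        "replicate": [2, 2, 2],
        "numb_confs": 10,
        "concentration": [0.5, 0.5],
        "cell_pert_frac": 0.02,
        "atom_pert_dist": 0.1,
    },
]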

When type is set to file:

Generate configurations from user-provided file(s). The file(s) are assumed to be loadable by dpdata.

files:
type: str | list
argument path: explore[lmp]/configurations[file]/files

The paths to the configuration files. Wildcards are supported.

prefix:
type: str | NoneType, optional, default: None
argument path: explore[lmp]/configurations[file]/prefix

The prefix of file paths.

fmt:
type: str, optional, default: auto
argument path: explore[lmp]/configurations[file]/fmt

The format (dpdata accepted formats) of the files.

remove_pbc:
type: bool, optional, default: False
argument path: explore[lmp]/configurations[file]/remove_pbc

Remove the pbc of the data and shift the coordinates to the center of the box so that the data can be used with LAMMPS.

stages:
type: typing.List[typing.List[dict]]
argument path: explore[lmp]/stages

The definition of exploration stages of type List[List[ExplorationTaskGroup]]. The outer list enumerates the exploration stages; each stage is defined by a list of exploration task groups. Each task group is described in the task group definition.
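
For illustration only, the nesting of stages might look like the following sketch (Python notation). The task-group dicts themselves are documented in the task group definition section; the values here are placeholders:

stages = [
    [  # stage 0: two task groups explored together
        {"type": "lmp-md", "conf_idx": [0], "temps": [300.0, 600.0]},
        {"type": "lmp-md", "conf_idx": [1], "temps": [300.0, 600.0], "ens": "npt", "press": [1.0]},
    ],
    [  # stage 1: a single task group
        {"type": "lmp-md", "conf_idx": [0, 1], "temps": [900.0]},
    ],
]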

When type is set to calypso:

The exploration by CALYPSO structure prediction

config:
type: dict, optional, default: {'command': 'lmp', 'teacher_model_path': None, 'shuffle_models': False}
argument path: explore[calypso]/config

Configuration of lmp exploration

command:
type: str, optional, default: lmp
argument path: explore[calypso]/config/command

The command of LAMMPS

teacher_model_path:
type: str | BinaryFileInput | NoneType, optional, default: None
argument path: explore[calypso]/config/teacher_model_path

The teacher model in Knowledge Distillation

shuffle_models:
type: bool, optional, default: False
argument path: explore[calypso]/config/shuffle_models

Randomly pick a model from the group of models to drive the exploration MD simulation

max_numb_iter:
type: int, optional, default: 10
argument path: explore[calypso]/max_numb_iter

Maximum number of iterations per stage

fatal_at_max:
type: bool, optional, default: True
argument path: explore[calypso]/fatal_at_max

Raise a fatal error when the number of iterations per stage reaches max_numb_iter

output_nopbc:
type: bool, optional, default: False
argument path: explore[calypso]/output_nopbc

Remove pbc of the output configurations

convergence:
type: dict
argument path: explore[calypso]/convergence

The method of convergence check.

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: explore[calypso]/convergence/type

the type of the candidate selection and convergence check method.

When type is set to fixed-levels:

The configurations with force model deviation between level_f_lo and level_f_hi, or with virial model deviation between level_v_lo and level_v_hi, are treated as candidates (the virial model deviation check is optional). The configurations sent for FP calculations are randomly sampled from the candidates. If the ratio of accurate configurations (force deviation below level_f_lo and virial deviation below level_v_lo) is higher than conv_accuracy, the stage is treated as converged.

level_f_lo:
type: float
argument path: explore[calypso]/convergence[fixed-levels]/level_f_lo

The lower trust level of force model deviation

level_f_hi:
type: float
argument path: explore[calypso]/convergence[fixed-levels]/level_f_hi

The higher trust level of force model deviation

level_v_lo:
type: NoneType | float, optional, default: None
argument path: explore[calypso]/convergence[fixed-levels]/level_v_lo

The lower trust level of virial model deviation

level_v_hi:
type: NoneType | float, optional, default: None
argument path: explore[calypso]/convergence[fixed-levels]/level_v_hi

The higher trust level of virial model deviation

conv_accuracy:
type: float, optional, default: 0.9
argument path: explore[calypso]/convergence[fixed-levels]/conv_accuracy

If the ratio of accurate frames is larger than this value, the stage is converged

When type is set to fixed-levels-max-select:

The configurations with force model deviation between level_f_lo and level_f_hi, or with virial model deviation between level_v_lo and level_v_hi, are treated as candidates (the virial model deviation check is optional). The configurations with the maximal model deviation among the candidates are sent for FP calculations. If the ratio of accurate configurations (force deviation below level_f_lo and virial deviation below level_v_lo) is higher than conv_accuracy, the stage is treated as converged.

level_f_lo:
type: float
argument path: explore[calypso]/convergence[fixed-levels-max-select]/level_f_lo

The lower trust level of force model deviation

level_f_hi:
type: float
argument path: explore[calypso]/convergence[fixed-levels-max-select]/level_f_hi

The higher trust level of force model deviation

level_v_lo:
type: NoneType | float, optional, default: None
argument path: explore[calypso]/convergence[fixed-levels-max-select]/level_v_lo

The lower trust level of virial model deviation

level_v_hi:
type: NoneType | float, optional, default: None
argument path: explore[calypso]/convergence[fixed-levels-max-select]/level_v_hi

The higher trust level of virial model deviation

conv_accuracy:
type: float, optional, default: 0.9
argument path: explore[calypso]/convergence[fixed-levels-max-select]/conv_accuracy

If the ratio of accurate frames is larger than this value, the stage is converged

When type is set to adaptive-lower:

The method that adaptively adjusts the lower trust levels. In each iteration, a number (set by numb_candi_f or numb_candi_v) or a ratio (set by rate_candi_f or rate_candi_v) of configurations with a model deviation lower than the higher trust level (level_f_hi, level_v_hi) are treated as candidates. The lowest model deviation among the candidates is taken as the lower trust level. If the lower trust level does not change significantly (controlled by conv_tolerance) within n_checked_steps, the stage is treated as converged.

level_f_hi:
type: float, optional, default: 0.5
argument path: explore[calypso]/convergence[adaptive-lower]/level_f_hi

The higher trust level of force model deviation

numb_candi_f:
type: int, optional, default: 200
argument path: explore[calypso]/convergence[adaptive-lower]/numb_candi_f

The number of force frames with a model deviation lower than level_f_hi that are treated as candidates.

rate_candi_f:
type: float, optional, default: 0.01
argument path: explore[calypso]/convergence[adaptive-lower]/rate_candi_f

The ratio of force frames with a model deviation lower than level_f_hi that are treated as candidates.

level_v_hi:
type: NoneType | float, optional, default: None
argument path: explore[calypso]/convergence[adaptive-lower]/level_v_hi

The higher trust level of virial model deviation

numb_candi_v:
type: int, optional, default: 0
argument path: explore[calypso]/convergence[adaptive-lower]/numb_candi_v

The number of virial frames with a model deviation lower than level_v_hi that are treated as candidates.

rate_candi_v:
type: float, optional, default: 0.0
argument path: explore[calypso]/convergence[adaptive-lower]/rate_candi_v

The ratio of virial frames with a model deviation lower than level_v_hi that are treated as candidates.

n_checked_steps:
type: int, optional, default: 2
argument path: explore[calypso]/convergence[adaptive-lower]/n_checked_steps

The number of steps to check the convergence.

conv_tolerance:
type: float, optional, default: 0.05
argument path: explore[calypso]/convergence[adaptive-lower]/conv_tolerance

The convergence tolerance.

candi_sel_prob:
type: str, optional, default: uniform
argument path: explore[calypso]/convergence[adaptive-lower]/candi_sel_prob

The method for selecting candidates. It can be ‘uniform’: all candidates have the same probability; or ‘inv_pop_f’ / ‘inv_pop_f:nhist’: the probability is inversely proportional to the population of a histogram between level_f_lo and level_f_hi. The number of bins in the histogram is set by nhist, which should be an integer; the default is 10.

configurations:
type: list, alias: configuration
argument path: explore[calypso]/configurations

A list of initial configurations.

This argument takes a list with each element containing the following:

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: explore[calypso]/configurations/type
possible choices: alloy, file

the type of the initial configuration generator.

When type is set to alloy:

Generate alloys with a certain lattice or a user-provided structure, with the elements randomly occupying the lattice sites according to user-provided probabilities.

numb_confs:
type: int, optional, default: 1
argument path: explore[calypso]/configurations[alloy]/numb_confs

The number of configurations to generate

lattice:
type: list | tuple
argument path: explore[calypso]/configurations[alloy]/lattice

The lattice. Should be a list providing [ “lattice_type”, lattice_const ], or a list providing [ “/path/to/dpdata/system”, “fmt” ]. The two styles are distinguished by the type of the second element. Currently “lattice_type” can be “bcc”, “fcc”, “hcp”, “sc” or “diamond”.

replicate:
type: list | NoneType, optional, default: None
argument path: explore[calypso]/configurations[alloy]/replicate

The number of replicates in each direction

concentration:
type: list | NoneType, optional, default: None
argument path: explore[calypso]/configurations[alloy]/concentration

The concentration of each element. List[List[float]] or List[float] or None. If List[float], the concentrations of each element. The length of the list should be the same as the type_map. If List[List[float]], a list of concentrations (List[float]) is randomly picked from the List. If None, the elements are assumed to be of equal concentration.

cell_pert_frac:
type: float, optional, default: 0.0
argument path: explore[calypso]/configurations[alloy]/cell_pert_frac

The fraction of cell perturbation

atom_pert_dist:
type: float, optional, default: 0.0
argument path: explore[calypso]/configurations[alloy]/atom_pert_dist

The distance of atomic position perturbation

When type is set to file:

Generate configurations from user-provided file(s). The file(s) are assumed to be loadable by dpdata.

files:
type: str | list
argument path: explore[calypso]/configurations[file]/files

The paths to the configuration files. Wildcards are supported.

prefix:
type: str | NoneType, optional, default: None
argument path: explore[calypso]/configurations[file]/prefix

The prefix of file paths.

fmt:
type: str, optional, default: auto
argument path: explore[calypso]/configurations[file]/fmt

The format (dpdata accepted formats) of the files.

remove_pbc:
type: bool, optional, default: False
argument path: explore[calypso]/configurations[file]/remove_pbc

Remove the pbc of the data and shift the coordinates to the center of the box so that the data can be used with LAMMPS.

stages:
type: typing.List[typing.List[dict]]
argument path: explore[calypso]/stages

The definition of exploration stages of type List[List[ExplorationTaskGroup]]. The outer list enumerates the exploration stages; each stage is defined by a list of exploration task groups. Each task group is described in the task group definition.

fp:
type: dict
argument path: fp

The configuration for FP

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: fp/type
possible choices: vasp, gaussian, deepmd, fpop_abacus

the type of the fp

When type is set to vasp:

inputs_config:
type: dict
argument path: fp[vasp]/inputs_config

Configuration for preparing vasp inputs

incar:
type: str
argument path: fp[vasp]/inputs_config/incar

The path to the template incar file

pp_files:
type: dict
argument path: fp[vasp]/inputs_config/pp_files

The pseudopotential files set by a dict, e.g. {“Al” : “path/to/the/al/pp/file”, “Mg” : “path/to/the/mg/pp/file”}

kspacing:
type: float
argument path: fp[vasp]/inputs_config/kspacing

The spacing of k-point sampling. kspacing will overwrite the incar template

kgamma:
type: bool, optional, default: True
argument path: fp[vasp]/inputs_config/kgamma

Whether the k-mesh includes the gamma point. kgamma will overwrite the incar template

run_config:
type: dict
argument path: fp[vasp]/run_config

Configuration for running vasp tasks

command:
type: str, optional, default: vasp
argument path: fp[vasp]/run_config/command

The command of VASP

out:
type: str, optional, default: data
argument path: fp[vasp]/run_config/out

The output dir name of labeled data. In deepmd/npy format provided by dpdata.

log:
type: str, optional, default: fp.log
argument path: fp[vasp]/run_config/log

The log file name of VASP

task_max:
type: int, optional, default: 10
argument path: fp[vasp]/task_max

Maximum number of vasp tasks for each iteration
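
For illustration only, a vasp-type fp section might look like the following sketch (Python-dict notation; all paths, commands and numerical values are placeholders):

fp = {
    "type": "vasp",
    "task_max": 10,
    "inputs_config": {
        "incar": "INCAR.template",
        "pp_files": {"Al": "path/to/POTCAR.Al", "Mg": "path/to/POTCAR.Mg"},
        "kspacing": 0.32,
        "kgamma": True,
    },
    "run_config": {
        "command": "mpirun -n 16 vasp_std",
    },
}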

When type is set to gaussian:

inputs_config:
type: dict
argument path: fp[gaussian]/inputs_config

Configuration for preparing Gaussian inputs

keywords:
type: str | list
argument path: fp[gaussian]/inputs_config/keywords

Gaussian keywords, e.g. force b3lyp/6-31g**. If a list, run multiple steps.

multiplicity:
type: str | int, optional, default: auto
argument path: fp[gaussian]/inputs_config/multiplicity

Spin multiplicity state. It can be a number. If auto, the multiplicity will be detected automatically, with the following rules:

  • fragment_guesses=True: the multiplicity is increased by 1 for each radical and by 2 for each oxygen molecule.

  • fragment_guesses=False: the multiplicity is 1 or 2, plus 2 for each oxygen molecule.

charge:
type: int, optional, default: 0
argument path: fp[gaussian]/inputs_config/charge

Molecule charge. Only used when the charge is not provided by the system

basis_set:
type: str, optional
argument path: fp[gaussian]/inputs_config/basis_set

custom basis set

keywords_high_multiplicity:
type: str, optional
argument path: fp[gaussian]/inputs_config/keywords_high_multiplicity

Keywords for points with multiple radicals. multiplicity should be auto. If not set, falls back to the normal keywords

fragment_guesses:
type: bool, optional, default: False
argument path: fp[gaussian]/inputs_config/fragment_guesses

initial guess generated from fragment guesses. If True, multiplicity should be auto

nproc:
type: int, optional, default: 1
argument path: fp[gaussian]/inputs_config/nproc

Number of CPUs to use

run_config:
type: dict
argument path: fp[gaussian]/run_config

Configuration for running Gaussian tasks

command:
type: str, optional, default: g16
argument path: fp[gaussian]/run_config/command

The command of Gaussian

out:
type: str, optional, default: data
argument path: fp[gaussian]/run_config/out

The output dir name of labeled data. In deepmd/npy format provided by dpdata.

task_max:
type: int, optional, default: 10
argument path: fp[gaussian]/task_max

Maximum number of Gaussian tasks for each iteration

When type is set to deepmd:

inputs_config:
type: dict
argument path: fp[deepmd]/inputs_config

Configuration for preparing deepmd inputs

run_config:
type: dict
argument path: fp[deepmd]/run_config

Configuration for running deepmd tasks

teacher_model_path:
type: str | BinaryFileInput
argument path: fp[deepmd]/run_config/teacher_model_path

The path of teacher model, which can be loaded by deepmd.infer.DeepPot

out:
type: str, optional, default: data
argument path: fp[deepmd]/run_config/out

The output dir name of labeled data. In deepmd/npy format provided by dpdata.

log:
type: str, optional, default: fp.log
argument path: fp[deepmd]/run_config/log

The log file name of dp

task_max:
type: int, optional, default: 10
argument path: fp[deepmd]/task_max

Maximum number of deepmd tasks for each iteration

When type is set to fpop_abacus:

inputs_config:
type: dict
argument path: fp[fpop_abacus]/inputs_config

Configuration for preparing ABACUS inputs

input_file:
type: str
argument path: fp[fpop_abacus]/inputs_config/input_file

A template INPUT file.

pp_files:
type: dict
argument path: fp[fpop_abacus]/inputs_config/pp_files

The pseudopotential files for the elements. For example: {“H”: “/path/to/H.upf”, “O”: “/path/to/O.upf”}.

element_mass:
type: dict | NoneType, optional, default: None
argument path: fp[fpop_abacus]/inputs_config/element_mass

Specify the mass of some elements. For example: {“H”: 1.0079, “O”: 15.9994}.

kpt_file:
type: str | NoneType, optional, default: None
argument path: fp[fpop_abacus]/inputs_config/kpt_file

The KPT file, by default None.

orb_files:
type: dict | NoneType, optional, default: None
argument path: fp[fpop_abacus]/inputs_config/orb_files

The numerical orbital files for the elements, by default None. For example: {“H”: “/path/to/H.orb”, “O”: “/path/to/O.orb”}.

deepks_descriptor:
type: str | NoneType, optional, default: None
argument path: fp[fpop_abacus]/inputs_config/deepks_descriptor

The deepks descriptor file, by default None.

deepks_model:
type: str | NoneType, optional, default: None
argument path: fp[fpop_abacus]/inputs_config/deepks_model

The deepks model file, by default None.

run_config:
type: dict
argument path: fp[fpop_abacus]/run_config

Configuration for running ABACUS tasks

command:
type: str, optional, default: abacus
argument path: fp[fpop_abacus]/run_config/command

The command of abacus

out:
type: str, optional, default: data
argument path: fp[fpop_abacus]/run_config/out

The output dir name of labeled data. In deepmd/npy format provided by dpdata.

task_max:
type: int, optional, default: 10
argument path: fp[fpop_abacus]/task_max

Maximum number of ABACUS tasks for each iteration

name:
type: str, optional, default: dpgen
argument path: name

The workflow name, ‘dpgen’ for default

Task group definition

LAMMPS task group

task_group:
type: dict
argument path: task_group

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: task_group/type

the type of the task group

When type is set to lmp-md (or its alias lmp-npt):

Lammps MD tasks. DPGEN will generate the lammps input script

conf_idx:
type: list, alias: sys_idx
argument path: task_group[lmp-md]/conf_idx

The configurations at indexes conf_idx of the configurations list will be used to generate the initial configurations of the tasks. This key provides the indexes of the selected items in the configurations array.

n_sample:
type: NoneType | int, optional, default: None
argument path: task_group[lmp-md]/n_sample

Number of configurations. If this number is smaller than the number of configurations in configurations[conf_idx], then n_sample configurations are randomly sampled from configurations[conf_idx]; otherwise all configurations in configurations[conf_idx] will be used. If not provided, all configurations in configurations[conf_idx] will be used.

temps:
type: list, alias: Ts
argument path: task_group[lmp-md]/temps

A list of temperatures in K. Also used to initialize the temperature

press:
type: list, optional, alias: Ps
argument path: task_group[lmp-md]/press

A list of pressures in bar.

ens:
type: str, optional, default: nve, alias: ensemble
argument path: task_group[lmp-md]/ens

The ensemble. Allowed options are ‘nve’, ‘nvt’, ‘npt’, ‘npt-a’, ‘npt-t’. ‘npt-a’ stands for anisotropic box sampling and ‘npt-t’ stands for triclinic box sampling.

dt:
type: float, optional, default: 0.001
argument path: task_group[lmp-md]/dt

The time step

nsteps:
type: int, optional, default: 100
argument path: task_group[lmp-md]/nsteps

The number of steps

trj_freq:
type: int, optional, default: 10, aliases: t_freq, trj_freq, traj_freq
argument path: task_group[lmp-md]/trj_freq

The frequency (in steps) of dumping configurations and thermodynamic states

tau_t:
type: float, optional, default: 0.05
argument path: task_group[lmp-md]/tau_t

The time scale of thermostat

tau_p:
type: float, optional, default: 0.5
argument path: task_group[lmp-md]/tau_p

The time scale of barostat

pka_e:
type: NoneType | float, optional, default: None
argument path: task_group[lmp-md]/pka_e

The energy of primary knock-on atom

neidelay:
type: NoneType | int, optional, default: None
argument path: task_group[lmp-md]/neidelay

The delay of updating the neighbor list

no_pbc:
type: bool, optional, default: False
argument path: task_group[lmp-md]/no_pbc

Not using the periodic boundary condition

use_clusters:
type: bool, optional, default: False
argument path: task_group[lmp-md]/use_clusters

Calculate atomic model deviation

relative_f_epsilon:
type: NoneType | float, optional, default: None
argument path: task_group[lmp-md]/relative_f_epsilon

Calculate relative force model deviation

relative_v_epsilon:
type: NoneType | float, optional, default: None
argument path: task_group[lmp-md]/relative_v_epsilon

Calculate relative virial model deviation
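
For illustration only, an lmp-md task group might look like the following sketch (Python-dict notation; all values are placeholders):

task_group = {
    "type": "lmp-md",
    "conf_idx": [0],
    "n_sample": 5,
    "temps": [300.0, 600.0],
    "press": [1.0],
    "ens": "npt",
    "dt": 0.001,
    "nsteps": 5000,
    "trj_freq": 10,
}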

When type is set to lmp-template:

Lammps MD tasks defined by templates. Users provide lammps (and plumed) templates for the lammps tasks. The variables in the templates are revised according to the revisions key. Notice that the lines for pair style, dump and plumed are reserved for revision by dpgen2, and users should not write these lines themselves. Rather, users notify dpgen2 of the position of the pair_style line by writing ‘pair_style deepmd’, and of the dump line by writing ‘dump dpgen_dump’. If plumed is used, the line for fix plumed should be written exactly as ‘fix dpgen_plm’.

conf_idx:
type: list, alias: sys_idx
argument path: task_group[lmp-template]/conf_idx

The configurations at indexes conf_idx of the configurations list will be used to generate the initial configurations of the tasks. This key provides the indexes of the selected items in the configurations array.

n_sample:
type: NoneType | int, optional, default: None
argument path: task_group[lmp-template]/n_sample

Number of configurations. If this number is smaller than the number of configurations in configurations[conf_idx], then n_sample configurations are randomly sampled from configurations[conf_idx]; otherwise all configurations in configurations[conf_idx] will be used. If not provided, all configurations in configurations[conf_idx] will be used.

lmp_template_fname:
type: str, aliases: lmp_template, lmp
argument path: task_group[lmp-template]/lmp_template_fname

The file name of lammps input template

plm_template_fname:
type: str | NoneType, optional, default: None, aliases: plm_template, plm
argument path: task_group[lmp-template]/plm_template_fname

The file name of plumed input template

revisions:
type: dict, optional, default: {}
argument path: task_group[lmp-template]/revisions

The revisions. Should be a dict providing the key - list of desired values pairs. The key is the word to be replaced in the templates; all values in the value list will be enumerated.

traj_freq:
type: int, optional, default: 10, aliases: t_freq, trj_freq, trj_freq
argument path: task_group[lmp-template]/traj_freq

The frequency of dumping configurations and thermodynamic states
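
For illustration only, an lmp-template task group might look like the following sketch (Python-dict notation). The template file names and the revision variable names (V_TEMP, V_NSTEPS) are placeholders chosen by the user and must appear verbatim in the templates:

task_group = {
    "type": "lmp-template",
    "conf_idx": [0],
    "lmp_template_fname": "in.lammps.template",
    "plm_template_fname": "in.plumed.template",
    "revisions": {"V_TEMP": [300.0, 600.0], "V_NSTEPS": [5000]},
    "traj_freq": 10,
}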

When type is set to customized-lmp-template:

Lammps MD tasks defined by user customized shell commands and templates. User provided shell script generates a series of folders, and each folder contains a lammps template task group.

conf_idx:
type: list, alias: sys_idx
argument path: task_group[customized-lmp-template]/conf_idx

The configurations at indexes conf_idx of the configurations list will be used to generate the initial configurations of the tasks. This key provides the indexes of the selected items in the configurations array.

n_sample:
type: NoneType | int, optional, default: None
argument path: task_group[customized-lmp-template]/n_sample

Number of configurations. If this number is smaller than the number of configurations in configurations[conf_idx], then n_sample configurations are randomly sampled from configurations[conf_idx]; otherwise all configurations in configurations[conf_idx] will be used. If not provided, all configurations in configurations[conf_idx] will be used.

custom_shell_commands:
type: list
argument path: task_group[customized-lmp-template]/custom_shell_commands

Customized shell commands to be run for each configuration. The commands take input_lmp_conf_name as the input conf file, input_lmp_tmpl_name and input_plm_tmpl_name as templates, and input_extra_files as extra input files. By running the commands, a series of folders matching the pattern output_dir_pattern are supposed to be generated, and each folder is supposed to contain a configuration file output_lmp_conf_name, a lammps template file output_lmp_tmpl_name and a plumed template file output_plm_tmpl_name.

revisions:
type: dict, optional, default: {}
argument path: task_group[customized-lmp-template]/revisions

The revisions. Should be a dict providing the key - list of desired values pairs. The key is the word to be replaced in the templates, and it may appear in both the lammps and plumed input templates. All values in the value list will be enumerated.

traj_freq:
type: int, optional, default: 10, aliases: t_freq, trj_freq, trj_freq
argument path: task_group[customized-lmp-template]/traj_freq

The frequency of dumping configurations and thermodynamic states

input_lmp_conf_name:
type: str, optional, default: conf.lmp
argument path: task_group[customized-lmp-template]/input_lmp_conf_name

Input conf file name for the shell commands.

input_lmp_tmpl_name:
type: str, optional, default: in.lammps, aliases: lmp_template, lmp
argument path: task_group[customized-lmp-template]/input_lmp_tmpl_name

The file name of lammps input template

input_plm_tmpl_name:
type: str | NoneType, optional, default: None, aliases: plm_template, plm
argument path: task_group[customized-lmp-template]/input_plm_tmpl_name

The file name of plumed input template

input_extra_files:
type: list, optional, default: []
argument path: task_group[customized-lmp-template]/input_extra_files

Extra files that may be needed to execute the shell commands

output_dir_pattern:
type: str | list, optional, default: *
argument path: task_group[customized-lmp-template]/output_dir_pattern

Pattern of resultant folders generated by the shell commands.

output_lmp_conf_name:
type: str, optional, default: conf.lmp
argument path: task_group[customized-lmp-template]/output_lmp_conf_name

Generated conf file name.

output_lmp_tmpl_name:
type: str, optional, default: in.lammps
argument path: task_group[customized-lmp-template]/output_lmp_tmpl_name

Generated lmp input file name.

output_plm_tmpl_name:
type: str, optional, default: input.plumed
argument path: task_group[customized-lmp-template]/output_plm_tmpl_name

Generated plm input file name.

CALYPSO task group

task_group:
type: dict
argument path: task_group

CALYPSO structure prediction tasks. DPGEN will generate the calypso input script

numb_of_species:
type: int
argument path: task_group/numb_of_species

number of species.

name_of_atoms:
type: list
argument path: task_group/name_of_atoms

name of atoms.

atomic_number:
type: list
argument path: task_group/atomic_number

atomic number of each element.

numb_of_atoms:
type: list
argument path: task_group/numb_of_atoms

number of each atom.

distance_of_ions:
type: list
argument path: task_group/distance_of_ions

the distance matrix between different elements.

pop_size:
type: int, optional, default: 30
argument path: task_group/pop_size

the number of structures to be generated in each step.

max_step:
type: int, optional, default: 5
argument path: task_group/max_step

the max iteration number of CALYPSO loop.

system_name:
type: str, optional, default: CALYPSO
argument path: task_group/system_name

system name.

numb_of_formula:
type: list, optional, default: [1, 1]
argument path: task_group/numb_of_formula

the formula range of simulation cell.

pressure:
type: float, optional, default: 0.001
argument path: task_group/pressure

the target pressure (in kbar) when using the MLP to optimize structures.

fmax:
type: float, optional, default: 0.01
argument path: task_group/fmax

the convergence criterion. The force on every individual atom should be less than fmax.

volume:
type: float, optional, default: 0
argument path: task_group/volume

the volume of simulation cell in one formula.

ialgo:
type: int, optional, default: 2
argument path: task_group/ialgo

the evolution algorithm of CALYPSO. 1: global pso, 2: local pso, 3: sabc.

pso_ratio:
type: float, optional, default: 0.6
argument path: task_group/pso_ratio

the ratio of structures generated by evolution algorithm in one step.

icode:
type: int, optional, default: 15
argument path: task_group/icode

the software of structure optimization. 1: VASP, 15: DP.

numb_of_lbest:
type: int, optional, default: 4
argument path: task_group/numb_of_lbest

the number of evolution direction when using LPSO.

numb_of_local_optim:
type: int, optional, default: 3
argument path: task_group/numb_of_local_optim

the number of structure optimizations to perform when using dft.

command:
type: str, optional, default: sh submit.sh
argument path: task_group/command

the command of running structure optimization.

max_time:
type: int, optional, default: 9000
argument path: task_group/max_time

the max time (in second) of structure optimization.

pick_up:
type: bool, optional, default: False
argument path: task_group/pick_up

whether to continue the calculation.

pick_step:
type: int, optional, default: 0
argument path: task_group/pick_step

from which step to continue the calculation.

parallel:
type: bool, optional, default: False
argument path: task_group/parallel

whether to run calypso in parallel.

split:
type: bool, optional, default: True
argument path: task_group/split

separate structure generation from structure optimization. In dpgen2, split must be True.

spec_space_group:
type: list, optional, default: [2, 230]
argument path: task_group/spec_space_group

the range of space group numbers.

vsc:
type: bool, optional, default: False
argument path: task_group/vsc

whether to run calypso in a variable-stoichiometry way.

ctrl_range:
type: list, optional, default: [[1, 10]]
argument path: task_group/ctrl_range

the range of the number of atoms of each element.

max_numb_atoms:
type: int, optional, default: 100
argument path: task_group/max_numb_atoms

the max number of atoms.
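
For illustration only, a minimal CALYPSO task group might look like the following sketch (Python-dict notation; the species, counts and distance matrix are placeholders):

task_group = {
    "numb_of_species": 2,
    "name_of_atoms": ["Mg", "Al"],
    "atomic_number": [12, 13],
    "numb_of_atoms": [4, 4],
    "distance_of_ions": [[1.8, 1.8], [1.8, 1.8]],
    "pop_size": 30,
    "max_step": 5,
}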

Developers’ guide

The concurrent learning algorithm

DPGEN2 implements the concurrent learning algorithm named DP-GEN, described in this paper. It is noted that other types of workflows, like active learning, should be easily implemented within the infrastructure of DPGEN2.

The DP-GEN algorithm is iterative. In each iteration, four steps are consecutively executed: training, exploration, selection, and labeling.

  1. Training. A set of DP models are trained with the same dataset and the same hyperparameters. The only difference is the random seed initializing the model parameters.

  2. Exploration. One of the DP models is used to explore the configuration space. The exploration strategy depends strongly on the intended application of the model. The simulation technique for exploration can be molecular dynamics, Monte Carlo, structure search/optimization, enhanced sampling, or any combination of them. Currently, DPGEN2 supports exploration driven by the molecular simulation package LAMMPS and by CALYPSO structure prediction.

  3. Selection. Not all explored configurations are labeled; rather, the model prediction errors on the configurations are estimated by the model deviation, which is defined as the standard deviation of the predictions of the set of models. The critical configurations, those with large but not-too-large errors, are selected for labeling. The configurations with very large errors are not selected because such errors are usually caused by non-physical configurations, e.g. overlapping atoms.

  4. Labeling. The selected configurations are labeled with the energy, forces, and virial calculated by a method of first-principles accuracy. The commonly used method is density functional theory as implemented in VASP, Quantum ESPRESSO, CP2K, etc. The labeled data are finally added to the training dataset to start the next iteration.

In each iteration, the quality of the model is improved by selecting and labeling more critical data and adding them to the training dataset. The DP-GEN iteration is converged when no more critical data can be selected.
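
The iteration can be summarized by the following conceptual sketch (Python-style pseudocode; train, explore, select, label and the scheduler object are placeholders, not DPGEN2 APIs):

# Conceptual pseudocode of the DP-GEN concurrent-learning loop; not DPGEN2 code.
def dp_gen_loop(dataset, scheduler, numb_models=4):
    while not scheduler.converged():                        # placeholder convergence query
        # 1. Training: same data and hyperparameters, different random seeds
        models = [train(dataset, seed=i) for i in range(numb_models)]
        # 2. Exploration: one model drives the simulations planned by the scheduler
        confs = explore(models[0], scheduler.exploration_tasks())
        # 3. Selection: pick critical configurations by the model deviation
        candidates = select(confs, models)
        # 4. Labeling: first-principles calculations on the candidates
        dataset = dataset + label(candidates)
    return models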

Overview of the DPGEN2 Implementation

The implementation of DPGEN2 is based on the workflow platform dflow, which is a python wrapper of Argo Workflows, an open-source container-native workflow engine on Kubernetes.

The DP-GEN algorithm is conceptually modeled as a computational graph. The implementation is then considered as two lines: the operators and the workflow.

  1. Operators. Operators are implemented in Python v3. The operators should be implemented and tested without the workflow.

  2. Workflow. Workflow is implemented on dflow. Ideally, the workflow is implemented and tested with all operators mocked.

The DPGEN2 workflow

The workflow of DPGEN2 is illustrated in the following figure

dpgen flowchart

In the center is the block operator, which is a super-OP (an OP composed of several OPs) for one DP-GEN iteration, i.e. the super-OP of the training, exploration, selection, and labeling steps. The inputs of the block OP are lmp_task_group, conf_selector and dataset.

  • lmp_task_group: definition of a group of LAMMPS tasks that explore the configuration space.

  • conf_selector: defines the rule by which the configurations are selected for labeling.

  • dataset: the training dataset.

The outputs of the block OP are

  • exploration_report: a report recording the result of the exploration, for example, how many configurations are accurate enough and how many are selected as candidates for labeling.

  • dataset_incr: the increment of the training dataset.

The dataset_incr is added to the training dataset.

The exploration_report is passed to the exploration_strategy OP. The exploration_strategy implements the strategy of exploration. It reads the exploration_report generated by each iteration (block) and then tells whether the iteration is converged. If not, it generates a group of LAMMPS tasks (lmp_task_group) and the criteria for selecting configurations (conf_selector). The lmp_task_group and conf_selector are then used by the block of the next iteration. This closes the iteration loop.

Inside the block operator

The inside of the super-OP block is displayed on the right-hand side of the figure. It contains the following steps to finish one DPGEN iteration

  • prep_run_dp_train: prepares training tasks of DP models and runs them.

  • prep_run_lmp: prepares the LAMMPS exploration tasks and runs them.

  • select_confs: selects configurations for labeling from the explored configurations.

  • prep_run_fp: prepares and runs first-principles tasks.

  • collect_data: collects the dataset_incr and adds it to the dataset.

The exploration strategy

The exploration strategy defines how the configuration space is explored by the concurrent learning algorithm. The design of the exploration strategy is graphically illustrated in the following figure. The exploration is composed of stages. Only when the DP-GEN exploration converges at one stage (no configuration with a large error is explored) does the exploration move on to the next stage. The whole procedure is controlled by the exploration_scheduler. Each stage has its own stage scheduler, which talks to the exploration_scheduler to generate the schedule for the DP-GEN algorithm.

exploration strategy

Some concepts are explained below:

  • Exploration group. A group of LAMMPS tasks that share similar settings, for example, a group of NPT MD simulations in a certain thermodynamic space.

  • Exploration stage. The exploration_stage contains a list of exploration groups. It contains all information needed to define the lmp_task_group used by the block in the DP-GEN iteration.

  • Stage scheduler. It guarantees the convergence of the DP-GEN algorithm in each exploration_stage. If the exploration is not converged, the stage_scheduler generates lmp_task_group and conf_selector from the exploration_stage for the next iteration (probably with a different initial condition, i.e. different initial configurations and randomly generated initial velocity).

  • Exploration scheduler. The scheduler for the DP-GEN algorithm. When DP-GEN is converged in one of the stages, it goes to the next stage until all planned stages are used.

How to contribute

Anyone interested in the DPGEN2 project may contribute OPs, workflows, and exploration strategies.

Operators

There are two types of OPs in DPGEN2

  • OP. An execution unit of the workflow. It can be roughly viewed as a piece of Python script that takes some inputs and gives some outputs. An OP cannot be used in dflow until it is embedded in a super-OP.

  • Super-OP. An execution unit that is composed of one or more OPs and/or super-OPs.

Technically, an OP is a Python class derived from dflow.python.OP. It serves as the PythonOPTemplate of a dflow.Step.

A super-OP is a Python class derived from dflow.Steps. It contains dflow.Step instances as building blocks and can be used as an OP template to generate a dflow.Step. For an explanation of the concepts dflow.Step and dflow.Steps, one may refer to the dflow manual.

The super-OP PrepRunDPTrain

In the following we will take the PrepRunDPTrain super-OP as an example to illustrate how to write OPs in DPGEN2.

PrepRunDPTrain is a super-OP that prepares several DeePMD-kit training tasks and submits all of them. This super-OP is composed of two dflow.Step instances built from the dflow.python.OPs PrepDPTrain and RunDPTrain.

from dflow import (
    Step,
    Steps,
)
from dflow.python import (
    PythonOPTemplate,
    OP,
    Slices,
)

class PrepRunDPTrain(Steps):
    def __init__(
            self,
            name : str,
            prep_train_op : OP,
            run_train_op : OP,
            prep_train_image : str = "dflow:v1.0",
            run_train_image : str = "dflow:v1.0",
    ):
		...
        self = _prep_run_dp_train(
            self,
            self.step_keys,
            prep_train_op,
            run_train_op,
            prep_train_image = prep_train_image,
            run_train_image = run_train_image,
        )

The constructor of PrepRunDPTrain takes the prepare-training OP, the run-training OP, and their docker images as input; the construction is implemented in the internal method _prep_run_dp_train.

def _prep_run_dp_train(
        train_steps,
        step_keys,
        prep_train_op : OP = PrepDPTrain,
        run_train_op : OP = RunDPTrain,
        prep_train_image : str = "dflow:v1.0",
        run_train_image : str = "dflow:v1.0",
):
    prep_train = Step(
        ...
        template=PythonOPTemplate(
            prep_train_op,
            image=prep_train_image,
            ...
        ),
        ...
    )
    train_steps.add(prep_train)

    run_train = Step(
        ...
        template=PythonOPTemplate(
            run_train_op,
            image=run_train_image,
            ...
        ),
        ...
    )
    train_steps.add(run_train)

    train_steps.outputs.artifacts["scripts"]._from = run_train.outputs.artifacts["script"]
    train_steps.outputs.artifacts["models"]._from = run_train.outputs.artifacts["model"]
    train_steps.outputs.artifacts["logs"]._from = run_train.outputs.artifacts["log"]
    train_steps.outputs.artifacts["lcurves"]._from = run_train.outputs.artifacts["lcurve"]

    return train_steps

In _prep_run_dp_train, two instances of dflow.Step, i.e. prep_train and run_train, generated from prep_train_op and run_train_op, respectively, are added to train_steps. Both of prep_train_op and run_train_op are OPs (python classes derived from dflow.python.OPs) that will be illustrated later. train_steps is an instance of dflow.Steps. The outputs of the second OP run_train are assigned to the outputs of the train_steps.

The prep_train step prepares a list of paths, each of which contains all the files necessary to start a DeePMD-kit training task.

The run_train step slices the list of paths and assigns each item in the list to a DeePMD-kit task. The task is executed by run_train_op. This is a very nice feature of dflow, because the developer only needs to implement how one DeePMD-kit task is executed, and then all the items in the task list will be executed in parallel. The following code shows how it works

    run_train = Step(
        'run-train',
        template=PythonOPTemplate(
            run_train_op,
            image=run_train_image,
            slices = Slices(
                "int('{{item}}')",
                input_parameter = ["task_name"],
                input_artifact = ["task_path", "init_model"],
                output_artifact = ["model", "lcurve", "log", "script"],
            ),
        ),
        parameters={
            "config" : train_steps.inputs.parameters["train_config"],
            "task_name" : prep_train.outputs.parameters["task_names"],
        },
        artifacts={
            'task_path' : prep_train.outputs.artifacts['task_paths'],
            "init_model" : train_steps.inputs.artifacts['init_models'],
            "init_data": train_steps.inputs.artifacts['init_data'],
            "iter_data": train_steps.inputs.artifacts['iter_data'],
        },
        with_sequence=argo_sequence(argo_len(prep_train.outputs.parameters["task_names"]), format=train_index_pattern),
        key = step_keys['run-train'],
    )

The input parameter "task_names" and artifacts "task_paths" and "init_model" are sliced and supplied to each DeePMD-kit task. The output artifacts of the tasks ("model", "lcurve", "log" and "script") are stacked in the same order as the input lists. These lists are assigned as the outputs of train_steps by

    train_steps.outputs.artifacts["scripts"]._from = run_train.outputs.artifacts["script"]
    train_steps.outputs.artifacts["models"]._from = run_train.outputs.artifacts["model"]
    train_steps.outputs.artifacts["logs"]._from = run_train.outputs.artifacts["log"]
    train_steps.outputs.artifacts["lcurves"]._from = run_train.outputs.artifacts["lcurve"]

The OP RunDPTrain

We will take RunDPTrain as an example to illustrate how to implement an OP in DPGEN2. The source code of this OP is found here

First of all, an OP should be implemented as a derived class of dflow.python.OP.

The dflow.python.OP requires static type definitions for the input and output variables, i.e. the signatures of an OP. The input and output signatures of the dflow.python.OP are given by the classmethods get_input_sign and get_output_sign.

from dflow.python import (
    OP,
    OPIO,
    OPIOSign,
    Artifact,
)
class RunDPTrain(OP):
    @classmethod
    def get_input_sign(cls):
        return OPIOSign({
            "config" : dict,
            "task_name" : str,
            "task_path" : Artifact(Path),
            "init_model" : Artifact(Path),
            "init_data" : Artifact(List[Path]),
            "iter_data" : Artifact(List[Path]),
        })

    @classmethod
    def get_output_sign(cls):
        return OPIOSign({
            "script" : Artifact(Path),
            "model" : Artifact(Path),
            "lcurve" : Artifact(Path),
            "log" : Artifact(Path),
        })

All items not defined as Artifact are treated as parameters of the OP. The concepts of parameter and artifact are explained in the dflow documentation. In short, artifacts can be pathlib.Path or a list of pathlib.Path, and they are passed via the file system. Other data structures are treated as parameters and are passed as variables encoded in str. Therefore, large amounts of information should be stored in artifacts, while small pieces of information can be passed as parameters.

The operation of the OP is implemented in the method execute, which is run in docker containers. Again taking the execute method of RunDPTrain as an example

    @OP.exec_sign_check
    def execute(
            self,
            ip : OPIO,
    ) -> OPIO:
        ...
        task_name = ip['task_name']
        task_path = ip['task_path']
        init_model = ip['init_model']
        init_data = ip['init_data']
        iter_data = ip['iter_data']
        ...
        work_dir = Path(task_name)
        ...
        # here copy all files in task_path to work_dir
        ...
        with set_directory(work_dir):
            fplog = open('train.log', 'w')
            def clean_before_quit():
                fplog.close()
            # train model
            command = ['dp', 'train', train_script_name]
            ret, out, err = run_command(command)
            if ret != 0:
                clean_before_quit()
                raise FatalError('dp train failed')
            fplog.write(out)
            # freeze model
            ret, out, err = run_command(['dp', 'freeze', '-o', 'frozen_model.pb'])
            if ret != 0:
                clean_before_quit()
                raise FatalError('dp freeze failed')
            fplog.write(out)
            clean_before_quit()

        return OPIO({
            "script" : work_dir / train_script_name,
            "model" : work_dir / "frozen_model.pb",
            "lcurve" : work_dir / "lcurve.out",
            "log" : work_dir / "train.log",
        })

The input and output variables are recorded in the data structure dflow.python.OPIO, which is initialized by a Python dict. The keys in the input/output dict, and the types of the input/output variables, are checked against their signatures by the decorator OP.exec_sign_check. If any key or type does not match, an exception is raised.

It is noted that all input artifacts of the OP are read-only; therefore, the first step of RunDPTrain.execute is to copy all necessary input files from the directory task_path prepared by PrepDPTrain to the working directory work_dir.

The set_directory context manager creates the work_dir and switches to it before the execution, and then leaves the directory when the task finishes or an error is raised.

In what follows, the training and model freezing commands are executed consecutively. The return code is checked and a FatalError is raised if a non-zero code is detected.

Finally, the trained model file, the input script, the learning curve file and the log file are recorded in a dflow.python.OPIO and returned.

Exploration

DPGEN2 allows developers to contribute exploration strategies. The exploration strategy defines how the configuration space is explored by molecular simulations in each DPGEN iteration. Notice that we are not restricted to molecular dynamics; any molecular simulation is, in principle, allowed, for example, Monte Carlo, enhanced sampling, structure optimization, and so on.

An exploration strategy takes the history of exploration as input, and gives back to DPGEN the exploration tasks (which we call a task group) and the rule for selecting configurations from the trajectories generated by the tasks (which we call a configuration selector).

One can contribute from three aspects:

Stage scheduler

The stage scheduler takes an exploration report passed from the exploration scheduler as input and tells the exploration scheduler whether the exploration in the stage is converged. If not, it returns a group of exploration tasks and a configuration selector that are used in the next DPGEN iteration.

Detailed explanations of the concepts are found here.

All the stage schedulers are derived from the abstract base class StageScheduler. The only interface to be implemented is StageScheduler.plan_next_iteration. One may check the doc string for the explanation of the interface.

class StageScheduler(ABC):
    """
    The scheduler for an exploration stage.
    """

    @abstractmethod
    def plan_next_iteration(
            self,
            hist_reports : List[ExplorationReport],
            report : ExplorationReport,
            confs : List[Path],
    ) -> Tuple[bool, ExplorationTaskGroup, ConfSelector] :
        """
        Make the plan for the next iteration of the stage.

        It checks the report of the current and all historical iterations of the stage,
        and tells if the iterations are converged.
        If not converged, it will plan the next iteration for the stage.

        Parameters
        ----------
        hist_reports: List[ExplorationReport]
            The historical exploration report of the stage. If this is the first iteration of the stage, this list is empty.
        report : ExplorationReport
            The exploration report of this iteration.
        confs: List[Path]
            A list of configurations generated during the exploration. May be used to generate new configurations for the next iteration.

        Returns
        -------
        converged: bool
            If the stage converged.
        task: ExplorationTaskGroup
            A `ExplorationTaskGroup` defining the exploration of the next iteration. Should be `None` if the stage is converged.
        conf_selector: ConfSelector
            The configuration selector for the next iteration. Should be `None` if the stage is converged.

        """

One may check more details on the exploration task group and the configuration selector below.

Exploration task groups

DPGEN2 defines a Python class, ExplorationTask, to manage all the files needed to run an exploration task. It can be used as in the example provided in the docstring.

class ExplorationTask():
    """Define the files needed by an exploration task.

    Examples
    --------
    >>> # this example dumps all files needed by the task.
    >>> files = exploration_task.files()
    ... for file_name, file_content in files.items():
    ...     with open(file_name, 'w') as fp:
    ...         fp.write(file_content)

    """

A collection of exploration tasks is called an exploration task group. All task groups are derived from the base class ExplorationTaskGroup. An exploration task group can be viewed as a list of ExplorationTasks; the list is available via the property ExplorationTaskGroup.task_list. One may add a task or another ExplorationTaskGroup to the group with the methods ExplorationTaskGroup.add_task and ExplorationTaskGroup.add_group, respectively.

class ExplorationTaskGroup(Sequence):
    @property
    def task_list(self) -> List[ExplorationTask]:
        """Get the `list` of `ExplorationTask`"""
        ...

    def add_task(self, task: ExplorationTask):
        """Add one task to the group."""
        ...

    def add_group(
            self,
            group : 'ExplorationTaskGroup',
    ):
        """Add another group to the group."""
        ...

An example of generating a group of NPT MD simulations illustrates how to implement an ExplorationTaskGroup; a minimal sketch is given below.
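
A minimal sketch along those lines, using the NPTTaskGroup listed in the API reference below; the file paths, masses and MD settings are illustrative.

from pathlib import Path
from dpgen2.exploration.task.npt_task_group import NPTTaskGroup

# contents of LAMMPS configuration files (hypothetical paths)
conf_list = [Path(p).read_text() for p in ("confs/conf-0.lmp", "confs/conf-1.lmp")]

tg = NPTTaskGroup()
tg.set_conf(conf_list, n_sample=2, random_sample=True)
tg.set_md(
    numb_models=4,               # number of trained models used for model deviation
    mass_map=[26.98, 24.31],     # illustrative Al/Mg masses
    temps=[300.0, 600.0],
    press=[1.0],
    ens="npt",
    nsteps=1000,
    trj_freq=10,
)

task_grp = tg.make_task()        # nconf * nT * nP = 2 * 2 * 1 = 4 tasks
print(len(task_grp.task_list))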

Configuration selector

The configuration selectors are derived from the abstract base class ConfSelector

class ConfSelector(ABC):
    """Select configurations from trajectory and model deviation files.
    """
    @abstractmethod
    def select (
            self,
            trajs : List[Path],
            model_devis : List[Path],
            traj_fmt : str = 'deepmd/npy',
            type_map : List[str] = None,
    ) -> Tuple[List[ Path ], ExplorationReport]:

The abstract method to implement is ConfSelector.select. trajs and model_devis are lists of files recording the simulation trajectories and model deviations, respectively. traj_fmt and type_map are parameters that may be needed for loading the trajectories with dpdata.

ConfSelector.select returns a list of Paths, each of which can be treated as a dpdata.MultiSystems, and an ExplorationReport.

An example of selecting configurations from LAMMPS trajectories illustrates how to implement a ConfSelector; a usage sketch follows.
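
A usage sketch under an assumed per-task file layout; the concrete ConfSelector instance is passed in, and the trajectory format and element types are illustrative.

from pathlib import Path
from typing import List

def select_from_lammps(selector, explore_dir: Path, type_map: List[str]):
    """Run a concrete ConfSelector on a hypothetical per-task file layout."""
    trajs = sorted(explore_dir.glob("task.*/traj.dump"))
    model_devis = sorted(explore_dir.glob("task.*/model_devi.out"))
    confs, report = selector.select(
        trajs, model_devis, traj_fmt="lammps/dump", type_map=type_map
    )
    # each returned path can be loaded as a dpdata.MultiSystems;
    # the report summarizes the accurate/candidate/failed ratios
    print(report.print_header())
    print(report.print(0, 0, 0))   # stage index, index in stage, iteration index
    return confs, report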

DPGEN2 API

dpgen2 package

Subpackages

dpgen2.conf package
Submodules
dpgen2.conf.alloy_conf module
class dpgen2.conf.alloy_conf.AlloyConf(lattice: System | Tuple[str, float], type_map: List[str], replicate: List[int] | Tuple[int] | int | None = None)[source]

Bases: object

Parameters:
lattice Union[dpdata.System, Tuple[str,float]]

Lattice of the alloy confs. Can be dpdata.System (the lattice given as a dpdata.System) or Tuple[str, float] (a pair of lattice type and lattice constant; the lattice type can be “bcc”, “fcc”, “hcp”, “sc” or “diamond”).

replicate Union[List[int], Tuple[int], int]

replicate of the lattice

type_map List[str]

The type map

Methods

generate_file_content(numb_confs[, ...])

Parameters:

generate_systems(numb_confs[, ...])

Parameters:

generate_file_content(numb_confs, concentration: List[List[float]] | List[float] | None = None, cell_pert_frac: float = 0.0, atom_pert_dist: float = 0.0, fmt: str = 'lammps/lmp') List[str][source]
Parameters:
numb_confs int

Number of configurations to generate

concentration List[List[float]] or List[float] or None

If List[float], the concentrations of each element. The length of the list should be the same as the type_map. If List[List[float]], a list of concentrations (List[float]) is randomly picked from the List. If None, the elements are assumed to be of equal concentration.

cell_pert_frac float

fraction of cell perturbation

atom_pert_dist float

the atom perturbation distance (unit angstrom).

fmt str

the format of the returned conf strings. Should be one of the formats supported by dpdata

Returns:
conf_list List[str]

A list of file content of configurations.
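
A usage sketch (not part of the original docstring) with illustrative lattice, composition and perturbation values:

from dpgen2.conf.alloy_conf import AlloyConf

# fcc lattice with a 4.05 angstrom lattice constant (illustrative values)
ac = AlloyConf(lattice=("fcc", 4.05), type_map=["Al", "Mg"], replicate=2)
confs = ac.generate_file_content(
    numb_confs=3,
    concentration=[0.8, 0.2],
    cell_pert_frac=0.03,
    atom_pert_dist=0.1,
    fmt="lammps/lmp",
)
print(len(confs))   # 3 configuration strings in LAMMPS format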

generate_systems(numb_confs, concentration: List[List[float]] | List[float] | None = None, cell_pert_frac: float = 0.0, atom_pert_dist: float = 0.0) List[System][source]
Parameters:
numb_confs int

Number of configurations to generate

concentration List[List[float]] or List[float] or None

If List[float], the concentrations of each element. The length of the list should be the same as the type_map. If List[List[float]], a list of concentrations (List[float]) is randomly picked from the List. If None, the elements are assumed to be of equal concentration.

cell_pert_frac float

fraction of cell perturbation

atom_pert_dist float

the atom perturbation distance (unit angstrom).

Returns:
conf_list List[dpdata.System]

A list of generated confs in dpdata.System.

class dpgen2.conf.alloy_conf.AlloyConfGenerator(numb_confs, lattice: System | Tuple[str, float], replicate: List[int] | Tuple[int] | int | None = None, concentration: List[List[float]] | List[float] | None = None, cell_pert_frac: float = 0.0, atom_pert_dist: float = 0.0)[source]

Bases: ConfGenerator

Parameters:
numb_confs int

Number of configurations to generate

lattice Union[dpdata.System, Tuple[str,float]]

Lattice of the alloy confs. Can be dpdata.System (the lattice given as a dpdata.System) or Tuple[str, float] (a pair of lattice type and lattice constant; the lattice type can be “bcc”, “fcc”, “hcp”, “sc” or “diamond”).

replicate Union[List[int], Tuple[int], int]

replicate of the lattice

concentration List[List[float]] or List[float] or None

If List[float], the concentrations of each element. The length of the list should be the same as the type_map. If List[List[float]], a list of concentrations (List[float]) is randomly picked from the List. If None, the elements are assumed to be of equal concentration.

cell_pert_frac float

fraction of cell perturbation

atom_pert_dist float

the atom perturbation distance (unit angstrom).

Methods

generate(type_map)

Method of generating configurations.

get_file_content(type_map[, fmt])

Get the file content of configurations

normalize_config([data, strict])

Normalized the argument.

args

doc

static args() List[Argument][source]
static doc() str[source]
generate(type_map) MultiSystems[source]

Method of generating configurations.

Parameters:
type_mapList[str]

The type map.

Returns:
confs: dpdata.MultiSystems

The returned configurations in dpdata.MultiSystems format

dpgen2.conf.alloy_conf.gen_doc(*, make_anchor=True, make_link=True, **kwargs)[source]
dpgen2.conf.alloy_conf.generate_alloy_conf_args()[source]
dpgen2.conf.alloy_conf.generate_alloy_conf_file_content(lattice: System | Tuple[str, float], type_map: List[str], numb_confs, replicate: List[int] | Tuple[int] | int | None = None, concentration: List[List[float]] | List[float] | None = None, cell_pert_frac: float = 0.0, atom_pert_dist: float = 0.0, fmt: str = 'lammps/lmp')[source]
dpgen2.conf.alloy_conf.normalize(data)[source]
dpgen2.conf.conf_generator module
class dpgen2.conf.conf_generator.ConfGenerator[source]

Bases: ABC

Methods

generate(type_map)

Method of generating configurations.

get_file_content(type_map[, fmt])

Get the file content of configurations

normalize_config([data, strict])

Normalized the argument.

args

abstract static args() List[Argument][source]
abstract generate(type_map) MultiSystems[source]

Method of generating configurations.

Parameters:
type_mapList[str]

The type map.

Returns:
confs: dpdata.MultiSystems

The returned configurations in dpdata.MultiSystems format

get_file_content(type_map, fmt='lammps/lmp') List[str][source]

Get the file content of configurations

Parameters:
type_mapList[str]

The type map.

Returns:
conf_list: List[str]

A list of file content of configurations.

classmethod normalize_config(data: Dict = {}, strict: bool = True) Dict[source]

Normalized the argument.

Parameters:
dataDict

The input dict of arguments.

strictbool

Strictly check the arguments.

Returns:
data: Dict

The normalized arguments.

dpgen2.conf.file_conf module
class dpgen2.conf.file_conf.FileConfGenerator(files: str | List[str], fmt: str = 'auto', prefix: str | None = None, remove_pbc: bool | None = False)[source]

Bases: ConfGenerator

Methods

generate(type_map)

Method of generating configurations.

get_file_content(type_map[, fmt])

Get the file content of configurations

normalize_config([data, strict])

Normalized the argument.

args

doc

generate_mixed

generate_std

static args() List[Argument][source]
static doc() str[source]
generate(type_map) MultiSystems[source]

Method of generating configurations.

Parameters:
type_mapList[str]

The type map.

Returns:
confs: dpdata.MultiSystems

The returned configurations in dpdata.MultiSystems format

generate_mixed(type_map) MultiSystems[source]
generate_std(type_map) MultiSystems[source]
dpgen2.conf.unit_cells module
class dpgen2.conf.unit_cells.BCC[source]

Bases: object

Methods

gen_box

numb_atoms

poscar_unit

gen_box()[source]
numb_atoms()[source]
poscar_unit(latt)[source]
class dpgen2.conf.unit_cells.DIAMOND[source]

Bases: object

Methods

gen_box

numb_atoms

poscar_unit

gen_box()[source]
numb_atoms()[source]
poscar_unit(latt)[source]
class dpgen2.conf.unit_cells.FCC[source]

Bases: object

Methods

gen_box

numb_atoms

poscar_unit

gen_box()[source]
numb_atoms()[source]
poscar_unit(latt)[source]
class dpgen2.conf.unit_cells.HCP[source]

Bases: object

Methods

gen_box

numb_atoms

poscar_unit

gen_box()[source]
numb_atoms()[source]
poscar_unit(latt)[source]
class dpgen2.conf.unit_cells.SC[source]

Bases: object

Methods

gen_box

numb_atoms

poscar_unit

gen_box()[source]
numb_atoms()[source]
poscar_unit(latt)[source]
dpgen2.conf.unit_cells.generate_unit_cell(crystal: str, latt: float = 1.0) System[source]
dpgen2.entrypoint package
Submodules
dpgen2.entrypoint.args module
dpgen2.entrypoint.args.bohrium_conf_args()[source]
dpgen2.entrypoint.args.default_step_config_args()[source]
dpgen2.entrypoint.args.dflow_conf_args()[source]
dpgen2.entrypoint.args.dp_dist_train_args()[source]
dpgen2.entrypoint.args.dp_train_args()[source]
dpgen2.entrypoint.args.dpgen_step_config_args(default_config)[source]
dpgen2.entrypoint.args.fp_args(inputs, run)[source]
dpgen2.entrypoint.args.gen_doc(*, make_anchor=True, make_link=True, **kwargs)[source]
dpgen2.entrypoint.args.input_args()[source]
dpgen2.entrypoint.args.lmp_args()[source]
dpgen2.entrypoint.args.normalize(data)[source]
dpgen2.entrypoint.args.submit_args(default_step_config={'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}})[source]
dpgen2.entrypoint.args.variant_conf()[source]
dpgen2.entrypoint.args.variant_conv()[source]
dpgen2.entrypoint.args.variant_explore()[source]
dpgen2.entrypoint.args.variant_fp()[source]
dpgen2.entrypoint.args.variant_train()[source]
dpgen2.entrypoint.common module
dpgen2.entrypoint.common.expand_idx(in_list) List[int][source]
dpgen2.entrypoint.common.expand_sys_str(root_dir: str | Path) List[str][source]
dpgen2.entrypoint.common.global_config_workflow(wf_config)[source]
dpgen2.entrypoint.download module
dpgen2.entrypoint.download.download(workflow_id, wf_config: Dict | None = {}, wf_keys: List | None = None, prefix: str | None = None, chk_pnt: bool = False)[source]
dpgen2.entrypoint.download.download_by_def(workflow_id, wf_config: Dict = {}, iterations: List[int] | None = None, step_defs: List[str] | None = None, prefix: str | None = None, chk_pnt: bool = False)[source]
dpgen2.entrypoint.gui module

DP-GUI entrypoint.

dpgen2.entrypoint.gui.start_dpgui(*, port: int, bind_all: bool, **kwargs)[source]

Host DP-GUI server.

Parameters:
portint

The port to serve DP-GUI on.

bind_allbool

Serve on all public interfaces. This will expose your DP-GUI instance to the network on both IPv4 and IPv6 (where available).

**kwargs

additional arguments

Raises:
ModuleNotFoundError

The dpgui package is not installed

dpgen2.entrypoint.main module
dpgen2.entrypoint.main.main()[source]
dpgen2.entrypoint.main.main_parser() ArgumentParser[source]

DPGEN2 commandline options argument parser.

Returns:
argparse.ArgumentParser

the argument parser

Notes

This function is used by documentation.

dpgen2.entrypoint.main.parse_args(args: List[str] | None = None)[source]

DPGEN2 commandline options argument parsing.

Parameters:
argsList[str]

List of command line arguments; the main purpose is testing. The default option None takes arguments from sys.argv.

dpgen2.entrypoint.showkey module
dpgen2.entrypoint.showkey.showkey(wf_id, wf_config)[source]
dpgen2.entrypoint.status module
dpgen2.entrypoint.status.status(workflow_id, wf_config: Dict | None = {})[source]
dpgen2.entrypoint.submit module
dpgen2.entrypoint.submit.copy_scheduler_plans(scheduler_new, scheduler_old)[source]
dpgen2.entrypoint.submit.fold_keys(all_step_keys)[source]
dpgen2.entrypoint.submit.get_kspacing_kgamma_from_incar(fname)[source]
dpgen2.entrypoint.submit.get_resubmit_keys(wf)[source]
dpgen2.entrypoint.submit.get_scheduler_ids(reuse_step)[source]
dpgen2.entrypoint.submit.get_superop(key)[source]
dpgen2.entrypoint.submit.make_calypso_naive_exploration_scheduler(config)[source]
dpgen2.entrypoint.submit.make_concurrent_learning_op(train_style: str = 'dp', explore_style: str = 'lmp', fp_style: str = 'vasp', prep_train_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_train_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, prep_explore_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_explore_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, prep_fp_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_fp_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, select_confs_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, collect_data_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, cl_step_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: List[PathLike] | None = None)[source]
dpgen2.entrypoint.submit.make_finetune_step(config, prep_train_config, run_train_config, upload_python_packages, numb_models, template_script, train_config, init_models, init_data, iter_data)[source]
dpgen2.entrypoint.submit.make_lmp_naive_exploration_scheduler(config)[source]
dpgen2.entrypoint.submit.make_naive_exploration_scheduler(config)[source]
dpgen2.entrypoint.submit.make_optional_parameter(mixed_type=False, finetune_mode='no')[source]
dpgen2.entrypoint.submit.print_list_steps(steps)[source]
dpgen2.entrypoint.submit.resubmit_concurrent_learning(wf_config, wfid, list_steps=False, reuse=None, replace_scheduler=False, fold=False)[source]
dpgen2.entrypoint.submit.submit_concurrent_learning(wf_config, reuse_step: List[ArgoStep] | None = None, replace_scheduler: bool = False, no_submission: bool = False)[source]
dpgen2.entrypoint.submit.successful_step_keys(wf)[source]
dpgen2.entrypoint.submit.update_reuse_step_scheduler(reuse_step, scheduler_new)[source]
dpgen2.entrypoint.submit.workflow_concurrent_learning(config: Dict) Tuple[Step, Step | None][source]
dpgen2.entrypoint.watch module
dpgen2.entrypoint.watch.update_finished_steps(wf, finished_keys: List[str] | None = None, download: bool | None = False, watching_keys: List[str] | None = None, prefix: str | None = None, chk_pnt: bool = False)[source]
dpgen2.entrypoint.watch.watch(workflow_id, wf_config: Dict | None = {}, watching_keys: List | None = ['prep-run-train', 'prep-run-lmp', 'prep-run-fp', 'collect-data'], frequency: float = 600.0, download: bool = False, prefix: str | None = None, chk_pnt: bool = False)[source]
dpgen2.entrypoint.workflow module
dpgen2.entrypoint.workflow.add_subparser_workflow_subcommand(subparsers, command: str)[source]
dpgen2.entrypoint.workflow.execute_workflow_subcommand(command: str, wfid: str, wf_config: dict | None = {})[source]
dpgen2.exploration package
Subpackages
dpgen2.exploration.deviation package
Submodules
dpgen2.exploration.deviation.deviation_manager module
class dpgen2.exploration.deviation.deviation_manager.DeviManager[source]

Bases: ABC

A class for model deviation management.

Methods

add(name, deviation)

Add a model deviation into this manager.

clear()

Clear all data in this manager.

get(name)

Get a model deviation from this manager.

AVG_DEVI_F = 'avg_devi_f'
AVG_DEVI_V = 'avg_devi_v'
MAX_DEVI_F = 'max_devi_f'
MAX_DEVI_V = 'max_devi_v'
MIN_DEVI_F = 'min_devi_f'
MIN_DEVI_V = 'min_devi_v'
add(name: str, deviation: ndarray) None[source]

Add a model deviation into this manager.

Parameters:
namestr

The name of the deviation. The name is restricted to (DeviManager.MAX_DEVI_V, DeviManager.MIN_DEVI_V, DeviManager.AVG_DEVI_V, DeviManager.MAX_DEVI_F, DeviManager.MIN_DEVI_F, DeviManager.AVG_DEVI_F)

deviationnp.ndarray

The model deviation is a one-dimensional array extracted from a trajectory file.

abstract clear() None[source]

Clear all data in this manager.

get(name: str) List[ndarray | None][source]

Get a model deviation from this manager.

Parameters:
namestr

The name of the deviation. The name is restricted to (DeviManager.MAX_DEVI_V, DeviManager.MIN_DEVI_V, DeviManager.AVG_DEVI_V, DeviManager.MAX_DEVI_F, DeviManager.MIN_DEVI_F, DeviManager.AVG_DEVI_F)

dpgen2.exploration.deviation.deviation_std module
class dpgen2.exploration.deviation.deviation_std.DeviManagerStd[source]

Bases: DeviManager

The class which is responsible for model deviation management.

This is the standard implementation of DeviManager. Each deviation (e.g. max_devi_f, max_devi_v in file model_devi.out) is stored as a List[Optional[np.ndarray]], where np.array is a one-dimensional array. A List[np.ndarray][ii][jj] is the force model deviation of the jj-th frame of the ii-th trajectory. The model deviation can be List[None], where len(List[None]) is the number of trajectory files.
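
A brief usage sketch (not part of the original docstring; the array values are made up):

import numpy as np

from dpgen2.exploration.deviation.deviation_manager import DeviManager
from dpgen2.exploration.deviation.deviation_std import DeviManagerStd

devi = DeviManagerStd()
# one one-dimensional array per trajectory, here two trajectories
devi.add(DeviManager.MAX_DEVI_F, np.array([0.12, 0.30, 0.08]))
devi.add(DeviManager.MAX_DEVI_F, np.array([0.05, 0.22]))

max_f = devi.get(DeviManager.MAX_DEVI_F)
print(len(max_f))        # 2 trajectories
print(max_f[0][1])       # force deviation of frame 1 of trajectory 0 -> 0.3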

Methods

add(name, deviation)

Add a model deviation into this manager.

clear()

Clear all data in this manager.

get(name)

Get a model deviation from this manager.

clear() None[source]

Clear all data in this manager.

dpgen2.exploration.render package
Submodules
dpgen2.exploration.render.traj_render module
class dpgen2.exploration.render.traj_render.TrajRender[source]

Bases: ABC

Methods

get_confs(traj, id_selected[, type_map, ...])

Get configurations from trajectory by selection.

get_model_devi(files)

Get model deviations from recording files.

abstract get_confs(traj: List[Path], id_selected: List[List[int]], type_map: List[str] | None = None, conf_filters: ConfFilters | None = None) MultiSystems[source]

Get configurations from trajectory by selection.

Parameters:
trajList[Path]

Trajectory files

id_selectedList[List[int]]

The selected frames. id_selected[ii][jj] is the jj-th selected frame from the ii-th trajectory. id_selected[ii] may be an empty list.

type_mapList[str]

The type map.

Returns:
ms: dpdata.MultiSystems

The configurations in dpdata.MultiSystems format

abstract get_model_devi(files: List[Path]) DeviManager[source]

Get model deviations from recording files.

Parameters:
filesList[Path]

The paths to the model deviation recording files

Returns:
DeviManager: The class which is responsible for model deviation management.
dpgen2.exploration.render.traj_render_lammps module
class dpgen2.exploration.render.traj_render_lammps.TrajRenderLammps(nopbc: bool = False)[source]

Bases: TrajRender

Methods

get_confs(trajs, id_selected[, type_map, ...])

Get configurations from trajectory by selection.

get_model_devi(files)

Get model deviations from recording files.

get_confs(trajs: List[Path], id_selected: List[List[int]], type_map: List[str] | None = None, conf_filters: ConfFilters | None = None) MultiSystems[source]

Get configurations from trajectory by selection.

Parameters:
trajList[Path]

Trajectory files

id_selectedList[List[int]]

The selected frames. id_selected[ii][jj] is the jj-th selected frame from the ii-th trajectory. id_selected[ii] may be an empty list.

type_mapList[str]

The type map.

Returns:
ms: dpdata.MultiSystems

The configurations in dpdata.MultiSystems format

get_model_devi(files: List[Path]) DeviManager[source]

Get model deviations from recording files.

Parameters:
filesList[Path]

The paths to the model deviation recording files

Returns:
DeviManager: The class which is responsible for model deviation management.
dpgen2.exploration.report package
Submodules
dpgen2.exploration.report.report module
class dpgen2.exploration.report.report.ExplorationReport[source]

Bases: ABC

Methods

clear()

Clear the report

converged(reports)

Check if the exploration is converged.

get_candidate_ids([max_nframes])

Get indexes of candidate configurations

no_candidate()

If no candidate configuration is found

print(stage_idx, idx_in_stage, iter_idx)

Print the report

print_header()

Print the header of report

record(model_devi)

Record the model deviations of the trajectories

abstract clear()[source]

Clear the report

abstract converged(reports) bool[source]

Check if the exploration is converged.

Parameters:
reports

Historical reports

Returns:
converged bool

If the exploration is converged.

abstract get_candidate_ids(max_nframes: int | None = None) List[List[int]][source]

Get indexes of candidate configurations

Parameters:
max_nframes

The maximal number of frames of candidates.

Returns:
idx: List[List[int]]

The frame indices of candidate configurations. idx[ii][jj] is the frame index of the jj-th candidate of the ii-th trajectory.

no_candidate() bool[source]

If no candidate configuration is found

abstract print(stage_idx: int, idx_in_stage: int, iter_idx: int) str[source]

Print the report

abstract print_header() str[source]

Print the header of report

abstract record(model_devi: DeviManager)[source]

Record the model deviations of the trajectories

Parameters:
model_deviDeviManager

The class which is responsible for model deviation management. Model deviations is stored as a List[Optional[np.ndarray]], where np.array is a one-dimensional array. List[np.ndarray][ii][jj] is the force model deviation of the jj-th frame of the ii-th trajectory. Model deviations can be List[None], where len(List[None]) is the number of trajectory files.

dpgen2.exploration.report.report_adaptive_lower module
class dpgen2.exploration.report.report_adaptive_lower.ExplorationReportAdaptiveLower(level_f_hi: float = 0.5, numb_candi_f: int = 200, rate_candi_f: float = 0.01, level_v_hi: float | None = None, numb_candi_v: int = 0, rate_candi_v: float = 0.0, n_checked_steps: int = 2, conv_tolerance: float = 0.05, candi_sel_prob: str = 'uniform')[source]

Bases: ExplorationReport

The exploration report that adapts the lower trust level.

This report treats as candidates a fixed number of frames with force model deviation lower than level_f_hi and virial model deviation lower than level_v_hi.

The number of force frames is given by max(numb_candi_f, rate_candi_f * nframes). The number of virial frames is given by max(numb_candi_v, rate_candi_v * nframes).

The lower force trust level will be set to the lowest force model deviation of the force frames. The lower virial trust level will be set to the lowest virial model deviation of the virial frames.

The exploration is treated as converged if the differences in model deviations between neighboring steps are less than conv_tolerance in the last n_checked_steps.

Parameters:
level_f_hi float

The higher trust level of force model deviation

numb_candi_f int

The number of force frames with model deviation lower than level_f_hi that are treated as candidates.

rate_candi_f float

The ratio of force frames with model deviation lower than level_f_hi that are treated as candidates.

level_v_hi float

The higher trust level of virial model deviation

numb_candi_v int

The number of virial frames with model deviation lower than level_v_hi that are treated as candidates.

rate_candi_v float

The ratio of virial frames with model deviation lower than level_v_hi that are treated as candidates.

n_checked_steps int

The number of steps to check the convergence.

conv_tolerance float

The convergence tolerance.

candi_sel_prob str

The method for selecting candidates. It can be “uniform”: all candidates have the same probability. “inv_pop_f” or “inv_pop_f:nhist”: the probability is inversely proportional to the population of a histogram between level_f_lo and level_f_hi. The number of bins in the histogram is set by nhist, which should be an integer. The default is 10.
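
The candidate-selection rule described above can be sketched as follows with made-up numbers; this is an illustration of the rule, not dpgen2's exact implementation:

import numpy as np

# made-up max force model deviations of one trajectory
max_devi_f = np.array([0.62, 0.41, 0.35, 0.28, 0.22, 0.18, 0.10])
level_f_hi, numb_candi_f, rate_candi_f = 0.50, 3, 0.01

# number of force candidates: max(numb_candi_f, rate_candi_f * nframes)
numb_candi = max(numb_candi_f, int(rate_candi_f * len(max_devi_f)))

# frames below the higher trust level, largest deviations first
below_hi = np.sort(max_devi_f[max_devi_f < level_f_hi])[::-1]
candidates = below_hi[:numb_candi]

# the lower force trust level is the lowest deviation among the candidates
level_f_lo = candidates.min()
print(numb_candi, level_f_lo)   # 3 0.28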

Methods

clear()

Clear the report

converged(reports)

Check if the exploration is converged.

get_candidate_ids([max_nframes])

Get indexes of candidate configurations

no_candidate()

If no candidate configuration is found

print(stage_idx, idx_in_stage, iter_idx)

Print the report

print_header()

Print the header of report

record(model_devi)

Record the model deviations of the trajectories

accurate_ratio

args

candidate_ratio

doc

failed_ratio

accurate_ratio(tag=None)[source]
static args() List[Argument][source]
candidate_ratio(tag=None)[source]
clear()[source]

Clear the report

converged(reports) bool[source]

Check if the exploration is converged.

Parameters:
reports

Historical reports

Returns:
converged bool

If the exploration is converged.

static doc() str[source]
failed_ratio(tag=None)[source]
get_candidate_ids(max_nframes: int | None = None) List[List[int]][source]

Get indexes of candidate configurations

Parameters:
max_nframes

The maximal number of frames of candidates.

Returns:
idx: List[List[int]]

The frame indices of candidate configurations. idx[ii][jj] is the frame index of the jj-th candidate of the ii-th trajectory.

print(stage_idx: int, idx_in_stage: int, iter_idx: int) str[source]

Print the report

print_header() str[source]

Print the header of report

record(model_devi: DeviManager)[source]

Record the model deviations of the trajectories

Parameters:
model_deviDeviManager

The class which is responsible for model deviation management. Model deviations is stored as a List[Optional[np.ndarray]], where np.array is a one-dimensional array. List[np.ndarray][ii][jj] is the force model deviation of the jj-th frame of the ii-th trajectory. Model deviations can be List[None], where len(List[None]) is the number of trajectory files.

dpgen2.exploration.report.report_trust_levels_base module
class dpgen2.exploration.report.report_trust_levels_base.ExplorationReportTrustLevels(level_f_lo, level_f_hi, level_v_lo=None, level_v_hi=None, conv_accuracy=0.9)[source]

Bases: ExplorationReport

Methods

clear()

Clear the report

converged([reports])

Check if the exploration is converged.

get_candidate_ids([max_nframes])

Get indexes of candidate configurations

no_candidate()

If no candidate configuration is found

print(stage_idx, idx_in_stage, iter_idx)

Print the report

print_header()

Print the header of report

record(model_devi)

Record the model deviations of the trajectories

accurate_ratio

args

candidate_ratio

failed_ratio

accurate_ratio(tag=None)[source]
static args() List[Argument][source]
candidate_ratio(tag=None)[source]
clear()[source]

Clear the report

abstract converged(reports: List[ExplorationReport] | None = None) bool[source]

Check if the exploration is converged.

Parameters:
reports

Historical reports

Returns:
converged bool

If the exploration is converged.

failed_ratio(tag=None)[source]
abstract get_candidate_ids(max_nframes: int | None = None) List[List[int]][source]

Get indexes of candidate configurations

Parameters:
max_nframes

The maximal number of frames of candidates.

Returns:
idx: List[List[int]]

The frame indices of candidate configurations. idx[ii][jj] is the frame index of the jj-th candidate of the ii-th trajectory.

print(stage_idx: int, idx_in_stage: int, iter_idx: int) str[source]

Print the report

print_header() str[source]

Print the header of report

record(model_devi: DeviManager)[source]

Record the model deviations of the trajectories

Parameters:
model_deviDeviManager

The class which is responsible for model deviation management. Model deviations is stored as a List[Optional[np.ndarray]], where np.array is a one-dimensional array. List[np.ndarray][ii][jj] is the force model deviation of the jj-th frame of the ii-th trajectory. Model deviations can be List[None], where len(List[None]) is the number of trajectory files.

dpgen2.exploration.report.report_trust_levels_max module
class dpgen2.exploration.report.report_trust_levels_max.ExplorationReportTrustLevelsMax(level_f_lo, level_f_hi, level_v_lo=None, level_v_hi=None, conv_accuracy=0.9)[source]

Bases: ExplorationReportTrustLevels

Methods

clear()

Clear the report

converged([reports])

Check if the exploration is converged.

get_candidate_ids([max_nframes])

Get indexes of candidate configurations

no_candidate()

If no candidate configuration is found

print(stage_idx, idx_in_stage, iter_idx)

Print the report

print_header()

Print the header of report

record(model_devi)

Record the model deviations of the trajectories

accurate_ratio

args

candidate_ratio

doc

failed_ratio

converged(reports: List[ExplorationReport] | None = None) bool[source]

Check if the exploration is converged.

Parameters:
reports

Historical reports

Returns:
converged bool

If the exploration is converged.

static doc() str[source]
get_candidate_ids(max_nframes: int | None = None) List[List[int]][source]

Get indexes of candidate configurations

Parameters:
max_nframes

The maximal number of frames of candidates.

Returns:
idx: List[List[int]]

The frame indices of candidate configurations. idx[ii][jj] is the frame index of the jj-th candidate of the ii-th trajectory.

dpgen2.exploration.report.report_trust_levels_random module
class dpgen2.exploration.report.report_trust_levels_random.ExplorationReportTrustLevelsRandom(level_f_lo, level_f_hi, level_v_lo=None, level_v_hi=None, conv_accuracy=0.9)[source]

Bases: ExplorationReportTrustLevels

Methods

clear()

Clear the report

converged([reports])

Check if the exploration is converged.

get_candidate_ids([max_nframes])

Get indexes of candidate configurations

no_candidate()

If no candidate configuration is found

print(stage_idx, idx_in_stage, iter_idx)

Print the report

print_header()

Print the header of report

record(model_devi)

Record the model deviations of the trajectories

accurate_ratio

args

candidate_ratio

doc

failed_ratio

converged(reports: List[ExplorationReport] | None = None) bool[source]

Check if the exploration is converged.

Parameters:
reports

Historical reports

Returns:
converged bool

If the exploration is converged.

static doc() str[source]
get_candidate_ids(max_nframes: int | None = None) List[List[int]][source]

Get indexes of candidate configurations

Parameters:
max_nframes

The maximal number of frames of candidates.

Returns:
idx: List[List[int]]

The frame indices of candidate configurations. idx[ii][jj] is the frame index of the jj-th candidate of the ii-th trajectory.

dpgen2.exploration.scheduler package
Submodules
dpgen2.exploration.scheduler.convergence_check_stage_scheduler module
class dpgen2.exploration.scheduler.convergence_check_stage_scheduler.ConvergenceCheckStageScheduler(stage: ExplorationStage, selector: ConfSelector, max_numb_iter: int | None = None, fatal_at_max: bool = True)[source]

Bases: StageScheduler

Methods

complete()

Tell if the stage is complete

converged()

Tell if the stage is converged

force_complete()

Force complete the stage

get_reports()

Return all exploration reports

next_iteration()

Return the index of the next iteration

plan_next_iteration([report, trajs])

Make the plan for the next iteration of the stage.

reached_max_iteration

complete()[source]

Tell if the stage is complete

Returns:
converged bool

if the stage is complete

converged()[source]

Tell if the stage is converged

Returns:
converged bool

the convergence

force_complete()[source]

Force complete the stage

get_reports()[source]

Return all exploration reports

Returns:
reports List[ExplorationReport]

the reports

next_iteration()[source]

Return the index of the next iteration

Returns:
index int

the index of the next iteration

plan_next_iteration(report: ExplorationReport | None = None, trajs: List[Path] | None = None) Tuple[bool, BaseExplorationTaskGroup | None, ConfSelector | None][source]

Make the plan for the next iteration of the stage.

It checks the report of the current and all historical iterations of the stage, and tells if the iterations are converged. If not converged, it will plan the next iteration for the stage.

Parameters:
hist_reportsList[ExplorationReport]

The historical exploration report of the stage. If this is the first iteration of the stage, this list is empty.

reportExplorationReport

The exploration report of this iteration.

confsList[Path]

A list of configurations generated during the exploration. May be used to generate new configurations for the next iteration.

Returns:
stg_complete: bool

Whether the stage is complete. Two cases may happen: 1. converged; 2. when fatal_at_max is False, not converged but the maximal number of iterations has been reached.

task: ExplorationTaskGroup

A ExplorationTaskGroup defining the exploration of the next iteration. Should be None if the stage is converged.

conf_selector: ConfSelector

The configuration selector for the next iteration. Should be None if the stage is converged.

reached_max_iteration()[source]
dpgen2.exploration.scheduler.scheduler module
class dpgen2.exploration.scheduler.scheduler.ExplorationScheduler[source]

Bases: object

The exploration scheduler.

Methods

add_stage_scheduler(stage_scheduler)

Add stage scheduler.

complete()

Tell if all stages are converged.

force_stage_complete()

Force complete the current stage

get_convergence_ratio()

Get the accurate, candidate and failed ratios of the iterations

get_iteration()

Get the index of the current iteration.

get_stage()

Get the index of current stage.

get_stage_of_iterations()

Get the stage index and the index in the stage of iterations.

plan_next_iteration([report, trajs])

Make the plan for the next DPGEN iteration.

print_convergence

print_last_iteration

add_stage_scheduler(stage_scheduler: StageScheduler)[source]

Add stage scheduler.

All added schedulers can be treated as a list (order matters). Only after one stage converges does the exploration move on to the next stage.

Parameters:
stage_schedulerStageScheduler

The added stage scheduler

complete()[source]

Tell if all stages are converged.

force_stage_complete()[source]

Force complete the current stage

get_convergence_ratio()[source]

Get the accurate, candidate and failed ratios of the iterations

Returns:
accu np.ndarray

The accurate ratio. The length of the array equals the number of iterations.

cand np.ndarray

The candidate ratio. The length of the array equals the number of iterations.

fail np.ndarray

The failed ratio. The length of the array equals the number of iterations.

get_iteration()[source]

Get the index of the current iteration.

The iteration index increases when self.plan_next_iteration returns a valid expl_task_grp and conf_selector for the next iteration.

get_stage()[source]

Get the index of current stage.

Stage index increases when the previous stage converges. Usually called after self.plan_next_iteration.

get_stage_of_iterations()[source]

Get the stage index and the index in the stage of iterations.

plan_next_iteration(report: ExplorationReport | None = None, trajs: List[Path] | None = None) Tuple[bool, ExplorationTaskGroup | None, ConfSelector | None][source]

Make the plan for the next DPGEN iteration.

Parameters:
reportExplorationReport

The exploration report of this iteration.

trajsList[Path]

A list of configurations generated during the exploration. May be used to generate new configurations for the next iteration.

Returns:
complete: bool

If all the DPGEN stages complete.

task: ExplorationTaskGroup

A ExplorationTaskGroup defining the exploration of the next iteration. Should be None if converged.

conf_selector: ConfSelector

The configuration selector for the next iteration. Should be None if converged.

print_convergence()[source]
print_last_iteration(print_header=False)[source]
dpgen2.exploration.scheduler.stage_scheduler module
class dpgen2.exploration.scheduler.stage_scheduler.StageScheduler[source]

Bases: ABC

The scheduler for an exploration stage.

Methods

complete()

Tell if the stage is complete

converged()

Tell if the stage is converged

force_complete()

Force complete the stage

get_reports()

Return all exploration reports

next_iteration()

Return the index of the next iteration

plan_next_iteration(report, trajs)

Make the plan for the next iteration of the stage.

abstract complete() bool[source]

Tell if the stage is complete

Returns:
converged bool

if the stage is complete

abstract converged() bool[source]

Tell if the stage is converged

Returns:
converged bool

the convergence

abstract force_complete()[source]

Force complete the stage

abstract get_reports() List[ExplorationReport][source]

Return all exploration reports

Returns:
reports List[ExplorationReport]

the reports

abstract next_iteration() int[source]

Return the index of the next iteration

Returns:
index int

the index of the next iteration

abstract plan_next_iteration(report: ExplorationReport, trajs: List[Path]) Tuple[bool, ExplorationTaskGroup, ConfSelector][source]

Make the plan for the next iteration of the stage.

It checks the report of the current and all historical iterations of the stage, and tells if the iterations are converged. If not converged, it will plan the next iteration for the stage.

Parameters:
hist_reportsList[ExplorationReport]

The historical exploration report of the stage. If this is the first iteration of the stage, this list is empty.

reportExplorationReport

The exploration report of this iteration.

confsList[Path]

A list of configurations generated during the exploration. May be used to generate new configurations for the next iteration.

Returns:
stg_complete: bool

Whether the stage is complete. Two cases may happen: 1. converged; 2. when fatal_at_max is False, not converged but the maximal number of iterations has been reached.

task: ExplorationTaskGroup

A ExplorationTaskGroup defining the exploration of the next iteration. Should be None if the stage is converged.

conf_selector: ConfSelector

The configuration selector for the next iteration. Should be None if the stage is converged.

dpgen2.exploration.selector package
Submodules
dpgen2.exploration.selector.conf_filter module
class dpgen2.exploration.selector.conf_filter.ConfFilter[source]

Bases: ABC

Methods

check(coords, cell, atom_types, nopbc)

Check if the configuration is valid.

abstract check(coords: ndarray, cell: ndarray, atom_types: ndarray, nopbc: bool) bool[source]

Check if the configuration is valid.

Parameters:
coordsnumpy.array

The coordinates, numpy array of shape natoms x 3

cellnumpy.array

The cell tensor. numpy array of shape 3 x 3

atom_typesnumpy.array

The atom types. numpy array of shape natoms

nopbcbool

If no periodic boundary condition.

Returns:
validbool

True if the configuration is a valid configuration, else False.
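
As an illustration only (not a filter shipped with dpgen2), a minimal subclass could look like:

from dpgen2.exploration.selector.conf_filter import ConfFilter

class MaxAtomsFilter(ConfFilter):
    """Toy filter: accept configurations with at most `max_atoms` atoms."""

    def __init__(self, max_atoms: int = 200):
        self.max_atoms = max_atoms

    def check(self, coords, cell, atom_types, nopbc) -> bool:
        # reject configurations that are larger than the allowed size
        return len(atom_types) <= self.max_atoms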

class dpgen2.exploration.selector.conf_filter.ConfFilters[source]

Bases: object

Methods

add

check

add(conf_filter: ConfFilter) ConfFilters[source]
check(conf: System) System[source]
dpgen2.exploration.selector.conf_selector module
class dpgen2.exploration.selector.conf_selector.ConfSelector[source]

Bases: ABC

Select configurations from trajectory and model deviation files.

Methods

select

abstract select(trajs: List[Path], model_devis: List[Path], type_map: List[str] | None = None) Tuple[List[Path], ExplorationReport][source]
dpgen2.exploration.selector.conf_selector_frame module
class dpgen2.exploration.selector.conf_selector_frame.ConfSelectorFrames(traj_render: TrajRender, report: ExplorationReport, max_numb_sel: int | None = None, conf_filters: ConfFilters | None = None)[source]

Bases: ConfSelector

Select frames from trajectories as confs.

Parameters:
trust_level: TrustLevel

The trust level

conf_filter: ConfFilters

The configuration filter

Methods

select(trajs, model_devis[, type_map])

Select configurations

select(trajs: List[Path], model_devis: List[Path], type_map: List[str] | None = None) Tuple[List[Path], ExplorationReport][source]

Select configurations

Parameters:
trajsList[Path]

A list of Path to trajectory files generated by LAMMPS

model_devisList[Path]

A list of Paths to model deviation files generated by LAMMPS. Format: each line has 7 numbers, used as # frame_id md_v_max md_v_min md_v_mean md_f_max md_f_min md_f_mean, where md stands for model deviation, v for virial and f for force.

type_mapList[str]

The type_map of the systems

Returns:
confsList[Path]

The selected configurations, stored in a folder in deepmd/npy format, which can be parsed as dpdata.MultiSystems. The list has only one item.

reportExplorationReport

The exploration report recording the status of the exploration.
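
A sketch of reading one such model deviation file with the column layout described above (the file name is illustrative):

import numpy as np

data = np.loadtxt("model_devi.out")   # shape (nframes, 7); '#' header lines are skipped
frame_id = data[:, 0].astype(int)
max_devi_v = data[:, 1]               # maximal virial model deviation
max_devi_f = data[:, 4]               # maximal force model deviation
print(frame_id[:3], max_devi_f[:3])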

dpgen2.exploration.task package
Subpackages
dpgen2.exploration.task.calypso package
Submodules
dpgen2.exploration.task.calypso.caly_input module
dpgen2.exploration.task.calypso.caly_input.make_calypso_input(numb_of_species: int, name_of_atoms: List[str], atomic_number: List[int], numb_of_atoms: List[int], distance_of_ions, pop_size: int = 30, max_step: int = 5, system_name: str = 'CALYPSO', numb_of_formula: List[int] = [1, 1], pressure: float = 0.001, fmax: float = 0.01, volume: float = 0, ialgo: int = 2, pso_ratio: float = 0.6, icode: int = 15, numb_of_lbest: int = 4, numb_of_local_optim: int = 4, command: str = 'sh submit.sh', max_time: int = 9000, gen_type: int = 1, pick_up: bool = False, pick_step: int = 1, parallel: bool = False, split: bool = True, spec_space_group: List[int] = [2, 230], vsc: bool = False, ctrl_range: List[List[int]] = [[1, 10]], max_numb_atoms: int = 100, **kwargs)[source]
dpgen2.exploration.task.lmp package
Submodules
dpgen2.exploration.task.lmp.lmp_input module
dpgen2.exploration.task.lmp.lmp_input.make_lmp_input(conf_file: str, ensemble: str, graphs: List[str], nsteps: int, dt: float, neidelay: int | None, trj_freq: int, mass_map: List[float], temp: float, tau_t: float = 0.1, pres: float | None = None, tau_p: float = 0.5, use_clusters: bool = False, relative_f_epsilon: float | None = None, relative_v_epsilon: float | None = None, pka_e: float | None = None, ele_temp_f: float | None = None, ele_temp_a: float | None = None, nopbc: bool = False, max_seed: int = 1000000, deepmd_version='2.0', trj_seperate_files=True)[source]
Submodules
dpgen2.exploration.task.caly_task_group module
class dpgen2.exploration.task.caly_task_group.CalyTaskGroup[source]

Bases: ExplorationTaskGroup

Attributes:
task_list

Get the list of ExplorationTask

Methods

add_group(group)

Add another group to the group.

add_task(task)

Add one task to the group.

count(value)

index(value, [start, [stop]])

Raises ValueError if the value is not present.

make_task()

Make the CALYPSO task group.

set_params(numb_of_species, name_of_atoms, ...)

Set calypso parameters

clear

make_task() ExplorationTaskGroup[source]

Make the CALYPSO task group.

Returns:
task_grp: ExplorationTaskGroup

Return one calypso task group.

set_params(numb_of_species, name_of_atoms, atomic_number, numb_of_atoms, distance_of_ions, pop_size: int = 30, max_step: int = 5, system_name: str = 'CALYPSO', numb_of_formula: List[int] = [1, 1], pressure: float = 0.001, fmax: float = 0.01, volume: float = 0, ialgo: int = 2, pso_ratio: float = 0.6, icode: int = 15, numb_of_lbest: int = 4, numb_of_local_optim: int = 4, command: str = 'sh submit.sh', max_time: int = 9000, gen_type: int = 1, pick_up: bool = False, pick_step: int = 1, parallel: bool = False, split: bool = True, spec_space_group: List[int] = [2, 230], vsc: bool = True, ctrl_range: List[List[int]] = [[1, 10]], max_numb_atoms: int = 100)[source]

Set calypso parameters

dpgen2.exploration.task.conf_sampling_task_group module
class dpgen2.exploration.task.conf_sampling_task_group.ConfSamplingTaskGroup[source]

Bases: ExplorationTaskGroup

Attributes:
task_list

Get the list of ExplorationTask

Methods

add_group(group)

Add another group to the group.

add_task(task)

Add one task to the group.

count(value)

index(value, [start, [stop]])

Raises ValueError if the value is not present.

make_task()

Make the task group.

set_conf(conf_list[, n_sample, random_sample])

Set the configurations of exploration

clear

set_conf(conf_list: List[str], n_sample: int | None = None, random_sample: bool = False)[source]

Set the configurations of exploration

Parameters:
conf_list str

A list of file contents

n_sample int

Number of samples drawn from the conf list each time make_task is called. If set to None, n_sample is set to the length of conf_list.

random_sample bool

If true, the confs are randomly sampled; otherwise they are consecutively sampled from conf_list.

dpgen2.exploration.task.customized_lmp_template_task_group module
class dpgen2.exploration.task.customized_lmp_template_task_group.CustomizedLmpTemplateTaskGroup[source]

Bases: ConfSamplingTaskGroup

Attributes:
task_list

Get the list of ExplorationTask

Methods

add_group(group)

Add another group to the group.

add_task(task)

Add one task to the group.

count(value)

index(value, [start, [stop]])

Raises ValueError if the value is not present.

make_task()

Make the task group.

set_conf(conf_list[, n_sample, random_sample])

Set the configurations of exploration

set_lmp(numb_models, custom_shell_commands)

Set lammps task.

clear

make_task() CustomizedLmpTemplateTaskGroup[source]

Make the task group.

set_lmp(numb_models: int, custom_shell_commands: List[str], revisions: dict = {}, traj_freq: int = 10, input_lmp_conf_name: str = 'conf.lmp', input_lmp_tmpl_name: str = 'in.lammps', input_plm_tmpl_name: str | None = None, input_extra_files: List[str] = [], output_dir_pattern: str | List[str] = '*', output_lmp_conf_name: str = 'conf.lmp', output_lmp_tmpl_name: str = 'in.lammps', output_plm_tmpl_name: str | None = None) None[source]

Set lammps task.

Parameters:
numb_modelsint

Number of models

custom_shell_commandsstr

Customized shell commands to be run for each configuration. The commands require input_lmp_conf_name as the input conf file, input_lmp_tmpl_name and input_plm_tmpl_name as templates, and input_extra_files as extra input files. Running the commands is expected to generate a series of folders matching output_dir_pattern, each containing a configuration file output_lmp_conf_name, a LAMMPS template file output_lmp_tmpl_name and a plumed template file output_plm_tmpl_name.

revisionsdict

Revision dictionary. Provided in {key: [enumerated values]} format

traj_freqint

Frequency along trajectory of checking model deviation

input_lmp_conf_namestr

Input conf file name for the shell commands.

input_lmp_tmpl_namestr

Template file name of lammps input

input_plm_tmpl_namestr

Template file name of the plumed input

input_extra_filesList[str]

Extra files that may be needed to execute the shell commands

output_dir_patternUnion[str, List[str]]

Pattern of resultant folders generated by the shell commands.

output_lmp_conf_namestr

Generated conf file name.

output_lmp_tmpl_namestr

Generated lmp input file name.

output_plm_tmpl_namestr

Generated plm input file name.
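
A configuration sketch with illustrative values; the shell script, revision keyword and conf content below are hypothetical:

from dpgen2.exploration.task.customized_lmp_template_task_group import (
    CustomizedLmpTemplateTaskGroup,
)

conf_list = ["# LAMMPS data file content (placeholder)\n"]

tg = CustomizedLmpTemplateTaskGroup()
tg.set_conf(conf_list, n_sample=1)
tg.set_lmp(
    numb_models=4,
    custom_shell_commands=["bash prep_tasks.sh"],   # hypothetical preparation script
    revisions={"V_TEMP": [300.0, 600.0]},           # hypothetical template keyword
    traj_freq=10,
    input_lmp_conf_name="conf.lmp",
    input_lmp_tmpl_name="in.lammps",
    output_dir_pattern="task.*",
)
task_grp = tg.make_task()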

dpgen2.exploration.task.lmp_template_task_group module
class dpgen2.exploration.task.lmp_template_task_group.LmpTemplateTaskGroup[source]

Bases: ConfSamplingTaskGroup

Attributes:
task_list

Get the list of ExplorationTask

Methods

add_group(group)

Add another group to the group.

add_task(task)

Add one task to the group.

count(value)

index(value, [start, [stop]])

Raises ValueError if the value is not present.

make_task()

Make the task group.

set_conf(conf_list[, n_sample, random_sample])

Set the configurations of exploration

clear

make_cont

set_lmp

make_cont(templates: list, revisions: dict)[source]
make_task() LmpTemplateTaskGroup[source]

Make the task group.

set_lmp(numb_models: int, lmp_template_fname: str, plm_template_fname: str | None = None, revisions: dict = {}, traj_freq: int = 10) None[source]
dpgen2.exploration.task.lmp_template_task_group.find_only_one_key(lmp_lines, key)[source]
dpgen2.exploration.task.lmp_template_task_group.revise_by_keys(lmp_lines, keys, values)[source]
dpgen2.exploration.task.lmp_template_task_group.revise_lmp_input_dump(lmp_lines, trj_freq)[source]
dpgen2.exploration.task.lmp_template_task_group.revise_lmp_input_model(lmp_lines, task_model_list, trj_freq, deepmd_version='1')[source]
dpgen2.exploration.task.lmp_template_task_group.revise_lmp_input_plm(lmp_lines, in_plm, out_plm='output.plumed')[source]
dpgen2.exploration.task.make_task_group_from_config module
dpgen2.exploration.task.make_task_group_from_config.caly_normalize(data)[source]
dpgen2.exploration.task.make_task_group_from_config.caly_task_group_args()[source]
dpgen2.exploration.task.make_task_group_from_config.caly_task_grp_args()[source]
dpgen2.exploration.task.make_task_group_from_config.config_strip_confidx(config)[source]
dpgen2.exploration.task.make_task_group_from_config.customized_lmp_template_task_group_args()[source]
dpgen2.exploration.task.make_task_group_from_config.lmp_normalize(data)[source]
dpgen2.exploration.task.make_task_group_from_config.lmp_task_group_args()[source]
dpgen2.exploration.task.make_task_group_from_config.lmp_template_task_group_args()[source]
dpgen2.exploration.task.make_task_group_from_config.make_calypso_task_group_from_config(config)[source]
dpgen2.exploration.task.make_task_group_from_config.make_lmp_task_group_from_config(numb_models, mass_map, config)[source]
dpgen2.exploration.task.make_task_group_from_config.make_task_group_from_config(numb_models, mass_map, config)[source]
dpgen2.exploration.task.make_task_group_from_config.npt_task_group_args()[source]
dpgen2.exploration.task.make_task_group_from_config.variant_task_group()[source]
dpgen2.exploration.task.npt_task_group module
class dpgen2.exploration.task.npt_task_group.NPTTaskGroup[source]

Bases: ConfSamplingTaskGroup

Attributes:
task_list

Get the list of ExplorationTask

Methods

add_group(group)

Add another group to the group.

add_task(task)

Add one task to the group.

count(value)

index(value, [start, [stop]])

Raises ValueError if the value is not present.

make_task()

Make the LAMMPS task group.

set_conf(conf_list[, n_sample, random_sample])

Set the configurations of exploration

set_md(numb_models, mass_map, temps[, ...])

Set MD parameters

clear

make_task() NPTTaskGroup[source]

Make the LAMMPS task group.

Returns:
task_grp: ExplorationTaskGroup

The returned LAMMPS task group. The number of tasks is nconf*nT*nP, where nconf is set by the n_sample parameter of set_conf, and nT and nP are the lengths of the temps and press parameters of set_md.

set_md(numb_models, mass_map, temps: List[float], press: List[float] | None = None, ens: str = 'npt', dt: float = 0.001, nsteps: int = 1000, trj_freq: int = 10, tau_t: float = 0.1, tau_p: float = 0.5, pka_e: float | None = None, neidelay: int | None = None, no_pbc: bool = False, use_clusters: bool = False, relative_f_epsilon: float | None = None, relative_v_epsilon: float | None = None, ele_temp_f: float | None = None, ele_temp_a: float | None = None)[source]

Set MD parameters

dpgen2.exploration.task.stage module
class dpgen2.exploration.task.stage.ExplorationStage[source]

Bases: object

The exploration stage.

Methods

add_task_group(grp)

Add an exploration group

clear()

Clear all exploration group.

make_task()

Make the LAMMPS task group.

add_task_group(grp: ExplorationTaskGroup)[source]

Add an exploration group

Parameters:
grpExplorationTaskGroup

The added exploration task group

clear()[source]

Clear all exploration group.

make_task() BaseExplorationTaskGroup[source]

Make the LAMMPS task group.

Returns:
task_grp: BaseExplorationTaskGroup

The returned LAMMPS task group. The number of tasks equals the total number of tasks in all the exploration task groups added to the stage.

dpgen2.exploration.task.task module
class dpgen2.exploration.task.task.ExplorationTask[source]

Bases: object

Define the files needed by an exploration task.

Examples

>>> # this example dumps all files needed by the task.
>>> files = exploration_task.files()
... for file_name, file_content in files.items():
...     with open(file_name, 'w') as fp:
...         fp.write(file_content)

Methods

add_file(fname, fcont)

Add file to the task

files()

Get all files for the task.

add_file(fname: str, fcont: str)[source]

Add file to the task

Parameters:
fnamestr

The name of the file

fcontstr

The content of the file.

files() Dict[source]

Get all files for the task.

Returns:
filesdict

The dict storing all files for the task. The file name is a key of the dict, and the file content is the corresponding value.

dpgen2.exploration.task.task_group module
class dpgen2.exploration.task.task_group.BaseExplorationTaskGroup[source]

Bases: Sequence

A group of exploration tasks. Implemented as a list of ExplorationTask.

Attributes:
task_list

Get the list of ExplorationTask

Methods

add_group(group)

Add another group to the group.

add_task(task)

Add one task to the group.

count(value)

index(value, [start, [stop]])

Raises ValueError if the value is not present.

clear

add_group(group: ExplorationTaskGroup)[source]

Add another group to the group.

add_task(task: ExplorationTask)[source]

Add one task to the group.

clear() None[source]
property task_list: List[ExplorationTask]

Get the list of ExplorationTask

class dpgen2.exploration.task.task_group.ExplorationTaskGroup[source]

Bases: ABC, BaseExplorationTaskGroup

Attributes:
task_list

Get the list of ExplorationTask

Methods

add_group(group)

Add another group to the group.

add_task(task)

Add one task to the group.

count(value)

index(value, [start, [stop]])

Raises ValueError if the value is not present.

make_task()

Make the task group.

clear

abstract make_task() ExplorationTaskGroup[source]

Make the task group.

class dpgen2.exploration.task.task_group.FooTask(conf_name='conf.lmp', conf_cont='', inpu_name='in.lammps', inpu_cont='')[source]

Bases: ExplorationTask

Methods

add_file(fname, fcont)

Add file to the task

files()

Get all files for the task.

class dpgen2.exploration.task.task_group.FooTaskGroup(numb_task)[source]

Bases: BaseExplorationTaskGroup

Attributes:
task_list

Get the list of ExplorationTask

Methods

add_group(group)

Add another group to the group.

add_task(task)

Add one task to the group.

count(value)

index(value, [start, [stop]])

Raises ValueError if the value is not present.

clear

property task_list

Get the list of ExplorationTask

dpgen2.flow package
Submodules
dpgen2.flow.dpgen_loop module
class dpgen2.flow.dpgen_loop.ConcurrentLearning(name: str, block_op: ConcurrentLearningBlock, step_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: List[PathLike] | None = None)[source]

Bases: Steps

Attributes:
init_keys
input_artifacts
input_parameters
loop_keys
output_artifacts
output_parameters

Methods

add(step)

Add a step or a list of parallel steps to the steps

add_slices

convert_to_argo

convert_to_graph

copy

deepcopy

from_dict

from_graph

handle_key

run

property init_keys
property input_artifacts
property input_parameters
property loop_keys
property output_artifacts
property output_parameters
class dpgen2.flow.dpgen_loop.ConcurrentLearningLoop(name: str, block_op: ConcurrentLearningBlock, step_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: List[PathLike] | None = None)[source]

Bases: Steps

Attributes:
input_artifacts
input_parameters
keys
output_artifacts
output_parameters

Methods

add(step)

Add a step or a list of parallel steps to the steps

add_slices

convert_to_argo

convert_to_graph

copy

deepcopy

from_dict

from_graph

handle_key

run

property input_artifacts
property input_parameters
property keys
property output_artifacts
property output_parameters
class dpgen2.flow.dpgen_loop.MakeBlockId(*args, **kwargs)[source]

Bases: OP

Attributes:
key
workflow_name

Methods

execute(ip)

Run the OP

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

execute(ip: OPIO) OPIO[source]

Run the OP

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

class dpgen2.flow.dpgen_loop.SchedulerWrapper(*args, **kwargs)[source]

Bases: OP

Attributes:
key
workflow_name

Methods

execute(ip)

Run the OP

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

execute(ip: OPIO) OPIO[source]

Run the OP

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

dpgen2.flow.dpgen_loop.make_block_optional_parameter(cl_optional_parameter)[source]
dpgen2.fp package
Submodules
dpgen2.fp.abacus module
class dpgen2.fp.abacus.FpOpAbacusInputs(input_file: str | Path, pp_files: Dict[str, str | Path], element_mass: Dict[str, float] | None = None, kpt_file: str | Path | None = None, orb_files: Dict[str, str | Path] | None = None, deepks_descriptor: str | Path | None = None, deepks_model: str | Path | None = None)[source]

Bases: AbacusInputs

Methods

get_mass(element_list)

Get the mass of elements.

read_inputf(inputf)

Read INPUT and convert it to a dict.

write_deepks()

Check whether INPUT describes a deepks job; if yes, return the deepks descriptor file name, otherwise return None.

write_pporb(element_list)

Based on the element list, write the pp/orb files and return a list of the file names.

args

get_deepks_descriptor

get_deepks_model

get_input

get_orb

get_pp

set_deepks_descriptor

set_deepks_model

set_input

set_mass

set_orb

set_pp

write_input

write_kpt

static args()[source]
class dpgen2.fp.abacus.PrepFpOpAbacus(*args, **kwargs)[source]

Bases: OP

Attributes:
key
workflow_name

Methods

execute(ip)

Run the OP

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

execute(ip: OPIO) OPIO[source]

Run the OP

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

class dpgen2.fp.abacus.RunFpOpAbacus(*args, **kwargs)[source]

Bases: OP

Attributes:
key
workflow_name

Methods

execute(ip)

Run the OP

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

args

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

static args()[source]
execute(ip: OPIO) OPIO[source]

Run the OP

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

dpgen2.fp.deepmd module

Prep and Run Deepmd tasks.

class dpgen2.fp.deepmd.DeepmdInputs(**kwargs: Any)[source]

Bases: object

Methods

args

static args() List[Argument][source]
class dpgen2.fp.deepmd.PrepDeepmd(*args, **kwargs)[source]

Bases: PrepFp

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

prep_task(conf_frame, inputs)

Define how one Deepmd task is prepared.

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

prep_task(conf_frame: System, inputs)[source]

Define how one Deepmd task is prepared.

Parameters:
conf_frame : dpdata.System

One frame of configuration in the dpdata format.

inputs : str or dict

This parameter is not used by the Deepmd task.

class dpgen2.fp.deepmd.RunDeepmd(*args, **kwargs)[source]

Bases: RunFp

Attributes:
key
workflow_name

Methods

args()

The argument definition of the run_task method.

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

input_files()

The mandatory input files to run a Deepmd task.

normalize_config([data, strict])

Normalize the arguments.

optional_input_files()

The optional input files to run a Deepmd task.

run_task(teacher_model_path, out, log)

Defines how one FP task runs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

static args() List[Argument][source]

The argument definition of the run_task method.

Returns:
arguments: List[dargs.Argument]

List of dargs.Argument defines the arguments of run_task method.

input_files() List[str][source]

The mandatory input files to run a Deepmd task.

Returns:
files: List[str]

A list of mandatory input file names.

optional_input_files() List[str][source]

The optional input files to run a Deepmd task.

Returns:
files: List[str]

A list of optional input file names.

run_task(teacher_model_path: BinaryFileInput, out: str, log: str) Tuple[str, str][source]

Defines how one FP task runs

Parameters:
teacher_model_path : BinaryFileInput

The teacher model used to label the configurations.

out : str

The name of the output data file.

log : str

The name of the log file.

Returns:
out_name: str

The file name of the output data in the dpdata.LabeledSystem format.

log_name: str

The file name of the log.

dpgen2.fp.gaussian module

Prep and Run Gaussian tasks.

class dpgen2.fp.gaussian.GaussianInputs(**kwargs: Any)[source]

Bases: object

Methods

args()

The arguments of the GaussianInputs class.

static args() List[Argument][source]

The arguments of the GaussianInputs class.

class dpgen2.fp.gaussian.PrepGaussian(*args, **kwargs)[source]

Bases: PrepFp

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

prep_task(conf_frame, inputs)

Define how one Gaussian task is prepared.

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

prep_task(conf_frame: System, inputs: GaussianInputs)[source]

Define how one Gaussian task is prepared.

Parameters:
conf_frame : dpdata.System

One frame of configuration in the dpdata format.

inputs : GaussianInputs

The GaussianInputs object handles all other input files of the task.

class dpgen2.fp.gaussian.RunGaussian(*args, **kwargs)[source]

Bases: RunFp

Attributes:
key
workflow_name

Methods

args()

The argument definition of the run_task method.

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

input_files()

The mandatory input files to run a Gaussian task.

normalize_config([data, strict])

Normalize the arguments.

optional_input_files()

The optional input files to run a Gaussian task.

run_task(command, out)

Defines how one FP task runs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

static args() List[Argument][source]

The argument definition of the run_task method.

Returns:
arguments: List[dargs.Argument]

List of dargs.Argument defines the arguments of run_task method.

input_files() List[str][source]

The mandatory input files to run a Gaussian task.

Returns:
files: List[str]

A list of mandatory input file names.

optional_input_files() List[str][source]

The optional input files to run a Gaussian task.

Returns:
files: List[str]

A list of optional input file names.

run_task(command: str, out: str) Tuple[str, str][source]

Defines how one FP task runs

Parameters:
command : str

The command for running the Gaussian task.

out : str

The name of the output data file.

Returns:
out_name: str

The file name of the output data in the dpdata.LabeledSystem format.

log_name: str

The file name of the log.

dpgen2.fp.prep_fp module
class dpgen2.fp.prep_fp.PrepFp(*args, **kwargs)[source]

Bases: OP, ABC

Prepares the working directories for first-principles (FP) tasks.

A list of working directories (of the same length as ip[“confs”]), containing all files needed to start the FP tasks, will be created. The paths of the directories will be returned as op[“task_paths”]. The identities of the tasks are returned as op[“task_names”].

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

prep_task(conf_frame, inputs)

Define how one FP task is prepared.

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters:
ip : dict

Input dict with components:

  • config : (dict) Should have config[‘inputs’], which defines the input files of the FP task.

  • confs : (Artifact(List[Path])) Configurations for the FP tasks. Stored in folders as deepmd/npy format. Can be parsed as dpdata.MultiSystems.

Returns:
op : dict

Output dict with components:

  • task_names: (List[str]) The names of the tasks, used as the identities of the tasks. The names of different tasks are different.

  • task_paths: (Artifact(List[Path])) The prepared working paths of the tasks, containing all input files needed to start the FP calculations. The order of the Paths should be consistent with op[“task_names”].

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

abstract prep_task(conf_frame: System, inputs: Any)[source]

Define how one FP task is prepared.

Parameters:
conf_frame : dpdata.System

One frame of configuration in the dpdata format.

inputs : Any

The class object that handles all other input files of the task, for example the pseudopotential file, the k-point file, and so on.
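
A minimal sketch of a custom PrepFp subclass (the file names are illustrative, and the assumption that prep_task writes into the per-task working directory follows the description of execute above):

>>> from pathlib import Path
>>> import dpdata
>>> from dpgen2.fp.prep_fp import PrepFp
>>> class PrepMyFp(PrepFp):
...     def prep_task(self, conf_frame: dpdata.System, inputs):
...         # write the configuration and a (hypothetical) input template;
...         # files are assumed to land in the task's working directory
...         conf_frame.to("vasp/poscar", "POSCAR")
...         Path("INPUT").write_text(str(inputs))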

dpgen2.fp.run_fp module
class dpgen2.fp.run_fp.RunFp(*args, **kwargs)[source]

Bases: OP, ABC

Execute a first-principles (FP) task.

A working directory named task_name is created. All input files are copied or symbolically linked to the directory task_name. The FP command is executed from the directory task_name. The op[“labeled_data”] in “deepmd/npy” format (HDF5 in the future), provided by dpdata, will be created.

Attributes:
key
workflow_name

Methods

args()

The argument definition of the run_task method.

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

input_files()

The mandatory input files to run a FP task.

normalize_config([data, strict])

Normalize the arguments.

optional_input_files()

The optional input files to run a FP task.

run_task(**kwargs)

Defines how one FP task runs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

abstract static args() List[Argument][source]

The argument definition of the run_task method.

Returns:
arguments: List[dargs.Argument]

List of dargs.Argument defines the arguments of run_task method.

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters:
ip : dict

Input dict with components:

  • config: (dict) The config of FP task. Should have config[‘run’], which defines the runtime configuration of the FP task.

  • task_name: (str) The name of task.

  • task_path: (Artifact(Path)) The path that contains all input files prepared by PrepFp.

Returns:
Output dict with components:
  • log: (Artifact(Path)) The log file of FP.
  • labeled_data: (Artifact(Path)) The path to the labeled data in “deepmd/npy” format provided by dpdata.
Raises:
TransientError

On the failure of FP execution.

FatalError

When mandatory files are not found.

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

abstract input_files() List[str][source]

The mandatory input files to run a FP task.

Returns:
files: List[str]

A list of mandatory input file names.

classmethod normalize_config(data: Dict = {}, strict: bool = True) Dict[source]

Normalize the arguments.

Parameters:
data : Dict

The input dict of arguments.

strict : bool

Strictly check the arguments.

Returns:
data: Dict

The normalized arguments.

abstract optional_input_files() List[str][source]

The optional input files to run a FP task.

Returns:
files: List[str]

A list of optional input file names.

abstract run_task(**kwargs) Tuple[str, str][source]

Defines how one FP task runs

Parameters:
**kwargs

Keyword args defined by the developer. The fp/run_config session of the input file will be passed to this function.

Returns:
out_name: str

The file name of the output data. Should be in dpdata.LabeledSystem format.

log_name: str

The file name of the log.
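
A minimal sketch of a custom RunFp subclass (command handling and file names are illustrative; the keyword arguments of run_task are assumed to come from the fp/run_config session as described above, and the conversion of the raw output is left out):

>>> from typing import List, Tuple
>>> from dargs import Argument
>>> from dpgen2.fp.run_fp import RunFp
>>> from dpgen2.utils.run_command import run_command
>>> class RunMyFp(RunFp):
...     @staticmethod
...     def args() -> List[Argument]:
...         # one runtime option: the FP command line (illustrative)
...         return [Argument("command", str, optional=True, default="myfp")]
...     def input_files(self) -> List[str]:
...         return ["INPUT", "POSCAR"]
...     def optional_input_files(self) -> List[str]:
...         return []
...     def run_task(self, command: str) -> Tuple[str, str]:
...         ret, out, err = run_command(command, shell=True)
...         with open("run.log", "w") as fp:
...             fp.write(out + err)
...         # conversion of the raw output to dpdata.LabeledSystem is omitted here
...         return "data", "run.log"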

dpgen2.fp.vasp module
class dpgen2.fp.vasp.PrepVasp(*args, **kwargs)[source]

Bases: PrepFp

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

prep_task(conf_frame, vasp_inputs)

Define how one Vasp task is prepared.

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

prep_task(conf_frame: System, vasp_inputs: VaspInputs)[source]

Define how one Vasp task is prepared.

Parameters:
conf_frame : dpdata.System

One frame of configuration in the dpdata format.

vasp_inputs : VaspInputs

The VaspInputs object handles all other input files of the task.

class dpgen2.fp.vasp.RunVasp(*args, **kwargs)[source]

Bases: RunFp

Attributes:
key
workflow_name

Methods

args()

The argument definition of the run_task method.

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

input_files()

The mandatory input files to run a vasp task.

normalize_config([data, strict])

Normalize the arguments.

optional_input_files()

The optional input files to run a vasp task.

run_task(command, out, log)

Defines how one FP task runs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

static args()[source]

The argument definition of the run_task method.

Returns:
arguments: List[dargs.Argument]

List of dargs.Argument defines the arguments of run_task method.

input_files() List[str][source]

The mandatory input files to run a vasp task.

Returns:
files: List[str]

A list of mandatory input file names.

optional_input_files() List[str][source]

The optional input files to run a vasp task.

Returns:
files: List[str]

A list of optional input file names.

run_task(command: str, out: str, log: str) Tuple[str, str][source]

Defines how one FP task runs

Parameters:
command : str

The command for running the vasp task.

out : str

The name of the output data file.

log : str

The name of the log file

Returns:
out_name: str

The file name of the output data in the dpdata.LabeledSystem format.

log_name: str

The file name of the log.

dpgen2.fp.vasp_input module
class dpgen2.fp.vasp_input.VaspInputs(kspacing: float | List[float], incar: str, pp_files: Dict[str, str], kgamma: bool = True)[source]

Bases: object

Attributes:
incar_template
potcars

Methods

args

incar_from_file

make_kpoints

make_potcar

normalize_config

potcars_from_file

static args()[source]
incar_from_file(fname: str)[source]
property incar_template
make_kpoints(box: ndarray) str[source]
make_potcar(atom_names) str[source]
static normalize_config(data={}, strict=True)[source]
property potcars
potcars_from_file(dict_fnames: Dict[str, str])[source]
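
A minimal sketch of constructing the inputs (values are placeholders; whether incar expects a file path or the INCAR content is not spelled out here, so the string below is only illustrative, and incar_from_file can be used to load a file explicitly):

>>> import numpy as np
>>> from dpgen2.fp.vasp_input import VaspInputs
>>> vasp_inputs = VaspInputs(
...     kspacing=0.32,
...     incar="INCAR",                                    # illustrative
...     pp_files={"Al": "POTCAR.Al", "Mg": "POTCAR.Mg"},
...     kgamma=True,
... )
>>> kpoints = vasp_inputs.make_kpoints(np.eye(3) * 4.05)  # 3x3 cell in Angstrom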
dpgen2.fp.vasp_input.make_kspacing_kpoints(box, kspacing, kgamma)[source]
dpgen2.op package
Submodules
dpgen2.op.collect_data module
class dpgen2.op.collect_data.CollectData(*args, **kwargs)[source]

Bases: OP

Collect labeled data and add to the iteration dataset.

After running FP tasks, the labeled data are scattered in the task directories. This OP collects the labeled data into one data directory and adds it to the iteration data. The data generated by this iteration will be placed in the ip[“name”] subdirectory of the iteration data directory.

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

default_optional_parameter = {'mixed_type': False}
execute(ip: OPIO) OPIO[source]

Execute the OP. This OP collects the data scattered in the directories given by ip[‘labeled_data’] into one dpdata.MultiSystems and stores it in a directory named name. This directory is appended to the list iter_data.

Parameters:
ip : dict

Input dict with components:

  • name: (str) The name of this iteration. The data generated by this iteration will be placed in a sub-directory named name.

  • labeled_data: (Artifact(List[Path])) The paths of labeled data generated by FP tasks of the current iteration.

  • iter_data: (Artifact(List[Path])) The data paths of previous iterations.

Returns:
Any

Output dict with components:

  • iter_data: (Artifact(List[Path])) The data paths of the previous and current iterations.

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

dpgen2.op.collect_run_caly module
class dpgen2.op.collect_run_caly.CollRunCaly(*args, **kwargs)[source]

Bases: OP

Execute CALYPSO to generate structures in work_dir.

The working directory is changed to task_name. All input files have been copied or symbolically linked to this directory by PrepCalyInput. The CALYPSO command is executed from the directory task_name. The caly.log file and the work_dir will be stored in op[“log”] and op[“work_dir”], respectively.

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

calypso_args

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

normalize_config

register_output_artifact

superfunction

static calypso_args()[source]
execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters:
ip : dict

Input dict with components:

  • config: (dict) The config of calypso task to obtain the command of calypso.

  • task_name: (str) The name of the task (calypso_task.{idx}).

  • input_file: (Path) The input file of the task (input.dat).

  • step: (Path) The step file from last calypso run

  • results: (Path) The results dir from last calypso run

  • opt_results_dir: (Path) The results dir contains POSCAR* CONTCAR* OUTCAR* from last calypso run

  • qhull_input: (Path) qhull input file test_qconvex.in

Returns:
Any

Output dict with components:

  • poscar_dir: (Path) The dir containing POSCAR*.

  • task_name: (str) The name of the task (calypso_task.{idx}).

  • input_file: (Path) The input file of the task (input.dat).

  • step: (Path) The step file.

  • results: (Path) The results dir.

  • qhull_input: (Path) qhull input file.

Raises:
TransientError

On the failure of CALYPSO execution. Resubmit rule should be clear.

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

static normalize_config(data={})[source]
dpgen2.op.collect_run_caly.config_args()
dpgen2.op.collect_run_caly.get_value_from_inputdat(filename)[source]
dpgen2.op.collect_run_caly.prep_last_calypso_file(step, results, opt_results_dir, qhull_input, vsc)[source]
dpgen2.op.md_settings module
class dpgen2.op.md_settings.MDSettings(ens: str, dt: float, nsteps: int, trj_freq: int, temps: List[float] | None = None, press: List[float] | None = None, tau_t: float = 0.1, tau_p: float = 0.5, pka_e: float | None = None, neidelay: int | None = None, no_pbc: bool = False, use_clusters: bool = False, relative_epsilon: float | None = None, relative_v_epsilon: float | None = None, ele_temp_f: float | None = None, ele_temp_a: float | None = None)[source]

Bases: object

Methods

to_str

to_str() str[source]
dpgen2.op.prep_caly_input module
class dpgen2.op.prep_caly_input.PrepCalyInput(*args, **kwargs)[source]

Bases: OP

Prepare the working directories and input file for generating structures.

A CALYPSO input file will be generated according to the given parameters (defined by ip[“caly_inputs”]). The generated input files will be returned as an artifact (op[“input_dat_files”]). The names of the task directories are given by op[“task_names”].

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters:
ip : dict

Input dict with components:

  • caly_task_grp: (BigParameter()) Definitions for CALYPSO input file.

Returns:
op : dict

Output dict with components:

  • task_names: (List[str]) The names of the CALYPSO tasks, used as the identities of the tasks. The names of different tasks are different.

  • input_dat_files: (Artifact(List[Path])) The prepared working paths of the tasks, containing the input files (input.dat and calypso_run_opt.py) needed to generate structures with CALYPSO and to run structure optimization with the DP model.

  • caly_run_opt_files: (Artifact(List[Path]))

  • caly_check_opt_files: (Artifact(List[Path]))

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

dpgen2.op.prep_dp_train module
class dpgen2.op.prep_dp_train.PrepDPTrain(*args, **kwargs)[source]

Bases: OP

Prepares the working directories for DP training tasks.

A list of (numb_models) working directories containing all files needed to start training tasks will be created. The paths of the directories will be returned as op[“task_paths”]. The identities of the tasks are returned as op[“task_names”].

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters:
ip : dict

Input dict with components:

  • template_script: (str or List[str]) A template of the training script. Can be a str or List[str]. In the case of str, all training tasks share the same training input template, and the only difference is the random number used to initialize the network parameters. In the case of List[str], each training task uses one template from the list, and the random numbers used to initialize the network parameters are different. The length of the list should be the same as numb_models.

  • numb_models: (int) Number of DP models to train.

Returns:
op : dict

Output dict with components:

  • task_names: (List[str]) The names of the tasks, used as the identities of the tasks. The names of different tasks are different.

  • task_paths: (Artifact(List[Path])) The prepared working paths of the tasks. The order of the Paths should be consistent with op[“task_names”].

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

dpgen2.op.prep_lmp module
dpgen2.op.prep_lmp.PrepExplorationTaskGroup

alias of PrepLmp

class dpgen2.op.prep_lmp.PrepLmp(*args, **kwargs)[source]

Bases: OP

Prepare the working directories for LAMMPS tasks.

A list of working directories (defined by ip[“task”]) containing all files needed to start LAMMPS tasks will be created. The paths of the directories will be returned as op[“task_paths”]. The identities of the tasks are returned as op[“task_names”].

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters:
ip : dict

Input dict with components:

  • lmp_task_grp: (BigParameter(Path)) Can be pickle-loaded as an ExplorationTaskGroup. Definitions for the LAMMPS tasks.

Returns:
op : dict

Output dict with components:

  • task_names: (List[str]) The names of the tasks, used as the identities of the tasks. The names of different tasks are different.

  • task_paths: (Artifact(List[Path])) The prepared working paths of the tasks, containing all input files needed to start the LAMMPS simulations. The order of the Paths should be consistent with op[“task_names”].

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

dpgen2.op.prep_run_dp_optim module
class dpgen2.op.prep_run_dp_optim.PrepRunDPOptim(*args, **kwargs)[source]

Bases: OP

Prepare the working directories and input file for structure optimization with DP.

POSCAR_*, model.000.pb, calypso_run_opt.py and calypso_check_opt.py will be copied or symlinked to each optimization directory from ip[“work_path”], according to the popsize ip[“caly_input”][“PopSize”]. The paths of these optimization directories will be returned as op[“optim_paths”].

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters:
ip : dict

Input dict with components:

  • config: (dict) The config of the calypso task, used to obtain the command of calypso.

  • task_name: (str)

  • finished: (str)

  • cnt_num: (int)

  • poscar_dir: (Path)

  • models_dir: (Path)

  • caly_run_opt_file: (Path)

  • caly_check_opt_file: (Path)

Returns:
op : dict

Output dict with components:

  • task_name: (str)

  • optim_results_dir: (List[str])

  • traj_results: (Artifact(List[Path]))

  • caly_run_opt_file : (Path)

  • caly_check_opt_file : (Path)

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

dpgen2.op.run_caly_model_devi module
class dpgen2.op.run_caly_model_devi.RunCalyModelDevi(*args, **kwargs)[source]

Bases: OP

Calculate the model deviation of the structures along the trajectories.

Structure optimization will be executed in optim_path. The trajectories and the model deviations will be stored in op[“traj”] and op[“model_devi”], respectively.

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters:
ip : dict

Input dict with components:

  • type_map: (List[str]) The type map of elements.

  • task_name: (str) The name of the task.

  • traj_dirs: (Artifact(List[Path])) The list of paths that contain trajectory files.

  • models: (Artifact(List[Path])) The frozen models to estimate the model deviation.

Returns:
Any

Output dict with components:

  • task_name: (str) The name of the task.

  • traj: (Artifact(List[Path])) The output trajectories.

  • model_devi: (Artifact(List[Path])) The model deviations. The order of recorded model deviations should be consistent with the order of frames in traj.

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

dpgen2.op.run_caly_model_devi.atoms2lmpdump(atoms, struc_idx, type_map, ignore=False)[source]

A lower-triangular (LAMMPS-style) cell can be obtained from the cell parameters a, b, c, alpha, beta, gamma:

cell = cellpar_to_cell([a, b, c, alpha, beta, gamma])
lx, ly, lz = cell[0][0], cell[1][1], cell[2][2]
xy, xz, yz = cell[1][0], cell[2][0], cell[2][1]
(lx, ly, lz) = (xhi-xlo, yhi-ylo, zhi-zlo)
xlo_bound = xlo + MIN(0.0, xy, xz, xy+xz)
xhi_bound = xhi + MAX(0.0, xy, xz, xy+xz)
ylo_bound = ylo + MIN(0.0, yz)
yhi_bound = yhi + MAX(0.0, yz)
zlo_bound = zlo
zhi_bound = zhi

ref: https://docs.lammps.org/Howto_triclinic.html

dpgen2.op.run_caly_model_devi.parse_traj(traj_file)[source]
dpgen2.op.run_caly_model_devi.write_model_devi_out(devi: ndarray, fname: str | Path, header: str = '')[source]
dpgen2.op.run_dp_train module
class dpgen2.op.run_dp_train.RunDPTrain(*args, **kwargs)[source]

Bases: OP

Execute a DP training task. Train and freeze a DP model.

A working directory named task_name is created. All input files are copied or symbolically linked to the directory task_name. The DeePMD-kit training and freezing commands are executed from the directory task_name.

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

convert_to_graph

decide_init_model

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

normalize_config

register_output_artifact

skip_training

superfunction

training_args

write_data_to_input_script

write_other_to_input_script

static decide_init_model(config, init_model, init_data, iter_data, mixed_type=False)[source]
default_optional_parameter = {'finetune_mode': 'no', 'mixed_type': False}
execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters:
ip : dict

Input dict with components:

  • config: (dict) The config of training task. Check RunDPTrain.training_args for definitions.

  • task_name: (str) The name of training task.

  • task_path: (Artifact(Path)) The path that contains all input files prepared by PrepDPTrain.

  • init_model: (Artifact(Path)) A frozen model to initialize the training.

  • init_data: (Artifact(List[Path])) Initial training data.

  • iter_data: (Artifact(List[Path])) Training data generated in the DPGEN iterations.

Returns:
Any

Output dict with components:

  • script: (Artifact(Path)) The training script.

  • model: (Artifact(Path)) The trained frozen model.

  • lcurve: (Artifact(Path)) The learning curve file.

  • log: (Artifact(Path)) The log file of training.

Raises:
FatalError

On the failure of training or freezing. Human intervention needed.

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

static normalize_config(data={})[source]
static skip_training(work_dir, train_dict, init_model, iter_data, finetune_mode)[source]
static training_args()[source]
static write_data_to_input_script(idict: dict, init_data: List[Path], iter_data: List[Path], auto_prob_str: str = 'prob_sys_size', major_version: str = '1')[source]
static write_other_to_input_script(idict, config, do_init_model, major_version: str = '1')[source]
dpgen2.op.run_dp_train.config_args()
dpgen2.op.run_lmp module
class dpgen2.op.run_lmp.RunLmp(*args, **kwargs)[source]

Bases: OP

Execute a LAMMPS task.

A working directory named task_name is created. All input files are copied or symbolically linked to the directory task_name. The LAMMPS command is executed from the directory task_name. The trajectory and the model deviation will be stored in files op[“traj”] and op[“model_devi”], respectively.

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

lmp_args

normalize_config

register_output_artifact

superfunction

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters:
ip : dict

Input dict with components:

  • config: (dict) The config of lmp task. Check RunLmp.lmp_args for definitions.

  • task_name: (str) The name of the task.

  • task_path: (Artifact(Path)) The path that contains all input files prepared by PrepLmp.

  • models: (Artifact(List[Path])) The frozen models to estimate the model deviation. The first model will be used to drive the molecular dynamics simulation.

Returns:
Any

Output dict with components:

  • log: (Artifact(Path)) The log file of LAMMPS.

  • traj: (Artifact(Path)) The output trajectory.

  • model_devi: (Artifact(Path)) The model deviation. The order of recorded model deviations should be consistent with the order of frames in traj.

Raises:
TransientError

On the failure of LAMMPS execution. Different failure cases (e.g. lost atoms) may need to be handled differently.

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

static lmp_args()[source]
static normalize_config(data={})[source]
dpgen2.op.run_lmp.add_teacher_model(lmp_input_name: str)[source]
dpgen2.op.run_lmp.config_args()
dpgen2.op.run_lmp.find_only_one_key(lmp_lines, key)[source]
dpgen2.op.run_lmp.randomly_shuffle_models(lmp_input_name: str)[source]
dpgen2.op.select_confs module
class dpgen2.op.select_confs.SelectConfs(*args, **kwargs)[source]

Bases: OP

Select configurations from exploration trajectories for labeling.

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

validate_trajs

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters:
ip : dict

Input dict with components:

  • conf_selector: (ConfSelector) Configuration selector.

  • type_map: (List[str]) The type map.

  • trajs: (Artifact(List[Path])) The trajectories generated in the exploration.

  • model_devis: (Artifact(List[Path])) The files storing the model deviations of the trajectories. The order of the model deviation storage is consistent with that of the trajectories. The order of frames of one model deviation storage is also consistent with that of the corresponding trajectory.

Returns:
Any

Output dict with components:

  • report: (ExplorationReport) The report on the exploration.

  • conf: (Artifact(List[Path])) The selected configurations.

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

static validate_trajs(trajs, model_devis)[source]
dpgen2.superop package
Submodules
dpgen2.superop.block module
class dpgen2.superop.block.ConcurrentLearningBlock(name: str, prep_run_dp_train_op: PrepRunDPTrain, prep_run_explore_op: PrepRunLmp | PrepRunCaly, select_confs_op: Type[OP], prep_run_fp_op: PrepRunFp, collect_data_op: Type[OP], select_confs_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, collect_data_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: List[PathLike] | None = None)[source]

Bases: Steps

Attributes:
input_artifacts
input_parameters
keys
output_artifacts
output_parameters

Methods

add(step)

Add a step or a list of parallel steps to the steps

add_slices

convert_to_argo

convert_to_graph

copy

deepcopy

from_dict

from_graph

handle_key

run

property input_artifacts
property input_parameters
property keys
property output_artifacts
property output_parameters
dpgen2.superop.block.make_collect_data_optional_parameter(block_optional_parameter)[source]
dpgen2.superop.block.make_run_dp_train_optional_parameter(block_optional_parameter)[source]
dpgen2.superop.caly_evo_step module
class dpgen2.superop.caly_evo_step.CalyEvoStep(name: str, collect_run_caly: Type[OP], prep_run_dp_optim: Type[OP], prep_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: List[PathLike] | None = None)[source]

Bases: Steps

Attributes:
input_artifacts
input_parameters
keys
output_artifacts
output_parameters

Methods

add(step)

Add a step or a list of parallel steps to the steps

add_slices

convert_to_argo

convert_to_graph

copy

deepcopy

from_dict

from_graph

handle_key

run

property input_artifacts
property input_parameters
property keys
property output_artifacts
property output_parameters
dpgen2.superop.prep_run_calypso module
class dpgen2.superop.prep_run_calypso.PrepRunCaly(name: str, prep_caly_input_op: Type[OP], caly_evo_step_op: OPTemplate, run_caly_model_devi_op: Type[OP], prep_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: List[PathLike] | None = None)[source]

Bases: Steps

Attributes:
input_artifacts
input_parameters
keys
output_artifacts
output_parameters

Methods

add(step)

Add a step or a list of parallel steps to the steps

add_slices

convert_to_argo

convert_to_graph

copy

deepcopy

from_dict

from_graph

handle_key

run

property input_artifacts
property input_parameters
property keys
property output_artifacts
property output_parameters
dpgen2.superop.prep_run_dp_train module
class dpgen2.superop.prep_run_dp_train.ModifyTrainScript(*args, **kwargs)[source]

Bases: OP

Modify the training scripts to prepare them for training tasks in dpgen step.

Read the training scripts modified by finetune, and replace the original template scripts to be compatible with pre-trained models. New templates are returned as op[“template_script”].

Attributes:
key
workflow_name

Methods

execute(ip)

Execute the OP.

get_input_sign()

Get the signature of the inputs

get_output_sign()

Get the signature of the outputs

convert_to_graph

exec_sign_check

from_graph

function

get_info

get_input_artifact_link

get_input_artifact_storage_key

get_opio_info

get_output_artifact_link

get_output_artifact_storage_key

register_output_artifact

superfunction

execute(ip: OPIO) OPIO[source]

Execute the OP.

Parameters:
ip : dict

Input dict with components:

  • scripts: (Artifact(Path)) Training scripts from finetune.

  • numb_models: (int) Number of DP models to train.

Returns:
op : dict

Output dict with components:

  • template_script: (List[dict]) One template from one finetuning task. The length of the list should be the same as numb_models.

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

class dpgen2.superop.prep_run_dp_train.PrepRunDPTrain(name: str, prep_train_op: ~typing.Type[~dflow.python.op.OP], run_train_op: ~typing.Type[~dpgen2.op.run_dp_train.RunDPTrain], modify_train_script_op: ~typing.Type[~dpgen2.superop.prep_run_dp_train.ModifyTrainScript] = <class 'dpgen2.superop.prep_run_dp_train.ModifyTrainScript'>, prep_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: ~typing.List[~os.PathLike] | None = None, finetune: bool = False)[source]

Bases: Steps

Attributes:
input_artifacts
input_parameters
keys
output_artifacts
output_parameters

Methods

add(step)

Add a step or a list of parallel steps to the steps

add_slices

convert_to_argo

convert_to_graph

copy

deepcopy

from_dict

from_graph

handle_key

run

property input_artifacts
property input_parameters
property keys
property output_artifacts
property output_parameters
dpgen2.superop.prep_run_fp module
class dpgen2.superop.prep_run_fp.PrepRunFp(name: str, prep_op: Type[OP], run_op: Type[OP], prep_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: List[PathLike] | None = None)[source]

Bases: Steps

Attributes:
input_artifacts
input_parameters
keys
output_artifacts
output_parameters

Methods

add(step)

Add a step or a list of parallel steps to the steps

add_slices

convert_to_argo

convert_to_graph

copy

deepcopy

from_dict

from_graph

handle_key

run

property input_artifacts
property input_parameters
property keys
property output_artifacts
property output_parameters
dpgen2.superop.prep_run_lmp module
class dpgen2.superop.prep_run_lmp.PrepRunLmp(name: str, prep_op: Type[OP], run_op: Type[OP], prep_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, run_config: dict = {'continue_on_failed': False, 'continue_on_num_success': None, 'continue_on_success_ratio': None, 'executor': None, 'parallelism': None, 'template_config': {'envs': None, 'image': 'dptechnology/dpgen2:latest', 'retry_on_transient_error': None, 'timeout': None, 'timeout_as_transient_error': False}}, upload_python_packages: List[PathLike] | None = None)[source]

Bases: Steps

Attributes:
input_artifacts
input_parameters
keys
output_artifacts
output_parameters

Methods

add(step)

Add a step or a list of parallel steps to the steps

add_slices

convert_to_argo

convert_to_graph

copy

deepcopy

from_dict

from_graph

handle_key

run

property input_artifacts
property input_parameters
property keys
property output_artifacts
property output_parameters
dpgen2.utils package
Submodules
dpgen2.utils.binary_file_input module

Binary file inputs

class dpgen2.utils.binary_file_input.BinaryFileInput(path: str | Path, ext: str | None = None)[source]

Bases: object

Methods

save_as_file

save_as_file(path: str | Path) None[source]
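
A minimal sketch of wrapping a binary file so it can be passed around as a parameter (file names are placeholders):

>>> from dpgen2.utils.binary_file_input import BinaryFileInput
>>> model = BinaryFileInput("frozen_model.pb")
>>> model.save_as_file("teacher_model.pb")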
dpgen2.utils.bohrium_config module
dpgen2.utils.bohrium_config.bohrium_config_from_dict(bohrium_config)[source]
dpgen2.utils.chdir module
dpgen2.utils.chdir.chdir(path_key: str)[source]

Returns a decorator that can change the current working path.

Parameters:
path_key : str

key to OPIO

Examples

>>> class SomeOP(OP):
...     @chdir("path")
...     def execute(self, ip: OPIO):
...         do_something()
dpgen2.utils.chdir.set_directory(path: Path)[source]

Sets the current working path within the context.

Parameters:
path : Path

The path to the cwd

Yields:
None

Examples

>>> with set_directory("some_path"):
...    do_something()
dpgen2.utils.dflow_config module
dpgen2.utils.dflow_config.dflow_config(config_data)[source]

Set the dflow config from config_data.

Keys starting with “s3_” are treated as s3_config keys; other keys are treated as config keys.
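
For example (the host and endpoint values are placeholders):

>>> from dpgen2.utils.dflow_config import dflow_config
>>> dflow_config({
...     "host": "https://127.0.0.1:2746",   # goes to the dflow config
...     "s3_endpoint": "127.0.0.1:9000",    # "s3_" prefix: goes to the s3 config
... })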

dpgen2.utils.dflow_config.dflow_config_lower(dflow_config)[source]
dpgen2.utils.dflow_config.dflow_s3_config(config_data)[source]

Set the s3 config from config_data.

dpgen2.utils.dflow_config.dflow_s3_config_lower(dflow_s3_config_data)[source]
dpgen2.utils.dflow_config.workflow_config_from_dict(wf_config)[source]
dpgen2.utils.dflow_query module
dpgen2.utils.dflow_query.find_slice_ranges(keys: List[str], sliced_subkey: str)[source]

Find the ranges of sliced OPs whose keys match the pattern ‘iter-[0-9]*--{sliced_subkey}-[0-9]*’.

dpgen2.utils.dflow_query.get_all_schedulers(wf: Any, keys: List[str])[source]

Get the output Scheduler of all the iterations.

dpgen2.utils.dflow_query.get_iteration(key: str)[source]
dpgen2.utils.dflow_query.get_last_iteration(keys: List[str])[source]

Get the index of the last iteration from a list of step keys.

dpgen2.utils.dflow_query.get_last_scheduler(wf: Any, keys: List[str])[source]

Get the output Scheduler of the last successful iteration.

dpgen2.utils.dflow_query.get_subkey(key: str, idx: int = -1)[source]
dpgen2.utils.dflow_query.matched_step_key(all_keys: List[str], step_keys: List[str] | None = None)[source]

Return the keys in all_keys that match any of the step_keys.

dpgen2.utils.dflow_query.print_keys_in_nice_format(keys: List[str], sliced_subkey: List[str], idx_fmt_len: int = 8)[source]
dpgen2.utils.dflow_query.sort_slice_ops(keys: List[str], sliced_subkey: List[str])[source]

Sort the keys of the sliced OPs. The keys of the sliced OPs contain sliced_subkey.
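
As an illustration of how these key helpers fit together (the return values in the comments are assumptions based on the step-key format used elsewhere in dpgen2):

>>> from dpgen2.utils.dflow_query import get_iteration, get_subkey, matched_step_key
>>> keys = ["iter-000000--prep-train", "iter-000001--run-train-0000"]
>>> it = get_iteration(keys[1])                      # e.g. "000001" (assumed)
>>> sub = get_subkey(keys[1])                        # e.g. "run-train-0000" (assumed)
>>> run_train_keys = matched_step_key(keys, ["run-train"])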

dpgen2.utils.download_dpgen2_artifacts module
class dpgen2.utils.download_dpgen2_artifacts.DownloadDefinition[source]

Bases: object

Methods

add_def

add_input

add_output

add_def(tdict, key, suffix=None)[source]
add_input(input_key, suffix=None)[source]
add_output(output_key, suffix=None)[source]
dpgen2.utils.download_dpgen2_artifacts.download_dpgen2_artifacts(wf: Workflow, key: str, prefix: str | None = None, chk_pnt: bool = False)[source]

Download the artifacts of a step. The key should be of the format ‘iter-xxxxxx--subkey-of-step-xxxxxx’. The input and output artifacts will be downloaded to prefix/iter-xxxxxx/key-of-step/inputs/ and prefix/iter-xxxxxx/key-of-step/outputs/, respectively.

the downloaded input and output artifacts of steps are defined by op_download_setting

dpgen2.utils.download_dpgen2_artifacts.download_dpgen2_artifacts_by_def(wf: Workflow, iterations: List[int] | None = None, step_defs: List[str] | None = None, prefix: str | None = None, chk_pnt: bool = False)[source]
dpgen2.utils.download_dpgen2_artifacts.print_op_download_setting(op_download_setting={'collect-data': <dpgen2.utils.download_dpgen2_artifacts.DownloadDefinition object>, 'prep-run-fp': <dpgen2.utils.download_dpgen2_artifacts.DownloadDefinition object>, 'prep-run-lmp': <dpgen2.utils.download_dpgen2_artifacts.DownloadDefinition object>, 'prep-run-train': <dpgen2.utils.download_dpgen2_artifacts.DownloadDefinition object>})[source]
dpgen2.utils.obj_artifact module
dpgen2.utils.obj_artifact.dump_object_to_file(obj, fname)[source]

Pickle-dump an object to a file.

dpgen2.utils.obj_artifact.load_object_from_file(fname)[source]

Pickle-load an object from a file.
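
For example (the file name is a placeholder):

>>> from dpgen2.utils.obj_artifact import dump_object_to_file, load_object_from_file
>>> dump_object_to_file({"iteration": 0}, "state.pkl")
>>> state = load_object_from_file("state.pkl")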

dpgen2.utils.run_command module
dpgen2.utils.run_command.run_command(cmd: str | List[str], shell: bool = False) Tuple[int, str, str][source]
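
For example:

>>> from dpgen2.utils.run_command import run_command
>>> ret, out, err = run_command(["echo", "hello"])   # ret is the return code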
dpgen2.utils.step_config module
dpgen2.utils.step_config.dispatcher_args()[source]

free style dispatcher args

dpgen2.utils.step_config.gen_doc(*, make_anchor=True, make_link=True, **kwargs)[source]
dpgen2.utils.step_config.init_executor(executor_dict)[source]
dpgen2.utils.step_config.normalize(data)[source]
dpgen2.utils.step_config.step_conf_args()[source]
dpgen2.utils.step_config.template_conf_args()[source]
dpgen2.utils.step_config.template_slice_conf_args()[source]
dpgen2.utils.step_config.variant_executor()[source]

Submodules

dpgen2.constants module