dpgen run machine parameters#
Note
One can load, modify, and export the input file by using our effective web-based tool DP-GUI online or hosted using the command line interface dpgen gui. All parameters below can be set in DP-GUI. By clicking “SAVE JSON”, one can download the input file.
- run_mdata:#
- type:
dict
argument path:run_mdata
machine.json file
- api_version:#
- type:
str
, optional, default:1.0
argument path:run_mdata/api_version
Please set to 1.0
- deepmd_version:#
- type:
str
, optional, default:2
argument path:run_mdata/deepmd_version
DeePMD-kit version, e.g. 2.1.3
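For orientation, a minimal machine.json skeleton assembled from the keys documented below might look like the following sketch; the command strings (dp, lmp, mpirun -n 32 vasp_std) and all other values are illustrative placeholders, not required values.

```json
{
    "api_version": "1.0",
    "deepmd_version": "2.1.3",
    "train": {
        "command": "dp",
        "machine": {"...": "see run_mdata/train/machine"},
        "resources": {"...": "see run_mdata/train/resources"}
    },
    "model_devi": {
        "command": "lmp",
        "machine": {"...": "..."},
        "resources": {"...": "..."}
    },
    "fp": {
        "command": "mpirun -n 32 vasp_std",
        "machine": {"...": "..."},
        "resources": {"...": "..."}
    }
}
```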
- train:#
- type:
dict
argument path:run_mdata/train
Parameters of command, machine, and resources for train
- command:#
- type:
str
argument path:run_mdata/train/command
Command of a program.
- machine:#
- type:
dict
argument path:run_mdata/train/machine
- batch_type:#
- type:
str
argument path:run_mdata/train/machine/batch_type
The batch job system type. Option: Fugaku, LSF, SlurmJobArray, JH_UniScheduler, Shell, Torque, PBS, SGE, Slurm, DistributedShell, OpenAPI, Bohrium
- local_root:#
- type:
str
|NoneType
argument path:run_mdata/train/machine/local_root
The directory where the tasks and related files are located. Typically the project directory.
- remote_root:#
- type:
str
|NoneType
, optional
argument path:run_mdata/train/machine/remote_root
The dir where the tasks are executed on the remote machine. Only needed when context is not lazy-local.
- clean_asynchronously:#
- type:
bool
, optional, default:False
argument path:run_mdata/train/machine/clean_asynchronously
Clean the remote directory asynchronously after the job finishes.
Depending on the value of context_type, different sub args are accepted.
- context_type:#
- type:
str
(flag key)
argument path:run_mdata/train/machine/context_type
possible choices: LocalContext, BohriumContext, HDFSContext, LazyLocalContext, OpenAPIContext, SSHContext
The connection used to the remote machine. Option: LocalContext, SSHContext, LazyLocalContext, BohriumContext, OpenAPIContext, HDFSContext
When context_type is set to LocalContext (or its aliases localcontext, Local, local):
- remote_profile:#
- type:
dict
, optional
argument path:run_mdata/train/machine[LocalContext]/remote_profile
The information used to maintain the local machine.
- symlink:#
- type:
bool
, optional, default:True
argument path:run_mdata/train/machine[LocalContext]/remote_profile/symlink
Whether to use symbolic links to replace copy. This option should be turned off if the local directory is not accessible on the Batch system.
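As a sketch only, a machine dict that keeps everything on the local workstation with LocalContext could look like this; the Shell batch type and both paths are placeholders to adapt.

```json
{
    "batch_type": "Shell",
    "context_type": "LocalContext",
    "local_root": "./",
    "remote_root": "/tmp/dpgen_workdir",
    "remote_profile": {
        "symlink": true
    }
}
```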
When context_type is set to BohriumContext (or its aliases bohriumcontext, Bohrium, bohrium, DpCloudServerContext, dpcloudservercontext, DpCloudServer, dpcloudserver, LebesgueContext, lebesguecontext, Lebesgue, lebesgue):
- remote_profile:#
- type:
dict
argument path:run_mdata/train/machine[BohriumContext]/remote_profile
The information used to maintain the connection with remote machine.
- email:#
- type:
str
, optional
argument path:run_mdata/train/machine[BohriumContext]/remote_profile/email
Email
- password:#
- type:
str
, optional
argument path:run_mdata/train/machine[BohriumContext]/remote_profile/password
Password
- program_id:#
- type:
int
, alias: project_id
argument path:run_mdata/train/machine[BohriumContext]/remote_profile/program_id
Program ID
- retry_count:#
- type:
NoneType
|int
, optional, default:2
argument path:run_mdata/train/machine[BohriumContext]/remote_profile/retry_count
The retry count when a job is terminated
- ignore_exit_code:#
- type:
bool
, optional, default:True
argument path:run_mdata/train/machine[BohriumContext]/remote_profile/ignore_exit_code
The job state will be marked as finished if the exit code is non-zero when set to True. Otherwise, the job state will be designated as terminated.
- keep_backup:#
- type:
bool
, optional
argument path:run_mdata/train/machine[BohriumContext]/remote_profile/keep_backup
keep download and upload zip
- input_data:#
- type:
dict
argument path:run_mdata/train/machine[BohriumContext]/remote_profile/input_data
Configuration of job
When context_type is set to HDFSContext (or its aliases hdfscontext, HDFS, hdfs):
- remote_profile:#
- type:
dict
, optional
argument path:run_mdata/train/machine[HDFSContext]/remote_profile
The information used to maintain the connection with remote machine. This field is empty for this context.
When context_type is set to LazyLocalContext (or its aliases lazylocalcontext, LazyLocal, lazylocal):
- remote_profile:#
- type:
dict
, optional
argument path:run_mdata/train/machine[LazyLocalContext]/remote_profile
The information used to maintain the connection with remote machine. This field is empty for this context.
When context_type is set to OpenAPIContext (or its aliases openapicontext, OpenAPI, openapi):
- remote_profile:#
- type:
dict
, optional
argument path:run_mdata/train/machine[OpenAPIContext]/remote_profile
The information used to maintain the connection with remote machine. This field is empty for this context.
When context_type is set to SSHContext (or its aliases sshcontext, SSH, ssh):
- remote_profile:#
- type:
dict
argument path:run_mdata/train/machine[SSHContext]/remote_profile
The information used to maintain the connection with remote machine.
- hostname:#
- type:
str
argument path:run_mdata/train/machine[SSHContext]/remote_profile/hostname
hostname or ip of ssh connection.
- username:#
- type:
str
argument path:run_mdata/train/machine[SSHContext]/remote_profile/username
username of target linux system
- password:#
- type:
str
, optional
argument path:run_mdata/train/machine[SSHContext]/remote_profile/password
(deprecated) password of linux system. Please use SSH keys instead to improve security.
- port:#
- type:
int
, optional, default:22
argument path:run_mdata/train/machine[SSHContext]/remote_profile/port
ssh connection port.
- key_filename:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/train/machine[SSHContext]/remote_profile/key_filename
key filename used by ssh connection. If left None, find key in ~/.ssh or use password for login
- passphrase:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/train/machine[SSHContext]/remote_profile/passphrase
passphrase of key used by ssh connection
- timeout:#
- type:
int
, optional, default:10
argument path:run_mdata/train/machine[SSHContext]/remote_profile/timeout
timeout of ssh connection
- totp_secret:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/train/machine[SSHContext]/remote_profile/totp_secret
Time-based one time password secret. It should be a base32-encoded string extracted from the 2D code.
- tar_compress:#
- type:
bool
, optional, default:True
argument path:run_mdata/train/machine[SSHContext]/remote_profile/tar_compress
The archive will be compressed in upload and download if it is True. If not, compression will be skipped.
- look_for_keys:#
- type:
bool
, optional, default:True
argument path:run_mdata/train/machine[SSHContext]/remote_profile/look_for_keys
enable searching for discoverable private key files in ~/.ssh/
- execute_command:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/train/machine[SSHContext]/remote_profile/execute_command
execute command after ssh connection is established.
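Putting the SSHContext fields together, a hedged example of a machine dict that submits to a remote Slurm cluster over SSH might read as follows; the hostname, username, paths, and key file are placeholders.

```json
{
    "batch_type": "Slurm",
    "context_type": "SSHContext",
    "local_root": "./",
    "remote_root": "/home/user/dpgen_workdir",
    "remote_profile": {
        "hostname": "login.hpc.example.org",
        "username": "user",
        "port": 22,
        "key_filename": "/home/user/.ssh/id_rsa",
        "timeout": 10
    }
}
```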
- resources:#
- type:
dict
argument path:run_mdata/train/resources
- number_node:#
- type:
int
, optional, default:1
argument path:run_mdata/train/resources/number_node
The number of nodes required for each job.
- cpu_per_node:#
- type:
int
, optional, default:1
argument path:run_mdata/train/resources/cpu_per_node
CPU numbers of each node assigned to each job.
- gpu_per_node:#
- type:
int
, optional, default:0
argument path:run_mdata/train/resources/gpu_per_node
GPU numbers of each node assigned to each job.
- queue_name:#
- type:
str
, optional, default: (empty string)
argument path:run_mdata/train/resources/queue_name
The queue name of batch job scheduler system.
- group_size:#
- type:
int
argument path:run_mdata/train/resources/group_size
The number of tasks in a job. 0 means infinity.
- custom_flags:#
- type:
typing.List[str]
, optional
argument path:run_mdata/train/resources/custom_flags
The extra lines passed to the job submitting script header.
- strategy:#
- type:
dict
, optional
argument path:run_mdata/train/resources/strategy
Strategies used to generate job submitting scripts.
- if_cuda_multi_devices:#
- type:
bool
, optional, default:False
argument path:run_mdata/train/resources/strategy/if_cuda_multi_devices
If there are multiple NVIDIA GPUs on the node and we want to assign tasks to different GPUs. If true, dpdispatcher will manually export the environment variable CUDA_VISIBLE_DEVICES to a different value for each task. Usually, this option is used together with the Task.task_need_resources variable.
- ratio_unfinished:#
- type:
float
, optional, default:0.0
argument path:run_mdata/train/resources/strategy/ratio_unfinished
The ratio of tasks that can be unfinished.
- customized_script_header_template_file:#
- type:
str
, optional
argument path:run_mdata/train/resources/strategy/customized_script_header_template_file
The customized template file to generate job submitting script header, which overrides the default file.
- para_deg:#
- type:
int
, optional, default:1
argument path:run_mdata/train/resources/para_deg
Decide how many tasks will be run in parallel.
- source_list:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/train/resources/source_list
The env file to be sourced before the command execution.
- module_purge:#
- type:
bool
, optional, default:False
argument path:run_mdata/train/resources/module_purge
Remove all modules on HPC system before module load (module_list)
- module_unload_list:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/train/resources/module_unload_list
The modules to be unloaded on HPC system before submitting jobs
- module_list:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/train/resources/module_list
The modules to be loaded on HPC system before submitting jobs
- envs:#
- type:
dict
, optional, default:{}
argument path:run_mdata/train/resources/envs
The environment variables to be exported before submitting jobs
- prepend_script:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/train/resources/prepend_script
Optional script run before jobs submitted.
- append_script:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/train/resources/append_script
Optional script run after jobs submitted.
- wait_time:#
- type:
float
|int
, optional, default:0
argument path:run_mdata/train/resources/wait_time
The waiting time in seconds after a single task is submitted.
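As an illustration only, a resources dict combining the general fields above could be written as follows; the queue name, module and source files, and the Slurm-style custom flag are placeholders for your own cluster.

```json
{
    "number_node": 1,
    "cpu_per_node": 4,
    "gpu_per_node": 1,
    "queue_name": "gpu",
    "group_size": 1,
    "module_list": ["cuda/11.6"],
    "source_list": ["/opt/deepmd-kit/env.sh"],
    "envs": {"OMP_NUM_THREADS": "1"},
    "custom_flags": ["#SBATCH --mem=32G"]
}
```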
Depending on the value of batch_type, different sub args are accepted.
- batch_type:#
- type:
str
(flag key)
argument path:run_mdata/train/resources/batch_type
possible choices: Bohrium, Torque, Shell, OpenAPI, JH_UniScheduler, Slurm, SGE, DistributedShell, LSF, PBS, SlurmJobArray, Fugaku
The batch job system type loaded from machine/batch_type.
When batch_type is set to Bohrium (or its aliases bohrium, Lebesgue, lebesgue, DpCloudServer, dpcloudserver):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/train/resources[Bohrium]/kwargs
This field is empty for this batch.
When batch_type is set to Torque (or its alias torque):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/train/resources[Torque]/kwargs
This field is empty for this batch.
When batch_type is set to Shell (or its alias shell):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/train/resources[Shell]/kwargs
This field is empty for this batch.
When batch_type is set to OpenAPI (or its alias openapi):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/train/resources[OpenAPI]/kwargs
This field is empty for this batch.
When batch_type is set to JH_UniScheduler (or its alias jh_unischeduler):
- kwargs:#
- type:
dict
argument path:run_mdata/train/resources[JH_UniScheduler]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/train/resources[JH_UniScheduler]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #JSUB
When batch_type is set to Slurm (or its alias slurm):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/train/resources[Slurm]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/train/resources[Slurm]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #SBATCH
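For example, on a Slurm cluster the generated GPU request could be overridden through kwargs as sketched below; the --gres value is only an illustration and depends on how the cluster names its GPUs.

```json
{
    "kwargs": {
        "custom_gpu_line": "#SBATCH --gres=gpu:v100:1"
    }
}
```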
When batch_type is set to SGE (or its alias sge):
- kwargs:#
- type:
dict
argument path:run_mdata/train/resources[SGE]/kwargs
Extra arguments.
- pe_name:#
- type:
str
, optional, default:mpi
, alias: sge_pe_name
argument path:run_mdata/train/resources[SGE]/kwargs/pe_name
The parallel environment name of SGE system.
- job_name:#
- type:
str
, optional, default:wDPjob
argument path:run_mdata/train/resources[SGE]/kwargs/job_name
The name of SGE’s job.
When batch_type is set to DistributedShell (or its alias distributedshell):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/train/resources[DistributedShell]/kwargs
This field is empty for this batch.
When batch_type is set to LSF (or its alias lsf):
- kwargs:#
- type:
dict
argument path:run_mdata/train/resources[LSF]/kwargs
Extra arguments.
- gpu_usage:#
- type:
bool
, optional, default:False
argument path:run_mdata/train/resources[LSF]/kwargs/gpu_usage
Choosing if GPU is used in the calculation step.
- gpu_new_syntax:#
- type:
bool
, optional, default:False
argument path:run_mdata/train/resources[LSF]/kwargs/gpu_new_syntax
For LSF >= 10.1.0.3, the new #BSUB option -gpu can be used. If False, the old syntax will be used.
- gpu_exclusive:#
- type:
bool
, optional, default:True
argument path:run_mdata/train/resources[LSF]/kwargs/gpu_exclusive
Only takes effect when the new syntax is enabled. Controls whether tasks are submitted with exclusive GPU usage.
- custom_gpu_line:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/train/resources[LSF]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #BSUB
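A hedged LSF example: requesting one GPU per job with the new -gpu syntax could look like the snippet below; a custom_gpu_line, if given, supplies the #BSUB GPU directive directly, and the exact mode string depends on the cluster.

```json
{
    "kwargs": {
        "gpu_usage": true,
        "gpu_new_syntax": true,
        "gpu_exclusive": true,
        "custom_gpu_line": "#BSUB -gpu \"num=1:mode=exclusive_process\""
    }
}
```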
When batch_type is set to PBS (or its alias pbs):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/train/resources[PBS]/kwargs
This field is empty for this batch.
When batch_type is set to SlurmJobArray (or its alias slurmjobarray):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/train/resources[SlurmJobArray]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/train/resources[SlurmJobArray]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #SBATCH
- slurm_job_size:#
- type:
int
, optional, default:1
argument path:run_mdata/train/resources[SlurmJobArray]/kwargs/slurm_job_size
Number of tasks in a Slurm job
When batch_type is set to Fugaku (or its alias fugaku):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/train/resources[Fugaku]/kwargs
This field is empty for this batch.
- user_forward_files:#
- type:
list
, optional
argument path:run_mdata/train/user_forward_files
Files to be forwarded to the remote machine.
- user_backward_files:#
- type:
list
, optional
argument path:run_mdata/train/user_backward_files
Files to be transferred back from the remote machine.
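Assembling the pieces, a complete train block for a remote Slurm cluster reached over SSH could look roughly like this sketch; every concrete value is a placeholder.

```json
{
    "train": {
        "command": "dp",
        "machine": {
            "batch_type": "Slurm",
            "context_type": "SSHContext",
            "local_root": "./",
            "remote_root": "/home/user/dpgen_workdir",
            "remote_profile": {
                "hostname": "login.hpc.example.org",
                "username": "user",
                "key_filename": "/home/user/.ssh/id_rsa"
            }
        },
        "resources": {
            "number_node": 1,
            "cpu_per_node": 4,
            "gpu_per_node": 1,
            "queue_name": "gpu",
            "group_size": 1,
            "module_list": ["cuda/11.6"],
            "source_list": ["/opt/deepmd-kit/env.sh"]
        },
        "user_forward_files": [],
        "user_backward_files": []
    }
}
```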
- model_devi:#
- type:
dict
argument path:run_mdata/model_devi
Parameters of command, machine, and resources for model_devi
- command:#
- type:
str
argument path:run_mdata/model_devi/command
Command of a program.
- machine:#
- type:
dict
argument path:run_mdata/model_devi/machine
- batch_type:#
- type:
str
argument path:run_mdata/model_devi/machine/batch_type
The batch job system type. Option: Fugaku, LSF, SlurmJobArray, JH_UniScheduler, Shell, Torque, PBS, SGE, Slurm, DistributedShell, OpenAPI, Bohrium
- local_root:#
- type:
str
|NoneType
argument path:run_mdata/model_devi/machine/local_root
The directory where the tasks and related files are located. Typically the project directory.
- remote_root:#
- type:
str
|NoneType
, optional
argument path:run_mdata/model_devi/machine/remote_root
The dir where the tasks are executed on the remote machine. Only needed when context is not lazy-local.
- clean_asynchronously:#
- type:
bool
, optional, default:False
argument path:run_mdata/model_devi/machine/clean_asynchronously
Clean the remote directory asynchronously after the job finishes.
Depending on the value of context_type, different sub args are accepted.
- context_type:#
- type:
str
(flag key)
argument path:run_mdata/model_devi/machine/context_type
possible choices: LocalContext, BohriumContext, HDFSContext, LazyLocalContext, OpenAPIContext, SSHContext
The connection used to the remote machine. Option: LocalContext, SSHContext, LazyLocalContext, BohriumContext, OpenAPIContext, HDFSContext
When context_type is set to LocalContext (or its aliases localcontext, Local, local):
- remote_profile:#
- type:
dict
, optional
argument path:run_mdata/model_devi/machine[LocalContext]/remote_profile
The information used to maintain the local machine.
- symlink:#
- type:
bool
, optional, default:True
argument path:run_mdata/model_devi/machine[LocalContext]/remote_profile/symlink
Whether to use symbolic links to replace copy. This option should be turned off if the local directory is not accessible on the Batch system.
When context_type is set to BohriumContext (or its aliases bohriumcontext, Bohrium, bohrium, DpCloudServerContext, dpcloudservercontext, DpCloudServer, dpcloudserver, LebesgueContext, lebesguecontext, Lebesgue, lebesgue):
- remote_profile:#
- type:
dict
argument path:run_mdata/model_devi/machine[BohriumContext]/remote_profile
The information used to maintain the connection with remote machine.
- email:#
- type:
str
, optional
argument path:run_mdata/model_devi/machine[BohriumContext]/remote_profile/email
Email
- password:#
- type:
str
, optional
argument path:run_mdata/model_devi/machine[BohriumContext]/remote_profile/password
Password
- program_id:#
- type:
int
, alias: project_id
argument path:run_mdata/model_devi/machine[BohriumContext]/remote_profile/program_id
Program ID
- retry_count:#
- type:
NoneType
|int
, optional, default:2
argument path:run_mdata/model_devi/machine[BohriumContext]/remote_profile/retry_count
The retry count when a job is terminated
- ignore_exit_code:#
- type:
bool
, optional, default:True
argument path:run_mdata/model_devi/machine[BohriumContext]/remote_profile/ignore_exit_code
The job state will be marked as finished if the exit code is non-zero when set to True. Otherwise, the job state will be designated as terminated.
- keep_backup:#
- type:
bool
, optional
argument path:run_mdata/model_devi/machine[BohriumContext]/remote_profile/keep_backup
keep download and upload zip
- input_data:#
- type:
dict
argument path:run_mdata/model_devi/machine[BohriumContext]/remote_profile/input_data
Configuration of job
When context_type is set to HDFSContext (or its aliases hdfscontext, HDFS, hdfs):
- remote_profile:#
- type:
dict
, optional
argument path:run_mdata/model_devi/machine[HDFSContext]/remote_profile
The information used to maintain the connection with remote machine. This field is empty for this context.
When context_type is set to LazyLocalContext (or its aliases lazylocalcontext, LazyLocal, lazylocal):
- remote_profile:#
- type:
dict
, optional
argument path:run_mdata/model_devi/machine[LazyLocalContext]/remote_profile
The information used to maintain the connection with remote machine. This field is empty for this context.
When context_type is set to OpenAPIContext (or its aliases openapicontext, OpenAPI, openapi):
- remote_profile:#
- type:
dict
, optional
argument path:run_mdata/model_devi/machine[OpenAPIContext]/remote_profile
The information used to maintain the connection with remote machine. This field is empty for this context.
When context_type is set to SSHContext (or its aliases sshcontext, SSH, ssh):
- remote_profile:#
- type:
dict
argument path:run_mdata/model_devi/machine[SSHContext]/remote_profile
The information used to maintain the connection with remote machine.
- hostname:#
- type:
str
argument path:run_mdata/model_devi/machine[SSHContext]/remote_profile/hostname
hostname or ip of ssh connection.
- username:#
- type:
str
argument path:run_mdata/model_devi/machine[SSHContext]/remote_profile/username
username of target linux system
- password:#
- type:
str
, optional
argument path:run_mdata/model_devi/machine[SSHContext]/remote_profile/password
(deprecated) password of linux system. Please use SSH keys instead to improve security.
- port:#
- type:
int
, optional, default:22
argument path:run_mdata/model_devi/machine[SSHContext]/remote_profile/port
ssh connection port.
- key_filename:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/model_devi/machine[SSHContext]/remote_profile/key_filename
key filename used by ssh connection. If left None, find key in ~/.ssh or use password for login
- passphrase:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/model_devi/machine[SSHContext]/remote_profile/passphrase
passphrase of key used by ssh connection
- timeout:#
- type:
int
, optional, default:10
argument path:run_mdata/model_devi/machine[SSHContext]/remote_profile/timeout
timeout of ssh connection
- totp_secret:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/model_devi/machine[SSHContext]/remote_profile/totp_secret
Time-based one time password secret. It should be a base32-encoded string extracted from the 2D code.
- tar_compress:#
- type:
bool
, optional, default:True
argument path:run_mdata/model_devi/machine[SSHContext]/remote_profile/tar_compress
The archive will be compressed in upload and download if it is True. If not, compression will be skipped.
- look_for_keys:#
- type:
bool
, optional, default:True
argument path:run_mdata/model_devi/machine[SSHContext]/remote_profile/look_for_keys
enable searching for discoverable private key files in ~/.ssh/
- execute_command:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/model_devi/machine[SSHContext]/remote_profile/execute_command
execute command after ssh connection is established.
- resources:#
- type:
dict
argument path:run_mdata/model_devi/resources
- number_node:#
- type:
int
, optional, default:1
argument path:run_mdata/model_devi/resources/number_node
The number of nodes required for each job.
- cpu_per_node:#
- type:
int
, optional, default:1
argument path:run_mdata/model_devi/resources/cpu_per_node
CPU numbers of each node assigned to each job.
- gpu_per_node:#
- type:
int
, optional, default:0
argument path:run_mdata/model_devi/resources/gpu_per_node
GPU numbers of each node assigned to each job.
- queue_name:#
- type:
str
, optional, default: (empty string)
argument path:run_mdata/model_devi/resources/queue_name
The queue name of batch job scheduler system.
- group_size:#
- type:
int
argument path:run_mdata/model_devi/resources/group_size
The number of tasks in a job. 0 means infinity.
- custom_flags:#
- type:
typing.List[str]
, optional
argument path:run_mdata/model_devi/resources/custom_flags
The extra lines passed to the job submitting script header.
- strategy:#
- type:
dict
, optional
argument path:run_mdata/model_devi/resources/strategy
Strategies used to generate job submitting scripts.
- if_cuda_multi_devices:#
- type:
bool
, optional, default:False
argument path:run_mdata/model_devi/resources/strategy/if_cuda_multi_devices
If there are multiple NVIDIA GPUs on the node and we want to assign tasks to different GPUs. If true, dpdispatcher will manually export the environment variable CUDA_VISIBLE_DEVICES to a different value for each task. Usually, this option is used together with the Task.task_need_resources variable.
- ratio_unfinished:#
- type:
float
, optional, default:0.0
argument path:run_mdata/model_devi/resources/strategy/ratio_unfinished
The ratio of tasks that can be unfinished.
- customized_script_header_template_file:#
- type:
str
, optional
argument path:run_mdata/model_devi/resources/strategy/customized_script_header_template_file
The customized template file to generate job submitting script header, which overrides the default file.
- para_deg:#
- type:
int
, optional, default:1
argument path:run_mdata/model_devi/resources/para_deg
Decide how many tasks will be run in parallel.
- source_list:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/model_devi/resources/source_list
The env file to be sourced before the command execution.
- module_purge:#
- type:
bool
, optional, default:False
argument path:run_mdata/model_devi/resources/module_purge
Remove all modules on HPC system before module load (module_list)
- module_unload_list:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/model_devi/resources/module_unload_list
The modules to be unloaded on HPC system before submitting jobs
- module_list:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/model_devi/resources/module_list
The modules to be loaded on HPC system before submitting jobs
- envs:#
- type:
dict
, optional, default:{}
argument path:run_mdata/model_devi/resources/envs
The environment variables to be exported before submitting jobs
- prepend_script:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/model_devi/resources/prepend_script
Optional script run before jobs submitted.
- append_script:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/model_devi/resources/append_script
Optional script run after jobs submitted.
- wait_time:#
- type:
float
|int
, optional, default:0
argument path:run_mdata/model_devi/resources/wait_time
The waiting time in seconds after a single task is submitted.
Depending on the value of batch_type, different sub args are accepted.
- batch_type:#
- type:
str
(flag key)
argument path:run_mdata/model_devi/resources/batch_type
possible choices: Bohrium, Torque, Shell, OpenAPI, JH_UniScheduler, Slurm, SGE, DistributedShell, LSF, PBS, SlurmJobArray, Fugaku
The batch job system type loaded from machine/batch_type.
When batch_type is set to Bohrium (or its aliases bohrium, Lebesgue, lebesgue, DpCloudServer, dpcloudserver):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/model_devi/resources[Bohrium]/kwargs
This field is empty for this batch.
When batch_type is set to Torque (or its alias torque):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/model_devi/resources[Torque]/kwargs
This field is empty for this batch.
When batch_type is set to Shell (or its alias shell):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/model_devi/resources[Shell]/kwargs
This field is empty for this batch.
When batch_type is set to OpenAPI (or its alias openapi):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/model_devi/resources[OpenAPI]/kwargs
This field is empty for this batch.
When batch_type is set to JH_UniScheduler (or its alias jh_unischeduler):
- kwargs:#
- type:
dict
argument path:run_mdata/model_devi/resources[JH_UniScheduler]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/model_devi/resources[JH_UniScheduler]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #JSUB
When batch_type is set to Slurm (or its alias slurm):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/model_devi/resources[Slurm]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/model_devi/resources[Slurm]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #SBATCH
When batch_type is set to SGE (or its alias sge):
- kwargs:#
- type:
dict
argument path:run_mdata/model_devi/resources[SGE]/kwargs
Extra arguments.
- pe_name:#
- type:
str
, optional, default:mpi
, alias: sge_pe_name
argument path:run_mdata/model_devi/resources[SGE]/kwargs/pe_name
The parallel environment name of SGE system.
- job_name:#
- type:
str
, optional, default:wDPjob
argument path:run_mdata/model_devi/resources[SGE]/kwargs/job_name
The name of SGE’s job.
When batch_type is set to DistributedShell (or its alias distributedshell):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/model_devi/resources[DistributedShell]/kwargs
This field is empty for this batch.
When batch_type is set to LSF (or its alias lsf):
- kwargs:#
- type:
dict
argument path:run_mdata/model_devi/resources[LSF]/kwargs
Extra arguments.
- gpu_usage:#
- type:
bool
, optional, default:False
argument path:run_mdata/model_devi/resources[LSF]/kwargs/gpu_usage
Choosing if GPU is used in the calculation step.
- gpu_new_syntax:#
- type:
bool
, optional, default:False
argument path:run_mdata/model_devi/resources[LSF]/kwargs/gpu_new_syntax
For LSF >= 10.1.0.3, the new #BSUB option -gpu can be used. If False, the old syntax will be used.
- gpu_exclusive:#
- type:
bool
, optional, default:True
argument path:run_mdata/model_devi/resources[LSF]/kwargs/gpu_exclusive
Only takes effect when the new syntax is enabled. Controls whether tasks are submitted with exclusive GPU usage.
- custom_gpu_line:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/model_devi/resources[LSF]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #BSUB
When batch_type is set to PBS (or its alias pbs):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/model_devi/resources[PBS]/kwargs
This field is empty for this batch.
When batch_type is set to SlurmJobArray (or its alias slurmjobarray):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/model_devi/resources[SlurmJobArray]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/model_devi/resources[SlurmJobArray]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #SBATCH
- slurm_job_size:#
- type:
int
, optional, default:1
argument path:run_mdata/model_devi/resources[SlurmJobArray]/kwargs/slurm_job_size
Number of tasks in a Slurm job
When batch_type is set to Fugaku (or its alias fugaku):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/model_devi/resources[Fugaku]/kwargs
This field is empty for this batch.
- user_forward_files:#
- type:
list
, optional
argument path:run_mdata/model_devi/user_forward_files
Files to be forwarded to the remote machine.
- user_backward_files:#
- type:
list
, optional
argument path:run_mdata/model_devi/user_backward_files
Files to be transferred back from the remote machine.
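The model_devi block follows exactly the same schema as train. A sketch with a LAMMPS-style command and a larger group_size (to bundle many short MD tasks into one job) might look like this; every concrete value is a placeholder.

```json
{
    "model_devi": {
        "command": "lmp",
        "machine": {
            "batch_type": "Slurm",
            "context_type": "SSHContext",
            "local_root": "./",
            "remote_root": "/home/user/dpgen_workdir",
            "remote_profile": {"hostname": "login.hpc.example.org", "username": "user"}
        },
        "resources": {
            "number_node": 1,
            "cpu_per_node": 4,
            "gpu_per_node": 1,
            "queue_name": "gpu",
            "group_size": 10
        }
    }
}
```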
- fp:#
- type:
dict
argument path:run_mdata/fp
Parameters of command, machine, and resources for fp
- command:#
- type:
str
argument path:run_mdata/fp/command
Command of a program.
- machine:#
- type:
dict
argument path:run_mdata/fp/machine
- batch_type:#
- type:
str
argument path:run_mdata/fp/machine/batch_type
The batch job system type. Option: Fugaku, LSF, SlurmJobArray, JH_UniScheduler, Shell, Torque, PBS, SGE, Slurm, DistributedShell, OpenAPI, Bohrium
- local_root:#
- type:
str
|NoneType
argument path:run_mdata/fp/machine/local_root
The directory where the tasks and related files are located. Typically the project directory.
- remote_root:#
- type:
str
|NoneType
, optional
argument path:run_mdata/fp/machine/remote_root
The dir where the tasks are executed on the remote machine. Only needed when context is not lazy-local.
- clean_asynchronously:#
- type:
bool
, optional, default:False
argument path:run_mdata/fp/machine/clean_asynchronously
Clean the remote directory asynchronously after the job finishes.
Depending on the value of context_type, different sub args are accepted.
- context_type:#
- type:
str
(flag key)
argument path:run_mdata/fp/machine/context_type
possible choices: LocalContext, BohriumContext, HDFSContext, LazyLocalContext, OpenAPIContext, SSHContext
The connection used to the remote machine. Option: LocalContext, SSHContext, LazyLocalContext, BohriumContext, OpenAPIContext, HDFSContext
When context_type is set to LocalContext (or its aliases localcontext, Local, local):
- remote_profile:#
- type:
dict
, optional
argument path:run_mdata/fp/machine[LocalContext]/remote_profile
The information used to maintain the local machine.
- symlink:#
- type:
bool
, optional, default:True
argument path:run_mdata/fp/machine[LocalContext]/remote_profile/symlink
Whether to use symbolic links to replace copy. This option should be turned off if the local directory is not accessible on the Batch system.
When context_type is set to BohriumContext (or its aliases bohriumcontext, Bohrium, bohrium, DpCloudServerContext, dpcloudservercontext, DpCloudServer, dpcloudserver, LebesgueContext, lebesguecontext, Lebesgue, lebesgue):
- remote_profile:#
- type:
dict
argument path:run_mdata/fp/machine[BohriumContext]/remote_profile
The information used to maintain the connection with remote machine.
- email:#
- type:
str
, optional
argument path:run_mdata/fp/machine[BohriumContext]/remote_profile/email
Email
- password:#
- type:
str
, optional
argument path:run_mdata/fp/machine[BohriumContext]/remote_profile/password
Password
- program_id:#
- type:
int
, alias: project_id
argument path:run_mdata/fp/machine[BohriumContext]/remote_profile/program_id
Program ID
- retry_count:#
- type:
NoneType
|int
, optional, default:2
argument path:run_mdata/fp/machine[BohriumContext]/remote_profile/retry_count
The retry count when a job is terminated
- ignore_exit_code:#
- type:
bool
, optional, default:True
argument path:run_mdata/fp/machine[BohriumContext]/remote_profile/ignore_exit_code
The job state will be marked as finished if the exit code is non-zero when set to True. Otherwise, the job state will be designated as terminated.
- keep_backup:#
- type:
bool
, optional
argument path:run_mdata/fp/machine[BohriumContext]/remote_profile/keep_backup
keep download and upload zip
- input_data:#
- type:
dict
argument path:run_mdata/fp/machine[BohriumContext]/remote_profile/input_data
Configuration of job
When context_type is set to HDFSContext (or its aliases hdfscontext, HDFS, hdfs):
- remote_profile:#
- type:
dict
, optional
argument path:run_mdata/fp/machine[HDFSContext]/remote_profile
The information used to maintain the connection with remote machine. This field is empty for this context.
When context_type is set to LazyLocalContext (or its aliases lazylocalcontext, LazyLocal, lazylocal):
- remote_profile:#
- type:
dict
, optional
argument path:run_mdata/fp/machine[LazyLocalContext]/remote_profile
The information used to maintain the connection with remote machine. This field is empty for this context.
When context_type is set to OpenAPIContext (or its aliases openapicontext, OpenAPI, openapi):
- remote_profile:#
- type:
dict
, optional
argument path:run_mdata/fp/machine[OpenAPIContext]/remote_profile
The information used to maintain the connection with remote machine. This field is empty for this context.
When context_type is set to SSHContext (or its aliases sshcontext, SSH, ssh):
- remote_profile:#
- type:
dict
argument path:run_mdata/fp/machine[SSHContext]/remote_profile
The information used to maintain the connection with remote machine.
- hostname:#
- type:
str
argument path:run_mdata/fp/machine[SSHContext]/remote_profile/hostname
hostname or ip of ssh connection.
- username:#
- type:
str
argument path:run_mdata/fp/machine[SSHContext]/remote_profile/username
username of target linux system
- password:#
- type:
str
, optional
argument path:run_mdata/fp/machine[SSHContext]/remote_profile/password
(deprecated) password of linux system. Please use SSH keys instead to improve security.
- port:#
- type:
int
, optional, default:22
argument path:run_mdata/fp/machine[SSHContext]/remote_profile/port
ssh connection port.
- key_filename:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/fp/machine[SSHContext]/remote_profile/key_filename
key filename used by ssh connection. If left None, find key in ~/.ssh or use password for login
- passphrase:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/fp/machine[SSHContext]/remote_profile/passphrase
passphrase of key used by ssh connection
- timeout:#
- type:
int
, optional, default:10
argument path:run_mdata/fp/machine[SSHContext]/remote_profile/timeout
timeout of ssh connection
- totp_secret:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/fp/machine[SSHContext]/remote_profile/totp_secret
Time-based one time password secret. It should be a base32-encoded string extracted from the 2D code.
- tar_compress:#
- type:
bool
, optional, default:True
argument path:run_mdata/fp/machine[SSHContext]/remote_profile/tar_compress
The archive will be compressed in upload and download if it is True. If not, compression will be skipped.
- look_for_keys:#
- type:
bool
, optional, default:True
argument path:run_mdata/fp/machine[SSHContext]/remote_profile/look_for_keys
enable searching for discoverable private key files in ~/.ssh/
- execute_command:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/fp/machine[SSHContext]/remote_profile/execute_command
execute command after ssh connection is established.
- resources:#
- type:
dict
argument path:run_mdata/fp/resources
- number_node:#
- type:
int
, optional, default:1
argument path:run_mdata/fp/resources/number_node
The number of nodes required for each job.
- cpu_per_node:#
- type:
int
, optional, default:1
argument path:run_mdata/fp/resources/cpu_per_node
CPU numbers of each node assigned to each job.
- gpu_per_node:#
- type:
int
, optional, default:0
argument path:run_mdata/fp/resources/gpu_per_node
GPU numbers of each node assigned to each job.
- queue_name:#
- type:
str
, optional, default: (empty string)
argument path:run_mdata/fp/resources/queue_name
The queue name of batch job scheduler system.
- group_size:#
- type:
int
argument path:run_mdata/fp/resources/group_size
The number of tasks in a job. 0 means infinity.
- custom_flags:#
- type:
typing.List[str]
, optional
argument path:run_mdata/fp/resources/custom_flags
The extra lines passed to the job submitting script header.
- strategy:#
- type:
dict
, optional
argument path:run_mdata/fp/resources/strategy
Strategies used to generate job submitting scripts.
- if_cuda_multi_devices:#
- type:
bool
, optional, default:False
argument path:run_mdata/fp/resources/strategy/if_cuda_multi_devices
If there are multiple NVIDIA GPUs on the node and we want to assign tasks to different GPUs. If true, dpdispatcher will manually export the environment variable CUDA_VISIBLE_DEVICES to a different value for each task. Usually, this option is used together with the Task.task_need_resources variable.
- ratio_unfinished:#
- type:
float
, optional, default:0.0
argument path:run_mdata/fp/resources/strategy/ratio_unfinished
The ratio of tasks that can be unfinished.
- customized_script_header_template_file:#
- type:
str
, optional
argument path:run_mdata/fp/resources/strategy/customized_script_header_template_file
The customized template file to generate job submitting script header, which overrides the default file.
- para_deg:#
- type:
int
, optional, default:1
argument path:run_mdata/fp/resources/para_deg
Decide how many tasks will be run in parallel.
- source_list:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/fp/resources/source_list
The env file to be sourced before the command execution.
- module_purge:#
- type:
bool
, optional, default:False
argument path:run_mdata/fp/resources/module_purge
Remove all modules on HPC system before module load (module_list)
- module_unload_list:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/fp/resources/module_unload_list
The modules to be unloaded on HPC system before submitting jobs
- module_list:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/fp/resources/module_list
The modules to be loaded on HPC system before submitting jobs
- envs:#
- type:
dict
, optional, default:{}
argument path:run_mdata/fp/resources/envs
The environment variables to be exported before submitting jobs
- prepend_script:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/fp/resources/prepend_script
Optional script run before jobs submitted.
- append_script:#
- type:
typing.List[str]
, optional, default:[]
argument path:run_mdata/fp/resources/append_script
Optional script run after jobs submitted.
- wait_time:#
- type:
float
|int
, optional, default:0
argument path:run_mdata/fp/resources/wait_time
The waiting time in seconds after a single task is submitted.
Depending on the value of batch_type, different sub args are accepted.
- batch_type:#
- type:
str
(flag key)
argument path:run_mdata/fp/resources/batch_type
possible choices: Bohrium, Torque, Shell, OpenAPI, JH_UniScheduler, Slurm, SGE, DistributedShell, LSF, PBS, SlurmJobArray, Fugaku
The batch job system type loaded from machine/batch_type.
When batch_type is set to Bohrium (or its aliases bohrium, Lebesgue, lebesgue, DpCloudServer, dpcloudserver):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/fp/resources[Bohrium]/kwargs
This field is empty for this batch.
When batch_type is set to Torque (or its alias torque):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/fp/resources[Torque]/kwargs
This field is empty for this batch.
When batch_type is set to Shell (or its alias shell):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/fp/resources[Shell]/kwargs
This field is empty for this batch.
When batch_type is set to OpenAPI (or its alias openapi):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/fp/resources[OpenAPI]/kwargs
This field is empty for this batch.
When batch_type is set to JH_UniScheduler (or its alias jh_unischeduler):
- kwargs:#
- type:
dict
argument path:run_mdata/fp/resources[JH_UniScheduler]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/fp/resources[JH_UniScheduler]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #JSUB
When batch_type is set to Slurm (or its alias slurm):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/fp/resources[Slurm]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/fp/resources[Slurm]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #SBATCH
When batch_type is set to SGE (or its alias sge):
- kwargs:#
- type:
dict
argument path:run_mdata/fp/resources[SGE]/kwargs
Extra arguments.
- pe_name:#
- type:
str
, optional, default:mpi
, alias: sge_pe_name
argument path:run_mdata/fp/resources[SGE]/kwargs/pe_name
The parallel environment name of SGE system.
- job_name:#
- type:
str
, optional, default:wDPjob
argument path:run_mdata/fp/resources[SGE]/kwargs/job_name
The name of SGE’s job.
When batch_type is set to DistributedShell (or its alias distributedshell):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/fp/resources[DistributedShell]/kwargs
This field is empty for this batch.
When batch_type is set to LSF (or its alias lsf):
- kwargs:#
- type:
dict
argument path:run_mdata/fp/resources[LSF]/kwargs
Extra arguments.
- gpu_usage:#
- type:
bool
, optional, default:False
argument path:run_mdata/fp/resources[LSF]/kwargs/gpu_usage
Choosing if GPU is used in the calculation step.
- gpu_new_syntax:#
- type:
bool
, optional, default:False
argument path:run_mdata/fp/resources[LSF]/kwargs/gpu_new_syntax
For LSF >= 10.1.0.3, the new #BSUB option -gpu can be used. If False, the old syntax will be used.
- gpu_exclusive:#
- type:
bool
, optional, default:True
argument path:run_mdata/fp/resources[LSF]/kwargs/gpu_exclusive
Only takes effect when the new syntax is enabled. Controls whether tasks are submitted with exclusive GPU usage.
- custom_gpu_line:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/fp/resources[LSF]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #BSUB
When batch_type is set to PBS (or its alias pbs):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/fp/resources[PBS]/kwargs
This field is empty for this batch.
When batch_type is set to SlurmJobArray (or its alias slurmjobarray):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/fp/resources[SlurmJobArray]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str
|NoneType
, optional, default:None
argument path:run_mdata/fp/resources[SlurmJobArray]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #SBATCH
- slurm_job_size:#
- type:
int
, optional, default:1
argument path:run_mdata/fp/resources[SlurmJobArray]/kwargs/slurm_job_size
Number of tasks in a Slurm job
When batch_type is set to Fugaku (or its alias fugaku):
- kwargs:#
- type:
dict
, optional
argument path:run_mdata/fp/resources[Fugaku]/kwargs
This field is empty for this batch.
- user_forward_files:#
- type:
list
, optional
argument path:run_mdata/fp/user_forward_files
Files to be forwarded to the remote machine.
- user_backward_files:#
- type:
list
, optional
argument path:run_mdata/fp/user_backward_files
Files to be transferred back from the remote machine.
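Likewise, the fp block uses the same schema. A hedged example for a CPU-only VASP run on the same cluster could be written as below; the command, queue, core counts, and sourced environment file are placeholders.

```json
{
    "fp": {
        "command": "mpirun -n 32 vasp_std",
        "machine": {
            "batch_type": "Slurm",
            "context_type": "SSHContext",
            "local_root": "./",
            "remote_root": "/home/user/dpgen_workdir",
            "remote_profile": {"hostname": "login.hpc.example.org", "username": "user"}
        },
        "resources": {
            "number_node": 1,
            "cpu_per_node": 32,
            "gpu_per_node": 0,
            "queue_name": "cpu",
            "group_size": 5,
            "source_list": ["/opt/vasp/env.sh"]
        }
    }
}
```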