dpgen simplify machine parameters
Note
One can load, modify, and export the input file using the web-based tool DP-GUI, either online or hosted locally through the command line interface dpgen gui. All parameters below can be set in DP-GUI. Clicking “SAVE JSON” downloads the input file.
- simplify_mdata:#
- type:
dict
argument path: simplify_mdata
machine.json file
- api_version:#
- type:
str, optional, default: 1.0
argument path: simplify_mdata/api_version
Please set to 1.0
- deepmd_version:#
- type:
str, optional, default: 2
argument path: simplify_mdata/deepmd_version
DeePMD-kit version, e.g. 2.1.3
- train:#
- type:
dict
argument path: simplify_mdata/train
Parameters of command, machine, and resources for train
- command:#
- type:
str
argument path: simplify_mdata/train/command
Command of a program.
- machine:#
- type:
dict
argument path: simplify_mdata/train/machine
- batch_type:#
- type:
str
argument path: simplify_mdata/train/machine/batch_type
The batch job system type. Options: LSF, PBS, Slurm, Fugaku, SGE, Torque, DistributedShell, Bohrium, Shell, JH_UniScheduler, OpenAPI, SlurmJobArray
- local_root:#
- type:
str|NoneType
argument path: simplify_mdata/train/machine/local_root
The directory where the tasks and related files are located. Typically the project directory.
- remote_root:#
- type:
str|NoneType, optional
argument path: simplify_mdata/train/machine/remote_root
The directory where the tasks are executed on the remote machine. Only needed when the context is not lazy-local.
- clean_asynchronously:#
- type:
bool, optional, default: False
argument path: simplify_mdata/train/machine/clean_asynchronously
Clean the remote directory asynchronously after the job finishes.
- retry_count:#
- type:
int, optional, default: 3
argument path: simplify_mdata/train/machine/retry_count
Number of retries to resubmit failed jobs.
Depending on the value of context_type, different sub args are accepted.
- context_type:#
- type:
str (flag key)
argument path: simplify_mdata/train/machine/context_type
possible choices: LazyLocalContext, LocalContext, SSHContext, OpenAPIContext, BohriumContext, HDFSContext
The connection used to reach the remote machine.
When context_type is set to LazyLocalContext (or its aliases lazylocalcontext, LazyLocal, lazylocal):
- remote_profile:#
- type:
dict, optional
argument path: simplify_mdata/train/machine[LazyLocalContext]/remote_profile
The information used to maintain the connection with the remote machine. This field is empty for this context.
When context_type is set to LocalContext (or its aliases localcontext, Local, local):
- remote_profile:#
- type:
dict, optional
argument path: simplify_mdata/train/machine[LocalContext]/remote_profile
The information used to maintain the local machine.
- symlink:#
- type:
bool, optional, default: True
argument path: simplify_mdata/train/machine[LocalContext]/remote_profile/symlink
Whether to use symbolic links instead of copying files. This option should be turned off if the local directory is not accessible on the batch system.
When context_type is set to SSHContext (or its aliases sshcontext, SSH, ssh):
- remote_profile:#
- type:
dict
argument path: simplify_mdata/train/machine[SSHContext]/remote_profile
The information used to maintain the connection with the remote machine.
- hostname:#
- type:
str
argument path: simplify_mdata/train/machine[SSHContext]/remote_profile/hostname
Hostname or IP address for the SSH connection.
- username:#
- type:
str
argument path: simplify_mdata/train/machine[SSHContext]/remote_profile/username
Username on the target Linux system.
- password:#
- type:
str, optional
argument path: simplify_mdata/train/machine[SSHContext]/remote_profile/password
(deprecated) Password for the Linux system. Please use SSH keys instead to improve security.
- port:#
- type:
int, optional, default: 22
argument path: simplify_mdata/train/machine[SSHContext]/remote_profile/port
SSH connection port.
- key_filename:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/train/machine[SSHContext]/remote_profile/key_filename
Key filename used by the SSH connection. If left as None, keys are searched for in ~/.ssh, or the password is used for login.
- passphrase:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/train/machine[SSHContext]/remote_profile/passphrase
Passphrase of the key used by the SSH connection.
- timeout:#
- type:
int, optional, default: 10
argument path: simplify_mdata/train/machine[SSHContext]/remote_profile/timeout
Timeout of the SSH connection.
- totp_secret:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/train/machine[SSHContext]/remote_profile/totp_secret
Time-based one-time password secret. It should be a base32-encoded string extracted from the QR code.
- tar_compress:#
- type:
bool, optional, default: True
argument path: simplify_mdata/train/machine[SSHContext]/remote_profile/tar_compress
If True, the archive is compressed during upload and download. Otherwise, compression is skipped.
- look_for_keys:#
- type:
bool, optional, default: True
argument path: simplify_mdata/train/machine[SSHContext]/remote_profile/look_for_keys
Enable searching for discoverable private key files in ~/.ssh/
- execute_command:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/train/machine[SSHContext]/remote_profile/execute_command
Command to execute after the SSH connection is established.
- proxy_command:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/train/machine[SSHContext]/remote_profile/proxy_command
ProxyCommand to use for the SSH connection through intermediate servers.
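The SSHContext keys listed above could be combined as in the following sketch. It is only an illustration under stated assumptions: the hostname, username, and directories are hypothetical placeholders, and the dictionary is built in Python merely so it can be dumped as JSON.

```python
import json

# Hypothetical values: replace host, user, and paths with real ones.
train_machine = {
    "batch_type": "Slurm",
    "context_type": "SSHContext",
    "local_root": "./",
    "remote_root": "/home/user/dpgen_work",  # hypothetical remote directory
    "remote_profile": {
        "hostname": "cluster.example.org",   # hypothetical host
        "username": "user",                  # hypothetical account
        "port": 22,                          # documented default
        "key_filename": None,                # None: keys are searched for in ~/.ssh
        "timeout": 10,                       # documented default
    },
}

# json.dumps turns None into null, as expected in a machine.json file.
print(json.dumps(train_machine, indent=4))
```

No password is set here, in line with the recommendation above to use SSH keys.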
When context_type is set to OpenAPIContext (or its aliases openapicontext, OpenAPI, openapi):
- remote_profile:#
- type:
dict, optional
argument path: simplify_mdata/train/machine[OpenAPIContext]/remote_profile
The information used to maintain the connection with the remote machine. This field is empty for this context.
When context_type is set to BohriumContext (or its aliases bohriumcontext, Bohrium, bohrium, DpCloudServerContext, dpcloudservercontext, DpCloudServer, dpcloudserver, LebesgueContext, lebesguecontext, Lebesgue, lebesgue):
- remote_profile:#
- type:
dict
argument path: simplify_mdata/train/machine[BohriumContext]/remote_profile
The information used to maintain the connection with the remote machine.
- email:#
- type:
str, optional
argument path: simplify_mdata/train/machine[BohriumContext]/remote_profile/email
Email
- password:#
- type:
str, optional
argument path: simplify_mdata/train/machine[BohriumContext]/remote_profile/password
Password
- program_id:#
- type:
int, alias: project_id
argument path: simplify_mdata/train/machine[BohriumContext]/remote_profile/program_id
Program ID
- retry_count:#
- type:
NoneType|int, optional, default: 2
argument path: simplify_mdata/train/machine[BohriumContext]/remote_profile/retry_count
The retry count when a job is terminated.
- ignore_exit_code:#
- type:
bool, optional, default: True
argument path: simplify_mdata/train/machine[BohriumContext]/remote_profile/ignore_exit_code
If set to True, the job state will be marked as finished even if the exit code is non-zero. Otherwise, the job state will be designated as terminated.
- keep_backup:#
- type:
bool, optional
argument path: simplify_mdata/train/machine[BohriumContext]/remote_profile/keep_backup
Keep the downloaded and uploaded zip files.
- input_data:#
- type:
dict
argument path: simplify_mdata/train/machine[BohriumContext]/remote_profile/input_data
Configuration of job
When context_type is set to HDFSContext (or its aliases hdfscontext, HDFS, hdfs):
- remote_profile:#
- type:
dict, optional
argument path: simplify_mdata/train/machine[HDFSContext]/remote_profile
The information used to maintain the connection with the remote machine. This field is empty for this context.
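Since every context_type accepts several aliases, it can help to map a user-supplied value to its canonical name before comparing configurations. The sketch below does that with the alias lists given above. The helper canonical_context_type is hypothetical and written for illustration only; it is not part of dpgen or dpdispatcher.

```python
# Alias lists copied from the context_type entries above.
CONTEXT_ALIASES = {
    "LazyLocalContext": ["lazylocalcontext", "LazyLocal", "lazylocal"],
    "LocalContext": ["localcontext", "Local", "local"],
    "SSHContext": ["sshcontext", "SSH", "ssh"],
    "OpenAPIContext": ["openapicontext", "OpenAPI", "openapi"],
    "BohriumContext": [
        "bohriumcontext", "Bohrium", "bohrium",
        "DpCloudServerContext", "dpcloudservercontext",
        "DpCloudServer", "dpcloudserver",
        "LebesgueContext", "lebesguecontext", "Lebesgue", "lebesgue",
    ],
    "HDFSContext": ["hdfscontext", "HDFS", "hdfs"],
}


def canonical_context_type(name):
    """Return the canonical context_type for a name or one of its aliases."""
    for canonical, aliases in CONTEXT_ALIASES.items():
        if name == canonical or name in aliases:
            return canonical
    raise ValueError(f"unknown context_type: {name!r}")


print(canonical_context_type("ssh"))
print(canonical_context_type("Lebesgue"))
```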
- resources:#
- type:
dict
argument path: simplify_mdata/train/resources
- number_node:#
- type:
int, optional, default: 1
argument path: simplify_mdata/train/resources/number_node
The number of nodes required for each job.
- cpu_per_node:#
- type:
int, optional, default: 1
argument path: simplify_mdata/train/resources/cpu_per_node
Number of CPUs on each node assigned to each job.
- gpu_per_node:#
- type:
int, optional, default: 0
argument path: simplify_mdata/train/resources/gpu_per_node
Number of GPUs on each node assigned to each job.
- queue_name:#
- type:
str, optional, default: (empty string)
argument path: simplify_mdata/train/resources/queue_name
The queue name of the batch job scheduler system.
- group_size:#
- type:
int
argument path: simplify_mdata/train/resources/group_size
The number of tasks in a job. 0 means infinity.
- custom_flags:#
- type:
typing.List[str], optional
argument path: simplify_mdata/train/resources/custom_flags
Extra lines passed to the job submission script header.
- strategy:#
- type:
dict, optional
argument path: simplify_mdata/train/resources/strategy
Strategies used to generate job submission scripts.
- if_cuda_multi_devices:#
- type:
bool, optional, default: False
argument path: simplify_mdata/train/resources/strategy/if_cuda_multi_devices
Use this option when there are multiple NVIDIA GPUs on the node and the tasks should be assigned to different GPUs. If True, dpdispatcher will manually export the environment variable CUDA_VISIBLE_DEVICES for each task. Usually, this option is used together with the Task.task_need_resources variable.
- ratio_unfinished:#
- type:
float, optional, default: 0.0
argument path: simplify_mdata/train/resources/strategy/ratio_unfinished
The ratio of tasks that can be unfinished.
- customized_script_header_template_file:#
- type:
str, optional
argument path: simplify_mdata/train/resources/strategy/customized_script_header_template_file
The customized template file used to generate the job submission script header, which overrides the default file.
- para_deg:#
- type:
int, optional, default: 1
argument path: simplify_mdata/train/resources/para_deg
Decide how many tasks will be run in parallel.
- source_list:#
- type:
typing.List[str], optional, default: []
argument path: simplify_mdata/train/resources/source_list
The environment files to be sourced before the command execution.
- module_purge:#
- type:
bool, optional, default: False
argument path: simplify_mdata/train/resources/module_purge
Remove all modules on the HPC system before module load (module_list).
- module_unload_list:#
- type:
typing.List[str], optional, default: []
argument path: simplify_mdata/train/resources/module_unload_list
The modules to be unloaded on the HPC system before submitting jobs.
- module_list:#
- type:
typing.List[str], optional, default: []
argument path: simplify_mdata/train/resources/module_list
The modules to be loaded on the HPC system before submitting jobs.
- envs:#
- type:
dict, optional, default: {}
argument path: simplify_mdata/train/resources/envs
The environment variables to be exported before submitting jobs.
- prepend_script:#
- type:
typing.List[str], optional, default: []
argument path: simplify_mdata/train/resources/prepend_script
Optional script run before jobs are submitted.
- append_script:#
- type:
typing.List[str], optional, default: []
argument path: simplify_mdata/train/resources/append_script
Optional script run after jobs are submitted.
- wait_time:#
- type:
float|int, optional, default: 0
argument path: simplify_mdata/train/resources/wait_time
The waiting time in seconds after a single task is submitted.
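As an illustration of how the general resources keys above fit together, here is a minimal sketch of a train resources section. The queue name, environment file path, and environment variable are hypothetical placeholders. Only group_size is not marked optional above, so the other keys could be dropped.

```python
import json

train_resources = {
    "number_node": 1,
    "cpu_per_node": 4,
    "gpu_per_node": 1,
    "queue_name": "gpu",                      # hypothetical queue name
    "group_size": 1,                          # the only non-optional key here
    "source_list": ["/path/to/env.sh"],       # hypothetical environment file
    "module_purge": False,
    "module_list": [],
    "envs": {"OMP_NUM_THREADS": "4"},         # hypothetical variable choice
    "strategy": {"if_cuda_multi_devices": False},
    "para_deg": 1,
    "wait_time": 0,
}

print(json.dumps(train_resources, indent=4))
```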
Depending on the value of batch_type, different sub args are accepted.
- batch_type:#
- type:
str (flag key)
argument path: simplify_mdata/train/resources/batch_type
possible choices: Shell, Torque, DistributedShell, PBS, Fugaku, OpenAPI, Bohrium, SlurmJobArray, Slurm, SGE, JH_UniScheduler, LSF
The batch job system type loaded from machine/batch_type.
When batch_type is set to Shell (or its alias shell):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/train/resources[Shell]/kwargs
This field is empty for this batch.
When batch_type is set to Torque (or its alias torque):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/train/resources[Torque]/kwargs
This field is empty for this batch.
When batch_type is set to DistributedShell (or its alias distributedshell):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/train/resources[DistributedShell]/kwargs
This field is empty for this batch.
When batch_type is set to PBS (or its alias pbs):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/train/resources[PBS]/kwargs
This field is empty for this batch.
When batch_type is set to Fugaku (or its alias fugaku):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/train/resources[Fugaku]/kwargs
This field is empty for this batch.
When batch_type is set to OpenAPI (or its alias openapi):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/train/resources[OpenAPI]/kwargs
This field is empty for this batch.
When batch_type is set to Bohrium (or its aliases bohrium, Lebesgue, lebesgue, DpCloudServer, dpcloudserver):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/train/resources[Bohrium]/kwargs
This field is empty for this batch.
When batch_type is set to SlurmJobArray (or its alias slurmjobarray):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/train/resources[SlurmJobArray]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/train/resources[SlurmJobArray]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #SBATCH
- slurm_job_size:#
- type:
int, optional, default: 1
argument path: simplify_mdata/train/resources[SlurmJobArray]/kwargs/slurm_job_size
Number of tasks in a Slurm job
When batch_type is set to Slurm (or its alias slurm):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/train/resources[Slurm]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/train/resources[Slurm]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #SBATCH
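The custom_gpu_line entry for Slurm is expected to start with #SBATCH. A small sketch of building the kwargs section with that constraint checked up front is given below. The helper slurm_kwargs is hypothetical, and the directive --gres=gpu:1 is only an example of a Slurm header line; whether dpdispatcher performs a similar check is not stated here.

```python
def slurm_kwargs(custom_gpu_line=None):
    """Build the kwargs section for batch_type Slurm (hypothetical helper)."""
    if custom_gpu_line is not None and not custom_gpu_line.startswith("#SBATCH"):
        raise ValueError("custom_gpu_line should start with #SBATCH")
    return {"custom_gpu_line": custom_gpu_line}


# Example header line requesting one GPU through generic resources.
kwargs = slurm_kwargs("#SBATCH --gres=gpu:1")
print(kwargs)
```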
When batch_type is set to SGE (or its alias sge):
- kwargs:#
- type:
dict
argument path: simplify_mdata/train/resources[SGE]/kwargs
Extra arguments.
- pe_name:#
- type:
str, optional, default: mpi, alias: sge_pe_name
argument path: simplify_mdata/train/resources[SGE]/kwargs/pe_name
The parallel environment name of the SGE system.
- job_name:#
- type:
str, optional, default: wDPjob
argument path: simplify_mdata/train/resources[SGE]/kwargs/job_name
The name of the SGE job.
When batch_type is set to JH_UniScheduler (or its alias jh_unischeduler):
- kwargs:#
- type:
dict
argument path: simplify_mdata/train/resources[JH_UniScheduler]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/train/resources[JH_UniScheduler]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #JSUB
When batch_type is set to LSF (or its alias lsf):
- kwargs:#
- type:
dict
argument path: simplify_mdata/train/resources[LSF]/kwargs
Extra arguments.
- gpu_usage:#
- type:
bool, optional, default: False
argument path: simplify_mdata/train/resources[LSF]/kwargs/gpu_usage
Whether GPUs are used in the calculation step.
- gpu_new_syntax:#
- type:
bool, optional, default: False
argument path: simplify_mdata/train/resources[LSF]/kwargs/gpu_new_syntax
For LSF >= 10.1.0.3, the new option -gpu for #BSUB can be used. If False, the old syntax is used.
- gpu_exclusive:#
- type:
bool, optional, default: True
argument path: simplify_mdata/train/resources[LSF]/kwargs/gpu_exclusive
Only takes effect when the new syntax is enabled. Controls whether tasks are submitted in exclusive mode for GPU.
- custom_gpu_line:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/train/resources[LSF]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #BSUB
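The interplay of the LSF GPU options can be easy to misread, because gpu_exclusive only matters when gpu_new_syntax is enabled. The sketch below encodes just that documented rule with the documented defaults. The helper exclusive_applies is hypothetical and does not reproduce how dpdispatcher builds the #BSUB header.

```python
lsf_kwargs = {
    "gpu_usage": True,
    "gpu_new_syntax": True,   # requires LSF >= 10.1.0.3
    "gpu_exclusive": True,
    "custom_gpu_line": None,
}


def exclusive_applies(kwargs):
    """Return True if gpu_exclusive can take effect (hypothetical helper).

    Missing keys fall back to the documented defaults:
    gpu_new_syntax False and gpu_exclusive True.
    """
    return bool(kwargs.get("gpu_new_syntax", False) and kwargs.get("gpu_exclusive", True))


print(exclusive_applies(lsf_kwargs))
print(exclusive_applies({"gpu_exclusive": True}))
```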
- user_forward_files:#
- type:
list, optional
argument path: simplify_mdata/train/user_forward_files
Files to be forwarded to the remote machine.
- user_backward_files:#
- type:
list, optional
argument path: simplify_mdata/train/user_backward_files
Files to be sent back from the remote machine.
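Files named in user_forward_files have to exist locally before they can be forwarded. As a rough sketch, one could check that before launching a run. The helper missing_forward_files and the file name extra_input.dat are hypothetical; the check is not something this page says dpgen performs.

```python
from pathlib import Path


def missing_forward_files(stage, root="."):
    """List user_forward_files entries that do not exist under root (hypothetical helper)."""
    return [
        name
        for name in stage.get("user_forward_files", [])
        if not (Path(root) / name).exists()
    ]


train = {
    "command": "dp",                              # assumed DeePMD-kit command
    "user_forward_files": ["extra_input.dat"],    # hypothetical file name
    "user_backward_files": [],
}

print(missing_forward_files(train))
```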
- model_devi:#
- type:
dict
argument path: simplify_mdata/model_devi
Parameters of command, machine, and resources for model_devi
- command:#
- type:
str
argument path: simplify_mdata/model_devi/command
Command of a program.
- machine:#
- type:
dict
argument path: simplify_mdata/model_devi/machine
- batch_type:#
- type:
str
argument path: simplify_mdata/model_devi/machine/batch_type
The batch job system type. Options: LSF, PBS, Slurm, Fugaku, SGE, Torque, DistributedShell, Bohrium, Shell, JH_UniScheduler, OpenAPI, SlurmJobArray
- local_root:#
- type:
str|NoneType
argument path: simplify_mdata/model_devi/machine/local_root
The directory where the tasks and related files are located. Typically the project directory.
- remote_root:#
- type:
str|NoneType, optional
argument path: simplify_mdata/model_devi/machine/remote_root
The directory where the tasks are executed on the remote machine. Only needed when the context is not lazy-local.
- clean_asynchronously:#
- type:
bool, optional, default: False
argument path: simplify_mdata/model_devi/machine/clean_asynchronously
Clean the remote directory asynchronously after the job finishes.
- retry_count:#
- type:
int, optional, default: 3
argument path: simplify_mdata/model_devi/machine/retry_count
Number of retries to resubmit failed jobs.
Depending on the value of context_type, different sub args are accepted.
- context_type:#
- type:
str (flag key)
argument path: simplify_mdata/model_devi/machine/context_type
possible choices: LazyLocalContext, LocalContext, SSHContext, OpenAPIContext, BohriumContext, HDFSContext
The connection used to reach the remote machine.
When context_type is set to LazyLocalContext (or its aliases lazylocalcontext, LazyLocal, lazylocal):
- remote_profile:#
- type:
dict, optional
argument path: simplify_mdata/model_devi/machine[LazyLocalContext]/remote_profile
The information used to maintain the connection with the remote machine. This field is empty for this context.
When context_type is set to LocalContext (or its aliases localcontext, Local, local):
- remote_profile:#
- type:
dict, optional
argument path: simplify_mdata/model_devi/machine[LocalContext]/remote_profile
The information used to maintain the local machine.
- symlink:#
- type:
bool, optional, default: True
argument path: simplify_mdata/model_devi/machine[LocalContext]/remote_profile/symlink
Whether to use symbolic links instead of copying files. This option should be turned off if the local directory is not accessible on the batch system.
When context_type is set to SSHContext (or its aliases sshcontext, SSH, ssh):
- remote_profile:#
- type:
dict
argument path: simplify_mdata/model_devi/machine[SSHContext]/remote_profile
The information used to maintain the connection with the remote machine.
- hostname:#
- type:
str
argument path: simplify_mdata/model_devi/machine[SSHContext]/remote_profile/hostname
Hostname or IP address for the SSH connection.
- username:#
- type:
str
argument path: simplify_mdata/model_devi/machine[SSHContext]/remote_profile/username
Username on the target Linux system.
- password:#
- type:
str, optional
argument path: simplify_mdata/model_devi/machine[SSHContext]/remote_profile/password
(deprecated) Password for the Linux system. Please use SSH keys instead to improve security.
- port:#
- type:
int, optional, default: 22
argument path: simplify_mdata/model_devi/machine[SSHContext]/remote_profile/port
SSH connection port.
- key_filename:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/model_devi/machine[SSHContext]/remote_profile/key_filename
Key filename used by the SSH connection. If left as None, keys are searched for in ~/.ssh, or the password is used for login.
- passphrase:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/model_devi/machine[SSHContext]/remote_profile/passphrase
Passphrase of the key used by the SSH connection.
- timeout:#
- type:
int, optional, default: 10
argument path: simplify_mdata/model_devi/machine[SSHContext]/remote_profile/timeout
Timeout of the SSH connection.
- totp_secret:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/model_devi/machine[SSHContext]/remote_profile/totp_secret
Time-based one-time password secret. It should be a base32-encoded string extracted from the QR code.
- tar_compress:#
- type:
bool, optional, default: True
argument path: simplify_mdata/model_devi/machine[SSHContext]/remote_profile/tar_compress
If True, the archive is compressed during upload and download. Otherwise, compression is skipped.
- look_for_keys:#
- type:
bool, optional, default: True
argument path: simplify_mdata/model_devi/machine[SSHContext]/remote_profile/look_for_keys
Enable searching for discoverable private key files in ~/.ssh/
- execute_command:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/model_devi/machine[SSHContext]/remote_profile/execute_command
Command to execute after the SSH connection is established.
- proxy_command:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/model_devi/machine[SSHContext]/remote_profile/proxy_command
ProxyCommand to use for the SSH connection through intermediate servers.
When context_type is set to OpenAPIContext (or its aliases openapicontext, OpenAPI, openapi):
- remote_profile:#
- type:
dict, optional
argument path: simplify_mdata/model_devi/machine[OpenAPIContext]/remote_profile
The information used to maintain the connection with the remote machine. This field is empty for this context.
When context_type is set to BohriumContext (or its aliases bohriumcontext, Bohrium, bohrium, DpCloudServerContext, dpcloudservercontext, DpCloudServer, dpcloudserver, LebesgueContext, lebesguecontext, Lebesgue, lebesgue):
- remote_profile:#
- type:
dict
argument path: simplify_mdata/model_devi/machine[BohriumContext]/remote_profile
The information used to maintain the connection with the remote machine.
- email:#
- type:
str, optional
argument path: simplify_mdata/model_devi/machine[BohriumContext]/remote_profile/email
Email
- password:#
- type:
str, optional
argument path: simplify_mdata/model_devi/machine[BohriumContext]/remote_profile/password
Password
- program_id:#
- type:
int, alias: project_id
argument path: simplify_mdata/model_devi/machine[BohriumContext]/remote_profile/program_id
Program ID
- retry_count:#
- type:
NoneType|int, optional, default: 2
argument path: simplify_mdata/model_devi/machine[BohriumContext]/remote_profile/retry_count
The retry count when a job is terminated.
- ignore_exit_code:#
- type:
bool, optional, default: True
argument path: simplify_mdata/model_devi/machine[BohriumContext]/remote_profile/ignore_exit_code
If set to True, the job state will be marked as finished even if the exit code is non-zero. Otherwise, the job state will be designated as terminated.
- keep_backup:#
- type:
bool, optional
argument path: simplify_mdata/model_devi/machine[BohriumContext]/remote_profile/keep_backup
Keep the downloaded and uploaded zip files.
- input_data:#
- type:
dict
argument path: simplify_mdata/model_devi/machine[BohriumContext]/remote_profile/input_data
Configuration of job
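For the BohriumContext keys above, a minimal sketch of a model_devi machine section might look as follows. The email address, program ID, and the environment variable name BOHRIUM_PASSWORD are hypothetical, and input_data is left empty because its inner keys are not described on this page. Reading the password from the environment is a choice made for this sketch so that it does not end up hard-coded in a script.

```python
import json
import os

model_devi_machine = {
    "batch_type": "Bohrium",
    "context_type": "BohriumContext",
    "local_root": "./",
    "remote_profile": {
        "email": "user@example.org",                          # hypothetical account
        "password": os.environ.get("BOHRIUM_PASSWORD", ""),   # hypothetical variable name
        "program_id": 12345,                                  # hypothetical ID
        "retry_count": 2,                                     # documented default
        "ignore_exit_code": True,                             # documented default
        "input_data": {},                                     # job configuration, keys not listed here
    },
}

# Avoid printing the password when inspecting the result.
preview = json.loads(json.dumps(model_devi_machine))
preview["remote_profile"]["password"] = "***"
print(json.dumps(preview, indent=4))
```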
When context_type is set to HDFSContext (or its aliases hdfscontext, HDFS, hdfs):
- remote_profile:#
- type:
dict, optional
argument path: simplify_mdata/model_devi/machine[HDFSContext]/remote_profile
The information used to maintain the connection with the remote machine. This field is empty for this context.
- resources:#
- type:
dict
argument path: simplify_mdata/model_devi/resources
- number_node:#
- type:
int, optional, default: 1
argument path: simplify_mdata/model_devi/resources/number_node
The number of nodes required for each job.
- cpu_per_node:#
- type:
int, optional, default: 1
argument path: simplify_mdata/model_devi/resources/cpu_per_node
Number of CPUs on each node assigned to each job.
- gpu_per_node:#
- type:
int, optional, default: 0
argument path: simplify_mdata/model_devi/resources/gpu_per_node
Number of GPUs on each node assigned to each job.
- queue_name:#
- type:
str, optional, default: (empty string)
argument path: simplify_mdata/model_devi/resources/queue_name
The queue name of the batch job scheduler system.
- group_size:#
- type:
int
argument path: simplify_mdata/model_devi/resources/group_size
The number of tasks in a job. 0 means infinity.
- custom_flags:#
- type:
typing.List[str], optional
argument path: simplify_mdata/model_devi/resources/custom_flags
Extra lines passed to the job submission script header.
- strategy:#
- type:
dict, optional
argument path: simplify_mdata/model_devi/resources/strategy
Strategies used to generate job submission scripts.
- if_cuda_multi_devices:#
- type:
bool, optional, default: False
argument path: simplify_mdata/model_devi/resources/strategy/if_cuda_multi_devices
Use this option when there are multiple NVIDIA GPUs on the node and the tasks should be assigned to different GPUs. If True, dpdispatcher will manually export the environment variable CUDA_VISIBLE_DEVICES for each task. Usually, this option is used together with the Task.task_need_resources variable.
- ratio_unfinished:#
- type:
float, optional, default: 0.0
argument path: simplify_mdata/model_devi/resources/strategy/ratio_unfinished
The ratio of tasks that can be unfinished.
- customized_script_header_template_file:#
- type:
str, optional
argument path: simplify_mdata/model_devi/resources/strategy/customized_script_header_template_file
The customized template file used to generate the job submission script header, which overrides the default file.
- para_deg:#
- type:
int, optional, default: 1
argument path: simplify_mdata/model_devi/resources/para_deg
Decide how many tasks will be run in parallel.
- source_list:#
- type:
typing.List[str], optional, default: []
argument path: simplify_mdata/model_devi/resources/source_list
The environment files to be sourced before the command execution.
- module_purge:#
- type:
bool, optional, default: False
argument path: simplify_mdata/model_devi/resources/module_purge
Remove all modules on the HPC system before module load (module_list).
- module_unload_list:#
- type:
typing.List[str], optional, default: []
argument path: simplify_mdata/model_devi/resources/module_unload_list
The modules to be unloaded on the HPC system before submitting jobs.
- module_list:#
- type:
typing.List[str], optional, default: []
argument path: simplify_mdata/model_devi/resources/module_list
The modules to be loaded on the HPC system before submitting jobs.
- envs:#
- type:
dict, optional, default: {}
argument path: simplify_mdata/model_devi/resources/envs
The environment variables to be exported before submitting jobs.
- prepend_script:#
- type:
typing.List[str], optional, default: []
argument path: simplify_mdata/model_devi/resources/prepend_script
Optional script run before jobs are submitted.
- append_script:#
- type:
typing.List[str], optional, default: []
argument path: simplify_mdata/model_devi/resources/append_script
Optional script run after jobs are submitted.
- wait_time:#
- type:
float|int, optional, default: 0
argument path: simplify_mdata/model_devi/resources/wait_time
The waiting time in seconds after a single task is submitted.
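The group_size entry above says how many tasks go into one job, with 0 meaning no limit. The sketch below illustrates that reading by splitting a task list into jobs. The function split_tasks is a hypothetical illustration of the described semantics, not the grouping code used by dpdispatcher.

```python
def split_tasks(tasks, group_size):
    """Group tasks into jobs of at most group_size tasks; 0 puts all tasks in one job."""
    if group_size < 0:
        raise ValueError("group_size must be non-negative")
    tasks = list(tasks)
    if not tasks:
        return []
    if group_size == 0:
        return [tasks]
    return [tasks[i:i + group_size] for i in range(0, len(tasks), group_size)]


tasks = ["task.000", "task.001", "task.002", "task.003", "task.004"]
print(split_tasks(tasks, 2))
print(split_tasks(tasks, 0))
```

With five tasks, a group_size of 2 gives three jobs (two, two, and one task), while a group_size of 0 gives a single job holding all five.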
Depending on the value of batch_type, different sub args are accepted.
- batch_type:#
- type:
str (flag key)
argument path: simplify_mdata/model_devi/resources/batch_type
possible choices: Shell, Torque, DistributedShell, PBS, Fugaku, OpenAPI, Bohrium, SlurmJobArray, Slurm, SGE, JH_UniScheduler, LSF
The batch job system type loaded from machine/batch_type.
When batch_type is set to Shell (or its alias shell):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/model_devi/resources[Shell]/kwargs
This field is empty for this batch.
When batch_type is set to Torque (or its alias torque):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/model_devi/resources[Torque]/kwargs
This field is empty for this batch.
When batch_type is set to DistributedShell (or its alias distributedshell):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/model_devi/resources[DistributedShell]/kwargs
This field is empty for this batch.
When batch_type is set to PBS (or its alias pbs):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/model_devi/resources[PBS]/kwargs
This field is empty for this batch.
When batch_type is set to Fugaku (or its alias fugaku):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/model_devi/resources[Fugaku]/kwargs
This field is empty for this batch.
When batch_type is set to OpenAPI (or its alias openapi):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/model_devi/resources[OpenAPI]/kwargs
This field is empty for this batch.
When batch_type is set to Bohrium (or its aliases bohrium, Lebesgue, lebesgue, DpCloudServer, dpcloudserver):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/model_devi/resources[Bohrium]/kwargs
This field is empty for this batch.
When batch_type is set to SlurmJobArray (or its alias slurmjobarray):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/model_devi/resources[SlurmJobArray]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/model_devi/resources[SlurmJobArray]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #SBATCH
- slurm_job_size:#
- type:
int, optional, default: 1
argument path: simplify_mdata/model_devi/resources[SlurmJobArray]/kwargs/slurm_job_size
Number of tasks in a Slurm job
When batch_type is set to Slurm (or its alias slurm):
- kwargs:#
- type:
dict, optional
argument path: simplify_mdata/model_devi/resources[Slurm]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/model_devi/resources[Slurm]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #SBATCH
When batch_type is set to SGE (or its alias sge):
- kwargs:#
- type:
dict
argument path: simplify_mdata/model_devi/resources[SGE]/kwargs
Extra arguments.
- pe_name:#
- type:
str, optional, default: mpi, alias: sge_pe_name
argument path: simplify_mdata/model_devi/resources[SGE]/kwargs/pe_name
The parallel environment name of the SGE system.
- job_name:#
- type:
str, optional, default: wDPjob
argument path: simplify_mdata/model_devi/resources[SGE]/kwargs/job_name
The name of the SGE job.
When batch_type is set to JH_UniScheduler (or its alias jh_unischeduler):
- kwargs:#
- type:
dict
argument path: simplify_mdata/model_devi/resources[JH_UniScheduler]/kwargs
Extra arguments.
- custom_gpu_line:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/model_devi/resources[JH_UniScheduler]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #JSUB
When batch_type is set to LSF (or its alias lsf):
- kwargs:#
- type:
dict
argument path: simplify_mdata/model_devi/resources[LSF]/kwargs
Extra arguments.
- gpu_usage:#
- type:
bool, optional, default: False
argument path: simplify_mdata/model_devi/resources[LSF]/kwargs/gpu_usage
Whether GPUs are used in the calculation step.
- gpu_new_syntax:#
- type:
bool, optional, default: False
argument path: simplify_mdata/model_devi/resources[LSF]/kwargs/gpu_new_syntax
For LSF >= 10.1.0.3, the new option -gpu for #BSUB can be used. If False, the old syntax is used.
- gpu_exclusive:#
- type:
bool, optional, default: True
argument path: simplify_mdata/model_devi/resources[LSF]/kwargs/gpu_exclusive
Only takes effect when the new syntax is enabled. Controls whether tasks are submitted in exclusive mode for GPU.
- custom_gpu_line:#
- type:
str|NoneType, optional, default: None
argument path: simplify_mdata/model_devi/resources[LSF]/kwargs/custom_gpu_line
Custom GPU configuration, starting with #BSUB
- user_forward_files:#
- type:
list, optional
argument path: simplify_mdata/model_devi/user_forward_files
Files to be forwarded to the remote machine.
- user_backward_files:#
- type:
list, optional
argument path: simplify_mdata/model_devi/user_backward_files
Files to be sent back from the remote machine.
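The train, model_devi, and fp sections share the same layout, so a quick structural check of a whole machine.json can be sketched in a few lines. The function missing_required is a hypothetical helper that only looks at keys not marked optional on this page (command, machine, resources, machine/batch_type, resources/group_size). The commands and the helper make_stage are placeholders, and the sketch does not replace the validation dpgen itself performs.

```python
import json


def missing_required(mdata):
    """Return argument paths of required keys that are absent (hypothetical helper)."""
    missing = []
    for stage in ("train", "model_devi", "fp"):
        section = mdata.get(stage)
        if section is None:
            missing.append(stage)
            continue
        for key in ("command", "machine", "resources"):
            if key not in section:
                missing.append(f"{stage}/{key}")
        if "batch_type" not in section.get("machine", {}):
            missing.append(f"{stage}/machine/batch_type")
        if "group_size" not in section.get("resources", {}):
            missing.append(f"{stage}/resources/group_size")
    return missing


def make_stage(command):
    """Build a minimal local stage; all values are placeholders."""
    return {
        "command": command,
        "machine": {
            "batch_type": "Shell",
            "context_type": "LazyLocalContext",
            "local_root": "./",
        },
        "resources": {"group_size": 1},
    }


simplify_mdata = {
    "api_version": "1.0",
    "train": make_stage("dp"),            # assumed DeePMD-kit command
    "model_devi": make_stage("dp"),       # assumed DeePMD-kit command
    "fp": make_stage("fp_program"),       # hypothetical first-principles command
}

print(missing_required(simplify_mdata))
print(json.dumps(simplify_mdata, indent=4)[:60])
```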
- fp:#
- type:
dict
argument path: simplify_mdata/fp
Parameters of command, machine, and resources for fp
- command:#
- type:
str
argument path: simplify_mdata/fp/command
Command of a program.
- machine:#
- type:
dict
argument path: simplify_mdata/fp/machine
- batch_type:#
- type:
str
argument path: simplify_mdata/fp/machine/batch_type
The batch job system type. Options: LSF, PBS, Slurm, Fugaku, SGE, Torque, DistributedShell, Bohrium, Shell, JH_UniScheduler, OpenAPI, SlurmJobArray
- local_root:#
- type:
str|NoneType
argument path: simplify_mdata/fp/machine/local_root
The directory where the tasks and related files are located. Typically the project directory.
- remote_root:#
- type:
str|NoneType, optionalargument path:simplify_mdata/fp/machine/remote_rootThe dir where the tasks are executed on the remote machine. Only needed when context is not lazy-local.
- clean_asynchronously:#
- type:
bool, optional, default:Falseargument path:simplify_mdata/fp/machine/clean_asynchronouslyClean the remote directory asynchronously after the job finishes.
- retry_count:#
- type:
int, optional, default:3argument path:simplify_mdata/fp/machine/retry_countNumber of retries to resubmit failed jobs.
Depending on the value of context_type, different sub args are accepted.
- context_type:#
- type:
str(flag key)argument path:simplify_mdata/fp/machine/context_typepossible choices:LazyLocalContext,LocalContext,SSHContext,OpenAPIContext,BohriumContext,HDFSContextThe connection used to reach the remote machine. Option: LazyLocalContext, OpenAPIContext, LocalContext, BohriumContext, HDFSContext, SSHContext
When context_type is set to LazyLocalContext (or its aliases lazylocalcontext, LazyLocal, lazylocal):
- remote_profile:#
- type:
dict, optionalargument path:simplify_mdata/fp/machine[LazyLocalContext]/remote_profileThe information used to maintain the connection with remote machine. This field is empty for this context.
When context_type is set to LocalContext (or its aliases localcontext, Local, local):
- remote_profile:#
- type:
dict, optionalargument path:simplify_mdata/fp/machine[LocalContext]/remote_profileThe information used to maintain the local machine.
- symlink:#
- type:
bool, optional, default:Trueargument path:simplify_mdata/fp/machine[LocalContext]/remote_profile/symlinkWhether to use symbolic links instead of copying files. This option should be turned off if the local directory is not accessible on the batch system.
When context_type is set to SSHContext (or its aliases sshcontext, SSH, ssh):
- remote_profile:#
- type:
dictargument path:simplify_mdata/fp/machine[SSHContext]/remote_profileThe information used to maintain the connection with remote machine.
- hostname:#
- type:
strargument path:simplify_mdata/fp/machine[SSHContext]/remote_profile/hostnamehostname or ip of ssh connection.
- username:#
- type:
strargument path:simplify_mdata/fp/machine[SSHContext]/remote_profile/usernameusername of target linux system
- password:#
- type:
str, optionalargument path:simplify_mdata/fp/machine[SSHContext]/remote_profile/password(deprecated) password of linux system. Please use SSH keys instead to improve security.
- port:#
- type:
int, optional, default:22argument path:simplify_mdata/fp/machine[SSHContext]/remote_profile/portssh connection port.
- key_filename:#
- type:
str|NoneType, optional, default:Noneargument path:simplify_mdata/fp/machine[SSHContext]/remote_profile/key_filenameKey filename used by the SSH connection. If left as None, a key is searched for in ~/.ssh, or the password is used for login.
- passphrase:#
- type:
str|NoneType, optional, default:Noneargument path:simplify_mdata/fp/machine[SSHContext]/remote_profile/passphrasepassphrase of key used by ssh connection
- timeout:#
- type:
int, optional, default:10argument path:simplify_mdata/fp/machine[SSHContext]/remote_profile/timeouttimeout of ssh connection
- totp_secret:#
- type:
str|NoneType, optional, default:Noneargument path:simplify_mdata/fp/machine[SSHContext]/remote_profile/totp_secretTime-based one-time password (TOTP) secret. It should be a base32-encoded string extracted from the 2D (QR) code.
- tar_compress:#
- type:
bool, optional, default:Trueargument path:simplify_mdata/fp/machine[SSHContext]/remote_profile/tar_compressThe archive will be compressed in upload and download if it is True. If not, compression will be skipped.
- look_for_keys:#
- type:
bool, optional, default:Trueargument path:simplify_mdata/fp/machine[SSHContext]/remote_profile/look_for_keysenable searching for discoverable private key files in ~/.ssh/
- execute_command:#
- type:
str|NoneType, optional, default:Noneargument path:simplify_mdata/fp/machine[SSHContext]/remote_profile/execute_commandexecute command after ssh connection is established.
- proxy_command:#
- type:
str|NoneType, optional, default:Noneargument path:simplify_mdata/fp/machine[SSHContext]/remote_profile/proxy_commandProxyCommand to use for SSH connection through intermediate servers.
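As a minimal sketch of how the SSHContext fields above fit together, the following Python snippet builds an fp/machine block and prints it as JSON. The host name, user name, key path, and directories are placeholders chosen for illustration (assumptions), not values taken from this documentation.

```python
import json

# All concrete values below are hypothetical placeholders; replace them with your own.
fp_machine = {
    "batch_type": "Slurm",
    "context_type": "SSHContext",
    "local_root": "./",
    "remote_root": "/home/alice/dpgen_work",  # needed because the context is not lazy-local
    "remote_profile": {
        "hostname": "cluster.example.org",
        "username": "alice",
        "port": 22,                            # documented default
        "key_filename": "/home/alice/.ssh/id_ed25519",
        "timeout": 10,                         # documented default
        "tar_compress": True,                  # documented default
    },
}

# Serialize the block in the form it would take inside machine.json.
print(json.dumps(fp_machine, indent=4))
```

The deprecated password field is left out on purpose, since the description above recommends SSH keys instead.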
When context_type is set to OpenAPIContext (or its aliases openapicontext, OpenAPI, openapi):
- remote_profile:#
- type:
dict, optionalargument path:simplify_mdata/fp/machine[OpenAPIContext]/remote_profileThe information used to maintain the connection with remote machine. This field is empty for this context.
When context_type is set to BohriumContext (or its aliases bohriumcontext, Bohrium, bohrium, DpCloudServerContext, dpcloudservercontext, DpCloudServer, dpcloudserver, LebesgueContext, lebesguecontext, Lebesgue, lebesgue):
- remote_profile:#
- type:
dictargument path:simplify_mdata/fp/machine[BohriumContext]/remote_profileThe information used to maintain the connection with remote machine.
- email:#
- type:
str, optionalargument path:simplify_mdata/fp/machine[BohriumContext]/remote_profile/emailEmail
- password:#
- type:
str, optionalargument path:simplify_mdata/fp/machine[BohriumContext]/remote_profile/passwordPassword
- program_id:#
- type:
int, alias: project_idargument path:simplify_mdata/fp/machine[BohriumContext]/remote_profile/program_idProgram ID
- retry_count:#
- type:
NoneType|int, optional, default:2argument path:simplify_mdata/fp/machine[BohriumContext]/remote_profile/retry_countThe retry count when a job is terminated
- ignore_exit_code:#
- type:
bool, optional, default:Trueargument path:simplify_mdata/fp/machine[BohriumContext]/remote_profile/ignore_exit_codeThe job state will be marked as finished if the exit code is non-zero when set to True. Otherwise, the job state will be designated as terminated.
- keep_backup:#
- type:
bool, optionalargument path:simplify_mdata/fp/machine[BohriumContext]/remote_profile/keep_backupWhether to keep the downloaded and uploaded zip files.
- input_data:#
- type:
dictargument path:simplify_mdata/fp/machine[BohriumContext]/remote_profile/input_dataConfiguration of job
When context_type is set to HDFSContext (or its aliases hdfscontext, HDFS, hdfs):
- remote_profile:#
- type:
dict, optionalargument path:simplify_mdata/fp/machine[HDFSContext]/remote_profileThe information used to maintain the connection with remote machine. This field is empty for this context.
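The alias lists above can be summarized as a lookup table. The sketch below is only an illustration of the documented names; canonical_context_type is a hypothetical helper written for this example and is not part of dpdispatcher or dpgen.

```python
# Alias table copied from the context_type entries above.
CONTEXT_ALIASES = {
    "LazyLocalContext": ["lazylocalcontext", "LazyLocal", "lazylocal"],
    "LocalContext": ["localcontext", "Local", "local"],
    "SSHContext": ["sshcontext", "SSH", "ssh"],
    "OpenAPIContext": ["openapicontext", "OpenAPI", "openapi"],
    "BohriumContext": [
        "bohriumcontext", "Bohrium", "bohrium",
        "DpCloudServerContext", "dpcloudservercontext",
        "DpCloudServer", "dpcloudserver",
        "LebesgueContext", "lebesguecontext", "Lebesgue", "lebesgue",
    ],
    "HDFSContext": ["hdfscontext", "HDFS", "hdfs"],
}


def canonical_context_type(name: str) -> str:
    """Map a documented name or alias to its canonical context_type (hypothetical helper)."""
    for canonical, aliases in CONTEXT_ALIASES.items():
        if name == canonical or name in aliases:
            return canonical
    raise ValueError(f"unknown context_type: {name!r}")


print(canonical_context_type("ssh"))
print(canonical_context_type("Lebesgue"))
```

Only the spellings listed in this reference are accepted by the sketch; it does not attempt general case-insensitive matching.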
- resources:#
- type:
dictargument path:simplify_mdata/fp/resources
- number_node:#
- type:
int, optional, default:1argument path:simplify_mdata/fp/resources/number_nodeThe number of nodes required for each job.
- cpu_per_node:#
- type:
int, optional, default:1argument path:simplify_mdata/fp/resources/cpu_per_nodeThe number of CPUs on each node assigned to each job.
- gpu_per_node:#
- type:
int, optional, default:0argument path:simplify_mdata/fp/resources/gpu_per_nodeThe number of GPUs on each node assigned to each job.
- queue_name:#
- type:
str, optional, default: (empty string)argument path:simplify_mdata/fp/resources/queue_nameThe queue name of batch job scheduler system.
- group_size:#
- type:
intargument path:simplify_mdata/fp/resources/group_sizeThe number of tasks in a job. 0 means infinity.
- custom_flags:#
- type:
typing.List[str], optionalargument path:simplify_mdata/fp/resources/custom_flagsExtra lines passed to the header of the job submission script.
- strategy:#
- type:
dict, optionalargument path:simplify_mdata/fp/resources/strategyStrategies used to generate job submission scripts.
- if_cuda_multi_devices:#
- type:
bool, optional, default:Falseargument path:simplify_mdata/fp/resources/strategy/if_cuda_multi_devicesUse this when there are multiple NVIDIA GPUs on the node and the tasks should be assigned to different GPUs. If True, dpdispatcher will manually export the environment variable CUDA_VISIBLE_DEVICES for each task. This option is usually used together with the Task.task_need_resources variable.
- ratio_unfinished:#
- type:
float, optional, default:0.0argument path:simplify_mdata/fp/resources/strategy/ratio_unfinishedThe ratio of tasks that can be unfinished.
- customized_script_header_template_file:#
- type:
str, optionalargument path:simplify_mdata/fp/resources/strategy/customized_script_header_template_fileThe customized template file to generate job submitting script header, which overrides the default file.
- para_deg:#
- type:
int, optional, default:1argument path:simplify_mdata/fp/resources/para_degDecide how many tasks will be run in parallel.
- source_list:#
- type:
typing.List[str], optional, default:[]argument path:simplify_mdata/fp/resources/source_listThe env file to be sourced before the command execution.
- module_purge:#
- type:
bool, optional, default:Falseargument path:simplify_mdata/fp/resources/module_purgeRemove all modules on HPC system before module load (module_list)
- module_unload_list:#
- type:
typing.List[str], optional, default:[]argument path:simplify_mdata/fp/resources/module_unload_listThe modules to be unloaded on HPC system before submitting jobs
- module_list:#
- type:
typing.List[str], optional, default:[]argument path:simplify_mdata/fp/resources/module_listThe modules to be loaded on HPC system before submitting jobs
- envs:#
- type:
dict, optional, default:{}argument path:simplify_mdata/fp/resources/envsThe environment variables to be exported before submitting jobs.
- prepend_script:#
- type:
typing.List[str], optional, default:[]argument path:simplify_mdata/fp/resources/prepend_scriptOptional script run before jobs are submitted.
- append_script:#
- type:
typing.List[str], optional, default:[]argument path:simplify_mdata/fp/resources/append_scriptOptional script run after jobs are submitted.
- wait_time:#
- type:
float|int, optional, default:0argument path:simplify_mdata/fp/resources/wait_timeThe waiting time in seconds after a single task is submitted.
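The scheduler-independent resource fields above might be combined as in the following sketch. The queue name, module name, and sourced file are hypothetical placeholders; only the key names and defaults come from this reference.

```python
import json

# Hypothetical values for illustration; the key names follow the entries above.
fp_resources = {
    "number_node": 1,
    "cpu_per_node": 16,
    "gpu_per_node": 0,
    "queue_name": "cpu",                       # hypothetical queue
    "group_size": 5,                           # required; 0 means infinity
    "module_purge": True,                      # purge modules before loading module_list
    "module_list": ["vasp/6.3.0"],             # hypothetical module name
    "source_list": ["/opt/env/fp_env.sh"],     # hypothetical env file
    "envs": {"OMP_NUM_THREADS": "1"},
    "wait_time": 2,                            # seconds to wait after each submission
}

print(json.dumps(fp_resources, indent=4))
```

Fields left out of the dictionary, such as para_deg or strategy, fall back to the defaults listed above.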
Depending on the value of batch_type, different sub args are accepted.
- batch_type:#
- type:
str(flag key)argument path:simplify_mdata/fp/resources/batch_typepossible choices:Shell,Torque,DistributedShell,PBS,Fugaku,OpenAPI,Bohrium,SlurmJobArray,Slurm,SGE,JH_UniScheduler,LSFThe batch job system type loaded from machine/batch_type.
When batch_type is set to Shell (or its alias shell):
- kwargs:#
- type:
dict, optionalargument path:simplify_mdata/fp/resources[Shell]/kwargsThis field is empty for this batch.
When batch_type is set to Torque (or its alias torque):
- kwargs:#
- type:
dict, optionalargument path:simplify_mdata/fp/resources[Torque]/kwargsThis field is empty for this batch.
When batch_type is set to DistributedShell (or its alias distributedshell):
- kwargs:#
- type:
dict, optionalargument path:simplify_mdata/fp/resources[DistributedShell]/kwargsThis field is empty for this batch.
When batch_type is set to PBS (or its alias pbs):
- kwargs:#
- type:
dict, optionalargument path:simplify_mdata/fp/resources[PBS]/kwargsThis field is empty for this batch.
When batch_type is set to Fugaku (or its alias fugaku):
- kwargs:#
- type:
dict, optionalargument path:simplify_mdata/fp/resources[Fugaku]/kwargsThis field is empty for this batch.
When batch_type is set to OpenAPI (or its alias openapi):
- kwargs:#
- type:
dict, optionalargument path:simplify_mdata/fp/resources[OpenAPI]/kwargsThis field is empty for this batch.
When batch_type is set to Bohrium (or its aliases bohrium, Lebesgue, lebesgue, DpCloudServer, dpcloudserver):
- kwargs:#
- type:
dict, optionalargument path:simplify_mdata/fp/resources[Bohrium]/kwargsThis field is empty for this batch.
When batch_type is set to SlurmJobArray (or its alias slurmjobarray):
- kwargs:#
- type:
dict, optionalargument path:simplify_mdata/fp/resources[SlurmJobArray]/kwargsExtra arguments.
- custom_gpu_line:#
- type:
str|NoneType, optional, default:Noneargument path:simplify_mdata/fp/resources[SlurmJobArray]/kwargs/custom_gpu_lineCustom GPU configuration, starting with #SBATCH
- slurm_job_size:#
- type:
int, optional, default:1argument path:simplify_mdata/fp/resources[SlurmJobArray]/kwargs/slurm_job_sizeNumber of tasks in a Slurm job
When batch_type is set to Slurm (or its alias slurm):
- kwargs:#
- type:
dict, optionalargument path:simplify_mdata/fp/resources[Slurm]/kwargsExtra arguments.
- custom_gpu_line:#
- type:
str|NoneType, optional, default:Noneargument path:simplify_mdata/fp/resources[Slurm]/kwargs/custom_gpu_lineCustom GPU configuration, starting with #SBATCH
When batch_type is set to SGE (or its alias sge):
- kwargs:#
- type:
dictargument path:simplify_mdata/fp/resources[SGE]/kwargsExtra arguments.
- pe_name:#
- type:
str, optional, default:mpi, alias: sge_pe_nameargument path:simplify_mdata/fp/resources[SGE]/kwargs/pe_nameThe parallel environment name of SGE system.
- job_name:#
- type:
str, optional, default:wDPjobargument path:simplify_mdata/fp/resources[SGE]/kwargs/job_nameThe name of SGE’s job.
When batch_type is set to JH_UniScheduler (or its alias jh_unischeduler):
- kwargs:#
- type:
dictargument path:simplify_mdata/fp/resources[JH_UniScheduler]/kwargsExtra arguments.
- custom_gpu_line:#
- type:
str|NoneType, optional, default:Noneargument path:simplify_mdata/fp/resources[JH_UniScheduler]/kwargs/custom_gpu_lineCustom GPU configuration, starting with #JSUB
When batch_type is set to LSF (or its alias lsf):
- kwargs:#
- type:
dictargument path:simplify_mdata/fp/resources[LSF]/kwargsExtra arguments.
- gpu_usage:#
- type:
bool, optional, default:Falseargument path:simplify_mdata/fp/resources[LSF]/kwargs/gpu_usageChoosing if GPU is used in the calculation step.
- gpu_new_syntax:#
- type:
bool, optional, default:Falseargument path:simplify_mdata/fp/resources[LSF]/kwargs/gpu_new_syntaxFor LSF >= 10.1.0.3, the new -gpu option for #BSUB can be used. If False, the old syntax is used.
- gpu_exclusive:#
- type:
bool, optional, default:Trueargument path:simplify_mdata/fp/resources[LSF]/kwargs/gpu_exclusiveOnly takes effect when the new syntax is enabled. Controls whether tasks are submitted in exclusive mode for GPU.
- custom_gpu_line:#
- type:
str|NoneType, optional, default:Noneargument path:simplify_mdata/fp/resources[LSF]/kwargs/custom_gpu_lineCustom GPU configuration, starting with #BSUB
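For an LSF cluster, the kwargs entries above sit inside the resources block. The sketch below assumes the machine block sets batch_type to LSF; the queue name and node sizes are hypothetical.

```python
import json

# Hypothetical resource sizes and queue name; kwargs keys follow the LSF entries above.
lsf_resources = {
    "number_node": 1,
    "cpu_per_node": 4,
    "gpu_per_node": 1,
    "queue_name": "gpu",          # hypothetical queue
    "group_size": 1,
    "kwargs": {
        "gpu_usage": True,        # a GPU is used in the calculation step
        "gpu_new_syntax": True,   # use the -gpu option for #BSUB
        "gpu_exclusive": True,    # only takes effect because gpu_new_syntax is True
    },
}

print(json.dumps(lsf_resources, indent=4))
```

custom_gpu_line is omitted here, so it keeps its default of None.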
- user_forward_files:#
- type:
list, optionalargument path:simplify_mdata/fp/user_forward_filesFiles to be forwarded to the remote machine.
- user_backward_files:#
- type:
list, optionalargument path:simplify_mdata/fp/user_backward_filesFiles to be transferred back from the remote machine.
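Putting the fp entries together, a minimal local-run fp section could look like the sketch below. The command string and file names are hypothetical, and the train and model_devi sections that a complete machine file also contains are omitted for brevity.

```python
import json

# Hypothetical command and file names; key names follow the fp entries above.
fp = {
    "command": "mpirun -n 16 vasp_std",             # hypothetical command
    "machine": {
        "batch_type": "Shell",
        "context_type": "LazyLocalContext",         # remote_root is not needed for lazy-local
        "local_root": "./",
    },
    "resources": {"group_size": 1},                 # group_size is the only required field
    "user_forward_files": ["vdw_kernel.bindat"],    # hypothetical extra input file
    "user_backward_files": ["OUTCAR"],              # hypothetical extra output file
}

# Partial machine file: only api_version and fp are shown.
simplify_mdata = {"api_version": "1.0", "fp": fp}

text = json.dumps(simplify_mdata, indent=4)
print(text)
```

Writing text to a file produces a JSON document with the same nesting as the argument paths shown in this reference, for example simplify_mdata/fp/machine/context_type.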