Submit from JSON file


DPDispatcher can submit a submission from a JSON file:

dpdisp submit submission.json

The JSON file must contain the submission configuration. An example of the JSON file is shown below.

{
    "work_base": "0_md/",
    "machine": {
        "batch_type": "Shell",
        "local_root": "./",
        "context_type": "LazyLocalContext"
    },
    "resources": {
        "number_node": 1,
        "cpu_per_node": 1,
        "gpu_per_node": 0,
        "queue_name": "",
        "group_size": 1
    },
    "forward_common_files": [],
    "backward_common_files": [],
    "task_list": [
        {
            "command": "echo hello",
            "task_work_path": "task1/",
            "forward_files": [],
            "backward_files": [],
            "outlog": "log",
            "errlog": "err"
        }
    ]
}
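When the task list is generated programmatically, a file like this can be written with Python's standard json module. The following is a minimal sketch that uses only the keys from the example; the helper name build_submission and the temporary output location are illustrative assumptions, not part of DPDispatcher.

```python
import json
import tempfile
from pathlib import Path


def build_submission(task_dirs, command="echo hello"):
    """Return a submission dict with one task per directory.

    build_submission is a hypothetical helper, not a DPDispatcher API.
    """
    return {
        "work_base": "0_md/",
        "machine": {
            "batch_type": "Shell",
            "local_root": "./",
            "context_type": "LazyLocalContext",
        },
        "resources": {
            "number_node": 1,
            "cpu_per_node": 1,
            "gpu_per_node": 0,
            "queue_name": "",
            "group_size": 1,
        },
        "forward_common_files": [],
        "backward_common_files": [],
        "task_list": [
            {
                "command": command,
                "task_work_path": task_dir,
                "forward_files": [],
                "backward_files": [],
                "outlog": "log",
                "errlog": "err",
            }
            for task_dir in task_dirs
        ],
    }


submission = build_submission(["task1/", "task2/"])

# Written to a temporary directory here; in practice, choose your project dir.
out_path = Path(tempfile.mkdtemp()) / "submission.json"
out_path.write_text(json.dumps(submission, indent=4))

# The resulting file can then be passed to: dpdisp submit <path>
```

The generated file has the same layout as the example, with one task_list entry per directory.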

The JSON entries for submission are defined as follows:

submission:#
type: dict
argument path: submission

Submission configuration

work_base:#
type: str
argument path: submission/work_base

Base directory for the work

forward_common_files:#
type: typing.List[str], optional, default: []
argument path: submission/forward_common_files

Common files to forward to the remote machine

backward_common_files:#
type: typing.List[str], optional, default: []
argument path: submission/backward_common_files

Common files to download back from the remote machine

machine:#
type: dict
argument path: submission/machine

Machine configuration. See related documentation for details.

batch_type:#
type: str
argument path: submission/machine/batch_type

The batch job system type. Options: Torque, PBS, OpenAPI, DistributedShell, LSF, Slurm, SGE, Fugaku, SlurmJobArray, Bohrium, Shell, JH_UniScheduler

local_root:#
type: str | NoneType
argument path: submission/machine/local_root

The directory where the tasks and related files are located. Typically the project directory.

remote_root:#
type: str | NoneType, optional
argument path: submission/machine/remote_root

The directory where the tasks are executed on the remote machine. Only needed when the context is not lazy-local.

clean_asynchronously:#
type: bool, optional, default: False
argument path: submission/machine/clean_asynchronously

Clean the remote directory asynchronously after the job finishes.

retry_count:#
type: int, optional, default: 3
argument path: submission/machine/retry_count

Number of retries to resubmit failed jobs.

Depending on the value of context_type, different sub args are accepted.

context_type:#

The connection used to reach the remote machine. Options: LocalContext, LazyLocalContext, SSHContext, BohriumContext, HDFSContext, OpenAPIContext

When context_type is set to OpenAPIContext (or its aliases openapicontext, OpenAPI, openapi):

remote_profile:#
type: dict, optional
argument path: submission/machine[OpenAPIContext]/remote_profile

The information used to maintain the connection with remote machine. This field is empty for this context.

When context_type is set to LocalContext (or its aliases localcontext, Local, local):

remote_profile:#
type: dict, optional
argument path: submission/machine[LocalContext]/remote_profile

The information used to maintain the local machine.

When context_type is set to HDFSContext (or its aliases hdfscontext, HDFS, hdfs):

remote_profile:#
type: dict, optional
argument path: submission/machine[HDFSContext]/remote_profile

The information used to maintain the connection with remote machine. This field is empty for this context.

When context_type is set to SSHContext (or its aliases sshcontext, SSH, ssh):

remote_profile:#
type: dict
argument path: submission/machine[SSHContext]/remote_profile

The information used to maintain the connection with remote machine.

hostname:#
type: str
argument path: submission/machine[SSHContext]/remote_profile/hostname

hostname or ip of ssh connection.

username:#
type: str
argument path: submission/machine[SSHContext]/remote_profile/username

username of target linux system

password:#
type: str, optional
argument path: submission/machine[SSHContext]/remote_profile/password

(deprecated) password of linux system. Please use SSH keys instead to improve security.

port:#
type: int, optional, default: 22
argument path: submission/machine[SSHContext]/remote_profile/port

ssh connection port.

key_filename:#
type: str | NoneType, optional, default: None
argument path: submission/machine[SSHContext]/remote_profile/key_filename

Key filename used by the SSH connection. If left as None, a key is looked up in ~/.ssh or the password is used for login.

passphrase:#
type: str | NoneType, optional, default: None
argument path: submission/machine[SSHContext]/remote_profile/passphrase

passphrase of key used by ssh connection

timeout:#
type: int, optional, default: 10
argument path: submission/machine[SSHContext]/remote_profile/timeout

timeout of ssh connection

totp_secret:#
type: str | NoneType, optional, default: None
argument path: submission/machine[SSHContext]/remote_profile/totp_secret

Time-based one time password secret. It should be a base32-encoded string extracted from the 2D code.
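Because totp_secret must be a base32-encoded string, it can be worth checking the value before putting it in the JSON file. The sketch below uses only Python's standard base64 module; is_base32_secret is a hypothetical helper, and the check says nothing about whether the server will accept the secret.

```python
import base64


def is_base32_secret(secret: str) -> bool:
    """Return True if `secret` decodes as base32 (hypothetical helper)."""
    cleaned = secret.replace(" ", "").upper()
    if not cleaned:
        return False
    # b32decode expects input padded with '=' to a multiple of 8 characters.
    padded = cleaned + "=" * (-len(cleaned) % 8)
    try:
        base64.b32decode(padded)
    except ValueError:
        # binascii.Error (bad digit or padding) is a subclass of ValueError.
        return False
    return True
```

The secret JBSWY3DPEHPK3PXP used in the test is a commonly seen demonstration value, not a real credential.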

tar_compress:#
type: bool, optional, default: True
argument path: submission/machine[SSHContext]/remote_profile/tar_compress

The archive will be compressed in upload and download if it is True. If not, compression will be skipped.

look_for_keys:#
type: bool, optional, default: True
argument path: submission/machine[SSHContext]/remote_profile/look_for_keys

enable searching for discoverable private key files in ~/.ssh/

execute_command:#
type: str | NoneType, optional, default: None
argument path: submission/machine[SSHContext]/remote_profile/execute_command

execute command after ssh connection is established.

proxy_command:#
type: str | NoneType, optional, default: None
argument path: submission/machine[SSHContext]/remote_profile/proxy_command

ProxyCommand to use for SSH connection through intermediate servers.
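As an illustration of how the SSHContext keys fit together, the following sketch builds a machine entry that authenticates with an SSH key rather than the deprecated password field. The host name, user name, and paths are placeholders invented for this example, and the choice of Slurm as batch_type is likewise only an assumption.

```python
import json

# Every concrete value below is a placeholder; substitute your own details.
ssh_machine = {
    "batch_type": "Slurm",
    "context_type": "SSHContext",
    "local_root": "./",
    "remote_root": "/path/on/remote/work",  # hypothetical remote directory
    "remote_profile": {
        "hostname": "cluster.example.org",  # hypothetical host
        "username": "alice",  # hypothetical user
        "port": 22,  # documented default
        "key_filename": "/home/alice/.ssh/id_ed25519",  # hypothetical key path
        "timeout": 10,  # documented default
    },
}

# This dict would be the value of the "machine" key in the submission file.
print(json.dumps(ssh_machine, indent=4))
```

Optional keys such as passphrase, totp_secret, or proxy_command can be added to remote_profile in the same way when the cluster requires them.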

When context_type is set to LazyLocalContext (or its aliases lazylocalcontext, LazyLocal, lazylocal):

remote_profile:#
type: dict, optional
argument path: submission/machine[LazyLocalContext]/remote_profile

The information used to maintain the connection with remote machine. This field is empty for this context.

When context_type is set to BohriumContext (or its aliases bohriumcontext, Bohrium, bohrium, DpCloudServerContext, dpcloudservercontext, DpCloudServer, dpcloudserver, LebesgueContext, lebesguecontext, Lebesgue, lebesgue):

remote_profile:#
type: dict
argument path: submission/machine[BohriumContext]/remote_profile

The information used to maintain the connection with remote machine.

email:#
type: str, optional
argument path: submission/machine[BohriumContext]/remote_profile/email

Email

password:#
type: str, optional
argument path: submission/machine[BohriumContext]/remote_profile/password

Password

program_id:#
type: int, alias: project_id
argument path: submission/machine[BohriumContext]/remote_profile/program_id

Program ID

retry_count:#
type: NoneType | int, optional, default: 2
argument path: submission/machine[BohriumContext]/remote_profile/retry_count

The retry count when a job is terminated

ignore_exit_code:#
type: bool, optional, default: True
argument path: submission/machine[BohriumContext]/remote_profile/ignore_exit_code

When set to True, a job is marked as finished even if its exit code is non-zero. Otherwise, such a job is designated as terminated.

keep_backup:#
type: bool, optional
argument path: submission/machine[BohriumContext]/remote_profile/keep_backup

Keep the downloaded and uploaded zip files

input_data:#
type: dict
argument path: submission/machine[BohriumContext]/remote_profile/input_data

Configuration of job

resources:#
type: dict
argument path: submission/resources

Resources configuration. See related documentation for details.

number_node:#
type: int, optional, default: 1
argument path: submission/resources/number_node

The number of nodes required for each job.

cpu_per_node:#
type: int, optional, default: 1
argument path: submission/resources/cpu_per_node

The number of CPUs on each node assigned to each job.

gpu_per_node:#
type: int, optional, default: 0
argument path: submission/resources/gpu_per_node

The number of GPUs on each node assigned to each job.

queue_name:#
type: str, optional, default: (empty string)
argument path: submission/resources/queue_name

The queue name of batch job scheduler system.

group_size:#
type: int
argument path: submission/resources/group_size

The number of tasks in a job. 0 means infinity.
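To make the effect of group_size concrete, the sketch below splits a task list into jobs of at most group_size tasks, treating 0 as "all tasks in one job". This only illustrates the documented meaning; group_tasks is a hypothetical helper, and DPDispatcher's own grouping may order or distribute tasks differently.

```python
def group_tasks(tasks, group_size):
    """Split tasks into jobs of at most group_size tasks each.

    A group_size of 0 places every task in a single job.
    Hypothetical helper for illustration, not a DPDispatcher API.
    """
    if group_size < 0:
        raise ValueError("group_size must be non-negative")
    if not tasks:
        return []
    if group_size == 0:
        return [list(tasks)]
    return [
        list(tasks[start:start + group_size])
        for start in range(0, len(tasks), group_size)
    ]


tasks = [f"task{i}/" for i in range(1, 6)]  # five placeholder task paths
jobs = group_tasks(tasks, 2)  # three jobs holding 2, 2, and 1 tasks
```

With five tasks, group_size 1 would give five jobs, while group_size 0 would give a single job containing all five.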

custom_flags:#
type: typing.List[str], optional
argument path: submission/resources/custom_flags

Extra lines passed to the header of the job submission script

strategy:#
type: dict, optional
argument path: submission/resources/strategy

Strategies used to generate job submission scripts.

if_cuda_multi_devices:#
type: bool, optional, default: False
argument path: submission/resources/strategy/if_cuda_multi_devices

Use this option when there are multiple NVIDIA GPUs on the node and the tasks should be assigned to different GPUs. If true, dpdispatcher will manually export the environment variable CUDA_VISIBLE_DEVICES for each task. Usually, this option is used together with the Task.task_need_resources variable.

ratio_unfinished:#
type: float, optional, default: 0.0
argument path: submission/resources/strategy/ratio_unfinished

The ratio of tasks that can be unfinished.

customized_script_header_template_file:#
type: str, optional
argument path: submission/resources/strategy/customized_script_header_template_file

The customized template file to generate job submitting script header, which overrides the default file.

para_deg:#
type: int, optional, default: 1
argument path: submission/resources/para_deg

Decide how many tasks will be run in parallel.

source_list:#
type: typing.List[str], optional, default: []
argument path: submission/resources/source_list

The env file to be sourced before the command execution.

module_purge:#
type: bool, optional, default: False
argument path: submission/resources/module_purge

Remove all modules on HPC system before module load (module_list)

module_unload_list:#
type: typing.List[str], optional, default: []
argument path: submission/resources/module_unload_list

The modules to be unloaded on HPC system before submitting jobs

module_list:#
type: typing.List[str], optional, default: []
argument path: submission/resources/module_list

The modules to be loaded on HPC system before submitting jobs

envs:#
type: dict, optional, default: {}
argument path: submission/resources/envs

The environment variables to be exported before submitting jobs
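The environment-related keys (module_purge, module_unload_list, module_list, source_list, envs) are easier to reason about when viewed as the shell lines they stand for. The sketch below renders such lines from a resources dict. It is an illustration only: render_env_lines is a hypothetical helper, the module and file names are placeholders, and the exact text and ordering that DPDispatcher writes into its job scripts may differ.

```python
def render_env_lines(resources):
    """Render illustrative shell lines for the environment-related keys.

    Hypothetical helper; not the template DPDispatcher itself uses.
    """
    lines = []
    if resources.get("module_purge", False):
        lines.append("module purge")
    for module in resources.get("module_unload_list", []):
        lines.append(f"module unload {module}")
    for module in resources.get("module_list", []):
        lines.append(f"module load {module}")
    for env_file in resources.get("source_list", []):
        lines.append(f"source {env_file}")
    for name, value in resources.get("envs", {}).items():
        lines.append(f"export {name}={value}")
    return lines


# Placeholder values chosen for the example.
resources_example = {
    "module_purge": True,
    "module_list": ["gcc"],
    "source_list": ["env.sh"],
    "envs": {"OMP_NUM_THREADS": 4},
}
env_lines = render_env_lines(resources_example)
```

Purging comes first in this sketch because module_purge is documented as running before the module load step.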

prepend_script:#
type: typing.List[str], optional, default: []
argument path: submission/resources/prepend_script

Optional script to run before jobs are submitted.

append_script:#
type: typing.List[str], optional, default: []
argument path: submission/resources/append_script

Optional script to run after jobs are submitted.

wait_time:#
type: float | int, optional, default: 0
argument path: submission/resources/wait_time

The waiting time in seconds after a single task is submitted

kwargs:#
type: dict, optional
argument path: submission/resources/kwargs

Vary by different machines.

batch_type:#
type: str, optional
argument path: submission/resources/batch_type

This key is allowed when strict checking is enabled.

task_list:#
type: list
argument path: submission/task_list

List of tasks to execute.

This argument takes a list with each element containing the following:

command:#
type: str
argument path: submission/task_list/command

The command to be executed for this task. The expected return code is 0.

task_work_path:#
type: str
argument path: submission/task_list/task_work_path

The directory in which the command is executed.

forward_files:#
type: typing.List[str], optional, default: []
argument path: submission/task_list/forward_files

The files in task_work_path to be uploaded before the task is executed.

backward_files:#
type: typing.List[str], optional, default: []
argument path: submission/task_list/backward_files

The files to be downloaded to local_root, in task_work_path, after the task has finished

outlog:#
type: str | NoneType, optional, default: log
argument path: submission/task_list/outlog

The name of the output log file, to which stdout is redirected

errlog:#
type: str | NoneType, optional, default: err
argument path: submission/task_list/errlog

The name of the error log file, to which stderr is redirected
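The task entries above have two required keys (command, task_work_path) and four optional keys with defaults. The sketch below shows one way to check an entry and fill in those documented defaults before writing the JSON file. DPDispatcher performs its own argument checking; normalize_task is a hypothetical helper used only to make the required/optional split explicit.

```python
import copy

# Defaults as listed in the argument reference for task_list entries.
TASK_DEFAULTS = {
    "forward_files": [],
    "backward_files": [],
    "outlog": "log",
    "errlog": "err",
}
REQUIRED_TASK_KEYS = ("command", "task_work_path")


def normalize_task(task):
    """Return a copy of `task` with documented defaults filled in.

    Raises KeyError if a required key is missing. Hypothetical helper.
    """
    missing = [key for key in REQUIRED_TASK_KEYS if key not in task]
    if missing:
        raise KeyError(f"missing required task keys: {missing}")
    # Deep-copy so that list defaults are never shared between tasks.
    normalized = copy.deepcopy(TASK_DEFAULTS)
    normalized.update(copy.deepcopy(task))
    return normalized


task = normalize_task({"command": "echo hello", "task_work_path": "task1/"})
```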

Options#

  • --dry-run: Only upload files without submitting.

  • --exit-on-submit: Exit after submitting without waiting for completion.

  • --allow-ref: Allow loading external JSON/YAML snippets through $ref (disabled by default for security).
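When dpdisp is driven from a script, the options above can be assembled into an argument list. The sketch below only builds and prints the command; build_submit_command is a hypothetical helper, and placing the options before the file name is an assumption about the command-line parser rather than something stated here.

```python
import shlex


def build_submit_command(json_path, dry_run=False, exit_on_submit=False,
                         allow_ref=False):
    """Build the argument list for `dpdisp submit` (hypothetical helper)."""
    cmd = ["dpdisp", "submit"]
    if dry_run:
        cmd.append("--dry-run")
    if exit_on_submit:
        cmd.append("--exit-on-submit")
    if allow_ref:
        cmd.append("--allow-ref")
    cmd.append(json_path)
    return cmd


cmd = build_submit_command("submission.json", dry_run=True)
print(shlex.join(cmd))  # prints: dpdisp submit --dry-run submission.json

# To actually run it (requires dpdisp to be installed and on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

A dry run like this one is a convenient first step, since it uploads the files without submitting any job.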