Submit from JSON file

DPDispatcher can submit a submission from a JSON file:

dpdisp submit submission.json

The JSON file must contain the submission configuration. An example of the JSON file is shown below.

{
    "work_base": "0_md/",
    "machine": {
        "batch_type": "Shell",
        "local_root": "./",
        "context_type": "LazyLocalContext"
    },
    "resources": {
        "number_node": 1,
        "cpu_per_node": 1,
        "gpu_per_node": 0,
        "queue_name": "",
        "group_size": 1
    },
    "forward_common_files": [],
    "backward_common_files": [],
    "task_list": [
        {
            "command": "echo hello",
            "task_work_path": "task1/",
            "forward_files": [],
            "backward_files": [],
            "outlog": "log",
            "errlog": "err"
        }
    ]
}
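
When a submission has many similar tasks, generating the JSON file can be easier than writing it by hand. The following is a minimal sketch using only the Python standard library. It reproduces the structure of the example above; the helper make_task, the three-task loop and the directory names task0/, task1/, task2/ are illustrative assumptions.

```python
import json
from pathlib import Path


def make_task(index: int) -> dict:
    # Hypothetical helper: one task entry per directory taskN/ under work_base.
    return {
        "command": "echo hello",
        "task_work_path": f"task{index}/",
        "forward_files": [],
        "backward_files": [],
        "outlog": "log",
        "errlog": "err",
    }


submission = {
    "work_base": "0_md/",
    "machine": {
        "batch_type": "Shell",
        "local_root": "./",
        "context_type": "LazyLocalContext",
    },
    "resources": {
        "number_node": 1,
        "cpu_per_node": 1,
        "gpu_per_node": 0,
        "queue_name": "",
        "group_size": 1,
    },
    "forward_common_files": [],
    "backward_common_files": [],
    "task_list": [make_task(i) for i in range(3)],
}

# Write the file consumed by `dpdisp submit submission.json`.
Path("submission.json").write_text(json.dumps(submission, indent=4))
```

The script only writes the configuration; the task directories under 0_md/ still have to be created separately before submitting.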

The JSON entries for submission are defined as follows:

submission:
type: dict
argument path: submission

Submission configuration

work_base:
type: str
argument path: submission/work_base

Base directory for the work, given relative to machine.local_root. Use a relative path: an absolute path is not combined with machine.local_root and is used as-is instead.

forward_common_files:
type: typing.List[str], optional, default: []
argument path: submission/forward_common_files

Files shared by all tasks and uploaded from work_base before execution.

backward_common_files:
type: typing.List[str], optional, default: []
argument path: submission/backward_common_files

Files shared by all tasks and downloaded back to work_base after execution.

machine:
type: dict
argument path: submission/machine

Machine configuration. See related documentation for details.

batch_type:
type: str
argument path: submission/machine/batch_type

Batch backend used to execute jobs. Options: Shell, PBS, DistributedShell, JH_UniScheduler, SlurmJobArray, LSF, OpenAPI, SGE, Torque, Slurm, Bohrium, Fugaku.

local_root:
type: str | NoneType
argument path: submission/machine/local_root

Local project root used by DPDispatcher to find task directories and local files. If submission.work_base is a relative path, it is resolved inside this directory; if submission.work_base is absolute, it is used as-is and local_root is ignored.
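
To see how the two settings combine, the sketch below assumes DPDispatcher joins local_root and work_base the way Python's os.path.join does. This is an assumption for illustration, and resolve_work_base is a hypothetical helper, not part of DPDispatcher. With os.path.join, an absolute later component discards the earlier ones, which matches the behavior described above.

```python
import os


def resolve_work_base(local_root: str, work_base: str) -> str:
    # os.path.join drops local_root when work_base is absolute;
    # normpath removes "./" and trailing slashes for readability.
    return os.path.normpath(os.path.join(local_root, work_base))


print(resolve_work_base("./", "0_md/"))
print(resolve_work_base("/home/alice/project", "0_md/"))
print(resolve_work_base("/home/alice/project", "/scratch/run"))
```

The paths /home/alice/project and /scratch/run are placeholders.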

remote_root:
type: str | NoneType, optional
argument path: submission/machine/remote_root

Remote root directory used by non-local contexts such as SSH. DPDispatcher creates and uses a submission-specific working directory beneath this root on the remote side. For SSHContext, this path should be absolute.

clean_asynchronously:
type: bool, optional, default: False
argument path: submission/machine/clean_asynchronously

Clean the remote working directory asynchronously after the job finishes. Avoid enabling this while debugging, because it can remove remote artifacts before you inspect them.

retry_count:
type: int, optional, default: 3
argument path: submission/machine/retry_count

How many times DPDispatcher will retry a failed job before raising an error.

Depending on the value of context_type, different sub-arguments are accepted.

context_type:
type: str
argument path: submission/machine/context_type

Execution context (connection type) used to reach the execution environment. Options: LazyLocalContext, SSHContext, BohriumContext, OpenAPIContext, LocalContext, HDFSContext.

When context_type is set to BohriumContext (or its aliases bohriumcontext, Bohrium, bohrium, DpCloudServerContext, dpcloudservercontext, DpCloudServer, dpcloudserver, LebesgueContext, lebesguecontext, Lebesgue, lebesgue):

remote_profile:
type: dict
argument path: submission/machine[BohriumContext]/remote_profile

Configuration for Bohrium submission, including login credentials, project selection, and job-handling behavior.

email:
type: str, optional
argument path: submission/machine[BohriumContext]/remote_profile/email

Email address used to log in to Bohrium.

password:
type: str, optional
argument path: submission/machine[BohriumContext]/remote_profile/password

Password used together with email or phone login. If BOHR_TICKET is set, password-based login can be skipped.

phone:
type: str, optional
argument path: submission/machine[BohriumContext]/remote_profile/phone

Phone number used to log in when email is not used.

program_id:
type: int, alias: project_id
argument path: submission/machine[BohriumContext]/remote_profile/program_id

Program (project) ID used to place uploaded jobs under the correct Bohrium project namespace.

retry_count:
type: NoneType | int, optional, default: 2
argument path: submission/machine[BohriumContext]/remote_profile/retry_count

How many times a terminated remote job is retried on the platform side before giving up.

ignore_exit_code:
type: bool, optional, default: True
argument path: submission/machine[BohriumContext]/remote_profile/ignore_exit_code

Whether a job that exits with a non-zero code on the remote platform is still treated as finished. If False, such jobs are marked as terminated.
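
The rule above can be restated as a small function. job_status is hypothetical and not part of DPDispatcher; it only encodes the two outcomes described for ignore_exit_code.

```python
def job_status(exit_code: int, ignore_exit_code: bool = True) -> str:
    # A zero exit code always counts as finished; a non-zero one only
    # counts as finished when ignore_exit_code is True.
    if exit_code == 0 or ignore_exit_code:
        return "finished"
    return "terminated"


print(job_status(1))                          # default ignore_exit_code=True
print(job_status(1, ignore_exit_code=False))
```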

keep_backup:
type: bool, optional
argument path: submission/machine[BohriumContext]/remote_profile/keep_backup

Whether to keep uploaded/downloaded zip archives in the local backup directory after transfer.

input_data:
type: dict
argument path: submission/machine[BohriumContext]/remote_profile/input_data

Platform-specific job configuration passed through to the Bohrium API.

When context_type is set to LocalContext (or its aliases localcontext, Local, local):

remote_profile:
type: dict, optional
argument path: submission/machine[LocalContext]/remote_profile

Options controlling how files are staged between local_root and remote_root when both paths are on the local filesystem.

When context_type is set to HDFSContext (or its aliases hdfscontext, HDFS, hdfs):

remote_profile:
type: dict, optional
argument path: submission/machine[HDFSContext]/remote_profile

The information used to maintain the connection with the remote machine. This field is empty for this context.

When context_type is set to SSHContext (or its aliases sshcontext, SSH, ssh):

remote_profile:
type: dict
argument path: submission/machine[SSHContext]/remote_profile

SSH connection settings for the remote machine, including authentication, timeouts, and optional proxy/jump-host behavior.

hostname:
type: str
argument path: submission/machine[SSHContext]/remote_profile/hostname

Hostname or IP address of the SSH target machine.

username:
type: str
argument path: submission/machine[SSHContext]/remote_profile/username

Username used to log in to the target system.

password:
type: str, optional
argument path: submission/machine[SSHContext]/remote_profile/password

(Deprecated) Password of the remote Linux account. Use SSH keys instead for better security.

port:
type: int, optional, default: 22
argument path: submission/machine[SSHContext]/remote_profile/port

SSH port of the target machine. Usually 22.

key_filename:
type: str | NoneType, optional, default: None
argument path: submission/machine[SSHContext]/remote_profile/key_filename

Path to the private key file used for SSH authentication. If left as None, DPDispatcher can try discoverable keys in ~/.ssh, or fall back to password-based login if a password is configured.

passphrase:
type: str | NoneType, optional, default: None
argument path: submission/machine[SSHContext]/remote_profile/passphrase

Passphrase for the SSH private key, if the key is encrypted.

timeout:
type: int, optional, default: 10
argument path: submission/machine[SSHContext]/remote_profile/timeout

Timeout in seconds for establishing the SSH connection.

totp_secret:
type: str | NoneType, optional, default: None
argument path: submission/machine[SSHContext]/remote_profile/totp_secret

Time-based one-time password (TOTP) secret used for keyboard-interactive two-factor authentication. It should be a base32-encoded string.
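
To check a secret before putting it in the configuration, you can compute the current code yourself and compare it with your authenticator app. The sketch below is a standard-library implementation of the RFC 6238 algorithm with its common defaults (HMAC-SHA1, 30-second steps, 6 digits). It is not DPDispatcher's own code, and the assumption that your SSH server uses these defaults should be verified with your site.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, for_time: Optional[float] = None,
         digits: int = 6, period: int = 30) -> str:
    """Return the RFC 6238 TOTP code (HMAC-SHA1) for a base32-encoded secret."""
    secret = secret_b32.replace(" ", "").upper()
    secret += "=" * (-len(secret) % 8)  # authenticator secrets often omit base32 padding
    key = base64.b32decode(secret)
    now = time.time() if for_time is None else for_time
    counter = int(now // period)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226, section 5.3)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Calling totp(secret) without a time argument returns the code for the current 30-second window.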

tar_compress:
type: bool, optional, default: True
argument path: submission/machine[SSHContext]/remote_profile/tar_compress

Whether upload/download tar archives are compressed. Keeping this True usually reduces transfer size at the cost of extra CPU time.

look_for_keys:
type: bool, optional, default: True
argument path: submission/machine[SSHContext]/remote_profile/look_for_keys

Whether to search for discoverable private key files in ~/.ssh when key_filename is not provided.

execute_command:
type: str | NoneType, optional, default: None
argument path: submission/machine[SSHContext]/remote_profile/execute_command

Optional command executed immediately after the SSH connection is established.

proxy_command:
type: str | NoneType, optional, default: None
argument path: submission/machine[SSHContext]/remote_profile/proxy_command

Optional SSH ProxyCommand used to reach the target through an intermediate host or tunnel.
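
Putting the SSHContext fields together, a machine section for a Slurm cluster reached over SSH might look like the following. It is written as a Python dict dumped to JSON to match the other examples; the host name, user name and paths are placeholders, and only keys documented on this page are used.

```python
import json

machine = {
    "batch_type": "Slurm",
    "context_type": "SSHContext",
    "local_root": "./",
    # Placeholder path; for SSHContext, remote_root should be absolute.
    "remote_root": "/home/alice/dpdispatcher_work",
    "remote_profile": {
        "hostname": "login.cluster.example.org",  # placeholder host
        "username": "alice",                      # placeholder user
        "port": 22,
        "key_filename": "/home/alice/.ssh/id_ed25519",  # placeholder key path
        "timeout": 10,
    },
}

print(json.dumps({"machine": machine}, indent=4))
```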

When context_type is set to OpenAPIContext (or its aliases openapicontext, OpenAPI, openapi):

remote_profile:
type: dict, optional
argument path: submission/machine[OpenAPIContext]/remote_profile

The information used to maintain the connection with the remote machine. This field is empty for this context.

When context_type is set to LazyLocalContext (or its aliases lazylocalcontext, LazyLocal, lazylocal):

remote_profile:
type: dict, optional
argument path: submission/machine[LazyLocalContext]/remote_profile

The information used to maintain the connection with the remote machine. This field is empty for this context.

resources:
type: dict
argument path: submission/resources

Resources configuration. See related documentation for details.

number_node:
type: int, optional, default: 1
argument path: submission/resources/number_node

Number of nodes requested for each scheduler job generated by DPDispatcher.

cpu_per_node:
type: int, optional, default: 1
argument path: submission/resources/cpu_per_node

Number of CPUs requested on each node for each scheduler job.

gpu_per_node:
type: int, optional, default: 0
argument path: submission/resources/gpu_per_node

Number of GPUs requested on each node for each scheduler job.

queue_name:
type: str, optional, default: (empty string)
argument path: submission/resources/queue_name

Queue or partition name used by the selected batch system. For local Shell runs this is usually an empty string; for Slurm it typically maps to a partition.

group_size:
type: int
argument path: submission/resources/group_size

How many tasks are packed into one scheduler job. For example, 20 tasks with group_size=5 are typically split into 4 jobs. Use 1 for the simplest one-task workflow. 0 means no explicit upper limit in the grouping logic.
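
The grouping arithmetic can be sketched as follows. group_tasks is a hypothetical helper; reading group_size=0 as "all tasks in one job" is an assumption based on the "no explicit upper limit" wording above, and DPDispatcher's own grouping may order or split tasks differently.

```python
from typing import List, Sequence


def group_tasks(tasks: Sequence[str], group_size: int) -> List[List[str]]:
    """Split tasks into jobs of at most group_size tasks; 0 means no upper limit."""
    tasks = list(tasks)
    if group_size == 0:
        # Assumed reading of "no explicit upper limit": every task in one job.
        return [tasks] if tasks else []
    return [tasks[i:i + group_size] for i in range(0, len(tasks), group_size)]


tasks = [f"task{i}" for i in range(20)]
print(len(group_tasks(tasks, 5)))
print(len(group_tasks(tasks, 1)))
print(len(group_tasks(tasks, 0)))
```

When the task count is not a multiple of group_size, the last job simply holds the remainder.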

custom_flags:
type: typing.List[str], optional
argument path: submission/resources/custom_flags

Extra scheduler-header lines inserted into the generated submission script, typically for backend-specific options that are not covered by the standard fields.
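
Because these entries become header lines, each one should be a complete directive for the chosen scheduler. For Slurm, sbatch only treats lines beginning with "#SBATCH" as directives, so a small check like the hypothetical check_custom_flags below can catch typos before submission. The partition name and the specific options are placeholders.

```python
from typing import Sequence

resources = {
    "number_node": 1,
    "cpu_per_node": 4,
    "gpu_per_node": 0,
    "queue_name": "cpu",  # placeholder Slurm partition name
    "group_size": 5,
    "custom_flags": [
        "#SBATCH --mem=4G",
        "#SBATCH --time=01:00:00",
    ],
}


def check_custom_flags(flags: Sequence[str], prefix: str = "#SBATCH") -> None:
    # Lines without the directive prefix would not be read as Slurm options.
    bad = [flag for flag in flags if not flag.startswith(prefix)]
    if bad:
        raise ValueError(f"not {prefix} directives: {bad}")


check_custom_flags(resources["custom_flags"])
```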

strategy:
type: dict, optional
argument path: submission/resources/strategy

Strategy options that affect how DPDispatcher generates and evaluates submission scripts.

if_cuda_multi_devices:
type: bool, optional, default: False
argument path: submission/resources/strategy/if_cuda_multi_devices

If a node has multiple NVIDIA GPUs, assign different tasks inside the same job to different GPUs by setting CUDA_VISIBLE_DEVICES automatically. Usually used together with para_deg > 1 and task-level resource awareness.
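
As a rough model of the per-task assignment, the sketch below assumes devices are handed out round-robin by task index within a job. Both this assumption and the exact form of the generated lines are illustrative; cuda_export_lines is a hypothetical helper.

```python
from typing import List


def cuda_export_lines(n_tasks: int, gpu_per_node: int) -> List[str]:
    """One export line per task, cycling through the node's GPUs (assumed round-robin)."""
    return [f"export CUDA_VISIBLE_DEVICES={i % gpu_per_node}" for i in range(n_tasks)]


for line in cuda_export_lines(n_tasks=4, gpu_per_node=2):
    print(line)
```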

ratio_unfinished:
type: float, optional, default: 0.0
argument path: submission/resources/strategy/ratio_unfinished

Maximum fraction of tasks allowed to remain unfinished when evaluating job completion. Use 0.0 for the strict default that requires every task to finish.
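
The threshold can be read as a simple comparison between the unfinished fraction and ratio_unfinished. job_complete is a hypothetical helper that restates that reading; it is not DPDispatcher's implementation.

```python
def job_complete(n_finished: int, n_total: int, ratio_unfinished: float = 0.0) -> bool:
    """True if the fraction of unfinished tasks does not exceed ratio_unfinished."""
    if n_total == 0:
        return True
    return (n_total - n_finished) / n_total <= ratio_unfinished


print(job_complete(9, 10))        # strict default: one unfinished task is too many
print(job_complete(9, 10, 0.1))   # up to 10% may remain unfinished
```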

customized_script_header_template_file:
type: str, optional
argument path: submission/resources/strategy/customized_script_header_template_file

Custom template file for the scheduler-header portion of generated submission scripts. Overrides the default template.

para_deg:
type: int, optional, default: 1
argument path: submission/resources/para_deg

How many tasks inside one generated job are run in parallel. This is different from group_size: group_size controls how many tasks are bundled into a job, while para_deg controls concurrency within that job. Keep para_deg=1 for the safest default.
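
The difference between group_size and para_deg can be illustrated with a conceptual model: one job holds several tasks, and at most para_deg of them run at the same time. The sketch uses a thread pool purely for illustration; it does not reproduce the shell script DPDispatcher generates, and run_job is a hypothetical helper.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_job(task_commands: List[str], para_deg: int = 1) -> Tuple[List[str], int]:
    """Run one job's tasks with at most para_deg in flight; return results and peak concurrency."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def run_task(cmd: str) -> str:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for executing the real task command
        with lock:
            running -= 1
        return cmd

    with ThreadPoolExecutor(max_workers=para_deg) as pool:
        results = list(pool.map(run_task, task_commands))
    return results, peak


# One job holding 6 tasks (as with group_size=6), run 2 at a time (para_deg=2).
results, peak = run_job([f"task{i}" for i in range(6)], para_deg=2)
print(results, peak)
```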

source_list:
type: typing.List[str], optional, default: []
argument path: submission/resources/source_list

Shell scripts or environment files sourced before task commands run. Useful on HPC systems for activating software stacks explicitly instead of relying on login-shell defaults.

module_purge:
type: bool, optional, default: False
argument path: submission/resources/module_purge

Whether to run ‘module purge’ before applying module_unload_list and module_list. Mainly useful on HPC systems.

module_unload_list:
type: typing.List[str], optional, default: []
argument path: submission/resources/module_unload_list

Modules to unload before loading the requested modules. Mainly relevant on HPC systems with environment modules.

module_list:
type: typing.List[str], optional, default: []
argument path: submission/resources/module_list

Modules to load before executing tasks. Mainly relevant on HPC systems with environment modules.
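
The order described for module_purge, module_unload_list and module_list (purge, then unload, then load) can be pictured as the shell lines a job script would contain. module_lines is a hypothetical helper, the module names are placeholders, and the exact lines DPDispatcher writes may differ.

```python
from typing import List, Sequence


def module_lines(module_purge: bool = False,
                 module_unload_list: Sequence[str] = (),
                 module_list: Sequence[str] = ()) -> List[str]:
    # Purge first, then unload, then load, following the descriptions above.
    lines = ["module purge"] if module_purge else []
    lines += [f"module unload {name}" for name in module_unload_list]
    lines += [f"module load {name}" for name in module_list]
    return lines


print("\n".join(module_lines(True, ["intel"], ["gcc/12", "openmpi"])))
```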

envs:
type: dict, optional, default: {}
argument path: submission/resources/envs

Environment variables exported before executing tasks.

prepend_script:
type: typing.List[str], optional, default: []
argument path: submission/resources/prepend_script

Optional shell lines inserted before task commands in the generated job script.

append_script:
type: typing.List[str], optional, default: []
argument path: submission/resources/append_script

Optional shell lines inserted after task commands in the generated job script.
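
Similarly, source_list, envs, prepend_script and append_script can be pictured as lines placed around the task command. The hypothetical script_body helper below assumes the order source, export, prepend, command, append; that order is an assumption for illustration, not taken from DPDispatcher's templates. Values are quoted with shlex.quote so that spaces survive.

```python
import shlex
from typing import Dict, Optional, Sequence


def script_body(command: str,
                source_list: Sequence[str] = (),
                envs: Optional[Dict[str, object]] = None,
                prepend_script: Sequence[str] = (),
                append_script: Sequence[str] = ()) -> str:
    lines = [f"source {shlex.quote(path)}" for path in source_list]
    lines += [f"export {key}={shlex.quote(str(value))}" for key, value in (envs or {}).items()]
    lines += list(prepend_script)   # user lines before the task command
    lines.append(command)
    lines += list(append_script)    # user lines after the task command
    return "\n".join(lines)


print(script_body(
    "echo hello",
    source_list=["/opt/conda/etc/profile.d/conda.sh"],  # placeholder path
    envs={"OMP_NUM_THREADS": 4},
    prepend_script=["date"],
    append_script=["date"],
))
```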

wait_time:
type: float | int, optional, default: 0
argument path: submission/resources/wait_time

Delay in seconds inserted after a job is submitted or resubmitted. Usually keep 0 unless the scheduler or site asks you to throttle the submission pace.

kwargs:
type: dict, optional
argument path: submission/resources/kwargs

Additional backend-specific arguments; the accepted keys vary with the machine (batch type).

batch_type:
type: str, optional
argument path: submission/resources/batch_type

Allowed here so that this key passes strict argument checking.

task_list:
type: list
argument path: submission/task_list

List of tasks to execute.

This argument takes a list with each element containing the following:

command:
type: str
argument path: submission/task_list/command

Shell command executed for this task. A zero exit code is treated as success. If the real application may fail before useful artifacts are synchronized, consider wrapping it and saving diagnostics to files that are listed in backward_files.

task_work_path:
type: str
argument path: submission/task_list/task_work_path

Working directory of this task, specified as a relative path inside submission.work_base. Absolute paths are not supported and may break staging or remote execution. For the smallest local example, use ‘.’. If you use a subdirectory such as ‘task1/’, the command runs inside that subdirectory.
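
Because absolute task paths are not supported, it can help to validate task_work_path while generating the JSON. The hypothetical task_dir helper below rejects absolute paths and shows where a task directory would live locally, assuming os.path.join semantics for combining local_root, work_base and task_work_path.

```python
import os


def task_dir(local_root: str, work_base: str, task_work_path: str) -> str:
    # Reject absolute task paths, which the text above says are unsupported.
    if os.path.isabs(task_work_path):
        raise ValueError(f"task_work_path must be relative, got {task_work_path!r}")
    return os.path.normpath(os.path.join(local_root, work_base, task_work_path))


print(task_dir("./", "0_md/", "task1/"))
print(task_dir("./", "0_md/", "."))
```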

forward_files:
type: typing.List[str], optional, default: []
argument path: submission/task_list/forward_files

Files to upload for this task before execution. Paths are resolved relative to this task’s task_work_path. Put per-task inputs here; files shared by all tasks belong in submission.forward_common_files.

backward_files:
type: typing.List[str], optional, default: []
argument path: submission/task_list/backward_files

Files to download for this task after execution. Paths are collected from this task’s task_work_path on the execution side and synchronized back to the same relative task directory under the local staging root (typically machine.local_root/work_base).
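
The different bases for common and per-task files can be summarized by listing the local paths involved in staging one task. staged_paths is a hypothetical helper and the file names are placeholders; common files are shown for completeness, although they are shared by every task rather than transferred per task.

```python
import os
from typing import Dict, List, Sequence


def staged_paths(work_base: str, task: dict,
                 forward_common_files: Sequence[str] = (),
                 backward_common_files: Sequence[str] = ()) -> Dict[str, List[str]]:
    """Local paths (relative to local_root) involved in staging one task."""
    task_path = os.path.join(work_base, task["task_work_path"])
    # Common files live directly in work_base; per-task files in task_work_path.
    upload = [os.path.join(work_base, f) for f in forward_common_files]
    upload += [os.path.join(task_path, f) for f in task.get("forward_files", [])]
    download = [os.path.join(work_base, f) for f in backward_common_files]
    download += [os.path.join(task_path, f) for f in task.get("backward_files", [])]
    return {"upload": upload, "download": download}


task = {"task_work_path": "task1/", "forward_files": ["conf.lmp"], "backward_files": ["log.lammps"]}
print(staged_paths("0_md/", task, forward_common_files=["in.lammps"]))
```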

outlog:
type: str | NoneType, optional, default: log
argument path: submission/task_list/outlog

Filename used to redirect stdout inside task_work_path while the task runs. If this file is downloaded or synchronized back, it typically appears under the same relative task directory on the local side.

errlog:
type: str | NoneType, optional, default: err
argument path: submission/task_list/errlog

Filename used to redirect stderr inside task_work_path while the task runs. If this file is downloaded or synchronized back, it typically appears under the same relative task directory on the local side.
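
Taken together, command, task_work_path, outlog and errlog describe a shell line that changes into the task directory and redirects the two output streams. The shape produced by the hypothetical task_line helper below is an assumption; the line in the script DPDispatcher actually generates may differ. A None value for outlog or errlog simply skips that redirection here.

```python
import shlex
from typing import Optional


def task_line(command: str, task_work_path: str,
              outlog: Optional[str] = "log", errlog: Optional[str] = "err") -> str:
    # Assumed shape: cd into the task directory, run the command in a subshell,
    # and append stdout/stderr to outlog/errlog inside that directory.
    line = f"cd {shlex.quote(task_work_path)} && ( {command} )"
    if outlog is not None:
        line += f" 1>>{shlex.quote(outlog)}"
    if errlog is not None:
        line += f" 2>>{shlex.quote(errlog)}"
    return line


print(task_line("echo hello", "task1/"))
```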

Options

  • --dry-run: Only upload files without submitting.

  • --exit-on-submit: Exit after submitting without waiting for completion.

  • --allow-ref: Allow loading external JSON/YAML snippets through $ref (disabled by default for security).
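
When submissions are driven from a Python workflow script, these flags can be assembled into a command line. dpdisp_submit is a hypothetical wrapper that uses only the subcommand and flags listed on this page; it runs nothing unless execute=True, and then only if dpdisp is on PATH.

```python
import shutil
import subprocess
from typing import List


def dpdisp_submit(json_path: str, dry_run: bool = False, exit_on_submit: bool = False,
                  allow_ref: bool = False, execute: bool = False) -> List[str]:
    """Build (and optionally run) a `dpdisp submit` command line."""
    argv = ["dpdisp", "submit", json_path]
    if dry_run:
        argv.append("--dry-run")
    if exit_on_submit:
        argv.append("--exit-on-submit")
    if allow_ref:
        argv.append("--allow-ref")
    if execute:
        if shutil.which("dpdisp") is None:
            raise FileNotFoundError("dpdisp is not on PATH")
        subprocess.run(argv, check=True)
    return argv


print(dpdisp_submit("submission.json", dry_run=True))
```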