Submit from JSON file
DPDispatcher can submit a submission from a JSON file:
dpdisp submit submission.json
The JSON file must contain the submission configuration. An example of the JSON file is shown below.
{
  "work_base": "0_md/",
  "machine": {
    "batch_type": "Shell",
    "local_root": "./",
    "context_type": "LazyLocalContext"
  },
  "resources": {
    "number_node": 1,
    "cpu_per_node": 1,
    "gpu_per_node": 0,
    "queue_name": "",
    "group_size": 1
  },
  "forward_common_files": [],
  "backward_common_files": [],
  "task_list": [
    {
      "command": "echo hello",
      "task_work_path": "task1/",
      "forward_files": [],
      "backward_files": [],
      "outlog": "log",
      "errlog": "err"
    }
  ]
}
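When the configuration is generated by a script rather than written by hand, the same structure can be assembled as a Python dictionary and serialized with the standard json module. The sketch below only mirrors the example above; the helper name write_submission is hypothetical and is not part of DPDispatcher.

```python
import json
from pathlib import Path

# Same content as the example JSON file shown above.
submission = {
    "work_base": "0_md/",
    "machine": {
        "batch_type": "Shell",
        "local_root": "./",
        "context_type": "LazyLocalContext",
    },
    "resources": {
        "number_node": 1,
        "cpu_per_node": 1,
        "gpu_per_node": 0,
        "queue_name": "",
        "group_size": 1,
    },
    "forward_common_files": [],
    "backward_common_files": [],
    "task_list": [
        {
            "command": "echo hello",
            "task_work_path": "task1/",
            "forward_files": [],
            "backward_files": [],
            "outlog": "log",
            "errlog": "err",
        }
    ],
}


def write_submission(path, config):
    """Serialize a submission dictionary to a JSON file (hypothetical helper)."""
    target = Path(path)
    target.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return target
```

Calling write_submission("submission.json", submission) should produce a file that can then be passed to dpdisp submit.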
The JSON entries for submission are defined as follows:
- submission:
  - type: dict
  - argument path: submission
  Submission configuration.
- work_base:
  - type: str
  - argument path: submission/work_base
  Base directory for the work, relative to machine.local_root. This should normally be a relative path; if an absolute path is provided, it is not combined with machine.local_root.
- forward_common_files:
  - type: typing.List[str], optional, default: []
  - argument path: submission/forward_common_files
  Files shared by all tasks and uploaded from work_base before execution.
- backward_common_files:
  - type: typing.List[str], optional, default: []
  - argument path: submission/backward_common_files
  Files shared by all tasks and downloaded back to work_base after execution.
- machine:
  - type: dict
  - argument path: submission/machine
  Machine configuration. See related documentation for details.
- batch_type:
  - type: str
  - argument path: submission/machine/batch_type
  Batch backend used to execute jobs. Options: Shell, PBS, DistributedShell, JH_UniScheduler, SlurmJobArray, LSF, OpenAPI, SGE, Torque, Slurm, Bohrium, Fugaku
- local_root:
  - type: str | NoneType
  - argument path: submission/machine/local_root
  Local project root used by DPDispatcher to find task directories and local files. If submission.work_base is a relative path, it is resolved inside this directory; if submission.work_base is absolute, it is used as-is and local_root is ignored.
- remote_root:
  - type: str | NoneType, optional
  - argument path: submission/machine/remote_root
  Remote root directory used by non-local contexts such as SSH. DPDispatcher creates and uses a submission-specific working directory beneath this root on the remote side. For SSHContext, this path should be absolute.
- clean_asynchronously:
  - type: bool, optional, default: False
  - argument path: submission/machine/clean_asynchronously
  Clean the remote working directory asynchronously after the job finishes. Avoid enabling this while debugging, because it can remove remote artifacts before you inspect them.
- retry_count:
  - type: int, optional, default: 3
  - argument path: submission/machine/retry_count
  How many times DPDispatcher will retry a failed job before raising an error.
Depending on the value of context_type, different sub-arguments are accepted.
- context_type:
  - type: str (flag key)
  - argument path: submission/machine/context_type
  Execution context / connection type used to reach the execution environment. Options: LazyLocalContext, SSHContext, BohriumContext, OpenAPIContext, LocalContext, HDFSContext
When context_type is set to BohriumContext (or its aliases bohriumcontext, Bohrium, bohrium, DpCloudServerContext, dpcloudservercontext, DpCloudServer, dpcloudserver, LebesgueContext, lebesguecontext, Lebesgue, lebesgue):
- remote_profile:
  - type: dict
  - argument path: submission/machine[BohriumContext]/remote_profile
  Configuration for Bohrium submission, including login credentials, project selection, and job-handling behavior.
- email:
  - type: str, optional
  - argument path: submission/machine[BohriumContext]/remote_profile/email
  Email address used to log in to Bohrium.
- password:
  - type: str, optional
  - argument path: submission/machine[BohriumContext]/remote_profile/password
  Password used together with email or phone login. If BOHR_TICKET is set, password-based login can be skipped.
- phone:
  - type: str, optional
  - argument path: submission/machine[BohriumContext]/remote_profile/phone
  Phone number used to log in when email is not used.
- program_id:
  - type: int, alias: project_id
  - argument path: submission/machine[BohriumContext]/remote_profile/program_id
  Program / project ID used to place uploaded jobs under the correct Bohrium project namespace.
- retry_count:
  - type: NoneType | int, optional, default: 2
  - argument path: submission/machine[BohriumContext]/remote_profile/retry_count
  How many times a terminated remote job is retried on the platform side before giving up.
- ignore_exit_code:
  - type: bool, optional, default: True
  - argument path: submission/machine[BohriumContext]/remote_profile/ignore_exit_code
  Whether a non-zero exit code from the remote platform is still treated as finished. If False, such jobs are marked as terminated.
- keep_backup:
  - type: bool, optional
  - argument path: submission/machine[BohriumContext]/remote_profile/keep_backup
  Whether to keep uploaded/downloaded zip archives in the local backup directory after transfer.
- input_data:
  - type: dict
  - argument path: submission/machine[BohriumContext]/remote_profile/input_data
  Platform-specific job configuration passed through to the Bohrium API.
When context_type is set to LocalContext (or its aliases localcontext, Local, local):
- remote_profile:
  - type: dict, optional
  - argument path: submission/machine[LocalContext]/remote_profile
  Options controlling how files are staged between local_root and remote_root when both paths are on the local filesystem.
- symlink:
  - type: bool, optional, default: True
  - argument path: submission/machine[LocalContext]/remote_profile/symlink
  Whether to use symbolic links instead of copying files from local_root into remote_root. Disable this when the execution side cannot access the original local path through the same filesystem view.
When context_type is set to HDFSContext (or its aliases hdfscontext, HDFS, hdfs):
- remote_profile:
  - type: dict, optional
  - argument path: submission/machine[HDFSContext]/remote_profile
  The information used to maintain the connection with the remote machine. This field is empty for this context.
When context_type is set to SSHContext (or its aliases sshcontext, SSH, ssh):
- remote_profile:
  - type: dict
  - argument path: submission/machine[SSHContext]/remote_profile
  SSH connection settings for the remote machine, including authentication, timeouts, and optional proxy/jump-host behavior.
- hostname:
  - type: str
  - argument path: submission/machine[SSHContext]/remote_profile/hostname
  Hostname or IP address of the SSH target machine.
- username:
  - type: str
  - argument path: submission/machine[SSHContext]/remote_profile/username
  Username used to log in to the target system.
- password:
  - type: str, optional
  - argument path: submission/machine[SSHContext]/remote_profile/password
  (Deprecated) Password for the Linux system. Use SSH keys instead to improve security.
- port:
  - type: int, optional, default: 22
  - argument path: submission/machine[SSHContext]/remote_profile/port
  SSH port of the target machine. Usually 22.
- key_filename:
  - type: str | NoneType, optional, default: None
  - argument path: submission/machine[SSHContext]/remote_profile/key_filename
  Path to the private key file used for SSH authentication. If left as None, DPDispatcher can try discoverable keys in ~/.ssh or fall back to password-based login if configured.
- passphrase:
  - type: str | NoneType, optional, default: None
  - argument path: submission/machine[SSHContext]/remote_profile/passphrase
  Passphrase for the SSH private key, if the key is encrypted.
- timeout:
  - type: int, optional, default: 10
  - argument path: submission/machine[SSHContext]/remote_profile/timeout
  Timeout in seconds for establishing the SSH connection.
- totp_secret:
  - type: str | NoneType, optional, default: None
  - argument path: submission/machine[SSHContext]/remote_profile/totp_secret
  Time-based one-time-password secret used for keyboard-interactive 2FA. It should be a base32-encoded string.
- tar_compress:
  - type: bool, optional, default: True
  - argument path: submission/machine[SSHContext]/remote_profile/tar_compress
  Whether upload/download tar archives are compressed. Keeping this True usually reduces transfer size at the cost of extra CPU time.
- look_for_keys:
  - type: bool, optional, default: True
  - argument path: submission/machine[SSHContext]/remote_profile/look_for_keys
  Whether to search for discoverable private key files in ~/.ssh when key_filename is not provided.
- execute_command:
  - type: str | NoneType, optional, default: None
  - argument path: submission/machine[SSHContext]/remote_profile/execute_command
  Optional command executed immediately after the SSH connection is established.
- proxy_command:
  - type: str | NoneType, optional, default: None
  - argument path: submission/machine[SSHContext]/remote_profile/proxy_command
  Optional SSH ProxyCommand used to reach the target through an intermediate host or tunnel.
When context_type is set to OpenAPIContext (or its aliases openapicontext, OpenAPI, openapi):
- remote_profile:
  - type: dict, optional
  - argument path: submission/machine[OpenAPIContext]/remote_profile
  The information used to maintain the connection with the remote machine. This field is empty for this context.
When context_type is set to LazyLocalContext (or its aliases lazylocalcontext, LazyLocal, lazylocal):
- remote_profile:
  - type: dict, optional
  - argument path: submission/machine[LazyLocalContext]/remote_profile
  The information used to maintain the connection with the remote machine. This field is empty for this context.
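As an illustration of how the machine fields fit together for a remote run, the following sketch assembles a machine section for SSHContext using only keys documented above. The helper name make_ssh_machine, the host name, the user name, and the remote path are hypothetical placeholders; the defaults for port and timeout simply repeat the documented defaults.

```python
import posixpath


def make_ssh_machine(hostname, username, remote_root, batch_type="Slurm",
                     local_root="./", port=22, key_filename=None, timeout=10):
    """Build a machine section for SSHContext (hypothetical helper)."""
    # The remote_root description says this path should be absolute for SSHContext.
    if not posixpath.isabs(remote_root):
        raise ValueError(f"remote_root should be absolute for SSHContext: {remote_root!r}")

    remote_profile = {
        "hostname": hostname,
        "username": username,
        "port": port,
        "timeout": timeout,
    }
    # Only include key_filename when given, so that key discovery can apply otherwise.
    if key_filename is not None:
        remote_profile["key_filename"] = key_filename

    return {
        "batch_type": batch_type,
        "context_type": "SSHContext",
        "local_root": local_root,
        "remote_root": remote_root,
        "remote_profile": remote_profile,
    }


# Placeholder values for illustration only.
machine = make_ssh_machine("hpc.example.org", "alice", "/scratch/alice/dpdispatcher_work")
```

The resulting dictionary can replace the "machine" entry of a submission before it is written to JSON. The password field is deliberately left out, since the documentation marks it as deprecated in favor of SSH keys.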
- resources:
  - type: dict
  - argument path: submission/resources
  Resources configuration. See related documentation for details.
- number_node:
  - type: int, optional, default: 1
  - argument path: submission/resources/number_node
  Number of nodes requested for each scheduler job generated by DPDispatcher.
- cpu_per_node:
  - type: int, optional, default: 1
  - argument path: submission/resources/cpu_per_node
  Number of CPUs requested on each node for each scheduler job.
- gpu_per_node:
  - type: int, optional, default: 0
  - argument path: submission/resources/gpu_per_node
  Number of GPUs requested on each node for each scheduler job.
- queue_name:
  - type: str, optional, default: (empty string)
  - argument path: submission/resources/queue_name
  Queue or partition name used by the selected batch system. For local Shell runs this is usually an empty string; for Slurm it typically maps to a partition.
- group_size:
  - type: int
  - argument path: submission/resources/group_size
  How many tasks are packed into one scheduler job. For example, 20 tasks with group_size=5 are typically split into 4 jobs. Use 1 for the simplest one-task workflow. 0 means no explicit upper limit in the grouping logic.
- custom_flags:
  - type: typing.List[str], optional
  - argument path: submission/resources/custom_flags
  Extra scheduler-header lines inserted into the generated submission script, typically for backend-specific options that are not covered by the standard fields.
- strategy:
  - type: dict, optional
  - argument path: submission/resources/strategy
  Strategy options that affect how DPDispatcher generates and evaluates submission scripts.
- if_cuda_multi_devices:
  - type: bool, optional, default: False
  - argument path: submission/resources/strategy/if_cuda_multi_devices
  If a node has multiple NVIDIA GPUs, assign different tasks inside the same job to different GPUs by setting CUDA_VISIBLE_DEVICES automatically. Usually used together with para_deg > 1 and task-level resource awareness.
- ratio_unfinished:
  - type: float, optional, default: 0.0
  - argument path: submission/resources/strategy/ratio_unfinished
  Maximum fraction of tasks allowed to remain unfinished when evaluating job completion. Use 0.0 for the strict default that requires every task to finish.
- customized_script_header_template_file:
  - type: str, optional
  - argument path: submission/resources/strategy/customized_script_header_template_file
  Custom template file for the scheduler-header portion of generated submission scripts. Overrides the default template.
- para_deg:
  - type: int, optional, default: 1
  - argument path: submission/resources/para_deg
  How many tasks inside one generated job are run in parallel. This is different from group_size: group_size controls how many tasks are bundled into a job, while para_deg controls concurrency within that job. Keep para_deg=1 for the safest default.
- source_list:
  - type: typing.List[str], optional, default: []
  - argument path: submission/resources/source_list
  Shell scripts or environment files sourced before task commands run. Useful on HPC systems for activating software stacks explicitly instead of relying on login-shell defaults.
- module_purge:
  - type: bool, optional, default: False
  - argument path: submission/resources/module_purge
  Whether to run ‘module purge’ before applying module_unload_list and module_list. Mainly useful on HPC systems.
- module_unload_list:
  - type: typing.List[str], optional, default: []
  - argument path: submission/resources/module_unload_list
  Modules to unload before loading the requested modules. Mainly relevant on HPC systems with environment modules.
- module_list:
  - type: typing.List[str], optional, default: []
  - argument path: submission/resources/module_list
  Modules to load before executing tasks. Mainly relevant on HPC systems with environment modules.
- envs:
  - type: dict, optional, default: {}
  - argument path: submission/resources/envs
  Environment variables exported before executing tasks.
- prepend_script:
  - type: typing.List[str], optional, default: []
  - argument path: submission/resources/prepend_script
  Optional shell lines inserted before task commands in the generated job script.
- append_script:
  - type: typing.List[str], optional, default: []
  - argument path: submission/resources/append_script
  Optional shell lines inserted after task commands in the generated job script.
- wait_time:
  - type: float | int, optional, default: 0
  - argument path: submission/resources/wait_time
  Delay in seconds inserted after a job is submitted or resubmitted. Usually keep 0 unless the scheduler or site asks you to throttle the submission pace.
- kwargs:
  - type: dict, optional
  - argument path: submission/resources/kwargs
  Additional arguments; the accepted keys vary by machine.
- batch_type:
  - type: str, optional
  - argument path: submission/resources/batch_type
  This key is allowed here so that it is accepted under strict checking.
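The relationship between group_size and para_deg described above can be checked with a little arithmetic. The sketch below is not DPDispatcher's grouping code; it only reproduces the documented example (20 tasks with group_size=5 giving 4 jobs). Treating group_size=0 as "all tasks in a single job" is an assumption drawn from the phrase "no explicit upper limit", and both function names are hypothetical.

```python
import math


def estimate_job_count(n_tasks, group_size):
    """Estimate how many scheduler jobs a task list is split into."""
    if n_tasks < 0 or group_size < 0:
        raise ValueError("n_tasks and group_size must be non-negative")
    if n_tasks == 0:
        return 0
    if group_size == 0:
        # Assumption: no upper limit means every task lands in one job.
        return 1
    return math.ceil(n_tasks / group_size)


def estimate_parallel_rounds(tasks_in_job, para_deg=1):
    """Estimate how many rounds a job needs when para_deg tasks run at once."""
    if para_deg < 1:
        raise ValueError("para_deg must be at least 1")
    return math.ceil(tasks_in_job / para_deg)


jobs = estimate_job_count(20, 5)         # the documented example
rounds = estimate_parallel_rounds(5, 2)  # 5 tasks in one job, 2 at a time
```

In other words, group_size determines how many jobs reach the scheduler, while para_deg only changes how quickly the tasks inside each job are worked through.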
- task_list:
  - type: list
  - argument path: submission/task_list
  List of tasks to execute.
This argument takes a list with each element containing the following:
- command:
  - type: str
  - argument path: submission/task_list/command
  Shell command executed for this task. A zero exit code is treated as success. If the real application may fail before useful artifacts are synchronized, consider wrapping it and saving diagnostics to files that are listed in backward_files.
- task_work_path:
  - type: str
  - argument path: submission/task_list/task_work_path
  Working directory of this task, specified as a relative path inside submission.work_base. Absolute paths are not supported and may break staging or remote execution. For the smallest local example, use ‘.’. If you use a subdirectory such as ‘task1/’, the command runs inside that subdirectory.
- forward_files:
  - type: typing.List[str], optional, default: []
  - argument path: submission/task_list/forward_files
  Files to upload for this task before execution. Paths are resolved relative to this task’s task_work_path. Put per-task inputs here; files shared by all tasks belong in submission.forward_common_files.
- backward_files:
  - type: typing.List[str], optional, default: []
  - argument path: submission/task_list/backward_files
  Files to download for this task after execution. Paths are collected from this task’s task_work_path on the execution side and synchronized back to the same relative task directory under the local staging root (typically machine.local_root/work_base).
- outlog:
  - type: str | NoneType, optional, default: log
  - argument path: submission/task_list/outlog
  Filename used to redirect stdout inside task_work_path while the task runs. If this file is downloaded or synchronized back, it typically appears under the same relative task directory on the local side.
- errlog:
  - type: str | NoneType, optional, default: err
  - argument path: submission/task_list/errlog
  Filename used to redirect stderr inside task_work_path while the task runs. If this file is downloaded or synchronized back, it typically appears under the same relative task directory on the local side.
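The path rules above (work_base resolved inside local_root, task_work_path inside work_base, and per-task files inside task_work_path) can be sketched with pathlib. This is an illustration of the documented rules rather than DPDispatcher's own resolution code; the function names and the file names input.in and out.dat are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_task_dir(local_root, work_base, task_work_path):
    """Return the local directory of a task under local_root/work_base."""
    if PurePosixPath(task_work_path).is_absolute():
        # The documentation states that absolute task_work_path is not supported.
        raise ValueError(f"task_work_path must be relative: {task_work_path!r}")
    # Joining with an absolute work_base discards local_root, which matches
    # the documented "used as-is" behavior for absolute work_base values.
    return PurePosixPath(local_root) / work_base / task_work_path


def resolve_task_files(local_root, work_base, task):
    """Map forward_files and backward_files of one task to local paths."""
    task_dir = resolve_task_dir(local_root, work_base, task["task_work_path"])
    return {
        "forward": [task_dir / name for name in task.get("forward_files", [])],
        "backward": [task_dir / name for name in task.get("backward_files", [])],
    }


# Hypothetical task with one input and one output file.
example_task = {
    "command": "echo hello",
    "task_work_path": "task1/",
    "forward_files": ["input.in"],
    "backward_files": ["out.dat"],
}
paths = resolve_task_files("./", "0_md/", example_task)
```

With the values from the example configuration, the task directory resolves to 0_md/task1 relative to the current directory, which is also where files listed in backward_files would be expected after the run.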
Options
- --dry-run: Only upload files without submitting.
- --exit-on-submit: Exit after submitting without waiting for completion.
- --allow-ref: Allow loading external JSON/YAML snippets through $ref (disabled by default for security).
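When dpdisp submit is driven from a script, the options above can be assembled into an argument list before being handed to a process runner. The sketch below uses only the three documented flags; the helper name build_submit_command is hypothetical, and placing the flags before the JSON file name is an assumption about the command-line parser.

```python
def build_submit_command(json_file, dry_run=False, exit_on_submit=False, allow_ref=False):
    """Build the argument list for `dpdisp submit` (hypothetical helper)."""
    cmd = ["dpdisp", "submit"]
    if dry_run:
        cmd.append("--dry-run")
    if exit_on_submit:
        cmd.append("--exit-on-submit")
    if allow_ref:
        # Disabled by default for security; enable only for trusted files.
        cmd.append("--allow-ref")
    cmd.append(json_file)
    return cmd


command = build_submit_command("submission.json", dry_run=True)
```

The list can then be passed to subprocess.run(command, check=True) on a machine where dpdisp is installed.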