Supported batch job systems#

A batch job system is a system that processes batch jobs. One needs to set batch_type to one of the following values:

Bash#

batch_type: Shell

When batch_type is set to Shell, dpdispatcher will generate a bash script to process jobs. No extra packages are required for Shell.

Because it has no scheduling system, Shell runs all jobs at the same time. To avoid running multiple jobs simultaneously, one can set group_size to 0 (which means infinity) so that only one job containing multiple tasks is generated.
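The effect of group_size can be pictured with a small sketch. This is not dpdispatcher's own code: the helper group_tasks and the task names are hypothetical, and the only behavior taken from the text above is that group_size set to 0 puts every task into a single job.

```python
from typing import List, Sequence


def group_tasks(tasks: Sequence[str], group_size: int) -> List[List[str]]:
    """Split tasks into jobs of at most group_size tasks (hypothetical helper).

    A group_size of 0 is treated as "infinity": all tasks go into one job.
    """
    if group_size < 0:
        raise ValueError("group_size must be non-negative")
    if group_size == 0:
        # One job holding every task, so Shell has only one job to run.
        return [list(tasks)] if tasks else []
    return [list(tasks[i:i + group_size]) for i in range(0, len(tasks), group_size)]


tasks = [f"task_{i}" for i in range(5)]
jobs_grouped = group_tasks(tasks, 2)  # several jobs, which Shell would start together
jobs_single = group_tasks(tasks, 0)   # a single job containing all five tasks
print(len(jobs_grouped), len(jobs_single))
```

With group_size set to 2 the five tasks end up in three jobs, while group_size set to 0 yields exactly one job.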

Slurm#

batch_type: Slurm, SlurmJobArray

Slurm is a job scheduling system used by many HPC clusters. One needs to make sure Slurm has been set up on the remote server and the related environment is activated.

When SlurmJobArray is used, dpdispatcher submits Slurm jobs as job arrays. In this way, several dpdispatcher tasks map to one Slurm job, and one dpdispatcher job maps to one Slurm job array. Millions of Slurm jobs can be submitted quickly, and Slurm can execute all of them at the same time. One can use group_size and slurm_job_size to control how many Slurm jobs a Slurm job array contains.
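As a rough illustration of how the two parameters interact, the sketch below assumes that group_size is the number of tasks in one dpdispatcher job and slurm_job_size is the number of tasks packed into one Slurm job, so the array length is their quotient rounded up. The function slurm_array_length is hypothetical, and this model is an assumption that should be checked against the dpdispatcher source before relying on it.

```python
import math


def slurm_array_length(group_size: int, slurm_job_size: int = 1) -> int:
    """Estimate how many Slurm jobs one job array contains (assumed model).

    group_size: tasks per dpdispatcher job (assumption).
    slurm_job_size: tasks per Slurm job (assumption; the default of 1 is also assumed).
    """
    if group_size < 1 or slurm_job_size < 1:
        raise ValueError("both sizes must be positive in this sketch")
    return math.ceil(group_size / slurm_job_size)


# 100 tasks per dpdispatcher job, 8 tasks per Slurm job
print(slurm_array_length(100, 8))
```

Under this model, raising slurm_job_size shrinks the array, and raising group_size grows it.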

OpenPBS or PBSPro#

batch_type: PBS

OpenPBS is an open-source job scheduler of the Linux Foundation, and PBS Professional is its commercial counterpart. One needs to make sure OpenPBS has been set up on the remote server and the related environment is activated.

Note: do not use PBS for TORQUE; set batch_type to Torque instead.

TORQUE#

batch_type: Torque

The Terascale Open-source Resource and QUEue Manager (TORQUE) is a distributed resource manager based on standard OpenPBS. However, TORQUE no longer supports all OpenPBS flags. One needs to make sure TORQUE has been set up on the remote server and the related environment is activated.

LSF#

batch_type: LSF

IBM Spectrum LSF Suites is a comprehensive workload management solution used by HPC clusters. One needs to make sure LSF has been set up on the remote server and the related environment is activated.

JH UniScheduler#

batch_type: JH_UniScheduler

JH UniScheduler was developed by the company JHINNO and uses “jsub” to submit tasks. Its overall architecture is similar to that of IBM’s LSF, although there are some differences between the two. One needs to make sure JH UniScheduler has been set up on the remote server and the related environment is activated.

Bohrium#

batch_type: Bohrium

Bohrium is a cloud platform for scientific computing. Read the Bohrium documentation for details.

DistributedShell#

batch_type: DistributedShell

DistributedShell is used to submit YARN jobs. Read “Support DPDispatcher on Yarn” for details.

Fugaku#

batch_type: Fugaku

Fujitsu cloud service is a job scheduling system used by Fujitsu’s HPCs such as Fugaku, ITO and K computer. Note that although these machines use the same job scheduling system, they differ in the details, so the Fugaku class cannot be used directly for other HPCs.

Read the Fujitsu cloud service documentation for details.

OpenAPI#

batch_type: OpenAPI

OpenAPI is a new way to submit jobs to Bohrium. It uses an AccessKey instead of a username and password. Read the Bohrium documentation for details.

SGE#

batch_type: SGE

The Sun Grid Engine (SGE) scheduler is a batch-queueing system for distributed resource management. The commands and flags of SGE are very similar to those of PBS, except for checking job status. Use this value when submitting jobs to an SGE-based batch system.
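Since batch_type must match one of the values listed on this page, a misspelled value can be caught before a configuration is handed to dpdispatcher. The sketch below is only an illustration: check_batch_type is a hypothetical helper, the machine dictionary is reduced to the single batch_type key, and the comparison is done against the spellings exactly as written above (whether dpdispatcher itself accepts other capitalizations is not covered here).

```python
# batch_type values exactly as they appear in the sections above
SUPPORTED_BATCH_TYPES = (
    "Shell", "Slurm", "SlurmJobArray", "PBS", "Torque", "LSF",
    "JH_UniScheduler", "Bohrium", "DistributedShell", "Fugaku",
    "OpenAPI", "SGE",
)


def check_batch_type(machine: dict) -> str:
    """Return the batch_type of a machine dict, or raise if it is not listed."""
    value = machine.get("batch_type")
    if value not in SUPPORTED_BATCH_TYPES:
        raise ValueError(
            f"unknown batch_type {value!r}; expected one of {SUPPORTED_BATCH_TYPES}"
        )
    return value


# Hypothetical, minimal machine configuration
machine = {"batch_type": "Torque"}
print(check_batch_type(machine))
```

A TORQUE cluster configured with "PBS" would still pass this check, because both values are valid; the check only guards against typos and missing keys.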