Running multiple MD tasks on a GPU workstation

This example shows how to run multiple MD tasks on a GPU workstation. The workstation has no job scheduling package installed, so we use Shell as batch_type.

{
  "batch_type": "Shell",
  "local_root": "./",
  "remote_root": "/data2/jinzhe/dpgen_workdir",
  "clean_asynchronously": true,
  "context_type": "SSHContext",
  "remote_profile": {
    "hostname": "mandu.iqb.rutgers.edu",
    "username": "jz748",
    "port": 22
  }
}

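The workstation has 48 CPU cores and 8 RTX 3090 cards. Each task consumes only a small share of a GPU, so we want each card to run 6 tasks at the same time. Thus, strategy/if_cuda_multi_devices is set to true and para_deg is set to 6. With 8 cards this allows up to 8 x 6 = 48 concurrent tasks, which matches the 48 CPU cores because the thread-count environment variables in envs are all set to 1.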

{
  "number_node": 1,
  "cpu_per_node": 48,
  "gpu_per_node": 8,
  "queue_name": "shell",
  "group_size": 9999,
  "strategy": {
    "if_cuda_multi_devices": true
  },
  "source_list": [
    "activate /home/jz748/deepmd-kit"
  ],
  "envs": {
    "OMP_NUM_THREADS": 1,
    "TF_INTRA_OP_PARALLELISM_THREADS": 1,
    "TF_INTER_OP_PARALLELISM_THREADS": 1
  },
  "para_deg": 6
}
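
The sizing behind these numbers can be checked with a few lines of Python. This is only an illustrative sketch: the helper names (concurrent_tasks, cpu_threads_needed, round_robin_gpu) are hypothetical, and the round-robin assignment is an assumed policy used to show how 48 tasks could spread evenly over 8 cards, not a copy of the dispatcher's own logic.

```python
from collections import Counter

# Values copied from the resources configuration above.
resources = {"cpu_per_node": 48, "gpu_per_node": 8, "para_deg": 6}


def concurrent_tasks(res):
    """Number of tasks running at once if every GPU hosts para_deg tasks."""
    return res["gpu_per_node"] * res["para_deg"]


def cpu_threads_needed(res, threads_per_task=1):
    """CPU threads used when each task runs with threads_per_task threads."""
    return concurrent_tasks(res) * threads_per_task


def round_robin_gpu(task_index, gpu_per_node):
    """Hypothetical policy: assign task i to GPU i modulo the GPU count."""
    return task_index % gpu_per_node


n_parallel = concurrent_tasks(resources)
# Count how many of the concurrent tasks land on each GPU.
tasks_per_gpu = Counter(
    round_robin_gpu(i, resources["gpu_per_node"]) for i in range(n_parallel)
)

print("concurrent tasks:", n_parallel)
print("CPU threads needed:", cpu_threads_needed(resources))
print("tasks per GPU:", dict(tasks_per_gpu))
```

If threads_per_task were raised above 1, cpu_threads_needed would exceed cpu_per_node, which is one reason to keep the thread-count variables at 1 in this setup.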

Note that group_size should be set as large as possible, so that all tasks are grouped into a single job and multiple jobs do not run at the same time.
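
The effect of group_size can be sketched as follows, assuming tasks are split into jobs of at most group_size tasks each. The function name number_of_jobs and the task count of 200 are hypothetical and chosen only for illustration.

```python
import math


def number_of_jobs(n_tasks, group_size):
    """Jobs created when tasks are chunked into groups of at most group_size.

    Assumption: one job is created per group of tasks.
    """
    if group_size < 1:
        raise ValueError("group_size must be a positive integer")
    return math.ceil(n_tasks / group_size)


n_tasks = 200  # hypothetical number of MD tasks

# A very large group_size keeps everything in one job.
print(number_of_jobs(n_tasks, 9999))
# A small group_size would split the tasks into several jobs.
print(number_of_jobs(n_tasks, 50))
```

Under this assumption, any group_size at least as large as the number of tasks yields exactly one job, which is why a value such as 9999 is used above.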