deepmd.cluster package

Module that reads node resources and auto-detects whether it is running locally or on SLURM.

deepmd.cluster.get_resource() → Tuple[str, List[str], Optional[List[int]]]

Get local or SLURM resources: nodename, nodelist, and gpus.

Returns
Tuple[str, List[str], Optional[List[int]]]

nodename, nodelist, and gpus
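The package-level function only has to decide which backend to ask. The sketch below shows that dispatch under one stated assumption: a SLURM allocation is recognized by the presence of the SLURM_JOB_NODELIST environment variable. The function name get_resource_sketch and its callable parameters are hypothetical, chosen so the logic can be exercised without a cluster; this is not the library's code.

```python
import os
from typing import Callable, List, Mapping, Optional, Tuple

# (nodename, nodelist, gpus), matching the documented return type
Resource = Tuple[str, List[str], Optional[List[int]]]


def get_resource_sketch(
    local_backend: Callable[[], Resource],
    slurm_backend: Callable[[], Resource],
    environ: Optional[Mapping[str, str]] = None,
) -> Resource:
    """Dispatch to the SLURM or the local backend (hypothetical helper).

    Assumption: the job is treated as running under SLURM when the
    SLURM_JOB_NODELIST environment variable is set.
    """
    env = os.environ if environ is None else environ
    if "SLURM_JOB_NODELIST" in env:
        return slurm_backend()
    return local_backend()


if __name__ == "__main__":
    # Stand-in backends returning fixed tuples, for demonstration only.
    fake_local = lambda: ("laptop", ["laptop"], None)
    fake_slurm = lambda: ("node01", ["node01", "node02"], [0, 1])
    print(get_resource_sketch(fake_local, fake_slurm, environ={}))
    print(get_resource_sketch(fake_local, fake_slurm,
                              environ={"SLURM_JOB_NODELIST": "node[01-02]"}))
```

Passing the environment explicitly keeps the detection testable; in normal use it falls back to os.environ.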

Submodules

deepmd.cluster.local module

Get local GPU resources.

deepmd.cluster.local.get_gpus()

Get the IDs of the GPU cards available on the local machine. These IDs are valid when used as TensorFlow device IDs.

Returns
Optional[List[int]]

List of available GPU IDs, or None if no GPU is available.
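"Valid as the TensorFlow device ID" matters because TensorFlow numbers the GPUs it can see from 0, regardless of their physical indices. The sketch below illustrates this with a swapped-in technique: instead of querying TensorFlow, it reads CUDA_VISIBLE_DEVICES, so treat it as an assumption-based illustration, not the library's method. The name get_gpus_sketch is hypothetical.

```python
import os
from typing import List, Mapping, Optional


def get_gpus_sketch(environ: Optional[Mapping[str, str]] = None) -> Optional[List[int]]:
    """Return TensorFlow-style GPU IDs derived from CUDA_VISIBLE_DEVICES.

    Assumption (not the library's method): visible devices are read from
    the environment instead of being queried from TensorFlow.
    """
    env = os.environ if environ is None else environ
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        # Unknown without querying the driver; this sketch reports None.
        return None
    count = 0
    for entry in visible.split(","):
        entry = entry.strip()
        # CUDA hides every device from the first invalid entry on;
        # only empty and negative entries (e.g. "-1") are detected here.
        if not entry or entry.startswith("-"):
            break
        count += 1
    # Visible devices are renumbered from 0, so "2,3" becomes [0, 1].
    return list(range(count)) or None
```

For example, with CUDA_VISIBLE_DEVICES="2,3" the sketch returns [0, 1], and with "-1" it returns None.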

deepmd.cluster.local.get_resource() → Tuple[str, List[str], Optional[List[int]]]

Get local resources: nodename, nodelist, and gpus.

Returns
Tuple[str, List[str], Optional[List[int]]]

nodename, nodelist, and gpus
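On a single machine the three values are closely related: the current node is also the only entry in the node list. A minimal sketch, assuming the node name is the host name reported by socket.gethostname() and that GPU detection is delegated to a probe such as get_gpus; the name get_local_resource_sketch and its gpu_probe parameter are hypothetical.

```python
import socket
from typing import Callable, List, Optional, Tuple


def get_local_resource_sketch(
    gpu_probe: Callable[[], Optional[List[int]]],
) -> Tuple[str, List[str], Optional[List[int]]]:
    """Single-machine resources: this host is the only node in the list.

    Assumption: the node name is the host name from socket.gethostname().
    """
    nodename = socket.gethostname()
    nodelist = [nodename]  # a local run spans exactly one node
    return nodename, nodelist, gpu_probe()
```

Injecting the GPU probe keeps the sketch independent of any particular detection method.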

deepmd.cluster.slurm module

Module to get resources on a SLURM cluster.

References

https://github.com/deepsense-ai/tensorflow_on_slurm

deepmd.cluster.slurm.get_resource() → Tuple[str, List[str], Optional[List[int]]]

Get SLURM resources: nodename, nodelist, and gpus.

Returns
Tuple[str, List[str], Optional[List[int]]]

nodename, nodelist, and gpus

Raises
RuntimeError

if the number of nodes could not be retrieved

ValueError

if the list of nodes is not the same length as the number of nodes

ValueError

if the current nodename is not found in the node list
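The three documented error cases follow from cross-checking SLURM's environment. The sketch below assumes the values come from SLURMD_NODENAME, SLURM_JOB_NODELIST, and SLURM_JOB_NUM_NODES (real SLURM variables, but which ones the library reads is an assumption here). It also includes a deliberately simplified, hypothetical expand_nodelist helper for compressed lists such as "node[01-03]"; production code would normally rely on a full hostlist parser. GPU detection is delegated to a hypothetical gpu_probe callable.

```python
import re
from typing import Callable, List, Mapping, Optional, Tuple


def expand_nodelist(nodelist: str) -> List[str]:
    """Expand a compressed SLURM node list such as "node[01-03,07],gpu5".

    Simplified assumption: at most one bracket group, at the end of a name.
    """
    hosts: List[str] = []
    # Split on commas that are not inside brackets.
    for item in re.findall(r"[^,\[]+(?:\[[^\]]*\])?", nodelist):
        m = re.fullmatch(r"([^\[]*)\[([^\]]*)\]", item)
        if m is None:
            hosts.append(item)  # plain host name, no range
            continue
        prefix, ranges = m.groups()
        for part in ranges.split(","):
            if "-" in part:
                lo, hi = part.split("-")
                width = len(lo)  # keep zero padding, e.g. "01"
                for i in range(int(lo), int(hi) + 1):
                    hosts.append(f"{prefix}{i:0{width}d}")
            else:
                hosts.append(prefix + part)
    return hosts


def get_slurm_resource_sketch(
    environ: Mapping[str, str],
    gpu_probe: Callable[[], Optional[List[int]]] = lambda: None,
) -> Tuple[str, List[str], Optional[List[int]]]:
    """Read and validate SLURM resources (hypothetical helper)."""
    nodename = environ["SLURMD_NODENAME"]
    nodelist = expand_nodelist(environ["SLURM_JOB_NODELIST"])
    num_nodes_env = environ.get("SLURM_JOB_NUM_NODES")
    if num_nodes_env is None:
        raise RuntimeError("Could not get the number of SLURM nodes")
    num_nodes = int(num_nodes_env)
    if len(nodelist) != num_nodes:
        raise ValueError(
            f"Node list has {len(nodelist)} entries, expected {num_nodes}"
        )
    if nodename not in nodelist:
        raise ValueError(f"Nodename {nodename} not in node list {nodelist}")
    return nodename, nodelist, gpu_probe()
```

With SLURMD_NODENAME="node02", SLURM_JOB_NODELIST="node[01-03]", and SLURM_JOB_NUM_NODES="3", the sketch returns ("node02", ["node01", "node02", "node03"], None).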