deepmd.cluster package

Module that reads node resources and auto-detects whether it is running locally or on SLURM.

deepmd.cluster.get_resource() -> Tuple[str, List[str], List[int] | None]

Get local or SLURM resources: nodename, nodelist, and gpus.

Returns:
Tuple[str, List[str], Optional[List[int]]]

nodename, nodelist, and gpus
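
The returned tuple can be unpacked directly. The following is a minimal sketch of one way to consume it; the helper name summarize_resources is hypothetical and not part of deepmd, and the import is guarded because deepmd may not be installed in every environment.

```python
from typing import List, Optional, Tuple

# Shape of the value documented for get_resource().
Resources = Tuple[str, List[str], Optional[List[int]]]


def summarize_resources(resources: Resources) -> str:
    """Format a (nodename, nodelist, gpus) tuple as a one-line summary.

    Hypothetical helper for illustration only; it is not part of deepmd.
    """
    nodename, nodelist, gpus = resources
    # gpus is Optional, so None has to be handled explicitly.
    gpu_text = "no GPUs" if gpus is None else f"GPU IDs {gpus}"
    return f"node {nodename}, {len(nodelist)} node(s) in total, {gpu_text}"


if __name__ == "__main__":
    try:
        from deepmd.cluster import get_resource
    except ImportError:
        print("deepmd is not installed; skipping the live call")
    else:
        print(summarize_resources(get_resource()))
```

Because gpus may be None, callers should check for it before iterating or taking its length.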

Submodules

deepmd.cluster.local module

Get local GPU resources.

deepmd.cluster.local.get_gpus()

Get the IDs of the GPU cards available on the local machine. These IDs are valid as TensorFlow device IDs.

Returns:
Optional[List[int]]

List of available GPU IDs, or None if no GPU is available.
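
Since the return value is either a list of IDs or None, callers need a branch for the no-GPU case. The sketch below maps such a value to TensorFlow-style device strings. The helper tf_device_names is hypothetical, and falling back to the CPU when the value is None is an assumption made for this example, not documented deepmd behavior.

```python
from typing import List, Optional


def tf_device_names(gpus: Optional[List[int]]) -> List[str]:
    """Map GPU IDs to TensorFlow-style device strings.

    Hypothetical helper; the CPU fallback for None is an assumption.
    """
    if gpus is None:
        return ["/CPU:0"]
    return [f"/GPU:{gpu_id}" for gpu_id in gpus]


print(tf_device_names([0, 1]))  # ['/GPU:0', '/GPU:1']
print(tf_device_names(None))    # ['/CPU:0']
```

In real use, the argument would come from deepmd.cluster.local.get_gpus().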

deepmd.cluster.local.get_resource() -> Tuple[str, List[str], List[int] | None]

Get local resources: nodename, nodelist, and gpus.

Returns:
Tuple[str, List[str], Optional[List[int]]]

nodename, nodelist, and gpus

deepmd.cluster.slurm module

Module to get resources on a SLURM cluster.

References

https://github.com/deepsense-ai/tensorflow_on_slurm

deepmd.cluster.slurm.get_resource() -> Tuple[str, List[str], List[int] | None]

Get SLURM resources: nodename, nodelist, and gpus.

Returns:
Tuple[str, List[str], Optional[List[int]]]

nodename, nodelist, and gpus

Raises:
RuntimeError

if the number of nodes could not be retrieved

ValueError

if the list of nodes does not have the same length as the number of nodes

ValueError

if the current nodename is not found in the node list
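
The three error conditions above can be illustrated with a small standalone check. This is only a sketch of the documented conditions, not deepmd's implementation: the function check_node_list and its parameters are hypothetical, and how the real module obtains the node name, node list, and node count is not shown here.

```python
from typing import List, Optional


def check_node_list(nodename: str, nodelist: List[str], num_nodes: Optional[int]) -> None:
    """Raise the same exception types as documented for the SLURM get_resource().

    Hypothetical helper for illustration; not part of deepmd.
    """
    if num_nodes is None:
        raise RuntimeError("number of nodes could not be retrieved")
    if len(nodelist) != num_nodes:
        raise ValueError(
            f"node list has {len(nodelist)} entries but {num_nodes} nodes were reported"
        )
    if nodename not in nodelist:
        raise ValueError(f"current nodename {nodename!r} is not in the node list")


# A consistent configuration passes silently.
check_node_list("node1", ["node1", "node2"], 2)

# An inconsistent one raises ValueError, which callers can catch.
try:
    check_node_list("node3", ["node1", "node2"], 2)
except ValueError as err:
    print(f"invalid SLURM resources: {err}")
```

Callers of deepmd.cluster.slurm.get_resource() can catch RuntimeError and ValueError in the same way to report a misconfigured job.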