12. Runtime environment variables
Note
For build-time environment variables, see Install from source code.
12.1. All interfaces
- DP_INTER_OP_PARALLELISM_THREADS
Alias: TF_INTER_OP_PARALLELISM_THREADS
Default: 0
Control parallelism within TensorFlow (when TensorFlow is built against Eigen) and PyTorch native OPs for CPU devices. See How to control the parallelism of a job for details.
- DP_INTRA_OP_PARALLELISM_THREADS
Alias: TF_INTRA_OP_PARALLELISM_THREADS
Default: 0
Control parallelism within TensorFlow (when TensorFlow is built against Eigen) and PyTorch native OPs. See How to control the parallelism of a job for details.
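As a minimal sketch, the two thread variables could be set from Python before the package is imported. The helper name `parallelism_env` and the thread counts 4 and 2 are illustrative assumptions, and the sketch assumes the variables are read at import or start-up time, which is why they are set first.

```python
import os


def parallelism_env(intra_op, inter_op):
    """Build the settings for the two variables documented above.

    Hypothetical helper for illustration; it is not part of any library.
    """
    if intra_op < 0 or inter_op < 0:
        raise ValueError("thread counts must be non-negative")
    return {
        "DP_INTRA_OP_PARALLELISM_THREADS": str(intra_op),
        "DP_INTER_OP_PARALLELISM_THREADS": str(inter_op),
    }


# Assumed values: 4 intra-op threads and 2 inter-op threads.
settings = parallelism_env(intra_op=4, inter_op=2)

# Environment values must be strings; apply them to the current process
# before importing the package that reads them.
os.environ.update(settings)
```

The same values can equally be exported in the shell before launching the job; the linked parallelism guide remains the reference for choosing them.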
12.2. Environment variables of dependencies
If OpenMP is used, OpenMP environment variables can be used to control OpenMP threads, such as OMP_NUM_THREADS.
If CUDA is used, CUDA environment variables can be used to control CUDA devices, such as CUDA_VISIBLE_DEVICES.
If ROCm is used, ROCm environment variables can be used to control ROCm devices.
If TensorFlow is used, TensorFlow environment variables can be used.
If PyTorch is used, PyTorch environment variables can be used.
If JAX is used, JAX environment variables can be used; JAX_PLATFORMS and XLA_FLAGS are commonly used.
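Dependency variables such as OMP_NUM_THREADS and CUDA_VISIBLE_DEVICES are often set per job rather than globally. The sketch below shows one way to do that from Python with the standard `subprocess` module. The helper `run_with_env` and the values "4" and "0" are assumptions for illustration, and the child command simply echoes the variables instead of running a real workload.

```python
import os
import subprocess
import sys


def run_with_env(cmd, overrides):
    """Run `cmd` with a copy of the current environment plus `overrides`."""
    env = os.environ.copy()  # leave the parent environment untouched
    env.update(overrides)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Stand-in child process: prints the two variables it received.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ['OMP_NUM_THREADS'], os.environ['CUDA_VISIBLE_DEVICES'])",
]

result = run_with_env(child, {"OMP_NUM_THREADS": "4", "CUDA_VISIBLE_DEVICES": "0"})
print(result.stdout.strip())
```

Because the overrides are applied to a copy, other jobs started from the same parent process keep their own settings.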
12.3. Python interface only
- DP_INTERFACE_PREC
Choices: high, low; Default: high
Control high (double) or low (float) precision of training.
- DP_AUTO_PARALLELIZATION
Choices: 0, 1; Default: 0
Enable auto parallelization for CPU operators.
- DP_JIT
Choices: 0, 1; Default: 0
Enable JIT. Note that this option may either improve or degrade performance. Requires TensorFlow to support JIT.
- DP_INFER_BATCH_SIZE
Default: 1024 on CPUs; on GPUs, as large as possible until an out-of-memory error occurs
Inference batch size, calculated by multiplying the number of frames by the number of atoms.
- DP_BACKEND
Default: tensorflow
Default backend.
- NUM_WORKERS
Default: 4 or the number of cores (whichever is smaller)
Number of subprocesses to use for data loading in the PyTorch backend. See PyTorch documentation for details.
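The choices, defaults, and batch-size arithmetic listed above can be sketched in a few lines of Python. Everything here is illustrative: `read_choice` and `frames_per_batch` are hypothetical helpers, not library functions, and the rule of fitting at least one frame per batch is an assumption about how a frames-times-atoms budget might be applied.

```python
import os

# Choices and defaults copied from the list above.
CHOICES = {
    "DP_INTERFACE_PREC": ("high", "low"),
    "DP_AUTO_PARALLELIZATION": ("0", "1"),
    "DP_JIT": ("0", "1"),
}
DEFAULTS = {
    "DP_INTERFACE_PREC": "high",
    "DP_AUTO_PARALLELIZATION": "0",
    "DP_JIT": "0",
}


def read_choice(name, environ=os.environ):
    """Return the value of `name`, falling back to the documented default."""
    value = environ.get(name, DEFAULTS[name])
    if value not in CHOICES[name]:
        raise ValueError(f"{name}={value!r}; expected one of {CHOICES[name]}")
    return value


def frames_per_batch(batch_size, natoms):
    """Frames that fit when frames * atoms <= batch_size (assumed: at least one)."""
    return max(batch_size // natoms, 1)


print(read_choice("DP_INTERFACE_PREC", {"DP_INTERFACE_PREC": "low"}))  # low
print(read_choice("DP_JIT", {}))  # 0 (documented default)
print(frames_per_batch(1024, 192))  # 5, since 5 * 192 = 960 <= 1024
```

Validating the values up front, as in `read_choice`, turns a mistyped setting into an immediate error instead of a silently ignored one.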
12.4. C++ interface only
These environment variables also apply to third-party programs using the C++ interface, such as LAMMPS.
- DP_PLUGIN_PATH
Type: List of paths, split by : on Unix and ; on Windows
List of customized OP plugin libraries to load, such as /path/to/plugin1.so:/path/to/plugin2.so on Linux and /path/to/plugin1.dll;/path/to/plugin2.dll on Windows.
- DP_PROFILER
Enable the built-in PyTorch Kineto profiler for the PyTorch C++ (inference) backend.
Type: string (output file stem)
Default: unset (disabled)
When set to a non-empty value, profiling is enabled for the lifetime of the loaded PyTorch model (e.g. during LAMMPS runs). A JSON trace file is created on finish. The final file name is constructed as:
<ENV_VALUE>_gpu<ID>.json if running on GPU
<ENV_VALUE>.json if running on CPU
The trace can be examined with Chrome trace viewer (alternatively chrome://tracing). It includes:
CPU operator activities
CUDA activities (if available)
Example:
    export DP_PROFILER=result
    mpirun -np 4 lmp -in in.lammps
    # Produces result_gpuX.json, where X is the GPU id used by each MPI rank.
Tips:
Large runs can generate sizable JSON files; consider limiting the number of MD steps, for example to 20.
Currently, this feature supports only single-process runs, or multi-process runs in which each process uses a distinct GPU on the same node.
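The separator rule for DP_PLUGIN_PATH and the trace file naming for DP_PROFILER can be sketched together in Python. Both helpers below, `join_plugin_paths` and `expected_trace_name`, are hypothetical names that only restate the rules described above, and the plugin paths are the placeholder paths from the example, not real files.

```python
import os


def join_plugin_paths(paths, windows=False):
    """Join plugin library paths with ':' (Unix) or ';' (Windows)."""
    return (";" if windows else ":").join(paths)


def expected_trace_name(stem, gpu_id=None):
    """Name pattern described above: <stem>_gpu<ID>.json on GPU, <stem>.json on CPU."""
    return f"{stem}.json" if gpu_id is None else f"{stem}_gpu{gpu_id}.json"


# Build an environment for a child process without touching the current one.
env = os.environ.copy()
env["DP_PLUGIN_PATH"] = join_plugin_paths(
    ["/path/to/plugin1.so", "/path/to/plugin2.so"]
)
env["DP_PROFILER"] = "result"

print(env["DP_PLUGIN_PATH"])  # /path/to/plugin1.so:/path/to/plugin2.so
print(expected_trace_name(env["DP_PROFILER"], gpu_id=0))  # result_gpu0.json
print(expected_trace_name(env["DP_PROFILER"]))  # result.json
```

The resulting `env` mapping could then be passed to `subprocess.run(..., env=env)` when launching the program that uses the C++ interface.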