4.6. Descriptor DPA-2

Note

Supported backends: PyTorch, JAX, Paddle, DP

This page describes the DPA-2 model implementation. See the DPA-2 paper for more details.

Training example: examples/water/dpa2/input_torch_medium.json. See the README for inputs at different levels.

4.6.1. Theory

DPA-2 is an attention-based descriptor architecture proposed for large atomic models (LAMs); see the DPA-2 paper.

At a high level, DPA-2 builds local representations with three coupled channels (paper notation):

Single-atom channel \(\mathbf{f}_\alpha\)
Rotationally invariant pair channel \(\mathbf{g}_{\alpha\beta}\)
Rotationally equivariant pair channel \(\mathbf{h}_{\alpha\beta}\)

for neighbors \(\beta\in\mathcal{N}(\alpha)\) within cutoffs.

4.6.1.1. Descriptor pipeline

The descriptor follows two main stages:

repinit (representation initializer)
- Initializes and fuses type and geometry information from local environments.
repformer (representation transformer)
- Stacked message-passing layers that update the single-atom channel \(\mathbf{f}\) and the pair channels \(\mathbf{g}\) and \(\mathbf{h}\) through convolution/symmetrization/MLP and attention-style interactions.

The final descriptor is formed from learned single-atom representations and then passed to downstream fitting/model components.

4.6.1.2. Message-passing intuition

DPA-2 updates local features layer-by-layer with residual connections. Conceptually, each layer performs neighborhood aggregation using geometry-conditioned interactions:

\[\mathbf{f}_\alpha^{(l+1)} = \mathbf{f}_\alpha^{(l)} + \mathrm{MP}^{(l)}\left(\mathbf{f}_\alpha^{(l)}, \{\mathbf{f}_\beta^{(l)}\}_{\beta\in\mathcal{N}(\alpha)}, \{\mathbf{g}_{\alpha\beta}\}_{\beta\in\mathcal{N}(\alpha)}\right)\]

where \(\mathrm{MP}^{(l)}\) denotes the layer-specific message-passing operator.
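
The residual structure of this update can be illustrated with a minimal sketch in plain Python. This is not the DPA-2 implementation: the function name mp_layer, the gating of neighbor features by the invariant pair features, the mean aggregation, and the tanh nonlinearity are all assumptions chosen only to show the pattern "old single-atom features plus an aggregated neighborhood message".

```python
import math

def mp_layer(atom_feat, pair_feat, neighbors):
    """Toy residual update: new[a] = old[a] + MP(neighbor features, pair features).

    atom_feat: list of single-atom feature vectors, atom_feat[a][k]
    pair_feat: dict mapping (a, b) to an invariant pair feature vector
    neighbors: neighbors[a] lists the atoms b within the cutoff of atom a
    """
    updated = []
    for a, nbrs in enumerate(neighbors):
        dim = len(atom_feat[a])
        msg = [0.0] * dim
        for b in nbrs:
            # Gate each neighbor's features with the pair feature for (a, b).
            for k in range(dim):
                msg[k] += pair_feat[(a, b)][k] * atom_feat[b][k]
        if nbrs:
            # Averaging makes the message independent of neighbor order.
            msg = [m / len(nbrs) for m in msg]
        # Residual connection: keep the old features and add the message term.
        updated.append([atom_feat[a][k] + math.tanh(msg[k]) for k in range(dim)])
    return updated

# Hypothetical toy system: atom 0 has two neighbors, atom 3 is isolated.
atom_feat = [[1.0, 0.0], [0.5, -0.5], [0.0, 2.0], [0.3, 0.3]]
pair_feat = {(0, 1): [1.0, 1.0], (0, 2): [0.5, 0.5],
             (1, 0): [1.0, 1.0], (2, 0): [0.5, 0.5]}
neighbors = [[1, 2], [0], [0], []]
next_feat = mp_layer(atom_feat, pair_feat, neighbors)
```

In this sketch an atom without neighbors keeps its features unchanged, because the message term is zero and only the residual path contributes.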

4.6.1.3. Physical properties

Consistent with the DPA-2 design goals in the paper, the model family is built to satisfy:

Translational invariance (depends on relative coordinates)
Rotational and permutational symmetry requirements
Conservative formulation when used in energy models (forces/virials from energy gradients)
Smoothness up to second-order derivatives
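
The symmetry properties listed above can be checked numerically on a toy descriptor. The sketch below is a hedged illustration in plain Python, not DPA-2 itself: toy_descriptor, rotate_z, the cosine switching function, and the default rcut value are assumptions made for this example. It checks only translation, rotation, and permutation behavior; it does not demonstrate the second-order smoothness of the real model.

```python
import math

def toy_descriptor(coords, rcut=6.0):
    """Per-atom sum of a smooth radial weight over neighbors within rcut."""
    out = []
    for i, ri in enumerate(coords):
        total = 0.0
        for j, rj in enumerate(coords):
            if i == j:
                continue
            r = math.dist(ri, rj)  # uses relative coordinates only
            if r < rcut:
                # Cosine switch: value and first derivative vanish at rcut.
                total += 0.5 * (math.cos(math.pi * r / rcut) + 1.0)
        out.append(total)
    return out

def rotate_z(coords, angle):
    """Rotate all coordinates about the z axis by the given angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

coords = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (0.3, 1.5, 0.7), (2.0, 2.0, 2.0)]
base = toy_descriptor(coords)
shifted = toy_descriptor([(x + 3.0, y - 1.0, z + 0.5) for x, y, z in coords])
rotated = toy_descriptor(rotate_z(coords, 0.7))
permuted = toy_descriptor(coords[::-1])  # atoms listed in reverse order

def same(u, v):
    return all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(u, v))

print(same(base, shifted), same(base, rotated), same(base, permuted[::-1]))
```

Reordering the atoms reorders the per-atom values in the same way, so any quantity summed over atoms is unchanged under permutation.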

4.6.1.4. Multi-task training context

DPA-2 is designed for multi-task pre-training with a shared descriptor and task-specific downstream heads. See Multi-task training for workflow details.

4.6.2. Requirements of installation

To run a DPA-2 model with LAMMPS, the customized OP library for the Python interface must be installed when freezing the model.

The customized OP library for the Python interface can be installed by setting the environment variable DP_ENABLE_PYTORCH to 1 during installation.

If LAMMPS is run with MPI, the customized OP library for the C++ interface should be compiled against the same MPI library that is used at runtime. If LAMMPS is run with MPI and CUDA devices, it is recommended to compile the customized OP library for the C++ interface with a CUDA-Aware MPI library and CUDA; otherwise, communication between GPU cards falls back to the slower CPU implementation.

4.6.3. Limitations of the JAX backend with LAMMPS

The JAX backend does not support running with two or more MPI ranks. In addition, map must be set to yes using the atom_modify command:

atom_modify map yes

See the example examples/water/lmp/jax_dpa.lammps.

4.6.4. Data format

DPA-2 supports both the standard data format and the mixed type data format.

4.6.5. Type embedding

Type embedding is built into this descriptor and is configured with the tebd_dim argument.

4.6.6. Model compression

Model compression is supported when repinit/tebd_input_mode is strip.

If repinit/attn_layer is 0, both the type embedding and geometric parts inside repinit are compressed.
If repinit/attn_layer is not 0, only the type embedding tables are compressed and the geometric attention layers remain as neural networks.

An example is given in examples/water/dpa2/input_torch_compressible.json. The speedup from compression is limited when other parts of the model dominate the computational cost.
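
The compression rules in this section can be summarized as a small decision helper. The sketch below is only an illustration: compressed_parts and the labels "type_embedding" and "geometric" are hypothetical names, not part of the DeePMD-kit API, and the descriptor dictionaries are fragments that contain only the two repinit keys discussed here. Both keys are expected to be set explicitly; defaults are not modeled.

```python
def compressed_parts(descriptor):
    """Return which parts of repinit would be compressed, per the rules above.

    The returned labels are hypothetical and used only for this illustration.
    """
    repinit = descriptor["repinit"]
    if repinit["tebd_input_mode"] != "strip":
        return []  # compression is only supported in strip mode
    if repinit["attn_layer"] == 0:
        # No attention layers: type embedding and geometric parts are compressed.
        return ["type_embedding", "geometric"]
    # Attention layers present: only the type embedding tables are compressed.
    return ["type_embedding"]

# Fragment of a descriptor section; all other required keys are omitted here.
fully_compressible = {"type": "dpa2",
                      "repinit": {"tebd_input_mode": "strip", "attn_layer": 0}}
partly_compressible = {"type": "dpa2",
                       "repinit": {"tebd_input_mode": "strip", "attn_layer": 2}}

print(compressed_parts(fully_compressible))
print(compressed_parts(partly_compressible))
```

For a complete, working input, refer to the example file named above rather than to these fragments.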