9.2. Input Formats

  • Project/package name: dpa-adapt

  • Python import: dpa_adapt

  • Main CLI: dpa-adapt

  • Optional short alias: dpaad

  • Display name: DPA-ADAPT — Atomistic DPA Adaptation for Property Tasks

Both dpa-adapt data convert and the Python helper dpa_adapt.convert() auto-detect the input type and route it to one of two pipelines: SMILES tables go to RDKit 3D conformer generation, and structure files go to dpdata (format auto-detected, or set explicitly with --fmt).

9.2.1. SMILES Tables (CSV)

Trigger: a file with the .csv extension that contains a SMILES column. By default, the converter looks for a column named SMILES or smiles; use --smiles-col for other column names, such as smi or mol. Alternatively, pass --fmt smiles to select this pipeline explicitly.

Parameter          Default        Description
-----------------  -------------  --------------------------------------------
--smiles-col       SMILES         Column name for SMILES strings
--property-col     Property       Input table column to read target values from;
                                  also used as the output label name
--train-ratio      0.9            Fraction of samples assigned to the training
                                  set; the rest form the validation set
--mol-dir          (none)         Directory of pre-generated .mol, .sdf, .xyz, or
                                  .pdb structure files (skips RDKit 3D conformer
                                  generation)
--mol-template     id{row}.mol    Filename template under --mol-dir; {row} is
                                  replaced by the CSV row index
--split-seed       42             Random seed for train/valid splitting
--conformer-seed   42             Random seed for RDKit 3D conformer generation
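When you use --mol-dir, every CSV row needs a matching structure file. The bash function below is a hypothetical pre-flight check, not part of dpa-adapt: it expands the --mol-template pattern for each data row and reports files that are missing. It assumes {row} is the 0-based index of a data row with the header excluded; confirm that indexing against your own files before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for --mol-dir / --mol-template (not part of dpa-adapt).
# Assumption: {row} is the 0-based index of a data row, header excluded.
# Usage: check_mol_files CSV MOL_DIR [TEMPLATE]
check_mol_files() {
  local csv=$1 mol_dir=$2 tmpl=${3:-}
  local placeholder='{row}' missing=0 n row path
  if [[ -z $tmpl ]]; then
    tmpl='id{row}.mol'            # documented default of --mol-template
  fi
  # Count the non-empty data lines that follow the header.
  n=$(tail -n +2 -- "$csv" | grep -c .)
  for (( row = 0; row < n; row++ )); do
    # Replace every literal {row} placeholder with the current row index.
    path=$mol_dir/${tmpl//"$placeholder"/$row}
    if [[ ! -f $path ]]; then
      echo "missing: $path" >&2
      missing=$(( missing + 1 ))
    fi
  done
  echo "$missing"                 # number of missing files on stdout
  (( missing == 0 ))              # exit status 0 only if nothing is missing
}
```

For example, check_mol_files molecules.csv ./mols 'mol_{row}.sdf' lists each missing file on stderr, prints the count, and exits non-zero if any file is absent.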

# Auto-detected via SMILES column
dpa-adapt data convert --input molecules.csv --output ./npy \
    --property-col homo
# Short alias
dpaad data convert --input molecules.csv --output ./npy \
    --property-col homo

# Explicit fmt + custom column names
dpa-adapt data convert --input data.csv --output ./npy --fmt smiles \
    --smiles-col smi --property-col GAP --train-ratio 0.85 \
    --split-seed 42 --conformer-seed 43
# Short alias
dpaad data convert --input data.csv --output ./npy --fmt smiles \
    --smiles-col smi --property-col GAP --train-ratio 0.85 \
    --split-seed 42 --conformer-seed 43

9.2.2. Structure Files via dpdata

Trigger: any input not routed to the SMILES pipeline, that is, --fmt is set to a value other than smiles, or --fmt is omitted and the input is not a CSV file with a recognized SMILES column. These inputs are passed to dpdata, which either auto-detects the format or uses the one given with --fmt.
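Taken together, the two triggers form a simple decision rule. The bash function below is a hypothetical restatement of that rule as documented here, not dpa-adapt's actual code. It reads only the CSV header row and assumes comma-separated column names; treating a --smiles-col value as a recognized column during auto-detection is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the documented routing rule (not dpa-adapt's code).
# Usage: route_input FILE [FMT] [SMILES_COL]; prints "smiles" or "dpdata".
route_input() {
  local file=$1 fmt=${2:-} smiles_col=${3:-}
  local -a candidates=(SMILES smiles)   # column names recognized by default
  local header col

  # An explicit --fmt decides the pipeline on its own.
  if [[ -n $fmt ]]; then
    if [[ $fmt == smiles ]]; then echo smiles; else echo dpdata; fi
    return
  fi

  # Assumption: a --smiles-col value also counts as a recognized column.
  if [[ -n $smiles_col ]]; then
    candidates+=("$smiles_col")
  fi

  # Without --fmt, only .csv files are checked for a SMILES column.
  if [[ $file == *.csv ]]; then
    # Header row only; drop CR and double quotes, one column name per line.
    header=$(head -n 1 -- "$file" | tr -d '\r"' | tr ',' '\n')
    for col in "${candidates[@]}"; do
      if grep -qxF -- "$col" <<<"$header"; then
        echo smiles
        return
      fi
    done
  fi
  echo dpdata
}
```

Under this rule, a CSV whose only structure column is named smi goes to dpdata unless you pass --smiles-col smi or --fmt smiles.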

9.2.2.1. Common Formats

--fmt value                       Typical file(s)             Notes
--------------------------------  --------------------------  --------------------------------------
xyz                               *.xyz                       Plain XYZ
vasp/poscar, vasp/contcar         POSCAR, CONTCAR             VASP input/final structure
vasp/outcar                       OUTCAR                      VASP output (energies, forces, stress)
vasp/xml                          vasprun.xml                 VASP XML output
vasp/string                       VASP structure string       VASP structure from a string
abacus/stru, stru                 STRU                        ABACUS input structure
abacus/scf, abacus/pw/scf,        SCF output                  ABACUS SCF calculation
  abacus/lcao/scf
abacus/md, abacus/pw/md,          MD output                   ABACUS molecular dynamics
  abacus/lcao/md
abacus/relax, abacus/pw/relax,    Relax output                ABACUS relaxation
  abacus/lcao/relax
cp2k/aimd_output                  CP2K MD output              CP2K AIMD output file
cp2k/output                       CP2K SCF output             CP2K single-point output
deepmd/raw                        *.raw files                 DeePMD-kit raw format
deepmd/comp, deepmd/npy           set.*/ dirs                 DeePMD-kit compressed/npy format
deepmd/npy/mixed                  mixed deepmd/npy dir        DeePMD-kit mixed npy format
deepmd/hdf5                       *.hdf5                      DeePMD-kit HDF5 format
lammps/dump, dump                 dump.*                      LAMMPS dump trajectory
lammps/lmp, lmp                   *.lmp                       LAMMPS data file
qe/cp/traj                        CP trajectory               Quantum ESPRESSO Car-Parrinello MD
qe/pw/scf                         PWscf output                Quantum ESPRESSO PWscf
siesta/output                     SIESTA output               SIESTA SCF output
siesta/aimd_output                SIESTA MD output            SIESTA AIMD output
gaussian/log                      *.log                       Gaussian log file
gaussian/fchk                     *.fchk                      Gaussian formatted checkpoint
gaussian/md                       Gaussian MD output          Gaussian MD trajectory
gaussian/gjf                      *.gjf                       Gaussian input file
amber/md                          Amber MD output             Amber MD trajectory
gromacs/gro, gro                  *.gro                       GROMACS coordinate file
pwmat/output, pwmat/movement,     REPORT, MOVEMENT, MLMD      PWmat output / movement / MLMD
  pwmat/mlmd
pwmat/final.config,               final.config, atom.config   PWmat final/input structure
  pwmat/atom.config
fhi_aims/output, fhi_aims/md      FHI-aims output/MD          FHI-aims calculation or MD trajectory
fhi_aims/scf                      FHI-aims SCF output         FHI-aims SCF
psi4/out                          Psi4 output                 Psi4 calculation output
psi4/inp                          Psi4 input                  Psi4 input file
orca/spout                        ORCA output                 ORCA single-point output
sqm/out                           SQM output                  SQM output
sqm/in                            SQM input                   SQM input
openmx/md                         OpenMX MD output            OpenMX MD trajectory
n2p2                              n2p2 output                 n2p2/NNPack output
dftbplus                          DFTB+ output                DFTB+ detailed.xml
mol, mol_file                     *.mol                       MDL Molfile
sdf, sdf_file                     *.sdf                       MDL SDFile
ase/structure                     Any ASE format              ASE structure (single frame)
ase/traj                          Any ASE trajectory          ASE trajectory (multi-frame)
pymatgen/structure                pymatgen objects            pymatgen Structure
pymatgen/molecule                 pymatgen objects            pymatgen Molecule
pymatgen/computedstructureentry   pymatgen objects            pymatgen ComputedStructureEntry
lmdb                              LMDB dir                    DeePMD-kit LMDB format
list                              List-format dir             List of system directories
3dmol                             3Dmol format                3Dmol.js format

You can omit --fmt and let dpdata infer the input format from the file name or content. For example, files named POSCAR, OUTCAR, or *.xyz are often recognized automatically. Use --fmt when the file name is ambiguous or auto-detection fails.
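If you convert many files and prefer to pin the format rather than rely on auto-detection, a small lookup from file names to the --fmt values in the table above can help. The helpers below are hypothetical (not part of dpa-adapt or dpdata). They print the command instead of running it, add --fmt only for unambiguous names, and otherwise leave the format to dpdata. *.log is deliberately left out because several programs write .log files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of dpa-adapt or dpdata).
# guess_fmt prints an explicit --fmt value for unambiguous file names, else nothing.
guess_fmt() {
  case ${1##*/} in                 # match on the base name only
    POSCAR)      echo vasp/poscar ;;
    CONTCAR)     echo vasp/contcar ;;
    OUTCAR)      echo vasp/outcar ;;
    vasprun.xml) echo vasp/xml ;;
    STRU)        echo abacus/stru ;;
    *.xyz)       echo xyz ;;
    *.lmp)       echo lammps/lmp ;;
    *.gro)       echo gromacs/gro ;;
    *.fchk)      echo gaussian/fchk ;;
    *.gjf)       echo gaussian/gjf ;;
    *.mol)       echo mol ;;
    *.sdf)       echo sdf ;;
    *.hdf5)      echo deepmd/hdf5 ;;
    *)           ;;                # unknown or ambiguous (e.g. *.log): no --fmt
  esac
}

# convert_cmd prints (does not run) a dpa-adapt command for one input file.
convert_cmd() {
  local input=$1 output=$2 fmt
  fmt=$(guess_fmt "$input")
  if [[ -n $fmt ]]; then
    printf 'dpa-adapt data convert --input %q --output %q --fmt %s\n' "$input" "$output" "$fmt"
  else
    # Leave format inference to dpdata, as when --fmt is omitted.
    printf 'dpa-adapt data convert --input %q --output %q\n' "$input" "$output"
  fi
}
```

Printing rather than executing keeps the choice reviewable: inspect the generated commands first, then run them once they look right.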

9.2.2.2. Single file

# Format auto-detected from the file name
dpa-adapt data convert --input POSCAR --output ./npy
dpaad data convert --input POSCAR --output ./npy

# VASP OUTCAR with an explicit format
dpa-adapt data convert --input OUTCAR --output ./npy --fmt vasp/outcar
dpaad data convert --input OUTCAR --output ./npy --fmt vasp/outcar

# XYZ file with an explicit format
dpa-adapt data convert --input traj.xyz --output ./npy --fmt xyz
dpaad data convert --input traj.xyz --output ./npy --fmt xyz

9.2.2.3. Glob patterns

When --input contains wildcards (*, ?, [), conversion uses mirrored batch output:

  • 1 or more matches → each matched file is converted into its own output directory, which mirrors the file's path relative to the non-wildcard prefix of the pattern.

  • 0 matches → the command fails with FileNotFoundError.

  • A manifest.json is written into the output root, recording converted and skipped files.

# Glob output mirrors the input tree under ./npy_root
dpa-adapt data convert --input "calcs/**/OUTCAR" --output ./npy_root --fmt vasp/outcar
dpaad data convert --input "calcs/**/OUTCAR" --output ./npy_root --fmt vasp/outcar

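For example, calcs/run1/OUTCAR is written to npy_root/run1/OUTCAR/: the non-wildcard prefix calcs/ is dropped and the rest of the path is kept. With --strict, the first conversion error stops the batch immediately; without it, failing files are skipped and recorded in manifest.json. Keep the pattern quoted, as above, so that the shell passes it to dpa-adapt unexpanded.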
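Before a long batch run, it can help to preview where each match will land. The bash sketch below is a hypothetical re-implementation of the mirroring rule described above, not dpa-adapt's code: mirror_dest strips the pattern's non-wildcard prefix from a matched path and re-roots the rest under the output directory, and preview_glob lists each match with its destination. It relies on bash 4+ globstar for **, whose matching may differ in edge cases from the glob implementation dpa-adapt uses internally.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the documented glob mirroring (not dpa-adapt's code).
# mirror_dest PATTERN MATCH OUTPUT_ROOT -> destination directory for MATCH.
mirror_dest() {
  local pattern=$1 match=$2 out_root=$3 prefix="" part
  local -a parts
  IFS=/ read -r -a parts <<<"$pattern"
  for part in "${parts[@]}"; do
    # The non-wildcard prefix ends at the first component with *, ? or [.
    case $part in
      *'*'* | *'?'* | *'['*) break ;;
    esac
    prefix+="$part/"
  done
  # Strip the prefix literally, then re-root the remainder under OUTPUT_ROOT.
  echo "${out_root%/}/${match#"$prefix"}"
}

# preview_glob PATTERN OUTPUT_ROOT -> one "match -> destination" line per file.
# Runs in a subshell so the shopt and IFS changes do not leak (bash 4+ for **).
preview_glob() (
  shopt -s globstar nullglob
  IFS=                      # no word splitting; pathname expansion still applies
  pattern=$1 out_root=$2
  matches=( $pattern )
  if (( ${#matches[@]} == 0 )); then
    echo "no files match: $pattern" >&2   # dpa-adapt raises FileNotFoundError here
    exit 1
  fi
  for f in "${matches[@]}"; do
    printf '%s -> %s\n' "$f" "$(mirror_dest "$pattern" "$f" "$out_root")"
  done
)
```

Running preview_glob 'calcs/**/OUTCAR' ./npy_root from the directory that contains calcs/ should list the same destinations that the batch conversion would create, without converting anything.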