deepmd.utils.data

Attributes

Classes

DeepmdData

Class for a data system.

DataRequirementItem

A class to store the data requirement for data systems.

Module Contents

deepmd.utils.data.log
class deepmd.utils.data.DeepmdData(sys_path: str, set_prefix: str = 'set', shuffle_test: bool = True, type_map: list[str] | None = None, optional_type_map: bool = True, modifier=None, trn_all_set: bool = False, sort_atoms: bool = True)

Class for a data system.

It loads data from hard disk and maintains the data as a data_dict.

Parameters:
sys_path

Path to the data system

set_prefix

Prefix for the directories of different sets

shuffle_test

Whether to shuffle the test data.

type_map

Gives the name of different atom types

optional_type_map

Whether the type_map.raw file in each system is optional.

modifier

Data modifier that has the method modify_data

trn_all_set

[DEPRECATED] This option no longer has any effect. All sets are now used for both training and testing.

sort_atoms : bool

Sort atoms by atom type. This must be enabled when the data is fed directly to descriptors, except for mixed-type descriptors.
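
The effect of sort_atoms can be pictured as computing an index map that reorders atoms so that their types are ascending. The following is a minimal sketch of that idea in plain Python, not the library's implementation; the helper name make_idx_map is hypothetical and chosen only for illustration.

```python
def make_idx_map(atom_type):
    """Return indices that reorder atoms so that atom types are ascending.

    Hypothetical helper for illustration only. sorted() is stable, so atoms
    of the same type keep their original relative order.
    """
    return sorted(range(len(atom_type)), key=lambda i: atom_type[i])


atom_type = [1, 0, 1, 0, 2]
idx_map = make_idx_map(atom_type)

# Applying the index map groups atoms of the same type together.
sorted_types = [atom_type[i] for i in idx_map]

print(idx_map)       # [1, 3, 0, 2, 4]
print(sorted_types)  # [0, 0, 1, 1, 2]
```

Any per-atom array (coordinates, forces) would have to be reordered with the same index map so that rows keep referring to the same atom.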

dirs
mixed_type
atom_type
natoms
type_map
pbc
enforce_type_map = False
sort_atoms
idx_map
data_dict
set_count = 0
iterator = 0
shuffle_test
modifier
nframes
prefix_sum
add(key: str, ndof: int, atomic: bool = False, must: bool = False, high_prec: bool = False, type_sel: list[int] | None = None, repeat: int = 1, default: float = 0.0, dtype: numpy.dtype | None = None, output_natoms_for_type_sel: bool = False)

Add a data item to be loaded.

Parameters:
key

The key of the item. The corresponding data is stored in sys_path/set.*/key.npy

ndof

The number of degrees of freedom (DOF).

atomic

Whether the item is an atomic property. If False, the size of the data should be nframes x ndof. If True, the size of the data should be nframes x natoms x ndof.

must

The data file sys_path/set.*/key.npy must exist. If must is False and the data file does not exist, the data_dict[find_key] is set to 0.0

high_prec

If True, load the data and store it in float64; otherwise store it in float32.

type_sel

Select atoms of certain types.

repeat

The data will be repeated repeat times.

default : float, default=0.

Default value of the data.

dtype : np.dtype, optional

The dtype of the data. Overrides high_prec if provided.

output_natoms_for_type_sel : bool, optional

If True and type_sel is given, the atomic dimension will be natoms instead of nsel.
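
The shape rules stated for atomic, type_sel and output_natoms_for_type_sel can be summarized in a small function. This is only a sketch of the documented rules, assuming nsel is the number of atoms whose type appears in type_sel; the function expected_shape is hypothetical and is not part of deepmd. The effect of repeat is left out on purpose.

```python
def expected_shape(nframes, atom_type, ndof, atomic=False,
                   type_sel=None, output_natoms_for_type_sel=False):
    """Shape a data item is expected to have, following the documented rules.

    Hypothetical helper for illustration only.
    """
    natoms = len(atom_type)
    if not atomic:
        # Frame-level property: one row of ndof values per frame.
        return (nframes, ndof)
    if type_sel is not None and not output_natoms_for_type_sel:
        # Assumption: nsel counts the atoms whose type is in type_sel.
        nsel = sum(1 for t in atom_type if t in type_sel)
        return (nframes, nsel, ndof)
    return (nframes, natoms, ndof)


atom_type = [0, 0, 1, 1, 2]

print(expected_shape(10, atom_type, 1))                              # (10, 1)
print(expected_shape(10, atom_type, 3, atomic=True))                 # (10, 5, 3)
print(expected_shape(10, atom_type, 3, atomic=True, type_sel=[0]))   # (10, 2, 3)
print(expected_shape(10, atom_type, 3, atomic=True, type_sel=[0],
                     output_natoms_for_type_sel=True))               # (10, 5, 3)
```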

reduce(key_out: str, key_in: str)

Generate a new item from the reduction of another item.

Parameters:
key_out

The name of the reduced item

key_in

The name of the data item to be reduced
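
As an illustration of what such a reduction can look like, the sketch below sums an atomic item over the atom axis to obtain one value per frame, for example a total energy from per-atom energies. That the reduction is a sum over atoms is an assumption made for this example, and reduce_over_atoms is a hypothetical helper, not a deepmd function.

```python
def reduce_over_atoms(frames):
    """Sum an atomic item over atoms: frames[f][a][d] -> out[f][d].

    Hypothetical helper; assumes the reduction is a plain sum over atoms.
    """
    return [[sum(column) for column in zip(*atoms)] for atoms in frames]


# Two frames, three atoms each, ndof = 1 (per-atom energies).
atom_ener = [
    [[1.0], [2.0], [3.0]],
    [[0.5], [0.5], [1.0]],
]

energy = reduce_over_atoms(atom_ener)
print(energy)  # [[6.0], [2.0]]
```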

get_data_dict() -> dict

Get the data_dict.

check_batch_size(batch_size)

Check if the system can get a batch of data with batch_size frames.

check_test_size(test_size)

Check if the system can get a test dataset with test_size frames.

get_item_torch(index: int) -> dict

Get the data of a single frame. The frame is picked from the data system by index. The index is counted across all the sets.

Parameters:
index

index of the frame
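
An index that is counted across all sets has to be translated into a set and a position inside that set. One plausible way to do this is with a prefix sum of the per-set frame counts and a binary search, as sketched below. The helper locate_frame and the frame counts are hypothetical, and the sketch does not claim to reproduce how the library performs the lookup.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_frame(index, frames_per_set):
    """Map a global frame index to (set index, index within that set).

    Hypothetical helper for illustration only.
    """
    # prefix_sum[i] is the number of frames in sets 0..i inclusive.
    prefix_sum = list(accumulate(frames_per_set))
    total = prefix_sum[-1] if prefix_sum else 0
    if not 0 <= index < total:
        raise IndexError(f"frame index {index} out of range [0, {total})")
    # First set whose cumulative frame count exceeds the index.
    set_idx = bisect_right(prefix_sum, index)
    start = prefix_sum[set_idx - 1] if set_idx > 0 else 0
    return set_idx, index - start


frames_per_set = [3, 2, 4]  # e.g. set.000, set.001, set.002

print(locate_frame(0, frames_per_set))  # (0, 0)
print(locate_frame(3, frames_per_set))  # (1, 0)
print(locate_frame(8, frames_per_set))  # (2, 3)
```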

get_batch(batch_size: int) -> dict

Get a batch of data with batch_size frames. The frames are randomly picked from the data system.

Parameters:
batch_size

size of the batch

get_test(ntests: int = -1) -> dict

Get the test data with ntests frames.

Parameters:
ntests

Size of the test data set. If ntests is -1, all test data are returned.

get_ntypes() -> int

Number of atom types in the system.

get_type_map() -> list[str]

Get the type map.

get_atom_type() -> list[int]

Get atom types.

get_numb_set() -> int

Get number of training sets.

get_numb_batch(batch_size: int, set_idx: int) -> int

Get the number of batches in a set.

get_sys_numb_batch(batch_size: int) -> int

Get the number of batches in the data system.
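
The relation between the per-set and per-system batch counts can be sketched as follows. The sketch assumes that a set contributes the floor of nframes / batch_size batches, but never fewer than one, and that the system total is the sum over sets. Both rules are assumptions made for illustration, and the function names are hypothetical.

```python
def numb_batch(nframes, batch_size):
    """Number of batches in one set.

    Assumption: floor division, but never fewer than one batch per set.
    """
    return max(nframes // batch_size, 1)


def sys_numb_batch(frames_per_set, batch_size):
    """Number of batches in the whole system, summed over its sets."""
    return sum(numb_batch(n, batch_size) for n in frames_per_set)


frames_per_set = [10, 3, 7]

print(numb_batch(10, 4))                  # 2
print(sys_numb_batch(frames_per_set, 4))  # 4
```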

get_natoms()

Get number of atoms.

get_natoms_vec(ntypes: int)

Get number of atoms and number of atoms in different types.

Parameters:
ntypes

Number of types (may be larger than the actual number of types in the system).

Returns:
natoms

natoms[0]: number of local atoms.
natoms[1]: total number of atoms held by this processor.
natoms[i], 2 <= i < ntypes + 2: number of atoms of type i - 2.
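
The layout of the returned vector can be reproduced in a few lines of plain Python. This is a sketch for a single process, where the local and total atom counts coincide; the function natoms_vec is a hypothetical stand-in and not the library method.

```python
def natoms_vec(atom_type, ntypes):
    """Build [natoms, natoms, n_type_0, ..., n_type_{ntypes-1}].

    Hypothetical helper; assumes a single process, so the local and total
    atom counts are the same.
    """
    natoms = len(atom_type)
    counts = [0] * ntypes
    for t in atom_type:
        counts[t] += 1
    return [natoms, natoms] + counts


atom_type = [0, 0, 1, 1, 2]

# ntypes may exceed the number of types present; extra entries are zero.
vec = natoms_vec(atom_type, ntypes=4)
print(vec)  # [5, 5, 2, 2, 1, 0]
```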

avg(key)

Return the average value of an item.

_idx_map_sel(atom_type, type_sel)
_get_natoms_2(ntypes)
_get_subdata(data, idx=None)
_load_batch_set(set_name: deepmd.utils.path.DPPath) -> None
reset_get_batch() -> None
_load_test_set(shuffle_test: bool) -> None
_shuffle_data(data)
_get_nframes(set_name: deepmd.utils.path.DPPath)
reformat_data_torch(data)

Modify the data format to meet the requirements of the Torch backend.

Parameters:
data

original data

_load_set(set_name: deepmd.utils.path.DPPath)
_load_data(set_name, key, nframes, ndof_, atomic=False, must=True, repeat=1, high_prec=False, type_sel=None, default: float = 0.0, dtype: numpy.dtype | None = None, output_natoms_for_type_sel: bool = False)
_load_type(sys_path: deepmd.utils.path.DPPath)
_load_type_mix(set_name: deepmd.utils.path.DPPath)
_make_idx_map(atom_type)
_load_type_map(sys_path: deepmd.utils.path.DPPath)
_check_pbc(sys_path: deepmd.utils.path.DPPath)
_check_mode(set_path: deepmd.utils.path.DPPath)
class deepmd.utils.data.DataRequirementItem(key: str, ndof: int, atomic: bool = False, must: bool = False, high_prec: bool = False, type_sel: list[int] | None = None, repeat: int = 1, default: float = 0.0, dtype: numpy.dtype | None = None, output_natoms_for_type_sel: bool = False)

A class to store the data requirement for data systems.

Parameters:
key

The key of the item. The corresponding data is stored in sys_path/set.*/key.npy

ndof

The number of degrees of freedom (DOF).

atomic

Whether the item is an atomic property. If False, the size of the data should be nframes x ndof. If True, the size of the data should be nframes x natoms x ndof.

must

The data file sys_path/set.*/key.npy must exist. If must is False and the data file does not exist, the data_dict[find_key] is set to 0.0

high_prec

If True, load the data and store it in float64; otherwise store it in float32.

type_sel

Select atoms of certain types.

repeat

The data will be repeated repeat times.

default : float, default=0.

Default value of the data.

dtype : np.dtype, optional

The dtype of the data. Overrides high_prec if provided.

output_natoms_for_type_sel : bool, optional

If True and type_sel is given, the atomic dimension will be natoms instead of nsel.

key
ndof
atomic
must
high_prec
type_sel
repeat
default
dtype
output_natoms_for_type_sel
dict
to_dict() -> dict
__getitem__(key: str)
__eq__(__value: object) -> bool
__repr__() -> str
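
To make the role of such a requirement record concrete, the sketch below defines a small stand-in class with the same kind of interface (to_dict, __getitem__, __eq__, __repr__). RequirementSketch is hypothetical and covers only a subset of the parameters; it is not the library class and makes no claim about how DataRequirementItem is implemented.

```python
class RequirementSketch:
    """Hypothetical stand-in for a data requirement record (illustration only)."""

    def __init__(self, key, ndof, atomic=False, must=False, high_prec=False):
        self._fields = {
            "key": key,
            "ndof": ndof,
            "atomic": atomic,
            "must": must,
            "high_prec": high_prec,
        }

    def to_dict(self):
        # Return a copy so callers cannot mutate the stored requirement.
        return dict(self._fields)

    def __getitem__(self, name):
        return self._fields[name]

    def __eq__(self, other):
        if not isinstance(other, RequirementSketch):
            return False
        return self._fields == other._fields

    def __repr__(self):
        return f"RequirementSketch({self._fields})"


energy = RequirementSketch("energy", ndof=1, high_prec=True)
force = RequirementSketch("force", ndof=3, atomic=True)

# Collect requirements by key, e.g. before registering them on a data system.
by_key = {item["key"]: item.to_dict() for item in (energy, force)}
print(by_key["force"]["ndof"])  # 3
```

Storing the requirement as a plain record keeps the description of a data item separate from the data system that eventually loads it, so the same requirement can be applied to several systems.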