deepmd.tf.utils.data#
Alias for backward compatibility.
Classes#
DeepmdData | Class for a data system.
Module Contents#
- class deepmd.tf.utils.data.DeepmdData(sys_path: str, set_prefix: str = 'set', shuffle_test: bool = True, type_map: list[str] | None = None, optional_type_map: bool = True, modifier: Any | None = None, trn_all_set: bool = False, sort_atoms: bool = True)[source]#
Class for a data system.
It loads data from disk and keeps the loaded data in data_dict.
- Parameters:
- sys_path
Path to the data system
- set_prefix
Prefix for the directories of different sets
- shuffle_test
If the test data are shuffled
- type_map
Gives the name of different atom types
- optional_type_map
If the type_map.raw in each system is optional
- modifier
Data modifier that has the method modify_data
- trn_all_set
Deprecated. All sets are now used for both training and testing.
- sort_atoms (bool)
Sort atoms by atom type. This must be enabled when the data is fed directly to descriptors, except for mixed-type data.
- dirs#
- mixed_type#
- atom_type#
- natoms#
- type_map#
- pbc = True#
- enforce_type_map = False#
- sort_atoms = True#
- idx_map#
- data_dict#
- set_count = 0#
- iterator = 0#
- shuffle_test = True#
- modifier = None#
- nframes#
- prefix_sum#
- use_modifier_cache = True#
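The sort_atoms option and the idx_map attribute suggest that atoms are reordered by type through an index map. The following is a minimal sketch of that idea in plain Python. The helper names make_idx_map and apply_idx_map are hypothetical and are not the library's _make_idx_map. The sketch also assumes a stable sort, so atoms of the same type keep their original relative order.

```python
def make_idx_map(atom_type):
    """Return indices that reorder atoms so their types are non-decreasing.

    Hypothetical helper for illustration only.
    """
    # sorted() is stable, so atoms of equal type keep their original order.
    return sorted(range(len(atom_type)), key=lambda i: atom_type[i])


def apply_idx_map(values, idx_map):
    """Reorder one per-atom list with the index map."""
    return [values[i] for i in idx_map]


atom_type = [1, 0, 1, 0, 2]
idx_map = make_idx_map(atom_type)
sorted_types = apply_idx_map(atom_type, idx_map)
```

The same idx_map can then be applied to any per-atom quantity (coordinates, forces) so that all arrays stay aligned with the sorted types.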
- add(key: str, ndof: int, atomic: bool = False, must: bool = False, high_prec: bool = False, type_sel: list[int] | None = None, repeat: int = 1, default: float = 0.0, dtype: numpy.dtype | None = None, output_natoms_for_type_sel: bool = False) DeepmdData[source]#
Add a data item to be loaded.
- Parameters:
- key
The key of the item. The corresponding data is stored in sys_path/set.*/key.npy
- ndof
The number of dof
- atomic
Whether the item is an atomic property. If False, the data should have size nframes x ndof. If True, the data should have size nframes x natoms x ndof.
- must
The data file sys_path/set.*/key.npy must exist. If must is False and the data file does not exist, data_dict[find_key] is set to 0.0.
- high_prec
If True, load and store the data in float64; otherwise use float32.
- type_sel
Select certain type of atoms
- repeat
The data will be repeated repeat times.
- default
float, default 0.0. Default value of the data.
- dtype
np.dtype, optional. The dtype of the data; overrides high_prec if provided.
- output_natoms_for_type_sel (bool, optional)
If True and type_sel is set, the atomic dimension will be natoms instead of nsel.
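The interplay of must, default and the "find" flag can be sketched without touching the file system. In the sketch below, the function load_item and the files dictionary (a stand-in for sys_path/set.*/key.npy) are hypothetical. It assumes that the flag is 1.0 when the file exists and that missing optional data is filled with default. Only the 0.0 case is stated in the parameter description above.

```python
def load_item(files, key, nframes, ndof, must=False, default=0.0):
    """Return (find_flag, data) for one non-atomic item.

    files: dict mapping key -> nested list, standing in for key.npy files
    (hypothetical, for illustration only).
    """
    if key in files:
        # Assumption: a found file is reported with a flag of 1.0.
        return 1.0, files[key]
    if must:
        raise RuntimeError(f"required data file for {key!r} is missing")
    # Optional item without a file: nframes x ndof filled with the default.
    data = [[default] * ndof for _ in range(nframes)]
    return 0.0, data


found, energy = load_item({"energy": [[1.0], [2.0]]}, "energy", nframes=2, ndof=1)
missing, force = load_item({}, "force", nframes=2, ndof=3)
```

A consumer can then use the flag to decide whether the item should contribute to a loss term.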
- reduce(key_out: str, key_in: str) DeepmdData[source]#
Generate a new item from the reduction of another item.
- Parameters:
- key_out
The name of the reduced item
- key_in
The name of the data item to be reduced
- check_batch_size(batch_size: int) bool[source]#
Check if the system can get a batch of data with batch_size frames.
- check_test_size(test_size: int) bool[source]#
Check if the system can get a test dataset with test_size frames.
- get_item_torch(index: int, num_worker: int = 1) dict[source]#
Get the data of a single frame. The frame is selected from the data system by index; the index runs across all sets.
- Parameters:
- index
index of the frame
- num_worker
number of workers for parallel data modification
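An index that runs across all sets has to be translated into a set and a frame within that set. The sketch below shows one common way to do this with cumulative frame counts and a binary search. The function locate_frame is hypothetical, and it assumes that the prefix_sum attribute holds cumulative frame counts per set, which the page does not spell out.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_frame(index, frames_per_set):
    """Map a global frame index to (set_index, local_frame_index).

    Hypothetical helper for illustration only.
    """
    if not frames_per_set:
        raise IndexError("no sets available")
    # Cumulative frame counts, e.g. [3, 2, 4] -> [3, 5, 9].
    prefix_sum = list(accumulate(frames_per_set))
    if not 0 <= index < prefix_sum[-1]:
        raise IndexError(f"frame index {index} out of range")
    # First set whose cumulative count exceeds the index.
    set_idx = bisect_right(prefix_sum, index)
    offset = prefix_sum[set_idx - 1] if set_idx > 0 else 0
    return set_idx, index - offset


frames_per_set = [3, 2, 4]
first = locate_frame(0, frames_per_set)
boundary = locate_frame(3, frames_per_set)
last = locate_frame(8, frames_per_set)
```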
- get_item_paddle(index: int, num_worker: int = 1) dict[source]#
Get the data of a single frame. The frame is selected from the data system by index; the index runs across all sets. The behavior is the same as for the PyTorch backend.
- Parameters:
- index
index of the frame
- num_worker
number of workers for parallel data modification
- get_batch(batch_size: int) dict[source]#
Get a batch of data with batch_size frames. The frames are randomly picked from the data system.
- Parameters:
- batch_size
size of the batch
- get_test(ntests: int = -1) dict[source]#
Get the test data with ntests frames.
- Parameters:
- ntests
Size of the test data set. If ntests is -1, all test data are returned.
- get_natoms_vec(ntypes: int) numpy.ndarray[source]#
Get number of atoms and number of atoms in different types.
- Parameters:
- ntypes
Number of types (may be larger than the actual number of types in the system).
- Returns:
natoms
- natoms[0]: number of local atoms
- natoms[1]: total number of atoms held by this processor
- natoms[i], 2 <= i < ntypes+2: number of atoms of type i-2
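The layout of the returned vector can be illustrated with a small pure-Python sketch. The function natoms_vec below is hypothetical and returns a list rather than a numpy.ndarray. It assumes a single process, so the local and total atom counts coincide.

```python
def natoms_vec(atom_type, ntypes):
    """Build [n_local, n_total, n_type0, n_type1, ...] from per-atom types.

    Hypothetical helper for illustration only.
    """
    natoms = len(atom_type)
    counts = [0] * ntypes
    for t in atom_type:
        counts[t] += 1
    # Assumption: one process holds every atom, so both leading entries match.
    return [natoms, natoms] + counts


# Six atoms of three types; ntypes=4 leaves a trailing zero for the unused type.
vec = natoms_vec([0, 0, 1, 2, 2, 2], ntypes=4)
```

Passing an ntypes larger than the number of types present simply pads the vector with zero counts, which matches the note on the ntypes parameter.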
- get_single_frame(index: int, num_worker: int) dict[source]#
Orchestrates loading a single frame efficiently using memmap.
- preload_and_modify_all_data_torch(num_worker: int) None[source]#
Preload all frames and apply modifier to cache them.
This method is useful when use_modifier_cache is True and you want to avoid applying the modifier repeatedly during training.
- _idx_map_sel(atom_type: numpy.ndarray, type_sel: list[int]) numpy.ndarray[source]#
- _get_memmap(path: deepmd.utils.path.DPPath) numpy.memmap[source]#
Get or create a memory-mapped object for a given npy file. Uses file path and modification time as cache keys to detect file changes and invalidate cache when files are modified.
- _load_batch_set(set_name: deepmd.utils.path.DPPath) None[source]#
- _get_nframes(set_name: deepmd.utils.path.DPPath | str) int[source]#
- reformat_data_torch(data: dict[str, Any]) dict[str, Any][source]#
Modify the data format for the requirements of Torch backend.
- Parameters:
- data
original data
- _load_data(set_name: str, key: str, nframes: int, ndof_: int, atomic: bool = False, must: bool = True, repeat: int = 1, high_prec: bool = False, type_sel: list[int] | None = None, default: float = 0.0, dtype: numpy.dtype | None = None, output_natoms_for_type_sel: bool = False) numpy.ndarray[source]#
- _load_single_data(set_dir: deepmd.utils.path.DPPath, key: str, frame_idx: int, set_nframes: int) tuple[numpy.float32, numpy.ndarray][source]#
Loads and processes data for a SINGLE frame from a SINGLE key, fully replicating the logic from the original _load_data method.
- _load_type(sys_path: deepmd.utils.path.DPPath) numpy.ndarray[source]#
- _load_type_mix(set_name: deepmd.utils.path.DPPath) numpy.ndarray[source]#
- _make_idx_map(atom_type: numpy.ndarray) numpy.ndarray[source]#
- _check_pbc(sys_path: deepmd.utils.path.DPPath) bool[source]#
- _check_mode(set_path: deepmd.utils.path.DPPath) bool[source]#
- static _create_memmap(path_str: str, mtime_str: str) numpy.memmap[source]#
A cached helper function to create memmap objects. Uses lru_cache to limit the number of open file handles.
- Parameters:
- path_str
The file path as a string.
- mtime_str
The modification time as a string, used for cache invalidation.
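The caching scheme described for _get_memmap and _create_memmap, keyed on the file path and modification time, can be sketched with the standard library alone. In this sketch the helpers _cached_read and read_file are hypothetical, the cached object is the file's bytes rather than a numpy.memmap, and the maxsize of 256 is an arbitrary assumption.

```python
import os
import tempfile
from functools import lru_cache


@lru_cache(maxsize=256)  # Assumed bound; limits how many entries stay cached.
def _cached_read(path_str, mtime_str):
    """Read a file; mtime_str only serves as part of the cache key."""
    with open(path_str, "rb") as fh:
        return fh.read()


def read_file(path):
    """Look up the file in the cache, keyed on path and modification time."""
    mtime_str = str(os.stat(path).st_mtime_ns)
    return _cached_read(str(path), mtime_str)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "coord.bin")
    with open(path, "wb") as fh:
        fh.write(b"v1")
    first = read_file(path)
    second = read_file(path)  # Same path and mtime: served from the cache.
    misses_before_change = _cached_read.cache_info().misses

    with open(path, "wb") as fh:
        fh.write(b"v2")
    # Push the mtime forward explicitly so the change is visible on any filesystem.
    st = os.stat(path)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    third = read_file(path)  # New mtime: new cache key, file is read again.
    misses_after_change = _cached_read.cache_info().misses
```

Because the modification time is part of the key, a modified file never returns stale content, while the bounded cache size keeps the number of retained entries in check.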