deepmd.tf.utils#
Submodules#
- deepmd.tf.utils.argcheck
- deepmd.tf.utils.batch_size
- deepmd.tf.utils.compat
- deepmd.tf.utils.compress
- deepmd.tf.utils.convert
- deepmd.tf.utils.data
- deepmd.tf.utils.data_system
- deepmd.tf.utils.errors
- deepmd.tf.utils.finetune
- deepmd.tf.utils.graph
- deepmd.tf.utils.learning_rate
- deepmd.tf.utils.neighbor_stat
- deepmd.tf.utils.network
- deepmd.tf.utils.nlist
- deepmd.tf.utils.pair_tab
- deepmd.tf.utils.parallel_op
- deepmd.tf.utils.path
- deepmd.tf.utils.plugin
- deepmd.tf.utils.random
- deepmd.tf.utils.region
- deepmd.tf.utils.serialization
- deepmd.tf.utils.sess
- deepmd.tf.utils.spin
- deepmd.tf.utils.tabulate
- deepmd.tf.utils.type_embed
- deepmd.tf.utils.update_sel
- deepmd.tf.utils.weight_avg
Classes#
DeepmdData | Class for a data system.
DeepmdDataSystem | Class for manipulating many data systems.
LearningRateSchedule | TensorFlow wrapper for BaseLR.
PairTab | Pairwise tabulated potential.
Plugin | A class to register and restore plugins.
PluginVariant | A class to remove type from input arguments.
Package Contents#
- class deepmd.tf.utils.DeepmdData(sys_path: str, set_prefix: str = 'set', shuffle_test: bool = True, type_map: list[str] | None = None, optional_type_map: bool = True, modifier: Any | None = None, trn_all_set: bool = False, sort_atoms: bool = True)[source]#
Class for a data system.
It loads data from disk and maintains the data as a data_dict.
- Parameters:
- sys_path
Path to the data system
- set_prefix
Prefix for the directories of different sets
- shuffle_test
If the test data are shuffled
- type_map
Gives the name of different atom types
- optional_type_map
If the type_map.raw in each system is optional
- modifier
Data modifier that has the method modify_data
- trn_all_set
[DEPRECATED] All sets are now trained and tested.
- sort_atoms
bool. Sort atoms by atom type. Must be enabled when the data is fed directly to descriptors, except for mixed types.
- dirs#
- mixed_type#
- atom_type#
- natoms#
- type_map#
- pbc = True#
- enforce_type_map = False#
- sort_atoms = True#
- idx_map#
- data_dict#
- set_count = 0#
- iterator = 0#
- shuffle_test = True#
- modifier = None#
- nframes#
- prefix_sum#
- use_modifier_cache = True#
- add(key: str, ndof: int, atomic: bool = False, must: bool = False, high_prec: bool = False, type_sel: list[int] | None = None, repeat: int = 1, default: float = 0.0, dtype: numpy.dtype | None = None, output_natoms_for_type_sel: bool = False) DeepmdData[source]#
Add a data item to be loaded.
- Parameters:
- key
The key of the item. The corresponding data is stored in sys_path/set.*/key.npy
- ndof
The number of dof
- atomic
Whether the item is an atomic property. If False, the size of the data should be nframes x ndof. If True, the size of the data should be nframes x natoms x ndof.
- must
The data file sys_path/set.*/key.npy must exist. If must is False and the data file does not exist, the data_dict[find_key] is set to 0.0
- high_prec
Load the data and store in float64, otherwise in float32
- type_sel
Select certain type of atoms
- repeat
The data will be repeated repeat times.
- default
float, default=0. Default value of the data.
- dtype
np.dtype, optional. The dtype of the data; overrides high_prec if provided.
- output_natoms_for_type_sel
bool, optional. If True and type_sel is set, the atomic dimension will be natoms instead of nsel.
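The size rule for the atomic flag can be sketched as follows. This is only an illustration of the sizes stated above; `expected_size` is a hypothetical helper, not part of deepmd, and it ignores repeat and type_sel.

```python
def expected_size(nframes: int, natoms: int, ndof: int, atomic: bool) -> int:
    """Total number of values expected for an item, per the `atomic` rule above.

    Hypothetical helper for illustration; `repeat` and `type_sel` are ignored.
    """
    if atomic:
        # Atomic property: one block of ndof values per atom per frame.
        return nframes * natoms * ndof
    # Frame property: ndof values per frame.
    return nframes * ndof


# A per-frame energy (ndof=1) versus per-atom forces (ndof=3).
energy_size = expected_size(nframes=10, natoms=64, ndof=1, atomic=False)
force_size = expected_size(nframes=10, natoms=64, ndof=3, atomic=True)
```

Here `energy_size` counts one value per frame, while `force_size` counts three values per atom per frame.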
- reduce(key_out: str, key_in: str) DeepmdData[source]#
Generate a new item from the reduction of another item.
- Parameters:
- key_out
The name of the reduced item
- key_in
The name of the data item to be reduced
- check_batch_size(batch_size: int) bool[source]#
Check if the system can get a batch of data with batch_size frames.
- check_test_size(test_size: int) bool[source]#
Check if the system can get a test dataset with test_size frames.
- get_item_torch(index: int, num_worker: int = 1) dict[source]#
Get the data of a single frame. The frame is picked from the data system by index. The index runs across all the sets.
- Parameters:
- index
index of the frame
- num_worker
number of workers for parallel data modification
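An index that runs across all sets has to be translated into a set and a position inside that set. A minimal sketch of one way to do this with a prefix sum of per-set frame counts is shown below; `locate_frame` and `frames_per_set` are hypothetical names, and this is not necessarily how DeepmdData resolves the index internally.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_frame(index: int, frames_per_set: list[int]) -> tuple[int, int]:
    """Map a global frame index to (set index, frame index within that set).

    Hypothetical helper; assumes sets are concatenated in order.
    """
    total = sum(frames_per_set)
    if not 0 <= index < total:
        raise IndexError(f"frame index {index} out of range [0, {total})")
    # prefix_sum[k] is the number of frames in sets 0..k inclusive.
    prefix_sum = list(accumulate(frames_per_set))
    # First set whose cumulative count exceeds the index.
    set_idx = bisect_right(prefix_sum, index)
    start = prefix_sum[set_idx - 1] if set_idx > 0 else 0
    return set_idx, index - start
```

With sets of 3, 5, and 2 frames, global index 3 is the first frame of the second set and global index 9 is the second frame of the third set.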
- get_item_paddle(index: int, num_worker: int = 1) dict[source]#
Get the data of a single frame. The frame is picked from the data system by index. The index runs across all the sets. Same as for the PyTorch backend.
- Parameters:
- index
index of the frame
- num_worker
number of workers for parallel data modification
- get_batch(batch_size: int) dict[source]#
Get a batch of data with batch_size frames. The frames are randomly picked from the data system.
- Parameters:
- batch_size
size of the batch
- get_test(ntests: int = -1) dict[source]#
Get the test data with ntests frames.
- Parameters:
- ntests
Size of the test data set. If ntests is -1, all test data will be returned.
- get_natoms_vec(ntypes: int) numpy.ndarray[source]#
Get number of atoms and number of atoms in different types.
- Parameters:
- ntypes
Number of types (may be larger than the actual number of types in the system).
- Returns:
natoms
natoms[0]: number of local atoms
natoms[1]: total number of atoms held by this processor
natoms[i]: 2 <= i < Ntypes+2, number of atoms of type i - 2
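The layout of the returned vector can be sketched with a small stand-alone function. This is an illustration under the assumption that, for a data system held by a single process, the local and total atom counts coincide; `natoms_vec` is a hypothetical helper that returns a plain list rather than a numpy.ndarray.

```python
from collections import Counter


def natoms_vec(atom_types: list[int], ntypes: int) -> list[int]:
    """Build [n_local, n_total, n_type_0, ..., n_type_{ntypes-1}].

    Hypothetical helper; assumes local and total atom counts are equal.
    """
    counts = Counter(atom_types)
    natoms = len(atom_types)
    # Types that do not occur in the system get a count of zero, which is
    # why ntypes may be larger than the number of types actually present.
    return [natoms, natoms] + [counts.get(t, 0) for t in range(ntypes)]


vec = natoms_vec([0, 0, 1, 1, 1], ntypes=3)
```

For two atoms of type 0 and three of type 1, with ntypes set to 3, the last entry of `vec` is zero because no atom of type 2 is present.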
- get_single_frame(index: int, num_worker: int) dict[source]#
Orchestrates loading a single frame efficiently using memmap.
- preload_and_modify_all_data_torch(num_worker: int) None[source]#
Preload all frames and apply modifier to cache them.
This method is useful when use_modifier_cache is True and you want to avoid applying the modifier repeatedly during training.
- _idx_map_sel(atom_type: numpy.ndarray, type_sel: list[int]) numpy.ndarray[source]#
- _get_memmap(path: deepmd.utils.path.DPPath) numpy.memmap[source]#
Get or create a memory-mapped object for a given npy file. Uses file path and modification time as cache keys to detect file changes and invalidate cache when files are modified.
- _load_batch_set(set_name: deepmd.utils.path.DPPath) None[source]#
- _get_nframes(set_name: deepmd.utils.path.DPPath | str) int[source]#
- reformat_data_torch(data: dict[str, Any]) dict[str, Any][source]#
Modify the data format for the requirements of Torch backend.
- Parameters:
- data
original data
- _load_data(set_name: str, key: str, nframes: int, ndof_: int, atomic: bool = False, must: bool = True, repeat: int = 1, high_prec: bool = False, type_sel: list[int] | None = None, default: float = 0.0, dtype: numpy.dtype | None = None, output_natoms_for_type_sel: bool = False) numpy.ndarray[source]#
- _load_single_data(set_dir: deepmd.utils.path.DPPath, key: str, frame_idx: int, set_nframes: int) tuple[numpy.float32, numpy.ndarray][source]#
Loads and processes data for a SINGLE frame from a SINGLE key, fully replicating the logic from the original _load_data method.
- _load_type(sys_path: deepmd.utils.path.DPPath) numpy.ndarray[source]#
- _load_type_mix(set_name: deepmd.utils.path.DPPath) numpy.ndarray[source]#
- _make_idx_map(atom_type: numpy.ndarray) numpy.ndarray[source]#
- _check_pbc(sys_path: deepmd.utils.path.DPPath) bool[source]#
- _check_mode(set_path: deepmd.utils.path.DPPath) bool[source]#
- static _create_memmap(path_str: str, mtime_str: str) numpy.memmap[source]#
A cached helper function to create memmap objects. Using lru_cache to limit the number of open file handles.
- Parameters:
- path_str
The file path as a string.
- mtime_str
The modification time as a string, used for cache invalidation.
- class deepmd.tf.utils.DeepmdDataSystem(systems: list[str], batch_size: int, test_size: int, rcut: float | None = None, set_prefix: str = 'set', shuffle_test: bool = True, type_map: list[str] | None = None, optional_type_map: bool = True, modifier: Any | None = None, trn_all_set: bool = False, sys_probs: list[float] | None = None, auto_prob_style: str = 'prob_sys_size', sort_atoms: bool = True)[source]#
Class for manipulating many data systems.
It is implemented with the help of DeepmdData.
- system_dirs#
- nsystems#
- data_systems = []#
- batch_size#
- mixed_systems = False#
- sys_ntypes#
- natoms = []#
- natoms_vec = []#
- nbatches = []#
- type_map = []#
- test_size#
- pick_idx = 0#
- sys_probs = None#
- property default_mesh: list[numpy.ndarray]#
Mesh for each system.
- compute_energy_shift(rcond: float | None = None, key: str = 'energy') tuple[numpy.ndarray, numpy.ndarray][source]#
- add_dict(adict: dict[str, dict[str, Any]]) None[source]#
Add items to the data system by a dict. adict should have items like:
    adict[key] = {
        "ndof": ndof,
        "atomic": atomic,
        "must": must,
        "high_prec": high_prec,
        "type_sel": type_sel,
        "repeat": repeat,
    }
For the explanation of the keys, see add.
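The relationship between add_dict and add can be sketched with a toy class. `ToySystem` is hypothetical and only records what was registered; it assumes each entry of adict carries exactly the six keys listed above and forwards them to add as keyword arguments.

```python
class ToySystem:
    """Hypothetical stand-in that records registered data items."""

    def __init__(self) -> None:
        self.items: dict[str, dict] = {}

    def add(self, key, ndof, atomic=False, must=False, high_prec=False,
            type_sel=None, repeat=1) -> None:
        # Store the requirement; a real data system would load files later.
        self.items[key] = {
            "ndof": ndof, "atomic": atomic, "must": must,
            "high_prec": high_prec, "type_sel": type_sel, "repeat": repeat,
        }

    def add_dict(self, adict: dict[str, dict]) -> None:
        # One add() call per key, with the entry's fields as arguments.
        for key, spec in adict.items():
            self.add(key, **spec)


system = ToySystem()
system.add_dict({
    "energy": {"ndof": 1, "atomic": False, "must": False,
               "high_prec": True, "type_sel": None, "repeat": 1},
    "force": {"ndof": 3, "atomic": True, "must": False,
              "high_prec": False, "type_sel": None, "repeat": 1},
})
```

After the call, `system.items` holds one entry per key, each equivalent to a separate add call.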
- add_data_requirements(data_requirements: list[deepmd.utils.data.DataRequirementItem]) None[source]#
Add items to the data system by a list of DataRequirementItem.
- add(key: str, ndof: int, atomic: bool = False, must: bool = False, high_prec: bool = False, type_sel: list[int] | None = None, repeat: int = 1, default: float = 0.0, dtype: numpy.dtype | None = None, output_natoms_for_type_sel: bool = False) None[source]#
Add a data item to be loaded.
- Parameters:
- key
The key of the item. The corresponding data is stored in sys_path/set.*/key.npy
- ndof
The number of dof
- atomic
Whether the item is an atomic property. If False, the size of the data should be nframes x ndof. If True, the size of the data should be nframes x natoms x ndof.
- must
The data file sys_path/set.*/key.npy must exist. If must is False and the data file does not exist, the data_dict[find_key] is set to 0.0
- high_prec
Load the data and store in float64, otherwise in float32
- type_sel
Select certain type of atoms
- repeat
The data will be repeated repeat times.
- default
float, default=0. Default value of the data.
- dtype
The dtype of the data; overrides high_prec if provided.
- output_natoms_for_type_sel
bool. If True and type_sel is set, the atomic dimension will be natoms instead of nsel.
- reduce(key_out: str, key_in: str) None[source]#
Generate a new item from the reduction of another item.
- Parameters:
- key_out
The name of the reduced item
- key_in
The name of the data item to be reduced
- set_sys_probs(sys_probs: list[float] | None = None, auto_prob_style: str = 'prob_sys_size') None[source]#
- get_batch(sys_idx: int | None = None) dict[source]#
Get a batch of data from the data systems.
- Parameters:
- sys_idx
int. The index of the system from which the batch is drawn. If sys_idx is not None, sys_probs and auto_prob_style are ignored. If sys_idx is None, the system is chosen automatically according to sys_probs or auto_prob_style. This option does not work for mixed systems.
- Returns:
dict: The batch data
- get_batch_standard(sys_idx: int | None = None) dict[source]#
Get a batch of data from the data systems in the standard way.
- get_batch_mixed() dict[source]#
Get a batch of data from the data systems in the mixed way.
- Returns:
dict: The batch data
- get_test(sys_idx: int | None = None, n_test: int = -1) dict[str, numpy.ndarray][source]#
Get test data from the data systems.
- Parameters:
- sys_idx
The test data of the system with index sys_idx will be returned. If None, the currently selected system is used.
- n_test
Number of test data. If set to -1, all test data will be returned.
- get_sys_ntest(sys_idx: int | None = None) int[source]#
Get number of tests for the currently selected system, or one defined by sys_idx.
- get_sys(idx: int) deepmd.utils.data.DeepmdData[source]#
Get a certain data system.
- class deepmd.tf.utils.LearningRateSchedule(params: dict[str, Any])[source]#
TensorFlow wrapper for BaseLR.
The learning rate is computed via tf.numpy_function(), which prevents TensorFlow from optimizing this operation in the graph. This overhead is typically negligible compared to forward/backward passes.
- _params#
- _base_lr: deepmd.dpmodel.utils.learning_rate.BaseLR | None = None#
- property base_lr: deepmd.dpmodel.utils.learning_rate.BaseLR#
Get the built BaseLR instance.
- Returns:
BaseLR: The built learning rate schedule.
- Raises:
RuntimeError: If the schedule has not been built.
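The behaviour of a property that raises until the object has been built can be sketched with a toy class. `LazySchedule` and its `build` method are hypothetical stand-ins, and a plain dict takes the place of the BaseLR instance; this is not the LearningRateSchedule implementation.

```python
from typing import Any


class LazySchedule:
    """Hypothetical stand-in showing a build-before-use property."""

    def __init__(self, params: dict[str, Any]) -> None:
        self._params = params
        self._base_lr: dict[str, Any] | None = None

    def build(self) -> None:
        # Stand-in for constructing the real schedule object from params.
        self._base_lr = dict(self._params)

    @property
    def base_lr(self) -> dict[str, Any]:
        if self._base_lr is None:
            raise RuntimeError("The schedule has not been built.")
        return self._base_lr


schedule = LazySchedule({"start_lr": 1e-3})
try:
    schedule.base_lr
    raised = False
except RuntimeError:
    raised = True
schedule.build()
built = schedule.base_lr
```

Accessing the property before `build` raises RuntimeError; afterwards it returns the built object.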
- class deepmd.tf.utils.PairTab(filename: str, rcut: float | None = None)[source]#
Pairwise tabulated potential.
- Parameters:
- filename
File name for the short-range tabulated potential. The table is a text data file with (N_t + 1) * N_t / 2 + 1 columns. The first column is the distance between atoms. The second to the last columns are energies for pairs of certain types. For example, with two atom types, 0 and 1, the 2nd to 4th columns are for the 0-0, 0-1, and 1-1 pairs, respectively.
- rcut
float, optional. Cutoff radius for the tabulated potential.
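The column count and the pair-to-column mapping can be sketched as below. The ordering for more than two types (0-0, 0-1, ..., 0-(N_t-1), 1-1, ...) is an assumption extrapolated from the two-type example above, and `n_columns` and `pair_column` are hypothetical helpers.

```python
def n_columns(ntypes: int) -> int:
    """Number of columns in the table: one distance column plus one per pair."""
    return (ntypes + 1) * ntypes // 2 + 1


def pair_column(i: int, j: int, ntypes: int) -> int:
    """Zero-based column index of the energies for the type pair (i, j).

    Column 0 holds the distance. Assumes pairs are ordered row by row over
    the upper triangle: 0-0, 0-1, ..., 1-1, 1-2, ...
    """
    if i > j:
        i, j = j, i  # the pair is unordered
    # Number of pairs in the rows before row i of the upper triangle.
    offset = i * ntypes - i * (i - 1) // 2
    return 1 + offset + (j - i)
```

For two types this gives zero-based columns 1, 2, and 3 for 0-0, 0-1, and 1-1, which are the 2nd to 4th columns of the file.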
- data_type#
- reinit(filename: str, rcut: float | None = None) None[source]#
Initialize the tabulated interaction.
- Parameters:
- filename
File name for the short-range tabulated potential. The table is a text data file with (N_t + 1) * N_t / 2 + 1 columns. The first column is the distance between atoms. The second to the last columns are energies for pairs of certain types. For example, with two atom types, 0 and 1, the 2nd to 4th columns are for the 0-0, 0-1, and 1-1 pairs, respectively.
- rcut
float, optional. Cutoff radius for the tabulated potential.
- _check_table_upper_boundary() None[source]#
Update the user-provided table based on rcut.
This function checks the upper boundary provided in the table against rcut. If the table upper boundary values decay to zero before rcut, padding zeros will be added to the table to cover rcut; if the table upper boundary values do not decay to zero before rcut, extrapolation will be performed up to rcut.
Examples
table = [[0.005 1.    2.    3.   ]
         [0.01  0.8   1.6   2.4  ]
         [0.015 0.    1.    1.5  ]]
rcut = 0.022
new_table = [[0.005 1.    2.    3.   ]
             [0.01  0.8   1.6   2.4  ]
             [0.015 0.    1.    1.5  ]
             [0.02  0.    0.    0.   ]]

table = [[0.005 1.    2.    3.   ]
         [0.01  0.8   1.6   2.4  ]
         [0.015 0.5   1.    1.5  ]
         [0.02  0.25  0.4   0.75 ]
         [0.025 0.    0.1   0.   ]
         [0.03  0.    0.    0.   ]]
rcut = 0.031
new_table = [[0.005 1.    2.    3.   ]
             [0.01  0.8   1.6   2.4  ]
             [0.015 0.5   1.    1.5  ]
             [0.02  0.25  0.4   0.75 ]
             [0.025 0.    0.1   0.   ]
             [0.03  0.    0.    0.   ]
             [0.035 0.    0.    0.   ]]
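The zero-padding case, where the table has already decayed to zero before rcut, can be sketched in plain Python. `pad_table_with_zeros` is a hypothetical helper that assumes a uniform grid whose first row sits at one grid spacing, as in the examples above; it does not handle the extrapolation case and is not the PairTab implementation.

```python
import math


def pad_table_with_zeros(table: list[list[float]], rcut: float) -> list[list[float]]:
    """Append all-zero rows until the grid covers rcut.

    Hypothetical helper. Assumes row k (1-based) sits at distance k * hh and
    that the last row of the table is already zero. Floating-point round-off
    can add one extra row when rcut is an exact multiple of hh.
    """
    hh = table[1][0] - table[0][0]  # grid spacing
    ncols = len(table[0])
    n_needed = math.ceil(rcut / hh)  # rows required to reach rcut
    padded = [list(row) for row in table]
    for k in range(len(table) + 1, n_needed + 1):
        padded.append([k * hh] + [0.0] * (ncols - 1))
    return padded


table = [
    [0.005, 1.0, 2.0, 3.0],
    [0.01, 0.8, 1.6, 2.4],
    [0.015, 0.5, 1.0, 1.5],
    [0.02, 0.25, 0.4, 0.75],
    [0.025, 0.0, 0.1, 0.0],
    [0.03, 0.0, 0.0, 0.0],
]
new_table = pad_table_with_zeros(table, rcut=0.031)
```

With rcut = 0.031 one zero row at a distance of about 0.035 is appended, matching the second example above.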
- _extrapolate_table(pad_extrapolation: numpy.array) numpy.array[source]#
Smooth extrapolation between the table upper boundary and rcut.
This method should only be used when the table upper boundary rmax is smaller than rcut, and the table upper boundary values are not zeros. To simplify the problem, we use a single cubic spline between rmax and rcut for each pair of atom types. One can replace this extrapolation with higher-order polynomials if needed.
- There are two scenarios:
- rcut - rmax >= hh:
Set values at the grid point right before rcut to 0, and perform extrapolation between rmax and that grid point; this allows a smooth decay to 0 at rcut.
- rcut - rmax < hh:
Set values at rmax + hh to 0, and perform extrapolation between rmax and rmax + hh.
- Parameters:
- pad_extrapolation
np.array. The empty grid that holds the extrapolation values.
- Returns:
np.array: The cubic spline extrapolation.
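A single cubic that decays smoothly to zero can be sketched as a cubic Hermite segment. The boundary conditions used here (matching the value and slope at rmax, and reaching zero value with zero slope at the end point) are assumptions made for this sketch, and `hermite_decay` is a hypothetical helper rather than the PairTab code.

```python
def hermite_decay(r: float, r0: float, v0: float, slope0: float, r1: float) -> float:
    """Evaluate a cubic on [r0, r1] that starts at (r0, v0) with slope slope0
    and ends at (r1, 0) with zero slope.

    Hypothetical helper; the boundary conditions are assumptions.
    """
    h = r1 - r0
    t = (r - r0) / h  # normalized position in [0, 1]
    # Hermite basis functions for the start value and the start slope.
    # The end value and end slope are both zero, so their terms vanish.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    return h00 * v0 + h10 * h * slope0


# Decay from a table value of 1.5 at rmax = 0.015 to zero at 0.02.
start = hermite_decay(0.015, r0=0.015, v0=1.5, slope0=-180.0, r1=0.02)
end = hermite_decay(0.02, r0=0.015, v0=1.5, slope0=-180.0, r1=0.02)
```

The curve reproduces the table value at rmax and vanishes at the end point, which corresponds to the smooth decay described in the two scenarios above.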
- _make_data() numpy.ndarray[source]#