deepmd.dpmodel.utils.stat

Output statistics computation for dpmodel backend.

Attributes

log

Functions

`collect_observed_types`(→ list[str])	Collect observed element types from sampled training data.
`_restore_observed_type_from_file`(→ list[str] \| None)	Try to load observed_type from stat file.
`_save_observed_type_to_file`(→ None)	Save observed_type to stat file.
`_restore_from_file`(→ tuple[dict \| None, dict \| None])	Restore bias and std from stat file.
`_save_to_file`(→ None)	Save bias and std to stat file.
`_post_process_stat`(→ tuple[dict, dict])	Post process the statistics.
`_make_preset_out_bias`(→ numpy.ndarray \| None)	Make preset out bias.
`_fill_stat_with_global`(→ numpy.ndarray \| None)	This function is used to fill atomic stat with global stat.
`_compute_model_predict`(→ dict[str, list[numpy.ndarray]])	Compute model predictions for all samples.
`compute_output_stats`(→ tuple[dict, dict])	Compute the output statistics (e.g. energy bias) for the fitting net from packed data.
`_compute_output_stats_global`(→ tuple[dict[str, ...)	Compute output statistics from reduced global labels only.
`_compute_output_stats_atomic`(→ tuple[dict[str, ...)	Compute output statistics from atomic labels.

Module Contents

deepmd.dpmodel.utils.stat.log

deepmd.dpmodel.utils.stat.collect_observed_types(sampled: list[dict], type_map: list[str]) → list[str]

Collect observed element types from sampled training data.

Parameters:

sampled (list[dict]): Sampled data from different data systems. Each dict must contain "atype" with shape [nframes, natoms].
type_map (list[str]): Mapping from type index to element symbol.

Returns:

list[str]: Sorted list of observed element symbols.
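The behavior described above can be illustrated with a minimal NumPy sketch. The function name `collect_observed_types_sketch` is hypothetical, and both the alphabetical sort order and the skipping of out-of-range indices are assumptions rather than documented behavior of the library function.

```python
import numpy as np


def collect_observed_types_sketch(sampled, type_map):
    """Illustrative stand-in, not the library implementation."""
    observed = set()
    for system in sampled:
        # "atype" has shape [nframes, natoms] and holds type indices.
        observed.update(int(i) for i in np.unique(system["atype"]))
    # Assumption: indices without an entry in type_map are ignored.
    symbols = [type_map[i] for i in observed if 0 <= i < len(type_map)]
    # Assumption: "sorted" means alphabetical order of the symbols.
    return sorted(symbols)


sampled = [
    {"atype": np.array([[0, 0, 1], [0, 0, 1]])},
    {"atype": np.array([[1, 1]])},
]
print(collect_observed_types_sketch(sampled, ["O", "H", "C"]))  # ['H', 'O']
```

Type "C" is in the type map but never appears in the sampled data, so it is not reported as observed.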

deepmd.dpmodel.utils.stat._restore_observed_type_from_file(stat_file_path: deepmd.utils.path.DPPath | None) → list[str] | None

Try to load observed_type from stat file.

deepmd.dpmodel.utils.stat._save_observed_type_to_file(stat_file_path: deepmd.utils.path.DPPath | None, observed_type: list[str]) → None

Save observed_type to stat file.

deepmd.dpmodel.utils.stat._restore_from_file(stat_file_path: deepmd.utils.path.DPPath, keys: list[str]) → tuple[dict | None, dict | None]

Restore bias and std from stat file.

deepmd.dpmodel.utils.stat._save_to_file(stat_file_path: deepmd.utils.path.DPPath, bias_out: dict, std_out: dict) → None

Save bias and std to stat file.

deepmd.dpmodel.utils.stat._post_process_stat(out_bias: dict, out_std: dict) → tuple[dict, dict]

Post process the statistics.

For global statistics, the std of each atom type is not available, so the output std is faked with ones for all types. If out_std already has the same shape as out_bias, nothing is changed.
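A small sketch of this shape rule, assuming both arguments are dicts keyed by output name and holding NumPy arrays. The helper name `post_process_stat_sketch` is hypothetical and the code is not taken from the library.

```python
import numpy as np


def post_process_stat_sketch(out_bias, out_std):
    """Illustrative only: make every std match the shape of its bias."""
    new_std = {}
    for key, bias in out_bias.items():
        std = out_std.get(key)
        if std is not None and np.shape(std) == np.shape(bias):
            # Shapes already agree, keep the std untouched.
            new_std[key] = std
        else:
            # No per-type std available: use ones for every type.
            new_std[key] = np.ones_like(bias)
    return out_bias, new_std


bias = {"energy": np.array([[-1.5], [-3.0]])}  # [ntypes, odim]
std = {"energy": np.array([[0.2]])}            # global std, wrong shape
_, fixed_std = post_process_stat_sketch(bias, std)
print(fixed_std["energy"].shape)
```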

deepmd.dpmodel.utils.stat._make_preset_out_bias(ntypes: int, ibias: list[numpy.ndarray | None]) → numpy.ndarray | None

Make preset out bias.

Output: a NumPy array of shape [ntypes, *(odim0, odim1, …)] if any item is not None; None if all items are None.
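The output rule can be sketched as follows. The name `make_preset_out_bias_sketch` is hypothetical, and marking unset types with NaN is an assumption made here for illustration; the documentation above only specifies the output shape and the all-None case.

```python
import numpy as np


def make_preset_out_bias_sketch(ntypes, ibias):
    """Illustrative only: stack per-type preset biases into one array."""
    if len(ibias) != ntypes:
        raise ValueError("ibias must have one entry per atom type")
    if all(b is None for b in ibias):
        return None
    # Take the output shape from the first entry that is set.
    first = next(np.asarray(b, dtype=float) for b in ibias if b is not None)
    # Assumption: types without a preset value are marked with NaN.
    out = np.full((ntypes, *first.shape), np.nan)
    for itype, b in enumerate(ibias):
        if b is not None:
            out[itype] = np.asarray(b, dtype=float)
    return out


preset = make_preset_out_bias_sketch(2, [None, [2.0]])
print(preset.shape)  # (2, 1)
```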

deepmd.dpmodel.utils.stat._fill_stat_with_global(atomic_stat: numpy.ndarray | None, global_stat: numpy.ndarray) → numpy.ndarray | None

This function is used to fill atomic stat with global stat.

Parameters:

atomic_stat (Union[np.ndarray, None]): The atomic stat.
global_stat (np.ndarray): The global stat.

If the atomic stat is None, the global stat is used. If the atomic stat is not None but contains NaN values (missing atom types), those entries are filled with the global stat.
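The two cases above map directly onto `np.where`. This is a minimal sketch with a hypothetical function name, assuming both arrays have the same shape; it is not the library code.

```python
import numpy as np


def fill_stat_with_global_sketch(atomic_stat, global_stat):
    """Illustrative only: fall back to the global stat where needed."""
    if atomic_stat is None:
        return global_stat
    # Replace NaN entries (types missing from the atomic labels).
    return np.where(np.isnan(atomic_stat), global_stat, atomic_stat)


atomic = np.array([[1.0], [np.nan]])
global_ = np.array([[5.0], [6.0]])
filled = fill_stat_with_global_sketch(atomic, global_)
print(filled)
```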

deepmd.dpmodel.utils.stat._compute_model_predict(sampled: list[dict], keys: list[str], model_forward: collections.abc.Callable) → dict[str, list[numpy.ndarray]]

Compute model predictions for all samples.

deepmd.dpmodel.utils.stat.compute_output_stats(merged: collections.abc.Callable[[], list[dict]] | list[dict], ntypes: int, keys: str | list[str], stat_file_path: deepmd.utils.path.DPPath | None = None, rcond: float | None = None, preset_bias: dict[str, list[numpy.ndarray | None]] | None = None, model_forward: collections.abc.Callable | None = None, stats_distinguish_types: bool = True, intensive: bool = False) → tuple[dict, dict]

Compute the output statistics (e.g. energy bias) for the fitting net from packed data.

Parameters:

merged (Union[Callable[[], list[dict]], list[dict]])

list[dict]: A list of data samples from various data systems. Each element, merged[i], is a data dictionary mapping keys to np.ndarray values originating from the i-th data system.
Callable[[], list[dict]]: A lazy function that returns data samples in the above format only when needed. Since the sampling process can be slow and memory-intensive, the lazy function helps by only sampling once.

ntypes (int)

The number of atom types.

keys (Union[str, list[str]])

The keys of the output properties to compute statistics for.

stat_file_path (DPPath, optional)

The path to the stat file.

rcond (float, optional)

The condition number for the regression of atomic energy.

preset_bias (dict[str, list[Optional[np.ndarray]]], optional)

Specifies the atomic energy contribution in vacuum, given as key: value pairs. Each value is a list specifying the bias; its elements can be None or an np.ndarray of the output shape. For example, [None, [2.]] means type 0 is not set and type 1 is set to [2.]. The set_davg_zero key in the descriptor should be set.

model_forward (Callable, optional)

The wrapped forward function of atomic model. If not None, the model will be utilized to generate the original energy prediction, which will be subtracted from the energy label of the data. The difference will then be used to calculate the delta complement energy bias for each type.

stats_distinguish_types (bool, optional)

Whether to distinguish different element types in the statistics.

intensive (bool, optional)

Whether the fitting target is intensive.
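To make the "regression of atomic energy" concrete, here is a hedged sketch of how a per-type energy bias could be fitted from per-frame total energies with `numpy.linalg.lstsq`. The function name and the data layout are hypothetical, and whether the library forwards `rcond` to `lstsq` in exactly this way is an assumption. In NumPy, `rcond` is the cutoff ratio below which small singular values are treated as zero.

```python
import numpy as np


def fit_energy_bias_sketch(atype_list, energy_list, ntypes, rcond=None):
    """Illustrative only: least-squares per-type bias from frame energies."""
    counts, totals = [], []
    for atype, energy in zip(atype_list, energy_list):
        # atype: [nframes, natoms] (int); energy: [nframes, odim]
        for frame_types, frame_energy in zip(atype, energy):
            # Number of atoms of each type in this frame.
            counts.append(np.bincount(frame_types, minlength=ntypes))
            totals.append(frame_energy)
    counts = np.asarray(counts, dtype=float)  # [nframes_total, ntypes]
    totals = np.asarray(totals, dtype=float)  # [nframes_total, odim]
    # Solve counts @ bias ~= totals in the least-squares sense.
    bias, _, _, _ = np.linalg.lstsq(counts, totals, rcond=rcond)
    return bias  # [ntypes, odim]


# Two systems whose energies follow a bias of -1.0 (type 0) and -2.0 (type 1).
atype_list = [np.array([[0, 0, 1]]), np.array([[0, 1, 1]])]
energy_list = [np.array([[-4.0]]), np.array([[-5.0]])]
bias = fit_energy_bias_sketch(atype_list, energy_list, ntypes=2)
print(bias)
```

When `model_forward` is given, the same kind of fit would be applied to the difference between the label and the model prediction instead of the raw label, as described for that parameter.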

deepmd.dpmodel.utils.stat._compute_output_stats_global(sampled: list[dict], ntypes: int, keys: list[str], rcond: float | None = None, preset_bias: dict[str, list[numpy.ndarray | None]] | None = None, global_sampled_idx: dict | None = None, stats_distinguish_types: bool = True, intensive: bool = False, model_pred: dict[str, numpy.ndarray] | None = None) → tuple[dict[str, numpy.ndarray], dict[str, numpy.ndarray]]

Compute output statistics from reduced global labels only.

deepmd.dpmodel.utils.stat._compute_output_stats_atomic(sampled: list[dict], ntypes: int, keys: list[str], atomic_sampled_idx: dict | None = None, model_pred: dict[str, numpy.ndarray] | None = None) → tuple[dict[str, numpy.ndarray], dict[str, numpy.ndarray]]

Compute output statistics from atomic labels.
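As a rough illustration of statistics from atomic labels, the sketch below computes a per-type mean and standard deviation for a single system. The function name, the array layout, and the use of NaN for types that never occur are assumptions, not the library implementation; the NaN convention is chosen to match the "missing atypes" case handled by `_fill_stat_with_global`.

```python
import numpy as np


def atomic_stats_sketch(atype, atomic_label, ntypes):
    """Illustrative only: per-type mean and std of an atomic label."""
    # atype: [nframes, natoms]; atomic_label: [nframes, natoms, odim]
    odim = atomic_label.shape[-1]
    bias = np.full((ntypes, odim), np.nan)
    std = np.full((ntypes, odim), np.nan)
    for itype in range(ntypes):
        mask = atype == itype
        if mask.any():
            values = atomic_label[mask]  # [n_selected, odim]
            bias[itype] = values.mean(axis=0)
            std[itype] = values.std(axis=0)
    return bias, std


atype = np.array([[0, 0, 1]])
label = np.array([[[1.0], [3.0], [5.0]]])
atom_bias, atom_std = atomic_stats_sketch(atype, label, ntypes=3)
print(atom_bias[:2], atom_std[:2])
```

Type 2 does not occur in the sample, so its rows stay NaN and would need to be filled from the global statistics.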