deepmd.utils.data
=================

.. py:module:: deepmd.utils.data


Attributes
----------

.. autoapisummary::

   deepmd.utils.data.log


Classes
-------

.. autoapisummary::

   deepmd.utils.data.DeepmdData
   deepmd.utils.data.DataRequirementItem


Module Contents
---------------

.. py:data:: log

.. py:class:: DeepmdData(sys_path: str, set_prefix: str = 'set', shuffle_test: bool = True, type_map: Optional[list[str]] = None, optional_type_map: bool = True, modifier=None, trn_all_set: bool = False, sort_atoms: bool = True)

   
   Class for a data system.

   It loads data from disk and maintains them in a `data_dict`.

   :Parameters:

       **sys_path**
           Path to the data system

       **set_prefix**
           Prefix for the directories of different sets

       **shuffle_test**
           Whether to shuffle the test data

       **type_map**
           Names of the atom types

       **optional_type_map**
           Whether the `type_map.raw` file in each system is optional

       **modifier**
           Data modifier that has the method `modify_data`

       **trn_all_set**
           [DEPRECATED] All sets are now used for both training and testing.

       **sort_atoms** : :ref:`bool <python:bltin-boolean-values>`
           Sort atoms by atom type. This must be enabled when the data are fed directly
           to descriptors, except in the mixed-type case.


   ..
       !! processed by numpydoc !!
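
   The on-disk layout read by this class can be illustrated with a minimal, self-contained sketch. It does not import ``deepmd``: the helper ``find_sets`` is hypothetical and only mimics how ``set_prefix`` could be matched against directories such as ``set.000``. Treating ``type_map.raw`` as one type name per line is likewise an assumption of the sketch.

   .. code-block:: python

      import tempfile
      from pathlib import Path


      def find_sets(sys_path, set_prefix="set"):
          """Return the names of the ``<set_prefix>.*`` directories, sorted."""
          root = Path(sys_path)
          return sorted(p.name for p in root.glob(set_prefix + ".*") if p.is_dir())


      # Build a toy system directory: two sets plus an optional type map.
      with tempfile.TemporaryDirectory() as tmp:
          root = Path(tmp)
          (root / "set.000").mkdir()
          (root / "set.001").mkdir()
          (root / "type_map.raw").write_text("O\nH\n")

          set_names = find_sets(root)
          type_map = (root / "type_map.raw").read_text().split()

      print(set_names)
      print(type_map)

   In a real system each set directory would also hold the ``key.npy`` files registered through ``add``.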

   .. py:attribute:: dirs


   .. py:attribute:: mixed_type


   .. py:attribute:: atom_type


   .. py:attribute:: natoms


   .. py:attribute:: type_map


   .. py:attribute:: pbc
      :value: True


   .. py:attribute:: enforce_type_map
      :value: False


   .. py:attribute:: sort_atoms
      :value: True


   .. py:attribute:: idx_map


   .. py:attribute:: data_dict


   .. py:attribute:: set_count
      :value: 0


   .. py:attribute:: iterator
      :value: 0


   .. py:attribute:: shuffle_test
      :value: True


   .. py:attribute:: modifier
      :value: None


   .. py:attribute:: nframes


   .. py:attribute:: prefix_sum


   .. py:method:: add(key: str, ndof: int, atomic: bool = False, must: bool = False, high_prec: bool = False, type_sel: Optional[list[int]] = None, repeat: int = 1, default: float = 0.0, dtype: Optional[numpy.dtype] = None, output_natoms_for_type_sel: bool = False)

      
      Add a data item to be loaded.


      :Parameters:

          **key**
              The key of the item. The corresponding data is stored in `sys_path/set.*/key.npy`

          **ndof**
              The number of degrees of freedom (DOF)

          **atomic**
              Whether the item is an atomic property.
              If False, the data should have shape nframes x ndof.
              If True, the data should have shape nframes x natoms x ndof.

          **must**
              The data file `sys_path/set.*/key.npy` must exist.
              If `must` is False and the data file does not exist, `data_dict[find_key]` is set to 0.0.

          **high_prec**
              If True, load and store the data as float64; otherwise as float32

          **type_sel**
              Select atoms of certain types

          **repeat**
              The data will be repeated `repeat` times.

          **default** : :class:`python:float`, default=0.
              Default value of the data

          **dtype** : :obj:`np.dtype`, :obj:`optional`
              The dtype of the data; overrides `high_prec` if provided

          **output_natoms_for_type_sel** : :ref:`bool <python:bltin-boolean-values>`, :obj:`optional`
              If True and `type_sel` is set, the atomic dimension will be natoms instead of nsel


      ..
          !! processed by numpydoc !!
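
      The shape rule for ``atomic`` and the fallback for optional items can be sketched without the library. Both helpers below are hypothetical, as are the item names ``energy``, ``force`` and ``virial``. The sketch also assumes that a found item is flagged with 1.0, which the text above does not state (it only says that a missing optional item is flagged with 0.0).

      .. code-block:: python

         def expected_shape(nframes, natoms, ndof, atomic=False):
             """Shape an item is expected to have for the given ``atomic`` flag."""
             if atomic:
                 return (nframes, natoms, ndof)
             return (nframes, ndof)


         def load_optional(files, key, must=False, default=0.0):
             """Return ``(found_flag, value)`` for ``key`` looked up in ``files``."""
             if key in files:
                 return 1.0, files[key]  # assumption: 1.0 marks a found item
             if must:
                 raise FileNotFoundError(key)
             return 0.0, default  # missing optional item: flag 0.0, default value


         # A per-frame scalar and a per-atom 3-vector.
         energy_shape = expected_shape(nframes=10, natoms=4, ndof=1)
         force_shape = expected_shape(nframes=10, natoms=4, ndof=3, atomic=True)

         # ``files`` stands in for the contents of one set directory.
         available = {"energy": [[-1.5]]}
         found, value = load_optional(available, "energy")
         missing, filled = load_optional(available, "virial", default=0.0)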


   .. py:method:: reduce(key_out: str, key_in: str)

      
      Generate a new item by reducing another item.


      :Parameters:

          **key_out**
              The name of the reduced item

          **key_in**
              The name of the data item to be reduced


      ..
          !! processed by numpydoc !!


   .. py:method:: get_data_dict() -> dict

      
      Get the `data_dict`.


      ..
          !! processed by numpydoc !!


   .. py:method:: check_batch_size(batch_size)

      
      Check if the system can get a batch of data with `batch_size` frames.


      ..
          !! processed by numpydoc !!


   .. py:method:: check_test_size(test_size)

      
      Check if the system can get a test dataset with `test_size` frames.


      ..
          !! processed by numpydoc !!


   .. py:method:: get_item_torch(index: int) -> dict

      
      Get the data of a single frame. The frame is picked from the data system by index, which runs across all the sets.


      :Parameters:

          **index**
              index of the frame


      ..
          !! processed by numpydoc !!
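
      One way to resolve an index that runs across all sets is a cumulative frame count plus a binary search. The sketch below is a guess at the idea, not the library's code: ``locate_frame`` and ``frames_per_set`` are hypothetical names, and the link to the ``prefix_sum`` attribute is an assumption.

      .. code-block:: python

         from bisect import bisect_right
         from itertools import accumulate


         def locate_frame(index, frames_per_set):
             """Map a global frame index to ``(set_index, local_index)``."""
             prefix_sum = list(accumulate(frames_per_set))  # [3, 5, 2] -> [3, 8, 10]
             if not 0 <= index < prefix_sum[-1]:
                 raise IndexError(index)
             # First set whose cumulative frame count exceeds the index.
             set_idx = bisect_right(prefix_sum, index)
             local_idx = index - (prefix_sum[set_idx - 1] if set_idx else 0)
             return set_idx, local_idx


         frames_per_set = [3, 5, 2]
         first = locate_frame(0, frames_per_set)   # (0, 0)
         middle = locate_frame(3, frames_per_set)  # (1, 0)
         last = locate_frame(9, frames_per_set)    # (2, 1)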


   .. py:method:: get_batch(batch_size: int) -> dict

      
      Get a batch of data with `batch_size` frames. The frames are randomly picked from the data system.


      :Parameters:

          **batch_size**
              size of the batch


      ..
          !! processed by numpydoc !!


   .. py:method:: get_test(ntests: int = -1) -> dict

      
      Get the test data with `ntests` frames.


      :Parameters:

          **ntests**
              Size of the test data set. If `ntests` is -1, all test data are returned.


      ..
          !! processed by numpydoc !!


   .. py:method:: get_ntypes() -> int

      
      Number of atom types in the system.


      ..
          !! processed by numpydoc !!


   .. py:method:: get_type_map() -> list[str]

      
      Get the type map.


      ..
          !! processed by numpydoc !!


   .. py:method:: get_atom_type() -> list[int]

      
      Get atom types.


      ..
          !! processed by numpydoc !!


   .. py:method:: get_numb_set() -> int

      
      Get number of training sets.


      ..
          !! processed by numpydoc !!


   .. py:method:: get_numb_batch(batch_size: int, set_idx: int) -> int

      
      Get the number of batches in a set.


      ..
          !! processed by numpydoc !!


   .. py:method:: get_sys_numb_batch(batch_size: int) -> int

      
      Get the number of batches in the data system.


      ..
          !! processed by numpydoc !!
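
      The two batch counts can be related by a rough sketch. It assumes that only full batches are counted (floor division) and that the system total is the sum over sets; the actual methods may handle remainders or small sets differently. Both function names are hypothetical.

      .. code-block:: python

         def numb_batch(nframes, batch_size):
             """Number of full batches of ``batch_size`` frames in one set."""
             return nframes // batch_size


         def sys_numb_batch(frames_per_set, batch_size):
             """Total number of full batches over all sets of a system."""
             return sum(numb_batch(n, batch_size) for n in frames_per_set)


         # Sets with 3, 5 and 2 frames, batches of 2 frames: 1 + 2 + 1.
         total = sys_numb_batch([3, 5, 2], batch_size=2)
         print(total)  # 4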


   .. py:method:: get_natoms()

      
      Get number of atoms.


      ..
          !! processed by numpydoc !!


   .. py:method:: get_natoms_vec(ntypes: int)

      
      Get the total number of atoms and the number of atoms of each type.


      :Parameters:

          **ntypes**
              Number of types (may be larger than the actual number of types in the system).


      :Returns:

          :obj:`natoms`
              - natoms[0]: number of local atoms
              - natoms[1]: total number of atoms held by this processor
              - natoms[i], 2 <= i < ntypes + 2: number of atoms of type i - 2


      ..
          !! processed by numpydoc !!
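
      The layout of the returned vector can be reproduced with plain Python. This is a sketch under the assumption that, for a data system, the first two entries both equal the total number of atoms; ``natoms_vec`` is a hypothetical helper, not the method itself.

      .. code-block:: python

         def natoms_vec(atom_type, ntypes):
             """Build ``[nloc, nall, n_type_0, ..., n_type_{ntypes-1}]``."""
             natoms = len(atom_type)
             counts = [0] * ntypes
             for t in atom_type:
                 counts[t] += 1
             # Assumption: local and total atom counts coincide here.
             return [natoms, natoms] + counts


         # Four atoms of types 0, 0, 1, 0; ``ntypes`` exceeds the types present.
         vec = natoms_vec([0, 0, 1, 0], ntypes=3)
         print(vec)  # [4, 4, 3, 1, 0]

      Passing an ``ntypes`` larger than the number of types present simply pads the vector with zeros.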


   .. py:method:: avg(key)

      
      Return the average value of an item.


      ..
          !! processed by numpydoc !!


   .. py:method:: _idx_map_sel(atom_type, type_sel)


   .. py:method:: _get_natoms_2(ntypes)


   .. py:method:: _get_subdata(data, idx=None)


   .. py:method:: _load_batch_set(set_name: deepmd.utils.path.DPPath) -> None


   .. py:method:: reset_get_batch() -> None


   .. py:method:: _load_test_set(shuffle_test: bool) -> None


   .. py:method:: _shuffle_data(data)


   .. py:method:: _get_nframes(set_name: deepmd.utils.path.DPPath)


   .. py:method:: reformat_data_torch(data)

      
      Modify the data format to meet the requirements of the Torch backend.


      :Parameters:

          **data**
              Original data


      ..
          !! processed by numpydoc !!


   .. py:method:: _load_set(set_name: deepmd.utils.path.DPPath)


   .. py:method:: _load_data(set_name, key, nframes, ndof_, atomic=False, must=True, repeat=1, high_prec=False, type_sel=None, default: float = 0.0, dtype: Optional[numpy.dtype] = None, output_natoms_for_type_sel: bool = False)


   .. py:method:: _load_type(sys_path: deepmd.utils.path.DPPath)


   .. py:method:: _load_type_mix(set_name: deepmd.utils.path.DPPath)


   .. py:method:: _make_idx_map(atom_type)


   .. py:method:: _load_type_map(sys_path: deepmd.utils.path.DPPath)


   .. py:method:: _check_pbc(sys_path: deepmd.utils.path.DPPath)


   .. py:method:: _check_mode(set_path: deepmd.utils.path.DPPath)


.. py:class:: DataRequirementItem(key: str, ndof: int, atomic: bool = False, must: bool = False, high_prec: bool = False, type_sel: Optional[list[int]] = None, repeat: int = 1, default: float = 0.0, dtype: Optional[numpy.dtype] = None, output_natoms_for_type_sel: bool = False)

   
   A class to store the data requirement for data systems.


   :Parameters:

       **key**
           The key of the item. The corresponding data is stored in `sys_path/set.*/key.npy`

       **ndof**
           The number of degrees of freedom (DOF)

       **atomic**
           Whether the item is an atomic property.
           If False, the data should have shape nframes x ndof.
           If True, the data should have shape nframes x natoms x ndof.

       **must**
           The data file `sys_path/set.*/key.npy` must exist.
           If `must` is False and the data file does not exist, `data_dict[find_key]` is set to 0.0.

       **high_prec**
           If True, load and store the data as float64; otherwise as float32

       **type_sel**
           Select atoms of certain types

       **repeat**
           The data will be repeated `repeat` times.

       **default** : :class:`python:float`, default=0.
           Default value of the data

       **dtype** : :obj:`np.dtype`, :obj:`optional`
           The dtype of the data; overrides `high_prec` if provided

       **output_natoms_for_type_sel** : :ref:`bool <python:bltin-boolean-values>`, :obj:`optional`
           If True and `type_sel` is set, the atomic dimension will be natoms instead of nsel


   ..
       !! processed by numpydoc !!
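
   Because the constructor takes the same arguments as ``DeepmdData.add``, a requirement record can plausibly be unpacked straight into that call. The sketch below uses a hypothetical stand-in class, ``RequirementSketch``, and a toy ``add`` function rather than the real ones. It omits ``dtype`` and ``output_natoms_for_type_sel`` for brevity and assumes that ``to_dict`` returns the constructor arguments.

   .. code-block:: python

      from dataclasses import asdict, dataclass
      from typing import Optional


      @dataclass
      class RequirementSketch:
          """Hypothetical stand-in mirroring the documented constructor fields."""

          key: str
          ndof: int
          atomic: bool = False
          must: bool = False
          high_prec: bool = False
          type_sel: Optional[list] = None
          repeat: int = 1
          default: float = 0.0

          def to_dict(self):
              return asdict(self)

          def __getitem__(self, name):
              return getattr(self, name)


      def add(key, ndof, **options):
          """Toy consumer with the same leading arguments as ``DeepmdData.add``."""
          return key, ndof, options["atomic"]


      item = RequirementSketch("force", ndof=3, atomic=True)
      registered = add(**item.to_dict())
      print(registered)  # ('force', 3, True)

   Keeping requirements as plain records lets them be compared and stored before any data system exists.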

   .. py:attribute:: key


   .. py:attribute:: ndof


   .. py:attribute:: atomic
      :value: False


   .. py:attribute:: must
      :value: False


   .. py:attribute:: high_prec
      :value: False


   .. py:attribute:: type_sel
      :value: None


   .. py:attribute:: repeat
      :value: 1


   .. py:attribute:: default
      :value: 0.0


   .. py:attribute:: dtype
      :value: None


   .. py:attribute:: output_natoms_for_type_sel
      :value: False


   .. py:attribute:: dict


   .. py:method:: to_dict() -> dict


   .. py:method:: __getitem__(key: str)


   .. py:method:: __eq__(__value: object) -> bool


   .. py:method:: __repr__() -> str