dpdata.lmdb package
- class dpdata.lmdb.LMDBFormat
Bases: Format
Class for handling the LMDB format, which stores atomic configurations in a Lightning Memory-Mapped Database (LMDB).
This format is optimized for machine learning workflows that need fast random access to a large number of frames. All frames from multiple systems (with potentially different numbers of atoms) are stored in a single LMDB database.
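The storage model described above can be illustrated without LMDB itself. This is a minimal sketch under stated assumptions, not dpdata's implementation: a plain dict stands in for the LMDB key-value store, pickle is an assumed serializer, and the helpers frame_key, write_frames and read_frame are hypothetical. Only the default key format '012d' comes from this page (the frame_idx_fmt parameter of to_multi_systems).

```python
import pickle


def frame_key(index: int, fmt: str = "012d") -> bytes:
    """Encode a global frame index as a fixed-width ASCII byte key (hypothetical helper)."""
    return format(index, fmt).encode("ascii")


def write_frames(store: dict, frames: list) -> None:
    """Serialize each frame under its own key; pickle is an assumption here."""
    for i, frame in enumerate(frames):
        store[frame_key(i)] = pickle.dumps(frame)


def read_frame(store: dict, index: int) -> dict:
    """Fetch one frame by index; no other frame is deserialized."""
    return pickle.loads(store[frame_key(index)])


# A plain dict stands in for the LMDB key-value store (bytes -> bytes).
store: dict = {}

# Hypothetical frames from two systems with different atom counts.
frames = [
    {"atom_names": ["O", "H"], "atom_types": [0, 1, 1],
     "coords": [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]},
    {"atom_names": ["H"], "atom_types": [0, 0],
     "coords": [[0.0, 0.0, 0.0], [0.74, 0.0, 0.0]]},
]
write_frames(store, frames)

print(read_frame(store, 1)["atom_types"])  # [0, 0]

# Zero padding keeps lexicographic byte order equal to numeric order,
# which matters because LMDB compares keys as byte strings by default.
print(sorted(frame_key(i) for i in (10, 2)))  # [b'000000000002', b'000000000010']
print(sorted(str(i).encode() for i in (10, 2)))  # [b'10', b'2']
```

Because every frame is a self-contained record under its own key, frames with different atom counts coexist in one store, and reading frame i costs a single lookup.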
Both single systems and multiple systems are supported via the standard dpdata APIs.
Methods
- MultiModes(): File mode for MultiSystems.
- from_bond_order_system(file_name, **kwargs): Implement BondOrderSystem.from that converts from this format to BondOrderSystem.
- from_labeled_system(file_name, **kwargs): Load data for a single LabeledSystem from an LMDB database.
- from_multi_systems(file_name[, map_size]): Load multiple systems from a single LMDB database.
- from_system(file_name, **kwargs): Load data for a single System from an LMDB database.
- get_formats(): Get all registered formats.
- get_from_methods(): Get all registered from methods.
- get_to_methods(): Get all registered to methods.
- mix_system(*system, type_map, **kwargs): Mix the systems into mixed_type systems according to the given unified type_map.
- post(func_name): Register a post-processing function for a from method.
- register(key): Register a format plugin.
- register_from(key): Register a from method if the target method name is not the default.
- register_to(key): Register a to method if the target method name is not the default.
- to_bond_order_system(data, rdkit_mol, *args, ...): Implement BondOrderSystem.to that converts from BondOrderSystem to this format.
- to_labeled_system(data, file_name, **kwargs): Save a single LabeledSystem to an LMDB database.
- to_multi_systems(formulas, directory[, ...]): Implement MultiSystems.to for the LMDB format.
- to_system(data, file_name, **kwargs): Save a single System to an LMDB database.
Examples
Saving a single LabeledSystem
>>> import dpdata
>>> system = dpdata.LabeledSystem("path/to/OUTCAR", fmt="vasp/outcar")
>>> system.to("lmdb", "my_single_system.lmdb")
Loading a single LabeledSystem
>>> loaded_system = dpdata.LabeledSystem("my_single_system.lmdb", fmt="lmdb")
Saving multiple systems to a single LMDB database
>>> import dpdata
>>> system_1 = dpdata.LabeledSystem("path/to/system1/OUTCAR", fmt="vasp/outcar")
>>> system_2 = dpdata.LabeledSystem("path/to/system2/OUTCAR", fmt="vasp/outcar")
>>> multi_systems_obj = dpdata.MultiSystems(system_1, system_2)
>>> multi_systems_obj.to("lmdb", "my_multi_system_db.lmdb")
Loading multiple systems from a single LMDB database
>>> import dpdata
>>> loaded_multi_systems = dpdata.MultiSystems.from_file("my_multi_system_db.lmdb", fmt="lmdb")
- from_labeled_system(file_name, **kwargs)
Load data for a single LabeledSystem from an LMDB database.
- from_multi_systems(file_name, map_size=1000000000, **kwargs)
Load multiple systems from a single LMDB database.
- Parameters:
- file_name : str
The path to the LMDB database directory.
- map_size : int, optional
Maximum size of the LMDB database in bytes. Default is 1000000000 (1 GB).
- **kwargs : dict
Other parameters.
- Yields:
- dict
Data dictionary for each system.
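To make "yields a data dictionary for each system" concrete, the sketch below groups a flat sequence of per-frame records into one dict per system and yields them lazily. It is an illustration under assumptions, not dpdata's reader: the record layout, the hypothetical iter_systems helper, and the rule that frames belong to the same system when atom_names and atom_types match are all chosen for this example.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def iter_systems(frames: Iterable[dict]) -> Iterator[dict]:
    """Group per-frame records into per-system data dicts and yield them.

    Frames are treated as one system when their atom_names and atom_types
    match; this grouping rule is an assumption for illustration.
    """
    groups: dict = defaultdict(list)
    for frame in frames:
        key = (tuple(frame["atom_names"]), tuple(frame["atom_types"]))
        groups[key].append(frame["coords"])
    # dicts preserve insertion order, so systems come out in first-seen order
    for (atom_names, atom_types), coords in groups.items():
        yield {
            "atom_names": list(atom_names),
            "atom_types": list(atom_types),
            "coords": coords,  # one coordinate list per frame
        }


water = {"atom_names": ["O", "H"], "atom_types": [0, 1, 1]}
h2 = {"atom_names": ["H"], "atom_types": [0, 0]}
frames = [
    {**water, "coords": [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]},
    {**h2, "coords": [[0.0, 0.0, 0.0], [0.74, 0.0, 0.0]]},
    {**water, "coords": [[0.0, 0.0, 0.1], [0.96, 0.0, 0.1], [-0.24, 0.93, 0.1]]},
]

for system in iter_systems(frames):
    print(system["atom_names"], len(system["coords"]))
# ['O', 'H'] 2
# ['H'] 1
```

Frames of the same system need not be stored contiguously; the generator reassembles them before yielding.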
- to_labeled_system(data, file_name, **kwargs)
Save a single LabeledSystem to an LMDB database.
- to_multi_systems(formulas, directory, map_size=1000000000, frame_idx_fmt='012d', **kwargs)
Implement MultiSystems.to for the LMDB format.
- Parameters:
- formulas : list[str]
List of system formulas.
- directory : str
Path to the LMDB database directory to write.
- map_size : int, optional
Maximum size of the LMDB database in bytes. Default is 1000000000 (1 GB).
- frame_idx_fmt : str, optional
The format string used to encode the frame index as a key. Default is '012d'.
- **kwargs : dict
Other parameters.
- Yields:
- tuple
A (self, formula) pair to be used by to_system.
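Because map_size caps how large the database can grow, it helps to estimate the payload before writing. The back-of-the-envelope sketch below assumes float64 arrays and counts only cells (3x3), coordinates (N x 3), energies (1) and forces (N x 3); virials and metadata are ignored, and the overhead factor of 2 is a guessed allowance for serialization and LMDB page overhead. The helper name estimate_payload_bytes is hypothetical; only the default of 1000000000 bytes comes from the signature above.

```python
DEFAULT_MAP_SIZE = 1_000_000_000  # default map_size from the signature above (bytes)


def estimate_payload_bytes(n_frames: int, n_atoms: int, labeled: bool = True,
                           overhead: float = 2.0) -> int:
    """Rough size estimate for n_frames frames of n_atoms atoms each.

    Assumes float64 (8-byte) values. Counts cells (9) and coordinates (3N),
    plus energies (1) and forces (3N) when labeled. `overhead` is a guessed
    safety factor, not a measured LMDB property.
    """
    floats_per_frame = 9 + 3 * n_atoms
    if labeled:
        floats_per_frame += 1 + 3 * n_atoms
    return int(n_frames * floats_per_frame * 8 * overhead)


need = estimate_payload_bytes(100_000, 64)
print(need)                      # 630400000
print(need <= DEFAULT_MAP_SIZE)  # True

# A dataset five times larger would exceed the default.
print(estimate_payload_bytes(500_000, 64) <= DEFAULT_MAP_SIZE)  # False
```

For 100,000 labeled frames of 64 atoms, each frame holds 9 + 192 + 1 + 192 = 394 floats, or 3152 bytes. That gives about 315 MB of raw data and about 630 MB with the assumed overhead, which fits under the 1 GB default. Larger datasets would need an explicitly larger map_size.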
Submodules
dpdata.lmdb.format module
- class dpdata.lmdb.format.LMDBFormat