deepmd.train package

Submodules

deepmd.train.run_options module

Module taking care of important package constants.

class deepmd.train.run_options.RunOptions(init_model: Optional[str] = None, init_frz_model: Optional[str] = None, finetune: Optional[str] = None, restart: Optional[str] = None, log_path: Optional[str] = None, log_level: int = 0, mpi_log: str = 'master')[source]

Bases: object

Class with info on how to run training (cluster, MPI and GPU config).

Attributes

gpus: Optional[List[int]]: list of GPUs if any are present else None
is_chief: bool: in distributed training it is true for the main MPI process; in serial it is always true
world_size: int: total worker count
my_rank: int: index of the MPI task
nodename: str: name of the node
node_list_: List[str]: the list of nodes of the current mpirun
my_device: str: device type - gpu or cpu
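
The relationship between the rank-related attributes can be illustrated with a minimal sketch. It is not the deepmd implementation: `RankInfo` and `chief_only_message` are hypothetical names introduced for illustration. The only behavior taken from this page is that `is_chief` is true when the rank is 0, so it is always true in a serial run with a single worker.

```python
from dataclasses import dataclass


@dataclass
class RankInfo:
    """Hypothetical stand-in for the rank-related attributes of RunOptions."""

    my_rank: int = 0      # index of this MPI task
    world_size: int = 1   # total worker count

    @property
    def is_chief(self) -> bool:
        # The chief is the process whose rank is 0.
        return self.my_rank == 0


def chief_only_message(info: RankInfo, message: str) -> str:
    """Return the message on the chief process and an empty string elsewhere."""
    return message if info.is_chief else ""


serial = RankInfo()                          # serial run: one worker, rank 0
worker = RankInfo(my_rank=3, world_size=4)   # non-chief rank in a 4-process run
```

A guard of this kind is commonly used so that only one process writes logs or checkpoints in a distributed run.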

Methods

print_resource_summary()

Print build and current running cluster configuration summary.

gpus: Optional[List[int]]

property is_chief: Whether my rank is 0.

my_device: str

my_rank: int

nodelist: List[int]

nodename: str

print_resource_summary()[source]: Print build and current running cluster configuration summary.

world_size: int

deepmd.train.trainer module

class deepmd.train.trainer.DPTrainer(jdata, run_opt, is_compress=False)[source]

Bases: object

Methods

save_compressed()

Save the compressed graph.

build
eval_single_list
get_evaluation_results
get_feed_dict
get_global_step
print_header
print_on_training
save_checkpoint
train
valid_on_the_fly

build(data=None, stop_batch=0, origin_type_map=None, suffix='')[source]

static eval_single_list(single_batch_list, loss, sess, get_feed_dict_func, prefix='')[source]

get_evaluation_results(batch_list)[source]

get_feed_dict(batch, is_training)[source]

get_global_step()[source]

static print_header(fp, train_results, valid_results, multi_task_mode=False)[source]

static print_on_training(fp, train_results, valid_results, cur_batch, cur_lr, multi_task_mode=False, cur_lr_dict=None)[source]

save_checkpoint(cur_batch: int)[source]

save_compressed()[source]: Save the compressed graph.

train(train_data=None, valid_data=None)[source]

valid_on_the_fly(fp, train_batches, valid_batches, print_header=False, fitting_key=None)[source]

class deepmd.train.trainer.DatasetLoader(train_data: DeepmdDataSystem)[source]

Bases: object

Generate an OP that loads the training data from the given DeepmdDataSystem.

It can be used to load the training data in the training process, so there is no waiting time between training steps.

Parameters

train_data: DeepmdDataSystem: The training data.

Examples

>>> loader = DatasetLoader(train_data)
>>> data_op = loader.build()
>>> with tf.Session() as sess:
...     data_list = sess.run(data_op)
>>> data_dict = loader.get_data_dict(data_list)
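
The idea of loading data ahead of the training step, so that the step does not wait, can be sketched in plain Python. This is an illustration of the general prefetching pattern only, not the mechanism `DatasetLoader` uses (which builds a TensorFlow OP). The names `prefetch` and `fake_batches` are hypothetical.

```python
import queue
import threading


def prefetch(batch_source, buffer_size=2):
    """Yield batches from batch_source while a background thread loads ahead."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def producer():
        for batch in batch_source:
            buffer.put(batch)   # blocks while the buffer is full
        buffer.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        batch = buffer.get()    # blocks until a batch is ready
        if batch is sentinel:
            return
        yield batch


def fake_batches(n):
    # Hypothetical data source standing in for reading frames from disk.
    for i in range(n):
        yield {"batch_index": i}


loaded = [b["batch_index"] for b in prefetch(fake_batches(5))]
```

While the consumer processes one batch, the producer thread is already fetching the next, up to `buffer_size` batches ahead.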

Methods

`build`()	Build the OP that loads the training data.
`get_data_dict`(batch_list)	Generate a dict of the loaded data.

build() → List[Tensor][source]

Build the OP that loads the training data.

Returns

List[tf.Tensor]: Tensors of the loaded data.

get_data_dict(batch_list: List[ndarray]) → Dict[str, ndarray][source]

Generate a dict of the loaded data.

Parameters

batch_list: List[np.ndarray]: The loaded data.

Returns

Dict[str, np.ndarray]: The dict of the loaded data.
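
Turning an ordered list of loaded arrays into a dict can be sketched as pairing each array with a name. How `DatasetLoader` determines the key order is not stated here, so the sketch assumes a fixed list of names. `to_data_dict` and the key names are hypothetical, and plain lists stand in for `np.ndarray` values.

```python
from typing import Dict, List, Sequence


def to_data_dict(keys: Sequence[str], batch_list: List[object]) -> Dict[str, object]:
    """Pair each loaded array with its name, preserving order."""
    if len(keys) != len(batch_list):
        raise ValueError("keys and batch_list must have the same length")
    return dict(zip(keys, batch_list))


# Hypothetical key names and values; real entries would be np.ndarray objects.
example_keys = ["coord", "box", "energy"]
example_batch = [[0.0, 0.1, 0.2], [10.0, 0.0, 0.0], [-1.5]]
data_dict = to_data_dict(example_keys, example_batch)
```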