cntk.train.distributed module

Distributed learners manage learners in a distributed environment.

class Communicator(*args, **kwargs)[source]

Bases: cntk.cntk_py.DistributedCommunicator

A communicator interface exposing communication primitives that serve as building blocks for distributed training.

barrier()[source]

Sync point to make sure all workers reach the same state.

current_worker()[source]

Returns the worker descriptor of the current process.

Returns: descriptor of the current process.
Return type: WorkerDescriptor

static finalize()[source]

Should be called when all communication is finished. No more communication should happen after this call.

is_main()[source]

Indicates whether the current communicator is instantiated on the main node. The node with rank 0 is considered the main node.

static num_workers()[source]

Returns the number of MPI workers.

static rank()[source]

Returns the rank of the current process.

workers()[source]

Returns the workers in this communicator.

Returns: the workers in this communicator.
Return type: list of WorkerDescriptor

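For illustration, a minimal usage sketch of the static members above (the log message is a placeholder, not part of the API):

    from cntk.train.distributed import Communicator

    rank = Communicator.rank()           # 0-based rank of this MPI process
    total = Communicator.num_workers()   # total number of MPI workers
    print('starting worker %d of %d' % (rank, total))

    # ... distributed training happens here ...

    Communicator.finalize()              # call once; no communication after this point
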
class DistributedLearner(*args, **kwargs)[source]

Bases: cntk.cntk_py.DistributedLearner

A distributed learner that communicates data such as gradients and momentums across multiple MPI workers.

communicator()[source]

Returns the distributed communicator that talks to the other MPI workers.

Returns: the distributed communicator.
Return type: Communicator
total_number_of_samples_seen

The number of samples seen by the distributed learner.

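For illustration only, a hedged sketch of querying these members on a learner produced by one of the factory functions documented below; the model z and the learning-rate settings are placeholders:

    import cntk as C

    # 'z' is assumed to be an already-defined model function
    lr = C.learning_rate_schedule(0.01, C.UnitType.minibatch)
    local_learner = C.learners.sgd(z.parameters, lr)
    dist_learner = C.train.distributed.data_parallel_distributed_learner(local_learner)

    comm = dist_learner.communicator()   # Communicator shared with the other MPI workers
    print('worker rank:', comm.current_worker().global_rank)

    # after (or during) training:
    print('samples seen:', dist_learner.total_number_of_samples_seen)
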
class WorkerDescriptor[source]

Bases: cntk.cntk_py.DistributedWorkerDescriptor

Distributed worker descriptor, as returned by a Communicator instance.

global_rank

The global rank of the worker.

host_id

The host id of the worker.

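For illustration, a short sketch of reading these attributes for the current process, using mpi_communicator() (documented at the end of this module):

    from cntk.train.distributed import mpi_communicator

    me = mpi_communicator().current_worker()
    print('global rank:', me.global_rank)
    print('host id:', me.host_id)
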
block_momentum_distributed_learner(learner, block_size, block_momentum_as_time_constant=None, use_nestrov_momentum=True, reset_sgd_momentum_after_aggregation=True, block_learning_rate=1.0, distributed_after=0)[source]

Creates a block momentum distributed learner. See [1] for more information.

Block Momentum divides the full dataset into M non-overlapping blocks, and each block is partitioned into N non-overlapping splits.

During training, an unprocessed block is drawn at random by the trainer and the N splits of this block are dispatched to the workers.

Parameters:
  • learner – a local learner (e.g. sgd)
  • block_size (int) – size of a block, in samples
  • block_momentum_as_time_constant (float) – block momentum as time constant
  • use_nestrov_momentum (bool) – use Nesterov momentum
  • reset_sgd_momentum_after_aggregation (bool) – reset SGD momentum after aggregation
  • block_learning_rate (float) – block learning rate
  • distributed_after (int) – number of samples after which distributed training starts
Returns: a distributed learner instance

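A hedged usage sketch follows; the model z, the criterion functions loss and metric, and all numeric settings are placeholders rather than recommended values:

    import cntk as C
    from cntk.train.distributed import block_momentum_distributed_learner, Communicator

    lr = C.learning_rate_schedule(0.01, C.UnitType.sample)
    mom = C.momentum_schedule(0.9)
    local_learner = C.learners.momentum_sgd(z.parameters, lr, mom)

    learner = block_momentum_distributed_learner(
        local_learner,
        block_size=100000,                    # samples per block
        block_momentum_as_time_constant=4096,
        reset_sgd_momentum_after_aggregation=True)

    trainer = C.Trainer(z, (loss, metric), [learner])
    # ... each worker trains on its own split of the current block ...
    Communicator.finalize()
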
data_parallel_distributed_learner(learner, distributed_after=0, num_quantization_bits=32, use_async_buffered_parameter_update=False)[source]

Creates a data parallel distributed learner.

Parameters:
  • learner – a local learner (e.g. sgd)
  • distributed_after (int) – number of samples after which distributed training starts
  • num_quantization_bits (int) – number of bits for quantization (1 to 32)
  • use_async_buffered_parameter_update (bool) – use async buffered parameter update, currently must be False
Returns: a distributed learner instance

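A hedged usage sketch follows; z, loss, metric and reader are placeholders for a model, criterion functions and a MinibatchSource defined elsewhere:

    import cntk as C
    from cntk.train.distributed import data_parallel_distributed_learner, Communicator

    lr = C.learning_rate_schedule(0.01, C.UnitType.minibatch)
    local_learner = C.learners.sgd(z.parameters, lr)

    learner = data_parallel_distributed_learner(
        local_learner,
        num_quantization_bits=32,         # 32 means no quantization
        distributed_after=0)              # aggregate across workers from the first sample

    trainer = C.Trainer(z, (loss, metric), [learner])

    # each worker reads its own partition of the data, for example:
    mb = reader.next_minibatch(64,
                               num_data_partitions=Communicator.num_workers(),
                               partition_index=Communicator.rank())
    Communicator.finalize()
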
mpi_communicator()[source]

Creates a non-quantized MPI communicator.
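
For illustration, a hedged sketch of inspecting and synchronizing the workers through the returned communicator:

    from cntk.train.distributed import mpi_communicator, Communicator

    comm = mpi_communicator()            # full-precision (non-quantized) communicator
    if comm.is_main():                   # rank 0 only
        for w in comm.workers():
            print('worker rank=%d host=%s' % (w.global_rank, w.host_id))

    comm.barrier()                       # all workers synchronize here
    Communicator.finalize()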