cntk.contrib.deeprl.agent.shared.models module

A set of predefined models used by Q-learning and actor-critic agents.

class Models

Bases: object

A set of predefined models to approximate the Q function or the log of the policy pi.

The loss function needs to be ‘cross_entropy_with_softmax’ for policy gradient methods.
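To illustrate why this loss suits policy gradient methods: when the network outputs are treated as unnormalized log-probabilities, the cross entropy between their softmax and a one-hot encoding of the chosen action equals the negative log-probability of that action. The sketch below is a pure-Python stand-in written for this page, not CNTK's implementation; the function name `softmax_cross_entropy` and its arguments are hypothetical.

```python
import math


def softmax_cross_entropy(logits, action_index):
    """Return -log(softmax(logits)[action_index]).

    Hypothetical stand-in for illustration only; it is not the CNTK operator.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    # Cross entropy against a one-hot target reduces to
    # log-sum-exp minus the logit of the target action.
    return log_sum_exp - logits[action_index]


# Two equally scored actions each have probability 1/2.
loss = softmax_cross_entropy([0.0, 0.0], 0)
```

Minimizing this quantity, weighted by a return or advantage estimate, raises the log-probability of the chosen action, which is the update that policy gradient methods rely on.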

static dueling_network(shape_of_inputs, number_of_outputs, model_hidden_layers, loss_function=None, use_placeholder_for_input=False)

Dueling network to approximate Q function.

See paper at https://arxiv.org/pdf/1511.06581.pdf.

Parameters:

shape_of_inputs – tuple of array (input) dimensions.
number_of_outputs – dimension of output, equals the number of possible actions.
model_hidden_layers – in the form of “[comma-separated integers, [comma-separated integers], [comma-separated integers]]”. Each integer is the number of nodes in a hidden layer. The first set of integers represents the shared component of the dueling network. The second set corresponds to the state value function V and the third set corresponds to the advantage function A.
loss_function – if not specified, use squared loss by default.
use_placeholder_for_input – if true, the inputs are placeholders that have to be replaced later with an actual input_variable.

Returns: a Python dictionary with string-valued keys including: ‘inputs’, ‘outputs’, ‘loss’ and ‘f’.
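As a rough illustration of the model_hidden_layers format and of how the two branches are combined, the sketch below splits a layer specification into its shared, V, and A parts and then applies the mean-subtracted aggregation from the dueling network paper, Q(s, a) = V(s) + A(s, a) - mean(A(s, .)). Both helpers are hypothetical and written in pure Python. The example string "[32, 32, [16], [8]]" is an assumed concrete instance of the documented form, and whether CNTK uses exactly this aggregation is an assumption.

```python
import ast


def split_dueling_layers(model_hidden_layers):
    """Split a layer spec into (shared, value, advantage) layer sizes.

    Hypothetical helper for illustration; it is not CNTK's own parser.
    """
    spec = ast.literal_eval(model_hidden_layers)
    shared = [n for n in spec if isinstance(n, int)]
    branches = [b for b in spec if isinstance(b, list)]
    if len(branches) != 2:
        raise ValueError("expected one list for V and one list for A")
    value_layers, advantage_layers = branches
    return shared, value_layers, advantage_layers


def dueling_q(value, advantages):
    """Combine V(s) and A(s, .) as Q = V + A - mean(A)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [value + a - mean_advantage for a in advantages]


shared, value_layers, advantage_layers = split_dueling_layers("[32, 32, [16], [8]]")
q_values = dueling_q(1.0, [1.0, 2.0, 3.0])
```

Subtracting the mean advantage keeps V and A identifiable: adding a constant to every advantage leaves the resulting Q values unchanged.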

static feedforward_network(shape_of_inputs, number_of_outputs, model_hidden_layers, loss_function=None, use_placeholder_for_input=False)

Feedforward network to approximate Q or log of pi.

Parameters:

shape_of_inputs – tuple of array (input) dimensions.
number_of_outputs – dimension of output, equals the number of possible actions.
model_hidden_layers – string representing a list of integers corresponding to number of nodes in each hidden layer.
loss_function – if not specified, use squared loss by default.
use_placeholder_for_input – if true, the inputs are placeholders that have to be replaced later with an actual input_variable.

Returns: a Python dictionary with string-valued keys including: ‘inputs’, ‘outputs’, ‘loss’ and ‘f’.
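To make the relationship between the three sizing parameters concrete, the sketch below computes the (fan-in, fan-out) pair of each fully connected layer implied by a given shape_of_inputs, number_of_outputs, and model_hidden_layers. It is a hypothetical pure-Python helper, not part of CNTK, and it assumes that a multi-dimensional input is flattened before the first layer and that the string is written like a Python list, for example "[64, 32]".

```python
import ast
from math import prod


def feedforward_layer_shapes(shape_of_inputs, number_of_outputs, model_hidden_layers):
    """Return a list of (fan_in, fan_out) pairs, one per dense layer.

    Hypothetical helper for illustration; it does not build a CNTK model.
    """
    hidden_sizes = ast.literal_eval(model_hidden_layers)
    # Assumption: the input tensor is flattened before the first layer.
    sizes = [prod(shape_of_inputs)] + list(hidden_sizes) + [number_of_outputs]
    return list(zip(sizes[:-1], sizes[1:]))


# A 4-dimensional observation, two actions, hidden layers of 64 and 32 nodes.
shapes = feedforward_layer_shapes((4,), 2, "[64, 32]")
```

The final pair always ends in number_of_outputs, matching the documented rule that the output dimension equals the number of possible actions.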