cntk.contrib.deeprl.agent.policy_gradient module

Actor-Critic Policy Gradient.

class ActorCritic(config_filename, o_space, a_space)

Bases: cntk.contrib.deeprl.agent.agent.AgentBaseClass

Actor-Critic Policy Gradient.

See https://arxiv.org/pdf/1602.01783.pdf for a description of the algorithm.
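
A minimal construction sketch, assuming an OpenAI Gym environment supplies the observation and action spaces (the config filename below is hypothetical):

    import gym

    from cntk.contrib.deeprl.agent.policy_gradient import ActorCritic

    # Any Gym environment works; CartPole is purely illustrative.
    env = gym.make('CartPole-v0')
    # 'policy_gradient.config' is a hypothetical settings file path.
    agent = ActorCritic('policy_gradient.config',
                        env.observation_space, env.action_space)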

end(reward, next_state)

Receive the last observed reward and state of the episode, which then terminates.

Parameters:
  • reward (float) – amount of reward returned after previous action.
  • next_state (object) – observation provided by the environment.
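
For example, when the environment signals termination, the caller reports the final transition (the variable names below are illustrative, not part of the API):

    # final_reward / final_observation come from the environment's last step.
    agent.end(final_reward, final_observation)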
save(filename)

Save model to file.

save_parameter_settings(filename)

Save parameter settings to file.
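
A short sketch of persisting both the trained model and its settings; the filenames are hypothetical:

    agent.save('actor_critic.model')                   # model weights
    agent.save_parameter_settings('actor_critic.cfg')  # parameter settings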

set_as_best_model()

Copy current model to best model.
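
A typical use is to checkpoint whenever an episode improves on the best result so far; the reward bookkeeping below is the caller's responsibility, not part of this API:

    if episode_reward > best_episode_reward:  # caller-maintained statistics
        best_episode_reward = episode_reward
        agent.set_as_best_model()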

start(state)

Start a new episode.

Parameters: state (object) – observation provided by the environment.
Returns:
  • action (int) – action chosen by the agent.
  • debug_info (dict) – auxiliary diagnostic information.
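
For instance, a sketch assuming a Gym-style environment:

    observation = env.reset()
    action, debug_info = agent.start(observation)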
step(reward, next_state)

Observe one transition and choose an action.

Parameters:
  • reward (float) – amount of reward returned after previous action.
  • next_state (object) – observation provided by the environment.
Returns:
  • action (int) – action chosen by the agent.
  • debug_info (dict) – auxiliary diagnostic information.
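
Putting start, step and end together, one episode of interaction might look like the following sketch, assuming a Gym-style environment whose step returns (observation, reward, done, info):

    observation = env.reset()
    action, debug_info = agent.start(observation)
    done = False
    while not done:
        observation, reward, done, _ = env.step(action)
        if done:
            agent.end(reward, observation)  # terminal transition
        else:
            action, debug_info = agent.step(reward, observation)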