cntk.contrib.deeprl.agent.policy_gradient module

Actor-Critic Policy Gradient.

class ActorCritic(config_filename, o_space, a_space)

Bases: cntk.contrib.deeprl.agent.agent.AgentBaseClass

Actor-Critic Policy Gradient.

See https://arxiv.org/pdf/1602.01783.pdf for a description of the algorithm.
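
A minimal construction sketch, assuming an OpenAI Gym environment supplies the observation and action spaces (the config filename below is hypothetical):

    import gym

    from cntk.contrib.deeprl.agent.policy_gradient import ActorCritic

    # Any Gym environment works; CartPole is purely illustrative.
    env = gym.make('CartPole-v0')
    # 'policy_gradient.config' is a hypothetical settings file path.
    agent = ActorCritic('policy_gradient.config',
                        env.observation_space, env.action_space)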

end(reward, next_state)

Receive the last observed reward and state of the episode, which then terminates.

Parameters:
  • reward (float) – amount of reward returned after previous action.
  • next_state (object) – observation provided by the environment.
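
For example, when the environment signals termination, the caller reports the final transition (the variable names below are illustrative, not part of the API):

    # final_reward / final_observation come from the environment's last step.
    agent.end(final_reward, final_observation)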
save(filename)

Save model to file.

save_parameter_settings(filename)

Save parameter settings to file.
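
A short sketch of persisting both the trained model and its settings; the filenames are hypothetical:

    agent.save('actor_critic.model')                   # model weights
    agent.save_parameter_settings('actor_critic.cfg')  # parameter settings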

set_as_best_model()

Copy current model to best model.
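
A typical use is to checkpoint whenever an episode improves on the best result so far; the reward bookkeeping below is the caller's responsibility, not part of this API:

    if episode_reward > best_episode_reward:  # caller-maintained statistics
        best_episode_reward = episode_reward
        agent.set_as_best_model()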

start(state)

Start a new episode.

Parameters: state (object) – observation provided by the environment.
Returns:
  • action (int) – action chosen by the agent.
  • debug_info (dict) – auxiliary diagnostic information.
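
For instance, a sketch assuming a Gym-style environment:

    observation = env.reset()
    action, debug_info = agent.start(observation)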
step(reward, next_state)

Observe one transition and choose an action.

Parameters:
  • reward (float) – amount of reward returned after previous action.
  • next_state (object) – observation provided by the environment.
Returns:
  • action (int) – action chosen by the agent.
  • debug_info (dict) – auxiliary diagnostic information.
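
Putting start, step and end together, one episode of interaction might look like the following sketch, assuming a Gym-style environment whose step returns (observation, reward, done, info):

    observation = env.reset()
    action, debug_info = agent.start(observation)
    done = False
    while not done:
        observation, reward, done, _ = env.step(action)
        if done:
            agent.end(reward, observation)  # terminal transition
        else:
            action, debug_info = agent.step(reward, observation)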