cntk.contrib.deeprl.agent.shared.replay_memory module

Replay memory for Q learning.

class ReplayMemory(capacity, prioritized=False)[source]

Bases: object

Replay memory to store samples of experience.

Each transition is represented as a (state, action, reward, next_state, priority) tuple. The priority field is ignored for non-prioritized experience replay.
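To make the transition layout and circular-buffer behavior concrete, here is a minimal sketch of a non-prioritized replay memory. This is an illustration of the documented behavior, not the CNTK implementation; the `Transition` namedtuple and `SimpleReplayMemory` name are hypothetical.

```python
import random
from collections import namedtuple

# Hypothetical tuple mirroring the documented transition layout.
Transition = namedtuple(
    'Transition', ['state', 'action', 'reward', 'next_state', 'priority'])


class SimpleReplayMemory:
    """Minimal sketch of a non-prioritized replay memory (circular buffer)."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._memory = []
        self._position = 0  # next slot to write when the buffer is full

    def store(self, *args):
        transition = Transition(*args)
        if len(self._memory) < self._capacity:
            self._memory.append(transition)
        else:
            # Memory is full: overwrite the oldest transition.
            self._memory[self._position] = transition
        self._position = (self._position + 1) % self._capacity

    def size(self):
        return len(self._memory)

    def sample_minibatch(self, batch_size):
        # Return (position, transition) pairs so callers can refer back
        # to the sampled entries later.
        indices = random.sample(range(len(self._memory)), batch_size)
        return [(i, self._memory[i]) for i in indices]
```

For example, storing three transitions into a capacity-2 memory leaves the two most recent ones, with the oldest silently replaced.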

sample_minibatch(batch_size)[source]

Sample a minibatch of batch_size transitions. The returned tuples include each transition's position in memory, which can be passed to update_priority().

size()[source]

Return the current number of transitions.

store(*args)[source]

Store a transition in replay memory.

If the memory is full, the oldest transition is overwritten.

update_priority(map_from_position_to_priority)[source]

Update priority of transitions.

Parameters: map_from_position_to_priority – a dictionary mapping a transition's position to its new priority. Positions should come from the tuples returned by sample_minibatch().
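The interplay between sample_minibatch() and update_priority() can be sketched as follows. This is an illustrative sketch of prioritized replay, not the CNTK data structure: the `PrioritizedReplaySketch` name is hypothetical, and sampling here is simply proportional to priority.

```python
import random


class PrioritizedReplaySketch:
    """Illustrative prioritized replay memory; positions returned by
    sample_minibatch feed back into update_priority."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._transitions = []
        self._priorities = []
        self._next = 0  # next slot to overwrite when full

    def store(self, transition, priority=1.0):
        if len(self._transitions) < self._capacity:
            self._transitions.append(transition)
            self._priorities.append(priority)
        else:
            self._transitions[self._next] = transition
            self._priorities[self._next] = priority
        self._next = (self._next + 1) % self._capacity

    def sample_minibatch(self, batch_size):
        # Sample positions with probability proportional to priority,
        # returning (position, transition) pairs.
        positions = random.choices(
            range(len(self._transitions)),
            weights=self._priorities, k=batch_size)
        return [(p, self._transitions[p]) for p in positions]

    def update_priority(self, map_from_position_to_priority):
        # Assign each sampled transition its new priority, e.g. the
        # magnitude of its latest TD error.
        for position, priority in map_from_position_to_priority.items():
            self._priorities[position] = priority
```

A typical loop samples a minibatch, computes new priorities from the learner's errors, and writes them back using the sampled positions:

```python
mem = PrioritizedReplaySketch(4)
for i in range(4):
    mem.store(('state', 0, 0.0, 'next_state'))
batch = mem.sample_minibatch(2)
mem.update_priority({pos: 0.5 for pos, _ in batch})
```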