cntk.contrib.deeprl.agent.shared.replay_memory module

Replay memory for Q learning.

class ReplayMemory(capacity, prioritized=False)[source]

Bases: object

Replay memory to store samples of experience.

Each transition is represented as a (state, action, reward, next_state, priority) tuple. The priority field is ignored for non-prioritized experience replay.
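To make the transition layout and circular-buffer behavior concrete, here is a minimal sketch of a non-prioritized replay memory. This is an illustration of the documented behavior, not the CNTK implementation; the `Transition` namedtuple and `SimpleReplayMemory` name are hypothetical.

```python
import random
from collections import namedtuple

# Hypothetical tuple mirroring the documented transition layout.
Transition = namedtuple(
    'Transition', ['state', 'action', 'reward', 'next_state', 'priority'])


class SimpleReplayMemory:
    """Minimal sketch of a non-prioritized replay memory (circular buffer)."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._memory = []
        self._position = 0  # next slot to write when the buffer is full

    def store(self, *args):
        transition = Transition(*args)
        if len(self._memory) < self._capacity:
            self._memory.append(transition)
        else:
            # Memory is full: overwrite the oldest transition.
            self._memory[self._position] = transition
        self._position = (self._position + 1) % self._capacity

    def size(self):
        return len(self._memory)

    def sample_minibatch(self, batch_size):
        # Return (position, transition) pairs so callers can refer back
        # to the sampled entries later.
        indices = random.sample(range(len(self._memory)), batch_size)
        return [(i, self._memory[i]) for i in indices]
```

For example, storing three transitions into a capacity-2 memory leaves the two most recent ones, with the oldest silently replaced.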

sample_minibatch(batch_size)[source]

Sample a minibatch of batch_size transitions. The returned tuples include each transition's position in memory, which can be passed to update_priority().

size()[source]

Return the current number of transitions.

store(*args)[source]

Store a transition in replay memory.

If the memory is full, the oldest transition is overwritten.

update_priority(map_from_position_to_priority)[source]

Update priority of transitions.

Parameters: map_from_position_to_priority – a dictionary mapping a transition's position to its new priority. Positions should come from the tuples returned by sample_minibatch().
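The interplay between sample_minibatch() and update_priority() can be sketched as follows. This is an illustrative sketch of prioritized replay, not the CNTK data structure: the `PrioritizedReplaySketch` name is hypothetical, and sampling here is simply proportional to priority.

```python
import random


class PrioritizedReplaySketch:
    """Illustrative prioritized replay memory; positions returned by
    sample_minibatch feed back into update_priority."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._transitions = []
        self._priorities = []
        self._next = 0  # next slot to overwrite when full

    def store(self, transition, priority=1.0):
        if len(self._transitions) < self._capacity:
            self._transitions.append(transition)
            self._priorities.append(priority)
        else:
            self._transitions[self._next] = transition
            self._priorities[self._next] = priority
        self._next = (self._next + 1) % self._capacity

    def sample_minibatch(self, batch_size):
        # Sample positions with probability proportional to priority,
        # returning (position, transition) pairs.
        positions = random.choices(
            range(len(self._transitions)),
            weights=self._priorities, k=batch_size)
        return [(p, self._transitions[p]) for p in positions]

    def update_priority(self, map_from_position_to_priority):
        # Assign each sampled transition its new priority, e.g. the
        # magnitude of its latest TD error.
        for position, priority in map_from_position_to_priority.items():
            self._priorities[position] = priority
```

A typical loop samples a minibatch, computes new priorities from the learner's errors, and writes them back using the sampled positions:

```python
mem = PrioritizedReplaySketch(4)
for i in range(4):
    mem.store(('state', 0, 0.0, 'next_state'))
batch = mem.sample_minibatch(2)
mem.update_priority({pos: 0.5 for pos, _ in batch})
```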