manager


class ReplayBufferManager(buffer_list: list[ReplayBuffer] | list[HERReplayBuffer])

Bases: ReplayBuffer

ReplayBufferManager manages a list of ReplayBuffer instances that all share exactly the same configuration.

These replay buffers share a contiguous memory layout: the storage of each buffer is a shallow copy (a view) of its own segment of the manager's top-level memory.
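A minimal NumPy sketch of the shared-storage idea, assuming each sub-buffer owns a fixed number of slots laid out back to back in one top-level array. The names `top`, `sizes` and `sub_buffers` are hypothetical and not part of the Tianshou API. Basic slicing returns views, so writing through a sub-buffer changes the top-level array without any copying.

```python
import numpy as np

# Hypothetical configuration: three sub-buffers with 4 slots each,
# every slot holding a 2-dimensional observation.
sizes = [4, 4, 4]
obs_dim = 2

# One contiguous top-level array owns all of the memory.
top = np.zeros((sum(sizes), obs_dim))

# Each sub-buffer's storage is a basic slice of the top-level array,
# i.e. a view (shallow copy) that shares the same memory.
sub_buffers = []
offset = 0
for size in sizes:
    sub_buffers.append(top[offset : offset + size])
    offset += size

# Writing to local slot 0 of the second sub-buffer...
sub_buffers[1][0] = [1.0, 2.0]

# ...is visible in the top-level array at global index 4 (= sizes[0] + 0).
print(top[4])  # [1. 2.]
print(np.shares_memory(top, sub_buffers[2]))  # True
```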

Parameters:

buffer_list – the list of ReplayBuffer instances to be managed.

See also

Please refer to ReplayBuffer for the usage of the other APIs.

property subbuffer_edges: ndarray

Edges of the contained buffers, mostly needed as part of the VectorReplayBuffer interface.

For a standard ReplayBuffer the edges are always [0, maxsize]. Because transitions can be added to the buffer indefinitely (once it is full, the oldest ones are overwritten), a single episode can “go over the edge”, i.e. wrap around from the end of the buffer to its beginning. Having the edges available is useful for extracting whole episodes from the buffer and for input validation.
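A minimal sketch, assuming the manager's edges are the cumulative offsets of its sub-buffers (which reduces to [0, maxsize] for a single buffer). `subbuffer_edges` and `owner_of` below are hypothetical helpers, not Tianshou functions; the second one illustrates the input-validation use case by mapping a global index back to the sub-buffer that owns it.

```python
from bisect import bisect_right
from itertools import accumulate


def subbuffer_edges(sizes: list[int]) -> list[int]:
    """Return [0, s0, s0 + s1, ..., total] for sub-buffer sizes s0, s1, ..."""
    return [0, *accumulate(sizes)]


def owner_of(index: int, edges: list[int]) -> int:
    """Return the position of the sub-buffer containing a global index.

    Raises ValueError for indices outside [edges[0], edges[-1]).
    """
    if not edges[0] <= index < edges[-1]:
        raise ValueError(f"index {index} outside [{edges[0]}, {edges[-1]})")
    # bisect_right finds the first edge strictly greater than index;
    # the owning sub-buffer is the one that starts just before that edge.
    return bisect_right(edges, index) - 1


edges = subbuffer_edges([4, 4, 4])
print(edges)               # [0, 4, 8, 12]
print(owner_of(5, edges))  # 1
```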

reset(keep_statistics: bool = False) → None

Clear all the data in the replay buffer. The episode statistics are cleared as well unless keep_statistics is True.

set_batch(batch: RolloutBatchProtocol) → None

Manually choose the batch you want the ReplayBuffer to manage.

unfinished_index() → ndarray

Return the indices of unfinished episodes, i.e. the index of the most recent transition of each episode that has not ended yet.
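A pure-Python sketch of the bookkeeping this method relies on, assuming each sub-buffer tracks how many transitions it holds and which local slot it wrote last; the episode in a sub-buffer counts as unfinished when that newest transition is not flagged as done. The function and all of its arguments are hypothetical, not Tianshou's implementation.

```python
def unfinished_index(
    offsets: list[int],
    filled: list[int],
    last_index: list[int],
    done: list[bool],
) -> list[int]:
    """Hypothetical sketch: global indices of episodes still in progress.

    offsets[i]    -- global offset of sub-buffer i
    filled[i]     -- number of transitions currently stored in sub-buffer i
    last_index[i] -- local index of the most recently written transition
    done          -- done flags of all transitions, indexed globally
    """
    result = []
    for offset, n, last in zip(offsets, filled, last_index):
        if n == 0:
            continue  # empty sub-buffer: no episode has started yet
        global_last = offset + last
        if not done[global_last]:
            result.append(global_last)  # newest transition is not terminal
    return result


done = [False] * 12
done[2] = True  # the episode in sub-buffer 0 ended at global index 2
print(unfinished_index([0, 4, 8], [3, 4, 0], [2, 1, 0], done))  # [5]
```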

prev(index: int | ndarray) → ndarray

Return the index of the previous transition.

The index is left unchanged if it is the beginning of an episode.
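A sketch of this rule inside one circular sub-buffer, assuming the sub-buffer knows its current size, the local index it wrote last, and a done flag per slot. `prev_index` is hypothetical; for a manager, the same rule would be applied per sub-buffer after subtracting that sub-buffer's offset from the global index.

```python
def prev_index(index: int, size: int, last_index: int, done: list[bool]) -> int:
    """Hypothetical sketch of `prev` inside one circular sub-buffer.

    index      -- local index of the current transition
    size       -- number of transitions stored in the sub-buffer
    last_index -- local index of the most recently written transition
    done       -- done flags of the sub-buffer (local indexing)
    """
    candidate = (index - 1) % size
    # Do not step back if the candidate ends the previous episode, or if it
    # is the newest transition (that would wrap from the oldest data to the newest).
    if done[candidate] or candidate == last_index:
        return index
    return candidate


# Episode A occupies slots 0-1 (done at 1); episode B occupies slots 2-4.
done = [False, True, False, False, False]
print(prev_index(3, 5, 4, done))  # 2
print(prev_index(2, 5, 4, done))  # 2 (start of episode B)
print(prev_index(0, 5, 4, done))  # 0 (oldest stored transition)
```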

next(index: int | ndarray) → ndarray

Return the index of the next transition.

The index is left unchanged if it is the end of an episode.
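The mirror-image sketch for a single circular sub-buffer, under the same assumptions (current size, last written local index, one done flag per slot). `next_index` is a hypothetical helper: it stays put at the end of an episode and at the newest transition, and otherwise advances with wrap-around.

```python
def next_index(index: int, size: int, last_index: int, done: list[bool]) -> int:
    """Hypothetical sketch of `next` inside one circular sub-buffer."""
    # Stay put at the end of an episode, or at the newest transition
    # (nothing has been written after it yet).
    if done[index] or index == last_index:
        return index
    return (index + 1) % size  # wrap around the circular storage


done = [False, True, False, False, False]
print(next_index(0, 5, 4, done))  # 1
print(next_index(1, 5, 4, done))  # 1 (end of episode A)
print(next_index(4, 5, 4, done))  # 4 (newest transition)
```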

update(buffer: ReplayBuffer) → ndarray

A ReplayBufferManager cannot be updated from another buffer, so this method is not supported.

add(batch: RolloutBatchProtocol, buffer_ids: ndarray | list[int] | None = None) → tuple[ndarray, ndarray, ndarray, ndarray]

Add a batch of data into ReplayBufferManager.

The length of each entry in the data (its first dimension) must equal the length of buffer_ids. By default, buffer_ids is [0, 1, …, buffer_num - 1].

Returns (current_index, episode_reward, episode_length, episode_start_index). For an episode that has not finished yet, the returned episode_reward and episode_length are 0.
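A simplified pure-Python sketch of how add could route entry k of a batch to sub-buffer buffer_ids[k] and translate local positions into global indices. It assumes circular sub-buffers of identical configuration and tracks only rewards and done flags; `ToySubBuffer` and `manager_add` are hypothetical and much simpler than Tianshou's implementation. Finished episodes report their return and length, unfinished ones report 0, as described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToySubBuffer:
    """Hypothetical circular sub-buffer that stores only rewards and done flags."""

    maxsize: int
    rew: list[float] = field(default_factory=list)
    done: list[bool] = field(default_factory=list)
    index: int = 0       # next local slot to write
    ep_rew: float = 0.0  # running return of the current episode
    ep_len: int = 0      # running length of the current episode
    ep_start: int = 0    # local index where the current episode began

    def add(self, rew: float, done: bool) -> tuple[int, float, int, int]:
        ptr = self.index
        if self.ep_len == 0:
            self.ep_start = ptr  # first transition of a new episode
        if len(self.rew) < self.maxsize:
            self.rew.append(rew)
            self.done.append(done)
        else:  # full: overwrite the oldest slot
            self.rew[ptr], self.done[ptr] = rew, done
        self.index = (ptr + 1) % self.maxsize
        self.ep_rew += rew
        self.ep_len += 1
        if not done:
            return ptr, 0.0, 0, self.ep_start  # unfinished: report 0
        result = (ptr, self.ep_rew, self.ep_len, self.ep_start)
        self.ep_rew, self.ep_len = 0.0, 0
        return result


def manager_add(
    buffers: list[ToySubBuffer],
    rews: list[float],
    dones: list[bool],
    buffer_ids: list[int] | None = None,
) -> tuple[list[int], list[float], list[int], list[int]]:
    """Route entry k of the batch to buffers[buffer_ids[k]]."""
    if buffer_ids is None:
        buffer_ids = list(range(len(buffers)))  # default: 0 .. buffer_num - 1
    if not len(rews) == len(dones) == len(buffer_ids):
        raise ValueError("each field needs exactly len(buffer_ids) entries")
    offsets = [0]
    for buf in buffers[:-1]:
        offsets.append(offsets[-1] + buf.maxsize)
    out: tuple[list, list, list, list] = ([], [], [], [])
    for rew, done, bid in zip(rews, dones, buffer_ids):
        ptr, ep_rew, ep_len, ep_start = buffers[bid].add(rew, done)
        out[0].append(offsets[bid] + ptr)  # local -> global index
        out[1].append(ep_rew)
        out[2].append(ep_len)
        out[3].append(offsets[bid] + ep_start)
    return out


bufs = [ToySubBuffer(3), ToySubBuffer(3)]
print(manager_add(bufs, [1.0, 2.0], [False, False]))
# ([0, 3], [0.0, 0.0], [0, 0], [0, 3])
print(manager_add(bufs, [1.0, 5.0], [True, False]))
# ([1, 4], [2.0, 0.0], [2, 0], [0, 3])
print(manager_add(bufs, [7.0], [True], buffer_ids=[1]))
# ([5], [14.0], [3], [3])
```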

sample_indices(batch_size: int | None) → ndarray

Get a random sample of indices of size batch_size.

Return all available indices in the buffer if batch_size is 0; return an empty numpy array if batch_size < 0 or if no index is available for sampling.

Parameters:

batch_size – the number of indices to be sampled. If None, it will be set to the length of the buffer (i.e. return all available indices in a random order).
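A pure-Python sketch of the integer cases above for a manager, assuming one plausible strategy: pick a sub-buffer with probability proportional to how many transitions it holds, then a uniform local index inside it, which makes every stored transition equally likely. The function and its arguments are hypothetical; it covers integer batch sizes only and ignores any restriction that makes some stored indices unavailable.

```python
import random


def sample_indices(lengths: list[int], offsets: list[int], batch_size: int, rng=random) -> list[int]:
    """Hypothetical sketch of sampling global indices across sub-buffers.

    lengths[i] -- number of transitions stored in sub-buffer i
    offsets[i] -- global offset of sub-buffer i
    """
    if batch_size < 0 or sum(lengths) == 0:
        return []  # nothing should (or can) be sampled
    if batch_size == 0:
        # All available indices, sub-buffer by sub-buffer.
        return [off + i for off, n in zip(offsets, lengths) for i in range(n)]
    # Choose sub-buffers proportionally to their lengths (with replacement),
    # then a uniform local index in each: P(index) = n_b/N * 1/n_b = 1/N.
    picks = rng.choices(range(len(lengths)), weights=lengths, k=batch_size)
    return [offsets[b] + rng.randrange(lengths[b]) for b in picks]


print(sample_indices([2, 0, 3], [0, 4, 8], 0))   # [0, 1, 8, 9, 10]
print(sample_indices([2, 0, 3], [0, 4, 8], -1))  # []
print(sample_indices([2, 0, 3], [0, 4, 8], 4, random.Random(0)))  # 4 indices from {0, 1, 8, 9, 10}
```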

class PrioritizedReplayBufferManager(buffer_list: Sequence[PrioritizedReplayBuffer])

Bases: PrioritizedReplayBuffer, ReplayBufferManager

PrioritizedReplayBufferManager manages a list of PrioritizedReplayBuffer instances that all share exactly the same configuration.

These replay buffers share a contiguous memory layout: the storage of each buffer is a shallow copy (a view) of its own segment of the manager's top-level memory.
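Because all sub-buffers live in one contiguous index space, a single priority per global index is enough to sample across them. The sketch below assumes the standard proportional scheme from prioritized experience replay, with exponents alpha and beta; `prioritized_sample` is hypothetical and not Tianshou's implementation, which may store priorities and normalize weights differently.

```python
import random


def prioritized_sample(
    priorities: list[float],
    batch_size: int,
    alpha: float = 0.6,
    beta: float = 0.4,
    rng=random,
) -> tuple[list[int], list[float]]:
    """Hypothetical sketch of proportional prioritized sampling.

    priorities -- one non-negative priority per stored transition, indexed
                  globally across all sub-buffers (at least one must be > 0)
    Returns the sampled global indices and their importance weights.
    """
    if batch_size <= 0:
        return [], []
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    n = len(priorities)
    # P(i) = p_i^alpha / sum_k p_k^alpha, sampled with replacement.
    indices = rng.choices(range(n), weights=scaled, k=batch_size)
    # Importance weights (n * P(i)) ** -beta, scaled so the largest is 1.
    weights = [(n * scaled[i] / total) ** -beta for i in indices]
    max_w = max(weights)
    return indices, [w / max_w for w in weights]


idx, w = prioritized_sample([1.0, 0.0, 4.0, 2.0], 5, rng=random.Random(0))
print(idx)  # never contains 1, whose priority is 0
print(w)    # rarer (lower-priority) samples receive larger weights
```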

Parameters:

buffer_list – the list of PrioritizedReplayBuffer instances to be managed.

See also

Please refer to ReplayBuffer for the usage of the other APIs.

class HERReplayBufferManager(buffer_list: list[HERReplayBuffer])

Bases: ReplayBufferManager

HERReplayBufferManager manages a list of HERReplayBuffer instances that all share exactly the same configuration.

These replay buffers share a contiguous memory layout: the storage of each buffer is a shallow copy (a view) of its own segment of the manager's top-level memory.

Parameters:

buffer_list – the list of HERReplayBuffer instances to be managed.

See also

Please refer to ReplayBuffer for the usage of the other APIs.

save_hdf5(path: str, compression: str | None = None) → None

Save the replay buffer to an HDF5 file.

set_batch(batch: RolloutBatchProtocol) → None

Manually choose the batch you want the ReplayBuffer to manage.

update(buffer: HERReplayBuffer | ReplayBuffer) → ndarray

A ReplayBufferManager cannot be updated from another buffer, so this method is not supported.

add(batch: RolloutBatchProtocol, buffer_ids: ndarray | list[int] | None = None) → tuple[ndarray, ndarray, ndarray, ndarray]

Add a batch of data into ReplayBufferManager.

The length of each entry in the data (its first dimension) must equal the length of buffer_ids. By default, buffer_ids is [0, 1, …, buffer_num - 1].

Returns (current_index, episode_reward, episode_length, episode_start_index). For an episode that has not finished yet, the returned episode_reward and episode_length are 0.