class HERReplayBuffer(size: int, compute_reward_fn: Callable[[ndarray, ndarray], ndarray], horizon: int, future_k: float = 8.0, **kwargs: Any)[source]#

Implementation of Hindsight Experience Replay. arXiv:1707.01495.

HERReplayBuffer is to be used with goal-based environments where the observation is a dictionary with keys observation, achieved_goal and desired_goal. It currently supports only HER’s ‘future’ strategy with online sampling.

  • size – the size of the replay buffer.

  • compute_reward_fn – a function that takes two np.ndarray arguments, achieved_goal and desired_goal, and returns rewards as an np.ndarray. Both arguments have shape (batch_size, …original_shape) and the returned rewards must have shape (batch_size,).

  • horizon – the maximum number of steps in an episode.

  • future_k – the ‘k’ parameter introduced in the paper. In short, during sampling, at most k re-written episodes are drawn for every one unaltered episode.
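As an illustration of the compute_reward_fn signature, a sparse goal-distance reward is a common choice; the tolerance value below is an assumption for the example, not part of the API:

```python
import numpy as np


def compute_reward_fn(achieved_goal: np.ndarray, desired_goal: np.ndarray) -> np.ndarray:
    """Sparse reward: 0.0 when the achieved goal lies within `threshold`
    of the desired goal, -1.0 otherwise (a common HER convention).
    Inputs have shape (batch_size, goal_dim); output has shape (batch_size,)."""
    threshold = 0.05  # assumption: task-specific tolerance, purely illustrative
    # Per-sample Euclidean distance over the goal dimension -> shape (batch_size,)
    distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return np.where(distance <= threshold, 0.0, -1.0)
```

Any function with this batched-in, batched-out shape contract can be passed as compute_reward_fn.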

See also

Please refer to ReplayBuffer for other APIs’ usage.

add(batch: RolloutBatchProtocol, buffer_ids: ndarray | list[int] | None = None) tuple[ndarray, ndarray, ndarray, ndarray][source]#

Add a batch of data into replay buffer.

  • batch – the input data batch. “obs”, “act”, “rew”, “terminated”, “truncated” are required keys.

  • buffer_ids – kept for consistency with other buffers’ add functions; if it is not None, we assume the input batch’s first dimension is always 1.

Return (current_index, episode_reward, episode_length, episode_start_index). If the episode is not finished, episode_length and episode_reward are 0.

reset(keep_statistics: bool = False) None[source]#

Clear all the data in replay buffer and episode statistics.

rewrite_transitions(indices: ndarray) None[source]#

Re-write the goals of some sampled transitions’ episodes according to HER.

Currently applies only HER’s ‘future’ strategy. The new goals are written directly into the internal batch data temporarily and are restored right before the next sampling or when certain buffer methods are used (e.g. add, save_hdf5). This ensures that computations such as n-step return calculation perform correctly without additional alteration.
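The ‘future’ strategy can be sketched in plain NumPy: for each selected transition at step t, the desired goal is replaced by an achieved goal from a uniformly chosen later step t′ ≥ t of the same episode. This is a minimal single-episode sketch; the helper name and episode layout are illustrative, not the buffer’s internals:

```python
import numpy as np


def relabel_future(achieved_goals: np.ndarray,
                   desired_goals: np.ndarray,
                   indices: np.ndarray,
                   rng: np.random.Generator) -> np.ndarray:
    """Return a copy of desired_goals where each selected transition's goal
    is replaced by an achieved goal from a uniformly drawn future step of
    the same (single) episode.
    Shapes: goals are (T, goal_dim); indices is (n,) with values in [0, T)."""
    T = achieved_goals.shape[0]
    new_goals = desired_goals.copy()
    # For each selected step t, draw a future step t' uniformly from [t, T).
    future = rng.integers(indices, T)
    new_goals[indices] = achieved_goals[future]
    return new_goals
```

Rewards for the relabeled transitions would then be recomputed with compute_reward_fn against the new goals, which is why the buffer requires that function at construction time.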

sample_indices(batch_size: int | None) ndarray[source]#

Get a random sample of indices with size = batch_size.

Return all available indices in the buffer if batch_size is 0; return an empty numpy array if batch_size < 0 or no index is available. Additionally, the episodes of some sampled transitions will be re-written according to HER.

save_hdf5(path: str, compression: str | None = None) None[source]#

Save replay buffer to HDF5 file.

set_batch(batch: RolloutBatchProtocol) None[source]#

Manually choose the batch you want the ReplayBuffer to manage.

update(buffer: HERReplayBuffer | ReplayBuffer) ndarray[source]#

Move the data from the given buffer into the current buffer.

Return the updated indices. If the update fails, return an empty array.