class HERReplayBuffer(size: int, compute_reward_fn: Callable[[ndarray, ndarray], ndarray], horizon: int, future_k: float = 8.0, **kwargs: Any)[source]#

Implementation of Hindsight Experience Replay. arXiv:1707.01495.

HERReplayBuffer is to be used with goal-based environments where the observation is a dictionary with keys observation, achieved_goal and desired_goal. It currently supports only HER’s ‘future’ strategy with online sampling.

  • size – the size of the replay buffer.

  • compute_reward_fn – a function that takes two np.ndarray arguments, achieved_goal and desired_goal, and returns rewards as an np.ndarray. Both arguments have shape (batch_size, …original_shape) and the returned rewards must have shape (batch_size,).

  • horizon – the maximum number of steps in an episode.

  • future_k – the ‘k’ parameter introduced in the paper. In short, during sampling, at most k re-written episodes are drawn for every one unaltered episode.
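As an illustration of the compute_reward_fn signature, a sparse goal-distance reward is a common choice; the tolerance value below is an assumption for the example, not part of the API:

```python
import numpy as np


def compute_reward_fn(achieved_goal: np.ndarray, desired_goal: np.ndarray) -> np.ndarray:
    """Sparse reward: 0.0 when the achieved goal lies within `threshold`
    of the desired goal, -1.0 otherwise (a common HER convention).
    Inputs have shape (batch_size, goal_dim); output has shape (batch_size,)."""
    threshold = 0.05  # assumption: task-specific tolerance, purely illustrative
    # Per-sample Euclidean distance over the goal dimension -> shape (batch_size,)
    distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return np.where(distance <= threshold, 0.0, -1.0)
```

Any function with this batched-in, batched-out shape contract can be passed as compute_reward_fn.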

See also

Please refer to ReplayBuffer for other APIs’ usage.

add(batch: RolloutBatchProtocol, buffer_ids: ndarray | list[int] | None = None) tuple[ndarray, ndarray, ndarray, ndarray][source]#

Add a batch of data into replay buffer.

  • batch – the input data batch. “obs”, “act”, “rew”, “terminated”, “truncated” are required keys.

  • buffer_ids – kept for consistency with other buffers’ add functions; if it is not None, we assume the input batch’s first dimension is always 1.

Return (current_index, episode_reward, episode_length, episode_start_index). If the episode is not finished, episode_length and episode_reward are 0.

reset(keep_statistics: bool = False) None[source]#

Clear all the data in replay buffer and episode statistics.

rewrite_transitions(indices: ndarray) None[source]#

Re-write the goals of some sampled transitions’ episodes according to HER.

Currently applies only HER’s ‘future’ strategy. The new goals are written directly into the internal batch data temporarily and are restored right before the next sampling or when certain buffer methods are used (e.g. add, save_hdf5). This ensures that computations such as n-step return calculation perform correctly without additional alteration.
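The ‘future’ strategy can be sketched in plain NumPy: for each selected transition at step t, the desired goal is replaced by an achieved goal from a uniformly chosen later step t′ ≥ t of the same episode. This is a minimal single-episode sketch; the helper name and episode layout are illustrative, not the buffer’s internals:

```python
import numpy as np


def relabel_future(achieved_goals: np.ndarray,
                   desired_goals: np.ndarray,
                   indices: np.ndarray,
                   rng: np.random.Generator) -> np.ndarray:
    """Return a copy of desired_goals where each selected transition's goal
    is replaced by an achieved goal from a uniformly drawn future step of
    the same (single) episode.
    Shapes: goals are (T, goal_dim); indices is (n,) with values in [0, T)."""
    T = achieved_goals.shape[0]
    new_goals = desired_goals.copy()
    # For each selected step t, draw a future step t' uniformly from [t, T).
    future = rng.integers(indices, T)
    new_goals[indices] = achieved_goals[future]
    return new_goals
```

Rewards for the relabeled transitions would then be recomputed with compute_reward_fn against the new goals, which is why the buffer requires that function at construction time.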

sample_indices(batch_size: int | None) ndarray[source]#

Get a random sample of indices with size = batch_size.

Return all available indices in the buffer if batch_size is 0; return an empty numpy array if batch_size < 0 or no index is available. Additionally, the episodes of some sampled transitions will be re-written according to HER.

save_hdf5(path: str, compression: str | None = None) None[source]#

Save replay buffer to HDF5 file.

set_batch(batch: RolloutBatchProtocol) None[source]#

Manually choose the batch you want the ReplayBuffer to manage.

update(buffer: HERReplayBuffer | ReplayBuffer) ndarray[source]#

Move the data from the given buffer into the current buffer.

Return the updated indices. If the update fails, return an empty array.