atari_wrapper


get_space_dtype(obs_space: Box) → type[floating] | type[integer]

Return the NumPy type category (floating or integer) of the given Box observation space's dtype.

class NoopResetEnv(env: Env, noop_max: int = 30)

Bases: Wrapper

Sample initial states by taking a random number of no-ops on reset.

No-op is assumed to be action 0.

Parameters:
  • env (gym.Env) – the environment to wrap.

  • noop_max (int) – the maximum number of no-ops to run.

Wraps an environment to allow a modular transformation of the step() and reset() methods.

Args:

env: The environment to wrap

reset(**kwargs: Any) → tuple[Any, dict[str, Any]]

Uses the reset() of the env that can be overwritten to change the returned data.
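The no-op reset described for NoopResetEnv can be illustrated with a minimal sketch. It is not the library's implementation: `CountingEnv` and `noop_reset` are hypothetical names, and the lower bound of one no-op is an assumption.

```python
import random


class CountingEnv:
    """Hypothetical stand-in env: the observation is the number of steps since reset."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        # (obs, reward, terminated, truncated, info)
        return self.t, 0.0, False, False, {}


def noop_reset(env, noop_max=30, noop_action=0, rng=random):
    """Reset env, then take a random number of no-op steps (assumed 1..noop_max)."""
    obs, info = env.reset()
    n_noops = rng.randint(1, noop_max)  # inclusive on both ends
    for _ in range(n_noops):
        obs, _, terminated, truncated, info = env.step(noop_action)
        if terminated or truncated:
            # The episode ended during the no-ops: start over.
            obs, info = env.reset()
    return obs, info


obs, info = noop_reset(CountingEnv(), noop_max=30, rng=random.Random(0))
```

Because the toy observation counts steps, `obs` here equals the number of no-ops that were sampled.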

class MaxAndSkipEnv(env: Env, skip: int = 4)

Bases: Wrapper

Return only every skip-th frame (frame skipping), max-pooling over the most recent raw observations.

Parameters:
  • env (gym.Env) – the environment to wrap.

  • skip (int) – return only every skip-th frame; the action is repeated on the frames in between.

step(action: Any) → tuple[Any, float, bool, bool, dict[str, Any]]

Step the environment with the given action.

Repeat action, sum reward, and max over last observations.
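As a rough sketch of "repeat action, sum reward, and max over last observations", the following uses flat lists of pixel values instead of arrays. `FlickerEnv` and `max_and_skip_step` are hypothetical names, and pooling over exactly the last two frames is an assumption.

```python
class FlickerEnv:
    """Hypothetical env whose sprites are each visible only on alternating frames."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        frame = [0, 255, 0] if self.t % 2 else [0, 0, 255]
        return frame, 1.0, False, False, {}


def max_and_skip_step(env, action, skip=4):
    """Repeat `action` up to `skip` times; sum rewards; max-pool the last two frames."""
    frames = []
    total_reward = 0.0
    terminated = truncated = False
    info = {}
    for _ in range(skip):
        obs, reward, terminated, truncated, info = env.step(action)
        frames.append(obs)
        total_reward += reward
        if terminated or truncated:
            break
    # Pixel-wise maximum over the most recent frames (assumed: the last two).
    max_frame = [max(pixels) for pixels in zip(*frames[-2:])]
    return max_frame, total_reward, terminated, truncated, info


obs, reward, terminated, truncated, info = max_and_skip_step(FlickerEnv(), action=0)
```

Taking the maximum over consecutive frames keeps sprites that the emulator draws only on alternating frames.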

class EpisodicLifeEnv(env: Env)

Bases: Wrapper

Make end-of-life == end-of-episode, but only reset on true game over.

This helps value estimation.

Parameters:

env (gym.Env) – the environment to wrap.

step(action: Any) → tuple[Any, float, bool, bool, dict[str, Any]]

Uses the step() of the env that can be overwritten to change the returned data.

reset(**kwargs: Any) → tuple[Any, dict[str, Any]]

Calls the Gym environment reset, only when lives are exhausted.

This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind-the-scenes.
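The interplay between step() and reset() can be sketched with a toy game. `LivesEnv` and `EpisodicLife` are hypothetical, and reading the life count from a plain `lives` attribute is an assumption made for illustration; a real Atari environment exposes lives through the emulator.

```python
class LivesEnv:
    """Hypothetical game with 3 lives; action 1 loses a life, action 0 does nothing."""

    def __init__(self):
        self.lives = 0
        self.resets = 0

    def reset(self):
        self.lives = 3
        self.resets += 1
        return "start", {}

    def step(self, action):
        if action == 1:
            self.lives -= 1
        terminated = self.lives == 0  # true game over
        return f"lives={self.lives}", 0.0, terminated, False, {}


class EpisodicLife:
    """Report a lost life as episode end, but reset the game only on true game over."""

    def __init__(self, env):
        self.env = env
        self.lives = 0
        self.was_real_done = True

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.was_real_done = terminated or truncated
        lives = self.env.lives
        if 0 < lives < self.lives:
            terminated = True  # a life was lost: end the episode for the learner
        self.lives = lives
        return obs, reward, terminated, truncated, info

    def reset(self):
        if self.was_real_done:
            obs, info = self.env.reset()
        else:
            # Continue the same game with a no-op step instead of resetting.
            obs, _, _, _, info = self.env.step(0)
        self.lives = self.env.lives
        return obs, info


base = LivesEnv()
wrapped = EpisodicLife(base)
done_flags = []
for _ in range(3):
    wrapped.reset()
    _, _, terminated, _, _ = wrapped.step(1)
    done_flags.append(terminated)
wrapped.reset()  # all lives are gone, so this triggers a real reset
```

In this run each of the three lives ends an episode from the learner's point of view, while the underlying game is reset only twice: once at the start and once after the final life.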

class FireResetEnv(env: Env)

Bases: Wrapper

Take the fire action on reset for environments that stay frozen until the fire button is pressed.

Related discussion: openai/baselines#240.

Parameters:

env (gym.Env) – the environment to wrap.

reset(**kwargs: Any) → tuple[Any, dict]

Uses the reset() of the env that can be overwritten to change the returned data.
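A minimal sketch of the fire-on-reset idea follows. `WaitingEnv` and `fire_reset` are hypothetical, and treating action 1 as FIRE is an assumption that holds for many, but not necessarily all, Atari action sets.

```python
class WaitingEnv:
    """Hypothetical env that stays frozen until action 1 (FIRE) is taken."""

    def __init__(self):
        self.started = False

    def reset(self):
        self.started = False
        return "frozen", {}

    def step(self, action):
        if action == 1:
            self.started = True
        obs = "running" if self.started else "frozen"
        return obs, 0.0, False, False, {}


def fire_reset(env, fire_action=1):
    """Reset the env, then press FIRE once so the game actually starts."""
    env.reset()
    obs, _, _, _, info = env.step(fire_action)
    return obs, info


obs, info = fire_reset(WaitingEnv())
```

Without the extra step, an agent that never selects FIRE would keep observing the same frozen frame.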

class WarpFrame(env: Env)

Bases: ObservationWrapper

Warp frames to 84x84 as done in the Nature paper and later work.

Parameters:

env (gym.Env) – the environment to wrap.

Constructor for the observation wrapper.

observation(frame: ndarray) → ndarray

Returns the current observation from a frame.

class ScaledFloatFrame(env: Env)

Bases: ObservationWrapper

Normalize observations to the range [0, 1].

Parameters:

env (gym.Env) – the environment to wrap.

Constructor for the observation wrapper.

observation(observation: ndarray) → ndarray

Returns a modified observation.

Args:

observation: The env observation

Returns:

The modified observation
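The normalization can be sketched as a min-max rescaling. This assumes pixel bounds of 0 and 255 and uses a flat list in place of an array; `scale_frame` is a hypothetical helper, not part of the module.

```python
def scale_frame(frame, low=0, high=255):
    """Map each pixel from [low, high] to [0, 1] (bounds are assumed, not read from the env)."""
    span = high - low
    return [(pixel - low) / span for pixel in frame]


scaled = scale_frame([0, 51, 255])
```

The result holds floats, so the observation dtype changes from integer to floating point after scaling.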

class ClipRewardEnv(env: Env)

Bases: RewardWrapper

Clips the reward to {+1, 0, -1} by its sign.

Parameters:

env (gym.Env) – the environment to wrap.

Constructor for the Reward wrapper.

reward(reward: SupportsFloat) → int

Bin reward to {+1, 0, -1} by its sign. Note: np.sign(0) == 0.
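Sign-based binning can be written without NumPy, as in this small sketch; `clip_reward` is a hypothetical helper used only for illustration.

```python
def clip_reward(reward: float) -> int:
    """Return +1 for positive, -1 for negative, and 0 for zero rewards."""
    return (reward > 0) - (reward < 0)


clipped = [clip_reward(r) for r in (250.0, 0.0, -3.5)]
```

Only the sign survives, so a reward of 250 and a reward of 1 look identical to the learner.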

class FrameStack(env: Env, n_frames: int)

Bases: Wrapper

Stack the last n_frames frames.

Parameters:
  • env (gym.Env) – the environment to wrap.

  • n_frames (int) – the number of frames to stack.

reset(**kwargs: Any) → tuple[ndarray, dict]

Uses the reset() of the env that can be overwritten to change the returned data.

step(action: Any) → tuple[ndarray, float, bool, bool, dict[str, Any]]

Uses the step() of the env that can be overwritten to change the returned data.
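Frame stacking is commonly built on a fixed-length queue. The sketch below assumes that reset() fills the stack with copies of the first frame; `FrameStacker` is a hypothetical class, and strings stand in for image arrays.

```python
from collections import deque


class FrameStacker:
    """Keep the last n_frames frames, oldest first."""

    def __init__(self, n_frames):
        self.frames = deque(maxlen=n_frames)

    def reset(self, first_frame):
        # Assumed behavior: pad the stack with the first frame of the episode.
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, new_frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(new_frame)
        return list(self.frames)


stacker = FrameStacker(n_frames=4)
after_reset = stacker.reset("f0")
after_one = stacker.step("f1")
after_two = stacker.step("f2")
```

With real image frames the stacked list would be combined into a single array along a new leading axis.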

wrap_deepmind(env: Env, episode_life: bool = True, clip_rewards: bool = True, frame_stack: int = 4, scale: bool = False, warp_frame: bool = True) → MaxAndSkipEnv | EpisodicLifeEnv | FireResetEnv | WarpFrame | ScaledFloatFrame | ClipRewardEnv | FrameStack

Configure environment for DeepMind-style Atari.

The observation is channel-first: (c, h, w) instead of (h, w, c).

Parameters:
  • env – the Atari environment to wrap.

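  • episode_life (bool) – whether to apply the episodic life wrapper (EpisodicLifeEnv).

  • clip_rewards (bool) – whether to apply the reward clipping wrapper (ClipRewardEnv).

  • frame_stack (int) – the number of frames to stack with the frame stacking wrapper (FrameStack).

  • scale (bool) – whether to apply the observation scaling wrapper (ScaledFloatFrame).

  • warp_frame (bool) – whether to apply the grayscale + resize observation wrapper (WarpFrame).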

Returns:

the wrapped Atari environment.

make_atari_env(task: str, seed: int, num_training_envs: int, num_test_envs: int, scale: int | bool = False, frame_stack: int = 4) → tuple[Env, BaseVectorEnv, BaseVectorEnv]

Convenience function for creating Atari environments.

If EnvPool is installed, it will automatically switch to EnvPool’s Atari env.

Returns:

a tuple of (single env, training envs, test envs).

class AtariEnvFactory(task: str, frame_stack: int, scale: bool = False, use_envpool_if_available: bool = True, venv_type: VectorEnvType = VectorEnvType.SUBPROC_SHARED_MEM_AUTO)

Bases: EnvFactoryRegistered

Parameters:
  • task – the gymnasium task/environment identifier

  • seed – the random seed

  • venv_type – the type of vectorized environment to use (if envpool_factory is not specified)

  • envpool_factory – the factory to use for vectorized environment creation based on envpool; envpool must be installed.

  • render_mode_training – the render mode to use for training environments

  • render_mode_test – the render mode to use for test environments

  • render_mode_watch – the render mode to use for environments that are used to watch agent performance

  • make_kwargs – additional keyword arguments to pass on to gymnasium.make. If envpool is used, the gymnasium parameters will be appropriately translated for use with envpool.make_gymnasium.

class EnvPoolFactoryAtari(parent: AtariEnvFactory)

Bases: EnvPoolFactory

Atari-specific envpool creation. Since envpool internally handles the functions that are implemented through the wrappers in wrap_deepmind, it sets the creation keyword arguments accordingly.

class AtariEpochStopCallback(task: str)

Bases: EpochStopCallback

should_stop(mean_rewards: float, context: TrainingContext) → bool

Determines whether training should stop.

Parameters:
  • mean_rewards – the average undiscounted returns of the testing result

  • context – the training context

Returns:

True if the goal has been reached and training should stop, False otherwise