tianshou.env
-
class
tianshou.env.BaseVectorEnv(env_fns)
Bases: abc.ABC, gym.core.Wrapper
Base class for vectorized environment wrappers. Usage:
    env_num = 8
    envs = VectorEnv([lambda: gym.make(task) for _ in range(env_num)])
    assert len(envs) == env_num
It accepts a list of environment generators. An environment generator efn for a specific task is a callable such that efn() returns the environment of the given task, for example, gym.make(task).
All VectorEnv classes must inherit from BaseVectorEnv. Here are some other usages:
    envs.seed(2)  # which is equal to the next line
    envs.seed([2, 3, 4, 5, 6, 7, 8, 9])  # set specific seed for each env
    obs = envs.reset()  # reset all environments
    obs = envs.reset([0, 5, 7])  # reset 3 specific environments
    obs, rew, done, info = envs.step([1] * 8)  # step synchronously
    envs.render()  # render all environments
    envs.close()  # close all environments
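To make the idea of an environment generator concrete, here is a minimal sketch that needs neither gym nor Tianshou. ToyEnv and make_env_fns are hypothetical names invented for this illustration; the only point is that each generator is a zero-argument callable and that each call builds a fresh, independent environment.

```python
class ToyEnv:
    """Hypothetical stand-in for a gym environment (not part of Tianshou)."""

    def __init__(self, task):
        self.task = task
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # dummy observation


def make_env_fns(task, env_num):
    # Each entry is a zero-argument callable; nothing is constructed yet.
    return [lambda: ToyEnv(task) for _ in range(env_num)]


env_fns = make_env_fns("toy-task", 8)
envs = [efn() for efn in env_fns]  # calling efn() builds one environment

# Every generator call yields a distinct object, so the copies share no state.
distinct = len({id(e) for e in envs})
```

Passing generators rather than ready-made environments lets the wrapper decide where each environment is constructed, which presumably matters for the subprocess-based and ray-based variants.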
-
abstract
reset(id=None)
Reset the state of all the environments and return the initial observations if id is None; otherwise reset only the environments with the given id, which is either an int or a list.
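The three accepted forms of id can be sketched with a small helper. normalize_ids is a hypothetical function written for this illustration, not something Tianshou is known to expose, and the actual implementation may handle the argument differently.

```python
def normalize_ids(id, env_num):
    """Hypothetical helper: turn the `id` argument into a list of indices."""
    if id is None:
        return list(range(env_num))  # no id given: every environment
    if isinstance(id, int):
        return [id]  # a single environment
    return list(id)  # an explicit list of indices


all_ids = normalize_ids(None, 3)
one_id = normalize_ids(5, 8)
some_ids = normalize_ids([0, 5, 7], 8)
```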
-
abstract
seed(seed=None)
Set the seed for all environments. Accepts None, an int i (which is extended to [i, i + 1, i + 2, ...]), or a list.
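The integer expansion described here can be sketched as follows. expand_seed is a hypothetical helper; mapping None to one None per environment and rejecting a list of the wrong length are assumptions made for this example, not documented behavior.

```python
def expand_seed(seed, env_num):
    """Hypothetical helper: produce one seed per environment."""
    if seed is None:
        # Assumption: None leaves every environment unseeded.
        return [None] * env_num
    if isinstance(seed, int):
        # An int i becomes [i, i + 1, i + 2, ...].
        return [seed + k for k in range(env_num)]
    seed = list(seed)
    if len(seed) != env_num:
        # Assumption: a list must supply exactly one seed per environment.
        raise ValueError("expected one seed per environment")
    return seed


seeds = expand_seed(2, 8)
```

With eight environments, the expansion of 2 matches the explicit list passed to envs.seed in the usage example for BaseVectorEnv.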
-
abstract
step(action)
Run one timestep of all the environments’ dynamics. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment’s state.
Accepts a batch of actions and returns a tuple (obs, rew, done, info).
- Parameters
action (numpy.ndarray) – a batch of actions provided by the agent.
- Returns
A tuple including four items:
- obs: a numpy.ndarray, the agent’s observation of the current environments
- rew: a numpy.ndarray, the amount of reward returned after the previous actions
- done: a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results
- info: a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)
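Because the caller is responsible for resetting finished environments, a rollout loop typically inspects done after every step. The sketch below shows that pattern with CountdownVecEnv, a hypothetical stand-in that mimics the reset/step interface; it uses plain lists where the real classes return numpy arrays, so that it runs without any dependencies.

```python
class CountdownVecEnv:
    """Hypothetical stand-in: environment k ends after lengths[k] steps."""

    def __init__(self, lengths):
        self.lengths = list(lengths)
        self.t = [0] * len(self.lengths)

    def reset(self, id=None):
        if id is None:
            ids = list(range(len(self.t)))
        elif isinstance(id, int):
            ids = [id]
        else:
            ids = list(id)
        for k in ids:
            self.t[k] = 0
        return [self.t[k] for k in ids]

    def step(self, action):
        n = len(self.t)
        for k in range(n):
            self.t[k] += 1
        obs = list(self.t)
        rew = [1.0] * n
        done = [self.t[k] >= self.lengths[k] for k in range(n)]
        info = [{} for _ in range(n)]
        return obs, rew, done, info


envs = CountdownVecEnv([2, 3, 5])
obs = envs.reset()
episodes = [0, 0, 0]  # completed episodes per environment
for _ in range(6):
    obs, rew, done, info = envs.step([0, 0, 0])
    finished = [k for k, d in enumerate(done) if d]
    for k in finished:
        episodes[k] += 1
    if finished:
        envs.reset(finished)  # the caller resets only the finished environments
```

Resetting by index keeps the unfinished environments running, so episodes of different lengths can proceed side by side.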
-
class
tianshou.env.VectorEnv(env_fns)
Bases: tianshou.env.vecenv.BaseVectorEnv
Dummy vectorized environment wrapper, implemented with a for-loop. See BaseVectorEnv for usage.
-
reset(id=None)
Reset the state of all the environments and return the initial observations if id is None; otherwise reset only the environments with the given id, which is either an int or a list.
-
seed(seed=None)
Set the seed for all environments. Accepts None, an int i (which is extended to [i, i + 1, i + 2, ...]), or a list.
-
step(action)
Run one timestep of all the environments’ dynamics. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment’s state.
Accepts a batch of actions and returns a tuple (obs, rew, done, info).
- Parameters
action (numpy.ndarray) – a batch of actions provided by the agent.
- Returns
A tuple including four items:
- obs: a numpy.ndarray, the agent’s observation of the current environments
- rew: a numpy.ndarray, the amount of reward returned after the previous actions
- done: a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results
- info: a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)
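A for-loop wrapper of this kind can be sketched in a few lines. LoopVectorEnv and ToyEnv are hypothetical classes written for this illustration and are not Tianshou's actual code; the sketch returns plain lists where the documented classes return numpy arrays, and it omits seed, render, and close.

```python
class ToyEnv:
    """Hypothetical single environment: the episode ends after 3 steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        # (observation, reward, done, info)
        return self.t, float(action), self.t >= 3, {}


class LoopVectorEnv:
    """Sketch of a vectorized wrapper that visits its environments in a loop."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]  # build one env per generator

    def __len__(self):
        return len(self.envs)

    def reset(self, id=None):
        if id is None:
            ids = range(len(self.envs))
        elif isinstance(id, int):
            ids = [id]
        else:
            ids = id
        return [self.envs[k].reset() for k in ids]

    def step(self, action):
        # Step each environment in turn with its own action.
        results = [env.step(a) for env, a in zip(self.envs, action)]
        # Regroup per-env tuples into per-field batches.
        obs, rew, done, info = (list(x) for x in zip(*results))
        return obs, rew, done, info


envs = LoopVectorEnv([lambda: ToyEnv() for _ in range(4)])
obs = envs.reset()
obs, rew, done, info = envs.step([1, 2, 3, 4])
```

The environments run one after another in the calling process, which keeps the wrapper simple but gives no parallel speedup.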
-
-
class
tianshou.env.SubprocVectorEnv(env_fns)
Bases: tianshou.env.vecenv.BaseVectorEnv
Vectorized environment wrapper based on subprocess. See BaseVectorEnv for usage.
-
reset(id=None)
Reset the state of all the environments and return the initial observations if id is None; otherwise reset only the environments with the given id, which is either an int or a list.
-
seed(seed=None)
Set the seed for all environments. Accepts None, an int i (which is extended to [i, i + 1, i + 2, ...]), or a list.
-
step(action)
Run one timestep of all the environments’ dynamics. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment’s state.
Accepts a batch of actions and returns a tuple (obs, rew, done, info).
- Parameters
action (numpy.ndarray) – a batch of actions provided by the agent.
- Returns
A tuple including four items:
- obs: a numpy.ndarray, the agent’s observation of the current environments
- rew: a numpy.ndarray, the amount of reward returned after the previous actions
- done: a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results
- info: a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)
-
-
class
tianshou.env.RayVectorEnv(env_fns)
Bases: tianshou.env.vecenv.BaseVectorEnv
Vectorized environment wrapper based on ray. According to our tests, however, it is about two times slower than SubprocVectorEnv. See BaseVectorEnv for usage.
-
reset(id=None)
Reset the state of all the environments and return the initial observations if id is None; otherwise reset only the environments with the given id, which is either an int or a list.
-
seed(seed=None)
Set the seed for all environments. Accepts None, an int i (which is extended to [i, i + 1, i + 2, ...]), or a list.
-
step(action)
Run one timestep of all the environments’ dynamics. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment’s state.
Accepts a batch of actions and returns a tuple (obs, rew, done, info).
- Parameters
action (numpy.ndarray) – a batch of actions provided by the agent.
- Returns
A tuple including four items:
- obs: a numpy.ndarray, the agent’s observation of the current environments
- rew: a numpy.ndarray, the amount of reward returned after the previous actions
- done: a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results
- info: a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)
-