mapolicy
Source code: tianshou/policy/multiagent/mapolicy.py
- class MapTrainingStats(agent_id_to_stats: dict[str | int, TrainingStats], train_time_aggregator: Literal['min', 'max', 'mean'] = 'max')
Training statistics for a multi-agent policy: agent_id_to_stats maps each agent_id to that agent's TrainingStats, and train_time_aggregator selects how per-agent training times are combined.
- class MultiAgentPolicyManager(*, policies: list[BasePolicy], env: PettingZooEnv, action_scaling: bool = False, action_bound_method: Literal['clip', 'tanh'] | None = 'clip', lr_scheduler: LRScheduler | MultipleLRSchedulers | None = None)
Multi-agent policy manager for MARL.
This multi-agent policy manager accepts a list of BasePolicy instances. When forward() is called, it dispatches the batch data to each of these policies according to obs.agent_id; process_fn() and learn() likewise split the data and feed each piece to the corresponding policy. The figure in the Multi-Agent Reinforcement Learning tutorial can help you better understand this procedure; a construction sketch follows the parameter list below.
- Parameters:
policies – a list of sub-policies, one per agent in the environment.
env – a PettingZooEnv.
action_scaling – if True, scale the action from [-1, 1] to the range of action_space. Only used if the action_space is continuous.
action_bound_method – method to bound action to range [-1, 1]. Only used if the action_space is continuous.
lr_scheduler – if not None, will be called in policy.update().
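For orientation, a minimal construction sketch; the Tic-Tac-Toe environment and RandomPolicy are illustrative choices, and RandomPolicy's exact constructor arguments vary across tianshou versions:

    from pettingzoo.classic import tictactoe_v3

    from tianshou.env import PettingZooEnv
    from tianshou.policy import MultiAgentPolicyManager, RandomPolicy

    # Wrap a PettingZoo environment; the wrapper exposes the agent ids via env.agents.
    env = PettingZooEnv(tictactoe_v3.env())

    # One sub-policy per agent, in the order of env.agents.
    policies = [RandomPolicy(action_space=env.action_space) for _ in env.agents]
    policy = MultiAgentPolicyManager(policies=policies, env=env)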
- exploration_noise(act: ndarray | BatchProtocol, batch: RolloutBatchProtocol) → ndarray | BatchProtocol
Add exploration noise from each agent's sub-policy onto that agent's entries in act.
- forward(batch: Batch, state: dict | Batch | None = None, **kwargs: Any) → Batch
Dispatch batch data to every policy's forward according to obs.agent_id.
- Parameters:
batch – the input batch; batch.obs is expected to carry an agent_id field used to route each row to the matching sub-policy (TODO in source: document the expected input precisely and define a BatchProtocol for it).
state – if None, all agents have no state. If not None, it should contain the keys "agent_1", "agent_2", …
- Returns:
a Batch with the following contents (TODO in source: establish a BatchProtocol for this):

    {
        "act": actions corresponding to the input
        "state": {
            "agent_1": output state of agent_1's policy for the state
            "agent_2": xxx
            ...
            "agent_n": xxx
        }
        "out": {
            "agent_1": output of agent_1's policy for the input
            "agent_2": xxx
            ...
            "agent_n": xxx
        }
    }
- learn(batch: MAPRolloutBatchProtocol, *args: Any, **kwargs: Any) → MapTrainingStats
Dispatch the data to all policies for learning.
- Parameters:
batch – must map agent_ids to rollout batches
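A sketch of that shape, assuming two agents with ids "player_1" and "player_2" and pre-collected per-agent rollout batches (all names here are assumptions):

    from tianshou.data import Batch

    # One rollout batch per agent id, mirroring the keys of policy.policies.
    batch = Batch(player_1=rollout_1, player_2=rollout_2)
    stats = policy.learn(batch)  # MapTrainingStats aggregated over all agents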
- policies: dict[str | int, BasePolicy]
Maps agent_id to policy.
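For example, a sub-policy can be looked up by its agent id:

    # Agent ids key the mapping; with PettingZoo envs they are typically
    # strings such as "player_1" (assumed here for illustration).
    sub_policy = policy.policies["player_1"]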
- process_fn(batch: MAPRolloutBatchProtocol, buffer: ReplayBuffer, indice: ndarray) → MAPRolloutBatchProtocol
Dispatch batch data to every policy's process_fn according to obs.agent_id.
The original multi-dimensional rew is saved in "save_rew"; while each agent's process_fn runs, rew is set to that agent's own reward, and the original reward is restored afterwards.
- replace_policy(policy: BasePolicy, agent_id: int) → None
Replace the policy assigned to the given agent_id in this manager.
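A hedged sketch (trained_policy and the "player_1" id are assumptions; the agent id must be a key of the policies mapping above, which for PettingZoo environments is typically a string despite the int annotation):

    # Swap in a freshly trained policy for one agent, e.g. to refresh a
    # self-play opponent.
    policy.replace_policy(trained_policy, agent_id="player_1")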