utils
Source code: tianshou/trainer/utils.py
- gather_info(start_time: float, policy_update_time: float, gradient_step: int, best_reward: float, best_reward_std: float, train_collector: Collector | None = None, test_collector: Collector | None = None) → InfoStats
A simple wrapper for gathering information from collectors.
- Returns:
A dataclass object with the following members (depending on which collectors are available):
  - gradient_step: the total number of gradient steps;
  - best_reward: the best reward over the test results;
  - best_reward_std: the standard deviation of the best reward over the test results;
  - train_step: the total number of steps collected by the training collector;
  - train_episode: the total number of episodes collected by the training collector;
  - test_step: the total number of steps collected by the test collector;
  - test_episode: the total number of episodes collected by the test collector;
  - timing: the timing statistics, with the following members:
    - total_time: the total time elapsed;
    - train_time: the total time spent on training (collecting samples plus model updates);
    - train_time_collect: the time spent collecting transitions in the training collector;
    - train_time_update: the time spent training the models;
    - test_time: the time spent testing;
    - update_speed: the update speed (env steps per second).
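To make the structure of the returned object and the relationships between the timing members concrete, here is a minimal, self-contained sketch. `InfoSketch`, `TimingSketch`, `gather_info_sketch` and the `(steps, episodes, collect_time)` tuples that stand in for the collectors are all hypothetical, not Tianshou types. The sketch assumes, following the docstring, that `train_time` is collection time plus model-update time and that `update_speed` is training steps divided by that time; the library's own computation may differ in detail.

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Hypothetical stand-ins for the returned dataclass. Field names follow the
# docstring; these classes are NOT Tianshou's own types.


@dataclass
class TimingSketch:
    total_time: float = 0.0
    train_time: float = 0.0
    train_time_collect: float = 0.0
    train_time_update: float = 0.0
    test_time: float = 0.0
    update_speed: float = 0.0


@dataclass
class InfoSketch:
    gradient_step: int
    best_reward: float
    best_reward_std: float
    train_step: int = 0
    train_episode: int = 0
    test_step: int = 0
    test_episode: int = 0
    timing: TimingSketch = field(default_factory=TimingSketch)


# (steps, episodes, collect_time): a hypothetical summary of one collector.
CollectorCounts = Tuple[int, int, float]


def gather_info_sketch(
    start_time: float,
    policy_update_time: float,
    gradient_step: int,
    best_reward: float,
    best_reward_std: float,
    train_counts: Optional[CollectorCounts] = None,
    test_counts: Optional[CollectorCounts] = None,
    now: Optional[float] = None,
) -> InfoSketch:
    now = time.time() if now is None else now
    info = InfoSketch(gradient_step, best_reward, best_reward_std)
    timing = info.timing
    timing.total_time = max(0.0, now - start_time)
    timing.train_time_update = policy_update_time
    # Members belonging to a missing collector keep their zero defaults.
    if test_counts is not None:
        info.test_step, info.test_episode, timing.test_time = test_counts
    if train_counts is not None:
        info.train_step, info.train_episode, timing.train_time_collect = train_counts
    # Assumption: training time = sample collection + model updates.
    timing.train_time = timing.train_time_collect + timing.train_time_update
    # Update speed in env steps per second of training time.
    if timing.train_time > 0:
        timing.update_speed = info.train_step / timing.train_time
    return info


info = gather_info_sketch(
    start_time=0.0, policy_update_time=6.0, gradient_step=500,
    best_reward=200.0, best_reward_std=5.0,
    train_counts=(10_000, 50, 4.0), test_counts=(2_000, 10, 1.0), now=12.0,
)
print(info.timing.total_time, info.timing.train_time, info.timing.update_speed)
# 12.0 10.0 1000.0
```

Passing `now` explicitly is only there to make the example deterministic; omitting it falls back to the current wall-clock time.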
- test_episode(policy: BasePolicy, collector: Collector, test_fn: Callable[[int, int | None], None] | None, epoch: int, n_episode: int, logger: BaseLogger | None = None, global_step: int | None = None, reward_metric: Callable[[ndarray], ndarray] | None = None) → CollectStats
A simple wrapper for testing a policy in a collector.
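The roles of `test_fn` and `reward_metric` in the signature can be illustrated with a small sketch. `FakeCollector` and `run_test_episodes_sketch` are hypothetical names, the policy and logger are omitted, and the order of steps (reset, call `test_fn(epoch, global_step)`, collect `n_episode` episodes, apply `reward_metric`) is an assumption rather than a copy of Tianshou's implementation. Averaging returns over agents is just one illustrative choice of `reward_metric`.

```python
from typing import Callable, Optional

import numpy as np


class FakeCollector:
    """Hypothetical stand-in for a test collector: replays fixed per-episode,
    per-agent returns instead of running an environment."""

    def __init__(self, returns) -> None:
        self._returns = np.asarray(returns, dtype=float)
        self.reset_calls = 0

    def reset(self) -> None:
        self.reset_calls += 1

    def collect(self, n_episode: int) -> dict:
        return {"n_episode": n_episode, "returns": self._returns[:n_episode]}


def run_test_episodes_sketch(
    collector: FakeCollector,
    test_fn: Optional[Callable[[int, Optional[int]], None]],
    epoch: int,
    n_episode: int,
    global_step: Optional[int] = None,
    reward_metric: Optional[Callable[[np.ndarray], np.ndarray]] = None,
) -> dict:
    collector.reset()  # start each test from a fresh state
    if test_fn is not None:
        test_fn(epoch, global_step)  # user hook, e.g. to adjust exploration
    result = collector.collect(n_episode=n_episode)
    if reward_metric is not None:
        # Reduce raw returns (here of shape (n_episode, n_agents))
        # to one scalar per episode.
        result["returns"] = reward_metric(result["returns"])
    return result


calls = []
collector = FakeCollector([[1.0, 3.0], [2.0, 4.0], [5.0, 5.0]])
result = run_test_episodes_sketch(
    collector,
    test_fn=lambda epoch, step: calls.append((epoch, step)),
    epoch=3,
    n_episode=2,
    global_step=100,
    reward_metric=lambda rews: rews.mean(axis=1),
)
print(calls, result["returns"])
```

A per-episode scalar is what makes test results comparable across epochs when the collector reports more than one reward per episode, which is why the reduction is delegated to a user-supplied callable.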