rliable_evaluation_hl
Source code: tianshou/evaluation/rliable_evaluation_hl.py
The rliable evaluation module provides a high-level interface for evaluating the results of an experiment that consists of multiple runs with different seeds, using the rliable library. The API is experimental and subject to change!
- class LoggedSummaryData(mean: numpy.ndarray, std: numpy.ndarray, max: numpy.ndarray, min: numpy.ndarray)
Bases: object
- mean: ndarray
- std: ndarray
- max: ndarray
- min: ndarray
- class LoggedCollectStats(env_step: numpy.ndarray | None = None, n_collected_episodes: numpy.ndarray | None = None, n_collected_steps: numpy.ndarray | None = None, collect_time: numpy.ndarray | None = None, collect_speed: numpy.ndarray | None = None, returns_stat: LoggedSummaryData | None = None, lens_stat: LoggedSummaryData | None = None)
Bases: object
- env_step: ndarray | None = None
- n_collected_episodes: ndarray | None = None
- n_collected_steps: ndarray | None = None
- collect_time: ndarray | None = None
- collect_speed: ndarray | None = None
- returns_stat: LoggedSummaryData | None = None
- lens_stat: LoggedSummaryData | None = None
- classmethod from_data_dict(data: dict) → LoggedCollectStats
Create a LoggedCollectStats object from a dictionary.
Converts SequenceSummaryStats entries from dict format into LoggedSummaryData instances and ignores fields that are not present.
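The conversion that from_data_dict performs can be sketched as follows. This is a minimal illustration, not Tianshou's implementation: SummarySketch and CollectStatsSketch are hypothetical stand-ins for LoggedSummaryData and LoggedCollectStats, and the sketch assumes that nested summary statistics arrive as dicts with mean, std, max, and min keys and that keys which are not dataclass fields are dropped.

```python
from __future__ import annotations

from dataclasses import dataclass, fields

import numpy as np


@dataclass
class SummarySketch:
    """Hypothetical stand-in for LoggedSummaryData."""

    mean: np.ndarray
    std: np.ndarray
    max: np.ndarray
    min: np.ndarray


@dataclass
class CollectStatsSketch:
    """Hypothetical stand-in for LoggedCollectStats, with a subset of its fields."""

    env_step: np.ndarray | None = None
    returns_stat: SummarySketch | None = None
    lens_stat: SummarySketch | None = None

    @classmethod
    def from_data_dict(cls, data: dict) -> CollectStatsSketch:
        field_names = {f.name for f in fields(cls)}
        kwargs = {}
        for key, value in data.items():
            if key not in field_names:
                continue  # ignore keys that are not fields of the dataclass
            if isinstance(value, dict):
                value = SummarySketch(**value)  # dict-format summary stats -> dataclass
            kwargs[key] = value
        # Fields absent from `data` keep their default of None.
        return cls(**kwargs)


# Hypothetical logged data for two collect steps.
raw = {
    "env_step": np.array([1000, 2000]),
    "returns_stat": {
        "mean": np.array([1.0, 2.0]),
        "std": np.array([0.1, 0.2]),
        "max": np.array([1.5, 2.5]),
        "min": np.array([0.5, 1.5]),
    },
    "some_other_key": 42,
}
stats = CollectStatsSketch.from_data_dict(raw)
```

Here stats.returns_stat becomes a SummarySketch, stats.lens_stat stays None because it was absent from the dict, and some_other_key is silently discarded.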
- class RLiableExperimentResult(exp_dir: str, test_episode_returns_RE: ndarray, train_episode_returns_RE: ndarray, env_steps_E: ndarray, env_steps_train_E: ndarray)
Bases: object
The result of an experiment, in a form that can be used with the rliable library.
- exp_dir: str
The base directory in which each sub-directory contains the results of one experiment run.
- test_episode_returns_RE: ndarray
The test episode returns for each run of the experiment, where each row corresponds to one run.
- train_episode_returns_RE: ndarray
The training episode returns for each run of the experiment, where each row corresponds to one run.
- env_steps_E: ndarray
The environment step counts at which the test episodes were evaluated.
- env_steps_train_E: ndarray
The environment step counts at which the training episode returns were recorded.
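The _RE and _E suffixes appear to name array axes: R for runs (rows) and E for evaluation points, so env_steps_E supplies the x-axis for each column of test_episode_returns_RE. Under that assumption, the point estimate behind an IQM sample efficiency curve can be sketched with NumPy alone. The helper iqm_per_step and the data are hypothetical; rliable additionally computes bootstrap confidence intervals, which this sketch omits.

```python
import numpy as np


def iqm_per_step(scores_RE: np.ndarray) -> np.ndarray:
    """Interquartile mean over runs (axis 0) at every evaluation point (hypothetical helper).

    Removes the lowest and highest 25% of runs at each evaluation point and
    averages the remaining ones, like scipy.stats.trim_mean(scores_RE, 0.25, axis=0).
    """
    n_runs = scores_RE.shape[0]
    n_cut = int(0.25 * n_runs)  # number of runs trimmed from each tail
    sorted_RE = np.sort(scores_RE, axis=0)  # sort each column independently
    return sorted_RE[n_cut : n_runs - n_cut].mean(axis=0)


# Hypothetical data: 4 runs (rows) evaluated at 2 environment steps (columns).
env_steps_E = np.array([10_000, 20_000])
test_episode_returns_RE = np.array(
    [
        [0.0, 10.0],
        [1.0, 20.0],
        [2.0, 30.0],
        [100.0, 40.0],
    ]
)
iqm_E = iqm_per_step(test_episode_returns_RE)  # one value per entry of env_steps_E
```

The outlying return of 100.0 is trimmed away and leaves the first IQM value at 1.5, illustrating the robustness to outliers that motivates using the IQM instead of the plain mean.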
- classmethod load_from_disk(exp_dir: str, max_env_step: int | None = None) → RLiableExperimentResult
Load the experiment result from disk.
- Parameters:
exp_dir – The directory from which the experiment results are restored.
max_env_step – The maximum number of environment steps to consider. If None, all data is considered. Note: if the experiment runs have different numbers of steps, the minimum number across runs is used.
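The truncation rule for max_env_step and for runs of unequal length might look roughly like the following. The helper align_runs is hypothetical, and it assumes that all runs were evaluated at the same environment steps and that max_env_step is an inclusive bound; Tianshou's loader may handle either case differently.

```python
from __future__ import annotations

import numpy as np


def align_runs(
    env_steps_per_run: list[np.ndarray],
    returns_per_run: list[np.ndarray],
    max_env_step: int | None = None,
) -> tuple[np.ndarray, np.ndarray]:
    """Stack per-run returns into an R x E array (hypothetical helper)."""
    # Truncate every run to the shortest one so the arrays can be stacked.
    n_common = min(len(steps) for steps in env_steps_per_run)
    # Assumption: all runs were evaluated at the same env steps, so run 0's steps are used.
    env_steps_E = env_steps_per_run[0][:n_common]
    returns_RE = np.stack([returns[:n_common] for returns in returns_per_run])
    if max_env_step is not None:
        # Assumption: max_env_step is an inclusive upper bound.
        keep_E = env_steps_E <= max_env_step
        env_steps_E = env_steps_E[keep_E]
        returns_RE = returns_RE[:, keep_E]
    return env_steps_E, returns_RE


# Hypothetical runs: the second run stopped one evaluation earlier.
steps = [np.array([100, 200, 300]), np.array([100, 200])]
returns = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]
env_steps_E, returns_RE = align_runs(steps, returns)
env_steps_cut_E, returns_cut_RE = align_runs(steps, returns, max_env_step=150)
```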
- eval_results(algo_name: str | None = None, score_thresholds: ndarray | None = None, save_plots: bool = False, show_plots: bool = True, scope: DataScope = DataScope.TEST, ax_iqm: Axes | None = None, ax_profile: Axes | None = None, algo2color: dict[str, str] | None = None) → tuple[Figure, Axes, Figure, Axes]
Evaluate the results of an experiment and create an interquartile mean (IQM) sample efficiency curve and a performance profile.
- Parameters:
algo_name – The name of the algorithm to show in the figure legend. If None, the experiment directory is used as the name.
score_thresholds – The score thresholds for the performance profile. If None, the thresholds are inferred from the minimum and maximum test episode returns.
save_plots – If True, the figures are saved to the experiment directory.
show_plots – If True, the figures are shown.
scope – The scope of the evaluation, either DataScope.TEST or DataScope.TRAIN.
ax_iqm – The axis on which to plot the IQM sample efficiency curve. If None, a new figure is created.
ax_profile – The axis on which to plot the performance profile. If None, a new figure is created.
algo2color – A dictionary mapping algorithm names to colors. This is useful for plotting the evaluations of multiple algorithms in the same figure, e.g., by creating ax_iqm and ax_profile with one evaluation and then passing them to the next one. Equivalent to the colors keyword argument of the rliable plotting utilities.
- Returns:
The created figures and axes in the order: fig_iqm, ax_iqm, fig_profile, ax_profile.
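The performance profile produced by eval_results plots, for each score threshold tau, the fraction of runs whose score exceeds tau. A minimal NumPy sketch of that quantity, including the fallback to thresholds spread between the minimum and maximum score, follows. The helper performance_profile, the default of 51 thresholds, and the choice of one score per run (for example the last column of test_episode_returns_RE) are assumptions; rliable's own implementation also computes bootstrap confidence bands.

```python
from __future__ import annotations

import numpy as np


def performance_profile(
    scores_R: np.ndarray,
    score_thresholds: np.ndarray | None = None,
    n_thresholds: int = 51,
) -> tuple[np.ndarray, np.ndarray]:
    """Fraction of runs whose score exceeds each threshold tau (hypothetical helper)."""
    if score_thresholds is None:
        # Infer evenly spaced thresholds from the observed min and max scores.
        score_thresholds = np.linspace(scores_R.min(), scores_R.max(), n_thresholds)
    # Broadcast to an R x T boolean matrix, then average over runs.
    fractions_T = (scores_R[:, None] > score_thresholds[None, :]).mean(axis=0)
    return score_thresholds, fractions_T


final_scores_R = np.array([0.0, 1.0, 2.0, 3.0])  # hypothetical: one score per run
taus_T, profile_T = performance_profile(final_scores_R, n_thresholds=4)
```

With four runs and thresholds 0, 1, 2, 3, the profile falls from 0.75 to 0.0 in steps of 0.25; a curve that stays higher for longer indicates an algorithm whose runs more often clear a given score.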
- load_and_eval_experiments(log_dir: str, show_plots: bool = True, save_plots: bool = True, scope: DataScope | Literal['both'] = DataScope.TEST, max_env_step: int | None = None) → RLiableExperimentResult
Evaluate the experiments in the given log directory using the rliable API and return the loaded results object.
If neither show_plots nor save_plots is True, this is equivalent to only loading the results from disk.
- Parameters:
log_dir – The directory containing the experiment results.
show_plots – Whether to display plots.
save_plots – Whether to save plots to log_dir.
scope – The scope of the evaluation: DataScope.TEST, DataScope.TRAIN, or 'both'.
max_env_step – The maximum number of environment steps to consider. If None, all data is considered. Note: if the experiment runs have different numbers of steps, the minimum number across runs is used.