rliable_evaluation_hl


Source code: tianshou/evaluation/rliable_evaluation_hl.py

The rliable-evaluation module provides a high-level interface for evaluating the results of an experiment that was run multiple times with different seeds, using the rliable library. The API is experimental and subject to change.

class LoggedSummaryData(mean: numpy.ndarray, std: numpy.ndarray, max: numpy.ndarray, min: numpy.ndarray)

Bases: object

mean: ndarray

std: ndarray

max: ndarray

min: ndarray

class LoggedCollectStats(env_step: numpy.ndarray | None = None, n_collected_episodes: numpy.ndarray | None = None, n_collected_steps: numpy.ndarray | None = None, collect_time: numpy.ndarray | None = None, collect_speed: numpy.ndarray | None = None, returns_stat: LoggedSummaryData | None = None, lens_stat: LoggedSummaryData | None = None)

Bases: object

env_step: ndarray | None = None

n_collected_episodes: ndarray | None = None

n_collected_steps: ndarray | None = None

collect_time: ndarray | None = None

collect_speed: ndarray | None = None

returns_stat: LoggedSummaryData | None = None

lens_stat: LoggedSummaryData | None = None

classmethod from_data_dict(data: dict) → LoggedCollectStats

Create a LoggedCollectStats object from a dictionary.

Converts SequenceSummaryStats from dict format to dataclass format and ignores fields that are not present.
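The conversion described above can be illustrated with a minimal, self-contained sketch. It does not reproduce the library's implementation: SummarySketch and CollectStatsSketch are hypothetical stand-ins with a reduced field set, plain lists replace the ndarray fields, and the handling of unknown keys (dropped) and missing keys (left as None) is an assumption about what "ignores fields that are not present" means.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SummarySketch:
    # Hypothetical stand-in for LoggedSummaryData; lists replace ndarrays.
    mean: list
    std: list
    max: list
    min: list


@dataclass
class CollectStatsSketch:
    # Hypothetical stand-in for LoggedCollectStats with fewer fields.
    env_step: Optional[list] = None
    returns_stat: Optional[SummarySketch] = None
    lens_stat: Optional[SummarySketch] = None

    @classmethod
    def from_data_dict(cls, data: dict) -> "CollectStatsSketch":
        known = {f.name for f in fields(cls)}
        kwargs = {}
        for key, value in data.items():
            if key not in known:
                continue  # assumption: keys without a matching field are dropped
            # Nested summary statistics arrive as dicts and become dataclasses.
            kwargs[key] = SummarySketch(**value) if isinstance(value, dict) else value
        # Fields missing from `data` keep their default of None.
        return cls(**kwargs)


stats = CollectStatsSketch.from_data_dict(
    {
        "env_step": [1000, 2000],
        "returns_stat": {
            "mean": [1.0, 2.0],
            "std": [0.1, 0.2],
            "max": [1.5, 2.5],
            "min": [0.5, 1.5],
        },
        "unknown_key": 42,
    }
)
```

In this sketch, stats.returns_stat is a SummarySketch instance, while stats.lens_stat stays None because the input dictionary has no such key.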

class RLiableExperimentResult(exp_dir: str, test_episode_returns_RE: ndarray, train_episode_returns_RE: ndarray, env_steps_E: ndarray, env_steps_train_E: ndarray)

Bases: object

The result of an experiment that can be used with the rliable library.

exp_dir: str

The base directory, in which each sub-directory contains the results of one experiment run.

test_episode_returns_RE: ndarray

The test episode returns for each run of the experiment, where each row corresponds to one run.

train_episode_returns_RE: ndarray

The training episode returns for each run of the experiment, where each row corresponds to one run.

env_steps_E: ndarray

The number of environment steps at which the test episodes were evaluated.

env_steps_train_E: ndarray

The number of environment steps at which the training episodes were evaluated.

classmethod load_from_disk(exp_dir: str, max_env_step: int | None = None) → RLiableExperimentResult

Load the experiment result from disk.

Parameters:

exp_dir – The directory from where the experiment results are restored.
max_env_step – The maximum number of environment steps to consider. If None, all data is considered. Note: if the experiments have different numbers of steps, the minimum number is used.
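The truncation behavior noted for max_env_step can be pictured with a small sketch. This is an illustration only, not the library's loading code: align_runs is a hypothetical helper, plain lists replace arrays, and it assumes that all runs share the same ascending evaluation steps and that the step budget is inclusive.

```python
def align_runs(env_steps_per_run, returns_per_run, max_env_step=None):
    """Trim all runs to a common number of evaluation points (hypothetical helper)."""
    # The shortest run determines how many evaluation points every run keeps.
    n_common = min(len(steps) for steps in env_steps_per_run)
    env_steps = list(env_steps_per_run[0][:n_common])
    if max_env_step is not None:
        # Assumption: steps are sorted ascending and the budget is inclusive.
        n_common = sum(1 for step in env_steps if step <= max_env_step)
        env_steps = env_steps[:n_common]
    returns = [list(run[:n_common]) for run in returns_per_run]
    return env_steps, returns


# Two runs, the second one stopped one evaluation point earlier.
steps_per_run = [[100, 200, 300, 400], [100, 200, 300]]
returns_per_run = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]

all_steps, all_returns = align_runs(steps_per_run, returns_per_run)
capped_steps, capped_returns = align_runs(steps_per_run, returns_per_run, max_env_step=200)
```

Without a budget, both runs are cut to the three evaluation points they have in common; with max_env_step=200, only the first two points remain.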

eval_results(algo_name: str | None = None, score_thresholds: ndarray | None = None, save_plots: bool = False, show_plots: bool = True, scope: DataScope = DataScope.TEST, ax_iqm: Axes | None = None, ax_profile: Axes | None = None, algo2color: dict[str, str] | None = None) → tuple[Figure, Axes, Figure, Axes]

Evaluate the results of an experiment and create a sample efficiency curve and a performance profile.
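As background for the two plots, the quantities behind them can be sketched in a few lines of plain Python. This is a simplified illustration, not what rliable runs internally: the function names are hypothetical, the inputs are the returns of all runs at a single evaluation point, and the bootstrap confidence intervals that rliable adds are omitted.

```python
def interquartile_mean(scores):
    """Mean of the middle 50% of the scores (25% trimmed from each end)."""
    ordered = sorted(scores)
    cut = int(0.25 * len(ordered))
    middle = ordered[cut: len(ordered) - cut]
    return sum(middle) / len(middle)


def performance_profile(scores, thresholds):
    """Fraction of runs whose score exceeds each threshold."""
    return [sum(1 for s in scores if s > tau) / len(scores) for tau in thresholds]


# Returns of eight runs at one evaluation point; one run is an outlier.
run_returns = [1, 2, 3, 4, 5, 6, 7, 100]
iqm = interquartile_mean(run_returns)
profile = performance_profile([1, 2, 3, 4], thresholds=[0, 2, 4])
```

The outlier of 100 does not affect the interquartile mean, which is why this statistic is preferred over the plain mean when only a handful of seeds are available.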

Parameters:

algo_name – The name of the algorithm to be shown in the figure legend. If None, the name of the algorithm is set to the experiment dir.
score_thresholds – The score thresholds for the performance profile. If None, the thresholds are inferred from the minimum and maximum test episode returns.
save_plots – If True, the figures are saved to the experiment directory.
show_plots – If True, the figures are shown.
scope – The scope of the evaluation, either ‘TEST’ or ‘TRAIN’.
ax_iqm – The axis to plot the IQM sample efficiency curve on. If None, a new figure is created.
ax_profile – The axis to plot the performance profile on. If None, a new figure is created.
algo2color – A dictionary mapping algorithm names to colors. Useful for plotting the evaluations of multiple algorithms in the same figure, e.g., by first creating an ax_iqm and ax_profile with one evaluation and then passing them into the other evaluation. Same as the colors kwarg in the rliable plotting utils.

Returns:

The created figures and axes in the order: fig_iqm, ax_iqm, fig_profile, ax_profile.
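The multi-algorithm use case mentioned for algo2color might look like the following sketch. It assumes that eval_results is called on a loaded RLiableExperimentResult and uses only parameters listed above; the directory arguments, the algorithm names "algo_a" and "algo_b", and the colors are placeholders. The import sits inside the function so that the snippet can be loaded without tianshou installed.

```python
def compare_two_algorithms(exp_dir_a: str, exp_dir_b: str):
    """Plot two experiments into the same pair of axes (illustrative sketch)."""
    # Module path taken from the source file location stated at the top of this page.
    from tianshou.evaluation.rliable_evaluation_hl import RLiableExperimentResult

    # Placeholder names and colors; keys must match the algo_name values below.
    colors = {"algo_a": "tab:blue", "algo_b": "tab:orange"}

    result_a = RLiableExperimentResult.load_from_disk(exp_dir_a)
    result_b = RLiableExperimentResult.load_from_disk(exp_dir_b)

    # The first call creates the figures and axes.
    fig_iqm, ax_iqm, fig_profile, ax_profile = result_a.eval_results(
        algo_name="algo_a",
        show_plots=False,
        algo2color=colors,
    )
    # The second call draws into the axes returned by the first one.
    result_b.eval_results(
        algo_name="algo_b",
        show_plots=False,
        ax_iqm=ax_iqm,
        ax_profile=ax_profile,
        algo2color=colors,
    )
    return fig_iqm, fig_profile
```

Passing the same algo2color dictionary to both calls keeps the color of each algorithm consistent between the sample efficiency curve and the performance profile.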

load_and_eval_experiments(log_dir: str, show_plots: bool = True, save_plots: bool = True, scope: DataScope | Literal['both'] = DataScope.TEST, max_env_step: int | None = None) → RLiableExperimentResult

Evaluate the experiments in the given log directory using the rliable API and return the loaded results object.

If neither show_plots nor save_plots is set to True, this is equivalent to just loading the results from disk.

Parameters:

log_dir – The directory containing the experiment results.
show_plots – Whether to display the plots.
save_plots – Whether to save the plots to log_dir.
scope – The scope of the evaluation: ‘test’, ‘train’, or ‘both’.
max_env_step – The maximum number of environment steps to consider. If None, all data is considered. Note: if the experiments have different numbers of steps, the minimum number is used.
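A typical call might look like the sketch below. The wrapper name evaluate_log_dir is hypothetical, and the chosen argument values are only one possible configuration; the import is placed inside the function so that the snippet can be loaded without tianshou installed.

```python
def evaluate_log_dir(log_dir: str, max_env_step=None):
    """Evaluate all runs under log_dir and save the plots without showing them."""
    # Module path taken from the source file location stated at the top of this page.
    from tianshou.evaluation.rliable_evaluation_hl import load_and_eval_experiments

    return load_and_eval_experiments(
        log_dir,
        show_plots=False,  # suitable for headless machines
        save_plots=True,  # plots are written to log_dir
        scope="both",  # evaluate training and test data
        max_env_step=max_env_step,
    )
```

The returned RLiableExperimentResult can then be inspected further or passed on to custom plotting code.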
