rliable_evaluation_hl


The rliable-evaluation module provides a high-level interface for evaluating the results of an experiment with multiple runs on different seeds using the rliable library. The API is experimental and subject to change.

class LoggedSummaryData(mean: numpy.ndarray, std: numpy.ndarray, max: numpy.ndarray, min: numpy.ndarray)

Bases: object

mean: ndarray
std: ndarray
max: ndarray
min: ndarray
class LoggedCollectStats(env_step: numpy.ndarray | None = None, n_collected_episodes: numpy.ndarray | None = None, n_collected_steps: numpy.ndarray | None = None, collect_time: numpy.ndarray | None = None, collect_speed: numpy.ndarray | None = None, returns_stat: LoggedSummaryData | None = None, lens_stat: LoggedSummaryData | None = None)

Bases: object

env_step: ndarray | None = None
n_collected_episodes: ndarray | None = None
n_collected_steps: ndarray | None = None
collect_time: ndarray | None = None
collect_speed: ndarray | None = None
returns_stat: LoggedSummaryData | None = None
lens_stat: LoggedSummaryData | None = None
classmethod from_data_dict(data: dict) → LoggedCollectStats

Create a LoggedCollectStats object from a dictionary.

Converts SequenceSummaryStats from dict format to dataclass format and ignores fields that are not present.
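The conversion described above can be sketched with stand-in classes. `SummarySketch` and `CollectStatsSketch` are hypothetical names, not the library's code. The sketch assumes the nested summaries arrive as plain dicts with `mean`, `std`, `max`, and `min` keys, and it reads "ignores fields that are not present" both ways: unknown keys are skipped, and fields missing from the dict keep their `None` defaults.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class SummarySketch:
    """Hypothetical stand-in for LoggedSummaryData."""

    mean: Any
    std: Any
    max: Any
    min: Any


@dataclass
class CollectStatsSketch:
    """Hypothetical stand-in with a subset of the LoggedCollectStats fields."""

    env_step: Optional[Any] = None
    n_collected_steps: Optional[Any] = None
    returns_stat: Optional[SummarySketch] = None
    lens_stat: Optional[SummarySketch] = None

    @classmethod
    def from_data_dict(cls, data: dict) -> "CollectStatsSketch":
        known = {f.name for f in fields(cls)}
        kwargs = {}
        for key, value in data.items():
            if key not in known:
                continue  # skip keys the dataclass does not declare
            if key in ("returns_stat", "lens_stat") and isinstance(value, dict):
                value = SummarySketch(**value)  # nested dict -> dataclass
            kwargs[key] = value
        # Fields missing from `data` keep their None defaults.
        return cls(**kwargs)


stats = CollectStatsSketch.from_data_dict(
    {
        "env_step": [1000, 2000],
        "returns_stat": {
            "mean": [1.0, 2.0],
            "std": [0.1, 0.2],
            "max": [1.5, 2.5],
            "min": [0.5, 1.5],
        },
        "unrelated_key": 42,  # not a declared field, so it is dropped
    }
)
```

Here `stats.returns_stat` becomes a `SummarySketch` instance, while `stats.lens_stat` stays `None` because the input dict has no such key.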

class RLiableExperimentResult(exp_dir: str, test_episode_returns_RE: ndarray, train_episode_returns_RE: ndarray, env_steps_E: ndarray, env_steps_train_E: ndarray)

Bases: object

The result of an experiment that can be used with the rliable library.

exp_dir: str

The base directory where each sub-directory contains the results of one experiment run.

test_episode_returns_RE: ndarray

The test episode returns for each run of the experiment, where each row corresponds to one run.

train_episode_returns_RE: ndarray

The training episode returns for each run of the experiment, where each row corresponds to one run.

env_steps_E: ndarray

The number of environment steps at which the test episodes were evaluated.

env_steps_train_E: ndarray

The number of environment steps at which the training episodes were evaluated.
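The `_RE` and `_E` suffixes appear to encode array shapes: R runs by E evaluation points for the returns, and E entries for the step counts. That reading is an assumption drawn from the field descriptions. Under it, the NumPy sketch below computes an interquartile mean (IQM) across runs at each evaluation point. The helper `iqm_per_eval_point` and the data are hypothetical, and the code is hand-written rather than taken from rliable.

```python
import numpy as np


def iqm_per_eval_point(returns_RE: np.ndarray) -> np.ndarray:
    """Mean of the middle 50% of runs at each evaluation point."""
    n_runs = returns_RE.shape[0]
    n_cut = int(0.25 * n_runs)  # number of runs dropped from each tail
    sorted_RE = np.sort(returns_RE, axis=0)  # sort runs within each column
    return sorted_RE[n_cut : n_runs - n_cut].mean(axis=0)


# Hypothetical data: 8 runs (rows), 3 evaluation points (columns).
test_episode_returns_RE = np.array(
    [
        [0.0, 10.0, 20.0],
        [1.0, 11.0, 21.0],
        [2.0, 12.0, 22.0],
        [3.0, 13.0, 23.0],
        [4.0, 14.0, 24.0],
        [5.0, 15.0, 25.0],
        [6.0, 16.0, 26.0],
        [100.0, 17.0, 27.0],  # outlier in the first column
    ]
)
env_steps_E = np.array([1000, 2000, 3000])

iqm_E = iqm_per_eval_point(test_episode_returns_RE)  # one value per entry of env_steps_E
```

Because the two lowest and two highest runs are dropped in each column, the outlier of 100.0 does not influence the first IQM value.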

classmethod load_from_disk(exp_dir: str, max_env_step: int | None = None) → RLiableExperimentResult

Load the experiment result from disk.

Parameters:
  • exp_dir – The directory from where the experiment results are restored.

  • max_env_step – The maximum number of environment steps to consider. If None, all data is considered. Note: if the experiments have different numbers of steps, the minimum number is used.
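One way to picture the note about unequal run lengths is the sketch below. It is not the library's loading code. The helper `align_runs` is hypothetical, and the sketch assumes that every run logs at the same step values, that "the minimum number is used" means truncating to the shortest run, and that `max_env_step` is an inclusive upper bound.

```python
import numpy as np


def align_runs(returns_per_run, env_steps_per_run, max_env_step=None):
    """Stack runs of unequal length into one (runs, eval points) array."""
    n_common = min(len(r) for r in returns_per_run)  # shortest run decides
    returns_RE = np.stack([np.asarray(r[:n_common]) for r in returns_per_run])
    env_steps_E = np.asarray(env_steps_per_run[0][:n_common])
    if max_env_step is not None:
        keep = env_steps_E <= max_env_step  # drop later evaluation points
        returns_RE, env_steps_E = returns_RE[:, keep], env_steps_E[keep]
    return returns_RE, env_steps_E


# Hypothetical logs: the second run stopped one evaluation earlier.
returns_per_run = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5]]
env_steps_per_run = [[100, 200, 300, 400], [100, 200, 300]]

all_returns_RE, all_steps_E = align_runs(returns_per_run, env_steps_per_run)
returns_RE, env_steps_E = align_runs(
    returns_per_run, env_steps_per_run, max_env_step=200
)
```

Without a limit, both runs are cut to three evaluation points; with `max_env_step=200`, only the first two remain.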

eval_results(algo_name: str | None = None, score_thresholds: ndarray | None = None, save_plots: bool = False, show_plots: bool = True, scope: DataScope = DataScope.TEST, ax_iqm: Axes | None = None, ax_profile: Axes | None = None, algo2color: dict[str, str] | None = None) → tuple[Figure, Axes, Figure, Axes]

Evaluate the results of an experiment and create a sample efficiency curve and a performance profile.

Parameters:
  • algo_name – The name of the algorithm to be shown in the figure legend. If None, the name of the algorithm is set to the experiment dir.

  • score_thresholds – The score thresholds for the performance profile. If None, the thresholds are inferred from the minimum and maximum test episode returns.

  • save_plots – If True, the figures are saved to the experiment directory.

  • show_plots – If True, the figures are shown.

  • scope – The scope of the evaluation, either DataScope.TEST or DataScope.TRAIN.

  • ax_iqm – The axis to plot the IQM sample efficiency curve on. If None, a new figure is created.

  • ax_profile – The axis to plot the performance profile on. If None, a new figure is created.

  • algo2color – A dictionary mapping algorithm names to colors. Useful for plotting the evaluations of multiple algorithms in the same figure, e.g., by first creating an ax_iqm and ax_profile with one evaluation and then passing them into the other evaluation. Same as the colors kwarg in the rliable plotting utils.

Returns:

The created figures and axes in the order: fig_iqm, ax_iqm, fig_profile, ax_profile.
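To make the `score_thresholds` default more concrete, the sketch below builds evenly spaced thresholds between the minimum and maximum return and computes, for each threshold, the fraction of scores above it. This is a common definition of a performance profile. The helper name, the number of thresholds (5), and the choice of one score per run are assumptions for illustration, not details confirmed by the source.

```python
import numpy as np


def score_distribution(scores: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Fraction of scores strictly greater than each threshold."""
    above = scores.reshape(-1, 1) > thresholds.reshape(1, -1)
    return above.mean(axis=0)


# Hypothetical scores, one per run.
final_returns = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# Thresholds inferred from the data range, here with 5 evenly spaced points.
score_thresholds = np.linspace(final_returns.min(), final_returns.max(), 5)
profile = score_distribution(final_returns, score_thresholds)
```

The resulting profile is non-increasing in the threshold and reaches zero at the maximum observed return.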

load_and_eval_experiments(log_dir: str, show_plots: bool = True, save_plots: bool = True, scope: DataScope | Literal['both'] = DataScope.TEST, max_env_step: int | None = None) → RLiableExperimentResult

Evaluate the experiments in the given log directory using the rliable API and return the loaded results object.

If neither show_plots nor save_plots is set to True, this is equivalent to just loading the results from disk.

Parameters:
  • log_dir – The directory containing the experiment results.

  • show_plots – Whether to display the plots.

  • save_plots – Whether to save the plots to log_dir.

  • scope – The scope of the evaluation: DataScope.TEST, DataScope.TRAIN, or 'both'.

  • max_env_step – The maximum number of environment steps to consider. If None, all data is considered. Note: if the experiments have different numbers of steps, the minimum number is used.