cx_rl_multi_robot_mppo.MultiRobotMaskableRolloutBuffer module

class cx_rl_multi_robot_mppo.MultiRobotMaskableRolloutBuffer.MaskableDictRolloutBufferSamples(observations: torch.Tensor, actions: torch.Tensor, old_values: torch.Tensor, old_log_prob: torch.Tensor, advantages: torch.Tensor, returns: torch.Tensor, action_masks: torch.Tensor)

Bases: MaskableRolloutBufferSamples

class cx_rl_multi_robot_mppo.MultiRobotMaskableRolloutBuffer.MaskableRolloutBufferSamples(observations, actions, old_values, old_log_prob, advantages, returns, action_masks)

Bases: NamedTuple

action_masks: torch.Tensor

Alias for field number 6

actions: torch.Tensor

Alias for field number 1

advantages: torch.Tensor

Alias for field number 4

observations: torch.Tensor

Alias for field number 0

old_log_prob: torch.Tensor

Alias for field number 3

old_values: torch.Tensor

Alias for field number 2

returns: torch.Tensor

Alias for field number 5
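The field numbers above are the positions in the underlying named tuple, so a sample can be read by name, by index, or by unpacking. A minimal sketch of that ordering follows. SamplesSketch is a hypothetical stand-in defined here for illustration, and plain Python lists take the place of torch.Tensor values so the snippet runs without PyTorch.

```python
from typing import Any, NamedTuple


class SamplesSketch(NamedTuple):
    """Hypothetical stand-in that mirrors the documented field order."""

    observations: Any  # field 0
    actions: Any       # field 1
    old_values: Any    # field 2
    old_log_prob: Any  # field 3
    advantages: Any    # field 4
    returns: Any       # field 5
    action_masks: Any  # field 6


# Plain lists stand in for tensors (assumption made for portability).
batch = SamplesSketch(
    observations=[[0.1, 0.2]],
    actions=[1],
    old_values=[0.5],
    old_log_prob=[-0.7],
    advantages=[0.3],
    returns=[0.8],
    action_masks=[[True, True, False]],
)

# Access by name and by position refer to the same object.
same_object = batch.action_masks is batch[6]

# Tuple unpacking follows the field order as well.
obs, actions, old_values, old_log_prob, advantages, returns, masks = batch
```

Access by name is usually the safer choice in training code, since it does not silently break if a field is ever added or reordered.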

class cx_rl_multi_robot_mppo.MultiRobotMaskableRolloutBuffer.MultiRobotMaskableDictRolloutBuffer(*args: Any, **kwargs: Any)

Bases: DictRolloutBuffer

Maskable Dict Rollout Buffer that is compatible with the MultiRobotMaskablePPO agent.

Based on the MaskableDictRolloutBuffer implemented by the Stable Baselines3 Team.

Parameters:
  • buffer_size – Maximum number of elements in the buffer

  • observation_space – Observation space

  • action_space – Action space

  • device – PyTorch device

  • gae_lambda – Factor for the bias vs. variance trade-off in the Generalized Advantage Estimator. Equivalent to the classic advantage when set to 1.

  • gamma – Discount factor

  • n_envs – Number of parallel environments
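The roles of gamma and gae_lambda can be illustrated with a small, dependency-free sketch of Generalized Advantage Estimation for a single environment. This is not the buffer's own code: the function name compute_gae, its arguments, and the convention that dones[t] marks an episode ending at step t are assumptions made for this example.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.99, gae_lambda=0.95):
    """Return (advantages, returns) for one environment's rollout.

    Assumed convention: dones[t] is True when the episode ended at step t,
    so nothing is bootstrapped from the step that follows it.
    """
    n_steps = len(rewards)
    advantages = [0.0] * n_steps
    last_gae = 0.0
    for t in reversed(range(n_steps)):
        # Value of the following state; last_value covers the final step.
        next_value = last_value if t == n_steps - 1 else values[t + 1]
        non_terminal = 0.0 if dones[t] else 1.0
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value * non_terminal - values[t]
        # Exponentially weighted sum of TD errors, weighted by gamma * gae_lambda.
        last_gae = delta + gamma * gae_lambda * non_terminal * last_gae
        advantages[t] = last_gae
    returns = [adv + val for adv, val in zip(advantages, values)]
    return advantages, returns
```

With gae_lambda=1 the advantage reduces to the discounted return minus the value estimate (low bias, high variance). With gae_lambda=0 it reduces to the one-step TD error (higher bias, lower variance).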

add(*args, action_masks: numpy.ndarray | None = None, **kwargs) → None

Parameters:

action_masks – Masks applied to constrain the choice of actions.

get(batch_size: int | None = None) → Generator[MaskableDictRolloutBufferSamples, None, None]

reset() → None

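As a general illustration of how masks of the kind passed to add() constrain the choice of actions, the sketch below applies a boolean mask to a vector of action logits before normalizing them. It is not taken from this package. The helper name masked_probabilities is hypothetical, and the convention that True marks a valid action is an assumption.

```python
import math


def masked_probabilities(logits, action_mask):
    """Softmax over logits in which masked-out actions get probability 0.

    Assumed convention: action_mask[i] is True when action i is valid.
    """
    if not any(action_mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions receive a logit of -inf, which exponentiates to 0.
    masked = [
        logit if valid else -math.inf
        for logit, valid in zip(logits, action_mask)
    ]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - peak) for logit in masked]
    total = sum(exps)
    return [value / total for value in exps]


probs = masked_probabilities([1.0, 2.0, 3.0], [True, False, True])
```

Storing the mask next to each transition lets the same constraint be reapplied when the policy is re-evaluated on sampled batches during the update.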
class cx_rl_multi_robot_mppo.MultiRobotMaskableRolloutBuffer.MultiRobotMaskableRolloutBuffer(*args: Any, **kwargs: Any)

Bases: RolloutBuffer

Maskable Rollout Buffer that is compatible with the MultiRobotMaskablePPO agent.

Based on the MaskableRolloutBuffer implemented by the Stable Baselines3 Team.

Parameters:
  • buffer_size – Maximum number of elements in the buffer

  • observation_space – Observation space

  • action_space – Action space

  • device – PyTorch device

  • gae_lambda – Factor for the bias vs. variance trade-off in the Generalized Advantage Estimator. Equivalent to the classic advantage when set to 1.

  • gamma – Discount factor

  • n_envs – Number of parallel environments

add(*args, action_masks: numpy.ndarray | None = None, **kwargs) → None

Parameters:

action_masks – Masks applied to constrain the choice of actions.

get(batch_size: int | None = None) → Generator[MaskableRolloutBufferSamples, None, None]

reset() → None
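
The get() signature suggests a generator of mini-batches. A rough sketch of that pattern is shown below, assuming it follows the Stable Baselines3 rollout buffers these classes derive from: indices are shuffled once, and batch_size=None yields the whole buffer as a single batch. This page does not confirm that behaviour, and the helper iterate_minibatches is hypothetical.

```python
import random


def iterate_minibatches(n_samples, batch_size=None, seed=None):
    """Yield lists of indices that together cover range(n_samples) once."""
    if batch_size is None:
        # Assumed behaviour: no batch size means one batch with everything.
        batch_size = n_samples
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        # The final batch may be shorter than batch_size.
        yield indices[start:start + batch_size]


batches = list(iterate_minibatches(10, batch_size=4, seed=0))
```

In a real buffer, each index list would select rows from the stored observations, actions, values, log-probabilities, advantages, returns, and action masks to build one samples tuple.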