cx_rl_multi_robot_mppo.MultiRobotMaskableRolloutBuffer module
- class cx_rl_multi_robot_mppo.MultiRobotMaskableRolloutBuffer.MaskableDictRolloutBufferSamples(observations: torch.Tensor, actions: torch.Tensor, old_values: torch.Tensor, old_log_prob: torch.Tensor, advantages: torch.Tensor, returns: torch.Tensor, action_masks: torch.Tensor)
Bases:
MaskableRolloutBufferSamples
- class cx_rl_multi_robot_mppo.MultiRobotMaskableRolloutBuffer.MaskableRolloutBufferSamples(observations, actions, old_values, old_log_prob, advantages, returns, action_masks)
Bases:
NamedTuple
- action_masks: torch.Tensor
Alias for field number 6
- actions: torch.Tensor
Alias for field number 1
- advantages: torch.Tensor
Alias for field number 4
- observations: torch.Tensor
Alias for field number 0
- old_log_prob: torch.Tensor
Alias for field number 3
- old_values: torch.Tensor
Alias for field number 2
- returns: torch.Tensor
Alias for field number 5
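The field numbers above are positions in the underlying tuple, so each field can be read by name or by index. The following is a minimal sketch, not the module's own class: `SamplesSketch` is a hypothetical stand-in that mirrors the documented field order and uses plain lists in place of `torch.Tensor` so that it runs without PyTorch.

```python
from typing import List, NamedTuple


class SamplesSketch(NamedTuple):
    # Hypothetical stand-in mirroring the documented field order;
    # plain lists replace torch.Tensor so the sketch runs without PyTorch.
    observations: List[float]
    actions: List[int]
    old_values: List[float]
    old_log_prob: List[float]
    advantages: List[float]
    returns: List[float]
    action_masks: List[bool]


sample = SamplesSketch(
    observations=[0.1, 0.2],
    actions=[1],
    old_values=[0.5],
    old_log_prob=[-0.7],
    advantages=[0.3],
    returns=[0.8],
    action_masks=[True, False],
)

# Name-based and index-based access refer to the same object.
assert sample.action_masks is sample[6]
assert sample.observations is sample[0]

# Tuple unpacking follows the documented field order.
obs, actions, old_values, old_log_prob, advantages, returns, masks = sample
```

Access by name is usually the safer choice, since it keeps working if a field is ever added or reordered.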
- class cx_rl_multi_robot_mppo.MultiRobotMaskableRolloutBuffer.MultiRobotMaskableDictRolloutBuffer(*args: Any, **kwargs: Any)
Bases:
DictRolloutBuffer
Maskable Dict Rollout Buffer that is compatible with the MultiRobotMaskablePPO agent.
Based on the MaskableDictRolloutBuffer implemented by the Stable Baselines3 Team.
- Parameters:
buffer_size – Maximum number of elements in the buffer
observation_space – Observation space
action_space – Action space
device – PyTorch device
gae_lambda – Factor for the trade-off between bias and variance in the Generalized Advantage Estimator. Equivalent to the classic advantage when set to 1.
gamma – Discount factor
n_envs – Number of parallel environments
- add(*args, action_masks: numpy.ndarray | None = None, **kwargs) None
- Parameters:
action_masks – Masks applied to constrain the choice of actions.
- get(batch_size: int | None = None) Generator[MaskableDictRolloutBufferSamples, None, None]
- reset() None
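The `action_masks` passed to `add` are described as constraining the choice of actions. The buffer stores them alongside each transition; how they are applied is up to the policy. As an illustration only, the sketch below shows one common way a mask can constrain a discrete action distribution: disallowed actions get probability zero and the rest are renormalized. The function `masked_probabilities` is hypothetical and not part of this module.

```python
import math
from typing import List, Sequence


def masked_probabilities(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over logits, with masked-out actions forced to probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    # Subtract the largest allowed logit for numerical stability.
    top = max(logit for logit, ok in zip(logits, mask) if ok)
    weights = [math.exp(logit - top) if ok else 0.0 for logit, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


# Action 1 is disallowed, so its probability is exactly zero.
probs = masked_probabilities([2.0, 1.0, 0.5], [True, False, True])
```

In this sketch, `True` marks an allowed action; that convention is an assumption and should be checked against the agent that consumes the masks.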
- class cx_rl_multi_robot_mppo.MultiRobotMaskableRolloutBuffer.MultiRobotMaskableRolloutBuffer(*args: Any, **kwargs: Any)
Bases:
RolloutBuffer
Maskable Rollout Buffer that is compatible with the MultiRobotMaskablePPO agent.
Based on the MaskableRolloutBuffer implemented by the Stable Baselines3 Team.
- Parameters:
buffer_size – Maximum number of elements in the buffer
observation_space – Observation space
action_space – Action space
device – PyTorch device
gae_lambda – Factor for the trade-off between bias and variance in the Generalized Advantage Estimator. Equivalent to the classic advantage when set to 1.
gamma – Discount factor
n_envs – Number of parallel environments
- add(*args, action_masks: numpy.ndarray | None = None, **kwargs) None
- Parameters:
action_masks – Masks applied to constrain the choice of actions.
- get(batch_size: int | None = None) Generator[MaskableRolloutBufferSamples, None, None]
- reset() None
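Both buffers take `gamma` and `gae_lambda`, which control Generalized Advantage Estimation. The sketch below is a plain-Python illustration of the standard GAE recursion for a single environment, not the buffer's actual implementation. The function name `gae` and the convention that `dones[t]` marks an episode ending after step `t` are assumptions of this sketch; the library's own bookkeeping may differ.

```python
from typing import List, Sequence, Tuple


def gae(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    dones: Sequence[bool],
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Return (advantages, returns) for one rollout of a single environment."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the advantage of the step after it.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * gae_lambda * not_done * running
        advantages[t] = running
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


# With gae_lambda=1 the returns reduce to discounted reward sums.
adv, ret = gae(
    rewards=[1.0, 1.0, 1.0],
    values=[0.5, 0.5, 0.5],
    last_value=0.0,
    dones=[False, False, True],
    gamma=0.5,
    gae_lambda=1.0,
)
```

Lower values of `gae_lambda` lean more on the value estimates (less variance, more bias), while values close to 1 lean more on the observed rewards.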