Reinforcement Learning CLIPS Interfaces
To facilitate the development of CLIPS-based RL agents, this extension provides CLIPS logic to abstract away the ROS interactions and to provide a symbolic representation of the RL workflow as provided by the environments derived from the CXRLGym class.
In the following, the CLIPS interface is described as provided by the cx_rl_clips package.
Important
The cx_rl_clips package requires ROS 2 Kilted or newer, as it relies heavily on service introspection.
Using the ROS2 CLIPS-Executive with cx_rl_clips
In order to integrate the CXRLGym with the ROS2 CLIPS-Executive, the following plugins are needed:
cx::ExecutivePlugin: Manages the overall reasoning and control flow, interleaving ROS feedback with CLIPS reasoning.
cx::RosMsgsPlugin: Provides access to ROS interfaces.
cx::RosParamPlugin: Used to fetch RL parameters that are also required within CLIPS.
cx::AmentIndexPlugin: Resolves package paths via ament_index. Required for setting up the CLIPS interfaces.
Since action server introspection is not supported, even with ROS 2 Kilted or above, the following plugins (generated through ros_comm_gen by the cx_rl_clips package) are also needed:
cx::CXCxRlInterfacesGetFreeRobotPlugin
cx::CXCxRlInterfacesExecActionSelectionPlugin
cx::CXCxRlInterfacesResetEnvPlugin
With all required plugins loaded, the CLIPS interface can be initialized in one of two ways:
1. By batch-loading the cx-rl.clp file from the cx_rl_clips package as provided.
2. By loading deftemplates.clp and cx_rl_no_deftemplates.clp separately.
The latter approach allows extending or modifying the provided deftemplate definitions between the two loading steps.
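The following sketch shows the second approach. A user file, loaded between deftemplates.clp and cx_rl_no_deftemplates.clp, could redefine one of the provided templates, for instance by adding a slot to rl-robot. The file name and the battery-level slot are hypothetical. The sketch assumes that deftemplates.clp contains only template definitions, so that no facts or rules reference rl-robot yet and CLIPS accepts the redefinition. All original slots are kept so that the provided rules still match.

```clips
; my-rl-templates.clp (hypothetical file name)
; Loaded AFTER deftemplates.clp and BEFORE cx_rl_no_deftemplates.clp.
; Redefines rl-robot with all original slots plus one extra slot.
(deftemplate rl-robot
  (slot node (type STRING) (default "/cx_rl_node"))
  (slot name (type SYMBOL))
  (slot waiting (type SYMBOL) (allowed-values TRUE FALSE) (default TRUE))
  ; hypothetical extension, not provided by cx_rl_clips
  (slot battery-level (type INTEGER) (default 100))
)
```

With this approach, the batch list of the file-loading plugin would name the three files in that order instead of cx-rl.clp alone. The exact paths depend on where the user file is installed.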
A minimal configuration file is depicted below.
clips_manager:
  ros__parameters:
    environments: ["cx_rl_bringup"]
    cx_rl_bringup:
      plugins: ["executive",
                "ament_index",
                "ros_msgs",
                "ros_param",
                "action_selection",
                "get_free_robot",
                "reset_env",
                "rl_files"]
    ament_index:
      plugin: "cx::AmentIndexPlugin"
    executive:
      plugin: "cx::ExecutivePlugin"
    ros_msgs:
      plugin: "cx::RosMsgsPlugin"
    ros_param:
      plugin: "cx::RosParamPlugin"
    reset_env:
      plugin: "cx::CXCxRlInterfacesResetEnvPlugin"
    action_selection:
      plugin: "cx::CXCxRlInterfacesActionSelectionPlugin"
    get_free_robot:
      plugin: "cx::CXCxRlInterfacesGetFreeRobotPlugin"
    rl_files:
      plugin: "cx::FileLoadPlugin"
      pkg_share_dirs: ["cx_rl_clips"]
      batch: [
        "clips/cx_rl_clips/cx-rl.clp",
      ]
Implementing RL Workflows in CLIPS
Loading the CLIPS interfaces adds several deftemplates, rules, and functions to the specified CLIPS environment. These handle ROS communication and the workflow logic needed to interact with RL nodes based on CXRLGym environments.
The following table summarizes the mapping between ROS interfaces and their corresponding CLIPS deftemplates (all deftemplate definitions are listed in the Provided Deftemplates section).
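| ROS interface | Corresponding deftemplate(s) |
|---|---|
| /get_predefined_observables | rl-predefined-observable |
| /get_observable_predicates | rl-observable-predicate |
| /get_observable_objects | rl-observable-type |
| /get_predefined_actions | rl-predefined-action |
| /get_observable_actions | rl-observable-action |
| /get_status | rl-get-status, cx-rl-node |
| /reset_env | rl-reset-env |
| /get_current_observations | rl-observation |
| /get_free_robot | rl-robot, rl-ros-action-meta-get-free-robot |
| /get_action_list_executable_for_robot | rl-current-action-space, rl-action |
| /action_selection | rl-action, rl-ros-action-meta-action-selection, rl-action-request-meta |
| /get_episode_end | rl-episode-end |
| /exec_action_selection | |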
By using the provided deftemplates, the individual steps for training and executing RL models can be integrated naturally into CLIPS agents.
Step 0: Configuration via Global Variables
Before starting the RL workflow, optional global configuration values can be defined to customize logging behavior, reward shaping, and node identification within CLIPS.
These settings are provided as CLIPS global variables and influence the behavior of the predefined rules and interfaces.
The following global variables may be overridden after loading the cx-rl.clp
file by providing a new defglobal definition:
(defglobal
?*CX-RL-LOG-LEVEL* = debug
?*CX-RL-REWARD-EPISODE-SUCCESS* = 0
?*CX-RL-REWARD-EPISODE-FAILURE* = 0
)
To customize the ROS node name of the RL node, the following global variable must be
defined before loading the cx-rl.clp file. If not specified, it defaults to
"/cx_rl_node".
(defglobal
?*CX-RL-NODE-NAME* = "/cx_rl_node"
)
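The ordering constraint can be sketched as two separate files: one loaded before cx-rl.clp and one loaded after it. The node name and reward values below are illustrative, not defaults of cx_rl_clips.

```clips
; File loaded BEFORE cx-rl.clp: custom ROS node name of the RL node
; (illustrative value).
(defglobal
  ?*CX-RL-NODE-NAME* = "/my_rl_node"
)

; File loaded AFTER cx-rl.clp: rewards registered when an episode ends
; (illustrative values; rl-action rewards are INTEGER, so integers are used).
(defglobal
  ?*CX-RL-REWARD-EPISODE-SUCCESS* = 10
  ?*CX-RL-REWARD-EPISODE-FAILURE* = -10
)
```

The deftemplates listed under Provided Deftemplates default their node slot to "/cx_rl_node". Facts for a node with a custom name therefore presumably need their node slot set explicitly, e.g. (rl-robot (node ?*CX-RL-NODE-NAME*) (name robot1)).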
Step 1: Defining the Environment
To define the RL observation space, individual observations can be added using rl-predefined-observable facts. Parameterized observations can be added using rl-observable-predicate facts, which define parameters and their types, along with rl-observable-type facts describing the possible objects of a given type. Similarly, the action space is defined using rl-predefined-action and rl-observable-action facts, along with the observable types.
The snippet below describes an observation space containing on(block1#block2), clear(block1), clear(block2), clear(block3), clear(block4) and an action space containing pickup(robot1#block1):
(assert
(rl-predefined-observable (name on) (params block1 block2))
(rl-observable-predicate (name clear) (param-names a) (param-types block))
(rl-observable-type (type block) (objects block1 block2 block3 block4))
(rl-predefined-action (name pickup) (params robot1 block1))
)
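Instead of listing grounded actions via rl-predefined-action, the action space can also be parameterized. The sketch below assumes a hypothetical robot type with a second robot robot2. The rl-observable-action fact presumably grounds to one pickup action per robot/block pair, i.e. pickup(robot1#block1) through pickup(robot2#block4). Each robot controlled by RL additionally needs its own rl-robot fact.

```clips
(assert
  ; object types used for grounding
  (rl-observable-type (type robot) (objects robot1 robot2))
  (rl-observable-type (type block) (objects block1 block2 block3 block4))
  ; clear(?b) for every block
  (rl-observable-predicate (name clear) (param-names b) (param-types block))
  ; pickup(?r ?b) for every robot/block combination
  (rl-observable-action (name pickup) (param-names r b) (param-types robot block))
  ; one rl-robot fact per robot whose actions are controlled by RL
  (rl-robot (name robot1))
  (rl-robot (name robot2))
)
```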
Aside from the action and observation spaces, the initial state needs to be defined. This is handled through facts of type rl-observation.
The example below registers clear(block1) and on-table(block1) as current observations.
(assert
(rl-observation (name clear) (params block1))
(rl-observation (name on-table) (params block1))
)
Once the observation space, action space, and initial observations are available, the RL environment is ready to be initialized.
Training or execution is started by asserting a cx-rl-node fact. This assertion triggers a backup of the current fact base that can be used for resetting the environment (as detailed in Step 2). The fact must only be asserted if it does not already exist, as it will persist and be updated across environment resets.
(if (not (any-factp ((?node cx-rl-node)) (eq ?node:name ?*CX-RL-NODE-NAME*))) then
(assert (cx-rl-node (name ?*CX-RL-NODE-NAME*) (mode UNSET)))
)
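Alternatively, the guard can be placed in the left-hand side of a rule. In the sketch below, the rule name is hypothetical. The rule asserts the initial observations from above, followed by the cx-rl-node fact. This assumes the snapshot is taken by a rule reacting to the cx-rl-node assertion, which fires only after this right-hand side has completed, so all facts asserted before it should be included. Since the cx-rl-node fact persists across resets, the rule does not fire again.

```clips
(defrule rl-setup-environment
  ; only fire while no cx-rl-node fact exists for the configured node name
  (not (cx-rl-node (name ?name&:(eq ?name ?*CX-RL-NODE-NAME*))))
  =>
  ; the observation/action space facts from above would be asserted here as well
  ; initial state
  (assert
    (rl-observation (name clear) (params block1))
    (rl-observation (name on-table) (params block1))
  )
  ; start training/execution; this triggers the fact-base snapshot
  (assert (cx-rl-node (name ?*CX-RL-NODE-NAME*) (mode UNSET)))
)
```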
Step 2: Defining the Reset Procedure
During training, environment resets are triggered automatically by asserting
an rl-reset-env fact. The reset process is executed as a staged procedure,
where progress is controlled via the state slot of the fact.
The following reset states are processed in order:
ABORT-RUNNING-ACTIONS: Automatic step that gracefully terminates any currently executing actions before the episode reset begins.
USER-CLEANUP: User-defined hook executed before the default reset logic. From this state, users may either:
  - transition to LOAD-FACTS to continue with the default reset behavior, or
  - transition directly to DONE to fully replace the default reset procedure.
LOAD-FACTS: Automatic step that restores the CLIPS fact base to the snapshot taken after the initial assertion of the cx-rl-node fact.
USER-INIT: User-defined hook executed after the default reset has completed and before the next episode starts. Transitioning to DONE resumes training.
DONE: Finalizes the reset procedure and hands control back to the training loop.
When no customization of the reset procedure is required, it is sufficient to define rules that advance the reset state through the user-defined stages without performing additional actions:
(defrule reset-to-load-facts
?reset <- (rl-reset-env (state USER-CLEANUP))
=>
(modify ?reset (state LOAD-FACTS))
)
(defrule reset-to-done
?reset <- (rl-reset-env (state USER-INIT))
=>
(modify ?reset (state DONE))
)
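A customized USER-INIT stage could, for example, vary the initial state between episodes. The following rule is a purely illustrative replacement for reset-to-done. The two rules should not be loaded together, since both match USER-INIT. It adds one randomly chosen clear observation to the restored fact base before handing control back via DONE.

```clips
(defrule reset-randomize-initial-state
  ?reset <- (rl-reset-env (state USER-INIT))
  =>
  ; candidate blocks for the extra observation (illustrative choice)
  (bind ?blocks (create$ block2 block3 block4))
  ; (random 1 n) returns an integer between 1 and n (inclusive)
  (bind ?b (nth$ (random 1 (length$ ?blocks)) ?blocks))
  (assert (rl-observation (name clear) (params ?b)))
  ; hand control back to the training loop
  (modify ?reset (state DONE))
)
```

Moving from USER-CLEANUP directly to DONE would instead skip LOAD-FACTS entirely. In that case, the user rules become responsible for restoring a valid initial state.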
Step 3: Action Execution
Action generation and selection are driven by the assertion of an rl-current-action-space fact. The workflow differs slightly depending on whether the system is operating in training or execution mode.
Training Mode
During training, the action-selection cycle is initiated automatically:
1. An rl-current-action-space fact with state PENDING is asserted by the system once a robot becomes available and new observations are present.
2. User-defined rules generate candidate actions by asserting rl-action facts based on the current observations and robot state.
3. After all candidate actions have been generated, the user transitions rl-current-action-space to state DONE.
4. The system automatically selects one of the candidate actions by:
   - marking the corresponding rl-action fact with is-selected TRUE, and
   - marking the associated rl-robot as busy by setting the waiting slot to FALSE.
   If no candidate action is provided, the episode terminates pre-emptively. In this case, a no-op action is registered and rewarded using the value defined by the global variable ?*CX-RL-REWARD-EPISODE-SUCCESS*.
5. The user executes the selected action and:
   - updates environment observations via rl-observation facts,
   - marks the action as completed by setting the is-finished slot to TRUE, and
   - assigns a reward using the reward slot of the rl-action fact.
   Collected rewards are consumed by the RL node to advance the learning procedure.
6. The user may also explicitly terminate the episode by asserting an rl-episode-end fact and setting the success slot accordingly. Depending on its value, either ?*CX-RL-REWARD-EPISODE-SUCCESS* or ?*CX-RL-REWARD-EPISODE-FAILURE* is registered.
7. Once training is complete, the fact rl-end-training is asserted to inform the user.
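The user-side part of this cycle can be sketched with three rules for the blocksworld example. The rule names, the holding predicate, and the reward value of 1 are hypothetical. The sketch makes two assumptions: candidates are generated for every waiting robot, and a low salience is sufficient to close the action space only after all generation rules have fired.

```clips
; Generate candidates: one pickup per waiting robot and clear block on the table.
(defrule rl-generate-pickup-candidates
  (rl-current-action-space (node ?n) (state PENDING))
  (rl-robot (node ?n) (name ?r) (waiting TRUE))
  (rl-observation (node ?n) (name clear) (params ?b))
  (rl-observation (node ?n) (name on-table) (params ?b))
  =>
  ; gensym* yields a fresh symbol, keeping action ids unique
  (assert (rl-action (node ?n) (id (gensym*)) (name pickup) (params ?r ?b)))
)

; Close the action space once no higher-salience activations remain.
(defrule rl-close-action-space
  (declare (salience -10))
  ?space <- (rl-current-action-space (state PENDING))
  =>
  (modify ?space (state DONE))
)

; Execute the selected action, update observations, and report a reward.
(defrule rl-execute-pickup
  ?action <- (rl-action (node ?n) (name pickup) (params ?r ?b)
                        (is-selected TRUE) (is-finished FALSE))
  ?clear <- (rl-observation (node ?n) (name clear) (params ?b))
  ?table <- (rl-observation (node ?n) (name on-table) (params ?b))
  =>
  ; a real agent would dispatch the robot here and wait for completion
  (retract ?clear ?table)
  (assert (rl-observation (node ?n) (name holding) (params ?r ?b)))
  (modify ?action (is-finished TRUE) (reward 1))
)
```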
Execution Mode
In execution mode, action selection is explicitly controlled by the user:
1. While the cx-rl-node is in EXECUTION mode, the user asserts an rl-current-action-space fact.
2. Based on the current observations and robot availability, the user asserts executable rl-action facts and assigns them to a robot.
3. After all candidate actions have been defined, the user transitions rl-current-action-space to state DONE.
4. The system automatically marks one of the candidate actions as selected by setting is-selected TRUE, and marks the associated rl-robot as busy by setting its waiting slot to FALSE.
5. The user executes the selected action and updates the environment state by asserting new rl-observation facts.
This process may be repeated until no further actions are available or the execution goals are satisfied.
If no actions are specified before the rl-current-action-space fact is set to DONE, the system asserts an rl-action fact with name no-op.
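For execution mode, the user-side trigger might look as follows. This sketch assumes that the system removes the rl-current-action-space fact once the selection has completed. If the fact persists, the (not (rl-current-action-space ...)) condition would need to be adapted. Rule names are hypothetical. Candidate generation and the transition to DONE work the same way as during training.

```clips
; Request a new action space whenever a robot of an executing node is idle.
(defrule rl-exec-request-action-space
  (cx-rl-node (name ?n) (mode EXECUTION))
  (rl-robot (node ?n) (waiting TRUE))
  ; assumption: the system removes this fact after each selection
  (not (rl-current-action-space (node ?n)))
  =>
  ; the state slot defaults to PENDING
  (assert (rl-current-action-space (node ?n)))
)

; React to the no-op action asserted when no candidates were provided.
(defrule rl-exec-no-op-selected
  (cx-rl-node (name ?n) (mode EXECUTION))
  (rl-action (node ?n) (name no-op))
  =>
  (printout t "No executable action left for " ?n crlf)
)
```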
Provided Deftemplates
In the remainder of this document, all provided deftemplates of the cx_rl_clips
integration are described. These templates form the symbolic interface between
CLIPS-based reasoning and a CXRLGym reinforcement learning environment.
They are used to define symbolic observation and action spaces, represent the current environment state, generate and execute actions, and control training and execution workflows.
When working with templates, make sure you understand which facts and slots are automatically populated by cx_rl_clips and which must be defined by the user. Providing or overriding automatically managed fields may lead to unintended behavior.
rl-reset-env
Represents a request to reset the RL environment.
This fact is asserted by the environment and processed to coordinate cleanup, initialization, and reset transitions.
Users are responsible for transitioning the state slot from USER-CLEANUP to LOAD-FACTS and from USER-INIT to DONE.
(deftemplate rl-reset-env
(slot node (type STRING) (default "/cx_rl_node"))
(slot state (type SYMBOL)
(allowed-values ABORT-RUNNING-ACTIONS USER-CLEANUP LOAD-FACTS USER-INIT DONE))
(slot uuid (type STRING))
)
cx-rl-node
Represents the RL node instance and its current lifecycle state. It also serves as the main entry point for tracking training and execution progress.
This fact is asserted once by the user during initialization and updated automatically
via status queries.
Users should assert this fact and set the slot name according to the global variable ?*CX-RL-NODE-NAME* used when loading cx_rl_clips.
(deftemplate cx-rl-node
(slot name (type STRING) (default "/cx_rl_node"))
(slot ros-comm-init (type SYMBOL) (allowed-values TRUE FALSE) (default FALSE))
(slot fact-reset-file (type STRING) (default ""))
(slot mode (type SYMBOL) (allowed-values UNSET TRAINING EXECUTION))
(slot episode (type INTEGER))
(slot step (type INTEGER))
(slot total-steps (type INTEGER))
(slot model-loaded (type SYMBOL) (allowed-values TRUE FALSE) (default FALSE))
)
rl-get-status
Requests the current status of the RL environment.
Asserting this fact causes the corresponding cx-rl-node fact to be updated.
The system initially requests updates in order to monitor the startup process.
Users may assert this fact (setting the node slot accordingly) to request updates and observe training progress.
(deftemplate rl-get-status
(slot node (type STRING) (default "/cx_rl_node"))
(slot request-id (type INTEGER)) ; internal
)
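As an illustration, status updates could be requested at every episode end, and the resulting updates of cx-rl-node logged. Both rules are hypothetical. The sketch assumes the system retracts processed rl-get-status facts, since asserting a fact identical to an existing one has no effect in CLIPS.

```clips
; Request a status update whenever an episode ends (hypothetical trigger).
(defrule rl-request-status-on-episode-end
  (rl-episode-end (node ?n))
  =>
  (assert (rl-get-status (node ?n)))
)

; Log progress each time the cx-rl-node fact of a training node is updated.
(defrule rl-log-progress
  (cx-rl-node (name ?n) (mode TRAINING) (episode ?e) (step ?s) (total-steps ?t))
  =>
  (printout t ?n ": episode " ?e ", step " ?s ", total steps " ?t crlf)
)
```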
rl-end-training
Signals that training has completed. This fact is asserted by the environment and notifies users about the end of training.
(deftemplate rl-end-training
(slot node (type STRING) (default "/cx_rl_node"))
)
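A minimal reaction to the end of training matches the cx-rl-node fact of the same node and reports its last known counters. The rule name is hypothetical.

```clips
(defrule rl-training-finished
  (rl-end-training (node ?n))
  (cx-rl-node (name ?n) (episode ?e) (total-steps ?t))
  =>
  (printout t "Training on " ?n " finished (episode " ?e
              ", total steps " ?t ")" crlf)
)
```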
rl-episode-end
Marks the end of an episode during training. Users can assert this fact to indicate success or failure of the current episode and trigger an environment reset.
(deftemplate rl-episode-end
(slot node (type STRING) (default "/cx_rl_node"))
(slot success (type SYMBOL)
(allowed-values TRUE FALSE)
(default TRUE))
)
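The following sketch shows two illustrative ways to end an episode: with success once a goal observation holds, and with failure after a step budget is exhausted. The goal on(block1 block2) and the budget of 50 steps are hypothetical. The second rule assumes that the step slot of cx-rl-node is kept current, e.g. through rl-get-status requests.

```clips
; Success: the goal observation on(block1 block2) holds.
(defrule rl-episode-goal-reached
  (cx-rl-node (name ?n) (mode TRAINING))
  (rl-observation (node ?n) (name on) (params block1 block2))
  (not (rl-episode-end (node ?n)))
  =>
  (assert (rl-episode-end (node ?n) (success TRUE)))
)

; Failure: hypothetical budget of 50 steps per episode.
(defrule rl-episode-step-limit
  (cx-rl-node (name ?n) (mode TRAINING) (step ?s&:(>= ?s 50)))
  (not (rl-episode-end (node ?n)))
  =>
  (assert (rl-episode-end (node ?n) (success FALSE)))
)
```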
rl-observable-type
Defines a symbolic type and its corresponding objects. These definitions are provided by the user and used to construct grounded observation and action spaces.
(deftemplate rl-observable-type
(slot node (type STRING) (default "/cx_rl_node"))
(slot type (type SYMBOL))
(multislot objects (type SYMBOL) (default (create$)))
)
rl-observable-predicate
Defines a predicate with typed parameters. All valid groundings are generated from the objects defined via rl-observable-type and together span the corresponding part of the observation space. Users are responsible for asserting facts of this type.
(deftemplate rl-observable-predicate
(slot node (type STRING) (default "/cx_rl_node"))
(slot name (type SYMBOL))
(multislot param-names (type SYMBOL))
(multislot param-types (type SYMBOL))
)
rl-predefined-observable
Defines a grounded observable directly. Used to add individual observations into the observation space. Users are responsible for asserting facts of this type.
(deftemplate rl-predefined-observable
(slot node (type STRING) (default "/cx_rl_node"))
(slot name (type SYMBOL))
(multislot params (type SYMBOL))
)
rl-predefined-action
Defines a grounded action directly. Used to add individual actions into the action space. Users are responsible for asserting facts of this type.
(deftemplate rl-predefined-action
(slot node (type STRING) (default "/cx_rl_node"))
(slot name (type SYMBOL))
(multislot params (type SYMBOL))
)
rl-observable-action
Defines a parameterized symbolic action. All valid groundings are generated based on the defined parameter types and objects. Users are responsible for asserting facts of this type.
(deftemplate rl-observable-action
(slot node (type STRING) (default "/cx_rl_node"))
(slot name (type SYMBOL))
(multislot param-names (type SYMBOL))
(multislot param-types (type SYMBOL))
)
rl-observation
Represents a currently active observation.
The set of all asserted rl-observation facts defines the current environment state.
Users are responsible for asserting facts of this type.
(deftemplate rl-observation
(slot node (type STRING) (default "/cx_rl_node"))
(slot name (type SYMBOL))
(multislot params (type SYMBOL))
)
rl-robot
Represents a robot that can execute actions.
A corresponding fact must be asserted for each robot whose actions are to be controlled by RL.
The waiting slot is managed automatically by the RL interface and indicates
whether a robot is idle and ready to be assigned a new action.
(deftemplate rl-robot
(slot node (type STRING) (default "/cx_rl_node"))
(slot name (type SYMBOL))
(slot waiting (type SYMBOL) (allowed-values TRUE FALSE) (default TRUE))
)
rl-current-action-space
Asserted automatically to indicate the generation phase of the current action space.
Facts of this type are created with the slot state set to PENDING, signaling that the user must assert rl-action facts before advancing the state to DONE.
Once the state is set to DONE, action selection proceeds automatically.
(deftemplate rl-current-action-space
(slot node (type STRING) (default "/cx_rl_node"))
(slot state (type SYMBOL) (allowed-values PENDING DONE) (default PENDING))
)
rl-action
Users must assert facts of this type whenever a corresponding
rl-current-action-space fact is present.
When asserting an rl-action fact, the slots node, id,
name, and params must be specified.
Note that the slot id should uniquely identify rl-action facts, while the slots name and params must match those used when defining the action space.
The slots is-selected and assigned-to are managed automatically.
They indicate whether the action has been selected for execution and
which rl-robot is assigned to execute it.
Once an action is selected, the user is responsible for carrying out
its execution. Upon completion, the user must update the slots
is-finished and reward to signal termination and provide the
reward obtained from the execution.
(deftemplate rl-action
(slot node (type STRING) (default "/cx_rl_node"))
(slot id (type SYMBOL))
(slot name (type SYMBOL))
(multislot params (type SYMBOL))
(slot is-finished (type SYMBOL) (allowed-values TRUE FALSE) (default FALSE))
(slot reward (type INTEGER) (default 0))
(slot is-selected (type SYMBOL) (allowed-values TRUE FALSE) (default FALSE))
(slot assigned-to (type SYMBOL) (default nil))
)
rl-ros-action-meta-get-free-robot
Represents internal metadata associated with the get_free_robot ROS action.
Facts of this type are asserted and managed automatically by the framework.
(deftemplate rl-ros-action-meta-get-free-robot
(slot uuid (type STRING))
(slot node (type STRING) (default "/cx_rl_node"))
(slot robot (type STRING))
(slot last-search (type FLOAT))
(slot found (type SYMBOL) (allowed-values TRUE FALSE))
(slot abort-action (type SYMBOL) (allowed-values FALSE TRUE) (default FALSE))
)
rl-ros-action-meta-action-selection
Internal metadata for the action_selection ROS action.
Facts of this type are asserted and managed automatically by the framework.
(deftemplate rl-ros-action-meta-action-selection
(slot uuid (type STRING))
(slot node (type STRING) (default "/cx_rl_node"))
(slot action-id (type SYMBOL))
(slot abort-action (type SYMBOL) (allowed-values FALSE TRUE) (default FALSE))
)
rl-action-request-meta
Internal request tracking for ROS service calls. Facts of this type are asserted and managed automatically by the framework.
(deftemplate rl-action-request-meta
(slot node (type STRING) (default "/cx_rl_node"))
(slot service (type STRING))
(slot request-id (type INTEGER))
(slot action-id (type SYMBOL))
)