Using rosidl::Buffer backends
Overview
rosidl::Buffer<T> is the generated C++ representation of variable-length
primitive array fields in ROS 2 messages (uint8[], float32[], …).
It behaves like a std::vector<T> by default, but its storage is pluggable
so that vendors can back those fields with non-CPU memory (for example, GPU
memory) and transport them with as few copies as the RMW and the backend
allow.
See About rosidl::Buffer backends for a conceptual overview.
This guide covers:

- how to keep existing publishers and subscribers working without changes;
- how to opt a subscription in to a specific set of backends;
- how to use a backend’s user-facing API to read and write its native data type;
- how tensor_msgs/msg/ExperimentalTensor and torch_conversions use the same backend mechanism for tensor data;
- how to make sure publisher and subscriber are configured compatibly.
Default behavior (no changes required)
If no backend is explicitly configured, rosidl::Buffer<T> still behaves
like std::vector<T>.
Existing code such as
auto msg = std::make_unique<sensor_msgs::msg::Image>();
msg->data.resize(width * height * 3);
std::memcpy(msg->data.data(), source_ptr, msg->data.size());
publisher_->publish(std::move(msg));
continues to compile and run unchanged because rosidl::Buffer<uint8_t>
implicitly converts to std::vector<uint8_t> & when its active backend is
CPU.
On the subscription side, if a subscription does not opt in to any
non-CPU backend (see below), the RMW delivers the data with the CPU backend
just like it did before the feature was introduced, so
msg->data.size(), msg->data[i], msg->data.data() and so on keep
working.
RMW support
Buffer backends require support from the RMW implementation to negotiate and serialize descriptors on the wire.
The first supported RMW integration is rmw_fastrtps_cpp.
With other RMW implementations, a subscription that requests a non-CPU
backend still functions – it simply receives CPU-backed data, exactly as if
acceptable_buffer_backends had not been set.
The non-CPU backend path currently applies to topic publish/subscribe only.
Services and actions do not expose an acceptable_buffer_backends option and continue to use their normal request, response, feedback, status, and result serialization paths.
Discovering installed backends
Backend plugins are discovered through pluginlib and live in packages
whose package.xml exports the rosidl_buffer_backend plugin
description file.
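As a sketch of what that export looks like, a backend package's package.xml typically points pluginlib at its plugin description file. The tag name and the file name plugins.xml below are assumptions for illustration; check an actual backend package for the canonical form:

```xml
<!-- package.xml of a hypothetical backend package (names assumed). -->
<export>
  <!-- Points pluginlib at the plugin description file that registers
       the backend under its short name (e.g. "cuda"). -->
  <rosidl_buffer_backend plugin="${prefix}/plugins.xml"/>
</export>
```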
To see which backends are installed in the current environment:
$ ros2 pkg list | grep _buffer_backend
Each backend package usually installs two things:

- the backend plugin itself (a shared library), registered under a short name such as cuda in the plugin XML;
- a descriptor-message package (*_msgs) containing the backend’s descriptor .msg type.
Both must be installed on every node that participates in the topic – publishers and subscribers.
Enabling a backend on a subscription
Publishers do not advertise a list of backends; they simply publish
whatever backend their buffer currently uses.
Subscriptions, on the other hand, opt in to non-CPU backends via the
acceptable_buffer_backends option on rclcpp::SubscriptionOptions
(or the acceptable_buffer_backends keyword argument in rclpy).
The option is a comma-separated string of backend names. It accepts the following forms:

| Value | Meaning |
|---|---|
| empty (the default) | CPU only. This preserves backward compatibility. |
| a wildcard | Accept any backend that is installed in this process. |
| an explicit list of names | Accept only the listed backends, in addition to CPU. Names match the plugin names registered in the plugin XML. |
CPU is always implicitly acceptable, so no matter what is specified, a subscription can always receive CPU-backed messages (for example, when the publisher’s backend cannot serve this particular peer).
C++ example
#include <cuda_runtime.h>
#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/image.hpp"
#include "cuda_buffer/cuda_buffer_api.hpp"
class GpuImageSubscriber : public rclcpp::Node
{
public:
  GpuImageSubscriber()
  : Node("gpu_image_subscriber")
  {
    cudaStreamCreate(&stream_);

    rclcpp::SubscriptionOptions sub_opts;
    sub_opts.acceptable_buffer_backends = "cuda";

    subscription_ = this->create_subscription<sensor_msgs::msg::Image>(
      "image", 10,
      [this](sensor_msgs::msg::Image::ConstSharedPtr msg) {
        if (msg->data.get_backend_type() == "cuda") {
          // Zero-copy GPU path: read the device pointer directly.
          auto rh = cuda_buffer_backend::from_input_buffer(msg->data, stream_);
          process_on_gpu(rh.get_ptr(), msg->data.size(), stream_);
        } else {
          // CPU fallback: msg->data behaves like std::vector<uint8_t>.
          process_on_cpu(msg->data.data(), msg->data.size());
        }
      },
      sub_opts);
  }

private:
  rclcpp::Subscription<sensor_msgs::msg::Image>::SharedPtr subscription_;
  cudaStream_t stream_{nullptr};
};
Two things to note:

1. The subscription only declares which backends are acceptable; the actual backend of a received message is whatever the publisher sent, subject to negotiation. Always branch on msg->data.get_backend_type() before using a backend-specific API.
2. Backend APIs (cuda_buffer_backend::from_input_buffer, cuda_buffer_backend::from_output_buffer, …) are provided by the backend package, not by rclcpp. Consult each backend’s own documentation for its full API surface.
Python
rclpy exposes the same option as a keyword argument on
Node.create_subscription:
self.create_subscription(
    Image,
    'image',
    self.image_callback,
    10,
    acceptable_buffer_backends='cuda')
On the Python side the data field still surfaces as a byte sequence;
backend user-facing APIs are currently C++-only, so Python subscribers that
accept non-CPU backends typically rely on the backend’s own CPU-fallback
path (the RMW serializes the buffer to CPU if the backend cannot serve the
Python endpoint directly).
Publisher side
Publishers use the backend’s allocation API to produce a message whose
uint8[] field is already backed by that backend’s memory.
For example, with the CUDA backend:
#include "cuda_buffer/cuda_buffer_api.hpp"
sensor_msgs::msg::Image msg;
msg.data = cuda_buffer_backend::allocate_buffer(width * height * 3);
msg.header.stamp = this->now();
msg.width = width;
msg.height = height;
msg.encoding = "rgb8";
msg.step = width * 3;
{
  auto wh = cuda_buffer_backend::from_output_buffer(msg.data, stream_);
  my_kernel<<<grid, block, 0, stream_>>>(wh.get_ptr(), ...);
}
publisher_->publish(msg);
A publisher does not need any rclcpp option changes; the backend of the
buffer inside the message is what the RMW sees at publish time.
If a given subscriber has not opted in to that backend, the RMW falls back
to CPU serialization for that peer transparently – the publisher writes the
same message either way.
QoS considerations
QoS compatibility is still checked by the RMW implementation in the usual way. The backend selection controls how eligible buffer fields are represented once a sample is being delivered; it does not replace DDS reliability, history, deadline, lifespan, or liveliness behavior.
The CUDA backend’s zero-copy path is intended for live topic samples. Use volatile durability for CUDA-backed topics. Transient-local durability for late-joining subscribers is not a supported zero-copy CUDA use case, because the descriptor refers to publisher-owned live memory that may be recycled by the backend.
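In rclcpp terms, a volatile-durability profile for a CUDA-backed topic can be sketched with the standard rclcpp::QoS API (the publisher line is illustrative only):

```cpp
#include "rclcpp/rclcpp.hpp"

// Volatile durability: late-joining subscribers do not receive old
// samples, so the backend never has to serve descriptors that refer to
// publisher-owned GPU memory that may already have been recycled.
rclcpp::QoS qos(rclcpp::KeepLast(10));
qos.durability_volatile();
// publisher_ = node->create_publisher<sensor_msgs::msg::Image>("image", qos);
```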
Tensor messages with torch_conversions
The experimental tensor path uses a normal ROS 2 message,
tensor_msgs/msg/ExperimentalTensor.
Its data field is a rosidl::Buffer<uint8_t>, while the other fields
carry DLPack-aligned tensor metadata such as dtype, shape, strides, and byte
offset.
The torch_conversions package is a header-only helper library that fills
that message and converts it to or from at::Tensor; it does not register a
separate buffer backend.
On the publisher side:
#include "torch_conversions/torch_conversions.hpp"
#include "tensor_msgs/msg/experimental_tensor.hpp"
auto guard = torch_conversions::set_stream();
auto msg = torch_conversions::allocate_tensor_msg(
{height, width, 4}, torch::kByte, c10::kCUDA);
torch_conversions::to_tensor_msg(msg, rendered_frame);
publisher_->publish(msg);
On the subscriber side, opt in to the storage backend you want to accept and then convert the received tensor message:
rclcpp::SubscriptionOptions sub_opts;
sub_opts.acceptable_buffer_backends = "cuda";

subscription_ = this->create_subscription<tensor_msgs::msg::ExperimentalTensor>(
  "image", 10,
  [](const tensor_msgs::msg::ExperimentalTensor::SharedPtr msg) {
    auto guard = torch_conversions::set_stream();
    at::Tensor frame =
      torch_conversions::from_input_tensor_msg(*msg, /*clone=*/false);
    consume(frame);
  },
  sub_opts);
Ensuring a compatible pub/sub pair
Aside from matching backend names ("cuda" on both sides, and so on),
there are three practical rules for making a non-CPU path actually work
end-to-end.
1. Install the same backend on both sides
Both the backend plugin package and its descriptor-message package
(*_backend_msgs) must be installed and available to the process.
If the subscriber is missing either, it falls back to CPU; the only indication is a warning logged by the RMW buffer-backend loader.
Check with:
$ ros2 pkg list | grep cuda_buffer
Libraries built on top of rosidl::Buffer may have their own dependencies.
For example, torch_conversions uses the cuda backend for CUDA tensor
storage when the CUDA packages are available, but it does not register a
separate buffer backend name.
2. Use the same RMW on both sides
Use the same RMW implementation for publisher and subscriber.
Buffer descriptors are serialized through the RMW’s normal type-support
pipeline, so the ordinary advice for
multiple RMW implementations
applies: set RMW_IMPLEMENTATION consistently.
3. Keep backend versions aligned with ROS 2 core
Backend plugins are ABI-coupled to rosidl_buffer_backend and to the
RMW’s type-support library.
Rebuild installed backends after upgrading ROS 2 core, the same way you
would rebuild any other plugin package.
Transport scope depends on the backend
rosidl::Buffer only guarantees that the descriptor round-trips through
the RMW.
Whether a given backend can actually carry data intra-process,
inter-process on the same host, or inter-host is entirely a property of the
backend implementation.
Consult each backend’s own documentation for its support matrix.
For reference, the CUDA backend currently supports:
- intra-process (same Python/C++ process);
- inter-process on the same host, same GPU, same user (via CUDA VMM IPC).

Inter-host CUDA transport is not currently supported by this backend; it declines such peers, and the publish/subscribe path uses CPU serialization for the field.
Other backends have their own constraints, documented in their own repositories.
Diagnostics
Inspecting negotiated transport
Inside a subscription callback, msg->data.get_backend_type() returns the
backend of the just-received message ("cpu", "cuda", …).
Comparing it to what you expected is the quickest way to tell whether the
zero-copy path actually engaged or the RMW fell back to CPU.
Listing loaded backends at runtime
From C++, the registry exposes a list of discovered plugins:
#include "rosidl_buffer_backend_registry/buffer_backend_registry.hpp"
rosidl_buffer_backend_registry::BufferBackendRegistry registry;
for (const auto & name : registry.get_backend_names()) {
RCLCPP_INFO(rclcpp::get_logger("app"), "Loaded buffer backend: %s", name.c_str());
}
Most backends also log a warning when they fall back to CPU serialization, which is visible at the default log level.
Interaction with other transport features
- Loaned messages operate at a different layer: they let the RMW own the message memory, while rosidl::Buffer backends control the storage of individual variable-length array fields. The two features can be combined when both the RMW and the backend support it.
- Intra-process communication is orthogonal: a backend may implement its own intra-process fast path (the CUDA backend does), but the decision is up to the backend.