Using rosidl::Buffer backends

Overview

rosidl::Buffer<T> is the generated C++ representation of variable-length primitive array fields in ROS 2 messages (uint8[], float32[], …). It behaves like a std::vector<T> by default, but its storage is pluggable so that vendors can back those fields with non-CPU memory (for example, GPU memory) and transport them with as few copies as the RMW and the backend allow.

See About rosidl::Buffer backends for a conceptual overview.

This guide covers:

  • how to keep existing publishers and subscribers working without changes;

  • how to opt a subscription in to a specific set of backends;

  • how to use a backend’s user-facing API to read and write its native data type;

  • how tensor_msgs/msg/ExperimentalTensor and torch_conversions use the same backend mechanism for tensor data;

  • how to make sure a publisher and a subscriber are configured compatibly.

Default behavior (no changes required)

If no backend is explicitly configured, rosidl::Buffer<T> still behaves like std::vector<T>. Existing code such as

auto msg = std::make_unique<sensor_msgs::msg::Image>();
msg->data.resize(width * height * 3);
std::memcpy(msg->data.data(), source_ptr, msg->data.size());
publisher_->publish(std::move(msg));

continues to compile and run unchanged because rosidl::Buffer<uint8_t> implicitly converts to std::vector<uint8_t> & when its active backend is CPU.

On the subscription side, if a subscription does not opt in to any non-CPU backend (see below), the RMW delivers the data with the CPU backend just like it did before the feature was introduced, so msg->data.size(), msg->data[i], msg->data.data() and so on keep working.
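For example, a subscription written before this feature existed keeps working without any option changes. A minimal sketch (process_on_cpu stands in for your own handler, as in the examples below):

// No SubscriptionOptions changes: without acceptable_buffer_backends,
// delivered data is always CPU-backed, and msg->data behaves like
// std::vector<uint8_t>.
subscription_ = this->create_subscription<sensor_msgs::msg::Image>(
  "image", 10,
  [](sensor_msgs::msg::Image::ConstSharedPtr msg) {
    process_on_cpu(msg->data.data(), msg->data.size());
  });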

RMW support

Buffer backends require support from the RMW implementation to negotiate and serialize descriptors on the wire.

The first supported RMW integration is rmw_fastrtps_cpp. With other RMW implementations, a subscription that requests a non-CPU backend still functions – it simply receives CPU-backed data, exactly as if acceptable_buffer_backends had not been set.

The non-CPU backend path currently applies to topic publish/subscribe only. Services and actions do not expose an acceptable_buffer_backends option and continue to use their normal request, response, feedback, status, and result serialization paths.

Discovering installed backends

Backend plugins are discovered through pluginlib and live in packages whose package.xml exports the rosidl_buffer_backend plugin description file. To see which backends are installed in the current environment:

$ ros2 pkg list | grep _buffer_backend

Each backend package usually installs two things:

  • the backend plugin itself (a shared library), registered under a short name such as cuda in the plugin XML;

  • a descriptor-message package (*_backend_msgs) containing the backend’s descriptor .msg type.

Both must be installed on every node that participates in the topic – publishers and subscribers.

Enabling a backend on a subscription

Publishers do not advertise a list of backends; they simply publish whatever backend their buffer currently uses. Subscriptions, on the other hand, opt in to non-CPU backends via the acceptable_buffer_backends option on rclcpp::SubscriptionOptions (or the acceptable_buffer_backends keyword argument in rclpy).

The option is a comma-separated string of backend names. It accepts the following forms:

  • empty or "cpu": CPU only. This is the default and preserves backward compatibility.

  • "any": accept any backend that is installed in this process.

  • "cuda,mydev": accept only the listed backends, in addition to CPU. Names match the plugin name attribute in the backend’s plugin XML.

CPU is always implicitly acceptable, so no matter what is specified, a subscription can always receive CPU-backed messages (for example, when the publisher’s backend cannot serve this particular peer).
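For instance, a subscription that is willing to take whatever is installed can opt in to everything. A minimal sketch of just the option (a complete example follows):

rclcpp::SubscriptionOptions sub_opts;
// "any" accepts every backend installed in this process; CPU stays
// implicitly acceptable either way.
sub_opts.acceptable_buffer_backends = "any";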

C++ example

#include <cuda_runtime.h>

#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/image.hpp"
#include "cuda_buffer/cuda_buffer_api.hpp"

class GpuImageSubscriber : public rclcpp::Node
{
public:
  GpuImageSubscriber()
  : Node("gpu_image_subscriber")
  {
    cudaStreamCreate(&stream_);

    rclcpp::SubscriptionOptions sub_opts;
    sub_opts.acceptable_buffer_backends = "cuda";

    subscription_ = this->create_subscription<sensor_msgs::msg::Image>(
      "image", 10,
      [this](sensor_msgs::msg::Image::ConstSharedPtr msg) {
        if (msg->data.get_backend_type() == "cuda") {
          // Zero-copy GPU path: read the device pointer directly.
          auto rh = cuda_buffer_backend::from_input_buffer(msg->data, stream_);
          process_on_gpu(rh.get_ptr(), msg->data.size(), stream_);
        } else {
          // CPU fallback: msg->data behaves like std::vector<uint8_t>.
          process_on_cpu(msg->data.data(), msg->data.size());
        }
      },
      sub_opts);
  }

private:
  rclcpp::Subscription<sensor_msgs::msg::Image>::SharedPtr subscription_;
  cudaStream_t stream_{nullptr};
};

Two things to note:

  • The subscription only declares which backends are acceptable; the actual backend of a received message is whatever the publisher sent, subject to negotiation. Always branch on msg->data.get_backend_type() before using a backend-specific API.

  • Backend APIs (cuda_buffer_backend::from_input_buffer, cuda_buffer_backend::from_output_buffer, …) are provided by the backend package, not by rclcpp. Consult each backend’s own documentation for its full API surface.

Python

rclpy exposes the same option as a keyword argument on Node.create_subscription:

self.create_subscription(
    Image,
    'image',
    self.image_callback,
    10,
    acceptable_buffer_backends='cuda')

On the Python side, the data field still surfaces as a byte sequence. Backend user-facing APIs are currently C++-only, so Python subscribers that accept non-CPU backends typically rely on the backend’s CPU-fallback path: the RMW serializes the buffer to CPU when the backend cannot serve the Python endpoint directly.

Publisher side

Publishers use the backend’s allocation API to produce a message whose uint8[] field is already backed by that backend’s memory. For example, with the CUDA backend:

#include "cuda_buffer/cuda_buffer_api.hpp"

sensor_msgs::msg::Image msg;
msg.data = cuda_buffer_backend::allocate_buffer(width * height * 3);
msg.header.stamp = this->now();
msg.width = width;
msg.height = height;
msg.encoding = "rgb8";
msg.step = width * 3;

{
  auto wh = cuda_buffer_backend::from_output_buffer(msg.data, stream_);
  my_kernel<<<grid, block, 0, stream_>>>(wh.get_ptr(), ...);
}

publisher_->publish(msg);

A publisher does not need any rclcpp option changes; the backend of the buffer inside the message is what the RMW sees at publish time. If a given subscriber has not opted in to that backend, the RMW falls back to CPU serialization for that peer transparently – the publisher writes the same message either way.

QoS considerations

QoS compatibility is still checked by the RMW implementation in the usual way. The backend selection controls how eligible buffer fields are represented once a sample is being delivered; it does not replace DDS reliability, history, deadline, lifespan, or liveliness behavior.

The CUDA backend’s zero-copy path is intended for live topic samples. Use volatile durability for CUDA-backed topics. Transient-local durability for late-joining subscribers is not a supported zero-copy CUDA use case, because the descriptor refers to publisher-owned live memory that may be recycled by the backend.
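A sketch of a matching profile for a CUDA-backed image topic (volatile durability is already the rclcpp default; the explicit call below only documents the intent):

rclcpp::QoS qos(rclcpp::KeepLast(10));
// Do not retain samples for late joiners; CUDA descriptors refer to
// live, publisher-owned memory.
qos.durability_volatile();
publisher_ = this->create_publisher<sensor_msgs::msg::Image>("image", qos);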

Tensor messages with torch_conversions

The experimental tensor path uses a normal ROS 2 message, tensor_msgs/msg/ExperimentalTensor. Its data field is a rosidl::Buffer<uint8_t>, while the other fields carry DLPack-aligned tensor metadata such as dtype, shape, strides, and byte offset. The torch_conversions package is a header-only helper library that fills that message and converts it to or from at::Tensor; it does not register a separate buffer backend.

On the publisher side:

#include "torch_conversions/torch_conversions.hpp"
#include "tensor_msgs/msg/experimental_tensor.hpp"

auto guard = torch_conversions::set_stream();
auto msg = torch_conversions::allocate_tensor_msg(
  {height, width, 4}, torch::kByte, c10::kCUDA);
torch_conversions::to_tensor_msg(msg, rendered_frame);
publisher_->publish(msg);

On the subscriber side, opt in to the storage backend you want to accept and then convert the received tensor message:

rclcpp::SubscriptionOptions sub_opts;
sub_opts.acceptable_buffer_backends = "cuda";

subscription_ = this->create_subscription<tensor_msgs::msg::ExperimentalTensor>(
  "image", 10,
  [](const tensor_msgs::msg::ExperimentalTensor::SharedPtr msg) {
    auto guard = torch_conversions::set_stream();
    at::Tensor frame =
      torch_conversions::from_input_tensor_msg(*msg, /*clone=*/false);
    consume(frame);
  },
  sub_opts);

Ensuring a compatible pub/sub pair

Aside from matching backend names ("cuda" on both sides, and so on), there are three practical rules for making a non-CPU path actually work end-to-end.

1. Install the same backend on both sides

Both the backend plugin package and its descriptor-message package (*_backend_msgs) must be installed and available to the process. If the subscriber is missing either, it falls back to CPU, and the RMW buffer-backend loader logs a warning when this happens.

Check with:

$ ros2 pkg list | grep cuda_buffer

Libraries built on top of rosidl::Buffer may have their own dependencies. For example, torch_conversions uses the cuda backend for CUDA tensor storage when the CUDA packages are available, but it does not register a separate buffer backend name.

2. Use the same RMW on both sides

Run the same RMW implementation on both the publisher and the subscriber. Buffer descriptors are serialized through the RMW’s normal type-support pipeline, so the ordinary advice for multiple RMW implementations applies: set RMW_IMPLEMENTATION consistently (for example, RMW_IMPLEMENTATION=rmw_fastrtps_cpp on both sides).

3. Keep backend versions aligned with ROS 2 core

Backend plugins are ABI-coupled to rosidl_buffer_backend and to the RMW’s type-support library. Rebuild installed backends after upgrading ROS 2 core, the same way you would rebuild any other plugin package.

Transport scope depends on the backend

rosidl::Buffer only guarantees that the descriptor round-trips through the RMW. Whether a given backend can actually carry data intra-process, inter-process on the same host, or inter-host is entirely a property of the backend implementation. Consult each backend’s own documentation for its support matrix.

For reference, the CUDA backend currently supports:

  • intra-process (same Python/C++ process);

  • inter-process on the same host, same GPU, same user (via CUDA VMM IPC).

Inter-host CUDA transport is not currently supported: the backend declines such peers, and the publish/subscribe path falls back to CPU serialization for the field.

Other backends have their own constraints, documented in their own repositories.

Diagnostics

Inspecting negotiated transport

Inside a subscription callback, msg->data.get_backend_type() returns the backend of the just-received message ("cpu", "cuda", …). Comparing it to what you expected is the quickest way to tell whether the zero-copy path actually engaged or the RMW fell back to CPU.
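A sketch of such a check, assuming get_backend_type() returns a std::string as in the examples above:

[this](sensor_msgs::msg::Image::ConstSharedPtr msg) {
  // "cpu" here means the zero-copy path did not engage for this sample.
  RCLCPP_DEBUG(
    this->get_logger(), "received '%s'-backed image data",
    msg->data.get_backend_type().c_str());
  // ... branch on the backend type as in the subscription example above ...
}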

Listing loaded backends at runtime

From C++, the registry exposes a list of discovered plugins:

#include "rosidl_buffer_backend_registry/buffer_backend_registry.hpp"

rosidl_buffer_backend_registry::BufferBackendRegistry registry;
for (const auto & name : registry.get_backend_names()) {
  RCLCPP_INFO(rclcpp::get_logger("app"), "Loaded buffer backend: %s", name.c_str());
}

Most backends also log a warning when they fall back to CPU serialization, which is visible at the default log level.

Interaction with other transport features

  • Loaned messages operate at a different layer: they let the RMW own the message memory, while rosidl::Buffer backends control the storage of individual variable-length array fields. The two features can be combined when both the RMW and the backend support it.

  • Intra-process communication is orthogonal: a backend may implement its own intra-process fast path (the CUDA backend does), but the decision is up to the backend.