eCAL Zero Copy

Note

The eCAL Zero Copy mode was introduced in two steps:

eCAL 5.10 (zero-copy on subscriber side only, still one memory copy on the publisher side)
eCAL 5.12 (zero-copy on publisher and subscriber side, see Full Zero Copy Behavior)

In all versions it is turned off by default.

Enabling eCAL Zero Copy

Use zero-copy as system-default:

Zero Copy can be enabled as the system default in the ecal.yaml file as follows:

```yaml
# Publisher specific base settings
publisher:
  layer:
    # Base configuration for shared memory publisher
    shm:
      [..]
      # Enable zero copy shared memory transport mode
      zero_copy_mode: true
```

Use zero-copy for a single publisher (from your code):

Zero-copy can be activated (or deactivated) for a single publisher by using a specific publisher configuration:
```
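#include <ecal/config/publisher.h>
#include <ecal/msg/protobuf/publisher.h>  // eCAL::protobuf::CPublisher
// also include the protoc-generated header that declares pb::People::Person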

...

// Create a publisher configuration object
eCAL::Publisher::Configuration pub_config;

// Set the option for zero copy mode in layer->shm->zero_copy_mode to true
pub_config.layer.shm.zero_copy_mode = true;

// Create a publisher (topic name "person") and pass the configuration object
eCAL::protobuf::CPublisher<pb::People::Person> pub("person", pub_config);

...
```
Keep in mind that using protobuf for serialization still:
1. Requires you to set the data in the protobuf object.
2. Causes a copy later, when the object is serialized into the SHM buffer.
If you want to avoid this copy, you can use the low-level API to operate directly on the SHM buffer.

Full Zero Copy behavior

The Full eCAL Zero Copy mechanism works only for local (same-host) publish-subscribe connections. Data sent over a network connection does not benefit from this feature.

Shared-Memory-only connection

This describes the case where a publisher publishes its data only via shared memory to one or more subscribers.

Publisher:

Protobuf API Level:
1. The user sets the data in the protobuf object
2. The publisher locks the SHM buffer.
  
  This operation may take some time, as the publisher needs to wait for the subscriber to release the buffer. This can be relaxed by using the multi-buffering feature.
3. The publisher serializes the protobuf object directly into the SHM buffer
  
  Due to the technical implementation of protobuf, this causes the entire message to be serialized and rewritten.
4. The publisher unlocks the SHM buffer
Binary API Level:
1. The publisher locks the SHM buffer
2. The user directly writes data to the SHM buffer.
3. The publisher unlocks the SHM buffer
4. The publisher informs all connected subscribers
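The binary sequence above (lock, write in place, unlock, notify) can be modeled in a few lines of plain C++. This is only a sketch: `SimulatedShmBuffer` is a hypothetical stand-in that uses a `std::mutex` and a `std::vector` in place of a real shared memory file and its inter-process lock, and it is not part of the eCAL API. The point it shows is that the user callback receives a pointer into the buffer itself, so no intermediate copy of the payload exists.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical stand-in for one SHM memory file; this is NOT the eCAL API.
class SimulatedShmBuffer
{
public:
  explicit SimulatedShmBuffer(std::size_t size) : memory_(size) {}

  // Binary API level with Full Zero Copy: lock, let the user write straight
  // into the buffer, unlock, then inform the subscribers.
  bool Publish(const std::function<bool(void*, std::size_t)>& write_in_place)
  {
    assert(write_in_place && "a writer callback is required");
    bool ok = false;
    {
      std::lock_guard<std::mutex> lock(mutex_);            // 1. lock the buffer
      ok = write_in_place(memory_.data(), memory_.size()); // 2. user writes in place
    }                                                       // 3. unlock (scope ends)
    if (ok) notifications_.fetch_add(1);                    // 4. inform subscribers
    return ok;
  }

  const unsigned char* Data() const { return memory_.data(); }
  int Notifications() const { return notifications_.load(); }

private:
  std::mutex mutex_;
  std::vector<unsigned char> memory_;
  std::atomic<int> notifications_{0};
};
```

In the real system the lock is held across processes, which is why step 1 can stall while a subscriber callback is still reading the buffer.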

Subscriber:

1. The subscriber locks the SHM buffer.
  
  The subscriber needs to wait until every publisher and every other subscriber has unlocked the buffer. Currently, there is no parallel read access to the SHM buffer.
2. The subscriber calls the callback function directly with the SHM buffer as parameter.
3. After the callback has finished, the subscriber unlocks the SHM buffer.
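The subscriber steps can be sketched the same way. `ZeroCopyReader` is a hypothetical illustration, not an eCAL class: it takes the single lock shared with the writer, calls the user callback with a pointer into the shared buffer (no memcpy), and releases the lock only after the callback returns. Because one lock serves everyone, readers are serialized, matching the note that there is no parallel read access.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical sketch, not eCAL code: a subscriber-side read that hands the
// callback a pointer into the shared buffer instead of a private copy.
class ZeroCopyReader
{
public:
  using Callback = std::function<void(const void* data, std::size_t size)>;

  ZeroCopyReader(std::vector<unsigned char>& shm, std::mutex& shm_mutex)
    : shm_(shm), shm_mutex_(shm_mutex) {}

  void Receive(const Callback& callback)
  {
    assert(callback && "a receive callback is required");
    // 1. Lock: blocks while a publisher or another subscriber holds the buffer.
    std::lock_guard<std::mutex> lock(shm_mutex_);
    // 2. Call the user callback directly on the shared memory, no memcpy.
    callback(shm_.data(), shm_.size());
  } // 3. Unlock once the callback has returned (lock_guard goes out of scope).

private:
  std::vector<unsigned char>& shm_;
  std::mutex& shm_mutex_;
};
```

A slow callback therefore delays both the next publication and every other subscriber, which is the trade-off listed in the comparison table further below.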

Mixed Layer connection

This describes the case where a publisher publishes its data in parallel via shared memory and network (TCP or UDP), i.e. there is at least one local subscription and one external (network) subscription on the topic.

Publisher:

Regardless of whether the data is generated by a Low Level Binary Publisher or by a Protobuf Publisher, it is always written to a process-internal cache first. This memory cache is then passed sequentially to the connected transport layers “shared memory”, “innerprocess”, “udp” and “tcp”, in this order.

Compared to the Full Zero Copy behavior described above with only local (SHM) connections, the publisher side again makes a copy of the user payload.

This leads to the following publication sequence for a local connection:

Protobuf API Level:
1. The user sets the data in the protobuf object
2. The publisher serializes the protobuf object into a process internal data cache
3. The publisher locks the SHM buffer.
4. The publisher copies the process internal data cache to the SHM buffer.
5. The publisher unlocks the SHM buffer
Binary API Level:
1. The publisher copies the binary user data into a process internal data cache
2. The publisher locks the SHM buffer
3. The publisher copies the process internal data cache to the SHM buffer.
4. The publisher unlocks the SHM buffer
5. The publisher informs all connected subscribers
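To make the extra copy visible, the following sketch counts payload copies on the mixed-layer path for the binary API. `PublishMixed` and `PublishTrace` are hypothetical names invented for this illustration and do not reflect eCAL internals; locking and the actual network send are omitted. The model copies the user data into a process-internal cache, then walks the active layers in order, copying the cache into the SHM buffer when the shared memory layer is active.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical model of the mixed-layer publish path (not eCAL internals).
struct PublishTrace
{
  std::size_t payload_copies = 0;  // copies of the user payload
  std::vector<std::string> layers; // layers served, in order
};

PublishTrace PublishMixed(const unsigned char* payload, std::size_t size,
                          std::vector<unsigned char>& shm,
                          const std::vector<std::string>& active_layers)
{
  PublishTrace trace;

  // 1. Copy the binary user data into the process-internal cache
  //    (for the protobuf API this is the serialization step).
  std::vector<unsigned char> cache(size);
  if (size > 0) std::memcpy(cache.data(), payload, size);
  ++trace.payload_copies;

  // 2. Hand the same cache to each active layer, in order.
  for (const std::string& layer : active_layers)
  {
    if (layer == "shm")
    {
      // lock, copy cache into the SHM buffer, unlock (locking omitted)
      assert(shm.size() >= size);
      if (size > 0) std::memcpy(shm.data(), cache.data(), size);
      ++trace.payload_copies;
    }
    // "udp" / "tcp" would send `cache` over the network here (omitted).
    trace.layers.push_back(layer);
  }
  return trace;
}
```

With an SHM and a UDP subscriber the model performs two payload copies on the publisher side, compared with none in the shared-memory-only binary case.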

Subscriber:

If Zero Copy is enabled, subscribers always use it, i.e. they read directly from the SHM buffer:

1. The subscriber locks the SHM buffer.
2. The subscriber calls the callback function directly with the SHM buffer as parameter.
3. After the callback has finished, the subscriber unlocks the SHM buffer.

Low Level Memory Access

To unleash the full power of Full eCAL Zero Copy, the user needs to work directly on the eCAL shared memory via the CPayloadWriter API. The idea behind the new CPayloadWriter API is to let the user modify only those parts of the memory that have changed since the data was last sent. This avoids rewriting the complete memory, which saves computing time and reduces the latency of data transmission.

The new payload type CPayloadWriter looks like this (all functions unnecessary for the explanation have been omitted):

```cpp
/**
 * @brief Base payload writer class to allow zero copy memory operations.
 *
 * This class serves as the base class for payload writers, allowing zero-copy memory
 * operations. The `WriteFull` and `WriteModified` calls may operate on the target
 * memory file directly in zero-copy mode.
 *
 * A partial writing / modification of the memory file is only possible when zero-copy mode
 * is activated. If zero-copy is not enabled, the `WriteModified` method is ignored and the
 * `WriteFull` method is always executed (see CPublisher::ShmEnableZeroCopy)
 *
 */
class CPayloadWriter
{
public:
  /**
   * @brief Perform a full write operation on uninitialized memory.
   *
   * This virtual function allows derived classes to perform a full write operation
   * when the provisioned memory is uninitialized. Typically, this is the case when a
   * memory file had to be recreated or its size had to be changed.
   *
   * @param buffer_ Pointer to the buffer containing the data to be written.
   * @param size_   Size of the data to be written.
   *
   * @return True if the write operation is successful, false otherwise.
   */
  virtual bool WriteFull(void* buffer_, size_t size_) = 0;

  /**
   * @brief Perform a partial write operation to modify existing data.
   *
   * This virtual function allows derived classes to modify existing data when the provisioned
   * memory is already initialized by a WriteFull call (i.e. contains the data from that full write operation).
   *
   * The memory can be partially modified and does not have to be completely rewritten, which leads to significantly
   * higher performance (lower latency).
   *
   * If not implemented (by default), this operation will just call the `WriteFull` function.
   *
   * @param buffer_ Pointer to the buffer containing the data to be modified.
   * @param size_   Size of the data to be modified.
   *
   * @return True if the write/update operation is successful, false otherwise.
   */
  virtual bool WriteModified(void* buffer_, size_t size_) { return WriteFull(buffer_, size_); };

  /**
   * @brief Get the size of the required memory.
   *
   * This virtual function allows derived classes to provide the size of the memory
   * that eCAL needs to allocate.
   *
   * @return The size of the required memory.
   */
  virtual size_t GetSize() = 0;
};
```

The user must derive their own payload class and implement at least the WriteFull function. WriteFull is called by the low-level eCAL SHM layer when the shared memory file needs to be written for the first time (initial full write).

To write partial content (i.e. to modify the memory content in place), the user may override a second function, WriteModified. The eCAL SHM layer calls it if the shared memory file is already in an initialized state, i.e. if it was previously written with WriteFull. As the declaration above shows, WriteModified simply calls WriteFull by default if it is not overridden.

The implementation of the GetSize method is mandatory. This method is used by the eCAL SHM layer to obtain the size of the memory file that needs to be allocated.
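The calling rules described above can be summarized in a small model. Since this sketch must compile without eCAL, `PayloadWriterStandIn` is a local copy of the CPayloadWriter interface shown above, and `SimulatedShmFile` and `CountingPayload` are hypothetical classes. The decision logic only encodes what this section states (GetSize decides the allocation, a recreated or resized file needs WriteFull, and WriteModified is used only in zero-copy mode on initialized memory); the real eCAL SHM layer may differ in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-in mirroring the CPayloadWriter interface shown above.
class PayloadWriterStandIn
{
public:
  virtual ~PayloadWriterStandIn() = default;
  virtual bool WriteFull(void* buffer_, std::size_t size_) = 0;
  virtual bool WriteModified(void* buffer_, std::size_t size_) { return WriteFull(buffer_, size_); }
  virtual std::size_t GetSize() = 0;
};

// Hypothetical model of how an SHM layer could choose between the two calls.
class SimulatedShmFile
{
public:
  explicit SimulatedShmFile(bool zero_copy_mode) : zero_copy_mode_(zero_copy_mode) {}

  bool Write(PayloadWriterStandIn& payload)
  {
    const std::size_t size = payload.GetSize();   // size of memory to provide
    assert(size > 0 && "payload must report a non-zero size");
    if (memory_.size() != size)
    {
      memory_.assign(size, 0);                     // recreated/resized file
      initialized_ = false;                        // -> needs a full write
    }
    const bool ok = (zero_copy_mode_ && initialized_)
                      ? payload.WriteModified(memory_.data(), memory_.size())
                      : payload.WriteFull(memory_.data(), memory_.size());
    initialized_ = ok;                             // failed write -> full write next time
    return ok;
  }

private:
  bool zero_copy_mode_;
  bool initialized_ = false;
  std::vector<unsigned char> memory_;
};

// Demo payload that only records which write path was taken.
class CountingPayload : public PayloadWriterStandIn
{
public:
  bool WriteFull(void*, std::size_t) override { ++full; return true; }
  bool WriteModified(void*, std::size_t) override { ++modified; return true; }
  std::size_t GetSize() override { return 8; }
  int full = 0;
  int modified = 0;
};
```

Sending the same payload three times in zero-copy mode yields one WriteFull followed by two WriteModified calls; without zero-copy, every send goes through WriteFull.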

Example:

The following simple example shows how to use the CPayloadWriter API to send a small binary struct efficiently: WriteFull writes the whole struct, while WriteModified updates only a few struct elements in place instead of copying the whole structure into memory again. Note that when Full Zero Copy mode is not active, eCAL only calls WriteFull.

This is the custom payload writer class. WriteFull creates a new SSimpleStruct, updates its content and copies the whole structure into the opened shared memory file buffer. WriteModified gets a view of the opened shared memory file and modifies only the struct elements clock and bytes by applying UpdateStruct.

```cpp
#include <cstddef>
#include <cstdint>

#include <ecal/ecal.h>  // eCAL::CPayloadWriter

// a simple struct to demonstrate
// zero copy modifications
struct alignas(4) SSimpleStruct
{
  uint32_t version      = 1;
  uint16_t rows         = 5;
  uint16_t cols         = 3;
  uint32_t clock        = 0;
  uint8_t  bytes[5 * 3] = { 0 };
};

// a binary payload object that handles
// SSimpleStruct WriteFull and WriteModified functionality
class CStructPayload : public eCAL::CPayloadWriter
{
public:
  // Write the complete SSimpleStruct to the shared memory
  bool WriteFull(void* buf_, size_t len_) override
  {
    // check available size and pointer
    if (len_ < GetSize() || buf_ == nullptr) return false;

    // create a new struct and update its content
    SSimpleStruct simple_struct;
    UpdateStruct(&simple_struct);

    // copy complete struct into the memory
    *static_cast<SSimpleStruct*>(buf_) = simple_struct;

    return true;
  };

  // Modify the SSimpleStruct in the shared memory
  bool WriteModified(void* buf_, size_t len_) override
  {
    // check available size and pointer
    if (len_ < GetSize() || buf_ == nullptr) return false;

    // update the struct in memory
    UpdateStruct(static_cast<SSimpleStruct*>(buf_));

    return true;
  };

  size_t GetSize() override { return sizeof(SSimpleStruct); };

private:
  void UpdateStruct(SSimpleStruct* simple_struct)
  {
    // modify the simple_struct
    simple_struct->clock = clock;
    for (auto i = 0; i < (simple_struct->rows * simple_struct->cols); ++i)
    {
      // bytes[] is uint8_t, so cast to uint8_t (not char)
      simple_struct->bytes[i] = static_cast<uint8_t>(simple_struct->clock);
    }

    // increase internal state clock
    clock++;
  };

  uint32_t clock = 0;
};
```

To send this payload you just need a few lines of code:

```cpp
#include <ecal/ecal.h>
#include <ecal/config/publisher.h>

#include <chrono>
#include <thread>

int main(int argc, char** argv)
{
  // initialize eCAL API
  eCAL::Initialize(argc, argv, "binary_payload_snd");

  // Create a publisher configuration object
  eCAL::Publisher::Configuration pub_config;

  // Set the option for zero copy mode in layer->shm->zero_copy_mode to true
  pub_config.layer.shm.zero_copy_mode = true;

  // publisher for topic "simple_struct"
  eCAL::CPublisher pub("simple_struct", pub_config);

  // create the simple struct payload
  CStructPayload struct_payload;

  // send updates every 100 ms
  while (eCAL::Ok())
  {
    pub.Send(struct_payload);
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }

  // finalize eCAL API
  eCAL::Finalize();

  return(0);
}
```

Default eCAL SHM vs. Full Zero Copy SHM

| | Default eCAL SHM | Full Zero Copy SHM |
|---|---|---|
| Memcopies | ❌ 2 additional memcpy (1 for publishing, 1 for each subscriber) | ✅ No memcpy (if the low-level API is used) |
| Partial changes | ❌ Changing only 1 byte causes the entire updated message to be copied to the buffer again | ✅ Changing only 1 byte costs only as much as changing that 1 byte in the target memory, independent of the message size |
| Subscriber decoupling | ✅ Good decoupling between subscribers. Subscribers only block each other for the duration of that 1 memcpy | ❌ Subscribers need to wait for each other to finish their callbacks |
| Pub/Sub decoupling | ✅ Good decoupling between publisher and subscribers. If serialization takes a long time, it can be done beforehand without holding a lock on the SHM buffer. Publishers don't have to wait for the subscribers to finish their callbacks, only for them to copy the data to their own process memory | ❌ Subscribers may block publishers. Publishers need to wait for all subscriber callbacks to finish. Publishers need to keep the SHM buffer locked while performing the message serialization |
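The "Partial changes" row can be illustrated with a toy cost model that counts bytes written into a buffer when one byte of a 1024-byte message changes. This is not a benchmark and says nothing about real timings; `PublishByCopy`, `PublishInPlace` and `WriteStats` are hypothetical helpers that only contrast "copy the whole updated message" with "touch only the changed byte".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy cost model (not a benchmark): bytes written into the "SHM" buffer.
struct WriteStats
{
  std::size_t bytes_written = 0;
};

// Default SHM path: the updated message is copied into the buffer in full.
void PublishByCopy(const std::vector<unsigned char>& message,
                   std::vector<unsigned char>& shm, WriteStats& stats)
{
  assert(shm.size() >= message.size());
  for (std::size_t i = 0; i < message.size(); ++i) shm[i] = message[i];
  stats.bytes_written += message.size();
}

// Full Zero Copy path: the writer modifies only the byte that changed.
void PublishInPlace(std::size_t index, unsigned char value,
                    std::vector<unsigned char>& shm, WriteStats& stats)
{
  assert(index < shm.size());
  shm[index] = value;
  stats.bytes_written += 1;
}
```

Both paths leave the same content in the buffer, but the copy path writes 1024 bytes while the in-place path writes 1, independent of the message size.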

Combining Zero Copy and Multibuffering

For technical reasons, the Full Zero Copy mode described above is turned off if the multibuffering option CPublisher::ShmSetBufferCount is activated.

Default (subscriber-side) Zero Copy works in combination with multibuffering as described.
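The combination rules from this page can be condensed into one small function. `ResolveZeroCopy` and `EffectiveZeroCopy` are hypothetical names used only for illustration; they are not eCAL functions. `shm_buffer_count` stands for the value set via CPublisher::ShmSetBufferCount, and `shm_only` captures the earlier statement that mixed-layer connections reintroduce a publisher-side copy.

```cpp
#include <cassert>

// Hypothetical summary of the rules stated on this page; not an eCAL function.
struct EffectiveZeroCopy
{
  bool publisher_side;   // Full Zero Copy: no publisher-side memcpy
  bool subscriber_side;  // subscribers read directly from the SHM buffer
};

EffectiveZeroCopy ResolveZeroCopy(bool zero_copy_mode,
                                  unsigned int shm_buffer_count,
                                  bool shm_only)
{
  assert(shm_buffer_count >= 1);
  const bool multibuffering = shm_buffer_count > 1;
  // Publisher side: needs zero-copy mode, a single buffer and SHM-only connections.
  // Subscriber side: works whenever zero-copy mode is enabled.
  return { zero_copy_mode && !multibuffering && shm_only, zero_copy_mode };
}
```

For example, enabling zero-copy mode with a buffer count of 3 still gives zero-copy reads on the subscriber side, but the publisher falls back to copying into the SHM buffer.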