habib's rabbit hole

Zero to ROS2 - Building pub/sub from scratch

Why?

The goal was simple: understand why ROS2 chose DDS as its middleware by building the alternative from scratch first. Instead of reading docs, implement a ZeroMQ pub/sub system, benchmark it, then implement the same thing in DDS and compare. Every design decision becomes visible when you have to make it yourself.

Stack: Python 3.8, pyzmq, CycloneDDS 0.10.5, matplotlib. Machine: Ubuntu 18.04 + ROS1 Noetic (no ROS2 available — used CycloneDDS Python bindings directly). All experiments on loopback (localhost), CPU-only.

Architecture:

Chose round-trip latency over one-way. One-way needs clock sync between processes. RTT doesn't — publisher timestamps the message, subscriber echoes it back unchanged, publisher measures recv_time minus send_time. No NTP. Clean.
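Here is a minimal sketch of that RTT pattern in plain Python. A stdlib socketpair and an echo thread stand in for the two processes, so this is not the ZMQ or DDS code used in the benchmark. The names measure_rtts, echo_forever and recv_exact are hypothetical. The send timestamp travels inside the message and comes back unchanged, so only one clock is ever read. The sketch uses time.perf_counter_ns() rather than time.time_ns(). Both work for RTT, but the monotonic counter cannot jump if the wall clock is adjusted mid-run.

```python
import socket
import struct
import threading
import time

MSG = struct.Struct("!QQ")  # (sequence number, send timestamp in ns)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buf += chunk
    return buf


def echo_forever(sock: socket.socket) -> None:
    """Subscriber role: send every byte back unchanged until the peer closes."""
    while True:
        data = sock.recv(4096)
        if not data:
            break
        sock.sendall(data)
    sock.close()


def measure_rtts(n: int = 100) -> list:
    """Publisher role: timestamp, send, block on the echo, then subtract."""
    pub, sub = socket.socketpair()
    threading.Thread(target=echo_forever, args=(sub,), daemon=True).start()
    rtts_ns = []
    for seq in range(n):
        pub.sendall(MSG.pack(seq, time.perf_counter_ns()))
        reply = recv_exact(pub, MSG.size)   # block until the echo is back
        recv_ns = time.perf_counter_ns()
        echoed_seq, send_ns = MSG.unpack(reply)
        assert echoed_seq == seq            # the echo must come back unchanged
        rtts_ns.append(recv_ns - send_ns)
    pub.close()                             # lets the echo thread exit
    return rtts_ns


if __name__ == "__main__":
    rtts = measure_rtts(1000)
    print("median RTT (us):", sorted(rtts)[len(rtts) // 2] / 1000)
```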

ZeroMQ — two sockets per side

publisher:
  PUB  bind     tcp://*:5555          (send pings)
  SUB  connect  tcp://localhost:5556  (receive echoes)
subscriber:
  SUB  connect  tcp://localhost:5555  (receive pings)
  PUB  bind     tcp://*:5556          (send echoes)

Figure: ZeroMQ architecture

DDS — two topics, no addresses

publisher:
  DataWriter → topic 'ping'
  DataReader ← topic 'pong'
subscriber:
  DataReader ← topic 'ping'
  DataWriter → topic 'pong'

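No IP. No port. DDS discovery handles it. Through the Simple Participant Discovery Protocol (SPDP), each participant periodically announces itself over UDP multicast to 239.255.0.1:7400, the default for domain 0. Once participants know about each other, endpoint discovery matches each DataWriter to DataReaders by topic name and type. As a result, the publisher and subscriber find each other automatically.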
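The 7400 comes from a fixed formula. The DDSI-RTPS wire protocol derives its well-known UDP ports from the domain ID, and the sketch below applies that formula with the spec's default constants. rtps_ports is a hypothetical helper. Treat the unicast entries with care: they depend on how an implementation assigns participant IDs, and Cyclone DDS can be configured to use other ports entirely. What you see in Wireshark may therefore differ. The multicast discovery port for domain 0 is the 7400 above.

```python
# Default port-mapping parameters from the OMG DDSI-RTPS specification.
PB, DG, PG = 7400, 250, 2            # port base, domain gain, participant gain
D0, D1, D2, D3 = 0, 10, 1, 11        # per-traffic-type offsets
DISCOVERY_MULTICAST_ADDR = "239.255.0.1"


def rtps_ports(domain_id: int = 0, participant_id: int = 0) -> dict:
    """Well-known RTPS ports for one participant in one DDS domain."""
    base = PB + DG * domain_id
    return {
        "discovery_multicast": base + D0,                     # SPDP announcements
        "discovery_unicast": base + D1 + PG * participant_id,
        "user_multicast": base + D2,
        "user_unicast": base + D3 + PG * participant_id,
    }


print(DISCOVERY_MULTICAST_ADDR, rtps_ports(0)["discovery_multicast"])  # 239.255.0.1 7400
print(rtps_ports(domain_id=1)["discovery_multicast"])                  # 7650
```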

Bugs Hit and Fixed (includes some stupid mistakes)

ZeroMQ

• Double bind: created two sockets both binding to port 5555. Fix: deleted the orphan.

• RTT measured nothing: called time.time_ns() immediately after send_multipart(), before any echo had arrived. send_multipart() returns as soon as the message is queued. It does not wait for delivery, let alone for the echo. Fix: block on recv_multipart() before recording the receive time.

• Subscriber received nothing — setsockopt(SUBSCRIBE, b"") called on wrong socket variable. ZMQ SUB sockets receive nothing by default. Fix: set filter on the correct object.

• Slow joiner: ZMQ has no discovery, and connect() is asynchronous. Anything published before the TCP connection is up, and before the SUB socket's subscription has reached the publisher, is dropped silently. Fix: time.sleep(0.5) before the timed run. This is a hack.
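One common alternative to the fixed sleep is to probe until an echo actually comes back, then start the clock. The sketch below works under stated assumptions. warm_up and LossyLink are hypothetical names, and LossyLink is an in-memory fake standing in for the real sockets. With pyzmq, try_recv could wrap Socket.poll() on the SUB socket followed by a recv. In a real run you would also tag probes, for example with a sequence number, and discard any late warm-up echoes before the timed loop starts.

```python
import time
from typing import Callable, Optional


def warm_up(send: Callable[[bytes], None],
            try_recv: Callable[[float], Optional[bytes]],
            timeout_s: float = 5.0,
            probe_every_s: float = 0.05) -> int:
    """Send probes until the first echo comes back; return how many were sent.

    send(msg) transmits one message. try_recv(t) waits up to t seconds and
    returns a message, or None if nothing arrived in that time.
    """
    deadline = time.monotonic() + timeout_s
    probes = 0
    while time.monotonic() < deadline:
        send(b"warmup")
        probes += 1
        if try_recv(probe_every_s) is not None:
            return probes          # the path is live in both directions
    raise TimeoutError(f"no echo within {timeout_s} s")


class LossyLink:
    """Hypothetical in-memory transport that drops the first `drop` sends."""

    def __init__(self, drop: int):
        self.drop, self.sent, self.inbox = drop, 0, []

    def send(self, msg: bytes) -> None:
        self.sent += 1
        if self.sent > self.drop:   # early messages vanish, like a slow joiner
            self.inbox.append(msg)  # later ones are "echoed" straight back

    def try_recv(self, timeout_s: float) -> Optional[bytes]:
        return self.inbox.pop(0) if self.inbox else None


link = LossyLink(drop=3)
print(warm_up(link.send, link.try_recv))  # 4: three probes lost, the fourth echoed
```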

CycloneDDS

• byte not found — cyclonedds.idl.types has no 'byte' alias in 0.10.5. Fix: use uint8 instead. Identical at wire level.

• PingMsg type mismatch: the struct definition must be identical on both sides. DDS checks type compatibility when it matches endpoints during discovery, so a DataWriter and DataReader with mismatched types are never matched.

ZeroMQ Benchmark Results

Payload sweep at 100Hz. 1000 messages per condition. Fixed rate, variable payload size.
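The sketch below shows one way to pace a 100 Hz sender in Python. It assumes a blocking work(i) callback that sends one ping and waits for its echo. paced_loop is a hypothetical helper, not the benchmark's actual loop. Sleeping a fixed 10 ms after each round trip would drift, because the round trip itself takes time. Sleeping until an absolute deadline avoids that drift.

```python
import time
from typing import Callable


def paced_loop(rate_hz: float, count: int, work: Callable[[int], None]) -> float:
    """Call work(i) count times at a fixed rate; return elapsed seconds.

    Each iteration sleeps until an absolute deadline (start + k * period)
    instead of sleeping a fixed period, so time spent inside work(), such as
    waiting for an echo, does not make the schedule drift.
    """
    period = 1.0 / rate_hz
    start = time.perf_counter()
    for i in range(count):
        work(i)
        delay = start + (i + 1) * period - time.perf_counter()
        if delay > 0:              # if work() overran, skip the sleep and catch up
            time.sleep(delay)
    return time.perf_counter() - start


sent = []
elapsed = paced_loop(100, 10, sent.append)   # 10 iterations at 100 Hz
print(len(sent), round(elapsed, 3))
```

Catching up after an overrun sends a short burst of back-to-back pings. Skipping the missed slots instead would keep the spacing even but lose samples. With millisecond-scale RTTs against a 10 ms period, overruns should be rare either way.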

Figure: ZeroMQ benchmark results

Key finding: median barely moves across payload sizes. Fixed ZMQ overhead (~1.2ms TCP stack + socket wakeup) dominates. Bytes are not the bottleneck at 100Hz on loopback.

ZMQ round-trip latency percentile — payload sweep

Read this chart: the gap between the median bar (green) and the p99 bar (red) is the jitter. 10KB and 1KB have the tightest gap, and so the lowest jitter, despite being roughly 156x and 16x larger than 64B.
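The same median, p99 and jitter numbers can be computed from raw RTT samples with the standard library, as sketched below. latency_summary is a hypothetical helper. statistics.quantiles needs Python 3.8, which matches the stack above. The "inclusive" method interpolates between the smallest and largest samples, the same way NumPy's default percentile does.

```python
import statistics


def latency_summary(rtts_ms):
    """Median, p99 and jitter (p99 minus median) for a list of RTTs in ms."""
    cuts = statistics.quantiles(rtts_ms, n=100, method="inclusive")  # 99 cut points
    median = statistics.median(rtts_ms)
    p99 = cuts[98]                     # the 99th cut point is the 99th percentile
    return {"median": median, "p99": p99, "jitter": p99 - median}


print(latency_summary([float(x) for x in range(1, 102)]))
# {'median': 51.0, 'p99': 100.0, 'jitter': 49.0}
```

With 1000 samples per condition, only about the ten largest RTTs sit above p99. A single spike therefore barely moves it. p99 climbs only when the tail as a whole gets heavier.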

Figure: ZMQ latency distribution (histogram)

The spike at 64B around seq 700 (8.5ms) is most likely an OS scheduling event, with Linux preempting the process, rather than a ZMQ bug. The tail gets fatter at 100KB (p99 jumps to 2.575ms). A plausible cause is that larger buffer copies take longer and give the scheduler more chances to interrupt.

DDS Benchmark Results

Same benchmark, same machine, same payload sweep, same 100Hz rate. Only the middleware changed.

Figure: DDS benchmark results

Latency Percentiles

Figure: DDS latency percentiles

RTT Distribution

Figure: DDS RTT distribution

100KB DDS (coral) is a completely separate distribution — peaks at 7-9ms while 64B/1KB/10KB all peak below 1ms. This is not jitter. It is a different operating regime caused by UDP fragmentation.

RTT over time

Figure: DDS RTT over time

1KB panel: high variance over roughly the first 150 messages, then an abrupt settle into a tight band. That is most likely DDS RTPS endpoint matching completing mid-run. Once discovery is done, DDS at 1KB is faster and more stable than ZMQ was.
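The settling point can also be estimated from the data instead of read off the plot. The heuristic sketch below slides a fixed window over the RTT series and reports where the first low-variance window starts. settle_index is a hypothetical helper. The 20-sample window and 0.1 ms threshold are arbitrary assumptions to tune per dataset. Warm-up and steady-state samples can then be summarized separately, so discovery effects stay out of the percentiles.

```python
import statistics
from typing import Optional, Sequence


def settle_index(rtts_ms: Sequence[float], window: int = 20,
                 max_stdev_ms: float = 0.1) -> Optional[int]:
    """Index where the first 'quiet' window of RTTs begins, or None.

    A window is quiet when the population standard deviation of its
    `window` consecutive samples is below `max_stdev_ms`.
    """
    for start in range(len(rtts_ms) - window + 1):
        if statistics.pstdev(rtts_ms[start:start + window]) < max_stdev_ms:
            return start
    return None


# Synthetic series: 150 noisy samples, then a tight band.
series = [1.0, 5.0] * 75 + [0.8] * 300
idx = settle_index(series)
warm, steady = series[:idx], series[idx:]
print(idx, len(warm), len(steady))   # 150 150 300
```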

Head to head comparison

Table: ZMQ vs DDS head-to-head comparison

There is no single winner. The result depends entirely on payload size.

Findings

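  1. DDS uses UDP. ZMQ uses TCP. At small payloads UDP carries no connection state and no TCP acknowledgement traffic, so it is faster. (Nagle's algorithm is not part of the story, because ZMQ sets TCP_NODELAY on its TCP sockets.) At 100KB, a single message is far larger than one UDP packet can carry on a standard 1500-byte Ethernet MTU (1472 bytes of payload). It is also larger than the 65,507-byte ceiling of a UDP datagram over IPv4, so this holds even on loopback. DDS must fragment 100KB into roughly 70 Ethernet-sized pieces, send them separately, and reassemble them on the receiver. That fragmentation cost grows with payload size. ZMQ instead writes the whole message into a TCP byte stream and lets the kernel segment it, so its cost grows far more slowly. The two curves cross somewhere between 10KB and 100KB.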

  2. At 10KB: ZMQ p99 = 2.074ms, DDS p99 = 2.845ms. ZMQ wins on p99 even though DDS has the lower median. Because DDS's median is lower, its jitter (p99 minus median) exceeds ZMQ's by more than the 0.771ms p99 gap. If you chose middleware on median alone, you would pick DDS for 10KB and get worse worst-case behavior. For control-loop reliability, the metric that matters is jitter (p99 minus median), not median alone.

  3. Published literature (Vanderbilt 2020, eProsima) shows ZMQ faster at small payloads in C++ over real networks. Our Python loopback results show DDS faster at small payloads. On loopback there is no actual network unreliability — TCP's reliability machinery (ACKs, flow control, connection state) is pure overhead with zero benefit. UDP's advantage is more pronounced. The finding is real for this deployment context and should be stated as such, not hidden.
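The fragment arithmetic in Finding 1 is sketched below. It assumes IPv4 with no options and ignores headers added by RTPS itself, so real fragment counts run somewhat higher. The helper names are hypothetical. The last line shows that even with Linux's 65,536-byte loopback MTU, no single UDP datagram can carry 100KB. How many fragments DDS actually uses depends on its own configured fragment size, not on these limits alone.

```python
import math

IPV4_HEADER = 20          # bytes, assuming no IP options
UDP_HEADER = 8            # bytes
IPV4_MAX_PACKET = 65535   # the IPv4 total-length field is 16 bits


def max_udp_payload(mtu: int = 1500) -> int:
    """Largest UDP payload that fits in one unfragmented IPv4 packet."""
    return min(mtu, IPV4_MAX_PACKET) - IPV4_HEADER - UDP_HEADER


def fragments_needed(payload_bytes: int, mtu: int = 1500) -> int:
    """Lower bound on packets for one message, ignoring RTPS/DDS headers."""
    return math.ceil(payload_bytes / max_udp_payload(mtu))


print(max_udp_payload())            # 1472 on standard Ethernet
print(fragments_needed(100_000))    # 68
print(fragments_needed(102_400))    # 70 (100 KiB)
print(max_udp_payload(65536))       # 65507: loopback is still capped by IPv4
```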

What was not measured

• Rate sweep (10Hz / 100Hz / 1000Hz / flood) — would show where ZMQ drops messages and DDS retransmits

• Multi-subscriber scaling — DDS multicast advantage grows with subscriber count, ZMQ unicasts to each

• RELIABLE vs BEST_EFFORT QoS — ran default QoS throughout

• Real network vs loopback — findings may not transfer to multi-machine deployment

• C++ baseline — cannot isolate Python binding overhead from protocol overhead

Each of these is a known limitation, not a hidden flaw. The rate sweep is the highest priority next experiment — that is where drop behavior and QoS differences become visible.