In my last blog post, I described an approach for synchronizing the broadcast intervals of the RF modules in the network based on the system clock. In this blog post, I’d like to compare and contrast this open-loop approach with an alternative approach to synchronize the modules in lock-step. Both methods have their merits and downsides that need to be considered.
For background, these RF modules are not the only link between the robots and the control computer. The goal is not to use them to establish a reliable link and send data over it. Instead, we use them to collect data on the link quality and signal strength while the robots run. There is a secondary WiFi network that links the robots to a central computer which controls the robots and assembles the datasets for CoLo AT.
Local Clock Synchronization
The method I described last week allowed each robot in the network to independently determine when it could safely transmit based on its internal clock, as shown in the figure above. This relied on the assumption that the clocks were synchronized across all robots. To keep this assumption true, NTP (Network Time Protocol, a standard for network time synchronization) was used to keep the clocks in sync, ideally to within 1 ms [1]. If even higher accuracy is required, other protocols like PTP (Precision Time Protocol), which can maintain accuracy under 1 microsecond, could be used instead.
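To make the idea concrete, here is a minimal sketch of how a robot could decide locally whether it is allowed to transmit. It assumes a simple round-robin scheme with fixed-length time slots; the names `slot_owner` and `may_transmit`, the slot length, and the guard band that absorbs residual clock error are all illustrative assumptions, not the exact scheme from the previous post.

```python
import time


def slot_owner(now_ms: int, n_robots: int, slot_ms: int) -> int:
    """Index of the robot that owns the time slot containing now_ms."""
    return (now_ms // slot_ms) % n_robots


def may_transmit(robot_id: int, now_ms: int, n_robots: int,
                 slot_ms: int, guard_ms: int) -> bool:
    """True if robot_id owns the current slot and is outside the guard bands.

    guard_ms (an assumed parameter) keeps transmissions away from the slot
    edges so that small clock offsets between robots do not cause overlap.
    """
    if slot_owner(now_ms, n_robots, slot_ms) != robot_id:
        return False
    offset_ms = now_ms % slot_ms
    return guard_ms <= offset_ms < slot_ms - guard_ms


if __name__ == "__main__":
    # Each robot evaluates this against its own NTP-disciplined system clock.
    now_ms = time.time_ns() // 1_000_000
    print(may_transmit(robot_id=0, now_ms=now_ms, n_robots=5,
                       slot_ms=100, guard_ms=5))
```

Note that nothing here touches the network: the decision depends only on the local clock, which is exactly why the scheme only holds up while the clocks stay in sync.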
Advantages:
- Simple to Implement: Since each robot uses its own clock, the program that controls the RF module does not need to worry about networking at all.
- Capable of handling more simultaneous robots at higher update rates
- Minimizes network usage: The WiFi network is already taxed with multiple ssh connections
Disadvantages:
- One major downside to this method is that it is fundamentally open loop. There is no guarantee that no two modules will be ordered to transmit at the same time since one robot does not check the status of any other robot.
Network Based Synchronization
The alternative method is to have a central computer on the network explicitly coordinate when each of the robots can transmit. I implemented this using ZeroMQ REQ and REP sockets between the central computer and each of the robots. At a set interval, the coordinating computer first sends a request to enter receiving mode to all but one robot. Once all of those robots have responded confirming their status, the computer sends a request to the one remaining robot to transmit its message. Finally, once the transmitting robot responds that it is finished, the cycle repeats with a different robot selected to transmit. This occurs \(n\times r\) times per second where \(n\) is the number of robots and \(r\) is the number of updates per second desired.
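The cycle described above can be sketched roughly as follows. This is not the actual implementation: the transport is abstracted behind a hypothetical `request(robot, command)` callable that sends one command and blocks until the robot replies, and the command and reply strings (`"RECEIVE"`, `"TRANSMIT"`, `"OK"`, `"DONE"`) are made up for illustration. In a ZeroMQ setup, `request` would presumably wrap one REQ socket per robot, sending the command and then waiting on the reply.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical transport: send one command to one robot, block until it replies.
Request = Callable[[str, str], str]


def run_cycle(robots: Sequence[str], transmitter: str, request: Request) -> None:
    """One cycle: every other robot confirms receive mode, then one transmits."""
    listeners = [r for r in robots if r != transmitter]
    # Send the receive-mode requests in parallel and wait for every reply.
    with ThreadPoolExecutor(max_workers=max(1, len(listeners))) as pool:
        replies = list(pool.map(lambda r: request(r, "RECEIVE"), listeners))
    if any(reply != "OK" for reply in replies):
        raise RuntimeError("a robot failed to confirm receive mode")
    # Only after all confirmations is the remaining robot told to transmit.
    if request(transmitter, "TRANSMIT") != "DONE":
        raise RuntimeError(f"{transmitter} did not finish transmitting")


def run_round(robots: Sequence[str], request: Request) -> None:
    """Give every robot one turn to transmit, in order."""
    for transmitter in robots:
        run_cycle(robots, transmitter, request)
```

Each call to `run_cycle` costs two sequential round trips (one for the receive-mode confirmations, one for the transmit request), no matter how many listeners are contacted in parallel.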
Advantages:
- Guarantees synchronization: The coordinator will not tell the robot to transmit until it has confirmed that all other robots are ready to receive.
Disadvantages:
- Even if the requests to set the robots to receive mode are done in parallel, the maximum rate at which it can produce updates is \((2p)^{-1}\), where \(p\) is the ping.
Which method should we use?
While only the second method guarantees that the radios will never transmit when other radios are not ready, the rate at which these updates can be made is severely limited by the ping between the robots and the coordinator over the WiFi network. While NTP can also be affected by longer pings, it is overall more tolerant of network congestion than the alternative, which requires one network request per robot for every update.
382 packets transmitted, 381 received, 0% packet loss, time 381244ms
rtt min/avg/max/mdev = 6.388/253.312/639.316/145.043 ms
In my tests, the average ping between two computers in our network was ~250 ms. While this is abnormal for a local network and requires further investigation, if we take it at face value, the coordinator can complete at most \((2\times 0.25)^{-1}=2\) transmit cycles per second. With only 5 robots running, and each cycle letting a single robot transmit, that works out to \(2/5=0.4\) updates per robot per second.
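As a quick check of the arithmetic, the bound can be computed directly. The function names below are illustrative; the second function simply combines the \((2p)^{-1}\) limit with the earlier statement that \(n\times r\) cycles are needed per second.

```python
def max_cycle_rate(ping_s: float) -> float:
    """Upper bound on transmit cycles per second: two round trips per cycle."""
    return 1.0 / (2.0 * ping_s)


def max_update_rate(ping_s: float, n_robots: int) -> float:
    """Upper bound on r when n_robots * r cycles are needed per second."""
    return max_cycle_rate(ping_s) / n_robots


print(max_cycle_rate(0.25))      # 2.0 cycles per second in total
print(max_update_rate(0.25, 5))  # 0.4 updates per second for each of 5 robots
```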
[1] https://www.eecis.udel.edu/~mills/ntp/html/discipline.html