Reducing latency in Serial communication

mnissov

I am working on a sensor triggering project with a Teensy and am nearing the end, but have stumbled upon some very erratic latency in the serial communication to the computer. The code is relatively long at this point, so I won't post everything, but the basic gist is:
Code:
#include <Arduino.h>
#include <TeensyTimerTool.h>

const uint8_t _pin_ = 1;  // output pin
IntervalTimer g_trigger_timer;
TeensyTimerTool::OneShotTimer g_one_shot_timer;

elapsedMicros g_micros_since_second;
volatile uint32_t g_seconds;

void periodicCallback()
{
  // get current time [seconds, micros since second]

  // set trigger pin HIGH
  digitalWriteFast(_pin_, HIGH);
  // set one shot timer to high time (10us)
  // g_one_shot_timer.trigger(10)

  // pack information into message
  // [seconds, micros, count]

  // Send message (10 bytes)
  // Serial.write(msg, 10);
  // Serial.send_now();
}

elapsedMillis g_led_millis;
uint32_t g_led_interval = 1000;
void setup()
{
  // setting pin modes
  // setting PPM interrupt for rising edge with callback:
  //   void ppmInterrupt(){
  //       g_micros_since_second = 0;
  //       g_seconds++;
  //   }
  setup_pins();

  Serial.begin(9600);
  while (!Serial) {
  }

  // connecting to RTC for PPM timekeeping signal
  if (!connectToRTC()) {
    Serial.println("RTC problems");
  }

  g_trigger_timer.begin(periodicCallback, 5000);
  g_one_shot_timer.begin([] { digitalWriteFast(_pin_, LOW); });

  g_led_millis = 0;
}

void loop()
{
  if (g_led_millis > g_led_interval) {
    digitalToggleFast(LED_BUILTIN);
    g_led_millis -= g_led_interval;
  }
}
Rather than listing every last piece of code, I tried to describe it as meaningfully as seemed necessary; hopefully this is understandable.
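For concreteness, the commented-out packing and send boil down to something like the sketch below. The helper name and the exact byte layout here are just illustrative, not my verbatim code; the point is the 10-byte [seconds, micros, count] message followed by Serial.send_now().
Code:
// Illustrative only: pack [uint32 seconds | uint32 micros | uint16 count] into 10 bytes
void sendTimestampMessage(uint32_t seconds, uint32_t micros_since_second)
{
  static uint16_t count = 0;  // rolling message counter

  uint8_t msg[10];
  memcpy(&msg[0], &seconds, 4);              // whole seconds (incremented by the PPM interrupt)
  memcpy(&msg[4], &micros_since_second, 4);  // microseconds since the last PPM edge
  memcpy(&msg[8], &count, 2);                // message counter
  count++;

  Serial.write(msg, 10);  // queue the 10 bytes for USB serial
  Serial.send_now();      // push the partially filled USB packet out immediately
}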

Now this is perfectly capable of generating the trigger square wave with low variance, along with the corresponding timestamp:
(attachment: temp1.png)

The problem
The problem is there is often a delay in communicating this 10 byte message to the PC. I tried to do some basic logging of the arrival times of these messages (with a C++ node and std::chrono::system_clock::now()) and found, for two separate runs:
Code:
serialRate: mean_ms=4.99, max_ms=11.99, min_ms=0.02, total_s=304.79
serialRate: mean_ms=5.00, max_ms=8.47, min_ms=2.11, total_s=562.42

Note logging happened by way of
Code:
while (true) {
  if (serial_conn_->available()) {
    if (first_iteration) {
      first_iteration = false;
    } else {
      profiler_.stop("serialRate");
    }
    serial_conn_->read(buffer, expected_size_);
    profiler_.start("serialRate");
  }
}
where
  • serial_conn_ comes from https://github.com/wjwwood/serial
  • buffer is std::vector<uint8_t>
  • expected_size_ = 10
  • profiler_.start/profiler_.stop make calls to std::chrono::system_clock::now() for timing
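For reference, the profiler is nothing fancy; it's roughly equivalent to this kind of std::chrono bookkeeping (a simplified stand-in, not the actual class):
Code:
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>

// Simplified stand-in for profiler_: measures the interval between start() and
// stop() for a label and accumulates sum/max/min over the run.
struct IntervalProfiler {
  struct Stats { double sum_ms = 0, max_ms = 0, min_ms = 1e12; uint64_t n = 0; };

  void start(const std::string& label) { starts_[label] = std::chrono::system_clock::now(); }

  void stop(const std::string& label) {
    const auto now = std::chrono::system_clock::now();
    const double ms = std::chrono::duration<double, std::milli>(now - starts_[label]).count();
    auto& s = stats_[label];
    s.sum_ms += ms;
    s.max_ms = std::max(s.max_ms, ms);
    s.min_ms = std::min(s.min_ms, ms);
    ++s.n;
  }

  std::map<std::string, std::chrono::system_clock::time_point> starts_;
  std::map<std::string, Stats> stats_;
};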


The output rate here is 200 Hz. I believe there was a message which was delayed until it arrived almost simultaneously with the following one, which is why max_ms is roughly double the sampling period and min_ms is so small. As far as I understand, USB serial with the Teensy 4.1 isn't set by Serial.begin(9600); it actually runs at either 12 or 480 Mbit/sec, following USB speeds.

A 10 byte message at 12 Mbit/sec should take 0.0067 ms = (10 × 8) ÷ (12×10^6) × 1000, so that is far and away fast enough. I figure this means the problem lies in the serial interface on my laptop's side. Just to note, I have limited hands-on experience with USB communication, so please correct me if I say anything wrong.
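Just as a sanity check on the raw wire time (ignoring USB framing, start/stop bits, and host-side scheduling), a quick back-of-the-envelope program:
Code:
#include <cstdio>

int main()
{
  const double bits = 10 * 8;  // 10-byte message
  std::printf("12 Mbit/s (full speed) : %.4f ms\n", bits / 12e6 * 1e3);        // ~0.0067 ms
  std::printf("480 Mbit/s (high speed): %.5f ms\n", bits / 480e6 * 1e3);       // ~0.00017 ms
  std::printf("9600 baud (if it were real): %.2f ms\n", bits / 9600.0 * 1e3);  // ~8.33 ms
  return 0;
}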

Testing USB comm latency
I also tried a different test of the latency. I can't quite remember the source, but I found this in another forum post:
Code:
#include <Arduino.h>

void setup()
{
  Serial.begin(9600);
  Serial.flush();
  delay(1000);
}

const size_t buf_len = 10;
uint8_t buffer[buf_len];

void loop()
{
  for (size_t buf_pos = 0; buf_pos < buf_len; buf_pos++) {
    while (Serial.available() <= 0)
      ;
    int c = Serial.read();
    buffer[buf_pos] = c;
    if (c == 13 || c == 10) {
      Serial.write(buffer, buf_pos + 1);
      Serial.send_now();
      break;
    }
  }
}

Code:
#!/usr/bin/env python3

import serial
import sys
import io
import time

message_count = 10

if len(sys.argv) != 2:
    print("Usage: test-lat <Serial Port>")
    raise SystemExit

port_name = sys.argv[1]

ser = serial.Serial(port_name, 9600, xonxoff=False, rtscts=False, dsrdtr=False)
ser.flushInput()
ser.flushOutput()

sio = io.TextIOWrapper(io.BufferedRWPair(ser, ser, 1), encoding="ascii", newline="\n")
sio._CHUNK_SIZE = 1

max_time = 0
total_time = 0
for i in range(message_count):
    start_time = time.perf_counter()
    out_msg = str(i) + "\n"
    sio.write(out_msg)
    sio.flush()
    in_msg = sio.readline()
    if out_msg != in_msg:
        print("Error: invalid response: ", in_msg)
    end_time = time.perf_counter()
    elapsed = end_time - start_time
    if elapsed > max_time:
        max_time = elapsed
    total_time += elapsed

print(f"Message count: {message_count}")
print(f"Average time: {total_time / message_count * 1e3} ms")
print(f"Max time: {max_time*1e3} ms")

ser.close()

Code:
❯ python3 test_serial_latency.py /dev/ttyACM0
Message count: 10
Average time: 0.15015669050626457 ms
Max time: 0.22571900626644492 ms
❯ python3 test_serial_latency.py /dev/ttyACM0
Message count: 10
Average time: 0.11744970106519759 ms
Max time: 0.17464201664552093 ms
❯ python3 test_serial_latency.py /dev/ttyACM0
Message count: 10
Average time: 0.1389778859447688 ms
Max time: 0.2486809971742332 ms
❯ python3 test_serial_latency.py /dev/ttyACM0
Message count: 10
Average time: 0.1252930029295385 ms
Max time: 0.17271301476284862 ms
❯ python3 test_serial_latency.py /dev/ttyACM0
Message count: 10
Average time: 0.16377330175600946 ms
Max time: 0.2868599840439856 ms
For the same message size, this seems quite small; in all my attempts the round trip stayed below 0.5 ms, so I'm not entirely sure what the problem is that I'm experiencing. To be fair, the delay happens extremely infrequently, so trying this five or six times perhaps isn't sufficient to expose it.

Question
I know for FTDI one can reduce the latency timer for devices sending small messages so that they don't get bunched up into a single packet. But I am connected to the Teensy through /dev/ttyACM0, so I guess this isn't an option. Are there some options I can look into for checking the USB port more often, or some methods I can apply to reduce this latency? As the signal rate is 200 Hz, latency >1 ms begins to be a little problematic.
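(For reference, the FTDI knob I'm referring to is, as far as I know, the per-port latency_timer that the ftdi_sio driver exposes through sysfs. Adjusting it looks roughly like the sketch below, assuming a hypothetical /dev/ttyUSB0 device, which is exactly what the Teensy's /dev/ttyACM0 doesn't have.)
Code:
#include <fstream>
#include <iostream>
#include <string>

// Only applies to FTDI-style /dev/ttyUSBx ports; /dev/ttyACMx has no such attribute.
int main()
{
  const std::string path = "/sys/bus/usb-serial/devices/ttyUSB0/latency_timer";

  std::ifstream in(path);
  if (!in) {
    std::cerr << "no latency_timer attribute (not an FTDI port?)\n";
    return 1;
  }
  int current_ms = 0;
  in >> current_ms;
  std::cout << "current latency timer: " << current_ms << " ms\n";

  // Shrink from the 16 ms default to 1 ms (needs write permission, e.g. root or a udev rule)
  std::ofstream out(path);
  out << 1;
  return out ? 0 : 1;
}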
 
Sorry, I am probably not much help here, but regarding USB latency when talking with a PC/RPi...

When I have worked on different simple robots in the past, where I used a Teensy or another board as a servo controller and the like, talking to a PC running Linux, I found:

That on the host side, using a logical OutputFlush() operation (tcdrain on termios) really helped with latency when talking to an FTDI device,
AND really hurt when not using an FTDI device.

That is, on an FTDI device (/dev/ttyUSBn) I believe there is an IOCTL that knows when the USB operation has actually completed or the like,
but on a non-FTDI device like /dev/ttyACMx it felt like maybe that code is not implemented, so the driver just puts in a long delay and figures the data has surely been output...
So my underlying code actually checked the device name and only called tcdrain if it was a ttyUSBxx device.

But I don't know if this applies to your setup or problem.
 
When you say
AND really hurt when not using an FTDI device.
do you mean the latency suffered when using an OutputFlush() on a non-FTDI device, or that latency in general suffered with non-FTDI devices?

Please correct me if I'm wrong, but the Teensy 4.1 (and Teensys in general) are non-FTDI devices, right? I figure this is the case because (as far as I understand) FTDI devices show up as /dev/ttyUSBx ports instead of ACM. Is there a way to present the Teensy as an FTDI device? I guess not, because it's missing the hardware?
 
Question
I know for FTDI one can reduce the latency timer for devices sending small messages so that they don't get bunched up into a single packet. But I am connected to the Teensy through /dev/ttyACM0, so I guess this isn't an option.

Your program already has Serial.send_now().

As far as transmitting is concerned, that is the best you can do. Like FTDI, Teensy does have a timeout where it waits to transmit data in hopes of filling a full USB packet to make best use of bandwidth. Calling Serial.send_now() causes any partially filled buffer to immediately be given to the USB hardware for transmit to your PC.

There are many complexities to the USB protocol, but I'm going to refrain from typing a lengthy message because those effects are much smaller and for all practical purposes they're outside of your control.


Are there some options I can look into for checking the USB port more often, or some methods I can apply to reduce this latency? As the signal rate is 200 Hz, latency >1 ms begins to be a little problematic.

But regarding your overall design, this approach where you generate the 200 Hz timing on the PC side and transmit commands and then wait for replies means you will doubly suffer the PC's poor real-time performance, first for transmitting those commands and then for receiving the replies.

The sad reality is PC operating systems have poor real-time performance. If using Windows, it will sometimes run in power saving mode where programs can have about 16 ms scheduling latency. You can use timeBeginPeriod() to cause Windows to run in a mode with (usually) 1ms scheduling performance.
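For example, roughly this (sketch only; timeBeginPeriod() and timeEndPeriod() come from the winmm library):
Code:
#include <windows.h>
#pragma comment(lib, "winmm.lib")  // timeBeginPeriod / timeEndPeriod live in winmm

int main()
{
  // Request ~1 ms timer resolution for the lifetime of this program, which
  // usually improves scheduling latency for programs waiting on I/O.
  timeBeginPeriod(1);

  // ... open the serial port and run the 200 Hz read loop here ...

  timeEndPeriod(1);  // always pair with timeBeginPeriod before exiting
  return 0;
}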

A better design would have Teensy perform the measurement at 200 Hz and transmit the messages without waiting to receive a command.
 
Any chance you could write the measuring test code in "C" for the PC instead of Python? Python has shown problems with throughput speed, and that may be affecting the results found.

Also, the baud of 9600 is a bit short of the Teensy 4.1's 480,000,000; not sure if Python uses that value when presenting the data. The Teensy hardware will connect at max speed, but the software may be rate-limiting the data it presents.

No, Teensy doesn't look/act like FTDI. @KurtE perhaps made his notes because post #1 mentioned FTDI, in case one was involved.
 
As far as transmitting is concerned, that is the best you can do. Like FTDI, Teensy does have a timeout where it waits to transmit data in hopes of filling a full USB packet to make best use of bandwidth. Calling Serial.send_now() causes any partially filled buffer to immediately be given to the USB hardware for transmit to your PC.
So it sounds like I've done everything possible on the Teensy side of things. I know this isn't a general PC forum, but what about on the computer side? Are there settings for USB interfaces which can be tweaked? I wish I could ask more specifically, but I'm not very familiar with how USB comms work.

But regarding your overall design, this approach where you generate the 200 Hz timing on the PC side and transmit commands and then wait for replies means you will doubly suffer the PC's poor real-time performance, first for transmitting those commands and then for receiving the replies.
This was perhaps misworded on my side. The way it works is:
1. Generate the signal
2. Package the timestamp
3. Send it via USB to the computer

In normal operation there are very few messages from PC to Teensy, and they are all in the setup phase, not the running phase. While the signals are running, the only communication is the 10 byte messages from Teensy to computer.


Also, this is on Linux, more specifically Ubuntu 20.04 LTS.
 
Any chance you could write the measuring test code in "C" for the PC instead of Python? Python has shown problems with throughput speed, and that may be affecting the results found.

Also, the baud of 9600 is a bit short of the Teensy 4.1's 480,000,000; not sure if Python uses that value when presenting the data. The Teensy hardware will connect at max speed, but the software may be rate-limiting the data it presents.

No, Teensy doesn't look/act like FTDI. @KurtE perhaps made his notes because post #1 mentioned FTDI, in case one was involved.

The Python script is a one-off dummy thing I had on hand at the moment. The actual PC interface is in C++ and looks more like this code snippet from the original post:
Code:
while (true) {
  if (serial_conn_->available()) {
    if (first_iteration) {
      first_iteration = false;
    } else {
      profiler_.stop("serialRate");
    }
    serial_conn_->read(buffer, expected_size_);
    profiler_.start("serialRate");
  }
}
Of course this is the simplified form. Adding a timing element in here also shows intermittent periods of 1-5 ms delay.

Regarding the baud rate, that's true; in hindsight the Python script is perhaps more misleading than anything else. Though, when establishing the connection (in C or Python), does my setting the baud rate actually make a difference, considering the Teensy ignores the argument of Serial.begin()?

I can't imagine it's actually running at 9600 baud, because then a message should take (10 × 8) ÷ 9600 ≈ 8.33 ms, which is significantly slower than what I observe.
 
When you say

do you mean the latency suffered when using an OutputFlush() on a non-FTDI device, or that latency in general suffered with non-FTDI devices?

Please correct me if I'm wrong, but the Teensy 4.1 (and Teensys in general) are non-FTDI devices, right? I figure this is the case because (as far as I understand) FTDI devices show up as /dev/ttyUSBx ports instead of ACM. Is there a way to present the Teensy as an FTDI device? I guess not, because it's missing the hardware?
I am saying in particular that the logical output flush (tcdrain) on the Linux machine, mainly SBCs like the RPi or UP, would stay in that function for an extended period of time.

So my setup code for talking to the Dynamixel servos checked whether the device was a /dev/ttyUSBnn, and if so it used tcdrain, and if not, it didn't.

On the Teensy side, I always used Serial.flush() for these types of packets. That does shorten latency, but as with most things it is a trade-off: while it reduces latency, it also reduces throughput, since your packets are something like 10 bytes of data each, while each USB packet could be transferring 512 bytes (assuming you can talk at high speed).
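Roughly this sort of pattern (a simplified sketch from memory, not my actual servo code; the device name and packet bytes here are just placeholders):
Code:
#include <cstring>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

// Only drain on FTDI-style /dev/ttyUSBx ports, where the wait is short;
// on /dev/ttyACMx the same call could stall for a long time.
void flushPacket(int fd, const char *device_name)
{
  if (std::strncmp(device_name, "/dev/ttyUSB", 11) == 0) {
    tcdrain(fd);  // block until the driver believes the output has been transmitted
  }
}

int main()
{
  const char *device = "/dev/ttyUSB0";  // placeholder device name
  const int fd = open(device, O_RDWR | O_NOCTTY);
  if (fd < 0) return 1;

  const unsigned char packet[] = {0xFF, 0xFF, 0x01, 0x02, 0x01, 0xFB};  // dummy packet bytes
  write(fd, packet, sizeof(packet));
  flushPacket(fd, device);

  close(fd);
  return 0;
}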
 
I'm not flushing on the PC side, except for immediately after making the connection.

I can try flushing on the teensy side, but from what I read on https://www.pjrc.com/teensy/td_serial.html it sounds like it does the same as Serial.send_now(). Is there a meaningful difference?



Edit: some more information from my side. The extent of the information to be sent is 10 bytes at 200 Hz. I can probably reduce it to 8 bytes if that makes it more manageable. But at 200 Hz I would ideally like absolutely no more than roughly 2 ms of delay. From what I understand, though, the problem may lie with the computer and not with the Teensy?
 
This was perhaps misworded on my side. The way it works is:
1. Generate the signal
2. Package the timestamp
3. Send it via USB to the computer

In msg #1, you showed 2 programs for Teensy. The first does not transmit anything to the PC, other than "RTC problems".

The second program has this code, which waits for the PC to send a message, then transmits the data.

Code:
    while (Serial.available() <= 0)
      ;
    int c = Serial.read();
    buffer[buf_pos] = c;
    if (c == 13 || c == 10) {
      Serial.write(buffer, buf_pos + 1);
      Serial.send_now();
      break;
    }

Maybe your actual application works as you have just said, but my comment about the overall timing design was based on the code you showed.
 
I can try flushing on the teensy side .... it sounds like it does the same as Serial.send_now(). Is there a meaningful difference?

For USB Serial, both flush() and send_now() run exactly the same code.

Serial.flush() is supposed to wait for the data to actually complete transmission over the wire. For Hardware Serial it does indeed wait until the final stop bit of the last byte is actually transmitted. For USB Serial, it is the same as send_now().

Serial.send_now() only gives custody of the data to the hardware, so the hardware will transmit it as soon as possible. But Serial.send_now() does not wait for the hardware to actually complete the transmission. It returns as soon as there is no more work to do to cause the hardware to transmit the data as soon as it can.

Arduino's API documentation says Serial.flush() "Waits for the transmission of outgoing serial data to complete". So technically the behavior we have today where Serial.flush() is the same as Serial.send_now() could be considered a bug. But implementing the wait as described would be quite difficult, and so far nobody has really cared. Perhaps in future software Serial.flush() may be changed to include waiting for the hardware to actually complete transmission. But with so many other far more important software issues pending, seems unlikely Serial.flush() will change anytime soon.


The extent of the information to be sent is 10 bytes at 200 Hz. I can probably reduce it to 8 bytes if that makes it more manageable.

When running at 480 Mbit/sec, the packet size is 512 bytes. So you will not see any significant difference for any size up to 511 bytes. A full 512 byte packet actually requires a 2nd zero length packet.

480 Mbit/sec is very fast. After the USB protocol overhead, 512 bytes takes only 8.5 us. The protocol overhead is the same for any size packet. So you really won't save any meaningful amount by shaving just 2 bytes from your message.
 
I think this was the misunderstanding; what you quoted is some silly Python script I found to "estimate" the latency.

The code to demonstrate what I'm trying to troubleshoot is the first code block, which starts with
Code:
#include <Arduino.h>
#include <TeensyTimerTool.h>

const uint8_t _pin_ = 1;  // output pin
IntervalTimer g_trigger_timer;
TeensyTimerTool::OneShotTimer g_one_shot_timer;

elapsedMicros g_micros_since_second;
volatile uint32_t g_seconds;

void periodicCallback()
{
  // get current time [seconds, micros since second]

  // set trigger pin HIGH
  digitalWriteFast(_pin_, HIGH);
  // set one shot timer to high time (10us)
  // g_one_shot_timer.trigger(10)

  // pack information into message
  // [seconds, micros, count]

  // Send message (10 bytes)
  // Serial.write(msg, 10);
  // Serial.send_now();
}

and the basic gist of the PC-side C++ code is:

Code:
while (true) {
  if (serial_conn_->available()) {
    if (first_iteration) {
      first_iteration = false;
    } else {
      profiler_.stop("serialRate");
    }
    serial_conn_->read(buffer, expected_size_);
    profiler_.start("serialRate");
  }
}

In hindsight the Python script is more confusing than anything else; disregard the section titled "Testing USB comm latency", it doesn't really matter beyond the shallow debugging it provides of the USB connection.

In msg #1, you showed 2 programs for Teensy. The first does not transmit anything to the PC, other than "RTC problems".
The intent with the first piece was just to demonstrate the idea without posting all of it. If you think it would help I can attach the files, but they require hardware (the RTC) to run.
 
When running at 480 Mbit/sec, the packet size is 512 bytes. So you will not see any significant difference for any size up to 511 bytes. A full 512 byte packet actually requires a 2nd zero length packet.
Is it expected that there's enough overhead on either side that 8.5 µs becomes >5 ms? That seems like a drastic jump.
 
The actual USB communication happens with consistent timing.

But once the USB host controller in your PC has received the data into memory, it needs to generate an interrupt. Most host controllers have a feature, which may or may not be in use, to wait a short time in case other transfers complete. PCs have high interrupt overhead, so it's much more efficient to process multiple completed transfers with a single interrupt.

The interrupt causes the USB host controller device driver to run, which handles the USB hardware. It in turn causes the Serial driver to run, which might happen immediately, or could occur later. When the Serial driver runs, it takes steps to cause the program waiting on data to be signaled that bytes can now be read.

Some time later the operating system will schedule the user mode program which is waiting for data to run. This is the long and highly variable part. On Windows while running in a lower power mode, it can be up to 16 ms. Linux and MacOS have other timing. None of them are designed for real time scheduling.
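If you want to at least shrink that last, scheduler-dependent stage on Linux, one common mitigation (not a cure, and not something specific to Teensy) is to give the thread doing the serial reads a real-time scheduling class. A sketch, assuming you have root or CAP_SYS_NICE; the helper name is just illustrative:
Code:
#include <cstdio>
#include <pthread.h>
#include <sched.h>

// Ask the kernel to run the calling thread under SCHED_FIFO. This only shortens
// the "waiting to be scheduled" stage described above; it does nothing about the
// USB host controller or driver-side buffering.
bool requestRealtimePriority(int priority)
{
  sched_param param{};
  param.sched_priority = priority;
  if (pthread_setschedparam(pthread_self(), SCHED_FIFO, &param) != 0) {
    std::perror("pthread_setschedparam (needs root or CAP_SYS_NICE)");
    return false;
  }
  return true;
}

int main()
{
  requestRealtimePriority(10);
  // ... run the serial read loop here ...
  return 0;
}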
 
So if we assume the Teensy side has no room for improvement (10 bytes at 200 Hz with Serial.send_now()), is there really anything that can be done on the Linux machine, aside from switching to a real-time OS?



Regarding the previous comments, I've attached the source code.
(attachment: forum.zip)
 