PC/Teensy API for low-latency "packages" over USB serial communication



I make haptic devices that require very low latency (<0.3 ms, i.e. 3 kHz+) but exchange only very small packages of data, often < 64 bytes.
Normal APIs, especially on Windows, usually don't get the settings right to reach the low latency needed. Teensy 4 of course is
fundamentally awesome, and with the latency_test.zip shared in this thread and earlier: https://forum.pjrc.com/index.php?threads/usb-to-digital-i-o-delay.7826/
we can reach this goal, or even go faster.

So what I have done is to use that test code as the basis for my own implementation, with #ifdefs for Windows, Mac and Linux. And it works. Fast too.
But my code is far from clean, and it has some sections for handling corner cases that I am not proud of. It includes blocking and non-blocking modes, a fallback if a message was not read completely, etc. There is also the topic of identifying the COM port (recently discussed in another thread) and the topic of a trial send/receive as part of initializing the device in an end-user application.

The key, I think, is that I fundamentally don't want to read/write serial; I want to send/receive messages (frames of data). I cannot use HID or similar because of the <1 ms requirement. Hence this approach of serializing my small packages. I also use raw ASCII as encoding, which is suboptimal (but human-readable).

Now the question: is there any super-fast cross-platform serial API and corresponding Teensy lib for low-latency package communication, or any work in that direction? If not, is there any interest in one?

For reference I attach part of my code (can add more if interested).

My "packages":
  #include <cstdio>   // sprintf, sscanf

  constexpr int model_a = 1;
  constexpr int model_b = 2;
  struct device_to_pc_message {
    int model{1};
    int enc[6] = {0,0,0,0,0,0};
    int error_code{0};
    // Returns number of characters, also writes a trailing \0
    int toChars(char *c) const {
      return sprintf(c, "[%d,%d,%d,%d,%d,%d,%d,%d]\n",
        model, enc[0], enc[1], enc[2], enc[3], enc[4], enc[5], error_code);
    }

    // Returns 0 if success, 1 if fail
    int fromChars(const char *c){
      return 8 != sscanf(c, "[%d,%d,%d,%d,%d,%d,%d,%d]",
        &model, &enc[0], &enc[1], &enc[2], &enc[3], &enc[4], &enc[5], &error_code);
    }
  };

  struct pc_to_device_message {
    int ma[3] = {0,0,0};          // milliamps per motor
    // Returns number of characters, also writes a trailing \0
    int toChars(char *c) const {
      return sprintf(c, "[%d,%d,%d]\n", ma[0], ma[1], ma[2]);
    }
    // Returns 1 if success, 0 if fail (note: opposite convention to device_to_pc_message)
    int fromChars(const char *c){
      return 3 == sscanf(c, "[%d,%d,%d]", &ma[0], &ma[1], &ma[2]);
    }
  };
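
A minimal round-trip sketch for the "[a,b,c]" message format above, assuming a fixed-size buffer on both ends. It swaps sprintf for snprintf so that an oversized message cannot overrun the buffer. The names motor_command, encode and decode are hypothetical and not part of my actual code:

```cpp
#include <cstdio>
#include <cstddef>

// Hypothetical stand-in for pc_to_device_message, same wire format "[a,b,c]\n".
struct motor_command {
    int ma[3] = {0, 0, 0};  // milliamps per motor
};

// Writes the message into buf (at most cap bytes, including the trailing \0).
// Returns the number of characters written, or -1 if the message did not fit.
int encode(const motor_command& m, char* buf, std::size_t cap) {
    int n = std::snprintf(buf, cap, "[%d,%d,%d]\n", m.ma[0], m.ma[1], m.ma[2]);
    return (n < 0 || static_cast<std::size_t>(n) >= cap) ? -1 : n;
}

// Returns true if all three fields were parsed from a NUL-terminated string.
bool decode(const char* buf, motor_command& m) {
    return 3 == std::sscanf(buf, "[%d,%d,%d]", &m.ma[0], &m.ma[1], &m.ma[2]);
}
```

With a 64-byte buffer, encode returns the length to pass on to the serial write call, and a return value of -1 signals that the message must not be sent.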

Teensy side:
// returns 0 if success, error code else
// start_pos is where [ is located
int await_message(size_t& start_pos){
  bool start_found = false;
  int comma_count = 0;
  for (size_t buf_pos = 0; buf_pos < buf_len; buf_pos++) {
    while (Serial.available() <= 0){
      // wait
    }

    char c = Serial.read();
    in_buf[buf_pos] = c;
    if (c == ']'){
      if(!start_found)
        return 256; // no start character before end character
      if(comma_count != 2)
        return 512; // wrong length of array, should be [0,0,0]

      // Do we still have waiting messages? Discard and report error
      while (Serial.available() > 0){
        c = Serial.read();
        if(c!='\n' && c!='\r' && c!='\0')
          return 8; // trailing characters in message
      }

      return 0;
    }
    if (c == '['){
      start_found = true;
      start_pos = buf_pos;
    }
    if (c == ','){
      comma_count++;
    }
  }
  return 128; // buffer full
}

void loop() {

  // receive from serial
  size_t start_pos;
  int receive_error = await_message(start_pos);
  if(receive_error != 0) {
    return_last_values(receive_error); // Receive error
    return;
  }

  // parse
  pc_to_device_message pc_msg;
  if(pc_msg.fromChars(in_buf+start_pos)==0) {
    return_last_values(1024); // Parse error
    return;
  }
  // do stuff

  device_to_pc_message out_msg;
  // fill out_msg with data..

  int len = out_msg.toChars(out_buf);
  Serial.write(out_buf, len);
}

I suggest you take a look at the libraries EasyTransfer and SerialTransfer. They are not cross-platform, but they are simple enough that you could implement either one on the PC side. They both have the same purpose, which is to send/receive "packets" containing some "payload". Your packages would be the payload.

I have had a similar problem to yours: receiving temperature and humidity readings from multiple ESP32Cs on a single ESP32C and transmitting them to a Teensy 4.1 via UART, to be stored for future analysis.

At first I used Text transmission, like yourself, but got fed up of converting back to real data. So I started putting the data into a struct and sending that struct as a binary entity.

Obviously there is the problem of synchronisation between both ends.

I came up with the idea of embedding an identifier at the start of the data. You can then look to see if the identifier is valid.

Using a single byte as an identifier gives a 1-in-256 chance that random data is mistaken for a valid start, whilst a uint16_t gives 1 in 65,536 and a 32-bit identifier makes a false match less likely still. Speed was not a problem for me so I used a 32-bit identifier:
const uint32_t identifier = 57767959;

Store your data in something like this

#pragma pack(push,1)

const uint16_t identifier = 64959;

union identifierUnion {
    uint16_t Identifier;
    char     identTxt[sizeof(uint16_t)];            // allows individual elements (first in our case) of the identifier to be picked out
};

identifierUnion identMatch = { identifier };        // initialises the Identifier member

typedef struct teensyMsgType {                      // The data structure
    uint16_t ident;                                 // set to identifier before sending (no default initialiser, so the struct can live in a union)
    uint8_t  locationId;
    uint16_t temp;
    uint16_t humidity;
} teensyMsgType;

struct uniMsgType {
    union {
        teensyMsgType d;
        char          txt[sizeof(teensyMsgType)];
    };
};

uniMsgType recvMsg;
#pragma pack(pop)

The Send routine is extremely simple:
void Send(const teensyMsgType* msg){
    Serial1.write((const char*)msg, sizeof(teensyMsgType));
}

When sending the data don't forget to put the identifier in the msg.
    teensyMsg.ident = identifier;
The receive routine is rather more complicated, as we need to ensure synchronisation.
Here is a receive routine for the raw data, at this stage not necessarily synchronised.
I have shown the synchronisation part as part of the loop. It could easily be its own separate routine.

bool receive(uniMsgType* msg) {      // Get a chunk of raw data
    return (Serial1.readBytes((char*)msg, sizeof(uniMsgType)) == sizeof(uniMsgType));
}

void loop() {
    while (Serial1.available() && (Serial1.peek() != identMatch.identTxt[0])) {
        Serial.print(Serial1.read());           // Just do something with the 1st byte...keep on going until it looks like a synch
    }                                           // could put a timeout

    while (Serial1.available() >= (int)sizeof(uniMsgType)) {  // process once a full packet has arrived

        if (receive(&recvMsg)) {                              // get the data packet
            if (recvMsg.d.ident == identifier) {
                GotDataNowDoSomethinhWithIt();                // ok the identifier matches, do something with the data
            } else {
                Serial.println("Bad data, will get back into synch");   // although the 1st byte matched the second did not
            }                                                           // so go round again.
        } else Serial.println("Unable to Receive Data");
    }
}


Obviously this code could be MUCH improved, it throws away potentially valid data until it guarantees a synch.

I just give this as an idea for what you could do. Something similar to this works admirably well for the collection
of remote data into one on my system.
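
One way to lose less data while resynchronising is to slide forward a single byte when a candidate frame start fails the identifier check, instead of consuming a whole packet's worth of bytes. Below is a minimal sketch of that idea in portable C++ that works on a byte buffer rather than on Serial1. The names extract_frames, Frame and kIdent are hypothetical, and the layout is assumed to mirror the packed teensyMsgType above (7 bytes, little-endian as on Teensy):

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

constexpr std::uint16_t kIdent = 64959;   // same value as identifier above (0xFDBF)
constexpr std::size_t kFrameSize = 7;     // 2 + 1 + 2 + 2 bytes, packed

struct Frame {
    std::uint8_t  locationId;
    std::uint16_t temp;
    std::uint16_t humidity;
};

// Little-endian 16-bit read that does not depend on host alignment rules.
static std::uint16_t read_u16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Scans buf for frames. Consumed bytes are erased; an incomplete tail stays
// in buf so that the next chunk of received bytes can complete it.
std::vector<Frame> extract_frames(std::vector<std::uint8_t>& buf) {
    std::vector<Frame> out;
    std::size_t pos = 0;
    while (buf.size() - pos >= kFrameSize) {
        if (read_u16(&buf[pos]) != kIdent) {
            ++pos;            // not a frame start: skip ONE byte only
            continue;
        }
        Frame f;
        f.locationId = buf[pos + 2];
        f.temp       = read_u16(&buf[pos + 3]);
        f.humidity   = read_u16(&buf[pos + 5]);
        out.push_back(f);
        pos += kFrameSize;
    }
    buf.erase(buf.begin(), buf.begin() + static_cast<std::ptrdiff_t>(pos));
    return out;
}
```

The identifier bytes can still occur inside a payload, so a false sync remains possible. Adding a checksum to each frame would be one way to catch that, but that goes beyond what is shown here.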

Thanks for the replies! My main trouble is actually the PC side, especially initialization and maintaining a stable fast loop.
My PC loop looks like this (C++):

send data; // to receive first package
loop {
    wait and receive data; // blocking
    compute response;
    send response data;
}

It looks simple but it gets hairy. You have to send the right amount of data at the same time so that it ends up in the same USB frame (I presume). For example, I have noticed that speed drops for some reason if you send < 64 bytes. What do you use on the PC side? Completely written from scratch?

This general approach, where software on the PC sends a command to Teensy and software on Teensy replies to each command, usually (pretty much always) results in poor performance. The main problem is PC operating systems are not designed for hard real time tasks. Usually your program will run within 1ms of the desired moment, but operating system scheduling latency can be many milliseconds. The device drivers are also far from ideal. It can all add up to a lot of uncertainty and variability in timing. It's simply not a good design.

A better approach is to control the timing on the Teensy side, where code on Teensy always transmits a new message on the desired schedule regardless of whether the PC has sent anything. If timing is important, you would probably embed timing details or sequence numbers or other info into each message. Then on the PC side, you don't need to transmit anything at all. You can just receive the messages. You can decode the timestamps or sequence numbers if you want to handle errors or have confirmation of the timing.
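
As an illustration of the PC-side check, a sequence-number tracker can be quite small. This sketch assumes each message carries a 16-bit counter that the Teensy increments once per message and lets wrap around. SeqTracker is a hypothetical name, not an existing library class:

```cpp
#include <cstdint>

// Tracks a wrapping 16-bit sequence counter and counts messages that never arrived.
class SeqTracker {
public:
    // Feed the sequence number of each received message.
    // Returns how many messages were skipped just before this one.
    std::uint16_t update(std::uint16_t seq) {
        std::uint16_t missed = 0;
        if (have_last_) {
            // Truncating to 16 bits handles the wrap-around from 65535 to 0.
            missed = static_cast<std::uint16_t>(seq - last_ - 1);
        }
        last_ = seq;
        have_last_ = true;
        total_missed_ += missed;
        return missed;
    }

    std::uint32_t total_missed() const { return total_missed_; }

private:
    std::uint16_t last_ = 0;
    bool have_last_ = false;
    std::uint32_t total_missed_ = 0;
};
```

The sketch assumes messages arrive in order and are never duplicated. A repeated sequence number would be reported as 65535 missed messages, so a real implementation may want to treat that case separately.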

On the Teensy side, you can use Serial.flush() or Serial.send_now() after transmitting all the bytes of a message. These functions make the data ready to transmit sooner, but can result in partially filled USB packets which make less efficient use of USB bandwidth. As a general rule, you probably should avoid calling these functions more than 1000 times per second if running at 12 Mbit speed, or 8000 per second if running at 480 Mbit speed. If your total data rate is high, you probably want your message to sit in a partially filled buffer, as the next message will be transmitted very soon anyway. The key is collecting accurate timing info to put into the messages, so your code on the PC side gets high quality timing data when it's all received and decoded.

In my case I do need to respond, maybe not to every transmission from Teensy, but I need the most recent reading. It does have benefits, since my haptic device can also be used as a pure input device, although most often it is used bidirectionally.

If I make it asynchronous and let Teensy send continuously, I guess I (or the OS) have to handle a buffer. Is this a problem? What happens if Teensy spams the PC with readings that are not read? What if I send faster than the PC can receive?

Would you transmit to Teensy asynchronously from the readings, in another thread? And e.g. use Threads on Teensy to handle reading and transmitting new values separately?
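
To make the "most recent reading" idea concrete, here is a minimal sketch of what I have in mind on the PC side: a reader thread drains the port as fast as data arrives and overwrites a single shared slot, so stale readings are dropped in my own code rather than piling up in a buffer. LatestSlot and Reading are hypothetical names, and whether this is fast enough in practice is untested:

```cpp
#include <mutex>

// Hypothetical decoded message, loosely modelled on device_to_pc_message.
struct Reading {
    int enc[6] = {0, 0, 0, 0, 0, 0};
    unsigned seq = 0;
};

// Holds only the newest value; older unread values are overwritten.
template <typename T>
class LatestSlot {
public:
    // Called by the reader thread for every decoded message.
    void publish(const T& value) {
        std::lock_guard<std::mutex> lock(m_);
        value_ = value;
        fresh_ = true;
    }

    // Called by the control loop. Copies the newest unread value into out
    // and returns true, or returns false if nothing new has arrived.
    bool take(T& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (!fresh_) return false;
        out = value_;
        fresh_ = false;
        return true;
    }

private:
    std::mutex m_;
    T value_{};
    bool fresh_ = false;
};
```

The control loop would call take() once per iteration and compute a response only when it returns true, while the reader thread calls publish() for every message it decodes.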