Teensy 4.1 coding guidance for USB serial communication

samm_flynn

Active member
I’m using a Teensy 4.1 and need to transfer large amounts of data in a duplex manner. I have a few questions:

1. Performance of Serial.send_now()

I noticed that using Serial.send_now() reduces my data transfer speed. Specifically:

  • Without send_now(): 20.97 MB/s
  • With send_now(): 10.26 MB/s
  • Chunk size = 512 bytes, payload size = 293,472 bytes
Here’s my function for sending data from Teensy to a Python program on PC:
C++:
void sendBinaryPayload(uint8_t *payload, size_t payloadSize, size_t chunkSize) {
  size_t bytesSent = 0;
  while (bytesSent < payloadSize) {
    size_t currentChunkSize = min(chunkSize, payloadSize - bytesSent);
    Serial.write(&payload[bytesSent], currentChunkSize);
    // Without send_now(): 20.97 MB/s. Payload size: 293472 bytes | Time taken: 0.0133486 s | totalBytesRead=293472
    // Serial.send_now();
    // With send_now(): 10.26 MB/s. Payload size: 293472 bytes | Time taken: 0.0272901 s | totalBytesRead=293472
    bytesSent += currentChunkSize;
  }
}
Is it normal for send_now() to reduce throughput when writing full 512-byte chunks, or is there something wrong with my code?

2. USB Speed Limitations

The Teensy 4.1 supports USB 2.0 High-Speed (480 Mbps = 60 MB/s), but I never reach anywhere close to that.
My best observed speeds:
PC -> Teensy: 7.38 MB/s (using PySerial)
Teensy -> PC: 20.97 MB/s (without send_now())
I’m running this on a high-performance gaming laptop with two separate processes (not threads) for communication.
Could Windows overhead be the bottleneck here?
(I'm not necessarily looking for higher speeds, just curious why it's capped so much lower than the theoretical max. I am using BAUD_RATE = 4608000 in Python.)
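
To make the comparison with 480 Mbit/s concrete, the measured figures can be converted between units. The following is a minimal sketch; the names `Throughput` and `measure` are made up for illustration and are not part of any Teensy or PySerial API.

```cpp
#include <cassert>

// Measured throughput expressed in the units that tend to get mixed up.
struct Throughput {
    double bytesPerSec;    // raw bytes per second
    double mibPerSec;      // binary megabytes per second (1 MiB = 1,048,576 bytes)
    double mbPerSec;       // decimal megabytes per second (1 MB = 1,000,000 bytes)
    double mbitPerSec;     // megabits per second, counting payload bits only
    double percentOfLine;  // share of the given raw line rate, in percent
};

// payloadBytes: bytes transferred; elapsedSec: wall-clock time for the transfer;
// lineRateMbit: raw signalling rate of the link in Mbit/s (480 for USB 2.0 High-Speed).
Throughput measure(double payloadBytes, double elapsedSec, double lineRateMbit) {
    assert(elapsedSec > 0.0 && lineRateMbit > 0.0);
    Throughput t{};
    t.bytesPerSec   = payloadBytes / elapsedSec;
    t.mibPerSec     = t.bytesPerSec / (1024.0 * 1024.0);
    t.mbPerSec      = t.bytesPerSec / 1e6;
    t.mbitPerSec    = t.bytesPerSec * 8.0 / 1e6;
    t.percentOfLine = t.mbitPerSec / lineRateMbit * 100.0;
    return t;
}
```

With the numbers from the post, `measure(293472, 0.0133486, 480)` gives about 20.97 MiB/s, which is about 21.99 MB/s in decimal units, or roughly 176 Mbit/s of payload. That is a little over a third of the raw 480 Mbit/s line rate. So the "MB/s" figures quoted above appear to be binary megabytes, while "60 MB/s" is a decimal figure.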

3. Structured Data Over Serial

I need to send and receive C-style structs over serial between PC and Teensy.
Are there any existing frameworks or libraries that handle this nicely?
I’m currently implementing my own data layer, but it’s time-consuming. Ideally, I’d like a library that:
- Fragments and reassembles large data packets (sequence numbers, CRC, and all the other data-layer black magic).
- Handles error detection.

Does anything like this exist for Teensy/Arduino?
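
For reference, the kind of data layer described above can be sketched in a few dozen lines. This is only an illustration under stated assumptions: the frame layout (16-bit sequence number, 16-bit frame count, 16-bit payload length, payload, CRC-32, all little-endian) and every name below are hypothetical and do not correspond to any existing library's wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320, the zlib/Ethernet variant).
uint32_t crc32(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Hypothetical frame: [seq:2][total:2][len:2][payload:len][crc32:4], little-endian.
constexpr size_t kHeaderSize = 6;
constexpr size_t kCrcSize = 4;
using Frame = std::vector<uint8_t>;

static void putU16(Frame &f, uint16_t v) {
    f.push_back(static_cast<uint8_t>(v & 0xFF));
    f.push_back(static_cast<uint8_t>(v >> 8));
}
static uint16_t getU16(const uint8_t *p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// Split `len` bytes into frames carrying at most `maxPayload` bytes each.
std::vector<Frame> fragment(const uint8_t *data, size_t len, size_t maxPayload) {
    assert(maxPayload > 0 && maxPayload <= 0xFFFF);
    const size_t total = (len + maxPayload - 1) / maxPayload;
    assert(total <= 0xFFFF);
    std::vector<Frame> frames;
    for (size_t seq = 0; seq < total; ++seq) {
        const size_t offset = seq * maxPayload;
        const size_t n = (len - offset < maxPayload) ? len - offset : maxPayload;
        Frame f;
        putU16(f, static_cast<uint16_t>(seq));
        putU16(f, static_cast<uint16_t>(total));
        putU16(f, static_cast<uint16_t>(n));
        f.insert(f.end(), data + offset, data + offset + n);
        const uint32_t crc = crc32(f.data(), f.size());  // covers header + payload
        for (int i = 0; i < 4; ++i) f.push_back(static_cast<uint8_t>(crc >> (8 * i)));
        frames.push_back(f);
    }
    return frames;
}

// Check every frame (CRC, order, count, length) and concatenate the payloads.
bool reassemble(const std::vector<Frame> &frames, std::vector<uint8_t> &out) {
    out.clear();
    for (size_t i = 0; i < frames.size(); ++i) {
        const Frame &f = frames[i];
        if (f.size() < kHeaderSize + kCrcSize) return false;
        const size_t body = f.size() - kCrcSize;
        uint32_t crc = 0;
        for (int b = 0; b < 4; ++b) crc |= static_cast<uint32_t>(f[body + b]) << (8 * b);
        if (crc != crc32(f.data(), body)) return false;         // corrupted frame
        if (getU16(&f[0]) != i) return false;                    // missing or out of order
        if (getU16(&f[2]) != frames.size()) return false;        // wrong frame count
        if (getU16(&f[4]) != body - kHeaderSize) return false;   // length mismatch
        out.insert(out.end(), f.begin() + kHeaderSize, f.begin() + body);
    }
    return true;
}

// Convenience wrappers for plain-data structs. Copying raw bytes assumes both
// ends agree on endianness, padding and field sizes.
template <typename T>
std::vector<Frame> fragmentStruct(const T &value, size_t maxPayload) {
    static_assert(std::is_trivially_copyable<T>::value, "struct must be plain data");
    uint8_t raw[sizeof(T)];
    std::memcpy(raw, &value, sizeof(T));
    return fragment(raw, sizeof(T), maxPayload);
}

template <typename T>
bool reassembleStruct(const std::vector<Frame> &frames, T &value) {
    static_assert(std::is_trivially_copyable<T>::value, "struct must be plain data");
    std::vector<uint8_t> bytes;
    if (!reassemble(frames, bytes) || bytes.size() != sizeof(T)) return false;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return true;
}

// Example payload type (hypothetical).
struct Sample {
    uint32_t timestampMs;
    float values[3];
};
```

A sender would call `fragmentStruct(sample, 512)` and write each frame to the port; the receiver collects the frames and calls `reassembleStruct`. Retransmission, timeouts and finding frame boundaries in a byte stream are deliberately left out. On the PC side the same layout would have to be unpacked explicitly, for example with Python's `struct` module.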


4. I need to send and receive structs over serial between the PC and the Teensy. Is there already a framework that does this nicely?

- Is this fantastic library by @Frank B hardware accelerated? I want to use it for CRC32 calculation.

Apologies in advance if there is any mistake in my calculations. Not very good at maths.

Thanks in Advance.
 

Hi,

Disclaimer: I started working on my first Teensy project a month ago, but I have run into some of the problems you mention.

Serial.send_now() increases the number of USB transactions, and in my experience that limits performance considerably.

For structured data transfer I found this: https://github.com/PowerBroker2/pySerialTransfer. I ended up writing my own logic, as I was already mostly done with it by the time I found the library, but it seems nice enough.

Good luck!
 
For USB Serial, I was transferring large chunks. I later found that if your buffer is entirely full, send_now() works pretty well; it depends on how you stage the transfer.

I loved SerialTransfer, but one limitation for me was the maximum buffer size. I might rewrite the library, or write something similar and bare-bones for my use case: I am sending float arrays that fill the entire RAM2.

Unfortunately, coming from a controls background without any serious programming experience, I am still researching serial protocols.

If someone could make the Teensy 4.1 DCP, that would be so nice.
 
There are tradeoffs in when to use the send_now() function (which is the same as flush()).

That is, you are trading off latency against throughput.

For example, if your code sends something to the PC and then waits for a response, flush() is very useful: without it, the Teensy sits there until the timeout fires and the partial buffer is finally sent to the PC.

I also sometimes use it in debug output, to make sure the output is sent before some code runs that might hard hang or crash.
Many times I have had debug code something like:
Code:
Serial.println("Before step 1");
do_step(1);
Serial.println("Before step 2");
do_step(2);
Serial.println("Before step 3");
do_step(3);
...
And maybe "Before step 1" prints, but none of the others.
So I spend a lot of time looking at what do_step(1) does... But it turns out it actually crashed in do_step(3), and the debug messages were sitting in the USB buffers and never output...

If, however, you want high throughput, you should avoid or at least minimize these calls.
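
The throughput side of that tradeoff can be illustrated with a toy model that counts how many transfers leave a staging buffer. This is a sketch only: `StagedWriter` and `countTransfers` are made-up names, and the buffer capacity is a free parameter, not a statement about how the Teensy core is actually implemented.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of a transmit path that stages bytes in a fixed-size buffer and
// hands the buffer to the hardware when it is full or when flush() is called.
class StagedWriter {
public:
    explicit StagedWriter(size_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    // Queue `len` bytes; each time the staging buffer fills, one transfer goes out.
    void write(size_t len) {
        while (len > 0) {
            const size_t room = capacity_ - fill_;
            const size_t n = (len < room) ? len : room;
            fill_ += n;
            len -= n;
            if (fill_ == capacity_) emit();
        }
    }

    // Send whatever is staged, even if the buffer is only partly full.
    void flush() {
        if (fill_ > 0) emit();
    }

    size_t transfers() const { return transfers_; }

private:
    void emit() {
        ++transfers_;
        fill_ = 0;
    }

    size_t capacity_;
    size_t fill_ = 0;
    size_t transfers_ = 0;
};

// Count the transfers needed to send `payload` bytes in `chunk`-sized writes,
// optionally flushing after every write.
size_t countTransfers(size_t payload, size_t chunk, size_t capacity, bool flushEachChunk) {
    assert(chunk > 0);
    StagedWriter w(capacity);
    for (size_t sent = 0; sent < payload;) {
        const size_t n = (payload - sent < chunk) ? payload - sent : chunk;
        w.write(n);
        if (flushEachChunk) w.flush();
        sent += n;
    }
    w.flush();  // push out the final partial buffer
    return w.transfers();
}
```

For the 293,472-byte payload written in 512-byte chunks, a hypothetical 2048-byte staging buffer needs 144 transfers without per-chunk flushing and 574 with it. If the staging buffer were exactly 512 bytes, both cases would need 574, which is consistent with the observation that flushing an already full buffer costs little.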
 


The general rule of thumb is to expect at least 10 bits per byte. On true serial this certainly holds: there is at least a start bit and a stop bit, so sending 8 bits takes at least 10 bits on the wire. USB has overhead too: potentially stuffing bits (like CAN), and also packet headers. These things add up. Also, you are being polled by the PC, so you only get as much time on the bus as the PC is willing to give you. You can't monopolize the whole bus.

TL;DR: you can't just send 480 megabits of traffic any time you want. Being able to send over 20 megabytes per second is very good. You should not expect much more out of USB2. If you need to go faster, you'd likely need USB3 as well as a different processor capable of USB3 connections.
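
The rule of thumb can be turned into rough numbers. The sketch below uses made-up function names. The second function takes as an assumption the commonly quoted limit of 13 bulk packets of 512 bytes per 125 µs microframe on a high-speed bus; treat that as an upper bound on paper, not something a real host will schedule.

```cpp
#include <cassert>

// Rule-of-thumb payload rate: assume every payload byte costs `bitsPerByte`
// bits on the wire (10 in the rule of thumb above).
double ruleOfThumbBytesPerSec(double lineRateBitsPerSec, double bitsPerByte) {
    assert(bitsPerByte >= 8.0);
    return lineRateBitsPerSec / bitsPerByte;
}

// Ceiling when the bus carries at most `packetsPerFrame` packets of
// `packetBytes` payload bytes in each frame lasting `frameSec` seconds.
double packetCeilingBytesPerSec(double packetsPerFrame, double packetBytes, double frameSec) {
    assert(frameSec > 0.0);
    return packetsPerFrame * packetBytes / frameSec;
}
```

`ruleOfThumbBytesPerSec(480e6, 10)` gives 48 million bytes per second, and `packetCeilingBytesPerSec(13, 512, 125e-6)` gives about 53.2 million. The 293,472 bytes in 0.0133486 s reported earlier in the thread work out to about 22.0 million bytes per second, a bit under half of either figure.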
 
At this point I am quite content with how USB Serial transfers data. I have a new question: is there any way to trigger an interrupt based on the USB Serial connection state? Currently my code polls Serial.dtr(), and I was wondering if there is a cleaner way on the Teensy 4.1.
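
In case polling has to stay, one way to keep it tidy is an edge detector that fires a callback only when the state changes. A minimal sketch follows; `ConnectionWatcher` is a made-up helper, not part of the Teensy core. The reader callable is assumed to wrap whatever signal is being polled, such as Serial.dtr().

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Edge detector around a polled connection flag. `readState` is any callable
// returning the current state; `onChange` runs only when that state flips.
class ConnectionWatcher {
public:
    using Reader = std::function<bool()>;
    using Callback = std::function<void(bool connected)>;

    ConnectionWatcher(Reader readState, Callback onChange)
        : readState_(std::move(readState)), onChange_(std::move(onChange)) {
        assert(readState_ && onChange_);
    }

    // Call once per main-loop iteration.
    void poll() {
        const bool now = readState_();
        if (now != last_) {
            last_ = now;
            onChange_(now);
        }
    }

private:
    Reader readState_;
    Callback onChange_;
    bool last_ = false;  // start out as "disconnected"
};
```

On the Teensy, the reader could be a lambda returning `Serial.dtr()`, with `poll()` called from `loop()`. The connect and disconnect handling then lives in one callback instead of being scattered through the main loop.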
 
PC -> Teensy: 7.38 MB/s (using PySerial)
My guess is that you could get more throughput with something other than Python. At least in my experience with Python on other platforms, such as the RPi, I found it less than optimal. I have not used it much on the PC, other than in a few cases like writing a High Level Analyzer for the Saleae Logic Analyzer... But it is for sure not my native language.

Another option would be to use something like RawHID, although the current released implementation only allows you to send 64-byte packets. I have/had a version that could send/receive 512-byte packets, which had better throughput, but so far this has not been pulled in.
 
Thanks for sharing, will keep it in mind
 