Teensy 4.1 serial USB crash

I'm working on a project with OctoWS2811 and a Teensy 4.1. I have an LED matrix running using 32 separate pins as outputs, each with a strip length of 300 LEDs.
I'm sending packets of data over USB to the Teensy from a Python script on a Raspberry Pi in the following format:
{led number low byte, led number high byte, red, green, blue}

I'm using the following sketch to display the packets received:

Code:
#include <OctoWS2811.h>
#define BUFFER_LEN 50000
#define NUM_LEDS 300
char packetBuffer[BUFFER_LEN];
const int numPins = 32;
byte pinList[numPins] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 24, 25, 26, 27, 28, 22, 21, 20, 19, 18, 17, 16, 15, 41, 40, 39, 38, 37, 36, 35, 34 };
const int ledsPerStrip = NUM_LEDS;

const int bytesPerLED = 3;  // change to 4 if using RGBW
DMAMEM int displayMemory[ledsPerStrip * numPins * bytesPerLED / 4];
int drawingMemory[ledsPerStrip * numPins * bytesPerLED / 4];
const int config = WS2811_GRB | WS2811_800kHz;
OctoWS2811 leds(ledsPerStrip, displayMemory, drawingMemory, config, numPins, pinList);
int totalLeds = 0;
long lastTime;
bool ledOn;

uint16_t N = 0;
bool l = true;
int frame = 0;
void setup() {
  Serial.begin(9600);  
  leds.begin();
  leds.show();
  pinMode(13, OUTPUT);
  totalLeds = ledsPerStrip * numPins;
}
void loop() {
  if (Serial.available() > 0) {
    int len = Serial.readBytes(packetBuffer, BUFFER_LEN);
    for (int i = 0; i < len; i += 5) {
      packetBuffer[len] = 0;
      N = ((packetBuffer[i] << 8) + packetBuffer[i + 1]);
      if (N < totalLeds) {
        leds.setPixel(N, (uint8_t)packetBuffer[i + 2], (uint8_t)packetBuffer[i + 3], (uint8_t)packetBuffer[i + 4]);
      }
      if (N == 0) {
        leds.show();
        frame++;
        if (frame == 30) {
          digitalWrite(13, l);
          l = !l;
          frame = 0;
        }
      }
    }
  }
}

Each frame is 5 * 32 * 260 = 41,600 bytes, so around 42 KB (at 30 fps, around 1,248 KB per second).
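
As a quick check of that arithmetic, here is a minimal sketch in plain C++. The constant and function names are made up for illustration; the numbers are the ones quoted in the post (260 LEDs sent per strip, not the 300 configured in the sketch).

```cpp
#include <cstdint>

// Values taken from the post: 5 bytes per LED packet, 32 strips,
// 260 LEDs updated per strip.
constexpr uint32_t kBytesPerPacket = 5;
constexpr uint32_t kStrips = 32;
constexpr uint32_t kLedsPerStripSent = 260;

// Bytes needed for one full frame.
constexpr uint32_t frameBytes() {
  return kBytesPerPacket * kStrips * kLedsPerStripSent;
}

// Sustained payload rate in bytes per second at a given frame rate.
constexpr uint32_t bytesPerSecond(uint32_t fps) {
  return frameBytes() * fps;
}

static_assert(frameBytes() == 41600, "about 42 KB per frame");
static_assert(bytesPerSecond(30) == 1248000, "about 1.25 MB/s at 30 fps");
```

This is payload only; it says nothing about how the host actually schedules the USB transfers.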

The problem I'm having is that the serial receiving element of the system always seems to hang after some amount of time. I.e. the loop() continues to run, but no longer receives any data. On the Python script, the sending serial port will no longer accept any data, and just hangs when trying to write to it.

So I'm assuming some sort of low level crash where a buffer somewhere is being overwhelmed. Here's what I've tried without success:

Splitting each packet in two on the sending end
Sending each 5 byte group individually (too slow)
Introducing delays between sending packets (these end up slowing everything down too much)
Getting the Teensy to send a serial confirmation after every frame, and the Python script waiting for this to continue (this made everything incredibly slow)
Trying different BUFFER_LEN ranging from 100 to 50000
Increasing CDC_RX_SIZE_480 to 1024 in packages\teensy\hardware\avr\1.59.0\cores\teensy4\usb_desc.h
Different USB ports on Raspberry Pi
Different USB cables

I'm not really sure what else to try apart from just finding a way to reset the Teensy when the serial stops working (e.g. using GPIO pin, watchdog or cycling power to Teensy). Or whether this is just a totally unrealistic data rate to expect over USB and I should consider some other way of getting the data to the Teensy.
 
If you have a second T4.1, can you create a sketch to send the same data from T4.1 #1 through its host port to the USB port on T4.1 #2, to see if it's really the receiver that is hanging up? Based upon other problem posts in this forum, I'd lean towards the possibility that your problem is at the Python end.

Mark J Culross
KD5RXT
 
Thanks for the tip Mark! Yes, good idea - I wrote a very basic test in C++ instead of Python, and unfortunately had the same problem on both Raspberry Pi and x64 platforms.

The problem seems to be caused by the leds.show(), when this is commented out there's no problem. I also tried waiting until leds.busy() was false, but this also didn't help.
 
I would not expect problems with this amount of data over USB. However, wired Ethernet could be more reliable, and is what I would prefer.

The problem I'm having is that the serial receiving element of the system always seems to hang after some amount of time. I.e. the loop() continues to run, but no longer receives any data. On the Python script, the sending serial port will no longer accept any data, and just hangs when trying to write to it.

So I'm assuming some sort of low level crash where a buffer somewhere is being overwhelmed. ...
As I understand the USB port from the Teensy 4.1 is connected to the Raspberry Pi. So how exactly can you see what exactly is happening at the Teensy? Or do you just see that the LEDs do not get updated anymore?

I have been looking at your code for some time now and don't see an array index growing larger than the array's size (a mistake I make from time to time, and one that can cause hard-to-find freezes).

How do you make sure that the data does not shift and get corrupted when, e.g., 6 bytes (not a multiple of 5) are in the current buffer? Is it possible that at some point when Serial.readBytes is called, one or more trailing bytes are ignored, so that the data is shifted in the next round when Serial.readBytes is called again? Depending on the data, this could have different effects, for example N always being zero or never being zero.
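
One way to rule out that kind of shift is to carry any partial packet over to the next read rather than parsing each read in isolation. The following is only a sketch in plain C++; PacketAssembler and feedChunk are hypothetical names, not part of the original sketch or of any Teensy library.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: collects bytes from reads of any size and reports
// complete 5-byte packets, keeping a partial packet for the next call
// instead of dropping or misaligning it.
class PacketAssembler {
 public:
  static constexpr size_t kPacketSize = 5;

  // Feed one received byte. Returns true when a full packet is ready in
  // packet(); the fill position then restarts for the next packet.
  bool push(uint8_t b) {
    buf_[fill_++] = b;
    if (fill_ < kPacketSize) return false;
    fill_ = 0;
    return true;
  }

  const uint8_t* packet() const { return buf_; }
  size_t pending() const { return fill_; }

 private:
  uint8_t buf_[kPacketSize] = {};
  size_t fill_ = 0;
};

// Feeds a chunk of arbitrary length and counts the complete packets seen.
inline size_t feedChunk(PacketAssembler& a, const uint8_t* data, size_t len) {
  size_t complete = 0;
  for (size_t i = 0; i < len; ++i) {
    if (a.push(data[i])) ++complete;
  }
  return complete;
}
```

In a real sketch the packet would be processed at the point where push() returns true. Note that this only protects against packets split across reads; it cannot recover if bytes are actually lost on the way.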
 
I would strongly recommend reading 5 bytes at a time (one complete packet) rather than reading as much as possible and then parsing it in chunks of 5. Otherwise there's likely to be a split packet due to usb transfer boundaries that will throw things out of sync.
e.g.:
Code:
while (Serial.available() >= 5) {
    Serial.readBytes(packetBuffer, 5);
    // process packet in packetBuffer here
}

(There's a bug in the current code: if Serial.readBytes() fills the entire buffer, "packetBuffer[len] = 0;" will write beyond the end of it.)
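
For the "process packet" step, a minimal decoding sketch in plain C++ might look like the following. LedUpdate, decodePacket and inRange are illustrative names only. The byte order follows the sketch in the first post (byte 0 is shifted up as the high byte), which is the opposite of the order listed in that post's packet description, so the sender needs to be checked against whichever is chosen.

```cpp
#include <cstdint>

// One decoded LED update. The names here are illustrative only.
struct LedUpdate {
  uint16_t index;
  uint8_t r, g, b;
};

// Decodes one 5-byte packet. Byte order follows the sketch in the first
// post, which treats byte 0 as the high byte of the LED number.
inline LedUpdate decodePacket(const uint8_t* p) {
  const uint16_t index = static_cast<uint16_t>((p[0] << 8) | p[1]);
  return LedUpdate{index, p[2], p[3], p[4]};
}

// Mirrors the sketch's range check against ledsPerStrip * numPins.
inline bool inRange(const LedUpdate& u, uint32_t totalLeds) {
  return u.index < totalLeds;
}
```

Working on a fixed 5-byte array this way also avoids the terminating-zero write, which the packet parsing never needed.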
 
As I understand the USB port from the Teensy 4.1 is connected to the Raspberry Pi. So how exactly can you see what exactly is happening at the Teensy? Or do you just see that the LEDs do not get updated anymore?
In terms of the Teensy, I know the loop is still running because when I implement a 1 second blink in the loop, the LED carries on blinking, even after the USB dies. I also know that no data is being received, as the USB serial port on the Raspberry Pi side basically closes itself and refuses to take any more data until I unplug the Teensy from USB and plug it back in (this happens with both the Python and C++ versions).

I've implemented a watchdog timer from here and it does work to bring the USB serial port back up after it dies, and then the Raspberry Pi can reconnect and carry on, but not an ideal solution!
 
I would strongly recommend reading 5 bytes at a time (one complete packet) rather than reading as much as possible and then parsing it in chunks of 5. Otherwise there's likely to be a split packet due to usb transfer boundaries that will throw things out of sync.
Thanks for the suggestion. I gave it a go and implemented as you suggested, without a buffer or loop but unfortunately it still seems to have the same problem!

Yes, it's a good point about losing sync. I did wonder about that, and that's why I put in the check that N is an acceptable number. For the rest of the values it shouldn't matter if they are completely out of sync - it will just produce garbage output on the LEDs, but I don't think it should crash the USB serial in this manner. Unless it causes problems if setPixel is called multiple times for the same pixel before leds.show(), but I had a quick look at the OctoWS2811 code and I'm not sure that should cause a problem?
 
I wrote a very basic test in C++ instead of Python to test, and had the same problem unfortunately on both Raspberry Pi and x64 platforms when I tested it.

Can you show us the PC side transmitting code?

Is this C++ code and Python code fairly simple to compile and run? Or perhaps you could create a simple version which just sends fixed patterns? Without the transmitting code, we can only read your Arduino sketch and guess. Both sides are needed to try actually reproducing the problem.
 
Is this C++ code and Python code fairly simple to compile and run? Or perhaps you could create a simple version which just sends fixed patterns?
I've attached the C++ code which just sends fixed patterns. It's quite intense as it doesn't attempt to limit the data rate (the actual data rate is much lower in practice), but it quickly demonstrates the USB serial lockup I've been having.

Thanks for the help!
 

Attachments:
  • sw.cpp (2.6 KB)
  • led_teensy_test.ino (2.4 KB)

Was wondering if anyone had any thoughts on the above? I've managed to improve the situation a bit by only calling leds.show() after a third of the LEDs have been updated, but it still hangs roughly every hour. I can also provide a Python demo script if the C++ one above isn't easily accessible.
 
I'm not familiar with the Octo-Leds, but I have done a lot of high-speed comms between PC and Teensy. In one of your earlier posts you noted that the problems disappeared if you commented out leds.show(). For me, that immediately raised the question: How long does it take for leds.show() to execute? If it takes longer than the time for the PC to send more bytes than the serial buffer size, you are in trouble.

My approach to debugging would be to connect an oscilloscope, set a test pin high before leds.show(), and clear it afterward. The width of the pulse will tell you how long leds.show() takes to execute, and the time between pulses will show you how regular the timing of data reception is.

Lacking an oscilloscope, you can occasionally capture micros() before and after leds.show() and display the difference.
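
A small helper for that micros()-based measurement could look like this. It is a sketch in plain C++ under the assumption that timestamps are 32-bit microsecond counts like those returned by Arduino's micros(); DurationStats is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper: tracks the shortest and longest duration seen.
// Timestamps are 32-bit microsecond counts; unsigned subtraction stays
// correct when the counter rolls over.
struct DurationStats {
  uint32_t minUs = UINT32_MAX;
  uint32_t maxUs = 0;
  uint32_t count = 0;

  void add(uint32_t startUs, uint32_t endUs) {
    const uint32_t elapsed = endUs - startUs;  // wraps modulo 2^32
    if (elapsed < minUs) minUs = elapsed;
    if (elapsed > maxUs) maxUs = elapsed;
    ++count;
  }
};
```

In the sketch, this would be fed by taking micros() immediately before and after leds.show() and passing both values to add(), then printing minUs and maxUs every few hundred frames rather than on every call.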

It's important to remember that processes on the PC that are not part of your program can temporarily back up USB transfers to the Teensy. When the PC decides to catch up, it may send a bunch of packets at once, overrunning the Teensy buffer. That's one of the things that might show up as an unusually long interval between calls to leds.show(). Also keep in mind that, on a high-speed USB link, a burst of delayed data can arrive at more than 8 KB per millisecond.

You might also keep track of the maximum value you get with Serial.available(). If that number ever gets above about half the serial buffer size, you may have a problem.
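
Tracking that maximum needs only a few lines. The sketch below is plain C++ with a hypothetical HighWaterMark class; the buffer capacity is passed in by the caller because the actual size of the Teensy USB serial receive buffer is not stated in this thread.

```cpp
#include <cstddef>

// Hypothetical helper: remembers the largest backlog ever observed and
// flags when it passes half of an assumed buffer capacity.
class HighWaterMark {
 public:
  explicit HighWaterMark(size_t capacity) : capacity_(capacity) {}

  // Record one sample, e.g. the value returned by Serial.available().
  void sample(size_t pending) {
    if (pending > peak_) peak_ = pending;
  }

  size_t peak() const { return peak_; }

  // True once the peak has exceeded half the capacity.
  bool overHalf() const { return peak_ > capacity_ / 2; }

 private:
  size_t capacity_;
  size_t peak_ = 0;
};
```

Sampling once per pass through loop(), before any bytes are read, gives the most useful picture of how far the backlog grows while show() is blocking.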
 
In WS2811-and-Octo-land, the strip is likely to operate at 800 kHz, 1.25 us per bit. There are 24 bits per pixel and maybe 300 us between frames. This means that for N pixels in the longest strip, each frame takes about N*24*1.25 + 300 microseconds. This is all done in the background via DMA. This value is the same for 1 strip and for 32 strips because the strips are driven in parallel.
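
Plugging numbers into that formula, as a rough sketch in plain C++ (the function names are made up, and the 300 us gap is the post's own "maybe" figure):

```cpp
#include <cstdint>

// Timing figures quoted in the post: 800 kHz data rate (1.25 us per bit),
// 24 bits per pixel, and roughly 300 us of gap between frames.
constexpr double kMicrosPerBit = 1.25;
constexpr uint32_t kBitsPerPixel = 24;
constexpr double kGapMicros = 300.0;

// Approximate time to clock out one frame for the longest strip.
constexpr double frameTimeMicros(uint32_t pixelsInLongestStrip) {
  return pixelsInLongestStrip * kBitsPerPixel * kMicrosPerBit + kGapMicros;
}

// Upper bound on frame rate implied by that frame time.
constexpr double maxFramesPerSecond(uint32_t pixelsInLongestStrip) {
  return 1e6 / frameTimeMicros(pixelsInLongestStrip);
}
```

For 300 pixels this gives 9300 us per frame, which puts the ceiling a little above 100 frames per second regardless of how many strips are attached.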
 
OK, since the OP specified 300 LEDs, it seems that each update is going to take about 9 milliseconds. Since the show() function starts with

while (update_in_progress) ;

If you have multiple frames of data, after the first call, each subsequent call to show() is going to take about 9mSec. If you're getting bursty data from the PC, the original loop() function can only process one frame of 42KB every 9mSec.
 
At the moment, I'm only transmitting data at around 20 fps, meaning each frame has 50 ms, so it should be well within the display time if the update is 9 ms for show(). However, it still crashes every 20 mins or so. In terms of the 9 ms limit - I've also had it working at 100 fps with no problems, so it is definitely possible (apart from the inevitable crashes!)

I will try to profile exactly how long the show() does actually take in reality. And will also have a look at the state of the buffer at various points to see if it is filling up.
 
To be clear: show() doesn’t wait for the whole frame to finish. It simply ensures the DMA stuff has what it needs. It will pause, however, until the previous frame has finished, if it’s called before the frame is done. You can check this with a busy() call on the driver.
 
You may think your PC program is only sending one frame every 50 ms, but are they actually arriving at the Teensy 50 ms apart? A 42 KB frame can be sent to the T4.1 in about 3 milliseconds if the PC really wants to pack the bytes into high-speed microframes! There's a lot going on under the hood in Windows and Linux USB transfers. Those operating systems often have very large USB buffers to handle delays such as OneDrive sync updates over WiFi or Ethernet.
 
I worked through a large set of possible options, but unfortunately haven't found a perfect solution with USB. I thought I'd post some findings here, though.

The leds.show() does indeed take 9ms in practice - and this is what is causing the USB crashes. If I comment it out the crashes stop, or if I replace it with delay(9) it also causes crashes, so it's the delay rather than anything to do with the Octo library.

I've chopped up sending the packets from the computer in various different ways with no luck. As suggested above, there's some sort of burstiness that is overwhelming some low-level buffer in the USB handling. I was curious whether anything could be done about this lower-level crash (e.g. increasing a buffer somewhere in the Teensyduino code), but perhaps it isn't as simple as that.

In the end I reached a sort of double conclusion. Firstly, I found a set of settings that meant the crash only happened every 90 mins or so, and let the watchdog sort the crashes out (the main setting that helped was making BUFFER_LEN a similar size to the packet, and then looping through the downloaded buffer). I also put in some checks to make sure that leds.show() wasn't being called too often, and moved the leds.show() call to when the last LED was received, rather than the first LED.
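
The actual checks are not shown in the thread, so the following is only a guess at their shape: a plain C++ gate that allows a show on the last LED index of a frame, and only if a minimum interval has passed since the previous one. ShowGate and its parameters are hypothetical.

```cpp
#include <cstdint>

// Hypothetical gate for deciding when to latch a frame: only on the last
// LED index of a frame, and only if a minimum interval has elapsed since
// the previous latch.
class ShowGate {
 public:
  ShowGate(uint16_t lastLedIndex, uint32_t minIntervalUs)
      : lastLedIndex_(lastLedIndex), minIntervalUs_(minIntervalUs) {}

  // Returns true if the caller should call leds.show() now.
  bool shouldShow(uint16_t ledIndex, uint32_t nowUs) {
    if (ledIndex != lastLedIndex_) return false;
    if (shownOnce_ && (nowUs - lastShowUs_) < minIntervalUs_) return false;
    lastShowUs_ = nowUs;
    shownOnce_ = true;
    return true;
  }

 private:
  uint16_t lastLedIndex_;
  uint32_t minIntervalUs_;
  uint32_t lastShowUs_ = 0;
  bool shownOnce_ = false;
};
```

With a minimum interval at or above the roughly 9 ms frame time, show() should rarely have to wait for a previous update, so the serial port keeps being drained.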

But probably the more lasting solution has been to rewrite using Ethernet and UDP - this seems much more stable and never crashes. The maximum data rate I'm achieving seems significantly lower than USB, but I think it's just about enough to keep up with the frame rate (say 1.8 MB/s). This may, of course, just be my implementation that is slow.
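
The UDP implementation itself isn't posted, so as a rough sizing sketch only: assuming a standard 1500-byte Ethernet MTU (1472 bytes of UDP payload over IPv4) and datagrams that carry only whole 5-byte LED packets, the number of datagrams per frame works out as below. All names are illustrative.

```cpp
#include <cstddef>

constexpr size_t kPacketSize = 5;
// Assumption: 1500-byte MTU minus 20 bytes of IPv4 and 8 bytes of UDP header.
constexpr size_t kMaxUdpPayload = 1472;

// Whole LED packets that fit in one datagram without splitting any.
constexpr size_t packetsPerDatagram() { return kMaxUdpPayload / kPacketSize; }

// Datagrams needed for a frame with the given number of LED packets.
constexpr size_t datagramsPerFrame(size_t ledPacketsInFrame) {
  return (ledPacketsInFrame + packetsPerDatagram() - 1) / packetsPerDatagram();
}
```

For the 32 * 260 LED packets per frame mentioned earlier, that is 29 datagrams per frame. Keeping packets whole within a datagram means a lost datagram costs some pixel updates but cannot shift the framing of the ones that follow.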
 
Just to confirm, I'm running the code from msg #9 on a Teensy 4.1 and the PC side code on Ubuntu 22.04. It runs for about 20 seconds and the orange LED blinks, then it freezes. Is that what you're seeing?
 
Yes, that's correct. When the flashing stops is when the USB crash happens. The script is still running, but the serial port is no longer accessible.
 
Captured the USB communication today.

[Attached image: screen.png]


It's definitely something going wrong on the Teensy side. What exactly is still a mystery. I'm going to dig deeper soon...
 