Teensy 4.1 is slower than Arduino Every with NeoPixel!? Please help

Hi,
I get lower speed with the Teensy 4.1 than with the Arduino Every when sending data to LEDs.
For the test I use the same code (below) on the Teensy 4.1 and on the Arduino Every. The LED strip is the same: 800 LEDs (SK6812, mini 3535 SMD, 144 LEDs/m). The power supply is the same. But when I run the "three color wheel" test (code below), it finishes in approx. 60 sec on the Teensy 4.1 and in approx. 40 sec on the Arduino Every!
I measure the time with a manual external stopwatch.

On the Teensy 4.1 I tried everything I could think of. I tried different libraries (FastLED, WS2812Serial, OctoWS2811) and the time always stays around 60 s. I tried different CPU speeds on the Teensy, from 150 to 960 MHz, and again no change.
I have no problem with power: I inject 5V every 40 LEDs! Everything works fine, but slower on the Teensy!!
With the Arduino Every I don't need to step up 3.3V to 5V. On the Teensy I first tried the SparkFun Logic Level Converter - Bi-Directional (https://www.sparkfun.com/products/12009), but I found information somewhere that these are slow, so I tried a 74AHCT125N. The result is always the same, around 60 sec (so the problem is apparently not in the level converters!).
I tried different control pins on the Teensy (1, 9, 10...), same result: 60 sec.

For info only: I tried parallel output on the Teensy with OctoWS2811 + FastLED and it all works. With two outputs the time is around 30 sec, with 4 pins around 15 sec.
Any ideas?

Code:
#include <Adafruit_NeoPixel.h>
#define PIN 9
Adafruit_NeoPixel strip = Adafruit_NeoPixel(800, PIN, NEO_GRB + NEO_KHZ800);

void setup() {
  strip.begin();
  strip.setBrightness(20);
  strip.show(); // Initialize all pixels to 'off'
}

void loop() {
  colorWipe(strip.Color(255, 0, 0)); // Red
  colorWipe(strip.Color(0, 255, 0)); // Green
  colorWipe(strip.Color(0, 0, 255)); // Blue
}

void colorWipe(uint32_t c) {
  for(uint16_t i=0; i<strip.numPixels(); i++) {
    strip.setPixelColor(i, c);
    strip.show();
  }
}
 
What happens if you change
Code:
for(uint16_t i=0; i<strip.numPixels(); i++) {
to
Code:
for(int i=0; i<800; i++) {
You may be doing some extra 16/32-bit conversions?
 
It has to do with how NeoPixels work. You might want to take a look through some of the documentation, like:
https://learn.adafruit.com/adafruit-neopixel-uberguide/the-magic-of-neopixels
https://cdn-shop.adafruit.com/datasheets/WS2812B.pdf

The protocol sends data at 800 kHz. For each pixel it sends 24 bits of data.
So for each frame it has to send 800*24 bits of data.
As long as your processor can output this, it won't matter what speed your processor runs at.

Also, there is a gap of time between frames. The spec I linked shows at least 50us, but I believe the Adafruit library uses at least 300us.

Each of your colorWipe calls outputs 800 frames of data. So you should be able to do some simple calculations to get an idea of how fast it should run.
 
The WS2812 needs 1.25us per bit, 24 bits per LED, 800 LEDs = 1.25*24*800 = 24000 us = 24 ms.
Unless I'm missing something (it is Friday afternoon after all) you should be able to get around 40 updates per second.

It's not going to be CPU power; these LEDs are trivial to drive. The only hard part is that the timing requirements are fairly tight.
Without digging into the library code it's hard to say why it's so slow. It shouldn't be.
 
This animation is supposed to take 58.32 seconds.

You have 800 LEDs with 24 bits of data each. Communication speed is 800 kbits/sec. A pause of at least 300 us is used to mark the end of a transfer. So each update of the whole strip takes:

(800 * 24) / 800e3 + 300e-6 = 0.0243 seconds.

The colorWipe function calls show() 800 times. So the time to run colorWipe() should be:

0.0243 * 800 = 19.44 seconds

The loop() function runs colorWipe() 3 times, so the total time to run the animation is supposed to be:

19.44 * 3 = 58.32 seconds
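
The same arithmetic can be written as a small C++ sketch, using integer microseconds so nothing is lost to rounding. The function names are made up for illustration, and the 300 us reset gap is simply the figure quoted above for the Adafruit library.

```cpp
#include <cstdint>

// Time for one show(): 24 bits per LED at 1.25 us per bit (800 kHz),
// plus the reset gap that marks the end of the transfer.
constexpr uint64_t frameMicros(uint32_t numLeds, uint32_t resetGapMicros) {
  return static_cast<uint64_t>(numLeds) * 24u * 125u / 100u + resetGapMicros;
}

// colorWipe() calls show() once per LED, so one wipe costs numLeds frames.
constexpr uint64_t wipeMicros(uint32_t numLeds, uint32_t resetGapMicros) {
  return frameMicros(numLeds, resetGapMicros) * numLeds;
}

static_assert(frameMicros(800, 300) == 24300, "one frame: 24.3 ms");
static_assert(wipeMicros(800, 300) == 19440000, "one wipe: 19.44 s");
static_assert(3 * wipeMicros(800, 300) == 58320000, "three wipes: 58.32 s");
```

Because every show() sends the whole strip, the cost grows with the square of the LED count in this particular test, which is why halving the strip length per pin cuts the time so sharply.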
 
The only explanation for how Arduino Every could run this faster is incorrect (faster than 800 kHz) timing which just happens to work with your LEDs. This wouldn't be too surprising, as Arduino Every is a relatively new product and Adafruit doesn't make or sell it, so it's probably had only minimal testing.

I don't recommend running with shorter pulse timing than the official WS2812 / WS2811 specification. It may work with one batch of LEDs and then fail with another. Or even the same LEDs might fail if operated in a room or outdoors at a higher temperature.

But if you want to achieve faster than the proper timing, you certainly can. Just look for these lines within the Adafruit_NeoPixel library.

Code:
#elif defined(TEENSYDUINO) && (defined(__IMXRT1052__) || defined(__IMXRT1062__))
#define CYCLES_800_T0H  (F_CPU_ACTUAL / 4000000)
#define CYCLES_800_T1H  (F_CPU_ACTUAL / 1250000)
#define CYCLES_800      (F_CPU_ACTUAL /  800000)
#define CYCLES_400_T0H  (F_CPU_ACTUAL / 2000000)
#define CYCLES_400_T1H  (F_CPU_ACTUAL /  833333)
#define CYCLES_400      (F_CPU_ACTUAL /  400000)

Just edit the first 3 numbers to change the 800 kHz waveform timing.
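
To see what those divisors mean in practice, here is a small sketch that evaluates them for an assumed 600 MHz clock (the Teensy 4.1 default; F_CPU_ACTUAL is different if another speed is selected). The helper name and constants are hypothetical, not part of the library.

```cpp
#include <cstdint>

// Number of CPU cycles in one period of the given frequency (Hz),
// mirroring the F_CPU_ACTUAL / N pattern in the library's defines.
constexpr uint32_t cyclesFor(uint32_t cpuHz, uint32_t divisorHz) {
  return cpuHz / divisorHz;
}

constexpr uint32_t kCpuHz = 600000000;  // assumption: 600 MHz

constexpr uint32_t kT0H = cyclesFor(kCpuHz, 4000000);  // 150 cycles = 0.25 us
constexpr uint32_t kT1H = cyclesFor(kCpuHz, 1250000);  // 480 cycles = 0.80 us
constexpr uint32_t kBit = cyclesFor(kCpuHz, 800000);   // 750 cycles = 1.25 us
```

A larger divisor gives fewer cycles and so a shorter pulse. Raising the third number shortens the whole bit period, which is what raises the data rate above 800 kHz.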
 
I want to thank everyone for your help and fast response!

The code is exactly the same on both, Every and Teensy. Changing "for(uint16_t i=0; i<strip.numPixels(); i++)" to "for(int i=0; i<800; i++)" makes no difference, so that is not the problem. I tried it!

This animation is supposed to take 58.32 seconds.

EXACTLY!! All of my tests took between 58 and 59 seconds!

Since the test runs about 50% faster on the Arduino Every (around 40 sec instead of 60 sec) with no problems, the LED strip apparently copes with it well. I use SK6812. I will try to adjust the values in the OctoWS2811 library..... can you tell me the lines that I need to edit in OctoWS2811 to adjust the speed of sending data to the strips?
However, I will end up using OctoWS2811 + FastLED with 8 parallel pins, which will reduce the time eightfold.
Here comes the question: with Teensy 4.1 and OctoWS2811 + FastLED, can I use 9 pins in parallel at the same time (I need 900 LEDs to finish the project)? I read that the limit is 8 pins.

Is there a way to find out at what speed the Every sends the data to the LED strip - 800mhz or more, and how much?

Thanks in advance.
 
I will try to adjust the values in the OctoWS2811 library..... can you tell me the lines that I need to edit in OctoWS2811 to adjust the speed of sending data to the strips?

First to answer your question, look for these lines in OctoWS2811_imxrt.cpp

Code:
#if defined(__IMXRT1062__)

#define TH_TL   1.25e-6
#define T0H     0.30e-6
#define T1H     0.75e-6

But just because you can do a thing does not mean you should.

You're already getting approx 40 Hz refresh rate. That's plenty fast enough for non-POV animations.

That's the last I'm going to say about doing things the proper way... and now, the other questions...


Here comes the question: with Teensy 4.1 and OctoWS2811 + FastLED, can I use 9 pins in parallel at the same time (I need 900 LEDs to finish the project)? I read that the limit is 8 pins.

OctoWS2811 supports use of any number of pins on Teensy 4.1.

But sadly, the FastLED driver which uses OctoWS2811 still only supports exactly 8 pins.


Is there a way to find out at what speed the Every sends the data to the LED strip - 800mhz or more, and how much?

Yes, there is a way. The most common way would be to use an oscilloscope or logic analyzer to directly view the waveform.
 
As PaulStoffregen said:
"......because you can do a thing does not mean you should...." After several tests with different values in the library, the result was not faster performance but more unstable strip operation.

800mHz was a typo, I meant 800kHz.

My project will eventually end up with 1080 LEDs. I'm still trying to get the maximum speed because I'll be running 54 servos at the same time.
My question is, which will give me more speed: 29 parallel pins (if that is possible with Teensy 4.1) * 40 LEDs with OctoWS2811, or 7 parallel pins * 160 LEDs (the last pin will drive 120 because I have 29 * 40 LEDs in pieces) with OctoWS2811 + FastLED? (Or perhaps an Arduino Every with 7 pins * 160 LEDs with FastLED?)

Thanks in advance for answers.
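
As a rough way to compare the two layouts: with parallel output all pins are clocked out at the same time, so the frame time is set by the longest strip, not by the total LED count. The sketch below assumes 800 kHz data and a 300 us reset gap, and it ignores any setup overhead inside the library. The function name is made up for illustration.

```cpp
#include <cstdint>

// Frame time when all strips are driven in parallel:
// 30 us per LED (24 bits * 1.25 us) on the longest strip, plus the reset gap.
constexpr uint32_t parallelFrameMicros(uint32_t ledsOnLongestStrip,
                                       uint32_t resetGapMicros = 300) {
  return ledsOnLongestStrip * 30u + resetGapMicros;
}

static_assert(parallelFrameMicros(40) == 1500, "29 pins x 40 LEDs");
static_assert(parallelFrameMicros(160) == 5100, "7 pins x 160 LEDs");
```

On wire time alone, the 40-LED-per-pin layout needs about 1.5 ms per frame against about 5.1 ms for 160 LEDs per pin. Both are well under the 24.3 ms of a single 800-LED strip.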
 
Definitely do not use a blocking library like Adafruit_NeoPixel library if your project will be doing anything else while the LEDs update. You need a non-blocking library like OctoWS2811 or WS2812Serial. Adafruit has non-blocking libraries for some of their boards. Because non-blocking libraries access special hardware, they tend to be written for specific boards and often come with limits about which pins they use. These trade-offs are absolutely worthwhile. A blocking library will (usually) interfere with the Servo library, and even if the signals remain clean, it will restrict the timing your code can use to update the motors. Non-blocking libraries are critically important if your program needs to actually do stuff while the LEDs are updating.

Regarding "trying to get the maximum speed", perhaps the overly simple example code has gotten your project's journey started on the wrong foot? With that sort of simple code, the animation speed is tightly tied to the frame update rate. If you want faster animation, it drives you to seek faster update rates, which is really not a good way.

You should instead choose a specific frame update rate and put your work into achieving it consistently, and then design your animation speed around timing of that stable frame rate. The coding style for animation is completely different. Perhaps we could and should do so much more in the LED library examples to demonstrate this approach...

For example, if you wish to animate a color wipe effect like Adafruit's example, instead of changing 1 LED per loop, you would design the loop to work in units of time, not units of number of LEDs. This is the fundamental difference in approach. Everything is about computing which LEDs should be turned on as a matter of time.

You start by choosing how much time you want the color wipe to take. If you want it to occur over 5 seconds (roughly 4X faster than using 1-LED-at-a-time with proper 800kHz or 3X faster than whatever overclock timing Arduino Nano Every is using), your loop runs from 0 to 5000 milliseconds. If your frame rate is 25 Hz, you would increment the loop by 40 for each LED update. If your frame rate is faster, you would update by a smaller number. But the critically important point is the loop is done with the incrementing variable being time, not a specific LED position.

Inside the loop, you compute which LEDs should be illuminated based on the time. If the time is 2500, you turn on half of them. For a fast effect, you will be turning on more than 1 LED per loop. And because you're computing in terms of time, the number you turn on for each loop might not be identical, since you'll round up or down (or perhaps turn 2 LEDs on partially if you get really fancy). No matter what your actual frame rate is, the animation will always take the correct length of time.

Achieving higher speed is still desirable, since it allows smoother animation. But if your code is designed in terms of elapsed time, you don't need faster updates for the animation speed. Faster only gives smoother. As with movies, television and video games, faster refresh rate has diminishing returns, because human perception really isn't very fast. 20 Hz is probably the bare minimum, and above 100 Hz you're probably well into diminishing returns. I'm sure some FPS gamers would beg to differ, and indeed you do need low latency and faster frame rate when animations are triggered by human actions than you would need when people are just passively watching, like movies and television (which normally use fairly low frame rates).

The main point is to design your animation code with the loop working on time, not position. Inside the loop you compute the position based on the time. Virtually all animation is done this way, because it works so well. Using position as the loop variable is only useful for the very simplest things. Those code examples are meant to be very simple so you can learn how the LEDs work, but the downside to such simplicity is that it tends to establish a mode of thinking about animation which really isn't very good for anything much more substantial. We (all authors of LED libraries) should probably do better with examples, to at least explain this briefly in comments inside those very simple examples, and provide more sophisticated (but hopefully still fairly simple and easy to understand) examples that demonstrate animating with the loop based on time rather than position.
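
A minimal sketch of the time-based color wipe described above, assuming a 5000 ms wipe over 800 LEDs. The helper litCount() and the variable wipeStartMs in the usage comment are hypothetical names, not part of any library.

```cpp
#include <cstdint>

// How many LEDs should be lit at a given moment of the wipe.
// elapsedMs: time since the wipe started; durationMs: total wipe time.
constexpr uint32_t litCount(uint32_t elapsedMs, uint32_t durationMs,
                            uint32_t numLeds) {
  return elapsedMs >= durationMs
             ? numLeds
             : static_cast<uint32_t>(
                   static_cast<uint64_t>(elapsedMs) * numLeds / durationMs);
}

// In an Arduino sketch this might be used roughly like so:
//   uint32_t n = litCount(millis() - wipeStartMs, 5000, strip.numPixels());
//   for (uint32_t i = 0; i < n; i++) strip.setPixelColor(i, color);
//   strip.show();
```

At 25 Hz the frames are 40 ms apart, so each frame lights 6 or 7 more LEDs than the last. If the frame rate drops, each frame simply lights more, and the wipe still ends at 5000 ms.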
 
In addition to what Paul said: the animation speed (speed of transitions) must be independent of the actual update rate of the LEDs (fps).

This is achieved by making all control parameters functions of the elapsed time since startup, meaning they are all calculated from millis().
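
A minimal sketch of that idea, with hypothetical names and not taken from any post in this thread: each animation parameter is a pure function of the clock value, so two animations with different periods can share one loop without knowing the frame rate.

```cpp
#include <cstdint>

// Phase of a repeating animation, 0..255, derived only from the clock.
// periodMs must be non-zero.
constexpr uint8_t phaseAt(uint32_t nowMs, uint32_t periodMs) {
  return static_cast<uint8_t>(
      static_cast<uint64_t>(nowMs % periodMs) * 256u / periodMs);
}

// In an Arduino sketch, two independent animations might read the same clock:
//   uint8_t hue   = phaseAt(millis(), 4000);  // slow color cycle
//   uint8_t pulse = phaseAt(millis(), 1000);  // fast brightness ramp
```

Neither value depends on how often loop() runs; a slow frame only means the next frame picks up a later phase.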

This solves multiple challenges at once: Now the animation runs on different processors and different LED setups with the same speed. The animation always runs at a visually constant speed, no matter what else is going on inside the loop and no matter if the actual fps are not constant over time (for example because other I/O operations have to happen sometimes, but not every loop). And by using this approach the animation is always non-blocking, which makes it easy to combine it with other program modules in one sketch. This includes the possibility to run multiple independent animations at the same time and blend them together if desired.

Some years ago I showed this code as a simple example for 2 independent but both time-dependent animations.

 
One minor issue is that blocking libraries like Adafruit_NeoPixel are so blocking they even block millis() from updating.

I do agree with this approach. It just needs to be done with the blocking behavior in mind, if using a blocking library. Like the position-vs-time animation topic, misunderstandings about the speed of Adafruit_NeoPixel have also come up before on this forum, because millis() freezes and loses track of accurate time keeping for LED updates longer than ~2ms (67 or more LEDs).

I wrote this lengthy message because this question about speeding up colorWipe and in general how to make animations run faster has come up multiple times. It's now on my list of documentation updates to someday write...
 
Hi Jopeto2000.
I run 11 parallel channels of 300 LEDs using a Teensy 4.1 and the FastLED library, so get started with your project in confidence :). With my current setup I think I can do 14 channels, because I'm using a few other pins for other stuff. The solution for FastLED on all pins on Teensy 4.1 was found on these forums somewhere, which I'm sure you will find, and it involved a few lines of code I don't understand.

In my case some animations need to happen on a beat of music, and need to happen as fast as possible, as the WS2812 is the limiting factor, e.g. turn on a sequence of lights without skipping any or having to turn two on in a given 'frame'. In my case FastLED.show() blocks for 3100us to set up DMA or something, so there is some blocking.

To keep things simple, I find separating your tasks across different microcontrollers just saves headaches. I'm sure you could get servos and lights done on the 4.1, but maybe just put them on two different microcontrollers and connect them with some form of communication.

Just some ideas anyway.

Gavin.
 
I run 11 parallel channels of 300 LEDs using a Teensy 4.1 and the FastLED library, so get started with your project in confidence. With my current setup I think I can do 14 channels, because I'm using a few other pins for other stuff. The solution for FastLED on all pins on Teensy 4.1 was found on these forums somewhere, which I'm sure you will find, and it involved a few lines of code I don't understand.

Thanks for the idea - I tried 14 pins in parallel and everything works great with the Teensy 4.1. Thank you very much!

Definitely do not use a blocking library like Adafruit_NeoPixel library if your project will be doing anything else while the LEDs update. You need a non-blocking library like OctoWS2811 or WS2812Serial.

This is the real truth. It took me a while to get familiar with non-blocking code and how to use millis(), but now my code is non-blocking with OctoWS2811 + FastLED. With blocking code and a blocking library there was just no way. Thanks for your time and the long but super correct post.

My project now successfully drives 1080 addressable LEDs + 54 servos simultaneously, fast enough and without problems, with just one Teensy 4.1. Thanks to everyone involved in the discussion for your ideas; they were crucial for the project!

But now I have another problem: the T4.1 does not start up correctly when I power it on. The sketch does not start. (If USB is connected to a PC, all is OK.) I took the time to read up and learned that it is good (and necessary) to put a 300ms delay on startup. After all the reading I still couldn't figure out how or where to add this delay. I am using IDE 2.2.1.

Can anyone help with instructions regarding this delay? Thanks in advance!
 
With that many LED strings and servos, I could imagine that you need to stage the power-up: power on a subset of stuff, wait for a bit, and then power on the next. Otherwise, the initial power rush of everything starting up might overwhelm your power supply.

At the least, you might want to put the following call at the beginning of the setup function:

Code:
delay (3000);           // delay for 3 seconds to let things stabilize

Assuming you have a loop that calls begin for each of the servos, I would add a small delay there if the servo is started in begin. Presumably you will need to iterate to see what the numbers are, and then add a bit more delay just in case.

Now, if you need a small delay after moving each of the servos, then it becomes trickier. You probably have to have a queue of when each servo can start (<n> milliseconds after you issue the command). There you have to write it non-blocking, etc.

Note, I am a software guy, but I've seen cases in various systems where you need to stage things as things start up.
 
Here is the video:

You should watch the whole video, because the better effects are in the second part. :)
At the beginning I show some things: the seconds countdown is implemented in an interesting way with two hands, the brightness and color of the LEDs can be adjusted, and it can show temperature and humidity, but the good part is the demo effects. Normally it shows the time all the time, and the demo effects can be played manually if desired with buttons on one side. The time is set with buttons on the other side. As I said, everything is controlled by one Teensy 4.1: 1080 LEDs and 54 servos.


However, my question was related to this topic: https://forum.pjrc.com/threads/63251-Teensy-4-0-4-1-startup-delay-of-300-ms
My problem is exactly the one in that topic.
How and where should I add this delay, and where is the file startup.c?
I need help with instructions so I can add a delay of 300 ms or more before the sketch starts.
Thanks in advance.
 
No - this is the first thing I tried of course, but without success. The question is, can a delay be made before the sketch has even begun to execute? In the topic I mentioned above I read this: a delay in startup.c in the ResetHandler function - maybe this is it?
 
If the delay needs to happen before static objects are initialized, you could override the startup_late_hook() function:
Code:
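// Declare with C linkage so this overrides the weak hook in the Teensy core.
extern "C" void startup_late_hook(void);
void startup_late_hook(void) {
  delay(300);  // runs before static objects are initialized and before setup()
}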
 
I went through multiple tests to find out why my sketch stops and won't start except when the T4.1 is connected to the PC.

This helped me a lot, and it is exactly what I needed for a delay before the sketch gets to the setup section:
If the delay needs to happen before static objects are initialized, you could override the startup_late_hook() function:
I used this:
Code:
extern "C" void startup_late_hook(void);
void startup_late_hook(void) {Serial.print("Starting");delay(1000);Serial.println(".");delay(1000);}

And indeed the delay is achieved, but in my case it turned out not to be the problem.
I tested where exactly the execution stops by printing text first at the pre-delay, then in the first line of setup(), then in the first line of loop(). It turned out that the problem for me is the line that waits for the serial port, because I use an AM2320 sensor:

while (!Serial) {delay(10);} // hang out until serial port opens

After removing this line everything is fine. Even the delay before initializing the static objects turned out to be unnecessary.
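
If the serial wait is still useful while debugging over USB, a common alternative is to give it a timeout, so the sketch carries on when no PC is attached. A hedged sketch follows; the helper name is hypothetical and the 3 second limit is only an assumed value.

```cpp
#include <cstdint>

// True while setup() should keep waiting: the port is not open yet
// and the timeout has not expired.
constexpr bool keepWaitingForSerial(bool serialOpen, uint32_t elapsedMs,
                                    uint32_t timeoutMs) {
  return !serialOpen && elapsedMs < timeoutMs;
}

// In setup() this might look like:
//   while (keepWaitingForSerial(static_cast<bool>(Serial), millis(), 3000)) {
//     delay(10);
//   }
```

With USB connected the loop exits as soon as the port opens. On external power alone it gives up after the timeout instead of hanging forever.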

Once again I want to thank everyone. If I run into another problem, I will ask you again - the specialists here!
 