FastLED now has 50-pin parallel Teensy 4.1 WS2812 support

niteris

It turns out, the Teensy 4.1 and 4.0 are absolute LED beasts!

The claim is all 50 pins can be driven with DMA. Can that be possible?

The backend driver is ObjectFLED which just came out for Arduino-Teensy.

FastLED 3.9.8 just made this a core driver option.

Code:
#define FASTLED_USES_OBJECTFLED
#include "FastLED.h"

Here's a reddit thread where they announced it.


Here's an example sketch:

 
Hey Paul, any insight on this issue? Looks like Audio DMA for Teensy produces LED flickering on ObjectFLED:
I've looked into this, and it seems to be a DMA conflict between the Audio library and the OctoWS2811 code which I hijacked for ObjectFLED. The problem disappears when the user comments out the instantiation of AudioInputI2S before calling addLEDs(), which runs the begin() function on 3 DMA channels for LEDs. Is there a known DMA conflict between the Audio library and OctoWS2811?
 
I'm still researching this problem. But while learning how DMAChannel.h allocates channels to multiple apps, I found a function I believe is generating errors:
Code:
void DMAPriorityOrder(DMAChannel &ch1, DMAChannel &ch2, DMAChannel &ch3, DMAChannel &ch4)
{
    if (priority(ch3) < priority(ch4)) swap(ch3, ch4);
    if (priority(ch2) < priority(ch3)) swap(ch2, ch3);
    if (priority(ch1) < priority(ch2)) swap(ch1, ch2);
    if (priority(ch3) < priority(ch4)) swap(ch2, ch3);     //s.b. swap(ch3, ch4)
    if (priority(ch2) < priority(ch3)) swap(ch1, ch2);     //s.b. swap(ch2, ch3)
    if (priority(ch3) < priority(ch4)) swap(ch2, ch3);     //s.b. swap(ch3, ch4)
}

This appears to be a malformed bubble sort. I added comments where I thought code was wrong.
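For reference, here is a minimal stand-alone sketch of what the commented corrections would look like, using plain ints in place of DMAChannel objects. The names Four, sortQuoted, sortSuggested, isDescending and suggestedSortsAllOrderings are made up for illustration and are not part of DMAChannel.h; each int simply stands in for whatever priority() returns.

```cpp
#include <algorithm>
#include <array>
#include <utility>

// Hypothetical stand-in: four plain ints play the role of ch1..ch4.
using Four = std::array<int, 4>;

// Swap pattern exactly as quoted above (last three swaps use the "wrong" pair).
inline Four sortQuoted(Four c) {
    if (c[2] < c[3]) std::swap(c[2], c[3]);
    if (c[1] < c[2]) std::swap(c[1], c[2]);
    if (c[0] < c[1]) std::swap(c[0], c[1]);
    if (c[2] < c[3]) std::swap(c[1], c[2]);
    if (c[1] < c[2]) std::swap(c[0], c[1]);
    if (c[2] < c[3]) std::swap(c[1], c[2]);
    return c;
}

// Same comparisons, but each one swaps the pair it just compared.
inline Four sortSuggested(Four c) {
    if (c[2] < c[3]) std::swap(c[2], c[3]);
    if (c[1] < c[2]) std::swap(c[1], c[2]);
    if (c[0] < c[1]) std::swap(c[0], c[1]);
    if (c[2] < c[3]) std::swap(c[2], c[3]);
    if (c[1] < c[2]) std::swap(c[1], c[2]);
    if (c[2] < c[3]) std::swap(c[2], c[3]);
    return c;
}

// True when the values run from largest to smallest.
inline bool isDescending(const Four &c) {
    return c[0] >= c[1] && c[1] >= c[2] && c[2] >= c[3];
}

// Tries all 24 orderings of {1,2,3,4} against the suggested pattern.
inline bool suggestedSortsAllOrderings() {
    Four c{1, 2, 3, 4};
    do {
        if (!isDescending(sortSuggested(c))) return false;
    } while (std::next_permutation(c.begin(), c.end()));
    return true;
}
```

In this model the suggested pattern orders every input from largest to smallest, while the quoted pattern turns 1, 2, 3, 4 into 4, 1, 2, 3.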

I still haven't zeroed in on this problem, but the above is my primary suspect. I need to install the Audio library in my sandbox to test the theory.
 
Happy New Year! I've reproduced this error and ruled out OctoWS2811 and FastLED as the cause. Also, the funny bubble sort mentioned above has no bearing on it. Apparently I introduced this error in ObjectFLED when I re-packaged the DMA driver code from OctoWS2811. So my problem determination (PD) is focused on this, and on any differences I introduced while refactoring the OctoWS2811 code.

The conflict can be eliminated by commenting out this single line of code from the Audio library file output_i2s.cpp:
Code:
CCM_CCGR5 |= CCM_CCGR5_SAI1(CCM_CCGR_ON);
This is executed by the constructor for AudioInputI2S. I can't see any direct conflicts between this and ObjectFLED's use of quad timer, but I'm not expert in register-level hardware.

Investigation continues. If anyone has an idea about this, please share! I didn't think I changed anything in OctoWS2811 DMA and timer configuration when I converted it to ObjectFLED. But I will dig deeper into this.
 
Just to confirm, I've added this to my list of issues to investigate. Can't make any promises of time frame when I'll get to it, but it definitely is on my list so I won't forget even if this forum thread becomes inactive.

The conflict can be eliminated by commenting out this single line of code from the Audio library file output_i2s.cpp:

Not turning on the clock to the I2S peripheral would completely break I2S audio output, right?
 
:cool: Right. Disabling I2S clock was not offered as a solution. It did tip me off that this is probably not a DMA channel conflict, as I suspected. I commented out each line in the config_i2s() method one-by-one, and only this one (CCGR_ON) fixed the problem.

Thanks for putting this on your list. I am going to do a deep-dive into how I managed to corrupt your timer-DMA code while putting it into ObjectFLED. Hopefully I can find and fix it in a day or two, and you can take it off your list. I have a test sketch which lets me swap between OctoWS2811 and ObjectFLED, and the problem disappears when I run in Octo mode. So since I broke it...
 
WHOO-HOO! I found the problem. And it was not me screwing up Paul's timer-DMA code during refactoring. :)

Well, not exactly. It turns out this problem can be solved in ObjectFLED by specifying non-default LED waveform timing with the begin() function. Alternately, it can be solved by using an overclock factor of 1.2 or above, also using begin().

To recap, the problem is that ObjectFLED was adding 'sparkles' to the LED display when the Audio library's AudioInputI2S object is instantiated (and turns on its clock). When OctoWS2811 is used in place of ObjectFLED, the output is clean: no sparkles. So I've been looking for corruption in my copy of the OctoWS2811 code in ObjectFLED. But refreshing my code with the original Octo code did not solve this until I used Paul's default LED waveform timing instead of my own defaults.

OctoWS2811 defaults to a 1250 nS LED clock with 300 nS T0H and 750 nS T1H periods. When developing ObjectFLED I changed this default to 1250, 420, and 840. I had to shrink my T0H time to 350 nS to make the sparkles disappear. The reason I changed the defaults in the first place is that the WS2812B data sheet specifies 33%/66% ratios between T0H/T1H and the clock period. Paul's defaults were closer to 25%/60% ratios, likely from a WS2811 data sheet. My ratios fail when sharing Teensy's clocks with the Audio library. Specifically, ObjectFLED's T0H period must be decreased when the Audio library turns on its clock.
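To put numbers on those ratios, here is a small stand-alone calculation. The Timing struct, the constants and both helper names are hypothetical, and the idea that an overclock factor simply divides every waveform time is an illustrative assumption, not a statement of how ObjectFLED scales its waveform internally.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for one LED waveform, all times in nanoseconds.
struct Timing {
    int periodNs;
    int t0hNs;
    int t1hNs;
};

// The two sets of defaults discussed above.
const Timing kOctoDefault{1250, 300, 750};
const Timing kObjectFledOriginal{1250, 420, 840};

// Share of the bit period in tenths of a percent (240 means 24.0%).
inline int perMille(int partNs, int periodNs) {
    assert(periodNs > 0);
    return partNs * 1000 / periodNs;
}

// ASSUMPTION: overclocking divides all three times by the same factor.
inline Timing overclocked(const Timing &t, double factor) {
    assert(factor > 0.0);
    return Timing{static_cast<int>(std::lround(t.periodNs / factor)),
                  static_cast<int>(std::lround(t.t0hNs / factor)),
                  static_cast<int>(std::lround(t.t1hNs / factor))};
}
```

This gives 24.0%/60.0% for the OctoWS2811 defaults and 33.6%/67.2% for 1250/420/840. Under the scaling assumption, a factor of 1.2 turns the 420 nS T0H into 350 nS, which would be consistent with both fixes working for the same reason.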

Also, I can make OctoWS2811 add sparkles by setting its T0H to 360 or above, and they also disappear when I stop Audio from starting its clock.

I will get outside confirmation for my fix, then update ObjectFLED so it works with Audio out of the box.

Paul, this is no longer a critical issue, but I do thank you for stopping by, and for giving us OctoWS2811 open-source!
 
Glad you found it. :)

Maybe this would be a good moment to mention the WS2812Capture library I recently started. It's meant to capture WS2812 waveforms and analyze the actual timing. If you have another Teensy handy (and time to connect some wires between them), it might be worth a try to see the actual timing with and without Audio running.

In my dream world of infinite hours in every day, my hope is to eventually build up hardware and testing software to automatically verify many of the important Arduino libraries. Ideally every commit would run a GitHub Action that would build the library and run it on real hardware while other real hardware analyzes the waveforms, to verify the library really does work rather than merely compile. WS2812Capture is meant as an initial step to someday having that capability...
 
My blind guess is other libraries using DMA may be adding about 90ns latency. If so, you should see the WS2812Capture stats show different min & max timing while Audio is running.

I have an idea how this might be improved (maybe) but the first step is to just reliably measure the effect with WS2812Capture before trying new DMA configurations!
 
About the T0H timing, I really believe you should go with no more than 300ns. Many different WS2812 products have been sold over the years, and many different datasheets with varying (sometimes nonsensical or contradictory) specs have been published.

But as a practical matter, many real LEDs make the low vs high decision with TH somewhere between 400ns and 500ns (and the actual threshold may vary slightly with temperature). Yes, this is much less than half of the 1250ns cycle.

Inside the chip is an imprecise analog delay circuit. With IC design, you don't get good quality resistors and capacitors like you do with regular electronics. Likewise, the transistor specs can vary quite a lot. The one thing you do get is well defined ratios of transistor size. Design-wise, it's an entirely different world of analog circuitry. The designers of these chips (probably) have a circuit similar to a 555 timer which triggers on the rising edge. If the input signal goes low before the timer, it's detected as a 0 bit. If the timer goes low first, it's a 1 bit. But unlike a 555 timer where you connect a 5% or 1% resistor and probably a 5% or 10% tolerance capacitor, inside the chip you only get resistors (or current sinking transistor circuits) and capacitors with tolerance in the 40% range, plus variance for changes in temperature and supply voltage. There is a special "bandgap reference" circuit made from carefully matched bipolar transistors which can be better... and who knows, maybe they have something like that inside these chips. But I doubt it (and a bandgap ref only solves the resistor side of an RC delay circuit), since they're so incredibly cheap. It's very likely 555-like, using the polysilicon gate material for the capacitor and likely a length of the well implant for the resistor.

The point is that this sort of design needs to make do with an on-chip resistor and capacitor having much worse tolerance than you would ever imagine with regular PCB design. 40% is the likely range. So the designer does not dare to set the T0H vs T1H threshold close to 50% of the 1250ns cycle! If they do, and the capacitor or resistor ends up being on the +40% tolerance, the threshold could end up too close to the end of the 1250ns timing. Pretty much all these WS2812 LEDs have their timing threshold around 1/3rd of the way into the 1250ns period. If the timing tolerance goes long by 40%, they don't risk it being unable to receive data at all. And if the timing threshold goes short by -40%, the product still works if the controller sends a short T0H pulse.
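As a rough illustration of that argument, the sketch below applies a ±40% spread to two nominal thresholds: about one third of the cycle (417ns) and half of it (625ns). The 40% figure is the estimate given above; the Window struct and both function names are made up for the example, and real parts will not follow this simple model exactly.

```cpp
#include <cassert>

// Hypothetical range in which the LED's 0-vs-1 decision threshold may land.
struct Window {
    int lowNs;
    int highNs;
};

// Spread a nominal threshold by +/- tolPercent (integer math, rounds down).
inline Window thresholdWindow(int nominalNs, int tolPercent) {
    assert(tolPercent >= 0 && tolPercent < 100);
    return Window{nominalNs * (100 - tolPercent) / 100,
                  nominalNs * (100 + tolPercent) / 100};
}

// A 1 bit is only read reliably if T1H outlasts the slowest possible threshold.
inline bool t1hIsSafe(int t1hNs, const Window &w) {
    return t1hNs > w.highNs;
}
```

With a 417ns nominal threshold the window is 250ns to 583ns, so a 750ns T1H clears it. With a 625ns nominal threshold the window stretches to 875ns, and the same 750ns pulse no longer would.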

This is why I set the default T0H at 300ns in OctoWS2811. And it's only 250ns in WS2812Serial, though in that case I can only adjust in increments of 125ns.

I would highly recommend using a similar threshold with your library. Please try to avoid thinking of the 1250ns clock cycle as having some sort of symmetry around 625ns. I know that way of thinking makes logical sense. But that's not how the analog circuitry inside these LEDs is designed. The real threshold is shorter and imprecise due to the limited resources and difficult trade-offs the IC designer faces.
 
You hit my nail right on the head. I was just thinking about this very thing. Thanks for giving the detail behind why. The wide tolerances, I guess, are why these things are so overclockable. I wrote a serial driver before I discovered Octo', and OC'd by factor of 1.7. With Octo' code I can clock 1.6. I was thinking, since I expose overclocking & waveform timing, that a 1/3-2/3 ratio "compressed" better when overclocking. But it is about the threshold, not the ratio, as you say.

From experimenting, it seems Audio is adding 60-70 nS to T0H. I tried switching the ObjectFLED from TMR4 to TMR3, but same result.
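A quick way to see why the two sets of defaults behave differently is to add that extra time to each T0H and compare the result against a decision threshold. The sketch below takes the threshold as a parameter, since the real value varies from LED to LED; the function name is hypothetical and the 70 nS worst case is just the upper end of the range measured above.

```cpp
#include <cassert>

// Margin in nS between the stretched T0H pulse and the LED's decision threshold.
// A negative result means a 0 bit risks being read as a 1.
inline int t0hMarginNs(int t0hNs, int addedLatencyNs, int thresholdNs) {
    assert(addedLatencyNs >= 0);
    return thresholdNs - (t0hNs + addedLatencyNs);
}
```

Against the 400 nS lower end of the threshold range mentioned earlier in the thread, a 300 nS T0H still leaves 30 nS of margin with 70 nS added, while a 420 nS T0H ends up 90 nS over.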

I will update ObjectFLED to use your original default timing waveform. Also, I will re-think the logic I use to shrink the waveform for overclocking. I think I could use your new Capture library for this work. It comes with perfect timing. :)

Thanks again!
 
From experimenting, it seems Audio is adding 60-70 nS to T0H.

That's good info.

Maybe the problem could be lessened using "Channel preemption". The 3 DMA channels used for LED output would be able to preempt the audio DMA (if configured some way). Audio should not be nearly as urgent because the sample rate is only 44.1 kHz and the SAI ports have FIFOs. Audio can probably tolerate many microseconds of latency.

This feature is described in section 6.3.4 on page 99 in the reference manual. How to actually configure it might take some experimenting. I've been waiting for an actual low latency need before trying to play with this feature. Let me know if you dive into it and find it makes a difference?
 
I read the section, and it says a channel can be configured to be preemptable; and that a channel can be configured to never preempt another. I infer that the default is for channels to be able to preempt, but no channels are preemptable unless explicitly configured to be.

So Audio would need to be made preemptable, and Octo/Object are ready to preempt as-is. I guess this indicates that good programming practice is to make DMA preemptable if it's slow or has long inner loop?
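For anyone who wants to experiment, here is a sketch of the bit manipulation involved. It assumes the usual eDMA channel priority register layout (ECP in bit 7 lets a channel be preempted, DPA in bit 6 stops it from preempting others, channel priority in the low 4 bits); please check that against the reference manual before relying on it. It works on a plain byte rather than a real register, and the constant and function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMED bit layout of one 8-bit channel priority register value.
constexpr std::uint8_t kEcp = 0x80;      // enable channel preemption
constexpr std::uint8_t kDpa = 0x40;      // disable preempt ability
constexpr std::uint8_t kPriMask = 0x0F;  // channel priority field

// Allow higher-priority channels to suspend this one (what Audio would need).
inline std::uint8_t makePreemptable(std::uint8_t reg) {
    return static_cast<std::uint8_t>(reg | kEcp);
}

// Stop this channel from suspending any other channel.
inline std::uint8_t disablePreemptAbility(std::uint8_t reg) {
    return static_cast<std::uint8_t>(reg | kDpa);
}

// Replace only the priority field, leaving the ECP and DPA bits alone.
inline std::uint8_t withPriority(std::uint8_t reg, std::uint8_t pri) {
    assert(pri <= kPriMask);
    return static_cast<std::uint8_t>((reg & ~kPriMask) | pri);
}

inline bool canBePreempted(std::uint8_t reg) { return (reg & kEcp) != 0; }
inline bool canPreemptOthers(std::uint8_t reg) { return (reg & kDpa) == 0; }
```

With both bits clear, a channel can preempt but cannot be preempted, which matches my reading above. On hardware the idea would be to apply something like makePreemptable to the priority register of each Audio DMA channel and leave the LED channels as they are, but I have not tried it.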
 