[queued] TriantaduoWS2811, a 32-channel WS* library for Teensy 4.0 using FlexIO & DMA

Ward


Here's a project I've been working on to generate 32 WS* channels of up to 1000 LEDs each, with zero processor overhead through the use of FlexIO and DMA and external shift registers. Some features:

1000 LEDs per channel at 30 FPS, though the number of LEDs and frame rate can be traded off. 10 LEDs at 3 kHz should be attainable.
Consumes three Teensy pins and zero processor time
Double-buffered to reduce tearing artifacts with video
Each channel is configurable for RGB, GRB, or GRBW
If you're really nuts, you could probably fit three of these on a Teensy, for 96 total output channels. Right now it's hard-coded to FlexIO 1
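
The trade-off between LED count and frame rate can be estimated with a little arithmetic. The sketch below is a rough upper bound only, assuming an 800 kHz bit rate (1.25 us per bit, i.e. four 312.5 ns slots as discussed further down the thread); `maxFramesPerSecond` and the reset time passed to it are hypothetical and not part of the library.

```cpp
#include <cstdint>

// Assumed timing: 800 kHz bit rate, i.e. 1.25 us per transmitted bit.
constexpr double kBitTimeSeconds = 1.25e-6;

// Hypothetical helper: upper bound on the refresh rate of one channel.
// bitsPerLed is 24 for RGB/GRB and 32 for GRBW; resetSeconds is the idle
// (latch) time between frames, which depends on the LED type.
double maxFramesPerSecond(uint32_t ledCount, uint32_t bitsPerLed,
                          double resetSeconds) {
    const double frameSeconds =
        ledCount * bitsPerLed * kBitTimeSeconds + resetSeconds;
    return 1.0 / frameSeconds;
}
```

With an assumed 50 us reset, `maxFramesPerSecond(1000, 24, 50e-6)` comes out a little above 33 FPS and `maxFramesPerSecond(10, 24, 50e-6)` a little under 3 kHz. Since all channels are clocked out in parallel, the figure does not depend on how many channels are in use.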

This would also be a good starting point for any project that needs a lot of outputs updated relatively quickly (3.2 MHz), directly from RAM.

Code and lots more details here:
https://github.com/wramsdell/TriantaduoWS2811
 
Thank you! In thinking about this some more, I'm tempted to do some more work with FlexIO and shift registers because it's just such a powerful combination. I *think* one could do 96 channels of PWM with 200ns resolution using 8 IOs: two clocks and 6 data lines. That would be an interesting feat.
 
This looks extremely interesting! Have you looked at extending it to a second FlexIO for more outputs in the meantime? Also, is there an integration with FastLED already so I can save myself some work :) ?
 
Nope, I've never investigated FastLED. Is it a front-end/back-end sort of thing where I could easily swap in my drive code and leverage the existing value manipulation functions?
 
So, I just tried this library on a Teensy 4 with a PCB that is meant for a new interactive art LED installation. It works! Mostly! :)

Here's some questions / feedback:
- Defining LEDCOUNT in your own code and smaller than in TDWS2811.h breaks the library, and in fact the Teensy 4 becomes unstable, refusing serial communication. This is fixed by just changing the value inside the library header, but I'm not quite sure what's happening there.
- Is there a way to see if a frame has fully rendered? Or a counter that ticks once per frame? I want to calculate actual framerates and update LEDs exactly once per frame, as fast as possible.
- Only some of my LEDs work with this library. I'm using a cheap 16 LED ring from Aliexpress, and some batches from it work, while others don't. Unfortunately they don't really say which LED type they use, but it could be that they switched from WS2812B to SK6812. On paper, they're protocol compatible, but there might be some tiny timing differences that make it fail. Happy for others to confirm this one though.
- Only the first channel seems to be actually set to DEFAULT_CHANNEL_TYPE in the header (one of the last lines in the header file).
 
You didn't mention if you used a level shifter such as (https://www.adafruit.com/product/1787), as well as the capacitors and resistors specified by the typical guides (https://learn.adafruit.com/adafruit-neopixel-uberguide/best-practices). This is one of the value adds that the OctoWS2811 shield gives you.

Here is an older article that was one of the first to mention using level shifters on Teensy: https://happyinmotion.com/?p=1247

Over the years, I've had some WS2812Bs that wouldn't work unless I powered them with 5V and used a level shifter. The newer ones that use SK6812s instead of WS2812Bs can be run at 3.3V, so for a small number of LEDs, I just power them with 3.3V, or if the device is powered by 3.7-4.2V LiPo batteries, use VIN.

Here is one article comparing the two: https://www.pololu.com/category/180/sk6812-ws2812b-based-led-strips.

As I recall there were at least 2 generations of WS2812's as well. So who knows whether your LEDs were manufactured with old chips or new. I suspect you probably had SK6812's, and one batch went back to WS2812B's.
 
For this project I'm using the 74AHCT595 as recommended in the documentation in the first post, whose outputs are apparently level-shifted to 5V. However, I don't have an oscilloscope handy to actually verify that.
 
Ok, just checking the obvious. Evidently the resistor and capacitor are also needed for large installations that are not battery powered, as is keeping the distance down (none of that applies to me, but it sounds like you might need it). And of course there is the possibility that some of your LEDs just are not wired professionally within the LED itself.
 
Yep, thanks! It's a valid point, as I've been bitten by this before (the worst was on a previous project where I was using 74HC245 instead of 74HCT245 level shifters, which was fine in lab conditions but started glitching when the ambient temperature dropped to outdoor temps :mad:). For the current project, I have enough LED rings of the batch that works to build it all, so right now it's not a big deal for me. Once our Makerspace reopens I can check the actual values with an oscilloscope on the LEDs that don't work.
 
AHCT - Advanced High speed Cmos - TTL switching levels compatible inputs.

TTL logic levels are: logic 0 is between 0V and 0.8V and logic 1 is between 2V and 5V. So there is no problem with 3V3 CMOS logic.

AHC - CMOS compatible levels - thresholds are 0.3*Vcc for LOW and 0.7*Vcc for HIGH to be reliable. So with 5V Vcc it's 3.5V for HIGH, which is above what 3.3V CMOS logic can output. It might work, and sometimes it does, but it can't be considered a reliable design, as the threshold value might be floating somewhere between 0.3*Vcc and 0.7*Vcc.
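
The comparison above boils down to one inequality per logic family. A minimal sketch, using only the thresholds quoted in this post (0.7*Vcc for CMOS-level inputs, 2.0V for TTL-level inputs) and treating 3.3V as the nominal driver output; the function names are made up for illustration:

```cpp
// Input-high thresholds as quoted in the post above.
double vihCmos(double vcc) { return 0.7 * vcc; }  // AHC/HC style input
constexpr double kVihTtl = 2.0;                   // AHCT/HCT style input

// True when the driver's HIGH level is guaranteed to be read as HIGH.
bool drivesReliably(double driverHighVolts, double vih) {
    return driverHighVolts >= vih;
}
```

`drivesReliably(3.3, vihCmos(5.0))` is false (3.3V is below 3.5V), while `drivesReliably(3.3, kVihTtl)` is true. A real driver's HIGH level under load is somewhat below its supply, which only makes the AHC case worse.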
 
I'm not quite sure what you're trying to say. Looking at the specs of a 74AHCT595 chip, it does state that:

Both 74AHC595 and 74AHCT595 inputs are overvoltage tolerant. This feature allows the use of these devices as
translators in mixed voltage environments.

Specifically, for the 74AHCT595, a threshold of 2.0V for HIGH is mentioned when Vcc is 5.5V.
 
As it's for driving WS2812x I'd expect Vcc=5V, and therefore it doesn't make sense to use the AHC version, as it has pretty much the same input levels (CMOS) as the WS2812.

For the AHC version, that 3.3V HIGH from the Teensy outputs might not be enough to be recognized as HIGH on the AHC input, as the threshold might be anywhere between 1.5V and 3.5V with 5V Vcc. HIGH is still in the undefined region. You can use a slightly lower Vcc to get outside the undefined region, or you can add, for example, a diode between ground and GND to move the threshold a little bit.

For the AHCT it's perfectly fine, as its undefined region is 0.8V to 2.0V.
 
I had some problems getting the lib to work, but i resolved it by tweaking the pll5 clock. I’m just posting this here because this beast deserves way more attention than it is currently getting ;)

https://github.com/wramsdell/TriantaduoWS2811/issues/5


I don't consider myself a professional C coder, but there might be some bugs in other places in the code, like:
channelType_t channelType[32]={DEFAULT_CHANNEL_TYPE}; -> I don't think this initializes all array items?
p->SHIFTCTL[4] = 0x00800002; -> this writes to const memory (marked as unused), that can't be right? Right?
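
On the array question: in C++ an initializer list with a single value sets only the first element; the remaining elements are value-initialized to zero. The sketch below shows this with a stand-in enum (the enumerator order in the real TDWS2811.h may differ, and `kDefaultType` is a hypothetical substitute for DEFAULT_CHANNEL_TYPE). It uses std::array so the result can be returned from a function; a plain C array follows the same rule.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the library's enum; the real order may differ.
enum channelType_t { RGB, GRB, GRBW };
constexpr channelType_t kDefaultType = GRB;  // stand-in for DEFAULT_CHANNEL_TYPE
constexpr std::size_t kChannels = 32;

// Same pattern as the quoted line: only element 0 gets kDefaultType, the
// remaining 31 are value-initialized to 0 (RGB in this stand-in enum).
std::array<channelType_t, kChannels> partialInit() {
    std::array<channelType_t, kChannels> types = {kDefaultType};
    return types;
}

// One possible fix: assign the default to every element explicitly.
std::array<channelType_t, kChannels> fullInit() {
    std::array<channelType_t, kChannels> types{};
    types.fill(kDefaultType);
    return types;
}
```

If the default type happens to be the enumerator with value 0, the two functions give the same result, which would hide the problem.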

Also it looks like you can mix rgb, bgr, rgbw. But the bit count is calculated for the current channel, and if you mix 24 and 32 bits, won't these overlap??

else if (channelType[channel]==GRB)
{
bitCount=24;
ledVal=(color.green<<24)+(color.red<<16)+(color.blue<<8);
}

else if (channelType[channel]==GRBW)
{
bitCount=32;
ledVal=(color.green<<24)+(color.red<<16)+(color.blue<<8)+color.white;
}
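
A side note on the packing itself, as a sketch rather than the library's code: if the colour fields are 8-bit (an assumption here, as is the `Color` struct), they are promoted to signed int before the shift, and `green<<24` with green at 0x80 or above does not fit in a 32-bit int. Older C++ standards leave that undefined or implementation-defined, so casting to uint32_t first is the safer form. Both formats end up MSB-aligned in one 32-bit word, with the low byte left at zero for 24-bit GRB.

```cpp
#include <cstdint>

// Hypothetical stand-in for the library's colour type.
struct Color {
    uint8_t red, green, blue, white;
};

// 24-bit GRB, left-aligned in a 32-bit word (low byte stays zero).
uint32_t packGrb(const Color& c) {
    return (uint32_t(c.green) << 24) | (uint32_t(c.red) << 16) |
           (uint32_t(c.blue) << 8);
}

// 32-bit GRBW: same layout, with white in the low byte.
uint32_t packGrbw(const Color& c) {
    return packGrb(c) | uint32_t(c.white);
}
```

This does not settle the overlap question, which depends on how the library lays the per-channel bits out in its DMA buffer.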

Anyway, I don't need an answer, just putting it here to help and raise awareness.
 
Hi MaltWhiskey.

Glad to see some people trying some custom boards with this lib!
I did one myself
passengers.jpg

I'm using it in this art installation: https://www.instagram.com/p/CGxljWeKKal/
(a 6 meter long container filled with ~12000 LEDs; I have 18 of the above PCBs communicating through Art-Net)

I'm using it to control RGBW (SK6812) LEDs. So far so good, but I'm seeing flickering on the last output of the last SN74 (see: https://github.com/wramsdell/TriantaduoWS2811/issues/4)

The only thing I modified was some timings here: https://github.com/wramsdell/TriantaduoWS2811/blob/master/TriantaduoWS2811/TDWS2811.h#L112

I had to increase the size of the "zeros" array to 50.

If you face the same issue and/or have any idea about it, I would be glad to discuss it with you!

Regards
 
I have to look into it, but those zeros are probably the reset. The reset for the WS2811 (I don't know about the SK6812) needs to be at least 280000 ns according to the datasheet; hook up a scope and verify? The double buffering in the lib is also looking a bit sketchy: the buffer might change before a frame is completed, giving flickering. What I like to do is the way they do it in video games: use the delta time it takes to render a frame, adjust animations according to that delta time, and only switch buffers when a frame is completely rendered. This way there is no tearing and animations are framerate independent. But the lib uses scatter-gather DMA, and the IRQ switches buffers, so DMA never stops and every frame is displayed for an equal amount of time... Maybe we can change this? The LEDs keep their PWM even if we don't feed new data. Or just make sure animation rendering takes less time than a frame and synchronize this with buffer switching...

Anyway this is just my idea by looking at the code. Still need to hookup some leds to it LOL
I'm using this to make a 16x16x16 LED cube with PL9823 LEDs. A nightmare soldering and LED bending job.

Btw your art project looks really cool :p
 
Well, it is still black magic to me, but it seems the PLL5 timing and the DMA timing are loosely coupled... meaning if you adjust the PLL5 timing, the 4 pieces of DMA don't adjust across the new range. This might give a problem with your SK6812 LEDs; you should grab your scope and check if the timing is correct. I checked my 32 channels with a LED strip with 250 WS2812B LEDs and all was fine. But now I need to get my PL9823 LEDs and their timing is very different. I need around 450 ns high, 2x450 ns data, 450 ns low. The other LEDs do 312.5 ns high, 312.5 ns data, 2x312.5 ns low. So some DMA tweaking is in order.
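
The two slot layouts mentioned above can be written down and checked numerically. A small sketch, using only the numbers from this post; the `SlotLayout` type and the helper names are hypothetical:

```cpp
#include <array>

// One transmitted bit as four equal slots.
// 'H' = always high, 'D' = follows the data bit, 'L' = always low.
struct SlotLayout {
    double slotNs;
    std::array<char, 4> slots;
};

// Layouts as described in the post: high, data, low, low for the WS2812B
// and high, data, data, low for the PL9823.
constexpr SlotLayout kWs2812b{312.5, {'H', 'D', 'L', 'L'}};
constexpr SlotLayout kPl9823{450.0, {'H', 'D', 'D', 'L'}};

// High time in nanoseconds for a transmitted 0 or 1 bit.
double highTimeNs(const SlotLayout& layout, bool bit) {
    double ns = 0.0;
    for (char s : layout.slots) {
        if (s == 'H' || (s == 'D' && bit)) ns += layout.slotNs;
    }
    return ns;
}

// Total duration of one bit.
double bitPeriodNs(const SlotLayout& layout) { return 4 * layout.slotNs; }
```

For the WS2812B layout this gives 312.5 ns / 625 ns high for a 0 / 1 bit in a 1250 ns period; for the PL9823 layout it gives 450 ns / 1350 ns in an 1800 ns period. Both the slot clock and the slot pattern change, which is why adjusting the PLL alone is not enough.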
 
The idea behind this library is fantastic, credits to Ward! But it seems to me there needs to be a little more effort in the coding part.

Like this:

pinConfig=IOMUXC_PAD_DSE(7)+IOMUXC_PAD_SPEED(3)+~IOMUXC_PAD_PKE+~IOMUXC_PAD_SRE;
IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_04 |= pinConfig;
IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_04 &= ~pinConfig;
pinConfig=IOMUXC_PAD_DSE(6)+IOMUXC_PAD_SPEED(3)+~IOMUXC_PAD_PKE+~IOMUXC_PAD_SRE;
IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_05 |= pinConfig;
IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_05 &= ~pinConfig;
pinConfig=IOMUXC_PAD_DSE(7)+IOMUXC_PAD_SPEED(3)+~IOMUXC_PAD_PKE+~IOMUXC_PAD_SRE;
IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_06 |= pinConfig;
IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_06 &= ~pinConfig;
}

those bit manipulations are not doing what is intended. I replaced it with this below:

/******************************************************************************
* Setup pin 2, 3, 4 as OUTPUT and set IOMUX DSE and SPEED
*
* HYS Hysteresis enable = 0 (0=disabled, 1=enabled)
* PUS pull up/down config select = 0
* (0=100K pull down, 1=47K pull up, 2=100K pull up, 3=22K pull up)
* PUE Keeper select (0=keeper, 1=pull)
* PKE Pull keeper enable = 0 (0=disabled, 1=enabled)
* ODE Open drain enable = 0 (0=disabled, 1=enabled)
* SPEED speed = 1
* (0=50MHz, 1=100MHz, 2=150MHz, 3=200MHz)
* DSE drive strength = 7 (should be impedance matched)
* (0 = off, 1=150/1 Ohm, 2=150/2 Ohm, 3=150/3 Ohm, ... 7=150/7 Ohm)
* SRE slew rate = 0 (0=slow, 1=fast)
*****************************************************************************/
void TDWS2811::configurePins() {
    pinMode(2, OUTPUT);
    // pad GPIO_EMC_04 is routed to Teensy pin 2
    IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_04 = IOMUXC_PAD_DSE(7) | IOMUXC_PAD_SPEED(1);
    pinMode(3, OUTPUT);
    // pad GPIO_EMC_05 is routed to Teensy pin 3
    IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_05 = IOMUXC_PAD_DSE(7) | IOMUXC_PAD_SPEED(1);
    pinMode(4, OUTPUT);
    // pad GPIO_EMC_06 is routed to Teensy pin 4
    IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_06 = IOMUXC_PAD_DSE(7) | IOMUXC_PAD_SPEED(1);
}
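
To see why the original `|=` followed by `&= ~` sequence cannot do what was intended, it helps to run it on a plain integer. The sketch below uses stand-in field helpers that are assumed to mirror the pad register layout (check the core's imxrt.h for the real macros). Whatever the mask contains, OR-ing it in and then AND-ing its complement leaves every bit of the mask cleared, so the drive strength and speed fields never end up set. Separately, `+ ~FLAG` adds the complement of the flag (nearly all bits set) instead of clearing it.

```cpp
#include <cstdint>

// Stand-in field helpers, assumed to mirror the IOMUXC pad register layout.
constexpr uint32_t padDse(uint32_t n) { return (n & 7u) << 3; }
constexpr uint32_t padSpeed(uint32_t n) { return (n & 3u) << 6; }

// What the original sequence does to a register value: OR the mask in,
// then AND its complement. Every bit in `mask` ends up 0.
uint32_t setThenClear(uint32_t reg, uint32_t mask) {
    reg |= mask;
    reg &= ~mask;
    return reg;
}

// Plain assignment of the wanted fields, as in the replacement function.
uint32_t assignFields(uint32_t dse, uint32_t speed) {
    return padDse(dse) | padSpeed(speed);
}
```

For example, `setThenClear(0x10B0, padDse(7) | padSpeed(3))` returns a value with all DSE and SPEED bits at zero, while `assignFields(7, 1)` returns 0x78.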

This is all over the place. I replaced the PLL setup with (not tested, just mental coding) this:

void TDWS2811::configurePll(void) {
    // Before disabling the PLL, set the bypass source to the internal 24MHz
    // reference clock. See 14.6.1.6 page 1039.
    CCM_ANALOG_PLL_VIDEO_CLR = CCM_ANALOG_PLL_VIDEO_BYPASS_CLK_SRC(3);
    // Bypass the PLL. See 14.6.1.6 page 1039.
    CCM_ANALOG_PLL_VIDEO_SET = CCM_ANALOG_PLL_VIDEO_BYPASS;
    // Disable the Video PLL output before configuring
    CCM_ANALOG_PLL_VIDEO_CLR = CCM_ANALOG_PLL_VIDEO_ENABLE;

    // Clear CCM_ANALOG_PLL_VIDEO, before setting the values
    CCM_ANALOG_PLL_VIDEO_CLR = CCM_ANALOG_PLL_VIDEO_DIV_SELECT(0x7f) |
                               CCM_ANALOG_PLL_VIDEO_POST_DIV_SELECT(3);
    // Clear CCM_ANALOG_MISC2, before setting the values
    CCM_ANALOG_MISC2_CLR = CCM_ANALOG_MISC2_VIDEO_DIV(3);

    // Post-divider for video values (0=/1, 1=/2, 2=/1, 3=/4)
    CCM_ANALOG_MISC2_SET = CCM_ANALOG_MISC2_VIDEO_DIV(2);
    CCM_ANALOG_PLL_VIDEO_SET =
        // DIV_SELECT sets the loop divider (27-54)
        CCM_ANALOG_PLL_VIDEO_DIV_SELECT(42) |
        // Divider after PLL (0=/4, 1=/2, 2=/1, 3=reserved)
        CCM_ANALOG_PLL_VIDEO_POST_DIV_SELECT(0) |
        // Enable PLL output
        CCM_ANALOG_PLL_VIDEO_ENABLE;
    // NUM = 30-bit signed number, abs(NUM) must be less than DENOM
    CCM_ANALOG_PLL_VIDEO_NUM = 2;
    // DENOM = 30-bit unsigned number. NUM/DENOM -> fractional loop divider
    CCM_ANALOG_PLL_VIDEO_DENOM = 3;

    // Wait for the PLL to lock. Prevents random initial edges 13.3.2.2.1
    while ((CCM_ANALOG_PLL_VIDEO & CCM_ANALOG_PLL_VIDEO_LOCK) == 0)
        ;
    // Disable bypass for Video PLL once it's locked.
    CCM_ANALOG_PLL_VIDEO_CLR = CCM_ANALOG_PLL_VIDEO_BYPASS;

    /* Disable FlexIO clock gate */
    hw->clock_gate_register &= ~(uint32_t(hw->clock_gate_mask));

    uint32_t ccm_cdcdr = CCM_CDCDR;
    // Divider clock pred values 0=/1, 1=/2, 2=/3, 7=/8
    ccm_cdcdr &= ~(CCM_CDCDR_FLEXIO1_CLK_PRED(7) |
                   // Divider clock podf values 0=/1, 7=/8
                   CCM_CDCDR_FLEXIO1_CLK_PODF(7) |
                   // Clock multiplexer 0=PLL4, 1=PLL3, 2=PLL5, 3=PLL3swclk
                   CCM_CDCDR_FLEXIO1_CLK_SEL(3));

    // 4 is not defined in the manual
    ccm_cdcdr |= (CCM_CDCDR_FLEXIO1_CLK_PRED(4) |
                  // divide by 1
                  CCM_CDCDR_FLEXIO1_CLK_PODF(0) |
                  // PLL5
                  CCM_CDCDR_FLEXIO1_CLK_SEL(2));
    CCM_CDCDR = ccm_cdcdr;
}

But the timing issue is probably this:
p->TIMCTL[0] = 0x01C30401; // trgsel=1, trig pol=active low, trig src=internal timer,
                           // pin output CME4=2, dual 8-bit counters baud mode
p->TIMCTL[1] = 0x00030503; // pin output cme5=3, dual 8-bit counters PWM mode

p->TIMCMP[0] = 0x0000BF00; // lower 8 bits: (value + 1) * 2 is the baud divider;
                           // upper 8 bits: (0xBF + 1) / 2 = 0x60 = 96 bits (32 + 32 + 32)
p->TIMCMP[1] = 0x0000001F; // lower = high period of output: 0x1F + 1 = 0x20 = 32;
                           // upper = low period of output: 0 + 1 = 1

/* Finally, set up the values to be loaded into the shift registers at the
* beginning of each bit */
p->SHIFTBUF[0] = 0xFFFFFFFF;
p->SHIFTBUFBIS[1] = 0xAAAAAAAA; // Identifiable pattern should DMA fail to
// write SHIFTBUFBIS[1]
p->SHIFTBUF[2] = 0x00000000;
p->SHIFTBUF[3] = 0x00000000;

There are 96 bits going through the shift register, while there need to be 32*4 for 1 effective bit to a WS2811. So SHIFTBUF[3] will not get shifted in and the frequency will be divided into 3 parts, not 4. Again, this is still a mental and manual-reading exercise; I haven't run the code yet......
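
The bit count can be decoded directly from the TIMCMP value. A minimal sketch of the arithmetic described in the comments above for dual 8-bit counters baud mode (upper byte = bits * 2 - 1, lower byte = divider / 2 - 1); the helper names are hypothetical:

```cpp
#include <cstdint>

// Number of bits shifted per cycle, from the upper byte of TIMCMP.
uint32_t bitsPerCycle(uint32_t timcmp) {
    return (((timcmp >> 8) & 0xFFu) + 1u) / 2u;
}

// Baud divider, from the lower byte of TIMCMP.
uint32_t baudDivider(uint32_t timcmp) {
    return ((timcmp & 0xFFu) + 1u) * 2u;
}

// Build a TIMCMP value for a given bit count (up to 128) and even divider.
uint32_t timcmpFor(uint32_t bits, uint32_t divider) {
    return (((bits * 2u - 1u) & 0xFFu) << 8) | ((divider / 2u - 1u) & 0xFFu);
}
```

`bitsPerCycle(0x0000BF00)` is 96, and `timcmpFor(128, 2)` is 0x0000FF00, the value that would shift all four 32-bit buffers.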

Then again, I might have it all wrong....
 
This is a copy of the issue on GitHub: I'm working with your lib and so far it's great, thanks for it!

I reached out to you on LinkedIn during the month of June about some issues and I'd like to thank you for the help and time you took to help me solve them.

There is still one issue I face: flickering on the last output of the last SN74.
You told me the following:

Yeah, it's coming back to me now. There's a hardware bug where the contents of the LSB of shift register 4 get transferred to the output on the wrong clock edge, i.e. data is changing on the rising edge, which should never happen. The fix was to make sure that the LSB of SHIFTBUF 4 was 0. The incorrect shift still happens, but it's zero so it doesn't corrupt the waveform. Because it's a corruption at the very end of the cycle (SHIFTBUF[3][31]->incorrect shifting of SHIFTBUF[4][0] for half a clock->SHIFTBUF[0][0]) the problem only appeared on the very last output.
I tried to change the following line: https://github.com/wramsdell/TriantaduoWS2811/blob/master/TriantaduoWS2811/TDWS2811.cpp#L136

To SHIFTBUF[4] = 0x00000000; and also tried SHIFTBUF[4] = 0x00000001; with no success.

Any idea where I should investigate ?

Regards


Well, I hooked up my scope and I was right: only 96 bits are shifted, so change the BF to FF to shift 128 bits. Now about the last output of the last shift register. Channel 32 (counted from 1) is actually the first bit that gets shifted in, so forget all about SHIFTBUF[4]; that thing doesn't exist. It has to do something with SHIFTBUF[0]... I just don't know what.... Another possibility is reflections on the clock line, but I'm not an electrical engineer (and also not a professional programmer). I'll go and try to see if I can terminate the signal (but this is way out of my skillset).

Also is it ok to discuss issues here in this thread? Or should another thread be more appropriate?
 
Arrrg, I'm in way over my head here in the electrical engineering department... I made this 2-layer board and I put ground on the top and bottom layers with a few traces, but there are not many on the top, just 4 cm in total. And 20-ish vias just in case. Big 5V trace, but I don't think I need it at all. LED strip inputs are high impedance...

Probably better to start the clock at the last shift register, because that one is shifted first, then the 3rd, 2nd, 1st. Might be peanuts, but there is that 1 ns delay over 15 cm, so it may help... Then make a 4-layer board with ground top and bottom and 2 signal layers between.

Here is my board https://github.com/MaltWhiskey/32-Shift-Register

Is anybody more knowledgeable willing to have a look and improve the board?
Would be nice if we can make a hardware and software solution for the community :p
 