Driving APA102 LEDs using FlexIO DMA?

StefanPetrick · Apr 11, 2023

Teensy 4: I noticed that at 800 fps on 256 LEDs (driven at 12 MHz with FastLED) I spend up to 20% of the time waiting for FastLED.show() to finish. This surprised me.

I'm not sure whether this is the expected behaviour, considering that FastLED uses hardware SPI where possible. I used pins 11 and 13. I would expect the LED update to happen fully in the background, leaving almost 100% of the loop time available for rendering new data.
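As a rough cross-check for numbers like these, the time the bus itself needs for one APA102 update can be estimated from the protocol framing. The sketch below is plain C++ and not taken from FastLED; the function names are made up for illustration, and the end-frame length is an assumption (at least one extra clock pulse per two LEDs, rounded up to whole bytes; a driver may send more).

```cpp
#include <cassert>
#include <cstdint>

// Bits clocked out for one APA102 update of numLeds pixels (hypothetical helper).
uint32_t apa102FrameBits(uint32_t numLeds) {
    const uint32_t startBits = 32;            // start frame: 32 zero bits
    const uint32_t pixelBits = 32 * numLeds;  // per LED: 8 bits header/brightness + 3 x 8 bits colour
    // Assumed end frame: at least numLeds / 2 clock pulses, rounded up to whole bytes.
    const uint32_t endBits = ((numLeds / 2 + 7) / 8) * 8;
    return startBits + pixelBits + endBits;
}

// Time on the wire for one update, in microseconds, at the given SPI clock.
double wireTimeMicros(uint32_t numLeds, double clockMHz) {
    assert(clockMHz > 0.0);
    return apa102FrameBits(numLeds) / clockMHz;  // bits / (bits per microsecond)
}

// Fraction of each second the bus is busy at a given frame rate.
double busyFraction(uint32_t numLeds, double clockMHz, double framesPerSecond) {
    return wireTimeMicros(numLeds, clockMHz) * framesPerSecond / 1e6;
}
```

For 256 LEDs at a nominal 12 MHz this gives 8352 bits, or about 696 µs on the wire per update. This is only the bus time under the stated assumptions; a measured blocking time can differ, for example if the effective SPI clock is not the requested one.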

My question is: Did anyone use FlexIO DMA for the LED SPI transfer?

Is my assumption correct that this should be the best way on a Teensy 4 to free as much CPU time as possible? If so, does it then require double buffering the way SmartMatrix does it (one buffer gets filled while the previous frame in the other buffer gets written out, then the buffers are swapped)?

I'd be grateful for any hint about what I should consider, look into, or have failed to understand so far.
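To make the double-buffering idea concrete, here is a minimal sketch in plain C++. It is not SmartMatrix's implementation; `DoubleBuffer`, `publish()` and `onTransferComplete()` are hypothetical names, and the actual DMA start is only indicated by a comment. The renderer always draws into the back buffer while the front buffer is (notionally) being sent, and a swap is refused while a transfer is still running.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Pixel { uint8_t r, g, b; };
constexpr std::size_t kNumLeds = 256;
using Frame = std::array<Pixel, kNumLeds>;

class DoubleBuffer {
public:
    // Buffer the renderer may freely write into.
    Frame& back() { return buffers_[backIndex_]; }

    // Buffer currently owned by the output side (being sent to the LEDs).
    const Frame& front() const { return buffers_[1 - backIndex_]; }

    // Call when the back buffer holds a finished frame.
    // Returns false without swapping if the previous transfer is still busy.
    bool publish() {
        if (transferBusy_) return false;
        backIndex_ = 1 - backIndex_;  // the finished frame becomes the front buffer
        transferBusy_ = true;         // a real driver would start the DMA transfer of front() here
        return true;
    }

    // Would be called from the transfer-complete interrupt in a real driver.
    void onTransferComplete() {
        assert(transferBusy_);
        transferBusy_ = false;
    }

private:
    std::array<Frame, 2> buffers_{};
    int backIndex_ = 0;
    volatile bool transferBusy_ = false;  // shared with the (hypothetical) interrupt handler
};
```

In a render loop, the code would draw into `back()`, call `publish()`, and, if that returns false, either keep rendering or wait until the completion handler has run. The cost of this scheme is a second frame buffer in RAM.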

StefanPetrick · Apr 12, 2023

I'm slightly irritated by the silence.

Are my questions too esoteric (nobody has an answer) or too plain stupid (nobody wants to engage, because my assumptions are beyond wrong)?

I'd appreciate any advice on any of the topics I touched on. I'm really stuck.

kd5rxt-mark · Apr 12, 2023

StefanPetrick said:
Teensy 4: I noticed that at 800 fps on 256 LEDs (driven at 12 MHz with FastLED) I spend up to 20% of the time waiting for FastLED.show() to finish. This surprised me.

I'm not sure whether this is the expected behaviour, considering that FastLED uses hardware SPI where possible. I used pins 11 and 13. I would expect the LED update to happen fully in the background, leaving almost 100% of the loop time available for rendering new data.

My question is: Did anyone use FlexIO DMA for the LED SPI transfer?

Is my assumption correct that this should be the best way on a Teensy 4 to free as much CPU time as possible? If so, does it then require double buffering the way SmartMatrix does it (one buffer gets filled while the previous frame in the other buffer gets written out, then the buffers are swapped)?

I'd be grateful for any hint about what I should consider, look into, or have failed to understand so far.

@Stefan:

In general & from my experience when I've asked questions in the past, most of the experienced members on this forum are very good at answering questions for which either they have some direct experience, or for which they have some particular expertise. Speaking for myself (and I am, by no means, an expert on anything in particular, but I do have some valuable practical experience with the T4.x), I try to answer questions whenever I can.

Unfortunately, I don't have any experience with FlexIO DMA, nor with LED SPI transfer, so I'm a total strike-out in that arena. Have you tried looking at the source for the FastLED library to see what techniques it is using for information transfer to the LEDs ?? Maybe it is already doing what you think might provide some improvement ??

Sorry, I know that's not a helpful answer, but it's the best answer that I can give with my complete lack of expertise in this area !!

Good luck & have fun !!

Mark J Culross
KD5RXT

StefanPetrick · Apr 12, 2023

Hi Mark,

I appreciate your kind reply!

Here is what I know for sure: FastLED performs either bitbanging or, when the LEDs are connected to the hardware SPI pins, a hardware SPI transfer. I also know that the possible speed and how blocking the transfer is both depend on which pins are used.

Using hw SPI is less blocking than bitbanging (at the same transfer speed) according to my own measurements. There are several files in the library with names like fastspi_dma.h, fastspi_bitbang.h and so on.

I basically understand none of it. Maybe it's readable for a software engineer, or for someone who has at least implemented SPI communication themselves before; for me it just raises far more questions than it answers. Unfortunately Dan, who wrote it, died and I can't ask for his guidance anymore. RIP. Also, the pin operations seem to happen on an "as close to the hardware as it gets" level, with a lot of inline assembler, far beyond anything I have a chance to understand as a hobbyist and user. The last possibly relevant piece of information I have is that the last major update of the library happened for Teensy 3.2, so I consider it unlikely that any specific optimisation for later Teensys ever happened. It works up to T 4.1, but I guess without taking advantage of the new I/O methods that newer processors provide.

So I read a lot in the forum here. I see people communicating at amazing speeds with large LCD panels, and I read a lot of NXP's marketing material about how amazing FlexIO is, but so far I have found no documentation that starts from scratch and provides all the information needed to get this going. All the material I found assumes that the user already knows a lot about it, but doesn't provide the basic abstract information about what has to happen, why, when and how; in other words, it doesn't explain the full concept behind it. If anyone can recommend a good source (beyond the IMXRT1062DVL6 datasheet, which I perceive as not at all beginner-friendly) I'd really appreciate any hint.

I would love to learn and understand more but I really don't know where to start or what to read or whom to ask.

On the other hand, I use the SmartMatrix library for a HUB75 panel, which apparently leverages FlexIO, has an amazing bandwidth and is fully non-blocking, so there seems to be a way to push out data very efficiently and elegantly.

@everybody: Please point me to sources which explain all this stuff. I also read this long post, but it seems I'm missing so many basics that I have no chance to catch up based on the info there.

Maybe I'm overly ambitious and it's simply unrealistic to learn this topic in my free time without having at least a degree in electronics to begin with. Maybe I just haven't found the right sources yet, something like a generously commented "blink example" for SPI, DMA and FlexIO. I think I'm missing a kind of roadmap of what I have to dig into, and in which order, to make any progress...or someone experienced telling me to stay away from this stuff because it's a big boys only game.

Stefan

easone · Apr 12, 2023

I believe the SmartMatrix library includes support for APA102 LEDs using an implementation of FlexIO on Teensy 4. The SmartLED shield also has the requisite connector to attach an APA102 array. This is in addition to HUB75.

There's an example sketch here:
https://github.com/pixelmatix/Smart...nel_Plus_Apa102/FastLed_Panel_Plus_Apa102.ino

The implementation of the driver is probably within these source files, though there may be other dependencies:
https://github.com/pixelmatix/SmartMatrix/blob/master/src/MatrixTeensy4Apa102Refresh.cpp
https://github.com/pixelmatix/SmartMatrix/blob/master/src/MatrixTeensy4Apa102Refresh_Impl.h

I haven't tried using this aspect of the library so I don't know exactly how to set it up. Louis Beaudoin (@embedded-creations on this forum) is probably the best person to ask for support.

Btw, the FlexIO is really confusing and I don't think anyone has a good understanding of it. (I contributed to the FlexIO HUB75 driver for SmartMatrix a few years ago, but I would struggle to provide much help these days.)

StefanPetrick · Apr 12, 2023

Thank you for this reply, easone!

I was not even aware that SmartMatrix can drive APA102s as well, I was fully focussed on HUB75. This is good news because I'm familiar with using SmartMatrix already and I love the performance it provides. Good to know that I'm on the right path. So maybe for now I dive into the driver details where I feel I have a chance to catch up and learn basics before looking at the hardware level.

Thanks for your answer, the links and for your contribution to SmartMatrix. I enjoy using it: https://www.youtube.com/shorts/GOaweTDAd3k

easone · Apr 12, 2023

StefanPetrick said:
Thanks for your answer, the links and for your contribution to SmartMatrix. I enjoy using it: https://www.youtube.com/shorts/GOaweTDAd3k

Looks great! Were you able to increase the frame rate? 240 FPS is the default frame rate so it should be possible to set it higher. With 128x64 pixels at 36 bit refresh depth you can get about 300 FPS for a HUB75 panel with SmartMatrix. I think 32x32 should achieve over 1000 FPS unless CPU limited. If you have a running SmartMatrix HUB75 sketch, you can simply print the macro MAX_REFRESH_RATE to get the theoretical max frame rate for your screen size and refresh depth. You can use the matrix.getRefreshRate function to check what the actual rate you're getting is.

shawn · Apr 12, 2023

This thread may help: https://forum.pjrc.com/threads/66201-Teensy-4-1-How-to-start-using-FlexIO

StefanPetrick · Apr 13, 2023

Apparently I was too frustrated and didn't read the intro carefully. The very first link is this one https://forum.pjrc.com/threads/6335...tart-using-DMA?p=266991&viewfull=1#post266991 which is exactly the "let's start from scratch" write-up I was looking for.

Thanks to @miciwan for providing this tremendously helpful first-steps introduction into the topic!

I'm making progress now.
