LED Matrix driver for T4.0 using FlexIO parallel out, FlexPWM, DMA, & SmartLED shield

easone


[Attached images: demo.jpg, photo.JPG]

For a little while, I've been working on adapting the SmartMatrix library to Teensy 4.0 so I can drive a bunch of these Jumbotron-type LED matrices. Teensy 4.0 is attractive for this application since the T3.6 can only drive a screen of about 64x128 pixels due to memory and speed limitations; the T4.0 should be capable of at least twice as many pixels. I've made some good progress and thought I'd share something.

This is not a full port yet but it's a proof of concept that shows a way to drive the matrix using the T4.0's hardware peripherals, specifically FlexIO, FlexPWM, and DMA. In brief, these matrices are driven using 6 data pins, a clock pin, an OE pin, and a latch pin. They do not have any internal LED PWM drivers, so each LED is inherently capable of only displaying fully saturated R/G/B/C/M/Y/W. To get more colors, you have to modulate the LEDs directly by switching pixels on and off really fast! Only two rows are driven at any one time (one in the upper and one in the lower half), so to display a full image it is necessary to quickly loop through all the rows. There are pins for row addressing, but the Pixelmatix SmartLED Shield V4 (designed for T3.2) conveniently provides a way to multiplex the data and address pins together, while also providing the needed 5V level shifting. I opted to keep that shield for T4.0 (with the help of some jumper wires to connect the right Teensy pins). See this Github for the code (with comments) and jumper wire details.
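To make the "6 data pins, two rows at a time" idea concrete, here is a minimal sketch of how one bit plane of a row pair could be packed into 6-bit words, one word per column. The bit order (R1, G1, B1, R2, G2, B2 from bit 0 upward), the Rgb struct, and the function names are assumptions for illustration only; the real sketch has to follow the FlexIO pin order and the shield's multiplexing, which this does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel type: 8 bits per color channel.
struct Rgb { uint8_t r, g, b; };

// Extract a single bit (0 or 1) from an intensity value.
static uint8_t bitOf(uint8_t value, unsigned bit) {
    return static_cast<uint8_t>((value >> bit) & 1u);
}

// Pack one bit plane of an upper/lower row pair into 6-bit words.
// Assumed layout: bit0=R1, bit1=G1, bit2=B1, bit3=R2, bit4=G2, bit5=B2.
std::vector<uint8_t> packRowPair(const std::vector<Rgb>& upper,
                                 const std::vector<Rgb>& lower,
                                 unsigned bit) {
    assert(upper.size() == lower.size() && bit < 8);
    std::vector<uint8_t> out(upper.size());
    for (std::size_t x = 0; x < upper.size(); ++x) {
        out[x] = static_cast<uint8_t>(
            bitOf(upper[x].r, bit)      |
            bitOf(upper[x].g, bit) << 1 |
            bitOf(upper[x].b, bit) << 2 |
            bitOf(lower[x].r, bit) << 3 |
            bitOf(lower[x].g, bit) << 4 |
            bitOf(lower[x].b, bit) << 5);
    }
    return out;
}
```

Each word is what would be clocked out on the six data lines for one column; repeating this for every bit plane and every row pair gives the full frame.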

On T3.x, the SmartMatrix library uses DMA to write 8 bits to a GPIO port connected to the data and clock signals. T4.0 doesn't have a configuration of GPIO pins that makes this convenient, but I found a group of 6 pins on FlexIO2 that can be driven in parallel (and don't conflict with the SmartLED shield), and FlexIO can also generate the needed clock signal in hardware. There aren't many examples of FlexIO out there, but I was inspired by Ward's TriantaduoWS2811 demo to give this a try. I can explain a bit more about how it works if anyone is interested.

To display more colors, I adapted the technique from SmartMatrix, which uses Binary Code Modulation (BCM): each row of pixels is flashed over and over, with different timings representing the individual bits in the binary representation of the color intensity values. (The MSB time is twice as long as the next bit, etc.) This really deserves a detailed writeup... Anyway, I figured out how to do that on T4.0 with FlexPWM triggering a DMA transfer that changes the next frequency and duty cycle of the Latch and OE signals, and then links to the DMA transfer feeding the FlexIO for data output. Everything happens without taxing the CPU, except for a data processing interrupt. For the 32x64 pixel demo, the row update cycle happens at a rate of up to 300 kHz and the full matrix can display 12-bit color at 580 frames per second. The image shows what the LED panel looks like displaying a "T4" demo image. It's hard to take photos of these LED panels. It looks awesome in person (and super bright)!
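As a rough illustration of the Binary Code Modulation timing described above, the following sketch computes how long each bit plane is shown and the total time an LED is lit for a given intensity. The function names and the idea of counting time in abstract "ticks" are assumptions; the actual sketch programs these durations into FlexPWM via DMA rather than computing them like this.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Display time for each bit plane, LSB first.
// Each plane lasts twice as long as the one below it.
std::vector<uint32_t> bcmPlaneTicks(unsigned bits, uint32_t lsbTicks) {
    assert(bits >= 1 && bits <= 16);
    std::vector<uint32_t> ticks(bits);
    for (unsigned b = 0; b < bits; ++b) {
        ticks[b] = lsbTicks << b;
    }
    return ticks;
}

// Total time an LED is lit over one full BCM cycle for a given intensity.
// The LED is on during exactly those planes whose bit is set in `value`.
uint32_t bcmOnTicks(uint16_t value, unsigned bits, uint32_t lsbTicks) {
    const std::vector<uint32_t> ticks = bcmPlaneTicks(bits, lsbTicks);
    uint32_t on = 0;
    for (unsigned b = 0; b < bits; ++b) {
        if ((value >> b) & 1u) {
            on += ticks[b];
        }
    }
    return on;
}
```

Because the plane durations are powers of two, the total on-time comes out proportional to the intensity value, which is why only one refresh per bit is needed instead of one per intensity level.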

Next steps are to implement SmartMatrix's remaining features (such as the more sophisticated buffering, and the ability to define a screen made up of multiple matrices tiled together) and get this working with the other components of the library.
 
Superb, keep going. Others said before it wouldn't be possible at all on the T4. Plus, 12-bit color sounds great!

Kind regards,
Sebastian
 
Nice work. I'll be interested to see what refresh rate you can get for 128x128 and if you can go above that.
For anyone who needs higher resolutions, while still using arduino code, you can try https://github.com/marcmerlin/FastLED_RPIRGBPanel_GFX on rPi. More details here: http://marc.merlins.org/perso/ardui...Resolution-RGBPanels-with-a-Raspberry-Pi.html

I was able to run that arduino code as high as 384x256 on rPi3 using the active-3 board.
http://marc.merlins.org/perso/ardui...92_-to-384x256-and-maybe-not-much-beyond.html

Actually, this reminds me that 128x256 is probably the most you can reasonably do on a single channel, regardless of the processor, due to bus speed limitations.
 
Wow, Marc, those displays are very impressive! Pretty amazing what the rPi can do here.

My code drives the matrix at 24 MHz, which is close to the maximum clock speed the matrix can support. In theory, I agree that 128x256 would be about the maximum acceptable resolution for T4.0 with RGB888 color and a 100 Hz frame rate, although I think you could push it to 256x256 without running out of memory if you compromised on the color depth (RGB555). It's very easy to change the color depth anywhere from 1 to 16 bits per color channel.

Currently I am limited by a timing-critical interrupt that fills the row buffer at the end of each row, but it should be possible to go way faster by using double buffering and a separate interrupt to refill it continuously (following SmartMatrix). I only have four 32x64 panels to test with here, but I can attempt bigger resolution settings to see what happens.
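The resolution limits discussed above can be sanity-checked with a back-of-the-envelope bound. This sketch assumes that every bit plane of every row pair is shifted out exactly once per frame at the pixel clock, and it ignores latch/OE overhead and any padding of the shortest BCM planes, so real refresh rates will be lower. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rough upper bound on refresh rate in frames per second.
// Two rows are shifted in parallel, so one bit plane of the whole display
// needs width * height / 2 clock cycles.
double maxRefreshHz(uint32_t width, uint32_t height,
                    uint32_t bitsPerChannel, double clockHz) {
    assert(width > 0 && height > 1 && bitsPerChannel > 0);
    const double clocksPerFrame =
        (static_cast<double>(width) * height / 2.0) * bitsPerChannel;
    return clockHz / clocksPerFrame;
}
```

With a 24 MHz clock this gives about 183 Hz for 128x256 at 8 bits per channel, about 92 Hz for 256x256 at 8 bits, and about 146 Hz for 256x256 at 5 bits, which is in line with the guess that 256x256 only becomes practical at reduced color depth.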
 
You are correct on all points. SmartMatrix doesn't go below 24bpp, but rpi-rgb-panel allows you to use fewer bits per color to increase the refresh rate. If you're going to do serious work with RGB panels, I strongly recommend you get yourself an rPi3 or rPi4 and look at the rpi-rgb-panel code to compare with and borrow from for your Teensy 4 code :) It comes with multiple options to increase performance at the cost of visual quality.

As for not having enough panels, as you know, you can totally define more panels than you really have, and run the code to make sure it doesn't run out of RAM or crash, and that it looks ok on the first panels you do have.

If you achieve 256x256 with reasonable color depth, I would be super impressed; that'd be pushing the limits for sure.
Please send updates to your work here too if you don't mind: https://community.pixelmatix.com/t/teensy-4-0-released/498
 
Oh, and the Binary Code Modulation you mention was actually first used in the Adafruit driver.
SparkFun explains very nicely how it works here: https://www.sparkfun.com/news/2650
See also http://www.batsocks.co.uk/readme/art_bcm_1.htm

I also wrote a much simpler one for an 8x8 matrix: https://github.com/marcmerlin/LED-Matrix
For example: https://github.com/marcmerlin/LED-Matrix/blob/master/LED_Matrix.cpp#L72
You can actually see BCM in action on this old video I did in 2017: https://youtu.be/9yGZLtewmfI?t=18
 
By the way, if you want more pixels to play with, these are the 128x64 panels I've bought and used:
https://www.amazon.com/Indoor-128x64-Module-256mm128mm-256128mm/dp/B0869P1DCH
They are ABCDE panels, but they use the FM6126A, which needs a special init sequence that is now built into rpi-rgb-panel and is available out of tree for SmartMatrix (ping me if you need details on that).
Those panels are great because you get 4x the resolution for about 2x the price, with no special wiring.
The one I linked is ABCDE; be careful, as some cheaper panels on the internet are ABC panels that require special addressing.
 
Hello.

Could you post the schematic of the board you use to connect the Teensy 4.0 to the LED panels, and the wire modifications you've added?
I would like to test it, but by developing my own PCB with the circuit ready to connect, rather than using the original board with additional cables.

From the photos I have seen, it seems that the adapter board has a 74HCT374 and a 74HCT245. I assume the 74HCT245 is used as a 3.3 V to 5 V level converter, and the 74HCT374 to save GPIOs on the Teensy 4.0.

Regards
 
LuisHS, the schematic for the shield is in "SmartLEDShield_V4_sch.pdf" here: https://github.com/pixelmatix/SmartMatrix/blob/master/extras/hardware
It uses the 74x245 as a level converter and it uses the 74x374 to multiplex the RGB and address signals. It also has a 74x1G08 as a level converter for one more signal (BUFFER_OE). The "APA102 Buffers" part of the shield is not used.

I am using the following T4.0 pins:
Code:
  /* Basic pin setup */
  pinMode(10, OUTPUT); // FlexIO2:0 = GPIO_B0_00 - BUFFER_CLK, wire to pin 14
  pinMode(12, OUTPUT); // FlexIO2:1 = GPIO_B0_01 - BUFFER_R1, wire to pin 2
  pinMode(11, OUTPUT); // FlexIO2:2 = GPIO_B0_02 - BUFFER_B2, wire to pin 20
  pinMode(6, OUTPUT); // FlexIO2:10 = GPIO_B0_10 - BUFFER_B1
  pinMode(9, OUTPUT); // FlexIO2:11 = GPIO_B0_11 - BUFFER_R2, wire to pin 21
  pinMode(32, OUTPUT); // FlexIO2:12 = GPIO_B0_12 - BUFFER_G1, wire to pin 5
  pinMode(8, OUTPUT); // FlexIO2:16 = GPIO_B1_00 - BUFFER_G2
  pinMode(4, OUTPUT); // FlexPWM2_0:A = EMC_06 - BUFFER_OE
  pinMode(33, OUTPUT); // FlexPWM2_0:B = EMC_07 - BUFFER_LATCH, wire to pin 3
I used jumper wires to connect pin 10 to pin 14, pin 12 to pin 2, pin 11 to pin 20, pin 9 to pin 21, pin 32 to pin 5, and pin 33 to pin 3. By the way, pins 32 and 33 are on the underside of the T4.0 board, so you need to either solder wires or use pogo pins to connect to those (at least until T4.1 is released). There might be other options for you if you are designing a custom board and don't need to keep certain pins free.
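For anyone laying out a custom board, the jumper list above can be written down as a small lookup table. This is only a restatement of the six connections listed in this post; the struct, array, and function names are made up for illustration and are not part of the sketch on GitHub.

```cpp
#include <cassert>
#include <cstddef>

struct Jumper {
    int t4Pin;           // Teensy 4.0 pin that drives the signal
    int shieldPin;       // SmartLED Shield V4 pin it is jumpered to
    const char* signal;  // shield signal name from the pin setup comments
};

// The six jumper wires listed in the post above.
const Jumper kJumpers[] = {
    {10, 14, "BUFFER_CLK"},
    {12,  2, "BUFFER_R1"},
    {11, 20, "BUFFER_B2"},
    { 9, 21, "BUFFER_R2"},
    {32,  5, "BUFFER_G1"},
    {33,  3, "BUFFER_LATCH"},
};
const std::size_t kJumperCount = sizeof(kJumpers) / sizeof(kJumpers[0]);

// Returns the shield pin wired to a given T4.0 pin, or -1 if no jumper is listed.
int shieldPinFor(int t4Pin) {
    assert(t4Pin >= 0);
    for (std::size_t i = 0; i < kJumperCount; ++i) {
        if (kJumpers[i].t4Pin == t4Pin) {
            return kJumpers[i].shieldPin;
        }
    }
    return -1;
}
```

Pins 6, 8, and 4 do not appear in the table because no jumper is listed for them in the pin setup.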
 
Thank you very much, easone, I already found it.
I'm going to design a PCB with the modifications so I don't have to connect external cables. I want to test it with a Teensy 4.0, but my idea would be to make a complete board with the RT1062 microcontroller and additional chips, so I don't have to solder any cables.

If I could replace the Teensy's RT1062 BGA with an RT1020 LQFP144, it would be ideal for me. This looks more complex, though, because of the Teensy libraries for the Arduino IDE, unless it turns out that I only need to modify the clock of the microcontroller (500 MHz instead of 600 MHz).

It will be very useful for testing with Teensy 4.0; I have several boards here.

Making a complete board with the included RT1062 microcontroller (or RT1020 if possible) would also allow uploading of signed firmware images, to protect the code from copying. I think Teensy 4.0 does not yet support uploading encrypted images of the code for use in commercial applications.

Do you know whether, with the modifications you have made by connecting additional cables and using FlexIO, we can still use a micro SD card over SDIO with Teensy 4.0? I don't know if there can be any conflict between the FlexIO and SDIO ports for the SD card.
 
There aren't many examples of FlexIO out there, but I was inspired by Ward's TriantaduoWS2811 demo to give this a try. I can explain a bit more about how it works if anyone is interested.

Yes, could you please explain in more detail how it works?

Next steps are to implement SmartMatrix's remaining features (such as the more sophisticated buffering, and the ability to define a screen made up of multiple matrices tiled together) and get this working with the other components of the library.

Waiting for your new updates, congratulations on your work.
 
I have checked the connection changes in your modifications, and the only problem I find for my application is that by using pins 10, 11 and 12, SPI1 is lost, and the pins of SPI2 are shared with SDIO in order to use micro SD cards. This is a problem for me, because I need an SPI port to receive data, using DMA, for the images to be displayed.

Wouldn't it be possible to use other pins, to keep the SPI1 port free?


I have looked in the datasheet of the RT1062. It seems to have four SPI ports, but there still does not seem to be a free SPI port on Teensy 4.0.

SPI1, shared pins with SDIO for SD cards
SPI2, pins used by the external QSPI for boot
SPI3, SDI, SCK and CS available, but SDO (GPIO_AD_B0_01) is not available in Teensy 4.0
SPI4, use pins 10, 11 and 12 of your modification

[Attached image: spi_t4.jpg]
 
Assuming you are using a custom circuit instead of the shield, there's a small amount of flexibility. We still need to use at least two of the pins 10-13 because of the limited number of FlexIO2 pins. (There are only 9 pins available on FlexIO2, and we need 7, so only two of those four pins can be left free.) By the way, FlexIO1 only has 5 pins, and FlexIO3 is not DMA accessible. However, it may be possible to emulate SPI using FlexIO1 - I think Kurt's FlexIO library has a demonstration of that.
 
Actually, you are fine. Check pages 310-311 of the reference manual. LPSPI3 can be muxed to pins 0, 1, 26, and 27.
 
Thank you, I had not noticed the possibility of changing the pin assignment for SPI with the muxing options.
So can I assign each LPSPI3 pin to a different muxing mode? CS and SIN to ALT7, SCK and SOUT to ALT2.

What I do not know is how complicated it will be to use SPI3 with the Teensy libraries. I have only used them with SPI1 and SPI2, for a project that needed to receive data on one of the SPIs, using DMA, and to connect to an Ethernet controller via the other SPI.
 
@easone - Again, great stuff. At some point I would like to set up some form of example of the parallel access stuff in the FlexIO library that I tinker with.

@LuisHS - The SPI library should have all of the T4 (and hopefully T4.1) pins defined in the hardware tables. Unfortunately, during the T4 beta we did not catch that the SPI1 functions on pins 0 and 1 were not marked on the card (MISO1 on pin 1, CS1 on pin 0). Going by the pin definitions for T4.1, there will be other pins for these functions.

As with all Teensy boards using SPI, you can use things like SPI1.setMISO(1); - Which is the default anyway.

On T4.x boards the hardware LPSPI objects map like:
LPSPI4 -> Arduino SPI object
LPSPI3 -> SPI1 object
LPSPI1 -> SPI2 object.

There is no SPI3 object. Note that neither the T4 nor the soon-to-be-released T4.1 exposes all of the pins needed by the hardware LPSPI2 device, so we did not create an Arduino SPI object for it. If someone creates their own custom T4.x board, that obviously could be possible.
 
OK, thanks.
So LPSPI3 is SPI1 in the Teensy 4.0 source code?

Can I assign these pins to use LPSPI3?
SPI1.setMOSI(26);
SPI1.setMISO(1);
SPI1.setSCK(27);

What about PCS0? How can I assign the Chip Select to pin 0?
Is LPSPI4 available as SPI0?


[Attached image: LSPI3_T4.jpg]
 
Again, these are the default pins for SPI1, so you don't need to call setMOSI and so on; they are what will be used when you call SPI1.begin.

As with T3.x (and TLC), you can set up a hardware CS pin using the setCS(pin) member method. On T3.x it returns a bitmask indicating which bit the pin corresponds to; on T4 (in the very recently released version) it returns an index (older versions always returned 1). With T4 we only exported one CS pin per SPI. The T4.1 pins that were defined do include some with multiple CS pins: a few are simply alternate pins for the same index, but some now have different indexes. It will be interesting to see how it all works when it is released.

As with the other pins (MOSI, MISO, ...), there are methods to ask whether a pin is valid for a given function,
like: if (SPI1.pinIsMOSI(mosi_pin) && SPI1.pinIsSCK(sck_pin) && SPI1.pinIsMISO(miso_pin)) {_spi = &SPI1;}
(This is how some of our libraries allow you to specify pins and automatically figure out which SPI port to use.)

So, likewise, there is a pinIsChipSelect(cs).


CS - As with all Teensy boards, using hardware CS pins with the SPI library is for the most part no different from using non-hardware CS pins. Most libraries simply use digitalWrite-like functions to control CS. There are some exceptions, for example ili9341_t3n (_t3 only works with SPI, whereas my _t3n works with all SPI buses). And with T4 you can, for example, optionally use it as the DC pin and maybe get a slight performance gain.

Sorry, I know this part is somewhat outside the scope of this thread, but I should mention that the SPI implementations are considerably different between T3.x and T4. On T3.x, the PUSHR register has fields in its upper bits which can control up to 4 different CS pins as part of that operation. On T4, the TDR register holds 32 bits of data and has nothing exactly like this; however, the TCR register, which also pushes entries into the FIFO, does have fields to control CS pins, but instead of a mask of which pins (possibly more than one) to update, it has an index of which one to use.

However, the one place where using a hardware CS pin is required is if you are trying to implement an SPI slave. There is another thread about SPI slave for T4, with a sketch I posted that had SPI talking to SPI1, one being master and the other slave.
 
So does this mean that pins 14, 2, 20, 21, 5 and 3 are free for other uses?

In my application, in addition to using the LPSPI3 to receive data with DMA, I need 3 other free ports to activate interrupts with synchronization signals of received images (line start, line break, frame start).

Since I'm going to design my own PCB, I will route the tracks to the ports that really need to be used, instead of connecting external jumpers over the SmartLEDShield_V4.

I suppose that, although in your application with the shield you are connecting the original design's ports to new ports, you really only need the new ports; the previous ports they are wired to are no longer used in your source code.
 
LuisHS - yes, that's correct.

UPDATE: the sketch has been updated to support multi-panel displays, and now uses a software interrupt to refill the matrix row buffer continuously. It now runs way, way faster! The updates are committed to the github.

I did some testing to see how high I could push the refresh rate at various resolutions and color depths. In many cases, the limiting factor was the speed of the hardware interface with the matrix, which impacts the panel brightness at high refresh rates. The following refresh rates were stable and free of glitches on the array I have available (64x128):

Refresh rate in FPS ("-" = no value given):

Resolution  RGB111  RGB333  RGB555  RGB888  30-bit  36-bit  48-bit
32x64        12000    6700    3700    2000    1500    1200     860
64x64         6600    3600    1900    1100     840     670     460
64x128        3200    1800    1000     560     430     340     240
128x128       1500     910     510     290     220     170     120
128x256        770     460     260     140     110      87      60
256x256        390     230     120      73      55      41       -

A few notes:
  • For resolutions higher than 64x128, obviously I was not able to check the whole display for glitches, only the initial panels.
  • RGB555 color is adequate for most purposes; RGB888 is good, except there is data loss in dark areas; 30/36 bit are better at reproducing darks. I don't see any improvement going to 48 bit.
  • Refresh rates below 100-120 FPS have bad flickering and aren't practical.
  • I don't know if there will be interference with other stuff that uses DMA (SD, USB, serial, etc.), but at least it looks like the processing power of the T4 is not a limiting factor.
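One way to read the table together with the flicker note is to ask, for each resolution, what the deepest color depth is that still stays above a chosen refresh threshold. The sketch below does that lookup over the measured values from this post. The names are hypothetical, and the missing 48-bit entry for 256x256 is stored as 0, meaning "not reported".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Color depths in table order, shallowest first.
const char* const kDepths[7] = {
    "RGB111", "RGB333", "RGB555", "RGB888", "30-bit", "36-bit", "48-bit"};

struct Measured {
    const char* resolution;
    int fps[7];  // refresh rate per depth; 0 means no value was reported
};

// Values copied from the table in the post above.
const Measured kMeasured[6] = {
    {"32x64",   {12000, 6700, 3700, 2000, 1500, 1200, 860}},
    {"64x64",   { 6600, 3600, 1900, 1100,  840,  670, 460}},
    {"64x128",  { 3200, 1800, 1000,  560,  430,  340, 240}},
    {"128x128", { 1500,  910,  510,  290,  220,  170, 120}},
    {"128x256", {  770,  460,  260,  140,  110,   87,  60}},
    {"256x256", {  390,  230,  120,   73,   55,   41,   0}},
};

// Returns the deepest color depth whose measured refresh rate is at least
// minFps, or an empty string if the resolution is unknown or nothing qualifies.
std::string deepestDepthAtLeast(const std::string& resolution, int minFps) {
    assert(minFps > 0);
    for (std::size_t r = 0; r < 6; ++r) {
        if (resolution != kMeasured[r].resolution) continue;
        for (int d = 6; d >= 0; --d) {
            if (kMeasured[r].fps[d] >= minFps) return kDepths[d];
        }
    }
    return "";
}
```

With a 100 FPS threshold this picks 30-bit for 128x256 and RGB555 for 256x256, while every smaller resolution in the table can run at 48-bit.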
 
Great, that is a very good update.
Are you going to add the swapBuffers() method soon?

It is necessary for working with dynamic content, such as video animations, rather than a single image. With it, you would have full compatibility with SmartMatrix, making it possible to move complete applications from Teensy 3.6 to Teensy 4.0.

I want to test your code with resolutions of 128x32 and 192x64, in applications that now work with Teensy 3.6 and SmartMatrix.



I have tried to compile your new update and it gives me these errors.

ledMatrixDemo_t4: 86: error: 'FLASHMEMvoid' does not name a type
* FLASHMEM void setup () {

ledMatrixDemo_t4:124: error: 'FLASHMEMvoid' does not name a type
FLASHMEM void loadTestImage(uint16_t wd, uint16_t ht) {

ledMatrixDemo_t4:142: error: 'FLASHMEMvoid' does not name a type
FLASHMEM void calculateTimerLut(void) {

ledMatrixDemo_t4:329: error: 'FLASHMEMvoid' does not name a type
FLASHMEM void flexPWMSetup(void) {

ledMatrixDemo_t4:365: error: 'FLASHMEMvoid' does not name a type
FLASHMEM void dmaSetup() {

What could be the problem?
If I remove FLASHMEM from all these functions, it compiles without error.
 
What version of Teensyduino are you using? FLASHMEM was added in 1.49. Without it, there is no performance impact but there will be increased memory utilization.
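If updating is not an option, one possible workaround is to define FLASHMEM as an empty macro when the installed core does not provide it, so the functions build as ordinary code. This is a generic preprocessor fallback and not something taken from the sketch itself; the function below is a hypothetical stand-in for the one-time setup routines, and in a real sketch the guard would probably need to sit near the top of the file, before the first function.

```cpp
#include <cassert>

// Fallback for cores that predate FLASHMEM: if the macro is missing,
// define it as empty so the attribute simply disappears.
#ifndef FLASHMEM
#define FLASHMEM
#endif

// Hypothetical one-time setup helper marked FLASHMEM.
// Two rows are driven at once, so a panel has height / 2 row pairs.
FLASHMEM int rowPairs(int panelHeight) {
    assert(panelHeight > 0 && panelHeight % 2 == 0);
    return panelHeight / 2;
}
```

On a core that does define FLASHMEM, the guard does nothing and the functions are placed in flash as intended.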
 
OK, I am using 1.47 Beta 4. I will update to the latest version. Thank you.
 