FastLED with Teensy 4.1, fast parallel / DMA output on any pin

spolsky

Paul's current GitHub version* of OctoWS2811, when used with Teensy 4.x, can use any pin to drive LED strips in parallel, and it's all shockingly fast.

Here's an example of how to set that up using 8 pins:

Code:
  const int numPins = 8;
  byte pinList[numPins] = {33, 34, 35, 36, 37, 38, 39, 40};
  const int ledsPerStrip = 300;
  CRGB rgbarray[numPins * ledsPerStrip];

  // These buffers need to be large enough for all the pixels.
  // The total number of pixels is "ledsPerStrip * numPins".
  // Each pixel needs 3 bytes, so multiply by 3.  An "int" is
  // 4 bytes, so divide by 4.  The array is created using "int"
  // so the compiler will align it to 32 bit memory.
  DMAMEM int displayMemory[ledsPerStrip * numPins * 3 / 4];
  int drawingMemory[ledsPerStrip * numPins * 3 / 4];
  OctoWS2811 octo(ledsPerStrip, displayMemory, drawingMemory, WS2811_RGB | WS2811_800kHz, numPins, pinList);
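
As a sanity check on that buffer arithmetic, here is a minimal sketch of the same calculation as a compile-time helper. The name `octoBufferInts` is made up for illustration and is not part of OctoWS2811. It also rounds up, which is a precaution of mine: the plain division is exact for 300 x 8, but would truncate if the byte total were not a multiple of 4.

```cpp
#include <cstddef>

// Hypothetical helper (not part of OctoWS2811): number of 4-byte ints needed
// to hold 3 bytes per pixel for every pixel on every strip, rounded up.
// Assumes a 4-byte int, as on Teensy.
constexpr std::size_t octoBufferInts(std::size_t ledsPerStrip, std::size_t numPins) {
    return (ledsPerStrip * numPins * 3 + 3) / 4;
}

// 300 LEDs x 8 pins x 3 bytes = 7200 bytes = 1800 ints, the same size the
// arrays above come out to.
static_assert(octoBufferInts(300, 8) == 1800, "8 strips of 300 LEDs need 1800 ints");
```

On a real sketch you would write something like `DMAMEM int displayMemory[octoBufferInts(ledsPerStrip, numPins)];`, assuming your toolchain accepts a constexpr call as an array bound.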

If you would like to use FastLED to get all the fancy FastLED features, you can hook it up by creating a custom FastLED "controller" that just sends the pixel data to OctoWS2811 for display. This is all incredibly fast, especially on a Teensy 4.x.

To create the custom controller:

Code:
#include <OctoWS2811.h>
#include <FastLED.h>
#include <Arduino.h>

template <EOrder RGB_ORDER = RGB,
          uint8_t CHIP = WS2811_800kHz>
class CTeensy4Controller : public CPixelLEDController<RGB_ORDER, 8, 0xFF>
{
    OctoWS2811 *pocto;

public:
    CTeensy4Controller(OctoWS2811 *_pocto)
        : pocto(_pocto){};

    virtual void init() {}
    virtual void showPixels(PixelController<RGB_ORDER, 8, 0xFF> &pixels)
    {

        uint32_t i = 0;
        while (pixels.has(1))
        {
            uint8_t r = pixels.loadAndScale0();
            uint8_t g = pixels.loadAndScale1();
            uint8_t b = pixels.loadAndScale2();
            pocto->setPixel(i++, r, g, b);
            pixels.stepDithering();
            pixels.advanceData();
        }

        pocto->show();
    }
};

To use this:

Code:
  CTeensy4Controller<RGB, WS2811_800kHz> *pcontroller;

  void setup()
  {
    octo.begin();
    pcontroller = new CTeensy4Controller<RGB, WS2811_800kHz>(&octo);

    FastLED.setBrightness(255);
    FastLED.addLeds(pcontroller, rgbarray, numPins * ledsPerStrip);
  }

Enjoy!

Joel

--
* Note: this version of the library is not yet available through the Arduino or PlatformIO library managers. For now, get it from GitHub.
 
For some background, what exactly is the difference between FastLED and OctoWS2811? I keep seeing FastLED mentioned but have no idea what it's for.

I assume it plugs into OctoWS2811 at a high level of abstraction because the RGB order in that code would be wrong for talking to the LEDs directly.
 
OctoWS2811 is a library for Teensy that sends pixel data to WS2812B-type LED strips rapidly, taking advantage of direct memory access (DMA) and parallel output on multiple pins where available. On Teensy 3.2 it can use eight pins (thus the name). On Teensy 4.x it can use any and all pins.

FastLED is a general library that works for many different kinds of chips and LEDs with an emphasis on speed. In the days of very slow Arduino Unos and so on, it was the fastest way to send pixel data to strips.

OctoWS2811 really only provides very low level functionality - setPixel and show(). FastLED has a bunch of nice functions for color adjustments, brightness, power consumption limiting, dithering, HSV conversions, and a lot of neat fast math functions. So you might want to get the best of both worlds.

There is no "canonical" RGB order for talking to arbitrary WS2812B pixels, because manufacturers often solder the green, red, and blue LEDs seemingly at random to the three output pins on a WS2811. Every library therefore has to give the end user the ability to rearrange R, G, and B. You're right that my sample code leaves this as an exercise for the reader :)
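
For what it's worth, here is a minimal sketch of what "rearranging R, G, and B" boils down to. The `ColorOrder` enum and `toWireOrder` function are hypothetical names for illustration only; real libraries have their own mechanisms (for example the `WS2811_GRB` config flag in OctoWS2811 or the `EOrder` template argument in FastLED).

```cpp
#include <array>
#include <cstdint>

// Hypothetical enum, for illustration only.
enum class ColorOrder { RGB, GRB, BRG };

// Returns the three bytes in the order the strip expects them on the wire.
inline std::array<std::uint8_t, 3> toWireOrder(ColorOrder order,
                                               std::uint8_t r,
                                               std::uint8_t g,
                                               std::uint8_t b) {
    switch (order) {
        case ColorOrder::GRB: return {g, r, b};
        case ColorOrder::BRG: return {b, r, g};
        case ColorOrder::RGB:
        default:              return {r, g, b};
    }
}
```

If a strip shows green when you ask for red, trying `ColorOrder::GRB` (or your library's equivalent setting) is usually the first thing to check.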
 
That is interesting, as I have been writing code to do HSV and Kelvin conversions and the like. Does FastLED let you store color values as floats, and will it let you have negative numbers? I need both of those things.

When I wrote a RPi driver for NeoPixels, it was necessary to transmit in GRB order.
 
No to both. FastLED is from the old days of Arduino Uno... it only supports 24 bit RGB and has lots of integer optimizations (like, it provides very fast integer trig functions). It has some very cool optimizations, for example, using 0-255 for Hue instead of the traditional 0-360.
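
To illustrate why a 0-255 hue is handy, here is a small sketch in plain C++ (not FastLED's own code; the function names are made up). An 8-bit value wraps on overflow, so rotating around the colour wheel needs no modulo or range check. The degree conversion assumes a simple linear mapping, which is my simplification.

```cpp
#include <cstdint>

// Rotate an 8-bit hue by a signed step. The cast back to uint8_t wraps
// modulo 256, so the colour wheel wraps around for free.
inline std::uint8_t rotateHue(std::uint8_t hue, int step) {
    return static_cast<std::uint8_t>(hue + step);
}

// Convert an 8-bit hue to whole degrees, for comparison with the
// traditional 0-360 convention (linear mapping assumed).
inline unsigned hueToDegrees(std::uint8_t hue) {
    return hue * 360u / 256u;
}
```

For example, `rotateHue(250, 10)` lands on 4 rather than overflowing, and `hueToDegrees(128)` is 180.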

The Teensy 4.0 is probably the first microcontroller that is fast enough, especially with floating point support and tons of memory, to do some really nice graphics in real time.
 
Have you noticed any issues with firing pixels when using the FastLED library and the NativeEthernet library simultaneously? FastLED works great for me by itself, but as soon as I start Ethernet communication, only the first 33 nodes fire, and sporadically at that. I'm not using the OctoWS2811 library. Do you think that might fix the issue?
 
It sounds like the Ethernet library is firing an interrupt at the right interval to disrupt signal generation. OctoWS2811 uses DMA to talk to the LED arrays, so unless the bus is totally saturated (unlikely) it will not be, uh, interrupted by interrupts.
 
Pilot,

I now have a partial sketch running to test this, and OctoWS2811 does seem to be working much better than before. Thanks for the help.
 
Last week I switched from FastLED to using OctoWS2811 for the LED serial interface (thanks to @spolsky and others), and it definitely solved my problems with interrupts being missed. Since I wasn't using any of the FastLED effects (I roll my own) and was only using the HSV to RGB color space conversion function, I made the big decision to abandon FastLED completely. I had previously experimented by "cloning" the FastLED hsv2rgb code and extending it to 16-bit math, but this wasn't very satisfying: the code is not well documented and is very difficult to understand. Since the Teensy 4 is so fast and has a floating-point unit, I decided to write a new conversion function from the ground up, using the canonical algorithm as described on Wikipedia.
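
For readers who haven't seen it, here is a minimal sketch of the textbook HSV-to-RGB conversion in floating point. This is not the code described above (which isn't posted here), just the standard sector-based algorithm. The `RgbF` type and `hsvToRgb` name are hypothetical, and all components are assumed to be in the range 0.0 - 1.0.

```cpp
#include <cmath>

struct RgbF { float r, g, b; };  // hypothetical type, components in 0.0 - 1.0

// Textbook HSV -> RGB. h, s, v are in 0.0 - 1.0; hue wraps, so 1.25 acts
// like 0.25 and -0.5 acts like 0.5.
inline RgbF hsvToRgb(float h, float s, float v) {
    h -= std::floor(h);                      // wrap hue, also for negatives
    const float sector = h * 6.0f;           // six 60-degree sectors
    const int   i = static_cast<int>(sector) % 6;
    const float f = sector - std::floor(sector);
    const float p = v * (1.0f - s);
    const float q = v * (1.0f - s * f);
    const float t = v * (1.0f - s * (1.0f - f));
    switch (i) {
        case 0:  return {v, t, p};
        case 1:  return {q, v, p};
        case 2:  return {p, v, t};
        case 3:  return {p, q, v};
        case 4:  return {t, p, v};
        default: return {v, p, q};
    }
}
```

Gamma correction, colour scaling, and dithering would all be applied after this step and are left out here.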

I generally store HSV as well as RGB values in 16-bit (8.8) fixed-point format, and then convert to 8-bit RGB just before outputting the data using OctoWS2811. That is, unless I'm using non-addressable LEDs driven by a 12-bit PWM driver chip (PCA9685). I convert the HSV16 data to floating point within the hsv2rgb conversion, as well as other functions that compute smooth gradients and fades. Although my conversion algorithm may not (yet) be quite as sophisticated as FastLED's, I have implemented gamma correction, color scaling (to equalize the max brightness across LED colors) and temporal dithering to somewhat reduce quantization error caused by the 8-bit LED data. And now I at least understand exactly what the code is doing.

I represent each HSV16 component as a uint16_t in the range 0x0000 - 0xFF00, with appropriate wrap-around for the Hue component. Note that I don't use the range 0xFF01 - 0xFFFF, as this would prevent correct rounding to an 8-bit value. When doing intermediate computations in floating point, the values are converted to the range 0.0 - 1.0. In the case of Hue, values are allowed to be temporarily negative until they are wrapped back into values less than 1.0.
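
A rough sketch of that representation, under my own assumptions: rounding is done by adding half an LSB (0x80) before shifting, and normalisation divides by 0xFF00. The function names are made up for illustration, and the clamp is only there to show why values above 0xFF00 are awkward.

```cpp
#include <cstdint>

constexpr std::uint16_t kFixedMax = 0xFF00;  // 255.0 in 8.8 fixed point

// Round an 8.8 fixed-point value to the nearest 8-bit integer.
// Inputs above 0xFF00 are clamped; 0xFF80 and up would otherwise round to 256.
constexpr std::uint8_t fixedToByte(std::uint16_t v) {
    return static_cast<std::uint8_t>(((v > kFixedMax ? kFixedMax : v) + 0x80u) >> 8);
}

// Normalise to 0.0 - 1.0 for floating-point work (0xFF00 maps to exactly 1.0).
constexpr float fixedToUnit(std::uint16_t v) {
    return static_cast<float>(v) / static_cast<float>(kFixedMax);
}

// Wrap a temporarily negative (or >= 1.0) hue back into [0.0, 1.0).
// Assumes h is at most a few turns out of range.
inline float wrapHue(float h) {
    while (h < 0.0f)  h += 1.0f;
    while (h >= 1.0f) h -= 1.0f;
    return h;
}
```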

So far it all seems to be working well, but more testing is needed. Since OctoWS2811 is using DMA to output the data in parallel with program execution, I can pretty confidently say that the frame rate will always be limited by the LED serial data rate and the number of LEDs. I'm currently running a frame rate of 100 Hz (10 ms) with 300 LEDs, and all of the effects/conversion functions take less than 1 ms!
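
As a back-of-the-envelope check on that limit: at 800 kHz each bit takes 1.25 us, and each LED needs 24 bits, so 30 us per LED. The sketch below just does that arithmetic. The function name is made up, and the 300 us latch gap is an assumed placeholder; the required gap varies between LED chips, so check the datasheet for yours.

```cpp
#include <cstdint>

// Microseconds to clock out one strip: 24 bits per LED at 800 kHz is
// 1.25 us per bit, i.e. 30 us per LED, plus a latch gap (assumed value).
constexpr std::uint32_t frameTimeMicros(std::uint32_t ledsPerStrip,
                                        std::uint32_t latchMicros = 300) {
    return ledsPerStrip * 30u + latchMicros;
}

// 300 LEDs: 9000 us of data + 300 us latch = 9300 us, so a 10 ms (100 Hz)
// frame leaves only a little headroom.
static_assert(frameTimeMicros(300) == 9300, "300 LEDs take about 9.3 ms per frame");
```

Because the strips are driven in parallel, what matters is the number of LEDs on the longest strip, not the total across all pins.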

I'm open to having other people review and/or use my code, as I'm sure it can be made better with more eyes on it.
 
I was using int16_t for a while, but eventually switched to float as I wanted to be able to do more accurate brightness scaling and layering of effects. The possibility of storing negative numbers is there to facilitate certain effects as well. After spending an afternoon trying funny bit-shifting stuff to translate a float to an unsigned int, I discovered that a simple cast from float to uint32_t on a Teensy 4.0 takes less than 20 nanoseconds, and that's with adding 0.5 so that actual rounding occurs rather than simple truncation. (I think the cast is using the same funny bit-shifting stuff behind the scenes.)

These are a couple of functions I tested. The first takes much longer (>30 ns) because it's doing comparison and branching. The second takes <20 ns, but is only safe when converting positive numbers.

All the register stuff seems to be superfluous, as removing that word doesn't change the benchmark output, suggesting that g++ is automatically optimizing to use register variables. However, I don't like things being accidentally right, as an implementation change in g++ could break that, so I have left in the word. The variables passed in also have to be declared as register.

Code:
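// Round to nearest, handling both signs (slower because of the branch).
inline int fastRound(register float f) {
    register float fRound = f > 0 ? f + 0.5f : f - 0.5f;
    register int roundedInteger = (int)fRound;  // the cast truncates toward zero
    return roundedInteger;
}

// Round to nearest for non-negative results; anything below zero is clipped to 0.
inline unsigned int fastRoundUnsigned(register float f) {
    register float fRound = f + 0.5f;
    register int roundedInteger = (int)fRound;
    register int final = roundedInteger >= 0 ? roundedInteger : 0;
    return final;
}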

The output of this is bit-shifted into a uint32_t for each of R, G, B values, and this also takes hardly any time. (The first function is not used in translation because only positive values are meaningful. So, anything less than zero is clipped to zero. Negative numbers only have meaning during the composition of different colors to a single pixel.)
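
To make the last step concrete, here is a minimal sketch of clipping, rounding, and packing three float channels into one word. It is a simplified stand-in, not the benchmarked code: the names are hypothetical, channels are assumed to be on a 0 - 255 scale, the 0x00RRGGBB layout is an assumption, and NaN inputs are not handled.

```cpp
#include <cstdint>

// Clamp a float channel to 0 - 255 and round to nearest by adding 0.5
// before the truncating cast.
inline std::uint32_t channelToByte(float c) {
    if (c <= 0.0f)   return 0;    // negatives only matter during compositing
    if (c >= 255.0f) return 255;
    return static_cast<std::uint32_t>(c + 0.5f);
}

// Pack three channels into one 0x00RRGGBB word (layout is an assumption).
inline std::uint32_t packRgb(float r, float g, float b) {
    return (channelToByte(r) << 16) | (channelToByte(g) << 8) | channelToByte(b);
}
```

So `packRgb(-3.0f, 127.5f, 300.0f)` comes out as 0x0080FF: the negative red is clipped to 0, green rounds up to 128, and blue saturates at 255.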

Hand-wringing over nanoseconds is probably a little silly for most projects, but it will matter if you are driving thousands of LEDs at anything like a smooth framerate.
 
Yes, I use floating point for similar computations, but generally store "steady state" HSV values in 8.8 fixed-point format. Other than that I decided to focus more on numerical accuracy and readability than on code optimization. The resulting code is running so fast that I'll probably (hopefully) never need to go back and optimize. I've been using type casts to go back and forth and that seems to be working great.
 
What is your motivation to use HSV over RGB? I store everything as RGB (floats) and if I need HSV I have some code that converts the inputs to RGB. Same with input in Kelvin. Almost every effect I do has only RGB.
 
What is your motivation to use HSV over RGB?

When I started working with LED lighting a couple of years ago, I read several articles and posts explaining that HSV was simpler and more intuitive than RGB in a number of situations. It's easy to produce color fades and gradients by manipulating a single variable (hue) while maintaining constant saturation and brightness. The same goes for brightness fades, just by varying V. Also, it's easier to let a user select colors with a wheel/slider-based "color picker" than to have them pick from a pre-defined palette of RGB-based colors. Most of my projects involve slowly changing colors and brightness levels, and HSV has worked pretty well for that.
 
Am I correct in thinking that this is for parallel output?

What if you have three different strips that you want to control separately?

What is the initialization procedure for multiple strips that are controlled independently?
 
This is the code you're looking for. Parallel means each pin sends data simultaneously, not necessarily the same data. I use this for 32 independent strands of LEDs, with different data going out to each strand.
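
Here is a minimal sketch of how one big array can serve several independent strips. It assumes strip n occupies indices n * ledsPerStrip up to (n + 1) * ledsPerStrip - 1, which is my understanding of how OctoWS2811 numbers pixels. `Pixel` is a stand-in for FastLED's CRGB so the example compiles off-device, and `stripStart` / `fillStrip` are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstdint>

struct Pixel { std::uint8_t r, g, b; };  // stand-in for FastLED's CRGB

constexpr std::size_t kNumPins = 8;
constexpr std::size_t kLedsPerStrip = 300;
Pixel pixels[kNumPins * kLedsPerStrip];  // plays the role of rgbarray

// Pointer to the first pixel of one strip; each strip owns a contiguous
// run of kLedsPerStrip entries in the shared array.
inline Pixel *stripStart(std::size_t strip) {
    return &pixels[strip * kLedsPerStrip];
}

// Fill a single strip without touching the others.
inline void fillStrip(std::size_t strip, Pixel colour) {
    Pixel *p = stripStart(strip);
    for (std::size_t i = 0; i < kLedsPerStrip; ++i) p[i] = colour;
}
```

With the real FastLED types, passing `&rgbarray[n * ledsPerStrip]` and `ledsPerStrip` to a function like `fill_solid` should address just strip n in the same way.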
 
Ok, not to be dense, but:

Code:
FastLED.addLeds(pcontroller, leds_strip, numPins * ledsPerStrip);

will then let me do all the fun things I am used to doing with FastLED on leds_strip like:

Code:
fill_solid(leds_strip, 100, CRGB::Blue);

?
 
Hi spolsky,

I'm new to the Teensy world. I am trying to compile your code, but I keep getting the following errors:

Code:
#include <OctoWS2811.h>
#include <FastLED.h>
#include <Arduino.h>
//#include <Utils.h>

const int numPins = 8;
byte pinList[numPins] = {33, 34, 35, 36, 37, 38, 39, 40};
const int ledsPerStrip = 300;
CRGB rgbarray[numPins * ledsPerStrip];

// These buffers need to be large enough for all the pixels.
// The total number of pixels is "ledsPerStrip * numPins".
// Each pixel needs 3 bytes, so multiply by 3.  An "int" is
// 4 bytes, so divide by 4.  The array is created using "int"
// so the compiler will align it to 32 bit memory.
DMAMEM int displayMemory[ledsPerStrip * numPins * 3 / 4];
int drawingMemory[ledsPerStrip * numPins * 3 / 4];
OctoWS2811 octo(ledsPerStrip, displayMemory, drawingMemory, WS2811_RGB | WS2811_800kHz, numPins, pinList);


template <EOrder RGB_ORDER = RGB,
          uint8_t CHIP = WS2811_800kHz>
class CTeensy4Controller : public CPixelLEDController<RGB_ORDER, 8, 0xFF>
{
    OctoWS2811 *pocto;

public:
    CTeensy4Controller(OctoWS2811 *_pocto)
        : pocto(_pocto){};

    virtual void init() {}
    virtual void showPixels(PixelController<RGB_ORDER, 8, 0xFF> &pixels)
    {

        uint32_t i = 0;
        while (pixels.has(1))
        {
            uint8_t r = pixels.loadAndScale0();
            uint8_t g = pixels.loadAndScale1();
            uint8_t b = pixels.loadAndScale2();
            pocto->setPixel(i++, r, g, b);
            pixels.stepDithering();
            pixels.advanceData();
        }

        pocto->show();
    }
};


CTeensy4Controller<RGB, WS2811_800kHz> *pcontroller;

void setup()
{
  //octo.begin();
  pcontroller = new CTeensy4Controller<RGB, WS2811_800kHz>(&octo);

  FastLED.setBrightness(255);
  FastLED.addLeds(pcontroller, rgbarray, numPins * ledsPerStrip);
}

Code:
\Documents\Arduino\teensy41-fastled\teensy41-fastled.ino:18:105: error: no matching function for call to 'OctoWS2811::OctoWS2811(const int&, int [1800], int [1800], int, const int&, byte [8])'
 OctoWS2811 octo(ledsPerStrip, displayMemory, drawingMemory, WS2811_RGB | WS2811_800kHz, numPins, pinList);
                                                                                                         ^
In file included from C:\Users\fbric\Documents\Arduino\teensy41-fastled\teensy41-fastled.ino:1:0:
\Documents\Arduino\libraries\OctoWS2811/OctoWS2811.h:84:2: note: candidate: OctoWS2811::OctoWS2811(uint32_t, void*, void*, uint8_t)
  OctoWS2811(uint32_t numPerStrip, void *frameBuf, void *drawBuf, uint8_t config = WS2811_GRB);
  ^
\Documents\Arduino\libraries\OctoWS2811/OctoWS2811.h:84:2: note:   candidate expects 4 arguments, 6 provided
In file included from C:\Users\fbric\Documents\Arduino\teensy41-fastled\teensy41-fastled.ino:1:0:
\Documents\Arduino\libraries\OctoWS2811/OctoWS2811.h:75:7: note: candidate: constexpr OctoWS2811::OctoWS2811(const OctoWS2811&)
 class OctoWS2811 {
       ^
\Documents\Arduino\libraries\OctoWS2811/OctoWS2811.h:75:7: note:   candidate expects 1 argument, 6 provided
\Documents\Arduino\libraries\OctoWS2811/OctoWS2811.h:75:7: note: candidate: constexpr OctoWS2811::OctoWS2811(OctoWS2811&&)
\Documents\Arduino\libraries\OctoWS2811/OctoWS2811.h:75:7: note:   candidate expects 1 argument, 6 provided

I am using the following library versions:

Using library OctoWS2811 at version 1.4 in folder: \Documents\Arduino\libraries\OctoWS2811
Using library FastLED at version 3.4.0 in folder:\Documents\Arduino\libraries\FastLED

Thanks for your help and your code :)

regards
 