[solved] Teensy 3.2 SPI speed decreasing over time

dimitre · Apr 15, 2024

I'm testing some new LED strips (HD108) with a Teensy 3.2 and I'm noticing some unusual behavior.
Animations start out very smooth and beautiful, but after about a minute the frame rate visibly and gradually starts to drop.
Pressing the reset button on the board restores the smoothness until it slows down again.
I'm wondering if I'm missing something in the code, or maybe it is something else.
I'm using the latest PlatformIO on macOS.

Code:

#include <Arduino.h>
#include <SPI.h>
void setup() {
    // SCK 13, MOSI 11
    SPI.begin();
}
void loop() {
    SPI.beginTransaction(SPISettings(40000000, MSBFIRST, SPI_MODE0));
    // Start frame
    for (int i = 0; i < 8; i++) {
        SPI.transfer16(0x0000);
    }
    int numLeds = 60 * 2;
    for (int i=0; i< numLeds; i++) {
        SPI.transfer16(0b1000110001100011); // bit 1 + (1)(5bit)(5bit)(5bit) brightnesses
        float t = (float)millis()/610.0f + (float)i * 0.3;
        float max = 5000;
        uint16_t r = (sin(t + 60.0 * 0.0) * max + max);
        uint16_t g = (sin(t + 60.0 * 1.0) * max + max);
        uint16_t b = (sin(t + 60.0 * 2.0) * max + max);
        SPI.transfer16(g);
        SPI.transfer16(r);
        SPI.transfer16(b);
    }
    // End Frame
    for (int i = 0; i < (numLeds/8); i++) {
        SPI.transfer(0xFF);
    }
    SPI.endTransaction();
    delay(10);
}

PaulStoffregen · Apr 15, 2024

My guess is the sin() function. Its input is in radians, so ideally you should be giving it numbers between 0 and 6.28 (that is, 2 * Pi).

But after your program has run for only 4 seconds, even for the first LED (i == 0) the numbers you give to sin() become larger than 6.28. It (probably) has to work harder and might be using some rather inefficient algorithm to avoid tiny round-off errors from accumulating as you get to numerical ranges vastly more than the 6.28 range.

I'd try adding code to constrain your angle to the range 0 to 2*PI. The function modff() can very efficiently split a float number into integer and fractional parts. If you just scale from radians to 0 - 1.0, you can use the really quick modff(). The code might look something like this:

Code:

float angle = t + 60.0 * 2.0;
angle = angle / (PI * 2.0);  // convert from radians to turns (1.0 = one full revolution)
float ipart;
angle = modff(angle, &ipart); // extract fraction and discard integer
angle = angle * (PI * 2.0); // convert back to radians

I did just make this up right now, so if it has syntax errors or other issues, please understand it's only "pseudo code" meant to explain the idea. This approach might give slight round-off errors which probably aren't visible with LED animations where you end up converting to 8 bit intensity, but could wreak havoc on highly demanding applications.

You might also try using sinf() rather than sin(). It should run much faster, since sin() does all the work as 64 bit float even if you give it only 32 bit input. But it too could slow down with large inputs, so you probably still need to constrain the input to one full turn (0 to 2*PI radians).
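
A minimal sketch of the two suggestions above combined, written as plain C++ so it can be tried on a desktop before moving it to the Teensy. The names `kTwoPi`, `wrapAngle()` and `wrappedSin()` are made up for illustration and do not come from the thread; the handling of negative angles is an extra assumption, since the original sketch only ever produces positive ones.

```cpp
#include <math.h>

static const float kTwoPi = 6.28318530718f;

// Hypothetical helper (not from the thread): reduce any angle in radians
// to the range 0 .. 2*pi using modff() on the number of turns.
float wrapAngle(float angle) {
    float turns = angle / kTwoPi;          // radians -> turns
    float whole;                           // integer part, discarded
    float frac = modff(turns, &whole);     // fractional turn, same sign as input
    if (frac < 0.0f) frac += 1.0f;         // move negative fractions into 0 .. 1
    return frac * kTwoPi;                  // turns -> radians
}

// Single-precision sine of an angle that may have grown arbitrarily large.
float wrappedSin(float angle) {
    return sinf(wrapAngle(angle));
}
```

In the LED loop this would replace a call such as `sin(t + offset)` with `wrappedSin(t + offset)`, so the library only ever sees small arguments.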

dimitre · Apr 15, 2024

Excellent, thank you Paul

joepasquariello · Apr 16, 2024

PaulStoffregen said:
If you just scale from radians to 0 - 1.0, you can use the really quick modff(). The code might look something like this:

Code:

float angle = t + 60.0 * 2.0;
angle = angle / (PI * 2.0);  // convert from radians to percent
float ipart;
angle = modff(angle, &ipart); // extract fraction and discard integer
angle = angle * (PI * 2.0); // convert back to radians

Nice catch, Paul. I looked at the question and never thought of that issue with sin(). Since he only needs the fractional part, I think the code can be simplified by using fmodf() rather than modff().

Code:

float angle = t + 60.0 * 2.0;      // radians
angle = fmodf( angle, PI * 2.0 );  // fractional revolution in radians (0 - 2*PI)

MarkT · Apr 16, 2024

Code:

        uint16_t r = (sin(t + 60.0 * 0.0) * max + max);
        uint16_t g = (sin(t + 60.0 * 1.0) * max + max);
        uint16_t b = (sin(t + 60.0 * 2.0) * max + max);

This suggests an assumption that sin() takes degrees rather than radians, so there are probably other issues to solve too.

I'd recommend using an integer phase accumulator rather than an angle that grows without limit, since this wraps around naturally.
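
A rough sketch of what an integer phase accumulator could look like, assuming a 16-bit phase where 0..65535 stands for 0..2*PI. The struct and function names are hypothetical and the step size is whatever suits the animation; nothing here is taken from MarkT's own code.

```cpp
#include <stdint.h>
#include <math.h>

// 16-bit phase: 0..65535 represents 0..2*pi. Adding to a uint16_t wraps
// around on overflow, so the phase never grows without limit.
struct PhaseAccumulator {
    uint16_t phase;
    uint16_t step;   // phase increment per frame (assumed constant here)

    explicit PhaseAccumulator(uint16_t s) : phase(0), step(s) {}

    uint16_t advance() {
        phase = (uint16_t)(phase + step);   // wraps naturally at 65536
        return phase;
    }
};

// Convert to radians only at the point where a float sine is needed.
// The result is always below 2*pi, whatever the uptime.
inline float phaseToRadians(uint16_t phase) {
    return phase * (6.28318530718f / 65536.0f);
}
```

With this layout the three colour channels can be offset by a third of a turn using integer additions (65536/3 is about 21845), and the argument handed to sinf() stays in range by construction.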

dimitre · Apr 16, 2024

Thanks. I've updated the code with Paul's suggestions and converted my RGB phasing to radians, so now the code is working great.
I've used std::fmod instead of modff, so I can work in radians directly.

Code:

#include <Arduino.h>
#include <SPI.h>
#include <cmath>   // std::fmod, std::sin
float twopi = PI * 2.0;
void setup() {
    delay(500);
    SPI.begin(); // SCK 13, MOSI 11
}
void loop() {
    SPI.beginTransaction(SPISettings(40000000, MSBFIRST, SPI_MODE0));
    // Start frame
    for (int i = 0; i < 8; i++) {
        SPI.transfer16(0x0000);
    }
    int numLeds = 60 * 2;
    float tempo = (float)millis()*0.003f;
    float max = 8000; // max 32000 and some
    for (int i=0; i< numLeds; i++) {
        SPI.transfer16(0b1000110001100011); // bit 1 + (1)(5bit)(5bit)(5bit) brightnesses
        // SPI.transfer16(0b1111111111111111); // bit 1 + (1)(5bit)(5bit)(5bit) brightnesses
        float t { tempo + (float)i * 0.6 };
        float tr { t };
        float tg { t + twopi*.33 };
        float tb { t + twopi*.66 };
        tr = std::fmod(tr, twopi);
        tg = std::fmod(tg, twopi);
        tb = std::fmod(tb, twopi);
        uint16_t r = (std::sin(tr) * max + max);
        uint16_t g = (std::sin(tg) * max + max);
        uint16_t b = (std::sin(tb) * max + max);
        SPI.transfer16(g);
        SPI.transfer16(r);
        SPI.transfer16(b);
    }
    // End Frame
    for (int i = 0; i < (numLeds/8); i++) {
        SPI.transfer(0xFF);
    }
    SPI.endTransaction();
    // delay(10);
}

PaulStoffregen · Apr 16, 2024

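You might reconsider this line for when millis() grows beyond what a float can hold exactly. A float has a 24 bit significand (23 stored bits plus an implicit leading one), so that point is reached at 2^24 ms, a little over 4.6 hours, and the scaled result starts getting coarser even before that.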

Code:

    float tempo = (float)millis()*0.003f;

You'll gradually lose millis precision. Maybe that matters, or maybe not?

Since this runs only once per LED update, there's probably no harm in just using 64 bit double, like this.

Code:

    const double mult = 0.003;
    float tempo = (double)millis()*mult;

On Teensy 3.2 the gcc option for 32 bit float constants is used, so use this approach to get a constant that really is a 64 bit double.

Of course when 32 bit millis() wraps back to 0, you'll get 1 update that suddenly "jumps". But that takes over 49 days.
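
The precision limit can be checked off-target with a small sketch like the one below. A float carries a 24 bit significand (23 stored bits plus an implicit leading one), so whole millisecond counts survive a round trip through float only up to 2^24. The second function is one possible refinement and an assumption of this sketch, not something posted in the thread: it scales and wraps in double first, so the value that finally gets narrowed to float is always small. Both function names are hypothetical.

```cpp
#include <stdint.h>
#include <math.h>

// True if a millisecond count survives a round trip through float unchanged.
// 'volatile' forces the value to really be stored as a 32-bit float.
inline bool survivesFloat(uint32_t ms) {
    volatile float f = (float)ms;
    return (uint32_t)f == ms;
}

// Hypothetical variant: do the multiply and the wrap in double, then narrow.
// 0.003 rad/ms is the rate used in the thread; the result is in 0 .. 2*pi.
inline float wrappedTempo(uint32_t ms) {
    const double mult  = 0.003;
    const double twoPi = 6.283185307179586;
    return (float)fmod((double)ms * mult, twoPi);
}
```

For reference, 2^24 ms is about 4.66 hours, and the 32 bit millis() counter itself wraps after 2^32 ms, which is about 49.7 days.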

dimitre · Apr 17, 2024

Thank you, great observations Paul!
For now the intent is just to test communication with this 16-bit LED strip. It is beautiful:
smooth color transitions even at low brightness.

PaulStoffregen · Apr 17, 2024

Any chance you might share the project or at least photos? Would probably be useful for others. And even if you're not looking to share the code and circuit details, just photos (or a cell phone video uploaded to youtube) and basic info could make for a pretty interesting blog article.

dimitre · Apr 17, 2024

Sure, I'll be sharing some videos soon! It is just a test to communicate with the LEDs,
so it is a simple SPI connection on a protoboard.
My real aim is to use a 24 V addressable LED chip (also 16-bit) to drive some power MOSFETs with high PWM and drive some long analog LED strips in cascade. Or magnets, haha.

mborgerson · Apr 22, 2024

MarkT said:
Code:

uint16_t r = (sin(t + 60.0 * 0.0) * max + max);
uint16_t g = (sin(t + 60.0 * 1.0) * max + max);
uint16_t b = (sin(t + 60.0 * 2.0) * max + max);

Suggests an assumption that sin takes degrees rather than radians, so there are probably other issues to solve too.

I'd recommend using an integer phase accumulator rather than an angle that grows without limit, since this wraps around naturally.

After many months in T4.X land, I'm now embarked on a Teensy 3.2 project. I have to keep reminding myself that the T3.2 DOES NOT have a hardware floating point unit---much less one that does double-precision calculations with a 600MHz system clock. The thought of doing floating-point sine calculations in real time with only software gives me the shivers!

There have been useful suggestions about how to keep the calculations running at the original speed, but I would be tempted to use Excel or Matlab to generate one or more tables of integer sine values to be stored in const (flash) memory to save valuable data RAM. You can then step through the table(s) to get your output values without doing floating point calculations at run time. How well this will work will depend on your needed update rates and the range of frequencies you need to generate. If the tables have 16-bit outputs for your brightest values, dimming is a matter of a single long integer multiply and divide.
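
As an alternative to Excel or Matlab, a table like this can also be produced by a few lines of C++ run on a desktop. The sketch below is only an illustration under assumed choices: 256 steps per period, a scale of +/-32767, one extra entry so an interpolating lookup can read one past the last step, and hypothetical function names. Rounding may differ by a count from tables made with other tools.

```cpp
#include <stdint.h>
#include <math.h>
#include <stdio.h>

const int kTableSize = 256;   // entries per full sine period (assumed)

// Fill 'table' with one sine period scaled to +/-32767. The extra entry at
// index kTableSize repeats index 0 so that table[i + 1] is always valid.
void buildSineTable(int16_t table[kTableSize + 1]) {
    const double twoPi = 6.283185307179586;
    for (int i = 0; i <= kTableSize; i++) {
        double angle = twoPi * i / kTableSize;
        table[i] = (int16_t)lround(sin(angle) * 32767.0);
    }
}

// Print the table as a C initializer, eight values per row, ready to paste
// into a sketch as a const array.
void printSineTable(const int16_t* table) {
    printf("static const int16_t table[%d] = {\n", kTableSize + 1);
    for (int i = 0; i <= kTableSize; i++) {
        printf("%8d,%s", table[i],
               (i % 8 == 7 || i == kTableSize) ? "\n" : "");
    }
    printf("};\n");
}
```

Calling buildSineTable() followed by printSineTable() from a small main() on a PC prints the initializer text to standard output.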

joepasquariello · Apr 22, 2024

mborgerson said:
I would be tempted to use Excel or Matlab to generate one or more tables of integer sine values to be stored in const (flash) memory to save valuable data RAM. You can then step through the table(s) to get your output values without doing floating point calculations at run time. How well this will work will depend on your needed update rates and the range of frequencies you need to generate.

The sketch below shows a fixed-point sine/cosine lookup that I think is well-suited to the OP's program, which doesn't need high precision on angles or intensity. Angles 0..2PI are scaled to 0..65536, and sines/cosines -1..1 are scaled to -32768..32768, so both fit the 16-bit range. The angle values are naturally periodic, so there is no need for fmod() or equivalent. The OP's "tempo" variable (millis()*0.003) reaches 2PI in about 2.09 seconds, so I approximated that as 2.048 seconds to take advantage of powers of 2, and likewise the original "max = 8000" scaling is replaced with 8192. You can switch between the fixed-point and original float implementation for comparison. This should be very fast on T3.2.

Code:

/******************************************************************************
* SINCOS.C        Fixed-point SIN and COS table lookup and interpolation.
*
* angles are periodic and scaled 0..65536 = 0..2*PI
* sin/cos are periodic and scaled -32768..32768 = -1..1
******************************************************************************/
#define TABLE_SIZE      256                     // opt size for 16 bits
#define SHIFT           8                       // 16 - log2(TABLE_SIZE)
#define BITE            (65536/TABLE_SIZE)      // step size
#define MASK            (BITE-1)                // MASK -> fast modulo

static const int16_t table[TABLE_SIZE+1] = {
       0,     803,    1607,    2410,    3211,    4010,    4807,    5601,
    6392,    7179,    7961,    8739,    9511,   10278,   11038,   11792,
   12539,   13278,   14009,   14732,   15446,   16150,   16845,   17530,
   18204,   18867,   19519,   20159,   20787,   21402,   22005,   22594,
   23169,   23731,   24278,   24811,   25329,   25832,   26319,   26790,
   27245,   27683,   28105,   28510,   28898,   29268,   29621,   29956,
   30273,   30571,   30852,   31113,   31356,   31580,   31785,   31971,
   32137,   32285,   32412,   32521,   32609,   32678,   32728,   32757,
   32767,   32757,   32728,   32678,   32609,   32521,   32412,   32285,
   32137,   31971,   31785,   31580,   31356,   31113,   30852,   30571,
   30273,   29956,   29621,   29268,   28898,   28510,   28105,   27683,
   27245,   26790,   26319,   25832,   25329,   24811,   24278,   23731,
   23169,   22594,   22005,   21402,   20787,   20159,   19519,   18867,
   18204,   17530,   16845,   16150,   15446,   14732,   14009,   13278,
   12539,   11792,   11038,   10278,    9511,    8739,    7961,    7179,
    6392,    5601,    4807,    4010,    3211,    2410,    1607,     803,
       0,    -804,   -1608,   -2411,   -3212,   -4011,   -4808,   -5602,
   -6393,   -7180,   -7962,   -8740,   -9512,  -10279,  -11039,  -11793,
  -12540,  -13279,  -14010,  -14733,  -15447,  -16151,  -16846,  -17531,
  -18205,  -18868,  -19520,  -20160,  -20788,  -21403,  -22006,  -22595,
  -23170,  -23732,  -24279,  -24812,  -25330,  -25833,  -26320,  -26791,
  -27246,  -27684,  -28106,  -28511,  -28899,  -29269,  -29622,  -29957,
  -30274,  -30572,  -30853,  -31114,  -31357,  -31581,  -31786,  -31972,
  -32138,  -32286,  -32413,  -32522,  -32610,  -32679,  -32729,  -32758,
  -32768,  -32758,  -32729,  -32679,  -32610,  -32522,  -32413,  -32286,
  -32138,  -31972,  -31786,  -31581,  -31357,  -31114,  -30853,  -30572,
  -30274,  -29957,  -29622,  -29269,  -28899,  -28511,  -28106,  -27684,
  -27246,  -26791,  -26320,  -25833,  -25330,  -24812,  -24279,  -23732,
  -23170,  -22595,  -22006,  -21403,  -20788,  -20160,  -19520,  -18868,
  -18205,  -17531,  -16846,  -16151,  -15447,  -14733,  -14010,  -13279,
  -12540,  -11793,  -11039,  -10279,   -9512,   -8740,   -7962,   -7180,
   -6393,   -5602,   -4808,   -4011,   -3212,   -2411,   -1608,    -804,
       0,
};

/*****************************************************************************
* sine( ang )   Return SIN of argument
*****************************************************************************/
int16_t sine( uint16_t ang )
{
  uint16_t i = ang >> SHIFT;
  return( table[i] + (((table[i+1]-table[i]) * (ang & MASK)) >> SHIFT) );
}

/*****************************************************************************
* cosine( ang ) Return COS of argument using cos(x) = sin(x + PI/2)
*****************************************************************************/
int16_t cosine( uint16_t ang )
{
  return sine( ang + 0x4000 );
}

elapsedMillis ms;
uint32_t count;

void setup() {
    Serial.begin( 9600 );
    while (!Serial && millis() < 2000) {}
    ms = 0;
}

#define FIXED_POINT (1) // set to 0 to use float

void loop() {
#if (FIXED_POINT)
    uint32_t twopi = 65536;
    uint16_t t, tr, tg, tb;
    uint16_t r, g, b;
    int numLeds = 1; // 60 * 2;
    uint16_t tempo = (ms % 2048) * 32; // 0..2PI in 2.048 seconds
    for (int i=0; i<numLeds; i++) {
        t  = tempo + i * (uint16_t)(0.6*(65536/(2*M_PI)));
        tr = t + (65536*0)/3;
        tg = t + (65536*1)/3;
        tb = t + (65536*2)/3;
        r  = sine(tr)/4 + 32768/4;
        g  = sine(tg)/4 + 32768/4;
        b  = sine(tb)/4 + 32768/4;
    }
    Serial.printf( "%10lu  %5hu  %5hu  %5hu  %5hu  %5hu  %5hu\n",
                        count, tr, tg, tb, r, g, b );
#else
    float twopi = PI * 2.0;
    float t, tr, tg, tb;
    uint16_t r, g, b;
    int numLeds = 1; // 60 * 2;
    float tempo = (float)ms * 0.003f;
    float max = 8000; // max 32000 and some
    for (int i=0; i<numLeds; i++) {
        t  = tempo + (float)i * 0.6;
        tr = t;
        tg = t + twopi*.33;
        tb = t + twopi*.66;
        tr = std::fmod(tr, twopi);
        tg = std::fmod(tg, twopi);
        tb = std::fmod(tb, twopi);
        r = (std::sin(tr) * max + max);
        g = (std::sin(tg) * max + max);
        b = (std::sin(tb) * max + max);
    }
    Serial.printf( "%10lu  %1.2lf  %1.2lf  %1.2lf  %5hu  %5hu  %5hu\n",
                        count, tr, tg, tb, r, g, b );
#endif
    count++;
    delay(10);
}
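
One way to get a feel for how accurate a 256-entry interpolated lookup is: sweep all 65536 angles on a desktop and compare against the library sine. The sketch below is a host-side check under assumptions of its own. It builds its table at run time with simple rounding to +/-32767 rather than using the table posted above, it divides by 256 instead of shifting so that negative slopes do not rely on implementation-defined behaviour, and all names are hypothetical.

```cpp
#include <stdint.h>
#include <stdlib.h>
#include <math.h>

static int16_t lut[257];   // 256 steps plus one wrap-around entry

// Build the lookup table at run time (host-side convenience only).
void initLut() {
    const double twoPi = 6.283185307179586;
    for (int i = 0; i <= 256; i++) {
        lut[i] = (int16_t)lround(sin(twoPi * i / 256.0) * 32767.0);
    }
}

// Linear interpolation between neighbouring table entries.
// ang: 0..65535 represents 0..2*pi; result is scaled to +/-32767.
int16_t lutSine(uint16_t ang) {
    int32_t i    = ang >> 8;        // table index, 0..255
    int32_t frac = ang & 0xFF;      // position between entries, 0..255
    int32_t a = lut[i];
    int32_t b = lut[i + 1];
    return (int16_t)(a + ((b - a) * frac) / 256);
}

// Largest absolute difference, in counts, between the lookup and sin().
int maxLutError() {
    const double twoPi = 6.283185307179586;
    int worst = 0;
    for (uint32_t ang = 0; ang < 65536; ang++) {
        int ref = (int)lround(sin(twoPi * ang / 65536.0) * 32767.0);
        int err = abs((int)lutSine((uint16_t)ang) - ref);
        if (err > worst) worst = err;
    }
    return worst;
}
```

With a step of 2*PI/256 the curvature error of linear interpolation is bounded by roughly 2.5 counts at this scale, so after rounding the worst case should be only a few counts out of 32767.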

PaulS · Apr 23, 2024

In case someone is looking for generating sin/cos lookup tables, here is a link to an online generator: Dr LUT.

Paul
