Everything works fine then dramatically slows after about a minute

Status
Not open for further replies.

clfaye

Member
Hi - I am running an 8x32 matrix on a Teensy 3.2 using FastLED in parallel output mode. I'm displaying a sinusoid function. It works fine for about a minute, then slows to a frame rate of about 2 FPS and keeps getting slower after that. Any ideas what could be causing it?

Code:
#include "FastLED.h"
uint8_t Width  = 32;
uint8_t Height = 8;
 // y dimension  x=0, y=0 at lower left hand corner (pixel 8)

float speed = 1.0; // speed of the movement along the Lissajous curves
float size = 4;    // amplitude of the curves

// NUM_LEDS = Width * Height
#define NUM_LEDS_PER_STRIP 64
// Note: this can be 12 if you're using a teensy 3 and don't mind soldering the pads on the back
#define NUM_STRIPS 4
#define NUM_LEDS      256
#define BRIGHTNESS    100
#define FPS 100
#define FPS_DELAY (1000/FPS)
CRGB leds[NUM_LEDS];

void setup() {
  LEDS.addLeds<WS2811_PORTD, NUM_STRIPS, GRB>(leds, NUM_LEDS_PER_STRIP);
  FastLED.setBrightness(BRIGHTNESS);
}

void loop()
{
  sinusoid();
  FastLED.show();
  //  FastLED.delay(FPS_DELAY);
}

void sinusoid()
{
  for (uint8_t y = 0; y < Height; y++) {
    for (uint8_t x = 0; x < Width; x++) {

      float cx = y + float(size * (sinf (float(speed * 0.003 * (millis() ))) ) ) - (Width/2);  // the 8 centers the middle on a 16x16
      float cy = x + float(size * (cosf (float(speed * 0.0022 * (millis()))) ) ) - (Height/2);
      float v = 127 * (1 + sinf ( sqrtf ( ((cx * cx) + (cy * cy)) ) ));
      uint8_t data = v;
      leds[XY(x, y)].r = data;

      cx = x + float(size * (sinf (speed * float(0.0021 * (millis()))) ) ) - (Width/2);
      cy = y + float(size * (cosf (speed * float(0.002 * (millis() ))) ) ) - (Height/2);
      v = 127 * (1 + sinf ( sqrtf ( ((cx * cx) + (cy * cy)) ) ));
      data = v;
      leds[XY(x, y)].b = data;

      cx = x + float(size * (sinf (speed * float(0.0041 * (millis() ))) ) ) - (Width/2);
      cy = y + float(size * (cosf (speed * float(0.0052 * (millis() ))) ) ) - (Height/2);
      v = 127 * (1 + sinf ( sqrtf ( ((cx * cx) + (cy * cy)) ) ));
      data = v;
      leds[XY(x, y)].g = data;

    }
  }
}
// Helper function that translates from x, y into an index into the LED array
uint16_t XY( uint8_t x, uint8_t y)
{
  uint16_t ledNum;
  if ( x & 0x01)
  {
    // Odd rows run backwards
    ledNum = ((x+1) * Height) - (y+1);
  }
  else
  {
    // Even rows run forwards
    ledNum = ((x * Height) + y);
  }


  return ledNum;
}
 
Code is easier to read and scan if you auto-format it in the IDE (Ctrl+T in the Arduino IDE) before posting it inside a CODE block, which is the "#" button on the reply toolbar.
A delay of some sort in loop() might help you see what is going on. I'm not sure whether updating too fast can cause issues. delay(10) is the easy option, but gating the update on an elapsedMillis variable lets loop() keep running.

Code:
elapsedMillis wLED;

void setup() {
  // ...
  wLED = 0;
}

void loop() 
{
  if ( wLED > 10 ) {
    wLED -= 10;
    sinusoid();
    FastLED.show();
    // FastLED.delay(FPS_DELAY);
  }
}
 
For something that gets slower and slower, I would be very suspicious of those formulas that use millis() in the calculation, although I don't have an explanation as to why. For example:
cx = x + float(size * (sinf (speed * float(0.0021 * (millis()))) ) ) - (Width/2);
 
The floating-point calcs with the rapidly increasing millis() are my suspect as well. I tried converting all of the calcs to integers and using the sin8 approximation in FastLED. I was about 30% successful. I converted the first of the three calcs to integers only (which had no effect on speed). As I converted the second and third calcs, things started going wrong with the colors. I tried for a few hours but couldn't get it. I think tonight I will just switch over to a Teensy 3.6, which should have no problem with the floating-point calcs and was also the Teensy used in the example I'm basing this on. I'll let you know how it goes.
 
I'd suggest you narrow down the root cause by divide and conquer. If you just switch hardware, you might avoid the problem or you might delay it, but either way you won't learn what it was.

Comment out half the code. See if it still slows down. If so, comment half of what remains. If not, the problem is in something you commented, so uncomment half. Etc. It shouldn't take more than half a dozen tests to narrow it down to a single line or give you a clue about which function is slowing down.

If you decide to do this, you might as well grab the value of millis() once per iteration and stick it in a variable.
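A minimal sketch of that last suggestion, in plain C++ so it can be tried off the board. It assumes the sketch reads millis() once at the top of sinusoid() and passes the value in; the FrameOffsets struct and the frameOffsets() helper are made-up names, not part of the original code. This does not by itself stop the angles from growing. It only makes all six terms use the same timestamp and cuts the millis()-based trig calls from six per pixel to six per frame.

```cpp
#include <math.h>
#include <stdint.h>

// Hypothetical container for the six time-dependent offsets (not in the original sketch).
struct FrameOffsets {
  float s1, c2, s3, c4, s5, c6;
};

// nowMs: a single millis() reading taken at the start of the frame.
// The rate constants are the ones from the posted formulas.
FrameOffsets frameOffsets(uint32_t nowMs, float speed, float size) {
  const float t = speed * (float)nowMs;
  FrameOffsets o;
  o.s1 = size * sinf(0.0030f * t);
  o.c2 = size * cosf(0.0022f * t);
  o.s3 = size * sinf(0.0021f * t);
  o.c4 = size * cosf(0.0020f * t);
  o.s5 = size * sinf(0.0041f * t);
  o.c6 = size * cosf(0.0052f * t);
  return o;
}
```

On the Teensy the call would look something like FrameOffsets o = frameOffsets(millis(), speed, size); placed before the pixel loops.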
 
Sorry I may be missing something obvious, but what is the purpose of passing in millis() to these functions?

That is, look at a simple subexpression like: sinf(float(speed * 0.003 * millis()))
After about 2 minutes (120,000 ms) you will be taking sinf(360.0), with the argument in radians. I assume the sinf code does something to normalize this to the range 0 to 2π, if my rusty math knowledge is correct. I wonder whether it does that with a remainder calculation or with repeated subtraction. If it is repeated subtraction, each call will take longer as the argument grows.
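To put numbers on that, here is a small worked example in plain C++ (the helper names are made up for illustration). With the 0.003 factor and speed = 1.0, the angle after 120,000 ms is 360 radians. Since sinf() works in radians, that is about 57 full turns of 2π, not one turn.

```cpp
#include <math.h>
#include <stdint.h>

const float kTwoPi = 6.2831853f;

// Angle in radians that reaches sinf() after elapsedMs milliseconds.
float rawAngle(float radPerMs, uint32_t elapsedMs) {
  return radPerMs * (float)elapsedMs;
}

// Whole 2*pi turns contained in that angle; they do not change the sine value.
float fullTurns(float angle) {
  return floorf(angle / kTwoPi);
}
```

Here rawAngle(0.003f, 120000) comes out at about 360.0, and fullTurns(360.0f) is 57.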

As for using sin8: I assume you did something to constrain the values passed in to between 0 and 255 (i.e. the input theta is a uint8_t value).
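For reference, FastLED's sin8() takes a uint8_t angle where 0..255 covers one full cycle, and it returns a uint8_t. A rough sketch of one way to derive such an angle from a millisecond count follows, in plain C++. The thetaFromMillis() helper and the 2094 ms period are assumptions for illustration, not something from the original code. Because the period is rounded to a whole number of milliseconds, the speed differs very slightly from the float version.

```cpp
#include <stdint.h>

// Maps elapsed time onto an 8-bit angle: 0..255 spans one period.
// periodMs must be non-zero; 2094 ms is roughly 2*pi / 0.003.
uint8_t thetaFromMillis(uint32_t ms, uint32_t periodMs) {
  uint32_t phaseMs = ms % periodMs;  // position inside the current period
  // 64-bit intermediate so phaseMs * 256 cannot overflow for long periods.
  return (uint8_t)(((uint64_t)phaseMs * 256u) / periodMs);
}
```

The result could then be passed on as sin8(thetaFromMillis(millis(), 2094)).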

Edit: forgot to mention, if it were me, I would also instrument the code. On each pass I would calculate a delta time for the calculations and print the sum of these every 100 passes through the loop to see whether they keep increasing. I might even keep a separate sum for each of the calculations and one for the LED show() call.
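The bookkeeping for that kind of instrumentation could look roughly like this. It is a sketch in plain C++ with a made-up TimingAccumulator helper; the timing source itself (for example a micros() difference around sinusoid()) is left to the caller.

```cpp
#include <stdint.h>

// Hypothetical helper: sums per-pass timings and reports once per batch.
struct TimingAccumulator {
  uint32_t sumUs = 0;
  uint16_t count = 0;

  // Adds one measurement. Returns true once `batch` samples have been
  // collected, writes their total to *totalUs, and starts a new batch.
  bool add(uint32_t deltaUs, uint16_t batch, uint32_t *totalUs) {
    sumUs += deltaUs;
    if (++count < batch) return false;
    *totalUs = sumUs;
    sumUs = 0;
    count = 0;
    return true;
  }
};
```

In the sketch, the total would be printed with Serial.println() whenever add() returns true with a batch of 100. One accumulator per calculation would show which term is growing.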
 
Nice to see the CODE # tag! It is readable now.

Indeed - millis() was what I expected to be the problem, but I couldn't be bothered to read the flattened code.

How about something like ( millis() % 360 ), or whatever makes sense to get a usable 'remainder' and limit the value to a proper range?
 
Just played with this code a bit. Indeed the trig functions become *much* slower when given angles far more than 2*PI. Whatever they're doing to turn such a huge number into the 0-2PI range takes quite a lot of CPU time.

Here's one possible idea to keep the angles/phase inputs from growing without bound. Instead of using millis(), I just put in a fixed increment (based on the assumption you want a consistent frame rate and will add an elapsedMicros or similar delay in loop). All 6 phases are pre-computed at the beginning of the update, and if any have grown beyond 2*PI, they're trimmed before use with the trig functions.

I also put a bit of code into loop() to print the number of microseconds taken for the computation. Before this change, it started around 80000 and after running for some time would grow to about 450000, which corresponds with the 2 Hz refresh rate you're seeing. This approach of incrementing the phase/angle and limiting it to 2*PI gives a pretty consistent 62000 to 74000 microsecond compute time, even after running for many minutes.

Hope this helps?

Code:
#include "FastLED.h"
uint8_t Width  = 32;
uint8_t Height = 8;
 // y dimension  x=0, y=0 at lower left hand corner (pixel 8)

float speed = 1.0; // speed of the movement along the Lissajous curves
float size = 4;    // amplitude of the curves

// NUM_LEDS = Width * Height
#define NUM_LEDS_PER_STRIP 64
// Note: this can be 12 if you're using a teensy 3 and don't mind soldering the pads on the back
#define NUM_STRIPS 4
#define NUM_LEDS      256
#define BRIGHTNESS    100
#define FPS 100
#define FPS_DELAY (1000/FPS)
CRGB leds[NUM_LEDS];

void setup() {
  LEDS.addLeds<WS2811_PORTD, NUM_STRIPS, GRB>(leds, NUM_LEDS_PER_STRIP);
  FastLED.setBrightness(BRIGHTNESS);
}

void loop()  {
  elapsedMicros usec=0;
  sinusoid();
  Serial.println(usec);
  FastLED.show();
//  FastLED.delay(FPS_DELAY);
  
}

const unsigned long millis_increment = 80;
float phase1 = 0.0;
float phase2 = 0.0;
float phase3 = 0.0;
float phase4 = 0.0;
float phase5 = 0.0;
float phase6 = 0.0;

void sinusoid() {
  phase1 += speed * 0.0030 * millis_increment;
  phase2 += speed * 0.0022 * millis_increment;
  phase3 += speed * 0.0021 * millis_increment;
  phase4 += speed * 0.0020 * millis_increment;
  phase5 += speed * 0.0041 * millis_increment;
  phase6 += speed * 0.0052 * millis_increment;

  const float pi2 = PI * 2.0;

  if (phase1 > pi2) phase1 -= pi2;
  if (phase2 > pi2) phase2 -= pi2;
  if (phase3 > pi2) phase3 -= pi2;
  if (phase4 > pi2) phase4 -= pi2;
  if (phase5 > pi2) phase5 -= pi2;
  if (phase6 > pi2) phase6 -= pi2;
  
  for (uint8_t y = 0; y < Height; y++) {
    for (uint8_t x = 0; x < Width; x++) {

      float cx = y + size * sinf(phase1) - (Width/2);  // the 8 centers the middle on a 16x16
      float cy = x + size * cosf(phase2) - (Height/2);
      float v = 127 * (1 + sinf ( sqrtf ( ((cx * cx) + (cy * cy)) ) ));
      uint8_t data = v;
      leds[XY(x, y)].r = data;

      cx = x + size * sinf(phase3) - (Width/2);
      cy = y + size * cosf(phase4) - (Height/2);
      v = 127 * (1 + sinf ( sqrtf ( ((cx * cx) + (cy * cy)) ) ));
      data = v;
      leds[XY(x, y)].b = data;

      cx = x + size * sinf(phase5) - (Width/2);
      cy = y + size * cosf(phase6) - (Height/2);
      v = 127 * (1 + sinf ( sqrtf ( ((cx * cx) + (cy * cy)) ) ));
      data = v;
      leds[XY(x, y)].g = data;

    }
  }
}
// Helper function that translates from x, y into an index into the LED array
uint16_t XY( uint8_t x, uint8_t y)
{
  uint16_t ledNum;
  if ( x & 0x01)
  {
    // Odd rows run backwards
    ledNum = ((x+1) * Height) - (y+1);
  }
  else
  {
    // Even rows run forwards
    ledNum = ((x * Height) + y);
  }


  return ledNum;
}
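
One caveat with the single `if` per phase: it removes at most one turn per frame, so it relies on each increment staying below 2*PI (true here, since the largest is 0.0052 * 80 = 0.416). A hedged variant that tolerates larger or negative increments is sketched below in plain C++. advancePhase() is a made-up helper, and its cost has not been measured on the Teensy 3.2.

```cpp
#include <math.h>

const float kTwoPi = 6.2831853f;

// Advances a phase by `increment` and folds the result back into [0, 2*pi).
float advancePhase(float phase, float increment) {
  phase = fmodf(phase + increment, kTwoPi);  // remainder keeps the sign of the sum
  if (phase < 0.0f) phase += kTwoPi;         // shift negative remainders up one turn
  return phase;
}
```

Usage would be along the lines of phase1 = advancePhase(phase1, speed * 0.0030 * millis_increment); in place of the += and if pair.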
 
Just for fun, I tried running this on Teensy 3.6. With the FPU and 180 MHz clock speed, it does the computation in 2100 to 2500 us.

Also tried the original code. Indeed a similar slowdown happens on Teensy 3.6, originally taking ~2600 us and slowing to ~22000 us after a few minutes.
 
Thanks guys! It does work! I had come at this a much clumsier way by using a counter instead of millis, and limiting the counter to about 20,000. That also worked but it had a seam when the counter restarted. Paul's solution is much more elegant and doesn't have a seam.
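A possible explanation for the seam, with a small plain-C++ check (wrapJump() is a made-up helper for illustration): when a counter restarts, the sine jumps back to sin(0) = 0 unless the wrap point is a whole number of periods. Assuming the counter was scaled like millis() in the original formulas, 0.003 * 20000 = 60 radians, which is about 9.55 turns, so the value steps at the restart.

```cpp
#include <math.h>
#include <stdint.h>

// Size of the step in sin(radPerTick * counter) when the counter wraps
// from wrapTicks back to 0. A result near zero means the wrap is seamless.
float wrapJump(float radPerTick, uint32_t wrapTicks) {
  float beforeWrap = sinf(radPerTick * (float)wrapTicks);
  float afterWrap = sinf(0.0f);
  return fabsf(beforeWrap - afterWrap);
}
```

wrapJump(0.003f, 20000) is roughly 0.3 on a scale where the sine spans -1 to 1, while a wrap close to one period (2094 ticks for the 0.003 term) gives a step close to zero. The phase accumulator avoids the issue because each phase is reduced by exactly 2*PI.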

This was way better than desoldering the 3.2! The old chip has still got some tricks! I'm making a hat - which is why the teensy isn't socketed - not enough space. I'll post some more vid when I get that done.

 
I believe you may be able to roughly double your frame rate (reaching approx 30 Hz video speed) by moving the 6 trig functions outside the loop. They take the same input data during each loop iteration, so there's no need to recompute them for every pixel. This seems like the sort of thing the compiler should optimize, but perhaps it doesn't "know" the trig functions are purely a math function of their input?

Code:
#include "FastLED.h"
uint8_t Width  = 32;
uint8_t Height = 8;
 // y dimension  x=0, y=0 at lower left hand corner (pixel 8)

float speed = 1.0; // speed of the movement along the Lissajous curves
float size = 4;    // amplitude of the curves

// NUM_LEDS = Width * Height
#define NUM_LEDS_PER_STRIP 64
// Note: this can be 12 if you're using a teensy 3 and don't mind soldering the pads on the back
#define NUM_STRIPS 4
#define NUM_LEDS      256
#define BRIGHTNESS    100
#define FPS 100
#define FPS_DELAY (1000/FPS)
CRGB leds[NUM_LEDS];

void setup() {
  LEDS.addLeds<WS2811_PORTD, NUM_STRIPS, GRB>(leds, NUM_LEDS_PER_STRIP);
  FastLED.setBrightness(BRIGHTNESS);
}

void loop()  {
  elapsedMicros usec=0;
  sinusoid();
  Serial.println(usec);
  FastLED.show();
//  FastLED.delay(FPS_DELAY);
  
}

const unsigned long millis_increment = 80;
float phase1 = 0.0;
float phase2 = 0.0;
float phase3 = 0.0;
float phase4 = 0.0;
float phase5 = 0.0;
float phase6 = 0.0;

void sinusoid() {
  phase1 += speed * 0.0030 * millis_increment;
  phase2 += speed * 0.0022 * millis_increment;
  phase3 += speed * 0.0021 * millis_increment;
  phase4 += speed * 0.0020 * millis_increment;
  phase5 += speed * 0.0041 * millis_increment;
  phase6 += speed * 0.0052 * millis_increment;

  const float pi2 = PI * 2.0;

  if (phase1 > pi2) phase1 -= pi2;
  if (phase2 > pi2) phase2 -= pi2;
  if (phase3 > pi2) phase3 -= pi2;
  if (phase4 > pi2) phase4 -= pi2;
  if (phase5 > pi2) phase5 -= pi2;
  if (phase6 > pi2) phase6 -= pi2;

  float s1 = sinf(phase1) * size;
  float c2 = cosf(phase2) * size;
  float s3 = sinf(phase3) * size;
  float c4 = cosf(phase4) * size;
  float s5 = sinf(phase5) * size;
  float c6 = cosf(phase6) * size;
  
  for (uint8_t y = 0; y < Height; y++) {
    for (uint8_t x = 0; x < Width; x++) {

      float cx = y + s1 - (Width/2);  // the 8 centers the middle on a 16x16
      float cy = x + c2 - (Height/2);
      float v = 127 * (1 + sinf ( sqrtf ( ((cx * cx) + (cy * cy)) ) ));
      uint8_t data = v;
      leds[XY(x, y)].r = data;

      cx = x + s3 - (Width/2);
      cy = y + c4 - (Height/2);
      v = 127 * (1 + sinf ( sqrtf ( ((cx * cx) + (cy * cy)) ) ));
      data = v;
      leds[XY(x, y)].b = data;

      cx = x + s5 - (Width/2);
      cy = y + c6 - (Height/2);
      v = 127 * (1 + sinf ( sqrtf ( ((cx * cx) + (cy * cy)) ) ));
      data = v;
      leds[XY(x, y)].g = data;

    }
  }
}
// Helper function that translates from x, y into an index into the LED array
uint16_t XY( uint8_t x, uint8_t y)
{
  uint16_t ledNum;
  if ( x & 0x01)
  {
    // Odd rows run backwards
    ledNum = ((x+1) * Height) - (y+1);
  }
  else
  {
    // Even rows run forwards
    ledNum = ((x * Height) + y);
  }

  return ledNum;
}
 
I think that is probably right! I will check it tomorrow. Thank you so much for your help! You have saved me all the time that I anticipated resoldering everything. My coding chops are way out of practice. There are surprisingly few code examples of matrix effects that translate well to a simple WS2811 array. If I can pull a few together in one place I'll try to publish them - including this example. I love the teensy and have used it for all kinds of practical things - but my understanding of the intersection of math and art is lacking.
 