OctoWS Framerate Test and Results for Teensy 3.2 and 3.6

Status
Not open for further replies.

hijomojo

Member
Hey all,

Working on a project at the moment in which I was hoping to drive 31 short pixel strips (90px) individually on a single controller using FastLED, but ran into framerate issues.
I decided to do some testing of different methods of driving the strips, and ended up building a reference list of pixel count to frame rate for the different methods.
Just figured I'd post my results for OctoWS here as it might be a useful reference in the future, and I haven't spotted such a list around anywhere.

I tested both a Teensy 3.2 and a 3.6 running the OctoWS2811 library to drive 8 strips, starting at 1px per strip, then 50, 100, 150, etc., all the way up until I hit a limit that stopped me going further. My test method was to run the sketch posted below and hook an oscilloscope to pin 13 to measure its frequency. This frequency is half the frame rate (1 Hz = 2 frames per second), because the sketch toggles the pin once per frame, so one full high/low cycle spans two frames. Note that the Teensies weren't connected to anything other than the oscilloscope and the computer (via USB for power) for these tests, i.e. not attached to any other circuit such as the OctoWS shield.
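
The scope-to-FPS conversion written out as a small example (plain C++ rather than an Arduino sketch, so it can run anywhere; the function names are made up for illustration):

```cpp
// Converts the pin-13 square-wave frequency read on the scope into frames
// per second. loop() toggles the pin once per frame, so one full period
// (high + low) covers two frames.
constexpr double fpsFromScopeHz(double scopeHz) {
  return 2.0 * scopeHz;
}

// Duration of one frame in milliseconds, from the same scope reading.
constexpr double frameMsFromScopeHz(double scopeHz) {
  return 1000.0 / fpsFromScopeHz(scopeHz);
}
```

For example, the 11.8 Hz reading at 1365 pixels/strip works out to 23.6 FPS, or about 42.4 ms per frame.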

Interestingly, the pixel limit showed up differently on the two boards. On the 3.2, the compiler flagged a RAM limit and gave an error at exactly 961 pixels per strip and beyond, whereas on the 3.6 I could get to 1365 after which the Teensy seemed to just lock up at some point in the first iteration of the main loop (pin 13 stayed permanently high rather than toggling), with no compiler errors.
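
As a rough cross-check of the 961-pixel limit on the 3.2: the test sketch further down declares both buffers as ledsPerStrip*8 ints. A minimal estimate, assuming a 4-byte int (true on the 32-bit ARM chips in Teensy 3.x) and counting only these two buffers, puts them at 61,504 bytes of the 3.2's 64 KB RAM at 961 pixels/strip, leaving about 4 KB for the stack and everything else. That is consistent with the compile error, but it is an estimate, not a reading of the linker map. For comparison, OctoWS2811's own examples size each buffer as ledsPerStrip*6 ints (8 strips x 3 bytes = 24 bytes per LED), which would need 46,128 bytes at the same length. The helper names are hypothetical.

```cpp
#include <cstddef>

// Rough RAM estimate for the two OctoWS2811 buffers in the test sketch:
//   DMAMEM int displayMemory[ledsPerStrip*8];
//   int drawingMemory[ledsPerStrip*8];
// Assumes a 4-byte int. Stack, USB buffers and other globals are NOT included.
constexpr std::size_t kBytesPerInt = 4;
constexpr std::size_t kBufferCount = 2;  // display + drawing

constexpr std::size_t bufferBytes(std::size_t ledsPerStrip, std::size_t intsPerLed) {
  return ledsPerStrip * intsPerLed * kBytesPerInt * kBufferCount;
}

constexpr std::size_t kTeensy32RamBytes = 64 * 1024;  // Teensy 3.2: 64 KB RAM
```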

I'd say the results are indicative of the realistic maximum frame rate for a given pixel count, as the sketch simply sets every pixel in the array once per loop, and only calls .show() once per loop. I've also tried testing without updating the pixel array every loop (i.e. removing the 'for' loop), and got the same framerate (I only tested this on the 3.6 at 1365 pixels). I'm not too sure if the time spent inverting pin 13 every frame affected the framerate at all, or in any meaningful way. Maybe someone more knowledgeable here can enlighten us on whether this might have been the case, or even test for it specifically.

Something interesting I found is that the framerate seems to be limited by something other than the clock speed, as the Teensy 3.2 @ 96 MHz gave the exact same results as the Teensy 3.6 @ 192 MHz.

My raw and graphed results are here: https://docs.google.com/spreadsheets/d/1hUzdqTbT-xQRewz9-DdlzS6PSpeoUkTskYcn1hqnSng/
and for simplicity, here is the table for the 3.6's results:

MHz  Strips  Pixels/strip  Scope Hz (FPS/2)  FPS
192  8       1             375.0             750.0
192  8       50            178.0             356.0
192  8       100           116.0             232.0
192  8       150           86.0              172.0
192  8       200           68.5              137.0
192  8       250           56.7              113.4
192  8       300           48.5              97.0
192  8       350           42.3              84.6
192  8       400           37.5              75.0
192  8       450           33.7              67.4
192  8       500           30.6              61.2
192  8       550           28.0              56.0
192  8       600           25.8              51.6
192  8       650           24.0              48.0
192  8       700           22.4              44.8
192  8       750           21.0              42.0
192  8       800           19.7              39.4
192  8       850           18.6              37.2
192  8       900           17.6              35.2
192  8       950           16.7              33.4
192  8       1000          16.0              32.0
192  8       1050          15.2              30.4
192  8       1100          14.6              29.2
192  8       1150          13.9              27.8
192  8       1200          13.4              26.8
192  8       1250          12.8              25.6
192  8       1300          12.4              24.8
192  8       1350          11.9              23.8
192  8       1365          11.8              23.6

In case you'd like to replicate my test or do something similar, here's the test sketch:
Code:
/*  OctoWS_FrameRate_Test.ino by hijomojo

  Required Connections
  --------------------
    pin13: oscilloscope probe or other test device
    gnd: oscilloscope/test device ground
  
  Optional Connections (Paul's standard OctoWS2811 pinout)
  --------------------
    pin 2:  LED Strip #1    OctoWS2811 drives 8 LED Strips.
    pin 14: LED strip #2    All 8 are the same length.
    pin 7:  LED strip #3
    pin 8:  LED strip #4    A 100 ohm resistor should be used
    pin 6:  LED strip #5    between each Teensy pin and the
    pin 20: LED strip #6    wire to the LED strip, to minimize
    pin 21: LED strip #7    high frequency ringing & noise.
    pin 5:  LED strip #8
    pin 15 & 16 - Connect together, but do not use
    pin 4 - Do not use
    pin 3 - Do not use as PWM.  Normal use is ok.

*/

#include <OctoWS2811.h>

const int ledsPerStrip = 1365; // change this according to the number of pixels per strip you want to test the framerate of

DMAMEM int displayMemory[ledsPerStrip*8];
int drawingMemory[ledsPerStrip*8];

const int config = WS2811_GRB | WS2811_800kHz;

OctoWS2811 leds(ledsPerStrip, displayMemory, drawingMemory, config);

byte frameRate = 60;      // change this according to whatever framerate you'd like to cap at
unsigned long frameDuration;
unsigned long lastFrame;

void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
  frameDuration = 1000000/frameRate;
  leds.begin();
  leds.show();
}

void loop() {
  // uncomment these 2 lines if you wish to cap the framerate for any reason
  // (subtracting first keeps the check correct when micros() rolls over)
  // while (micros() - lastFrame < frameDuration) {}
  // lastFrame = micros();

  // toggle pin 13 once per frame: scope frequency = frame rate / 2
  digitalWrite(LED_BUILTIN, !digitalRead(LED_BUILTIN));
  for (int i=0; i < leds.numPixels(); i++) {
    leds.setPixel(i, 0xFF0000);
  } 
  leds.show();
}
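
For the optional frame cap in the sketch, the way the micros() comparison is written matters. micros() is an unsigned 32-bit microsecond count that rolls over about every 71.6 minutes (2^32 us). Written as micros() < lastFrame + frameDuration, the sum can wrap to a small number just before rollover, so one frame's wait ends early; the usual Arduino idiom micros() - lastFrame < frameDuration uses unsigned subtraction, which gives the true elapsed time across the rollover. A minimal sketch comparing the two checks, using plain uint32_t values in place of micros() so it runs on a PC (the function names are hypothetical):

```cpp
#include <cstdint>

// Sum-based check: lastFrame + frameDuration can overflow to a small value
// near the rollover, which ends the wait too early.
constexpr bool stillWaitingNaive(uint32_t now, uint32_t lastFrame, uint32_t frameDuration) {
  return now < lastFrame + frameDuration;
}

// Wrap-safe check: unsigned subtraction yields the elapsed time even when
// now has wrapped past zero.
constexpr bool stillWaitingSafe(uint32_t now, uint32_t lastFrame, uint32_t frameDuration) {
  return now - lastFrame < frameDuration;
}
```

With frameRate = 60, frameDuration is 1000000/60 = 16666 us. If lastFrame is 0xFFFFFF00 and only 16 us have passed, the sum-based check already reports the wait as over, while the subtraction-based check keeps waiting.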
 
> ... whereas on the 3.6 I could get to 1365 after which the Teensy seemed to just lock up

Yes, 1365 LEDs per pin (10920 total) is a hard limit imposed by the DMA hardware. A 15-bit number is used to count the bits transmitted, so at most 32767 bits can be sent per update. 1365 LEDs with 24-bit color per LED is 32760 bits transmitted; one more LED (32784 bits) would no longer fit.
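
The same arithmetic as a few lines of C++, taking the 15-bit bit counter described above as given (the constant and function names are illustrative, not taken from the OctoWS2811 source):

```cpp
#include <cstdint>

// Largest count a 15-bit counter can hold: 2^15 - 1 = 32767 bits per pin.
constexpr uint32_t kMaxDmaBits = (1u << 15) - 1;
constexpr uint32_t kBitsPerLed = 24;  // 8 bits each for G, R and B

// Integer division rounds down to the largest whole number of LEDs that fit.
constexpr uint32_t maxLedsPerPin(uint32_t bitsPerLed) {
  return kMaxDmaBits / bitsPerLed;
}
```

maxLedsPerPin(24) gives 1365 LEDs per pin, and 1365 x 8 pins gives the 10920 total.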


> Something interesting I found is that the framerate seems to be limited by something other than the clock speed, as the Teensy 3.2 @ 96 MHz gave the exact same results as the Teensy 3.6 @ 192 MHz.

WS2812 LEDs always use 800 kbit/sec communication, plus a 50 or 300 us end-of-frame idle time. Sending 8 strips in parallel gives an overall communication speed of 6.4 Mbit/sec. If you do not spend a large amount of CPU time rendering each frame, nor impose any other limit on your frame rate, then the raw communication speed to the LEDs will limit the overall speed.
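
A first-order model of that link speed, as a minimal sketch: each strip gets 24 bits per LED at 800 kbit/s (30 us per LED) followed by the idle gap, and because the 8 strips are sent in parallel the strip count drops out. The 300 us gap used below is one of the two values mentioned above; the names are hypothetical.

```cpp
// First-order model of the WS2812 link: 24 bits per LED at 800 kbit/s per
// strip, then an idle (reset) gap. Strips are sent in parallel, so the
// number of strips does not change the frame time.
constexpr double kBitRate = 800000.0;                       // bits/s per strip
constexpr double kBitsPerLed = 24.0;                        // G, R, B x 8 bits
constexpr double kUsPerLed = kBitsPerLed * 1e6 / kBitRate;  // = 30 us

constexpr double modelFrameUs(double ledsPerStrip, double resetUs) {
  return ledsPerStrip * kUsPerLed + resetUs;
}

constexpr double modelFps(double ledsPerStrip, double resetUs) {
  return 1e6 / modelFrameUs(ledsPerStrip, resetUs);
}
```

With a 300 us gap this predicts about 33.0 FPS at 1000 pixels/strip and 24.2 FPS at 1365, close to the measured 32.0 and 23.6. At short lengths the gap widens (about 3030 FPS predicted at 1 pixel/strip versus 750 measured): working back from the table, each measured frame is consistently around 1 ms longer than the model, a fixed overhead whose source isn't identified in this thread.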

If the frame data comes from storage media or is streamed from a PC, Teensy 3.2 usually performs the same as Teensy 3.6, because the speed is set by the communication. Teensy 3.2 is (usually) plenty fast enough to receive data and transmit it to the LEDs.

But if you do rendering of the frames on Teensy, the additional performance of Teensy 3.6 can really help if you have a large number of LEDs and fairly complex animation.
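
A back-of-the-envelope way to size that rendering headroom, as a sketch under two stated assumptions: (1) rendering into drawingMemory overlaps the DMA transfer of the previous frame, which matches OctoWS2811's documentation describing show() as starting the update and returning while it continues, and fits the observation above that removing the for loop didn't change the frame rate; (2) the extra overhead seen in the measurements is ignored. MHz multiplied by microseconds gives clock cycles. The function names are hypothetical.

```cpp
// Rough per-frame CPU budget for rendering before it starts to lower the
// frame rate. ASSUMPTION: rendering overlaps the DMA transfer of the
// previous frame, so the budget is about one transmission time.
constexpr double kUsPerLed = 30.0;  // 24 bits at 800 kbit/s

// MHz x microseconds = clock cycles available per frame.
constexpr double renderCyclesPerFrame(double cpuMHz, double ledsPerStrip, double resetUs) {
  return cpuMHz * (ledsPerStrip * kUsPerLed + resetUs);
}

// The same budget spread over every LED on all 8 outputs.
constexpr double renderCyclesPerLed(double cpuMHz, double ledsPerStrip, double resetUs) {
  return renderCyclesPerFrame(cpuMHz, ledsPerStrip, resetUs) / (8.0 * ledsPerStrip);
}
```

For example, at 100 pixels/strip with a 300 us gap, a 3.2 at 96 MHz would have roughly 316,800 cycles per frame, about 396 cycles per LED across all 800 LEDs; at 192 MHz both figures double. How much animation work those cycles buy depends on the code, as discussed further on in this thread.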
 
> Yes, 1365 LEDs per pin (10920 total) is a hard limit imposed by the DMA hardware

Ahh, I had a feeling it was that but couldn't find the info about the DMA limit.

> ...the raw communication speed to the LEDs will limit the overall speed.
> ...if you do rendering of the frames on Teensy, the additional performance of Teensy 3.6 can really help if you have a large number of LEDs and fairly complex animation.

I know clock speed is not the be-all and end-all of performance, but are the chips on the Teensy 3.2 and 3.6 similar enough in IPC, or performance per cycle otherwise, that the additional clock speed of the 3.6 is effectively an indication of the 'spare' performance you could use for said animation complexity? Say, for example, I ran the 1px/strip test on the 3.2 and reduced the clock speed until the FPS dropped: would the lowest clock speed that still gave the full FPS (subtracted from the board's maximum clock speed) indicate this amount of 'spare' performance?
 
> are the chips on the Teensy 3.2 and 3.6 similar enough in IPC, or performance per cycle otherwise, that the additional clock speed of the 3.6 is effectively an indication of the 'spare' performance you could use for said animation complexity?

Depends on your code.

If you use only integers, performance scales approximately with clock speed.

If you use 32-bit float, Teensy 3.6 is much faster than Teensy 3.2, because it has a 32-bit FPU; Teensy 3.2 has no FPU and does float math in software.
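
On the integer side, one common way to keep animation maths such as fades float-free is 8-bit fixed-point scaling, where a brightness of 0 to 255 stands for a fraction of 256 (similar in spirit to FastLED's scale8()). A minimal sketch in plain C++; the helper names are hypothetical and not part of OctoWS2811:

```cpp
#include <cstdint>

// Scale one 8-bit colour channel by brightness/256 using integer math only.
constexpr uint8_t scaleChannel(uint8_t value, uint8_t brightness) {
  return static_cast<uint8_t>((static_cast<uint16_t>(value) * brightness) >> 8);
}

// Apply the same scale to each channel of a 0xRRGGBB colour, the format
// passed to leds.setPixel() in the test sketch.
constexpr uint32_t scaleColor(uint32_t rgb, uint8_t brightness) {
  return (static_cast<uint32_t>(scaleChannel(static_cast<uint8_t>(rgb >> 16), brightness)) << 16) |
         (static_cast<uint32_t>(scaleChannel(static_cast<uint8_t>(rgb >> 8), brightness)) << 8) |
         static_cast<uint32_t>(scaleChannel(static_cast<uint8_t>(rgb), brightness));
}
```

One quirk of the plain >> 8 form: full brightness (255) maps 255 to 254, so code that needs an exact pass-through usually special-cases 255 or scales by (brightness + 1) in a wider type.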
 
Ahh, wonderful
So far I've never had a need for using float in striplight animations, so that's good news. Hopefully I'll have some time to test the scaling out.

Much appreciated, Paul!
 