Thread: OctoWS Framerate Test and Results for Teensy 3.2 and 3.6

  1. #1
    Junior Member
    Join Date: Mar 2019
    Posts: 3

    Hey all,

    Working on a project at the moment in which I was hoping to drive 31 short pixel strips (90 px each) individually from a single controller using FastLED, but ran into frame rate issues.
    I decided to test different methods of driving the strips, and ended up building a reference list of pixel count versus frame rate for each method.
    Just figured I'd post my results for OctoWS here as it might be a useful reference in the future, and I haven't spotted such a list around anywhere.

    I tested both a Teensy 3.2 and a 3.6 running the OctoWS2811 library to drive 8 strips, starting at 1 px per strip, then 50, 100, 150, etc., all the way up until something prevented me from going further. My test method was to run the test sketch further down this post and hook an oscilloscope onto pin 13 to measure its frequency. The sketch toggles pin 13 only once per frame, so one full period of the square wave spans two frames: the frame rate is twice the measured frequency (1 Hz = 2 frames per second). Note that for these tests the Teensies weren't connected to anything other than the oscilloscope and the computer (via USB, for power), i.e. not attached to any other circuit such as the OctoWS shield.
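    The conversion from scope reading to frame rate is only a factor of two, but it is easy to get backwards. A minimal C++ sketch, assuming pin 13 toggles exactly once per frame (the helper names are hypothetical, not part of OctoWS2811):

    ```cpp
    #include <cassert>
    #include <cmath>

    // Assumes the test sketch toggles pin 13 once per frame, so one full
    // square-wave period (high then low) spans two frames.
    constexpr double fpsFromScopeHz(double scopeHz) {
      return 2.0 * scopeHz;
    }

    // Time per frame in microseconds for a given scope frequency.
    constexpr double frameTimeUsFromScopeHz(double scopeHz) {
      return 1.0e6 / fpsFromScopeHz(scopeHz);
    }

    // 1 px/strip row of the 3.6 table: 375 Hz on the scope -> 750 fps.
    static_assert(fpsFromScopeHz(375.0) == 750.0, "two frames per period");
    ```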

    Interestingly, the pixel limit showed up differently on the two boards. On the 3.2, the build failed with a RAM error at exactly 961 pixels per strip and above. On the 3.6 I could get to 1365; beyond that the sketch still compiled without errors, but the Teensy seemed to lock up somewhere in the first iteration of the main loop (pin 13 stayed permanently high rather than toggling).
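    The 961-pixel cutoff on the 3.2 is consistent with simple arithmetic on the two buffers declared in the test sketch (each `ledsPerStrip*8` ints, 4 bytes per int on the Teensy's 32-bit ARM core) against the 3.2's 64 KB of RAM. A rough C++ sketch of that calculation; it ignores stack, globals and library RAM, so it is an estimate rather than the linker's exact check:

    ```cpp
    #include <cassert>
    #include <cstddef>

    // Mirrors the test sketch's declarations:
    //   DMAMEM int displayMemory[ledsPerStrip*8];
    //   int drawingMemory[ledsPerStrip*8];
    constexpr std::size_t kIntsPerLedIndex = 8;  // as declared in the sketch
    constexpr std::size_t kBytesPerInt = 4;      // int is 32-bit on Teensy 3.x
    constexpr std::size_t kBufferCount = 2;      // display + drawing

    constexpr std::size_t bufferBytes(std::size_t ledsPerStrip) {
      return kBufferCount * ledsPerStrip * kIntsPerLedIndex * kBytesPerInt;
    }

    constexpr std::size_t kTeensy32RamBytes = 64 * 1024;   // 64 KB
    constexpr std::size_t kTeensy36RamBytes = 256 * 1024;  // 256 KB

    // Buffers alone at the first failing size on the 3.2, and at 1365 on the 3.6.
    static_assert(bufferBytes(961) == 61504, "961 px/strip");
    static_assert(bufferBytes(1365) == 87360, "1365 px/strip");
    ```

    At 961 pixels per strip the buffers alone take 61,504 of 65,536 bytes, leaving about 4 KB for everything else. The 3.6's 256 KB is nowhere near full at 1365 pixels (87,360 bytes), which fits with a different limit being hit there.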

    I'd say the results are indicative of the realistic maximum frame rate for a given pixel count, as the sketch simply sets every pixel in the array once per loop, and only calls .show() once per loop. I've also tried testing without updating the pixel array every loop (i.e. removing the 'for' loop), and got the same framerate (I only tested this on the 3.6 at 1365 pixels). I'm not too sure if the time spent inverting pin 13 every frame affected the framerate at all, or in any meaningful way. Maybe someone more knowledgeable here can enlighten us on whether this might have been the case, or even test for it specifically.

    Something interesting I found is that the framerate seems to be limited by something other than the clock speed, as the Teensy 3.2 @ 96 MHz gave the exact same results as the Teensy 3.6 @ 192 MHz.

    My raw and graphed results are here: https://docs.google.com/spreadsheets...TskYcn1hqnSng/
    and for simplicity, here is the table for the 3.6's results:

    MHz  Strips  Pixels/strip  Scope freq (Hz) = FPS/2  FPS
    192  8       1             375.0                    750.0
    192  8       50            178.0                    356.0
    192  8       100           116.0                    232.0
    192  8       150           86.0                     172.0
    192  8       200           68.5                     137.0
    192  8       250           56.7                     113.4
    192  8       300           48.5                     97.0
    192  8       350           42.3                     84.6
    192  8       400           37.5                     75.0
    192  8       450           33.7                     67.4
    192  8       500           30.6                     61.2
    192  8       550           28.0                     56.0
    192  8       600           25.8                     51.6
    192  8       650           24.0                     48.0
    192  8       700           22.4                     44.8
    192  8       750           21.0                     42.0
    192  8       800           19.7                     39.4
    192  8       850           18.6                     37.2
    192  8       900           17.6                     35.2
    192  8       950           16.7                     33.4
    192  8       1000          16.0                     32.0
    192  8       1050          15.2                     30.4
    192  8       1100          14.6                     29.2
    192  8       1150          13.9                     27.8
    192  8       1200          13.4                     26.8
    192  8       1250          12.8                     25.6
    192  8       1300          12.4                     24.8
    192  8       1350          11.9                     23.8
    192  8       1365          11.8                     23.6
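    Working the table backwards supports the idea that the LED data rate, not the CPU, sets the pace. Converting two rows to frame times, each extra pixel per strip costs close to 30 µs, which is the time needed to clock out 24 bits at the WS2812's 800 kbit/s. A small C++ sketch of that check, using hypothetical helper names:

    ```cpp
    #include <cassert>
    #include <cmath>

    // Frame time in microseconds from a frame rate taken from the table.
    constexpr double frameTimeUs(double fps) { return 1.0e6 / fps; }

    // Extra frame time per added pixel/strip, from two (pixels, fps) rows.
    constexpr double usPerPixel(double pxA, double fpsA, double pxB, double fpsB) {
      return (frameTimeUs(fpsB) - frameTimeUs(fpsA)) / (pxB - pxA);
    }

    // Time on the wire for one 24-bit pixel at 800 kbit/s: exactly 30 us.
    constexpr double kWireUsPerPixel = 24.0 * 1.0e6 / 800000.0;
    ```

    Between the 50 px and 1000 px rows this gives about 29.9 µs per pixel.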

    In case you'd like to replicate my test or do something similar, here's the test sketch:
    Code:
    /*  OctoWS_FrameRate_Test.ino by hijomojo
    
      Required Connections
      --------------------
        pin13: oscilloscope probe or other test device
        gnd: oscilloscope/test device ground
      
      Optional Connections, according to Paul's standard
      --------------------
        pin 2:  LED Strip #1    OctoWS2811 drives 8 LED Strips.
        pin 14: LED strip #2    All 8 are the same length.
        pin 7:  LED strip #3
        pin 8:  LED strip #4    A 100 ohm resistor should be used
        pin 6:  LED strip #5    between each Teensy pin and the
        pin 20: LED strip #6    wire to the LED strip, to minimize
        pin 21: LED strip #7    high frequency ringing & noise.
        pin 5:  LED strip #8
        pin 15 & 16 - Connect together, but do not use
        pin 4 - Do not use
        pin 3 - Do not use as PWM.  Normal use is ok.
    
    */
    
    #include <OctoWS2811.h>
    
    const int ledsPerStrip = 1365; // change this according to the number of pixels per strip you want to test the framerate of
    
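    // Buffer sizing as used for these tests: 8 ints per LED position. The
    // OctoWS2811 examples use ledsPerStrip*6 for RGB (8 strips x 24 bits = 6 ints),
    // so *8 uses a third more RAM than needed and lowers the max length on a 3.2.
    DMAMEM int displayMemory[ledsPerStrip*8];
    int drawingMemory[ledsPerStrip*8];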
    
    const int config = WS2811_GRB | WS2811_800kHz;
    
    OctoWS2811 leds(ledsPerStrip, displayMemory, drawingMemory, config);
    
    byte frameRate = 60;      // change this according to whatever framerate you'd like to cap at
    unsigned long frameDuration;
    unsigned long lastFrame;
    
    void setup() {
      pinMode(LED_BUILTIN, OUTPUT);
      frameDuration = 1000000/frameRate;
      leds.begin();
      leds.show();
    }
    
    void loop() {
      // uncomment these 2 lines if you wish to cap the framerate for any reason
      // (elapsed-time subtraction stays correct when micros() wraps around)
      // while (micros() - lastFrame < frameDuration) {}
      // lastFrame = micros();
    
      digitalWrite(LED_BUILTIN, !digitalRead(LED_BUILTIN));
      for (int i=0; i < leds.numPixels(); i++) {
        leds.setPixel(i, 0xFF0000);
      } 
      leds.show();
    }
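    One caution if you enable a frame cap: `micros()` returns a 32-bit count that wraps to zero about every 71.6 minutes, and a test written as `micros() < lastFrame + frameDuration` misbehaves when the sum wraps. Comparing elapsed time with unsigned subtraction, `micros() - lastFrame < frameDuration`, stays correct across the wrap. A host-side C++ sketch of the two comparisons (hypothetical helper names; `uint32_t` stands in for the Teensy's 32-bit `unsigned long`):

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Wrap-unsafe: lastFrame + frameDuration can overflow past 2^32 and become
    // a small number, so the wait ends immediately near the wrap point.
    bool stillWaitingNaive(uint32_t now, uint32_t lastFrame, uint32_t frameDuration) {
      return now < lastFrame + frameDuration;
    }

    // Wrap-safe: unsigned subtraction yields the elapsed time modulo 2^32.
    bool stillWaitingSafe(uint32_t now, uint32_t lastFrame, uint32_t frameDuration) {
      return now - lastFrame < frameDuration;
    }
    ```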

  2. #2
    Senior Member PaulStoffregen
    Join Date: Nov 2012
    Posts: 19,922
    Quote Originally Posted by hijomojo
    ... whereas on the 3.6 I could get to 1365 after which the Teensy seemed to just lock up
    Yes, 1365 LEDs per pin (10920 total) is a hard limit imposed by the DMA hardware. A 15 bit number is used to count the bits transmitted. 1365 LEDs with 24 bit color per LED is 32760 bits transmitted.
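    The arithmetic behind that limit, as a small C++ sketch (constant names are hypothetical; the 15-bit count and 24 bits per LED are from the reply above):

    ```cpp
    #include <cassert>

    constexpr unsigned kCountBits = 15;                    // width of the DMA bit counter
    constexpr unsigned kMaxBits = (1u << kCountBits) - 1;  // 32767 bits per transfer
    constexpr unsigned kBitsPerLed = 24;                   // 8 bits each for G, R, B
    constexpr unsigned kStrips = 8;                        // OctoWS2811 outputs

    constexpr unsigned kMaxLedsPerPin = kMaxBits / kBitsPerLed;   // 1365
    constexpr unsigned kMaxLedsTotal = kMaxLedsPerPin * kStrips;  // 10920

    static_assert(kMaxLedsPerPin * kBitsPerLed == 32760, "fits in 15 bits");
    static_assert((kMaxLedsPerPin + 1) * kBitsPerLed > kMaxBits, "1366 does not");
    ```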

    Quote Originally Posted by hijomojo
    Something interesting I found is that the framerate seems to be limited by something other than the clock speed, as the Teensy 3.2 @ 96 MHz gave the exact same results as the Teensy 3.6 @ 192 MHz.
    WS2812 LEDs always use 800 kbit/sec communication, plus a 50 or 300 µs end-of-frame idle time. Sending 8 strips in parallel gives an overall communication speed of 6.4 Mbit/sec. If you do not spend a large amount of CPU time rendering each frame, nor impose any other limit on your frame rate, then the raw communication speed to the LEDs will limit the overall speed.
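    Those figures give a quick upper bound on frame rate for a given strip length. A minimal C++ sketch, assuming 24-bit color and the longer 300 µs end-of-frame gap (the helper names are hypothetical); because the 8 strips are sent in parallel, only the per-strip length matters:

    ```cpp
    #include <cassert>
    #include <cmath>

    constexpr double kBitRate = 800000.0;  // WS2812 data rate per strip, bits/s
    constexpr double kBitsPerLed = 24.0;

    // Wire-limited frame time in microseconds: data for one strip plus the
    // end-of-frame idle gap. The 8 strips transmit simultaneously.
    constexpr double wireFrameTimeUs(double ledsPerStrip, double resetUs) {
      return ledsPerStrip * kBitsPerLed * 1.0e6 / kBitRate + resetUs;
    }

    // Upper bound on frame rate when nothing but the LED data rate limits it.
    constexpr double wireLimitedFps(double ledsPerStrip, double resetUs) {
      return 1.0e6 / wireFrameTimeUs(ledsPerStrip, resetUs);
    }
    ```

    For 1365 LEDs per strip this gives about 24.2 fps, just above the 23.6 fps measured in the first post; for a 90 LED strip, about 333 fps. At short strip lengths the measured rates sit well below this bound (750 fps measured at 1 px versus about 3000 fps here), so other per-frame costs in that test are not captured by this estimate.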

    If the frame data comes from media or from a PC over a communication link, Teensy 3.2 usually performs the same as Teensy 3.6, because the communication sets the speed. Teensy 3.2 is (usually) plenty fast enough to receive data and transmit it to the LEDs.

    But if you do rendering of the frames on Teensy, the additional performance of Teensy 3.6 can really help if you have a large number of LEDs and fairly complex animation.

  3. #3
    Junior Member
    Join Date: Mar 2019
    Posts: 3
    Quote Originally Posted by PaulStoffregen
    Yes, 1365 LEDs per pin (10920 total) is a hard limit imposed by the DMA hardware
    Ahh, had a feeling it was that but couldn't find the info about the DMA limit.

    Quote Originally Posted by PaulStoffregen
    ...the raw communication speed to the LEDs will limit the overall speed.
    ...if you do rendering of the frames on Teensy, the additional performance of Teensy 3.6 can really help if you have a large number of LEDs and fairly complex animation.
    I know clock speed is not the be-all and end-all of performance, but are the chips on the Teensy 3.2 and 3.6 similar enough in IPC, or performance per cycle otherwise, that the additional clock speed of the 3.6 is effectively an indication of the 'spare' performance you could use for said animation complexity? For example, if I ran the 1px/strip test on the 3.2 and kept reducing the clock speed until the FPS dropped, would the lowest clock speed that still gave full FPS, subtracted from the board's maximum clock speed, indicate this amount of 'spare' performance?

  4. #4
    Senior Member PaulStoffregen
    Join Date: Nov 2012
    Posts: 19,922
    Quote Originally Posted by hijomojo
    are the chips on the Teensy 3.2 and 3.6 similar enough in IPC, or performance per cycle otherwise, that the additional clock speed of the 3.6 is effectively an indication of the 'spare' performance you could use for said animation complexity?
    Depends on your code.

    If you use only integers, performance scales approximately with clock speed.

    If you use 32 bit float, Teensy 3.6 is much faster than Teensy 3.2, because it has a 32 bit FPU.
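    Following that, one rough way to express the 'spare' performance for integer-only rendering is the number of CPU cycles that pass during one wire-limited frame. This is a hypothetical sketch: it assumes, as OctoWS2811's separate drawing and display buffers suggest, that rendering the next frame can overlap the transfer of the current one, and that integer work scales with clock as described above. It says nothing about float-heavy code, where the 3.6's FPU changes the picture.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // CPU cycles elapsing during one frame of the given length (microseconds).
    // Hypothetical helper; integer math avoids rounding issues.
    constexpr uint64_t cyclesPerFrame(uint64_t cpuHz, uint64_t frameTimeUs) {
      return cpuHz * frameTimeUs / 1000000u;
    }

    // Example: a 90 LED strip takes 90 * 30 us + 300 us reset = 3000 us per frame.
    constexpr uint64_t kFrameUs = 90u * 30u + 300u;
    ```

    At 96 MHz that is 288,000 cycles per 3 ms frame versus 576,000 at 192 MHz; how much of that a given animation actually needs would still have to be measured, for example with the clock-reduction test suggested above.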

  5. #5
    Junior Member
    Join Date: Mar 2019
    Posts: 3
    Ahh, wonderful
    So far I've never had a need for using float in striplight animations, so that's good news. Hopefully I'll have some time to test the scaling out.

    Much appreciated, Paul!
