Teensy 4 IntervalTimer Max Speed

Status
Not open for further replies.

Sanworks

Member
Hi Paul and others,

When toggling an output pin, the IntervalTimer class seems to be limited to 1.6 microsecond intervals on Teensy 4.
On T3.6, I was able to go as low as 0.7 microseconds - so I imagine that T4 hardware is capable of much faster updates.

Here's my code (shown with a 1 microsecond interval, which fails on T4):

Code:
#define OUT_PIN 17
IntervalTimer myTimer;
byte outPinState = 0;

void setup() {
  pinMode(OUT_PIN, OUTPUT);
  myTimer.begin(timerCallback, 1);
}

void timerCallback() {
  outPinState = 1-outPinState;
  digitalWriteFast(OUT_PIN, outPinState);
}

void loop() {
  
}

I tried using both digitalWrite() and digitalWriteFast() to toggle the pin, same result.

Thanks!
 
See here for links to some tests from early this year. https://forum.pjrc.com/threads/5795...for-tick-timer?p=218573&viewfull=1#post218573

I reactivated one of the test sketches from then to see if the T4 PIT (IntervalTimer) is more efficient than the one on the beta board was. For testing, I count the processor cycles a loop takes with and without the PITs running in the background. Comparing the two counts gives some information about the load the PITs generate. The test is done for 4 timers at various timer periods.

Here is the result of the program pasted below (f: interrupt frequency; Load: percentage derived from the two counts as 100 * (1 - without / with); the numbers in parentheses are the respective cycle counts):
Code:
f:100.0 kHz Load:  41.6  (w/o interrupts: 6500010 with interrupts 11125203)
f: 50.0 kHz Load:  17.3  (w/o interrupts: 6500010 with interrupts 7858722)
f: 25.0 kHz Load:   8.5  (w/o interrupts: 6500010 with interrupts 7105767)
f: 12.5 kHz Load:   4.3  (w/o interrupts: 6500010 with interrupts 6789252)
f:  6.2 kHz Load:   2.2  (w/o interrupts: 6500010 with interrupts 6644030)
f:  3.1 kHz Load:   1.1  (w/o interrupts: 6500010 with interrupts 6571456)
f:  1.6 kHz Load:   0.6  (w/o interrupts: 6500010 with interrupts 6536124)

This seems to fit nicely with your observation.



Code:
#include "Arduino.h"

IntervalTimer t1, t2, t3, t4;

void test() // dummy callback: short pulse on pin 0
{
   digitalWriteFast(0, HIGH);
   digitalWriteFast(0, LOW);
}

void test2() // dummy callback: short pulse on pin 1
{
   digitalWriteFast(1, HIGH);
   digitalWriteFast(1, LOW);
}

void test3() // dummy callback: short pulse on pin 2
{
   digitalWriteFast(2, HIGH);
   digitalWriteFast(2, LOW);
}

void test4() // dummy callback: short pulse on pin 3
{
   digitalWriteFast(3, HIGH);
   digitalWriteFast(3, LOW);
}

volatile int dummy;
constexpr unsigned loops = 1000000;


// count processor cycles needed for a loop
unsigned speedTest(unsigned loops)
{
   uint32_t start = ARM_DWT_CYCCNT;
   for (unsigned i = 0; i < loops; i++)
   {
      dummy++;
   }
   uint32_t end = ARM_DWT_CYCCNT;

   return end - start;
}

void setup()
{
   while(!Serial);
   pinMode(LED_BUILTIN, OUTPUT);
   pinMode(0, OUTPUT);
   pinMode(1, OUTPUT);
   pinMode(2, OUTPUT);
   pinMode(3, OUTPUT);
   
   // required for T3.6
   ARM_DEMCR |= ARM_DEMCR_TRCENA;
   ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;

   // Measure cycles required for loop without any interrupts
   noInterrupts();
   uint32_t withoutInts = speedTest(loops);

   for (int T = 10; T < 1001; T *= 2)
   {
      // activate interrupts and IntervalTimer
      interrupts();

      t1.begin(test, T);
      t2.begin(test2, T);
      t3.begin(test3, T);
      t4.begin(test4, T);

      uint32_t withInts = speedTest(loops);

      float load = 100.0f * (1.0f - (float)withoutInts / (float)withInts);
      Serial.printf("f:%5.1f kHz Load: %5.1f", 1000.0f / T, load);
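      // uint32_t is unsigned long on Teensy, so print with %lu rather than %d
      Serial.printf("  (w/o interrupts: %lu with interrupts %lu)\n", (unsigned long)withoutInts, (unsigned long)withInts);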
      Serial.flush();
   }
}


void loop()
{
   digitalWriteFast(LED_BUILTIN, !digitalReadFast(LED_BUILTIN));
   delay(500);
}
 
@luni - results on T4 at 600 MHz seem to match the above.

Here is that code at 960 MHz:
F_CPU=960000000 deg C=62
f:100.0 kHz Load: 31.4 (w/o interrupts: 6500008 with interrupts 9476890)
f: 50.0 kHz Load: 13.1 (w/o interrupts: 6500008 with interrupts 7483873)
f: 25.0 kHz Load: 6.6 (w/o interrupts: 6500008 with interrupts 6957920)
f: 12.5 kHz Load: 3.3 (w/o interrupts: 6500008 with interrupts 6720270)
f: 6.2 kHz Load: 1.7 (w/o interrupts: 6500008 with interrupts 6609267)
f: 3.1 kHz Load: 0.8 (w/o interrupts: 6500008 with interrupts 6553674)
f: 1.6 kHz Load: 0.4 (w/o interrupts: 6500008 with interrupts 6525878)

F_CPU=960000000 deg C=61

Tried at 1008 MHz - it ran a bit faster once; on the second run the temperature was up and it DIED.

Then tried at 396 MHz - oddly not much diff from 600 MHz:
F_CPU=396000000 deg C=49
f:100.0 kHz Load: 42.4 (w/o interrupts: 6500008 with interrupts 11283883)
f: 50.0 kHz Load: 17.6 (w/o interrupts: 6500008 with interrupts 7885296)
f: 25.0 kHz Load: 8.8 (w/o interrupts: 6500008 with interrupts 7127043)
f: 12.5 kHz Load: 4.4 (w/o interrupts: 6500008 with interrupts 6799712)
f: 6.2 kHz Load: 2.2 (w/o interrupts: 6500008 with interrupts 6646280)
f: 3.1 kHz Load: 1.1 (w/o interrupts: 6500008 with interrupts 6571840)
f: 1.6 kHz Load: 0.6 (w/o interrupts: 6500008 with interrupts 6535994)

F_CPU=396000000 deg C=49
 
Thanks for the measurements, defragster. Unfortunately it does not look very good. ~40% load for having 4 IntervalTimers toggling pins at 100 kHz does not really fit the otherwise very fast processor. Maybe there is room for optimization somewhere in the guts of the processor?
 
Thanks for the measurements, defragster. Unfortunately it does not look very good. ~40% load for having 4 IntervalTimers toggling pins at 100 kHz does not really fit the otherwise very fast processor. Maybe there is room for optimization somewhere in the guts of the processor?

It seems I just have to run the posted code :) When I saw the OP's code I wondered how much free time was left in loop(), and it looked like you had already calculated that so I wouldn't have to.

Though it is ODD that there is not much difference between the F_CPU=396000000 results and the ones in post #2, which I assume are at F_CPU=600000000?

F_CPU=396000000
f:100.0 kHz Load: 42.4 (w/o interrupts: 6500008 with interrupts 11283883)

F_CPU=600000000
f:100.0 kHz Load: 41.6 (w/o interrupts: 6500010 with interrupts 11125203)
Not sure if that shows a bottleneck, or some hole in the LOAD test method?
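
One way to probe that question is to convert the extra cycles into a cost per timer event. The following is a host-side C++ sketch, not Teensy code. The name `costPerTimerEvent` is hypothetical, and the calculation assumes that each of the four timers really fires at the full 100 kHz with no missed events, and that the run in post #2 was at 600 MHz (which is only assumed here).

```cpp
#include <cstdint>

struct EventCost {
    double cycles;        // extra CPU cycles attributed to one timer event
    double microseconds;  // the same cost expressed as time
};

// Hypothetical helper: spreads the extra cycles measured with interrupts
// enabled over the number of timer events that occurred during the run.
EventCost costPerTimerEvent(std::uint32_t withoutInts, std::uint32_t withInts,
                            double fCpuHz, double timerHz, int timerCount)
{
    // Wall-clock duration of the measured loop with interrupts enabled.
    const double elapsedSeconds = static_cast<double>(withInts) / fCpuHz;
    // Assumed number of timer events in that window (no missed events).
    const double events = elapsedSeconds * timerHz * timerCount;
    const double extraCycles =
        static_cast<double>(withInts) - static_cast<double>(withoutInts);
    const double cycles = extraCycles / events;
    return {cycles, cycles / fCpuHz * 1e6};
}
```

Under those assumptions the 100 kHz rows work out to roughly 620 cycles per event at 600 MHz and roughly 420 cycles at 396 MHz, which is about 1 µs per event in both cases. That would suggest a cost that tracks wall-clock time rather than CPU cycles, although it does not rule out a flaw in the test method.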
 
Though it is ODD that there is not much difference between the F_CPU=396000000 results and the ones in post #2, which I assume are at F_CPU=600000000?

I think the root cause is that reading/writing the peripheral registers (which are not tightly coupled) takes rather long, and that this time depends primarily on the speed of the peripheral bus rather than on F_CPU.
This test from FrankB https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test?p=196563&viewfull=1#post196563 gave 101 cycles just for setting one TFLG. Since all PITs share one interrupt, you need to read/write all 4 TFLGs in the ISR. Estimate: per PIT interrupt you get some 400 cycles -> t_min = 400 cycles / 600 MHz ≈ 0.67 µs -> f_max ≈ 1.5 MHz, which is the same order of magnitude as what Sanworks reports in #1.

(see also https://forum.pjrc.com/threads/54711...l=1#post195467 which shows a max frequency again with the same order of magnitude (beta version))

Not sure if that shows a bottleneck, or some hole in the LOAD test method?
Hopefully, but I'm somewhat afraid that if a loop takes 6500008 cycles with the PITs disabled and the same loop needs 11125203 cycles with the PITs enabled, something is eating up a lot of cycles in the background...
 