Teensy 4.0 Timers and Interrupts using registers

NuttyMonk

Hi all,

i'm wanting to do 48-bit timing on a Teensy 4.0. The only way i can do that is to use the QUADTimers and cascade them. After testing the QUADTimers i was able to determine that they run at 150MHz and based on the largest value i will need to use i have decided to run the timer at a pre-scaler of 1 with 3 timer channels cascaded. See code below.

I've based my code on this post...
https://forum.pjrc.com/threads/61462-Teensy4-QuadTimer-as-64-bit-counter?highlight=cascade+timer

I've gotten the QTimer and its ISR to run ok and i can get it to run for the lengths of time i need. The counter values being printed to the serial port all look good. The problem i am having is trying to figure out how to get a timer to activate an ISR at the right time. As you can see in the code below, i have a function named calcBinary(). This function is my way of trying to get the binary values for each timer to match a value of 9,000,000,000 clock ticks. This is just a test value within my expected range. The printout of the values for the 3 timers works fine (16-bit binary) and i get an accurate result that matches my own calculations. These binary values would be used for compare matching. The problem is that all 3 timer channels would then be compare matching and activating their interrupts when they get a match. I just want 1 of the 3 channels to activate an ISR.

So how do i go about doing that? I don't want to have any code in the loop() function as that would negate the point of running hardware timers in the first place. I want accuracy in the low nanoseconds. With a 150MHz timer clock i can get 6.67r nanoseconds per timer clock tick which is great. That will produce nice, stable and accurate square waves.

Does anyone (maybe manitou?) have a suggestion for me to figure out when a certain number of timer clock ticks have passed?

Any help is appreciated.

Also, what does the code below do?

Code:
IMXRT_TMR_t * TMR4 = (IMXRT_TMR_t *)&IMXRT_TMR4;

#define IRQ_QTIMERx IRQ_QTIMER4



Code:
#include <SPI.h>
#include <elapsedMillis.h>

IMXRT_TMR_t * TMR4 = (IMXRT_TMR_t *)&IMXRT_TMR4;

#define IRQ_QTIMERx IRQ_QTIMER4

elapsedMillis freerunTimer;
uint32_t timed = 0;
volatile uint16_t countTimeMillis = 0;  // written in my_isr(), read in loop(), so it must be volatile
uint16_t prevCountTimeMillis = 0;

void my_isr() {
  TMR4->CH[1].CSCTRL &= ~(TMR_CSCTRL_TCF1);  // clear
  countTimeMillis = freerunTimer;
  Serial.println("my_isr() CALLED !!!");
}

void setup() {
  Serial.begin(115200);
  Serial.println("=================");
  Serial.println(" Teensy 4 Timers ");
  Serial.println("=================");

  // set up QuadTimer4 
  CCM_CCGR6 |= CCM_CCGR6_QTIMER4(CCM_CCGR_ON);           // enable QTMR4 clock
  TMR4->CH[0].CTRL = 0;                   // stop
  TMR4->CH[1].CTRL = 0;                   // stop
  TMR4->CH[2].CTRL = 0;                   // stop
  TMR4->CH[0].CNTR = 0;                   // set count to 0
  TMR4->CH[1].CNTR = 0;                   // set count to 0
  TMR4->CH[2].CNTR = 0;                   // set count to 0
  TMR4->CH[0].COMP1 =  0xffff;            // send count signal to next counter on overflow at 0xffff
  TMR4->CH[1].COMP1 =  0xffff;            // send count signal to next counter on overflow at 0xffff
  TMR4->CH[2].COMP1 =  0xffff;            // send count signal to next counter on overflow at 0xffff
  TMR4->CH[0].CMPLD1 =  0xffff;
  TMR4->CH[1].CMPLD1 =  0xffff;
  TMR4->CH[2].CMPLD1 =  0xffff;
  TMR4->CH[2].CTRL  = TMR_CTRL_CM (7);    // Count Mode:           Cascaded counter mode
  TMR4->CH[2].CTRL |= TMR_CTRL_PCS(5);    // Primary Count Source: CH[1] output
  TMR4->CH[1].CTRL  = TMR_CTRL_CM (7);    // Count Mode:           Cascaded counter mode
  TMR4->CH[1].CTRL |= TMR_CTRL_PCS(4);    // Primary Count Source: CH[0] output
  TMR4->CH[0].CTRL  = TMR_CTRL_CM (1);    // Count Mode:           Count rising edges of primary source
  TMR4->CH[0].CTRL |= TMR_CTRL_PCS(8);    // Primary Count Source: IP bus clock divide by 1 prescaler

  attachInterruptVector(IRQ_QTIMERx, my_isr);
  TMR4->CH[1].CSCTRL &= ~(TMR_CSCTRL_TCF1);  // clear
  TMR4->CH[1].CSCTRL |= TMR_CSCTRL_TCF1EN;  // enable interrupt
  NVIC_ENABLE_IRQ(IRQ_QTIMERx);

  calcBinary();

  freerunTimer = 0;
}

void calcBinary() {
  uint64_t thisNumber = 9000000000;
  uint16_t timer1Number = thisNumber;
  uint16_t timer2Number = thisNumber >> 16;
  uint16_t timer3Number = thisNumber >> 32;

  Serial.print("timer1Number = ");
  Serial.print(timer1Number);
  Serial.print(", ");
  Serial.println(timer1Number, BIN);
  
  Serial.print("timer2Number = ");
  Serial.print(timer2Number);
  Serial.print(", ");
  Serial.println(timer2Number, BIN);
  
  Serial.print("timer3Number = ");
  Serial.print(timer3Number);
  Serial.print(", ");
  Serial.println(timer3Number, BIN);
}

void loop() {
  if (countTimeMillis != prevCountTimeMillis) {
    Serial.println("TMR4->CH[1] OVERFLOW ");
    Serial.print("countTimeMillis = ");
    Serial.println(countTimeMillis);
    prevCountTimeMillis = countTimeMillis;
  }
  
  Serial.println(TMR4->CH[0].CNTR);
  Serial.println(TMR4->CH[1].CNTR);
  Serial.println(TMR4->CH[2].CNTR);
  Serial.println();
  
  delay(200);
}

Cheers

NM
 
Do you mean that you want to get an interrupt when the timers each have a specific value, such as T0=X, T1=Y, T2=Z? I don't think there is a way to do that.

Can you say more about what you're trying to do? If you are counting bus clocks, and you want to generate an interrupt after some value of clocks that requires 48 bits, then you know the time, so what are you trying to measure? You could just divide the clock by 4 to get an interrupt from a single 16-bit QuadTimer.
 
Do you mean that you want to get an interrupt when the timers each have a specific value, such as T0=X, T1=Y, T2=Z? I don't think there is a way to do that.

I was just getting the timers and interrupts working and then i realised i don't know how to decide when the 3 cascaded timer channels have reached the correct count.

Can you say more about what you're trying to do? If you are counting bus clocks, and you want to generate an interrupt after some value of clocks that requires 48 bits, then you know the time, so what are you trying to measure? You could just divide the clock by 4 to get an interrupt from a single 16-bit QuadTimer.

I am currently using elapsedMicros to do the timing in my project but i am getting approx 40 uS of latency due to constantly having to check the elapsedMicros value in the loop() function (i have 4 clocks running simultaneously plus a couple of other timing things going on, all in the loop() function). I figured that if i could use a hardware timer with an interrupt i could get the accuracy down to the low nanoseconds instead of just microseconds and have little to no latency. Unfortunately the hardware timers are only 16-bit so i tried cascading them and that all works fine now, i can count up to 48-bits. I can't figure out how to get the code to decide if a count has been reached though.

The maximum value i am looking for is 60 seconds which at 150MHz is 9,000,000,000 clock ticks, although if it could be higher that would be great. Even if i divide the clock by 128, the maximum time i can get from a 16-bit timer is 0.055924 seconds and each tick is 853.3 nanoseconds which isn't too great.
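The figures above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 150 MHz QuadTimer clock measured earlier in the thread, and the names kBusHz, tickNs, maxPeriodSec16 and ticksFor are made up for illustration, they are not part of any Teensy API.

```cpp
#include <cstdint>

// Assumption taken from the thread: the QuadTimers are clocked at 150 MHz.
constexpr double kBusHz = 150000000.0;

// Length of one timer tick in nanoseconds for a given prescaler (1..128).
constexpr double tickNs(unsigned prescaler) {
  return 1e9 * prescaler / kBusHz;
}

// Longest period one 16-bit channel can count (65536 ticks), in seconds.
constexpr double maxPeriodSec16(unsigned prescaler) {
  return 65536.0 * prescaler / kBusHz;
}

// Number of ticks needed for a period at prescaler 1.
constexpr uint64_t ticksFor(double seconds) {
  return static_cast<uint64_t>(seconds * kBusHz);
}
```

With these, ticksFor(60.0) gives 9,000,000,000, which needs more than 32 bits, while maxPeriodSec16(128) is roughly 0.0559 s with an 853.3 ns tick, so a single 16-bit channel cannot cover the range at any prescaler.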

I have an idea that i have 2 global variables, one for the timer channel which represents the most significant bits and another for the timer channel that represents the middle significant bits. When the timer channel with the most significant bits reaches its value i set its variable from 0 to 1. Then when the timer channel with the middle significant bits hits its value i set its variable from 0 to 1. I do this in their interrupt routines. Then in the interrupt routine for the timer channel with the least significant bits i can check if the other two timers variables are set to 1, and if they are i carry out the rest of the code in the interrupt routine. I'll maybe try that tomorrow.

I just wish one of the libraries out there was able to do all of this for me but no luck.

Cheers

NM
 
@NuttyMonk: I have absolutely no experience with programming the specific timer + interrupt hardware in the Teensy, so I'm hoping that maybe my ignorance could work in our favor here. With that restriction in mind, would this approach help you to get to where you'd like to be ??

Assume T1 is the fastest ticking timer, T2 is fed by the overflow of T1, & T3 is fed by the overflow of T2. Initial conditions would be set such that all three timers start with an initial count of 0.

If I did my math correctly, your target count of 9,000,000,000 would be achieved when T3 is at a terminal count of 2, T2 is at a terminal count of 6257, and T1 is at a terminal count of 6656 as follows:

(2 * 65536 * 65536) + (6257 * 65536) + 6656 = 9,000,000,000

So, for your sketch to be able to indicate when you have reached the overall terminal count of 9,000,000,000, you would start by enabling the interrupt from T3 to fire at a count of 2 (the other two timers are simply counting & overflowing - no interrupts set for them yet). When the interrupt from T3 fires off (as it rolls over from a count of 1 to a count of 2), in that interrupt handler, you would then enable the interrupt from T2 to fire at a count of 6257 (T1 continues counting & overflowing, but still no interrupt set for T1 yet). When the interrupt from T2 fires off (as it rolls over from a count of 6256 to a count of 6257, with T3 still at a count of 2), you would then enable the interrupt from T1 to fire at a count of 6656. When that interrupt from T1 fires off (as it rolls over from a count of 6655 to a count of 6656, with T3 still at a count of 2, and T2 still at a count of 6257), you have reached your desired overall terminal count of 9,000,000,000.

Does this make sense ?? For a different desired overall terminal count, you would (re)calculate the new terminal count for each timer in a similar fashion & set the interrupts to fire at the appropriate calculated terminal counts.
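The terminal counts above can be computed for any target with a shift and mask per channel. A minimal sketch follows, assuming T1 is the least significant channel and T3 the most significant; Split48, split48 and join48 are hypothetical helper names, not anything from the Teensy core.

```cpp
#include <cstdint>

// Terminal count for each of the three cascaded 16-bit channels.
struct Split48 {
  uint16_t t1;  // least significant 16 bits (fastest channel)
  uint16_t t2;  // middle 16 bits
  uint16_t t3;  // most significant 16 bits (slowest channel)
};

// Split a tick count of up to 48 bits into three 16-bit terminal counts.
constexpr Split48 split48(uint64_t ticks) {
  return Split48{static_cast<uint16_t>(ticks & 0xFFFF),
                 static_cast<uint16_t>((ticks >> 16) & 0xFFFF),
                 static_cast<uint16_t>((ticks >> 32) & 0xFFFF)};
}

// Recombine the three counts, useful as a sanity check.
constexpr uint64_t join48(Split48 s) {
  return (static_cast<uint64_t>(s.t3) << 32) |
         (static_cast<uint64_t>(s.t2) << 16) |
         static_cast<uint64_t>(s.t1);
}
```

For 9,000,000,000 this gives t3 = 2, t2 = 6257 and t1 = 6656, matching the hand calculation. The staged scheme would probably also need special handling when one of the terminal counts is 0, since that channel is already at its target when the previous stage fires.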

Mark J Culross
KD5RXT
 
Have you tried IntervalTimer, as shown below? It supports a maximum interval of about 178 seconds, so it seems to work fine at 60 seconds. To ensure minimum delay I've shown how to set to highest priority, but you have to be careful doing this. It seems odd to need nanosecond latency for something that occurs every 60 seconds.

Code:
#include <IntervalTimer.h>

IntervalTimer timer;
volatile uint32_t timer_count;

void timer_ISR( void ) {
  timer_count++;
  Serial.println( timer_count );
}

void setup() {
  Serial.begin( 9600 );                
  while (!Serial) {} 
  Serial.println( "start" ); 
  timer.begin( timer_ISR, 60'000'000 );
  timer.priority( 0 ); // set to highest priority (optional? necessary?)
}

void loop() {
}
 
Have you tried IntervalTimer, as shown below? It supports a maximum interval of about 178 seconds, so it seems to work fine at 60 seconds. To ensure minimum delay I've shown how to set to highest priority, but you have to be careful doing this. It seems odd to need nanosecond latency for something that occurs every 60 seconds.

The minimum time interval is about 3.7 milliseconds, the maximum is about 60 seconds, at least for the moment. I also want the clock pulses coming out of the project to be as jitter-free and stable as possible and preferably all based on hardware, not software.

kd5rxt-mark said:
Does this make sense ?? For a different desired overall terminal count, you would (re)calculate the new terminal count for each timer in a similar fashion & set the interrupts to fire at the appropriate calculated terminal counts.

This is pretty much what i described in my previous post, although i didn't do as good a job of it. :)
 
The minimum time interval is about 3.7 milliseconds, the maximum is about 60 seconds, at least for the moment. I also want the clock pulses coming out of the project to be as jitter-free and stable as possible and preferably all based on hardware, not software.

This is pretty much what i described in my previous post, although i didn't do as good a job of it. :)

I'm having a hard time imagining what your objective is. Can you say more about what you're trying to do, and why you need nanosecond precision? You may be able to get the 3 x 16-bit timer method to work with the method outlined by kd5rxt-mark, but that seems tricky because the time between interrupt #2 and interrupt #3 will vary with the 48-bit value. It's probably better to use one of the 32-bit timers (GPT?) and divide the clock slightly so you can get the desired period with a single output compare.
 
I'm having a hard time imagining what your objective is. Can you say more about what you're trying to do, and why you need nanosecond precision? You may be able to get the 3 x 16-bit timer method to work with the method outlined by kd5rxt-mark, but that seems tricky because the time between interrupt #2 and interrupt #3 will vary with the 48-bit value. It's probably better to use one of the 32-bit timers (GPT?) and divide the clock slightly so you can get the desired period with a single output compare.

I am running 4 clock outputs from the Teensy 4.0. They are for an audio project as a clock source for sequencing purposes. That means i need 4 independent timers. The PIT timers would be good to use as they are 32-bit but they share a common ISR. This makes them unusable as far as i can tell unless i just use the ISR to compare values and call another function which is usually to be avoided. The only other timers on the Teensy 4.0 (that there are 4 of) are the QTimers.

The maximum BPM of the 4 clocks in my project is 999.9 and the minimum is 1. Each beat can also be sub-divided into 16 individual steps which also produce clock pulses. Therefore the minimum step size is 3.75 milliseconds and the maximum is 60,000 milliseconds (although i would like the option to extend that but it isn't essential), which is quite a big range. If a clock is running at maximum bpm (999.9) with 16 steps of sub-division, accuracy becomes much more important. I would like to get 1 microsecond precision or better, going into the nanoseconds is just a bonus. Having the clock pulses triggered by hardware timers and their ISRs is also much better as latency is no longer an issue. I am currently getting 40 uS of latency using elapsedMicros due to the overheads in the loop() function and the latency can get worse than that.
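The step sizes quoted above follow from 60 seconds divided by (BPM x steps per beat). A small sketch of that arithmetic, where stepPeriodUs and stepPeriodTicks are illustrative names and the 150 MHz tick rate in the example is the QuadTimer clock assumed earlier in the thread:

```cpp
#include <cstdint>

// Period of one step in microseconds for a tempo in BPM and a number of
// steps per beat.
constexpr double stepPeriodUs(double bpm, unsigned stepsPerBeat) {
  return 60.0e6 / (bpm * stepsPerBeat);
}

// The same period expressed in timer ticks at tickHz, rounded to nearest.
constexpr uint64_t stepPeriodTicks(double bpm, unsigned stepsPerBeat,
                                   double tickHz) {
  return static_cast<uint64_t>(60.0 * tickHz / (bpm * stepsPerBeat) + 0.5);
}
```

stepPeriodUs(999.9, 16) comes out at about 3750.4 uS and stepPeriodUs(1.0, 1) at 60,000,000 uS, and stepPeriodTicks(1.0, 1, 150e6) is the 9,000,000,000 ticks discussed earlier.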

The reason i am using a Teensy 4.0 is purely because they are currently available. I was using a 3.2 but they won't be back in stock until late 2023 at the earliest. Realistically, a microcontroller dev-board with 4 x 64-bit clocks would be brilliant but i don't think there are any that exist in a realistic price range.

I might have to give up the idea of using timers and just rely on elapsedMicros instead. Also, the timing method should be adjustable on the fly as the clock pulse positions can change from one pulse to the next with larger or smaller gaps between pulses. This is all part of the sequencing functions of the project. I've tried intervalTimer but went over to elapsedMicros. Maybe i should try intervalTimer again and see how it goes on the Teensy 4.0.

Cheers

NM
 
I've just done a quick test with the intervalTimer library and it's stable as a rock. I've used the code below and i'm checking how long it takes for each step using elapsedMicros and it's showing me that each clock pulse comes in exactly 1uS late on each and every pulse. So 250,001 rather than the value of 250,000 that i've set it up to run at. In reality the values in the stepHalfTime[] array could be quite different for each step with a variance of +- 92%.

Nice. Didn't think it would be that stable but i was very wrong. Anyone know the max value of the intervalTimer library in microseconds? Is it a 16 or 32-bit value? What timers does the library use on the Teensy 4.0?

Code:
#include <SPI.h>
#include <IntervalTimer.h>
#include <elapsedMillis.h>

IntervalTimer myTimer;
elapsedMicros freerunTimer;

uint8_t clock1Pin = 2;
uint32_t stepHalfTime[33] = { 0, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000, 250000 }; // an array of step length values
volatile uint8_t clockPosition = 1;
volatile uint8_t clockState = 1;

void setup() {
  Serial.begin(115200);
  while (!Serial) {}
  Serial.println("============================");
  Serial.println("Teensy 4.0 Clock Timing Test");
  Serial.println("============================");

  pinMode(clock1Pin, OUTPUT);

  Serial.print("clockPosition = ");
  Serial.print(clockPosition);
  Serial.print(", clockState = ");
  Serial.println(clockState);

  myTimer.begin(timer_ISR, stepHalfTime[clockPosition]);
  myTimer.priority( 0 ); // set to highest priority
}

void timer_ISR( void ) {
  if (clockState) {
    clockState = 0;
  } else {
    clockState = 1;
    if (clockPosition + 1 > 32) {
      clockPosition = 1;
    } else {
      clockPosition++;
    }
  }

  uint32_t thisElapsedMicros = freerunTimer;
  freerunTimer = 0;
  
  myTimer.begin(timer_ISR, stepHalfTime[clockPosition]);
  digitalWrite(clock1Pin, clockState);
  
  Serial.print("thisElapsedMicros = ");
  Serial.println(thisElapsedMicros);
  Serial.print("clockPosition = ");
  Serial.print(clockPosition);
  Serial.print(", clockState = ");
  Serial.println(clockState);
}

void loop() {

}
 
I've just done a quick test with the intervalTimer library and it's stable as a rock. I've used the code below and i'm checking how long it takes for each step using elapsedMicros and it's showing me that each clock pulse comes in exactly 1uS late on each and every pulse. So 250,001 rather than the value of 250,000 that i've set it up to run at. In reality the values in the stepHalfTime[] array could be quite different for each step with a variance of +- 92%.

Nice. Didn't think it would be that stable but i was very wrong. Anyone know the max value of the intervalTimer library in microseconds? Is it a 16 or 32-bit value? What timers does the library use on the Teensy 4.0?

It's part of the Teensy4 core, not a library (see cores\Teensy4\IntervalTimer.cpp). It's a 32-bit timer, and it has a maximum interval of more than 178 sec.
 
OK. So it's a 24MHz clock. That gives it a tick time of 41.67 nanoseconds and the accuracy seems to be within +-1 microsecond. Very nice.
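For reference, the tick time and the 178 second limit both follow from a 32-bit counter at 24 MHz. A quick sketch of the arithmetic, with made-up helper names (pitTickNs, pitMaxIntervalSec, usToTicks) that are not part of IntervalTimer:

```cpp
#include <cstdint>

// Length of one tick in nanoseconds for a timer clocked at hz.
constexpr double pitTickNs(double hz) { return 1e9 / hz; }

// Longest interval a 32-bit counter can time at hz, in seconds.
constexpr double pitMaxIntervalSec(double hz) { return 4294967296.0 / hz; }

// Whole ticks in a period given in microseconds.
constexpr uint64_t usToTicks(uint64_t us, uint64_t hz) {
  return us * hz / 1000000ULL;
}
```

At 24 MHz this gives a 41.67 ns tick and a maximum of about 178.96 seconds, and a 250,000 uS interval is exactly 6,000,000 ticks.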

The last question is can i have 4 of them running at once?

Cheers

NM
 
Check the source. Think there might be only two. Is there any relationship between the 4 frequencies that would allow one timer to support more than one frequency?
 
IntervalTimer runs on the PIT timers (at least on teensy4 i believe), and on Teensy4 these share the interrupt as you pointed out.

I am running 4 clock outputs from the Teensy 4.0. They are for an audio project as a clock source for sequencing purposes. That means i need 4 independent timers. The PIT timers would be good to use as they are 32-bit but they share a common ISR. This makes them unusable as far as i can tell unless i just use the ISR to compare values and call another function which is usually to be avoided. The only other timers on the Teensy 4.0 (that there are 4 of) are the QTimers.

NM

Looking at the source, this is exactly what IntervalTimer is doing: it's calling one interrupt function, and from there it calls the registered function for each timer from a function table.

You can have more resolution by setting the PIT timer clock to 150 MHz, but then the maximum time will be shorter than 60 seconds. It also looks like IntervalTimer is set up to use 24 MHz.
 
I can't find anywhere in the elapsedMillis documentation page on PJRC that states what timers it uses.

Does anyone know?

Cheers

NM
 
It does not rely on any timers, but "class elapsedMillis" uses millis() for time reference:

...
elapsedMillis(void) { ms = millis(); }
...
 
elapsedMillis uses millis(), and millis() simply returns systick_millis_count, which is updated in the ISR for the ARM SysTick timer. The Teensy4 systick_isr() from EventResponder.cpp and millis() from core_pins.h are shown below. The SysTick timer generates interrupts at 1 kHz, and that running count is the basis for millis() and elapsedMillis. There's only so much detail you can ever find on the PJRC site/forum/wiki, so I use NotePad++ as a convenient and fast way to search the Teensy cores and libraries. I'm sure there are other similar tools.

Code:
extern "C" void systick_isr(void)
{
	systick_cycle_count = ARM_DWT_CYCCNT;
	systick_millis_count++;
}

Code:
static inline uint32_t millis(void)
{
	return systick_millis_count;
}

edit: once again, @defragster responds succinctly while I'm dithering over details!
 
It does not rely on any timers, but "class elapsedMillis" uses millis() for time reference:

...
elapsedMillis(void) { ms = millis(); }
...

So, how do millis() and micros() get their timestamps? Do they use timers?

I ask because i am using elapsedMicros to test the timing of intervalTimer and it's jumping about between 124,000, 125,000 and 126,000 microseconds. It only goes up and down in increments of 1,000 microseconds when it should be more accurate than that, counting in 1 microsecond increments. I was wondering if my use of intervalTimer is causing this because they use the same timer.

Cheers

NM
 
millis() provides 1 ms accuracy. If you want to test the accuracy of IntervalTimer, read the ARM cycle counter in your callback function, compute the difference between cycle count values, and then convert the difference to nanoseconds

ns = cycles * (1E9 / F_CPU)
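Each CPU cycle lasts 1E9/F_CPU nanoseconds, so the conversion is a multiplication by that period. Here is a minimal sketch of the arithmetic that compiles on any host. On the Teensy the two readings would come from ARM_DWT_CYCCNT in the callback and cpuHz would be F_CPU; the function names elapsedCycles and cyclesToNs are only illustrative.

```cpp
#include <cstdint>

// Cycles elapsed between two readings of a free-running 32-bit cycle
// counter. Unsigned subtraction gives the right answer even if the
// counter wrapped once between the readings.
constexpr uint32_t elapsedCycles(uint32_t start, uint32_t end) {
  return end - start;
}

// Convert a cycle count to nanoseconds. The multiply is done in 64 bits
// so it cannot overflow for any 32-bit cycle count.
constexpr uint64_t cyclesToNs(uint32_t cycles, uint32_t cpuHz) {
  return static_cast<uint64_t>(cycles) * 1000000000ULL / cpuHz;
}
```

At an assumed 600 MHz CPU clock, 150,000,000 cycles converts to 250,000,000 ns (250 ms). Note that a 32-bit cycle counter at 600 MHz wraps roughly every 7.2 seconds, so this only works for intervals shorter than that.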
 
elapsedMillis uses millis(), and millis() simply returns systick_millis_count, which is updated in the ISR for the ARM SysTick timer. The Teensy4 systick_isr() from EventResponder.cpp and millis() from core_pins.h are shown below. The SysTick timer generates interrupts at 1 kHz, and that running count is the basis for millis() and elapsedMillis. There's only so much detail you can ever find on the PJRC site/forum/wiki, so I use NotePad++ as a convenient and fast way to search the Teensy cores and libraries. I'm sure there are other similar tools.

Code:
extern "C" void systick_isr(void)
{
	systick_cycle_count = ARM_DWT_CYCCNT;
	systick_millis_count++;
}

Code:
static inline uint32_t millis(void)
{
	return systick_millis_count;
}

edit: once again, @defragster responds succinctly while I'm dithering over details!

lol. Yup, there is good support on this forum.

I was getting microsecond accuracy when i was using elapsedMicros before. I'm not sure if i've used it for timing testing since i moved to the Teensy 4.0 from the 3.2. Do the Teensy 3.2 and 4.0 have systicks that run at different speeds?

Cheers
 
millis() provides 1 ms accuracy. If you want to test the accuracy of IntervalTimer, read the ARM cycle counter in your callback function, compute the difference between cycle count values, and then convert the difference to nanoseconds

ns = cycles * (1E9 / F_CPU)

Ah, now that works well. Thanks joe.

Getting very accurate times now.
 
elapsedMillis uses millis(), and millis() simply returns systick_millis_count, which is updated in the ISR for the ARM SysTick timer. The Teensy4 systick_isr() from EventResponder.cpp and millis() from core_pins.h are shown below. The SysTick timer generates interrupts at 1 kHz, and that running count is the basis for millis() and elapsedMillis. There's only so much detail you can ever find on the PJRC site/forum/wiki, so I use NotePad++ as a convenient and fast way to search the Teensy cores and libraries. I'm sure there are other similar tools.
...
edit: once again, @defragster responds succinctly while I'm dithering over details!

Detail dithering catches me at times too - I did take a minute to find the 'quote from code' ...

lol. Yup, there is good support on this forum.

I was getting microsecond accuracy when i was using elapsedMicros before. I'm not sure if i've used it for timing testing since i moved to the Teensy 4.0 from the 3.2. Do the Teensy 3.2 and 4.0 have systicks that run at different speeds?

Cheers

millis() and micros() are used for each of the respective 'elapsed' types.

Indeed T_4.x uses a slower timer for the millis() sys_tick so the micros() uses ARM_DWT_CYCCNT to interpolate micros offset from the last updated millis() sys_tick, otherwise the millis clock was only giving micros to about nearest 10us {so I wrote that in T_4.0 beta}

So it should still result in microsecond accuracy and repeatability, in fact AFAIK the micros() on T_4.x returns faster than on T_3.x. The T_4.x just does math from 600 MHz clock tick in under 40 CPU cycles, the micros() query on T_3.x has more complex clock query IIRC and math.
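The interpolation idea described above can be sketched roughly as follows. This is a simplified illustration, not the actual Teensy core code: microsFromSnapshot is a made-up name, the inputs stand in for systick_millis_count, systick_cycle_count and ARM_DWT_CYCCNT, and it assumes the CPU clock is a whole number of MHz. A real implementation also has to cope with the SysTick interrupt updating the snapshot in the middle of a read, which this sketch ignores.

```cpp
#include <cstdint>

// Microseconds = whole milliseconds from the last SysTick, plus the cycles
// elapsed since that tick scaled down to microseconds.
constexpr uint32_t microsFromSnapshot(uint32_t millisCount,
                                      uint32_t cyclesAtTick,
                                      uint32_t cyclesNow,
                                      uint32_t cpuHz) {
  // Unsigned subtraction handles a wrap of the 32-bit cycle counter.
  return millisCount * 1000u +
         (cyclesNow - cyclesAtTick) / (cpuHz / 1000000u);
}
```

For example, with 5 ms counted and 300,000 cycles since the last tick at an assumed 600 MHz, the result is 5,500 microseconds.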
 