ISR latency Teensy 3.1 and 3.6

Status
Not open for further replies.

TelephoneBill

Well-known member
Looking for a little guidance on interrupt service routine timing...

I want to use an FTM timer to compare two positive going edge signals (repeating) on two input pins configured as CH0 and CH1. I am concerned about the latency which exists from the instant of the rising edge to the instant that the first line of code executes in an associated ISR routine.

From memory, I think the latency is about 2 uSec for Teensy 3.1 running at 120 MHz. Can anyone confirm this? What about Teensy 3.6 running at 240 MHz - how long is this likely to be?

I'm also puzzled why it should take what seems to be such a long time for just dumping registers on the stack - 2 uSecs seems a lot of clock cycles.

Is there any way to speed this up?

Thanks for any info/confirmation of timing.
 
Thanks. Useful references.

Hmm... a lot more to think about since the simple days of the Intel 8259. Just goes to show how we take "Teensyduino" for granted.

Bill Grundmann's blog was enlightening. I'm guessing that the Teensy core files push all the registers (including the floating point ones?) and restore them as a "wrapper" around whatever ISR I write. That would explain why it takes so many clocks. Speeding it up might then be a case of saving fewer registers, if they are not altered by my ISR statements, but this will need care if I write my own attachInterrupt routine.
 
@tni - don't fully understand the timestamp comment, but it sounds really useful. I was thinking that the FTM channel would capture the counter value at the instant of the transition (a 16-bit value) and then the ISR would read this into another variable and reset everything for the next rising edge. This is all my ISR code would do for two signal sources. Very happy to be shown another way - yes, I'm only interested in comparing edge timing from one signal source with that of another to get phase lock between them.

One complication is the edges from the primary source are not fully "periodic". There are large (periodic) gaps of random noise in the pulse train.

The 2 uSec was not my code - I thought this was the latency before my code begins. If it's much less than this, then my memory was playing tricks. I'll measure it.

Never used DMA before - but again it sounds useful. I'm interested too in your post124805 suggesting that positive and negative edges can both be captured on the same pin and told apart. I will read this more carefully.
 
@tni - don't fully understand the timestamp comment, but it sounds really useful. I was thinking that the FTM channel would capture the counter value at the instant of the transition (a 16-bit value)

Yes, but the timer can do this for each channel. CH0 captures the first edge; CH1 captures the second edge. You can read both captured values in the CH1 interrupt and compute the delta.
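As a rough illustration of "compute the delta": a minimal sketch, assuming both channels capture from the same free-running 16-bit counter that counts the full 0..0xFFFF range. The helper names are made up for this example; on a Teensy the two inputs would presumably be the values read from FTM1_C0V and FTM1_C1V inside the CH1 interrupt.

```cpp
#include <cstdint>

// Ticks elapsed from capture `first` to capture `second`.
// Unsigned 16-bit subtraction wraps modulo 65536, so a single counter
// rollover between the two edges is handled without special cases.
// Assumption: the counter runs over the full 0..0xFFFF range.
static inline uint16_t captureDelta(uint16_t first, uint16_t second) {
  return static_cast<uint16_t>(second - first);
}

// The same difference read as a signed value, which is handy as a phase
// error: positive when the CH1 edge is later than the CH0 edge, negative
// when it is earlier. Only meaningful while the true offset stays within
// half the counter range (+/- 32767 ticks).
static inline int16_t phaseError(uint16_t ch0, uint16_t ch1) {
  return static_cast<int16_t>(static_cast<uint16_t>(ch1 - ch0));
}
```

With a 60 MHz timer clock each tick is about 16.7 ns, so a delta of 300 ticks corresponds to 5 us between the two edges.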
 
Today I measured the ISR latency for Teensy 3.1...

I configured FTM1 CH0 as an output counting 300 pulses of the 60 MHz peripheral bus clock (Teensy 3.1 = 120 MHz overclock) between toggling states. This effectively divides the 60 MHz clock to produce a 100 kHz square wave on pin3. I also configured pin21 as an output which I could set high or low in code, and I wrote an ISR for FTM1 using the TOF (timer overflow flag) as the interrupt trigger. The first statement in the ISR was "digitalWriteFast(21, 1)" which sent pin21 high.

So the timing between an edge transition of the 100 kHz signal and pin21 going high gave me a delay value which equals the "ISR latency period" + "execution time for digitalWriteFast (1 clock cycle of 8.33 nanoSec)".

Here is a scope picture of that delay. Trace 4 (blue) is the rising edge of the 100 kHz signal, and Trace 3 (purple) is the rising edge of pin21.

FTM1 ISR timing.jpg

As you can see, the timing delay is 230 nanoSecs, so subtracting the statement execution time of about 8 nanoSecs, this makes the ISR latency timing to be 222 nanoSecs. My memory was therefore playing tricks thinking it was ten times longer (2uSec).

I did notice occasional flicker of Trace 3 with a longer delay, but this was not often. I presume this would be some Teensyduino system interrupt taking priority over the FTM1 interrupt routine. Perhaps someone with a deeper knowledge could explain the cycles required to push the registers onto the stack and relate this to the 222 nanoSecs latency.
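To relate a scope reading like this to clock cycles, the conversion is plain arithmetic. The snippet below is only a sketch with made-up helper names, and it assumes the core really is clocked at the stated frequency for the whole interval.

```cpp
// Duration in nanoseconds of `cycles` CPU clocks at `cpuHz`.
constexpr double cyclesToNs(double cycles, double cpuHz) {
  return cycles * 1e9 / cpuHz;
}

// Number of CPU clocks (possibly fractional) that fit in `ns` at `cpuHz`.
constexpr double nsToCycles(double ns, double cpuHz) {
  return ns * cpuHz / 1e9;
}
```

For example, nsToCycles(222, 120e6) comes out at roughly 26.6 cycles, while cyclesToNs(12, 120e6) is 100 ns.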
 
Can you post your code? Did you put the ISR in FASTRUN mode?
The Cortex-M4 URL in post #2 suggests 12 cycles of latency to enter the ISR, plus possibly another 17 cycles if FPU registers are saved. I'm not sure if it always saves the FPU registers, or if it's smart enough to know whether float has been used. 12 + 17 cycles = 29 cycles = 241 ns @ 120 MHz, so that's pretty close. It could be 12 cycles plus the various delays mentioned in the M4 URL ... above my pay grade.

EDIT: Oops, you said T3.1, so no FPU.

EDIT 2: I implemented the sketch as you described. On a T3.2 @ 120 MHz with FASTRUN on the ISR, I get a 132 ns delay before digitalWriteFast fires. Without FASTRUN I was seeing 216 ns.

ftmisr.png

Code:
// teensy 3 FTM timer and isr latency  with scope pin 16 and 14



volatile unsigned long ticks;

FASTRUN void ftm1_isr(void) {
  digitalWriteFast(14, HIGH);
  FTM1_SC &= ~FTM_SC_TOF;  // reset interrupt
  ticks = 1;
}


void init_pwm(int khz) {
  CORE_PIN16_CONFIG = PORT_PCR_MUX(3) | PORT_PCR_DSE | PORT_PCR_SRE; // enable PWM
  //  enabled by init SIM_SCGC6 |= SIM_SCGC6_FTM1;
  // FTM1_CnSC set by init
  NVIC_ENABLE_IRQ(IRQ_FTM1);
  FTM1_SC = 0;                          // stop the timer while configuring
  FTM1_CNT = 0;
  FTM1_MOD = (F_BUS / 1000) / khz - 1;  // counter period -> PWM frequency
  FTM1_C0V = (F_BUS / 3000) / khz - 1;  // compare value, about 1/3 of the period
  // bus clock, prescaler /1, enable the overflow interrupt
  FTM1_SC = FTM_SC_CLKS(1) | FTM_SC_PS(0) | FTM_SC_TOF | FTM_SC_TOIE;
}

void setup() {
  pinMode(14, OUTPUT);
  init_pwm(40);    // pwm freq in khz on pin 16
  spin();
}

FASTRUN void spin() {
  while (1) {
    if (ticks) {
      digitalWriteFast(14, 0);
      ticks = 0;
    }
  }
}

void loop() {}
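As a sanity check on the register values in init_pwm(), the same integer arithmetic can be run off-target. This is a sketch only: the struct and function names are invented here, and the 60 MHz bus clock is an assumption (it is what F_BUS is expected to be on a Teensy 3.x running at 120 MHz).

```cpp
#include <cstdint>

// Values that init_pwm() would write to FTM1_MOD and FTM1_C0V.
struct FtmSettings {
  uint32_t mod;  // counter modulo, period is (mod + 1) timer ticks
  uint32_t c0v;  // channel 0 compare value
};

// Mirrors the integer expressions used in init_pwm().
constexpr FtmSettings ftmSettings(uint32_t fBusHz, uint32_t khz) {
  return { (fBusHz / 1000) / khz - 1, (fBusHz / 3000) / khz - 1 };
}

// Overflow (and therefore interrupt) rate for a given modulo,
// assuming prescaler /1 as selected by FTM_SC_PS(0).
constexpr double overflowHz(uint32_t fBusHz, uint32_t mod) {
  return static_cast<double>(fBusHz) / (mod + 1);
}
```

With fBusHz = 60000000 and khz = 40 this gives mod = 1499 and c0v = 499, so the timer overflows, and the ISR fires, 40000 times per second.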

I also tried to measure exit time, but that's fuzzy and varies. The best I saw was 260 ns (duration of pin 14 HIGH).

FWIW, I also tested on a T3.5 @ 120 MHz. The scope looked the same: about 130 ns (16 cycles) ISR entry latency, and about 260 ns exit latency.


------------
After changing the ISR to set and clear pin 14 with digitalWriteFast() as described in https://www.nxp.com/docs/en/application-note/AN12078.pdf
the ISR latency of the T3.5 was measured at 12 cycles.
ftmisr6.png
Yellow is PWM on pin 16, blue is pin 14. For my slow scope I had to add 6 nops (50 ns) between the pin 14 set and clear.
 
@manitou - Thank you for your diligence. You beat me to the code. Mine was part of a larger program, so this morning I made a smaller test file which gave the same results as my previous post. Not much point in posting this now as I get the same figure of entry latency as you do without FASTRUN. I had overlooked exit latency, so thanks for the reminder.

I confess that FASTRUN was not something familiar to me. I remember reading something Paul wrote way back about the cpu running at two different speeds on overclock (depending how things are coded) so that would explain it.

If you are seeing 132 nS with FASTRUN, then this suggests 16 cycles @ 8.333 nS per cycle (120 MHz) = 133 nS. So I guess that FASTRUN is needed (as per Paul's comment) to get the cpu clock rate up to 120 MHz, and my code without it must be running at the equivalent of 16 cycles in 222 nS = approx 72 MHz. That seems to make sense, as this is the fastest option in the Teensyduino environment without overclock - that is, if you want to achieve overclock speed, then you need to use the FASTRUN tag. I've learnt something new [again :) !!].

I also forgot to increase the priority of ISR. This morning, I upped the ISR priority to "0" and the Systick down to "32". I saw an important difference on the scope. There was a little random jitter on pin21 but nowhere near as much as before. (I wonder what had greater priority still if anything?).

Thanks also for the tests on T3.5. Very useful to know. (Note - I'm working on an update to my Teensy precision timing project. I should have something of great interest to post soon and this thread has helped a lot. My new project involves something called Loran C radio signals.)
 
ARM claims the interrupt latency is supposed to be 12 cycles, if running from zero-wait memory. So your measurement of 16 sounds about right, since a few cycles would be needed to put constants into the registers and then write to the GPIO register.

I saw an important difference on the scope. There was a little random jitter on pin21 but nowhere near as much as before. (I wonder what had greater priority still if anything?).

When DMA is thrown into the mix, the DMA access to memory can sometimes add a cycle or two of delay.

Some DMA, like the USB host and even certain (seldom used) settings on the normal DMA controller, can do multi-word burst access, which ties the memory bus up for more cycles. Generally that's worthwhile since it gives an incredible bandwidth boost for those peripherals which need high speed.
 