Thread: ISR latency Teensy 3.1 and 3.6

  1. #1
    Senior Member
    Join Date
    Mar 2015
    Location
    UK
    Posts
    255

    ISR latency Teensy 3.1 and 3.6

    Looking for a little guidance on interrupt service routine timing...

    I want to use an FTM timer to compare two repeating positive-going edge signals on two input pins configured as CH0 and CH1. I am concerned about the latency from the instant of the rising edge to the instant that the first line of code executes in the associated ISR.

    From memory, I think the latency is about 2 uSec for Teensy 3.1 running at 120 MHz. Can anyone confirm this? What about Teensy 3.6 running at 240 MHz - how long is this likely to be?

    I'm also puzzled why it should take what seems to be such a long time for just dumping registers on the stack - 2 uSecs seems a lot of clock cycles.

    Is there any way to speed this up?

    Thanks for any info/confirmation of timing.

  2. #2
    Senior Member+ manitou's Avatar
    Join Date
    Jan 2013
    Posts
    1,998
    Last edited by manitou; 07-21-2017 at 04:43 PM.

  3. #3
    Senior Member
    Join Date
    Jan 2013
    Posts
    843
    Why do you care about latency? The FTM timers can capture the timestamp, if you only want to measure the time between CH0 / CH1 edge.

    Interrupts have a fair amount of latency, but if you are seeing 2us, you have inefficient code.

    You could transfer the captured timer values into a ring-buffer using DMA:
    https://forum.pjrc.com/threads/37148...l=1#post124805

  4. #4
    Senior Member
    Join Date
    Mar 2015
    Location
    UK
    Posts
    255
    Thanks. Useful references.

    Hmm... a lot more to think about since the simple days of the Intel 8259. Just goes to show how we take "Teensyduino" for granted.

    Bill Grundmann's blog was enlightening. I'm guessing that the Teensy core files push all the registers (including the floating point?) and restore these as a "wrapper" around the ISR that I may write. That's why it takes so many clocks. A speed-up might then be a case of saving fewer registers, if these are not altered by my ISR statements, but this will need care if I write my own attachInterrupt routine.

  5. #5
    Senior Member
    Join Date
    Mar 2015
    Location
    UK
    Posts
    255
    @tni - don't fully understand the timestamp comment, but it sounds really useful. I was thinking that the FTM channel would capture the counter value at the instant of the transition (a 16 bit value) and then the ISR would read this into another variable and reset everything for the next rising edge. This is all my ISR code would do for two signal sources. Very happy to be shown another way - yes, I'm only interested in comparing edge timing from one signal source with that of another to get phase lock between them.

    One complication is the edges from the primary source are not fully "periodic". There are large (periodic) gaps of random noise in the pulse train.

    The 2 uSec was not my code - I thought this was the latency before my code begins. If it's much less than this, then my memory was playing tricks. I'll measure it.

    Never used DMA before - but again it sounds useful. I'm interested too in your post124805 suggesting that positive edges and negative edges can both be captured, and distinguished from each other, on the same pin. I will read this more carefully.

  6. #6
    Senior Member
    Join Date
    Jan 2013
    Posts
    843
    Quote Originally Posted by TelephoneBill View Post
    @tni - don't fully understand the timestamp comment, but it sounds really useful. I was thinking that the FTM channel would capture the counter value at the instant of the transition (a 16 bit value)
    Yes, but the timer can do this for each channel. CH0 captures the first edge; CH1 captures the second edge. You can read both captured values in the CH1 interrupt and compute the delta.
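    A minimal sketch of the delta calculation described above, written as plain C++ so it can be tried off-target. The helper names `capture_delta` and `ticks_to_ns` are hypothetical, and the sketch assumes a free-running 16-bit counter that wraps at 0xFFFF with at most one wrap between the two edges. On the Teensy the two arguments would be the values read from the CH0 and CH1 capture registers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: timer ticks elapsed from the CH0 capture to the CH1
// capture. Truncating the difference to 16 bits makes the subtraction wrap
// modulo 65536, so the result stays correct if the counter rolled over once
// between the two edges (assumes the counter wraps at 0xFFFF).
uint16_t capture_delta(uint16_t ch0_capture, uint16_t ch1_capture) {
    return static_cast<uint16_t>(ch1_capture - ch0_capture);
}

// Convert a tick count to nanoseconds for a given timer clock in Hz.
double ticks_to_ns(uint16_t ticks, double timer_hz) {
    assert(timer_hz > 0.0);
    return ticks * 1e9 / timer_hz;
}
```

    For example, captures of 65530 (CH0) and 10 (CH1) give a delta of 16 ticks even though the counter wrapped in between.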

  7. #7
    Senior Member
    Join Date
    Mar 2015
    Location
    UK
    Posts
    255
    Today I measured the ISR latency for Teensy 3.1...

    I configured FTM1 CH0 as an output counting 300 pulses of the 60 MHz peripheral bus clock (Teensy 3.1 = 120 MHz overclock) between toggling states. This effectively divides the 60 MHz clock to produce a 100 kHz square wave on pin3. I also configured pin21 as an output which I could set high or low in code, and I wrote an ISR for FTM1 using the TOF (timer overflow flag) as an interrupt trigger. The first statement in the ISR was "digitalWriteFast(21, 1)" which sent pin21 high.
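    The divider arithmetic in that setup can be checked with a short sketch. The function names are hypothetical. The factor of 2 appears because one full square-wave period needs two toggles.

```cpp
#include <cassert>
#include <cstdint>

// Bus-clock counts between output toggles needed for a square wave of
// f_out_hz. One full period needs two toggles, hence the factor of 2.
uint32_t counts_per_toggle(uint32_t f_bus_hz, uint32_t f_out_hz) {
    assert(f_out_hz > 0);
    return f_bus_hz / (2u * f_out_hz);
}

// Square-wave frequency produced by toggling every `counts` bus clocks.
uint32_t toggle_output_hz(uint32_t f_bus_hz, uint32_t counts) {
    assert(counts > 0);
    return f_bus_hz / (2u * counts);
}
```

    With a 60 MHz bus clock, toggling every 300 counts gives 60 MHz / 600 = 100 kHz, matching the figures in the post.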

    So the timing between an edge transition of the 100 kHz signal and pin21 going high gave me a delay value which equals the "ISR latency period" + "execution time for digitalWriteFast (1 clock cycle of 8.33 nanoSec)".

    Here is a scope picture of that delay. Trace 4 (blue) is the rising edge of the 100 kHz signal, and Trace 3 (purple) is the rising edge of pin21.

    [Attachment: FTM1 ISR timing.jpg]

    As you can see, the timing delay is 230 nanoSecs, so subtracting the statement execution time of about 8 nanoSecs, this makes the ISR latency timing to be 222 nanoSecs. My memory was therefore playing tricks thinking it was ten times longer (2uSec).
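    The subtraction above, and the equivalent number of CPU cycles, can be sketched as follows. The function names are hypothetical, and the single-cycle cost of the pin write is the post's own assumption rather than a measured value.

```cpp
#include <cassert>

// Duration of one CPU clock cycle in nanoseconds.
double cycle_ns(double cpu_hz) {
    assert(cpu_hz > 0.0);
    return 1e9 / cpu_hz;
}

// ISR entry latency: the scope-measured delay minus the cycles spent
// executing the pin write itself (assumed to be 1 cycle, as in the post).
double isr_latency_ns(double measured_delay_ns, double cpu_hz,
                      int write_cycles = 1) {
    return measured_delay_ns - write_cycles * cycle_ns(cpu_hz);
}

// Express a time in nanoseconds as a (fractional) number of CPU cycles.
double ns_to_cycles(double ns, double cpu_hz) {
    return ns / cycle_ns(cpu_hz);
}
```

    For a 230 ns delay at 120 MHz this gives a latency of about 221.7 ns, or roughly 26.6 CPU cycles.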

    I did notice occasional flicker of Trace 3 with a longer delay, but this was not often. I presume this would be some Teensyduino system interrupt taking priority over the FTM1 interrupt routine. Perhaps someone with a deeper knowledge could explain the cycles required to push the registers onto the stack and relate this to the 222 nanoSecs latency.

  8. #8
    Senior Member+ manitou's Avatar
    Join Date
    Jan 2013
    Posts
    1,998
    Can you post your code? Did you put the ISR in FASTRUN mode?
    The Cortex M4 URL in post #2 suggests 12 cycles latency to enter the ISR, plus possibly another 17 cycles if FPU regs are saved. I'm not sure if it's always saving FPU regs, or if it's smart enough to know if float has been used. 12+17 cycles = 29 cycles = 241 ns @ 120 MHz, so that's pretty close. It could be 12 cycles plus the various delays mentioned in the M4 URL ... above my pay grade.

    EDIT: Ooops, you said T3.1, so no FPU.
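    As a quick check of the cycle arithmetic, here is a small sketch that converts cycle counts to time. The constant and function names are hypothetical, and the 12- and 17-cycle figures are simply the ones quoted in this post, not verified independently.

```cpp
#include <cassert>

// Cycle counts quoted in the post (from the linked Cortex-M4 reference).
constexpr int kEntryCycles = 12;    // basic exception entry
constexpr int kFpuSaveCycles = 17;  // extra cycles if FPU registers are stacked

// Time taken by a number of CPU cycles, in nanoseconds.
double cycles_to_ns(int cycles, double cpu_hz) {
    assert(cpu_hz > 0.0);
    return cycles * 1e9 / cpu_hz;
}
```

    At 120 MHz, 12 cycles is 100 ns and 12 + 17 = 29 cycles is about 241.7 ns. Without an FPU only the 12-cycle figure would apply.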

    EDIT 2: I implemented the sketch as you described. On a T3.2 @ 120 MHz with FASTRUN on the ISR, I get a 132 ns delay before digitalWriteFast fires. Without FASTRUN I was seeing 216 ns.

    [Attachment: ftmisr.png]

    Code:
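    // Teensy 3 FTM timer and ISR latency test; scope on pin 16 (PWM) and pin 14 (ISR marker)

    volatile unsigned long ticks;  // set by the ISR, cleared by spin()

    FASTRUN void ftm1_isr(void) {
      digitalWriteFast(14, HIGH);  // first statement: marks ISR entry on the scope
      FTM1_SC &= ~FTM_SC_TOF;      // clear the overflow flag to reset the interrupt
      ticks = 1;
    }

    void init_pwm(int khz) {
      CORE_PIN16_CONFIG = PORT_PCR_MUX(3) | PORT_PCR_DSE | PORT_PCR_SRE; // enable PWM
      //  enabled by init SIM_SCGC6 |= SIM_SCGC6_FTM1;
      // FTM1_CnSC set by init
      NVIC_ENABLE_IRQ(IRQ_FTM1);
      FTM1_SC = 0;                          // stop the timer while configuring
      FTM1_CNT = 0;
      FTM1_MOD = (F_BUS / 1000) / khz - 1;  // period in bus clocks
      FTM1_C0V = (F_BUS / 3000) / khz - 1;  // compare value, about 1/3 of the period
      // clock source 1, prescaler divide-by-1, overflow interrupt enabled
      FTM1_SC = FTM_SC_CLKS(1) | FTM_SC_PS(0) | FTM_SC_TOF | FTM_SC_TOIE;
    }

    void setup() {
      pinMode(14, OUTPUT);
      init_pwm(40);    // pwm freq in khz on pin 16
      spin();          // never returns, so loop() is not reached
    }

    FASTRUN void spin() {
      while (1) {
        if (ticks) {
          digitalWriteFast(14, 0);  // pin 14 goes low once the ISR has returned
          ticks = 0;
        }
      }
    }

    void loop() {}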
    I also tried to measure exit time, but that's fuzzy and varies. The best I saw was 260 ns (duration of pin 14 HIGH).

    FWIW, I also tested on a T3.5 @ 120 MHz. The scope looked the same: about 130 ns (16 cycles) ISR entry latency, and about 260 ns exit latency.
    Last edited by manitou; 07-22-2017 at 09:57 PM.

  9. #9
    Senior Member
    Join Date
    Mar 2015
    Location
    UK
    Posts
    255
    @manitou - Thank you for your diligence. You beat me to the code. Mine was part of a larger program, so this morning I made a smaller test file which gave the same results as my previous post. Not much point in posting this now as I get the same figure of entry latency as you do without FASTRUN. I had overlooked exit latency, so thanks for the reminder.

    I confess that FASTRUN was not something familiar to me. I remember reading something Paul wrote way back about the cpu running at two different speeds on overclock (depending how things are coded) so that would explain it.

    If you are seeing 132 nS with FASTRUN, then this suggests 16 cycles @ 8.333 nS per cycle (120 MHz) = 133 nS. So I guess that FASTRUN is needed (as per Paul's comment) to get the cpu clock rate up to 120 MHz, and my code without it must be running at 16 cycles divided by 222 nS = approx 72 MHz. Seems to make sense as this is the fastest option in the Teensyduino environment without overclock - that is, if you want to achieve overclock speed, then you need to use the FASTRUN tag. I've learnt something new [again :-) !!].

    I also forgot to increase the priority of ISR. This morning, I upped the ISR priority to "0" and the Systick down to "32". I saw an important difference on the scope. There was a little random jitter on pin21 but nowhere near as much as before. (I wonder what had greater priority still if anything?).

    Thanks also for the tests on T3.5. Very useful to know. (Note - I'm working on an update to my Teensy precision timing project. I should have something of great interest to post soon and this thread has helped a lot. My new project involves something called Loran C radio signals.)

  10. #10
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,166
    ARM claims the interrupt latency is supposed to be 12 cycles, if running from zero-wait memory. So your measurement of 16 sounds about right, since a few cycles would be needed to put constants into the registers and then write to the GPIO register.
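    That breakdown can be sketched numerically. The function names are hypothetical, and the 12-cycle entry figure is the one quoted above. The sketch only does the arithmetic on the scope readings reported earlier in the thread and does not model the hardware.

```cpp
#include <cassert>
#include <cmath>

// Round a scope-measured time to the nearest whole number of CPU cycles.
long measured_cycles(double measured_ns, double cpu_hz) {
    assert(cpu_hz > 0.0);
    return std::lround(measured_ns * cpu_hz / 1e9);
}

// Cycles left over after the quoted 12-cycle exception entry; these would be
// the instructions that load constants and write the GPIO register.
long cycles_beyond_entry(double measured_ns, double cpu_hz,
                         long entry_cycles = 12) {
    return measured_cycles(measured_ns, cpu_hz) - entry_cycles;
}
```

    With the 132 ns FASTRUN measurement at 120 MHz this gives 16 cycles, which is 4 beyond the quoted 12. The 216 ns measurement without FASTRUN works out to 26 cycles.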

    Quote Originally Posted by TelephoneBill View Post
    I saw an important difference on the scope. There was a little random jitter on pin21 but nowhere near as much as before. (I wonder what had greater priority still if anything?).
    When DMA is thrown into the mix, the DMA access to memory can sometimes add a cycle or two of delay.

    Some DMA, like the USB host and even certain (seldom used) settings on the normal DMA controller can do multi-word burst access, which ties the memory bus up for more cycles. Generally that's worthwhile since it gives an incredible bandwidth boost for those peripherals which need high speed.
