micros() inside interrupt ?

Jp3141

Well-known member
I have 2 external pin interrupts to my Teensy 3.2 (48 MHz)-- one at ~60 Hz, the other at 1 Hz (see project at https://github.com/jp3141/60Hz)

Here's (what I think are) the relevant parts of the code. The main loop prints the value of 'phase' every approx. 5 s. Phase should be between 0 and 360. This works well, except occasionally (3 times in 6 days), phase becomes 4294 (and no other values > 360 occur). Phase is declared as volatile.

So: given that the 1 Hz 60 Hz interrupts have their own ISR routines -- and the 60 Hz can interrupt the 1 Hz routine -- and PhaseMicros should get reset to 0 every 1/60 s (i.e. every 16666 us), how do I get these rare 4294 values ?

At PhaseMicros of 16666 us, Phase should calculate to 16666*360*60/1000000 = 360; it can't (shouldn't) get larger than this. Even if I missed one 60 Hz interrupt (not really possible) it would be 720 max.

I do notice that these corruptions tend to occur at instants where it is probable that the 60 Hz ISR has interrupted the 1 Hz one, but I can't see any reason for that to be a problem.

Given that this is a 32 bit MCU, I expect that all updates of the variables are atomic. However I don't know where the code for micros() is -- are its updates atomic ?

Any ideas ? This is awkward to debug as the errors are quite rare (< 1 per 2 days). I could improve debugging by using a function generator to send in (say) 60 kHz and 1 kHz interrupts, but that's awkward right now.


Code:
elapsedMicros PhaseMicros;
volatile uint32_t Phase;

void setup() {
...
}

void loop() {

[COLOR="#FF0000"][B]... // every 5 s read 'Phase' and print it.
[/COLOR][/B]
}

FASTRUN void ftm0_isr(void) { // 60 Hz
... // other stuff
    PhaseMicros = 0;
... // other stuff 
}

FASTRUN void ftm1_isr(void) { // 1 Hz; 
... // other stuff
    Phase = PhaseMicros * 360 * 60 / 1000000;
... // other stuff
}
 
The 4294 value you are printing is probably 0xffffffff = 4294967295 / 1000000. Have you thought about changing the code to
Code:
Phase = PhaseMicros * 36 * 6 / 10000;
just to reduce the chance of overflowing the numerator?
 
That's a small float number - probably resolved at compile time,

The result suggests the: PhaseMicros = 0; was missed for 12 cycles?

Also, without seeing all the code ...

To debug make a copy of PhaseMicros in the 1 Hz isr() to see what the value is used in the calc.

Perhaps in the 60Hz add volatile int count60++ each time it is entered, and in the 1 Hz isr() save a copy of that and set it to zero.

that might show if the two are cycling as expected without contention over the: elapsedMicros PhaseMicros;

Alternatively, rather than resetting PhaseMicros = 0; let it run and use volatile uint32_t PhaseMicrosNow = PhaseMicros; and in the 1 Hz isr() use (PhaseMicros - PhaseMicrosNow) for the distance between the readings.
 
Thanks -- I'll insert some debug code.

I did try to look at the compiler listing:
Code:
1fff895e:       f000 f84b       bl      1fff89f8 <__micros_veneer>
1fff8962:       4b1e            ldr     r3, [pc, #120]  ; (1fff89dc <ftm1_isr+0x124>)
    Phase = PhaseMicros * 360 * Target_Hz / 1000000;
1fff8964:       491e            ldr     r1, [pc, #120]  ; (1fff89e0 <ftm1_isr+0x128>)
1fff8966:       681b            ldr     r3, [r3, #0]
1fff8968:       4a1e            ldr     r2, [pc, #120]  ; (1fff89e4 <ftm1_isr+0x12c>)
1fff896a:       1ac0            subs    r0, r0, r3
1fff896c:       f245 4360       movw    r3, #21600      ; 0x5460
1fff8970:       fb03 f000       mul.w   r0, r3, r0
1fff8974:       fba1 3000       umull   r3, r0, r1, r0
1fff8978:       0c80            lsrs    r0, r0, #18
    Overflow_1Hz_Counter = 0;
1fff897a:       2300            movs    r3, #0
    } else {
      iFilter = 1 ;  // reset filter if 1 pps lost
      digitalWrite(LED, HIGH);     
    }
    prev_C0V_1Hz = C0V_1Hz; // save for next 1 Hz pulse
    Phase = PhaseMicros * 360 * Target_Hz / 1000000;
1fff897c:       6010            str     r0, [r2, #0]
    Overflow_1Hz_Counter = 0;

but don't understand why the relevant C line is there twice; also don't see any reference to a division by 1000000. The #21600 is 360*60 which the compiler pre-evaluated.

It is interesting that the 4294 I observe is (2^32-1)/1000000, but I don't know what could generate that. Since this computation occurs inside an interrupt could it interfere with micros() updating ?

Is there any need in code like this to cast the constants to unsigned long ?
 
In your code
Code:
  Phase = PhaseMicros * 360 * 60 / 1000000;

PhaseMicros is an object of type elapsedMicros. The access to the PhaseMicros in this line is not atomic but generates a quite some of code (see below). If it is interrupted by the 60Hz interrupt, which also accesses it, a lot of stuff might go wrong. To check, I'd simply wrap the line in noInterupts() / interupts() and see if the problem goes away. I personally would not use elapsedMicros or even micros for this. I'd read out the cyclecounter and and substract some start value as defragster suggested. If it doesn't fix the problem it would at least make debugging simpler :)

Here the disassembly of my test code (below) generated by the compilerExplorer which (IMHO) is a bit simpler to read. No detailed analysis, but my gut feeling is that there are a lot of chances to generate issues if the code generated for Phase = PhaseMicros * 360 * 60 / 1000000 is interrupted by ftm0_isr.

Code:
elapsedMicros PhaseMicros;
volatile uint32_t Phase;

void setup() {

}

void loop() {

// every 5 s read 'Phase' and print it.

}

FASTRUN void ftm0_isr(void) { // 60 Hz
 // other stuff
    PhaseMicros = 0;
 // other stuff
}

FASTRUN void ftm1_isr(void) { // 1 Hz;
 // other stuff
 Phase = PhaseMicros * 360 * 60 / 1000000;
 // other stuff
}

Code:
setup:
        bx      lr
loop:
        bx      lr
ftm0_isr:
        push    {r3, lr}
        bl      micros
        ldr     r3, .L5
        str     r0, [r3]
        pop     {r3, pc}
.L5:
        .word   .LANCHOR0
ftm1_isr:
        push    {r3, lr}
        bl      micros
        ldr     r3, .L9
        ldr     r1, .L9+4
        ldr     r3, [r3]
        ldr     r2, .L9+8
        subs    r0, r0, r3
        movw    r3, #21600
        mul     r0, r3, r0
        umull   r3, r0, r1, r0
        lsrs    r0, r0, #18
        str     r0, [r2]
        pop     {r3, pc}
.L9:
        .word   .LANCHOR0
        .word   1125899907
        .word   .LANCHOR1
_GLOBAL__sub_I_PhaseMicros:
        push    {r3, lr}
        bl      micros
        ldr     r3, .L13
        str     r0, [r3]
        pop     {r3, pc}
.L13:
        .word   .LANCHOR0
Phase:
PhaseMicros:
 
for sure the elapsed is extra code - and cyclecounter less than one tenth the overhead of even raw micros() - that is just part of what elapsed relies on.

Not resetting the elapsed was suggested to assure the code wouldn't get confused in the process. Assumed (not seeing all the code) both interrupts at same default level so they wouldn't interrupt each other.

The end suggestion in p#3 was to minimize micros fighting - and using cyclecounter instead would be even better instead of elapsed.

do this with cyclecounter perhaps Cycles instead of Micros:
Code:
Alternatively, rather than resetting PhaseMicros = 0; let it run and use volatile uint32_t PhaseMicrosNow = PhaseMicros; and in the 1 Hz isr() use (PhaseMicros - PhaseMicrosNow) for the distance between the readings.

... something like this?:
Code:
volatile uint32_t PhaseCycles;
volatile uint32_t Phase;

FASTRUN void ftm0_isr(void) { // 60 Hz
... // other stuff
    PhaseCycles = ARM_DWT_CYCCNT;
... // other stuff 
}

FASTRUN void ftm1_isr(void) { // 1 Hz; 
... // other stuff
    Phase = (ARM_DWT_CYCCNT - PhaseCycles) * 360 * 60 / F_CPU;
... // other stuff
}

// reading ARM_DWT_CYCCNT is about 3 cycles, and a touch more resolution
// reading micros() is about 36 cycles - with code behind elapsed that is added as well
 
Last edited:
defragster said:
Assumed (not seeing all the code) both interrupts at same default level so they wouldn't interrupt each other.
The code is in the GitHub repo linked above, here the relevant lines setting the priorities: https://github.com/jp3141/60Hz/blob/ece1dc3115d05211b050e2917d85dd326e76d684/60Hz.ino#L121

defragster said:
... something like this?:

I'd say this looks much better :) (Board is T3.2, need to replace F_CPU_ACTUAL by F_CPU to make it compile)

Accces to PhaseCycles is atomic now. The code for the phase calculation is weird (but most probably correct), maybe someone else can shed some light on this

Code:
ftm0_isr:
        ldr     r2, .L2         // load address of cyclecounter into r2 (FFFF FFFF E000 1004)
        ldr     r3, .L2+4       // load address of PhaseCycles var to r3
        ldr     r2, [r2]        // load cycCnt value  to r2
        str     r2, [r3]        // and store it in PhaseCycles
        bx      lr
.L2:
        .word   -536866812      // 0xE0001004 ARM_DWT_CYCCNT
        .word   .LANCHOR0       // address of PhaseCycles
ftm1_isr:
        ldr     r2, .L5         // r2 = &ARM_DWT_CYCCNT  (FFFF FFFF E000 1004)
        ldr     r3, .L5+4       // r3 = &PhaseCycles
        ldr     r2, [r2]        // r2 = cycCnt value  
        ldr     r3, [r3]        // r3 =  PhaseCycles value
        ldr     r0, .L5+8       // r0 = 375299969 
        ldr     r1, .L5+12      // r1 = &Phase 
        subs    r3, r2, r3      // r3 = cycCnt - PhaseCycles 
        movw    r2, #21600      // r2 = 360*60
        mul     r3, r2, r3      // r3 *= 21600       
        umull   r2, r3, r0, r3  // (r2,r3) = 375299969 * r3 ????
        lsrs    r3, r3, #22     //  r3 >> 22 (divide by 2^22)
        str     r3, [r1]        //  Phase = r3
        bx      lr
.L5:
        .word   -536866812      // 0xE0001004 ARM_DWT_CYCCNT
        .word   .LANCHOR0       // &PhaseCycles
        .word   375299969       // ??? 
        .word   .LANCHOR1       // &Phase
Phase:
PhaseCycles:

Edit: Of course it is not yet proven if the original issue is caused by interrupting the 1s ISR by the 60Hz ISR.... But it would be worth a try
 
Last edited:
Sorry Lutz, but asm does not seem to fit code (asm mentions 'micros')

micros is the address of the micros() function. The CompilerExplorer doesn't link, it therefore can't know the address of functions not defined in the translation unit. micros() is called when you access elapsedMicros variables.
 
micros is the address of the micros() function. The CompilerExplorer doesn't link, it therefore can't know the address of functions not defined in the translation unit. micros() is called when you access elapsedMicros variables.

I know, but in in code in #5 is no elapsedMicros (I know, I'm too picky) and you were saying "disassembly of my test code (below)", followed by a code snippet

OK I see it now, thesr IS a elapsedMicros declaration
 
OK I see it now, thesr IS a elapsedMicros declaration

Yes :)

The disassembly should resemble the source, unless I made some error setting up the compiler explorer. When I set that up in VisualTeensy a couple of weeks ago I compared the CompilerExplorer output with the asm from the corresponding lst file and didn't spot any differences. The CE sometimes uses alternative notation for some opcodes but other than this I found no differences.
 
The code is in the GitHub repo linked above, here the relevant lines setting the priorities: https://github.com/jp3141/60Hz/blob/ece1dc3115d05211b050e2917d85dd326e76d684/60Hz.ino#L121

I'd say this looks much better :) (Board is T3.2, need to replace F_CPU_ACTUAL by F_CPU to make it compile)

...

Easier to assume when the code wasn't linked so nicely :) Indeed priority is altered and unique.

Opps - T_3.2, edited snippet in p#6. Also note the cycle Counter isn't started by default on the T_3.2.

This code in Teensy4 for micros() (IIRC this tested to work on the T_3.2): PaulStoffregen/cores/blob/master/teensy4/delay.c#L70

Shows how this code could be wrapped to be atomic in the lower priority isr() - perhaps like:
Code:
#include "arm_math.h"	// micros() synchronization
uint32_t systick_safe_read;	 // micros() synchronization

FASTRUN void ftm1_isr(void) { // 1 Hz;
 // other stuff
	do {
		__LDREXW(&systick_safe_read);
		 Phase = (ARM_DWT_CYCCNT - PhaseCycles) * 360 * 60 / F_CPU;
	} while ( __STREXW(1, &systick_safe_read));
 // other stuff
}

That will repeat the 'Phase' math should ANY interrupt occur during the do{}while; to assure it is done 'atomically'.
 
Thanks all.

1) I now see the purpose and use of _LDREX & _STREX. I expect this should also work if an interrupt was nested between them that also used LDREX/STREX (around some completely different variables); likely the outer _STREX would also fail (falsely), but would succeed the 2nd time around.

2) Thanks to the pointer CompilerExplorer; the .lst was becoming intractable (I'll need to get more familiar with ARM asm). I still don't see how the '/ 1000000' gets implemented -- it might have something to do with the '375299969', but even then I don't understand why that wasn't combined with the '360 *60' ? -- in the end it could be implemented by a (64-bit ?) multiplication by 92771293 followed by a right shift of 32.

3) I guess I could wrap both uses (the assignment to 0, and the use in the Phase calculation) of PhaseMicros in 'noInterrupts() / interrupts()' -- but if that's really necessary in general, shouldn't the Arduino code have implemented that ?

4) in the end, I will try with ARM_DWT_CYCCNT (why isn't that enabled at Teensy startup ? power savings ?
Code:
ARM_DEMCR |= ARM_DEMCR_TRCENA;
ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;
volatile uint32_t PhaseCycles;
and
Code:
PhaseCycles = ARM_DWT_CYCCNT;
.
.
.
Phase = (ARM_DWT_CYCCNT - PhaseCycles) * 360 * 60 / F_CPU;
as it is cleaner to me.
 
Hope that gets great results! Aways fun to find a puzzle to solve here :). Especially when it is just reusing code I wrote that had a fun special purpose :cool:.

#1: Please post the use of the _LDREX & _STREX if used and it works properly and easily. For READ usage that provides a great way be atomic! Having it repeat for writes could have 'repeat' side effects with interrupts enabled.

#2: Compiler Explorer is nice, will need to do a VisualTeensy install to have it handy. Wondering if the compiler does the 'integer' math and leaves the 'Hidden' Float math for runtime - given all the code is with integers?
> Maybe doing this would look different? const float foo = 360 * 60 / F_CPU;

#3: Having random 'noInterrupts() / interrupts()' inserted would be odd and ugly - if needed it would be in the source code as 'Arduino' doesn't do anything but hand the source to the compiler. The ElapsedMicros code is in a cpp CLASS. So maybe it isn't 'Interrupt safe' with shared volatile?

#4: Indeed, the use of ARM_DWT_CYCCNT is cleaner and faster and even on a T_3.2, 96 times more precise.
> T_3.x didn't have an inherent need to start the ARM_DWT_CYCCNT, so no reason to enable it. Not sure if it would have been enabled on the T_4.x if not for the solution needed for micros() to get 10X better resolution from the low speed clock provided for the millis() systick.
 
I'm going to try using ARM_DWT_CYCCNT. Since the intermediate multiplication result can overflow, I used 360LL to force a 64-bit calculation (the final result won't overflow).

Code:
(setup): 
volatile uint32_t PhaseMicrosARM0=0; // From internal counter; not really microseconds. 
...

(60_Hz ISR):
PhaseMicrosARM0 = ARM_DWT_CYCCNT;
...

(1_Hz ISR)
Phase = (ARM_DWT_CYCCNT-PhaseMicrosARM0) * 360LL * Target_Hz / F_BUS; // final result is 0..359; intermediate could be 64 bit (48M*360), so LL forces 64 bit

It's running now; will update in a week with results.
 
Last edited:
I can confirm that (by using ARM_DWT_CYCCNT) it is working as desired now. 48 h and no spurious values for Phase.
 
The ARM_DWT_CYCCNT is fast and handy and accurate.

Cool is that just flagged in the _isr() with the ARM_DWT_CYCCNT saved?
 
Back
Top