Best practices for usage of elapsedMicros, elapsedMillis, and ARM_DWT_CYCCNT

mnissov

Well-known member
I have a need for timing things in general, sometimes particularly short durations. To that end, I have a couple of questions on best practices in this context.

Suppose that I have something like the following
Code:
// elapsedMillis timers and state flag (declared globally)
elapsedMillis event1_timer;
elapsedMillis event2_timer;
elapsedMillis led_timer;
bool event2_flag = false;

void loop(){
    if (event1_timer >= THRESHOLD1){
        // do some stuff
        event1_timer = 0;
        event2_flag = true;
    }
    if (event2_flag && (event2_timer >= THRESHOLD2)){
        // some other stuff
    }

    if (led_timer >= THRESHOLD_LED){
        digitalToggleFast(_led_);
        led_timer = 0;
    }
}

where I have 2 events (one dependent on the other) and an LED heartbeat.

I've picked up the habit of using elapsedMicros and elapsedMillis in such instances, where I use one of these objects for each event and reset it to 0 when the event fires. My logic is that a plain ">=" check against the threshold is cheaper than keeping a start timestamp and doing the subtraction first (something like "micros() - event1_start >= THRESHOLD1").

Of course this means that each of these event timers can really only be used for one thing. So given 6 events with unique timing needs, I'll need 6 instances of elapsedMicros/elapsedMillis, depending on the durations.

My questions are:
  • Is there an explicit point where it begins to "cost more" to use so many instances versus doing the subtraction and comparison yourself?
  • Does it even really make any difference to reset to 0 to avoid that subtraction?
  • In what circumstances (assuming the timing allows it without overflow) does it make better sense to use something like ARM_DWT_CYCCNT as opposed to elapsedMicros?
    • Obviously ARM_DWT_CYCCNT has better resolution, but does it have the same drift as elapsedMicros? I'm just unsure what the use case for ARM_DWT_CYCCNT over elapsedMicros would be, assuming both offer sufficient resolution and duration.
 
What are you trying to accomplish by optimizing away a few CPU cycles on that feature? Those elapsed objects are unsigned long values that do some subtraction with millis() and micros(). If I remember correctly how CPUs work, adding (and subtracting) integers as well as comparing them takes 1 CPU cycle each. So the difference between both options is maybe 1-2 CPU cycles and 4 bytes of RAM. The moment you need to optimize your code at that level, you don't need to ask how many resources both variants take.
I would say, without knowing your project: optimize when there is a reason for it, and do it where you have the biggest impact. In your example you are letting an LED blink. A little research on how LEDs are dimmed (blinking too fast for the human eye to notice, to imitate dimming) says it's somewhere between 300 and 5000 times per second. Even if you are approaching the 5000 mark, you still have 120,000 CPU cycles available for each loop (600,000,000 / 5,000). No reason to waste time because of one or two of them.

Btw, if you need "perfect timing" for something, you might think about an IntervalTimer. That way the other functionality is "paused" when the set time is reached, and you won't lose accuracy from other code having an impact on your time-keeping.
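A minimal sketch of that, assuming a Teensy 4.x with the core's IntervalTimer and the built-in LED (the 1 Hz heartbeat interval is just a placeholder):
Code:
IntervalTimer heartbeatTimer;

void toggleLed() {
    digitalToggleFast(LED_BUILTIN);  // runs from the timer interrupt, independent of loop()
}

void setup() {
    pinMode(LED_BUILTIN, OUTPUT);
    heartbeatTimer.begin(toggleLed, 500000);  // fire every 500000 us -> 1 Hz blink
}

void loop() {
    // other work here no longer affects the heartbeat timing
}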
If you want to skip that but still make the timing more accurate, then instead of resetting the timer to 0, think about subtracting the value you test against. That way you won't accumulate small errors (like entering the inner block of the if at THRESHOLD+n because the code that was executed between those ifs took more time than the difference between the timer and the threshold).
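In terms of the example from the first post, the difference looks something like this (just a sketch with the placeholder names from above):
Code:
if (event1_timer >= THRESHOLD1) {
    // event1_timer = 0;          // reset: next event is THRESHOLD1 after this check happened to run
    event1_timer -= THRESHOLD1;   // subtract: next event is THRESHOLD1 after the previous one was *due*,
                                  // so a late check doesn't accumulate into long-term drift
    // do some stuff
}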
 
I think you misunderstand me; it's less about never-ending optimization and more about learning good practices, particularly regarding ARM_DWT_CYCCNT. I feel I've seen it mentioned, but without a rationale for when it makes sense to use it.

Edit: also, the LED was intended as a heartbeat, so it's blinking at 1 Hz. Not that it really matters; this was a simple example to convey how I have different timings for different purposes.
 
To quote Paul on the Teensy 4.1 product page in the section about elapsedMicros and elapsedMillis:
The number of these variables is only limited by the available memory.
I think that says it all. It's a class that holds a 32-bit integer and a few methods that do simple math on the millis() and micros() calls. As long as you can spare 4 bytes, I would say: use one of them per task. Writing clean code helps you understand what's going on far better than most comments do.
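Roughly, the idea behind the class is this (a simplified sketch, not the actual library source):
Code:
class ElapsedMillisSketch {
    uint32_t ms;                                            // the 4 bytes in question
public:
    ElapsedMillisSketch() : ms(millis()) {}
    operator uint32_t() const { return millis() - ms; }     // reading it = "time since last reset"
    void operator=(uint32_t val) { ms = millis() - val; }   // assigning 0 = "reset to now"
};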

Same with resetting it or subtracting… if you don't reset the timer, you are missing the point of those objects. They exist so you don't have to calculate the difference between now and the last measurement point every time you use it.
In core Arduino there are millis() and micros(). If you want to know the elapsed time, you do something like this:
Code:
uint32_t start = micros();
// ... code being timed ...
uint32_t elapsed = micros() - start;
start = micros();  // restart the measurement
As the elapsed objects are doing exactly that, the reason for their existence is that you don't have to do that calculation over and over all over the place. So not resetting the elapsed object takes away the reason why you are using it.
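For comparison, the same loop written with the class (a sketch; INTERVAL_US is a placeholder):
Code:
const uint32_t INTERVAL_US = 1000;
elapsedMicros sinceEvent;            // starts at 0 and counts up on its own

void loop() {
    if (sinceEvent >= INTERVAL_US) { // just a comparison, no micros() bookkeeping here
        sinceEvent = 0;              // resetting is what makes the next comparison meaningful
        // handle the event
    }
}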

The cycle count has a higher resolution, but it depends on the CPU clock to measure time. As you can change the CPU clock, you cannot rely on it being 600,000,000 counts per second. I don't know if you can name best practices here, as it is nothing but a decision about precision. I would say: only use it if it is really important to your application to measure at a higher resolution than one millionth of a second.
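If you do use the cycle counter, scaling by the actual clock instead of a hard-coded 600 MHz avoids that problem. A sketch (F_CPU_ACTUAL comes from the Teensy 4.x core; the rest is illustrative):
Code:
uint32_t startCycles = ARM_DWT_CYCCNT;
// ... code being timed ...
uint32_t cycles = ARM_DWT_CYCCNT - startCycles;              // unsigned math survives one wrap
float elapsed_us = cycles * (1.0e6f / (float)F_CPU_ACTUAL);  // scale by the *actual* CPU clock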
 
All three are perfectly good options: Millis, Micros, ARM_DWT_CYCCNT. Which to use depends on the needs and limitations of the timing at hand.

All can be used 'bare' in the fashion of the p#4 code:
Code:
uint32_t start = micros();
// ..... THIS IS TIMED
uint32_t elapsed = micros() - start;

The 'Elapsed' class variables encapsulate that, as noted, for great utility, and typically resetting to ZERO offers the 'best use' for the designed purpose and prevents any issues of overflow {overflow of the 32-bit variable} or wrapping, as long as the time measured takes the limits noted below into account.
Reading the Millis single system value is faster than Micros, which reads the Millis info and uses ARM_DWT_CYCCNT to resolve to Micros in under 40 CPU cycles. Reading ARM_DWT_CYCCNT takes only about 3 cycles, and it is precise to a single count at F_CPU (F_CPU_ACTUAL) speed.

Millis won't wrap for days (about 49.7 days).
Micros wraps in about 71.582788266666666666666666666667 MINUTES << EDITED
ARM_DWT_CYCCNT will wrap in just over 7 seconds.

The unsigned 32-bit math works across a single overflow for comparisons - but it fails when the measured span wraps past a full count of the unsigned 32-bit INT.
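Those limits all come from the same 2^32 counts, just counted at different rates, and the single-overflow behaviour falls out of the unsigned subtraction (sketch with made-up numbers):
Code:
// Wrap periods, all 2^32 counts at the counter's rate:
//   millis():        2^32 ms           ~ 49.7 days
//   micros():        2^32 us           ~ 71.6 minutes
//   ARM_DWT_CYCCNT:  2^32 / 600 MHz    ~ 7.16 seconds
uint32_t before = 0xFFFFFF00u;    // counter just before it wraps
uint32_t after  = 0x00000100u;    // counter just after it wraps
uint32_t delta  = after - before; // = 0x200 = 512: still correct across ONE wrap,
                                  // but a second full wrap in between would be invisible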
 
micros overflows after 71 minutes, right? Or am I remembering incorrectly?

Reading the Millis single system value is faster than Micros, which reads the Millis info and uses ARM_DWT_CYCCNT to resolve to Micros in under 40 CPU cycles. Reading ARM_DWT_CYCCNT takes only about 3 cycles, and it is precise to a single count at F_CPU (F_CPU_ACTUAL) speed.
I had a couple of questions about this. I know there's the systick interrupt which fires every 1 ms, so I figure millis uses this in part for timekeeping. I was curious: do micros and ARM_DWT_CYCCNT suffer from the same drift? Or is there any observation anywhere of the drift to be expected from ARM_DWT_CYCCNT?

Also, I guess reading ARM_DWT_CYCCNT is very fast, but the calculation to convert it to an intuitive time base probably costs a little as well, right? So comparing a "micros" call against an "ARM_DWT_CYCCNT" read, once you consider the difference in computation, is probably slightly more complicated. Though, naively, I would still guess ARM_DWT_CYCCNT to be faster.
 
Indeed, 71.582788266666666666666666666667 minutes - the question mark was there for a reason ... Oops.

There is a 1 ms systick interrupt that is used to track Millis() - that is also the base of the Micros() response, extended with the ARM_DWT_CYCCNT offset since the last systick.

ARM_DWT_CYCCNT is a running count at F_CPU_ACTUAL (typically 600 MHz), rolling over every 2^32 counts.
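Putting those two together, micros() is conceptually something like this (a simplified illustration only, not the core's actual implementation, which also reads the two counters atomically and scales more carefully):
Code:
// systick_millis_count and systick_cycle_count are maintained by the core's 1 ms systick ISR
uint32_t approxMicros() {
    uint32_t ms  = systick_millis_count;                  // whole milliseconds so far
    uint32_t cyc = ARM_DWT_CYCCNT - systick_cycle_count;  // CPU cycles since that last 1 ms tick
    return ms * 1000u + cyc / (F_CPU_ACTUAL / 1000000u);  // add the sub-millisecond part
}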

Seems the same crystal that generates the systick is the one that gets scaled up to run the CPU clock?

The T_4.1 when getting a clear GPS PPS seems to be showing:
Code:
... cyc diff    85 	err= 0.14 us	 Mn=   85 Mx=   85 P= 999996.56

So the Teensy intervalTimer at 999996.56 us is only 85 cycles off as measured (twin interrupts: GPS PPS Pin and intervalTimer) in the current code here.
 