how to debug problem caused by micros() overrunning and going back to zero.

Status
Not open for further replies.

warpigs330

Active member
So I am working on a project where I use millis() and micros() to measure time. Right now I am doing this somewhat poorly, which causes my code to stall when micros() overruns and goes back to zero. I don't think this will be complicated to fix, but I don't want to wait 70 minutes or so every time to see if it is actually fixed. Is there a way to start the sketch with micros() almost out of space so I can test the overrun scenario quickly? Also I would love to hear y'all's general opinions about handling clock overruns. Also any techniques for debugging around edge cases that might not appear in normal testing would be handy.

To give an idea of what I am doing and why I am using millis() and micros(), Basically I am continuously taking samples from an ADC into a buffer, then when I press and hold a button I want to save the section of the buffer that was recorded while holding the button, to a second buffer (since the first is circular and constantly rewritten). I then want to replay the contents of that second buffer at the same rate as it was recorded. So to do this I take the time the button was pressed, the time it was released, and figure out a length (in time) of the recording from that. Also my recording buffer only takes a sample every 100 or so loops, and my recording playback interpolates between these values, so I am also using micros() to find the amount to move forward in the buffer every step.

Thanks for any help y'all can provide.
 
@luni had a 64 bit utility for such things IIRC - though just checked and didn't find it on github - maybe it was just a post ... or somebody else ...

If you use the ARM_DWT_CYCCNT value it runs 600 times faster than micros so you can see in under 8 seconds to test the overrun.
 
@luni Indeed has this for making a 64 bit version of ARM_DWT_CYCCNT:

github.com/luni64/TeensyTimerTool/.../Teensy/TCK/TckChannel.h#L172

The same would work for tracking the slower micros() value as well as long as it was called at least every 71 minutes.

Using the ARM_DWT_CYCCNT would give 600X better resolution if needed and was called each 7 seconds, and the data read of ARM_DWT_CYCCNT is 10X faster than getting system micros()

Code:
    // 64bit Counter -------------------------------------------------------------------------

    template <>
    inline uint64_t TckChannel<uint64_t>::getCycleCounter()
    {
        uint32_t now = ARM_DWT_CYCCNT; // (extend the cycle counter to 64 bit)
        if (now < lastCyCnt)
        {
            curHIGH++;
        }
        lastCyCnt = now;
        return (((uint64_t)curHIGH << 32) | now);
    }
 
This is awesome! thanks for the leads! To be honest the micros() resolution is already significantly more than I need, but this will be useful for testing the overrun condition.
 
Glad to help - Thanks @luni!

Not noted which Teensy? This will work as written on a T_4.x - for other T_3.x's the Cycle Counter needs to be started for that to work.

Wrote a sketch to see it - just yesterday timing with ARM_DWT_CYCCNT was rolling over here so I had to skip what I was doing, now this shows it working in 64 bits:
Code:
// 64bit Micros -------------------------------------------------------------------------
inline uint64_t micros64()
{
  static uint32_t lastCyCnt;
  static uint32_t curHIGH = 0;
  uint32_t now = micros(); // (extend the cycle counter to 64 bit)
  if (now < lastCyCnt)
  {
    curHIGH++;
  }
  lastCyCnt = now;
  return (((uint64_t)curHIGH << 32) | now);
}

// 64bit Counter -------------------------------------------------------------------------
inline uint64_t getCycleCounter()
{
  static uint32_t lastCyCnt;
  static uint32_t curHIGH = 0;
  uint32_t now = ARM_DWT_CYCCNT; // (extend the cycle counter to 64 bit)
  if (now < lastCyCnt)
  {
    curHIGH++;
  }
  lastCyCnt = now;
  return (((uint64_t)curHIGH << 32) | now);
}

elapsedMicros someTime;
uint64_t lastCycles;
void setup() {
  // put your setup code here, to run once:
  Serial.begin(115200);
  while (!Serial && millis() < 4000 );
  Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
  someTime=0;
  lastCycles = getCycleCounter();
}

void loop() {
  if ( someTime > 4000000 ) {
    uint64_t thisCycles = getCycleCounter();
    someTime -= 4000000;
    Serial.printf( "\tCurrent micros64()=%llu 0x%llx\n", micros64(), micros64() );
    Serial.printf( "Current getCycleCounter()=%llu 0x%llx  Cycles since last=%llu\n", thisCycles, thisCycles, thisCycles-lastCycles );
    lastCycles = thisCycles;
  }

}
 
Last edited:
@Paul - if you are reading - an elapsedCycles64 would be cool, and not roll over for 900+ years? And with ARM_DWT_CYCCNT reading in 3 cycles - even the 64 bit math and overhead might be faster than micros().
 
Is there a way to start the sketch with micros() almost out of space so I can test the overrun scenario quickly?

You could forcibly write to the 2 variables the core library uses, of course with the caveat future versions might change. But for what we have today, here's how you would do it.

Code:
extern "C" volatile uint32_t systick_millis_count;
extern "C" volatile uint32_t systick_cycle_count;

void setup() {
  while (!Serial) ; // wait
  Serial.println("micros rollover test");

  noInterrupts();
  systick_millis_count = 4290000;
  systick_cycle_count = ARM_DWT_CYCCNT;
  asm("dsb" ::: "memory"); // force actual write to memory
  interrupts();
}

void loop() {
  Serial.println(micros());
  delay(100);
}



Also I would love to hear y'all's general opinions about handling clock overruns.

The usual approach, which is used internally by elapsedMillis and elapsedMicros, is to subtract 32 bit unsigned integers. If the rollover is from 0xFFFFFFFF to zero (eg, does not roll over "early" with maximum value less than all bits 1s), and the total elapsed is less than 2^31, then subtracting always results in the correct answer. If this property of unsigned subtraction seems counter intuitive, just try a couple quick examples. The trick is to always store the answer in an unsigned variable, or typecast the result to unsigned if not storing in a variable, so the compiler doesn't try to treat as a signed integer.
 
@Paul - if you are reading - an elapsedCycles64 would be cool

Official time APIs using units not seconds-based are cool in the short term, but the long term effect on the software ecosystem is a lot of code which breaks when clock speeds change. Or a lot of #ifdef or extra error-prone work in user code. Not so cool.

Even if this were to be done, I'm pretty sure the code in msg #5 wouldn't be up to the task. For example, consider a program which wants to precisely log the time when some sort of unusual event occurs. As I understand that code, admittedly from only a quick look, it appears to assume it will be called more frequently than once every 2^31 / F_CPU seconds. That's approx 3.6 seconds today, and will become even shorter when we get faster chips.

This sort of code is tricky to get right. In fact, I'm not even 100% confident we're really using the ARM exclusive access instructions properly. I'm not going to look into that now, but I opened an issue to remember it for a rainy day.

https://github.com/PaulStoffregen/cores/issues/609

But just to be clear, if we someday add a more precise time API, it will be nanos() and elapsedNanoseconds, to complement the delayNanoseconds() we already have. People who want the best the hardware can do are welcome to access the ARM_DWT_CYCCNT hardware, but I'm not planning to ever add official APIs for non-seconds timing.
 
, it appears to assume it will be called more frequently than once every 2^31 / F_CPU seconds.

That's true of course. The snippet defragster copied is from a software timer where this is called from yield() so this shouldn't be a problem (if the timer isn't ticked much faster the whole software timer would be useless anyway). @warpigs330: Here some detailed info about extending the cycle counter to 64bit: https://github.com/TeensyUser/doc/wiki/Using-the-cycle-counter.

Anyway, the problem you originally wanted to solve shouldn't have an overflow issue.

E.g. this:
Code:
uint32_t start = micros();
delay(1000);
uint32_t delta = micros() - start;
Serial.println(delta);

will always print the correct delta regardless of an overflow of micros() as long as the delay is less than ~71.6 minutes
 
yeah, I figured I was doing it in a silly way. I was using floats for the time variables because I was wanting to do floating point math with them later, I didn't even think about the easy solution of unsigned ints. You don't need to worry about returning a negative number if there aren't any.
 
That's true of course. The snippet defragster copied is from a software timer where this is called from yield() so this shouldn't be a problem (if the timer isn't ticked much faster the whole software timer would be useless anyway). @warpigs330: Here some detailed info about extending the cycle counter to 64bit: https://github.com/TeensyUser/doc/wiki/Using-the-cycle-counter.

Anyway, the problem you originally wanted to solve shouldn't have an overflow issue.

E.g. this:
Code:
uint32_t start = micros();
delay(1000);
uint32_t delta = micros() - start;
Serial.println(delta);

will always print the correct delta regardless of an overflow of micros() as long as the delay is less than ~71.6 minutes

I thought that 'uint math' might be the trouble - but there was no code to see and it wasn't the question. And as noted I did have an overflow problem where a test was running over 7 seconds and was giving really odd numbers. so I could have used this.

So in the posted sketch there is a printed demonstration of " thisCycles-lastCycles ": Cycles since last=2400000000
Code:
Current getCycleCounter()=15362663747595 0xdf8e61d900b  Cycles since last=2400000051
	Current micros64()=25608440001 0x5f661c8c1
Current getCycleCounter()=15365063747595 0xdf9752aa80b  Cycles since last=2400000000
	Current micros64()=25612440001 0x5f69ed1c1
Current getCycleCounter()=15367463747595 0xdfa0437c00b  Cycles since last=2400000000
	Current micros64()=25616440001 0x5f6dbdac1
Current getCycleCounter()=15369863747595 0xdfa9344d80b  [B]Cycles since last=2400000000[/B]

In prior posts were notes that 7 seconds was a critical time - and the code prints each 4 seconds - plus or minus a couple dozen 'cycles'
 
Official time APIs using units not seconds-based are cool in the short term, but the long term effect on the software ecosystem is a lot of code which breaks when clock speeds change. Or a lot of #ifdef or extra error-prone work in user code. Not so cool.

Even if this were to be done, I'm pretty sure the code in msg #5 wouldn't be up to the task. For example, consider a program which wants to precisely log the time when some sort of unusual event occurs. As I understand that code, admittedly from only a quick look, it appears to assume it will be called more frequently than once every 2^31 / F_CPU seconds. That's approx 3.6 seconds today, and will become even shorter when we get faster chips.

This sort of code is tricky to get right. In fact, I'm not even 100% confident we're really using the ARM exclusive access instructions properly. I'm not going to look into that now, but I opened an issue to remember it for a rainy day.

https://github.com/PaulStoffregen/cores/issues/609

But just to be clear, if we someday add a more precise time API, it will be nanos() and elapsedNanoseconds, to complement the delayNanoseconds() we already have. People who want the best the hardware can do are welcome to access the ARM_DWT_CYCCNT hardware, but I'm not planning to ever add official APIs for non-seconds timing.

Opps, makes sense ... just a thought ... some part of one at least.

Indeed it needs to be called before rollover/passing the rollover of the last call, just like ARM_DWT_CYCCNT itself. Prior posts made note of that, everything has its limits.

Rollover fails at 2^31, or after (2^32)-2 when using unsigned 32 bit vars?

Interesting what might be wrong with Issue_609 and the micros() usage. Quick scan now found similar old links for reference usage.
 
..Indeed it needs to be called before rollover/passing the rollover of the last call, just like ARM_DWT_CYCCNT itself.

Looks like you are interested in this stuff :) Here some code which utilizes the seldom used RTC interrupt to update it every second. https://github.com/TeensyUser/doc/w...ock#the-periodic-timer-of-the-real-time-clock.

BTW: This writeup also shows how to implement a std::chrono compliant clock based on the cycle counter which syncs o the RTC. It counts time in ns since 1.1.1970. Unfortunately c++14 doesn't have a lot support for those clocks, so this is more a interesting exercise than being really useful. c++20 will add some stuff to make those std::chrono clocks much more useful.
 
Re: OP problem the p#10 code is a good solution with uint32_t math as used here - as long as micros() is high enough resolution and doesn't overflow during the timing interval.

p#5 code as abused from @luni code ( when checked/called before overflow during timing ) can be used for extended timing over longer intervals with microsecond or ARM_DWT_CYCCNT resolution.

Looks like you are interested in this stuff :) Here some code which utilizes the seldom used RTC interrupt to update it every second. https://github.com/TeensyUser/doc/w...ock#the-periodic-timer-of-the-real-time-clock.

BTW: This writeup also shows how to implement a std::chrono compliant clock based on the cycle counter which syncs o the RTC. It counts time in ns since 1.1.1970. Unfortunately c++14 doesn't have a lot support for those clocks, so this is more a interesting exercise than being really useful. c++20 will add some stuff to make those std::chrono clocks much more useful.

Intermittently interested :) ... with undo intensity, that fades on the next distraction.
> Would be interesting to try that real hardware RTC 1 second interrupt instead of the 1000X more frequent millis() systick faux interrupt.
--> Then millis() ,like micros(), could return offset based on ARM_DWT_CYCCNT since last 1 sec tick - but that would make millis() take more cycles to return, and be accurate as long as interrupts were on at least once per second.
Doing some things like WFI is dangerous as the millis() faux interrupt won't 'wake', and some things disable interrupts across multiple systick_millis_count updates and micros() and millis() slip - like timing LittleFS PROG Flash I/O times.
Micros being based on ARM_DWT_CYCCNT could actually track those missed ticks and round up - but that gets confusing with millis() lossy behavior and was coded out.

Good writeup, as shown above I scanned it as posted and even managed to find it on local machine (thanks Sublime global search) to get to where src was on github/TeensyUser/doc
 
it should still work with a single overflow during the timing period right? just not if it went back to zero twice during the interval.
 
it should still work with a single overflow during the timing period right? just not if it went back to zero twice during the interval.

No, once an overflow of a 'reference point' happens it cannot be detected or recovered.

uint math (p#10) gives the absolute diff between two points 'in time'. As soon as the 'micros()' part of this "micros() - start;" rolls over the difference observed that went up to 2^32 starts over, counting at 0,1,2,...

From above see:
Code:
uint32_t start = micros();
delay(1000);
uint32_t delta = micros() - start;
Serial.println(delta);

{pardon coding on the web and not tested} This could be quickly observed with 8 bit math:
Code:
uint8_t start = micros();
delayMicroseconds(26);
uint8_t delta = micros() - start;
Serial.println(delta);

versus:
Code:
uint8_t start = micros();
delayMicroseconds(260);
uint8_t delta = micros() - start;
Serial.println(delta);
 
But what about what Paul was saying above? "The usual approach is to subtract 32 bit unsigned integers. If the rollover is from 0xFFFFFFFF to zero (eg, does not roll over "early" with maximum value less than all bits 1s), and the total elapsed is less than 2^31, then subtracting always results in the correct answer." Doesn't this mean that as long as my delta is shorter than the micros() overrun it will return the correct answer?

I am still learning about this stuff but it seems that there is a contradiction, with you saying that uint subtraction is absolute value, and paul saying it rolls over. I haven't actually gotten to testing the code yet, but this stuff is fascinating regardless. Thanks again for the help.
 
But what about what Paul was saying above? "The usual approach is to subtract 32 bit unsigned integers. If the rollover is from 0xFFFFFFFF to zero (eg, does not roll over "early" with maximum value less than all bits 1s), and the total elapsed is less than 2^31, then subtracting always results in the correct answer." Doesn't this mean that as long as my delta is shorter than the micros() overrun it will return the correct answer? I am still learning about this stuff but it seems that there is a contradiction, with you saying that uint subtraction is absolute value, and paul saying it rolls over. I haven't actually gotten to testing the code yet, but this stuff is fascinating regardless. Thanks again for the help.

Okay - we are talking about different halves of the middle :)

Yes, it can run off the end 0xffff ffff ONE time

The math takes care of that

The Overflow I was talking about in p#17 is the NEW time of micros() passing the OLD time of start. That will Overflow the Difference from subtraction and go from 2^32-1 == 0xffff ffff to Zero and then start counting at 0,1,2,3...
 
okay, I thought that might be the case. in my case this is no problem. The maximum delta will be like at most 3 minutes. this is an easier solution than I was expecting to be honest. just need to change a couple variable types and make sure they are cast as floats in the right place.
 
...

This sort of code is tricky to get right. In fact, I'm not even 100% confident we're really using the ARM exclusive access instructions properly. I'm not going to look into that now, but I opened an issue to remember it for a rainy day.
https://github.com/PaulStoffregen/cores/issues/609

...

Posted on the issue 609. Will post link to this post as the code is edited.

Minor output change to code there. Seems to be working as expected, interrupt triggering on micros() atomic read effort is rare. When it happens micros() always agrees after.
Some few times when millis() changes micros lags behind : that was a change from call order as it changes on return from MYmicros();, now showing that value.

Output: capturing this many in a group is rare - it is cyclic - but not repeating pattern. b==Before and a==After MYmicros() call::
Code:
millis() b=130605 .vs. a=130606 > in 108403978 loop()s	 :: us b=130605999 .vs. a=130606000 .vs. my=130606000
millis() b=130609 .vs. a=130610 > in 108418367 loop()s	 :: us b=130609999 .vs. a=130610000 .vs. my=130610000
millis() b=130612 .vs. a=130613 > in 108429142 loop()s	 :: us b=130612999 .vs. a=130613000 .vs. my=130612999
millis() b=130614 .vs. a=130615 > in 108436303 loop()s	 :: us b=130614999 .vs. a=130615000 .vs. my=130615000
millis() b=130617 .vs. a=130618 > in 108447078 loop()s	 :: us b=130617999 .vs. a=130618000 .vs. my=130617999
 Off CNT and Changed millis() b=130619 .vs. a=130620	 :: micros() b=130619999 .vs. a=130620000
	 [B]Off CNT :: to=1 .vs. in=2 [/B] >IntTrig=361 in 108454239 loop()s : MYus=130620000
millis() b=130621 .vs. a=130622 > in 108461386 loop()s	 :: us b=130621999 .vs. a=130622000 .vs. my=130621999
millis() b=130623 .vs. a=130624 > in 108468547 loop()s	 :: us b=130623999 .vs. a=130624000 .vs. my=130624000
millis() b=130626 .vs. a=130627 > in 108479322 loop()s	 :: us b=130626999 .vs. a=130627000 .vs. my=130626999
millis() b=130633 .vs. a=130634 > in 108504554 loop()s	 :: us b=130633999 .vs. a=130634000 .vs. my=130634000
millis() b=130636 .vs. a=130637 > in 108515329 loop()s	 :: us b=130636999 .vs. a=130637000 .vs. my=130636999
millis() b=130638 .vs. a=130639 > in 108522490 loop()s	 :: us b=130638999 .vs. a=130639000 .vs. my=130639000
millis() b=130641 .vs. a=130642 > in 108533265 loop()s	 :: us b=130641999 .vs. a=130642000 .vs. my=130641999
 Off CNT and Changed millis() b=130643 .vs. a=130644	 :: micros() b=130643999 .vs. a=130644000
	 [B]Off CNT :: to=1 .vs. in=2[/B]  >IntTrig=362 in 108540426 loop()s : MYus=130644000
millis() b=130645 .vs. a=130646 > in 108547573 loop()s	 :: us b=130645999 .vs. a=130646000 .vs. my=130645999
millis() b=130647 .vs. a=130648 > in 108554734 loop()s	 :: us b=130647999 .vs. a=130648000 .vs. my=130648000
millis() b=130650 .vs. a=130651 > in 108565509 loop()s	 :: us b=130650999 .vs. a=130651000 .vs. my=130650999
millis() b=130657 .vs. a=130658 > in 108590741 loop()s	 :: us b=130657999 .vs. a=130658000 .vs. my=130658000
millis() b=130660 .vs. a=130661 > in 108601516 loop()s	 :: us b=130660999 .vs. a=130661000 .vs. my=130660999
millis() b=130662 .vs. a=130663 > in 108608677 loop()s	 :: us b=130662999 .vs. a=130663000 .vs. my=130663000
millis() b=130665 .vs. a=130666 > in 108619452 loop()s	 :: us b=130665999 .vs. a=130666000 .vs. my=130665999
millis() b=130667 .vs. a=130668 > in 108626613 loop()s	 :: us b=130667999 .vs. a=130668000 .vs. my=130668000
 Off CNT and Changed millis() b=130671 .vs. a=130672	 :: micros() b=130671999 .vs. a=130672000
	 [B]Off CNT :: to=1 .vs. in=2 [/B] >IntTrig=363 in 108641002 loop()s : MYus=130672000
millis() b=130672 .vs. a=130673 > in 108644535 loop()s	 :: us b=130672999 .vs. a=130673000 .vs. my=130673000
millis() b=130678 .vs. a=130679 > in 108666153 loop()s	 :: us b=130678999 .vs. a=130679000 .vs. my=130679000
millis() b=130681 .vs. a=130682 > in 108676928 loop()s	 :: us b=130681999 .vs. a=130682000 .vs. my=130681999

Code:
void setup() {
  // C:\T_Drive\tCode\Interrupts\Micros_IsrDetect\Micros_IsrDetect.ino
  Serial.begin(115200);
  while (!Serial && millis() < 4000 );
  Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);

}

uint32_t inMicros = 0;
uint32_t toMicros = 0;
uint32_t lpCnt = 0;
uint32_t MWloop = 0;
void loop() {
  lpCnt++;
  uint32_t bMillis, aMillis;
  uint32_t bMicros, aMicros, Myus;
  bMicros = micros();
  bMillis = millis();
  Myus = MYmicros();
  aMillis = millis();
  aMicros = micros();
  toMicros++;
  if ( toMicros != inMicros ) {
    if ( bMillis != aMillis ) {
      MWloop++;
      Serial.printf("\n Off CNT and Changed millis() b=%u .vs. a=%u", bMillis, aMillis );
      Serial.printf("\t :: micros() b=%u .vs. a=%u", bMicros, aMicros );
      Serial.printf("\n\t Off CNT :: to=%u .vs. in=%u  >IntTrig=%lu in %u loop()s : MYus=%u", toMicros, inMicros, MWloop, lpCnt, Myus );
    }
  }
  else if ( bMillis != aMillis ) {
    Serial.printf("\nmillis() b=%u .vs. a=%u > in %u loop()s", bMillis, aMillis, lpCnt );
    Serial.printf("\t :: us b=%u .vs. a=%u .vs. my=%u", bMicros, aMicros, Myus );
  }

  inMicros = toMicros = 0;
  if ( !(lpCnt%10000000) ) delay (10000);
}


#include "arm_math.h" // micros() synchronization
extern volatile uint32_t systick_millis_count;
extern volatile uint32_t systick_cycle_count;
uint32_t MYmicros(void)
{
  uint32_t MYsystick_safe_read;  // micros() synchronization
  uint32_t smc, scc;
  do {
    __LDREXW(&MYsystick_safe_read);
    inMicros++;
    smc = systick_millis_count;
    scc = systick_cycle_count;
  } while ( __STREXW(1, &MYsystick_safe_read));
  uint32_t cyccnt = ARM_DWT_CYCCNT;
  asm volatile("" : : : "memory");
  uint32_t ccdelta = cyccnt - scc;
  uint32_t frac = ((uint64_t)ccdelta * scale_cpu_cycles_to_microseconds) >> 32;
  if (frac > 1000) frac = 1000;
  uint32_t usec = 1000 * smc + frac;
  return usec;
}
 
Status
Not open for further replies.
Back
Top