Does the Teensy 4.1 Suffer from the Same delayMicroseconds() Bug as the Arduino?

BrendaEM

Active member
Hi,

I have been writing a stepper-motor program whose timing is a bit persnickety. On the Arduino there is an issue where delayMicroseconds() rolls over at 14 bits, so special care is required if you want to delay longer than 16383 microseconds.

Does the Teensy 4.1 suffer from the same issue?
What is the rollover limit?

(Thank you)

ref: https://www.arduino.cc/reference/en/language/functions/time/delaymicroseconds/

I use something like this, but I don't know if it's even necessary.

Code:
void delayMS( unsigned long len )
// Patched Microseconds Delay to Include Longer than 16383. Apparently DelayMicroseconds is only 14-Bit, as in WTF?!
// Ref: https://www.arduino.cc/reference/en/language/functions/time/delaymicroseconds/
{
  volatile unsigned long millisecs;
  volatile unsigned long microsecs;

  // Delay Longer and Adjust
  if ( len > 8192 )
  {
    // Calculate Miliseconds Delay
    millisecs = len / 1000;

    // Delay
    delay ( millisecs );

    // Calculate Remainder
    microsecs = len % 1000;

    // Delay Remainder
    delayMicroseconds ( microsecs ) ;
  }
  else
  {
    delayMicroseconds ( len );
  }
}
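
For comparison, the same workaround can be written as a loop over fixed-size chunks, which keeps every individual call under whatever limit the platform has. This is only a sketch: delayMicrosChunked and its delay_fn parameter are hypothetical names, not part of any Arduino or Teensy core, and the delay function is passed in so the splitting logic can be checked off-target.

```cpp
#include <cassert>
#include <cstdint>

// Split a long delay into chunks no larger than chunk_us and hand each chunk
// to delay_fn. On hardware, delay_fn would simply forward to delayMicroseconds().
template <typename DelayFn>
void delayMicrosChunked(uint32_t usec, uint32_t chunk_us, DelayFn delay_fn) {
    assert(chunk_us > 0);        // a zero chunk size would never make progress
    while (usec > chunk_us) {
        delay_fn(chunk_us);      // full-size chunk
        usec -= chunk_us;
    }
    if (usec > 0) {
        delay_fn(usec);          // remainder, always <= chunk_us
    }
}
```

On an AVR board the call might look like delayMicrosChunked(len, 16383, [](uint32_t us) { delayMicroseconds(us); }); the 16383 comes from the Arduino reference page linked above.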
 
The code for delayMicroseconds() from cores\Teensy4\core_pins.h is shown below. The maximum value that works correctly is however many usec fit in 2^32 clock cycles. At 600 MHz (600 cycles per usec), that is (2^32)/600 = 7158278.83, so 7158278 usec, or about 7.158 seconds. Per the comment in the code, if usec is large, it would be better to break the delay into smaller chunks.

Code:
static inline void delayMicroseconds(uint32_t usec)
{
    uint32_t begin = ARM_DWT_CYCCNT;
    uint32_t cycles = F_CPU_ACTUAL / 1000000 * usec;
    // TODO: check if cycles is large, do a wait with yield calls until it's smaller
    while (ARM_DWT_CYCCNT - begin < cycles) ; // wait
}
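
The 7.158 second figure can be checked on a desktop compiler. The sketch below repeats the same 32-bit arithmetic as the core function; max_delay_usec and delay_cycles are hypothetical helper names used only for this illustration, and the 600 MHz clock is the default mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Largest usec argument for which (f_cpu_hz / 1000000) * usec still fits in 32 bits.
inline uint32_t max_delay_usec(uint32_t f_cpu_hz) {
    assert(f_cpu_hz >= 1000000u);              // avoid dividing by zero below
    return UINT32_MAX / (f_cpu_hz / 1000000u);
}

// Cycle count computed the same way as in delayMicroseconds(); the unsigned
// multiply silently wraps modulo 2^32 when usec is too large.
inline uint32_t delay_cycles(uint32_t f_cpu_hz, uint32_t usec) {
    return f_cpu_hz / 1000000u * usec;
}
```

At 600 MHz, max_delay_usec() gives 7158278. One microsecond more and delay_cycles() wraps around to 104 cycles, so the symptom of going over the limit would be a delay that is far too short rather than one that is slightly off.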

Alternative (tested briefly) that should work for the max possible (2^32)-1 usec, and calls yield() every 1 ms, the same as delay() does.

Code:
static inline void delayMicroseconds(uint32_t usec)
{
  uint32_t begin = ARM_DWT_CYCCNT;
  while (usec > 1000) {
      while (ARM_DWT_CYCCNT - begin < (F_CPU_ACTUAL/1000)) ; // wait 1 ms
      usec -= 1000;
      yield();
      begin = ARM_DWT_CYCCNT;
  }
  uint32_t cycles = F_CPU_ACTUAL / 1000000 * usec;
  while (ARM_DWT_CYCCNT - begin < cycles) ; // wait remaining usec
}
 
@joepasquariello , I am sorry that it took so long to get back to you. I've been ill, again. It's like not owning an old Jaguar, but renting it from your mechanic, or in this case, doctors. LOL!

Thank you for the reply.

From what I have read, the issue here is different from the Arduino's, but perhaps it won't be a problem for my stepper-motor timing application.

[On the Arduino, the symptom of too long a microsecond delay is: hardly any delay at all, perhaps none]

delayNanoseconds() seems cool, too. It also mentions fewer problems with interrupts, which generally have to be disabled on the Arduino if you want to clock something smoothly.
 
Code:
static inline void delayMicroseconds(uint32_t usec)
{
  uint32_t begin = ARM_DWT_CYCCNT;
  while (usec > 1000) {
      while (ARM_DWT_CYCCNT - begin < (F_CPU_ACTUAL/1000)) ; // wait 1 ms
      usec -= 1000;
      yield();
      begin = ARM_DWT_CYCCNT;
  }
  uint32_t cycles = F_CPU_ACTUAL / 1000000 * usec;
  while (ARM_DWT_CYCCNT - begin < cycles) ; // wait remaining usec
}
Note that you're sampling ARM_DWT_CYCCNT at different points inside the loop, with possibly substantial processing in between (the call to yield()). The final value sampled before exiting the 1 ms wait should be transferred into the "begin" variable; otherwise you're "dropping" cycles that aren't accounted for.
 
@BrendaEM - for sure delayMicroseconds() on Teensy is accurate up to a 32-bit cycle count - it's just that the clock runs faster, so those 32 bits run out sooner.

As for tracking CYCCNT, it seems that if begin is updated as follows, no counts go missing when the remaining usec is updated:
Code:
static inline void delayMicroseconds(uint32_t usec)
{
  uint32_t begin = ARM_DWT_CYCCNT;
  while (usec > 1000) {
      while (ARM_DWT_CYCCNT - begin < (F_CPU_ACTUAL/1000)) ; // wait 1 ms
      begin = ARM_DWT_CYCCNT;
      usec -= 1000;
      yield();
  }
  uint32_t cycles = F_CPU_ACTUAL / 1000000 * usec;
  while (ARM_DWT_CYCCNT - begin < cycles) ; // wait remaining usec
}
It will still miss a few ticks between the while test failing and the assignment to begin - but there are 600 ticks in a us by default. Doing an intermediate assignment inside the test would capture them, but it would make the while() test take longer each time. From observation, the ARM_DWT_CYCCNT transfer takes ~3 ticks.
Doing loop bounds of 7000 would get back perhaps 21 ticks each 7 seconds - but a delay that long wouldn't normally need to be exact to the us.
 
There's still the risk of an interrupt happening between those lines, resulting in an indeterminate number of ticks being lost. The same sampled value that ends the loop needs to be used as the "begin" value for the next, like so:
Code:
static inline void delayMicroseconds(uint32_t usec)
{
  uint32_t begin = ARM_DWT_CYCCNT;
  while (usec > 1000) {
      while (1) {
          uint32_t now = ARM_DWT_CYCCNT;
          if (now - begin >= (F_CPU_ACTUAL/1000)) { // 1 ms has elapsed
              begin = now;
              break;
          }
      }
      usec -= 1000;
      yield();
  }
  uint32_t cycles = F_CPU_ACTUAL / 1000000 * usec;
  while (ARM_DWT_CYCCNT - begin < cycles) ; // wait remaining usec
}
This won't make the while loop any slower, since the variables are all held in registers.
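
To see how much the choice of "begin" matters, here is a rough host-side simulation of the two approaches discussed in this thread. Everything in it is an assumption made for illustration: FakeCounter stands in for ARM_DWT_CYCCNT, each read is pretended to cost 3 cycles, yield() is pretended to cost 5000 cycles, and the clock is taken as 600 MHz. The 1 ms wait exits once at least one millisecond of cycles has passed (a >= test).

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kCyclesPerUs = 600;                 // assumes a 600 MHz core clock
constexpr uint32_t kCyclesPerMs = kCyclesPerUs * 1000;
constexpr uint32_t kYieldCost   = 5000;                // made-up cost of one yield() call

// Host-side mock of a free-running cycle counter.
struct FakeCounter {
    uint32_t now = 0;
    uint32_t read() { now += 3; return now; }          // each read advances time a little
    void yield() { now += kYieldCost; }                // pretend yield() did some work
};

// Re-samples the counter after yield(), so time spent in yield() is never counted.
uint32_t elapsedResample(uint32_t usec) {
    assert(usec <= 1000000u);                          // keep the simulated total within 32 bits
    FakeCounter c;
    uint32_t begin = c.read();
    while (usec > 1000) {
        while (c.read() - begin < kCyclesPerMs) {}     // wait 1 ms
        usec -= 1000;
        c.yield();
        begin = c.read();                              // cycles spent in yield() are dropped
    }
    while (c.read() - begin < kCyclesPerUs * usec) {}  // wait remaining usec
    return c.now;                                      // total simulated cycles
}

// Reuses the sample that ended each 1 ms wait as the start of the next one.
uint32_t elapsedCarry(uint32_t usec) {
    assert(usec <= 1000000u);
    FakeCounter c;
    uint32_t begin = c.read();
    while (usec > 1000) {
        for (;;) {
            uint32_t now = c.read();
            if (now - begin >= kCyclesPerMs) { begin = now; break; }
        }
        usec -= 1000;
        c.yield();                                     // counted as part of the next 1 ms
    }
    while (c.read() - begin < kCyclesPerUs * usec) {}  // wait remaining usec
    return c.now;
}
```

For a 5500 us request the target is 3,300,000 cycles. In this model elapsedResample() overshoots by at least five yield() calls' worth (25,000 cycles), while elapsedCarry() lands within a few cycles of the target. Real hardware will differ in the details, but the direction of the error should be the same.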
 
Thank you @jmarsh and @defragster.

Is it that the Teensy follows the Arduino's 8-bit interrupt tradition, where it will handle an interrupt right in the middle of an integer or long transfer, thereby splitting it? I wish there were a global static keyword, so that all variables were static unless otherwise marked.

I am using I2C for a display, and I don't know if I2C uses interrupts internally, but because I am not using rotary encoders in my project, no others are used. I even went so far as just using pots for setting variables on my device project. Well, seeing that I posted it elsewhere, it's a 5-axis camera controller.

It's been 57 degrees where I do electronics, but I need to get out there and experiment--in the kitchen where I do electronics : )

This is an issue for stepper motors, because if the delay between pulses/transitions suddenly drops--they will stall.
It's about the delay, the area above the acceleration curve : )
 
Is it that the Teensy follows the Arduino's 8-bit interrupt tradition, where it will handle an interrupt right in the middle of an integer or long transfer, thereby splitting it?
Since the Teensy has a 32-bit CPU, any variable of that size or smaller is transferred in a single instruction, which can't be interrupted.