Nanosecond delay details and counting nanoseconds functionality?

madmyers

Active member
For a Teensy 4.1 with Arduino ---

I've got some situations where I need delays of tens of nanoseconds. I've found the delayNanoseconds() function, but the documentation is not detailed.

Is there a location with hard specifics on this function's operation? I know the Teensy page states it will not be exact, but are more specifics available somewhere? For example, for a constant argument, will it ever delay less than the requested amount? For any given constant delay, what's the maximum it can be off (based on the clock)? This would be very helpful.

Also, I'm looking for a way to measure nanoseconds that have passed -- kind of like a time_t. I'd like to do something like

Code:
uint32_t diff;
uint32_t start = getNanoseconds();

// do stuff
// do stuff
// do stuff

diff = getNanoseconds() - start; // yes, I know a uint32_t only holds about 4.29 s of nanoseconds and this would need to deal with wrapping

Thanks for any thoughts on either of these questions.
 
Thanks.
I'm away from my Teensy at the moment but did some googling.

Does this seem to be an accurate way to use the DWT?


Summary:
uint32_t cycles; /* number of cycles */

KIN1_InitCycleCounter(); /* enable DWT hardware */
KIN1_ResetCycleCounter(); /* reset cycle counter */
KIN1_EnableCycleCounter(); /* start counting */
foo(); /* call function and count cycles */
cycles = KIN1_GetCycleCounter(); /* get cycle counter */
KIN1_DisableCycleCounter(); /* disable counting if not used any more */
 
On Teensy using Arduino IDE, you can just use ARM_DWT_CYCCNT.

For example:

Code:
static uint32_t previous = 0;

void setup() {
  Serial.begin(9600); // USB serial on Teensy; the baud rate is ignored
}

void loop() {
  uint32_t n = ARM_DWT_CYCCNT;
  Serial.println(n - previous); // cycles elapsed since the previous pass
  previous = n;
  delayMicroseconds(15);
}
 
Oh, do I need this in setup still?

Code:
ARM_DEMCR |= ARM_DEMCR_TRCENA;
ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;

Saw that in another post
 
Following up with a clarification about delayNanoseconds ---

Does delayNanoseconds adapt for a different CPU speed? For example, perhaps it's coded to work at 600 MHz but not at 912 MHz.

I asked because when running at 912 MHz, I noticed

Code:
  Serial.println(F_CPU);

Displays 600000000.
 
Don't use F_CPU; use F_CPU_ACTUAL, which despite being all caps is a variable, not a macro, and it gets updated even when you change clock speeds at run time. You can search the forum for more info.
 
Thank you. With your info, I can see the function

Code:
static inline void delayNanoseconds(uint32_t nsec)
{
    uint32_t begin = ARM_DWT_CYCCNT;
    uint32_t cycles =   ((F_CPU_ACTUAL>>16) * nsec) / (1000000000UL>>16);
    while (ARM_DWT_CYCCNT - begin < cycles) ; // wait
}

Uses F_CPU_ACTUAL, so it should adapt to different clock speeds.
 
At 600mhz 6 nano seconds is 10 clock cycles.

I'm curious if this does exactly 10 clock cycles or sometimes 11?

while (ARM_DWT_CYCCNT - begin < cycles) ;

I believe NOPs are inconsistent because the Teensy can dual-issue two instructions at once
 
I'm curious if this does exactly 10 clock cycles or sometimes 11?

Just this morning I looked into mysterious results measuring the speed of writing 3 versus 4 bytes. Here's that other thread:


While a 1-cycle difference is probably something happening at a much lower level, perhaps bus arbitration or limitations of branch prediction, that thread shows how minor changes in the code cause the compiler to produce very different results, and how this simple measurement misses some of the work because of how the compiler optimizes the code.


At 600mhz 6 nano seconds is 10 clock cycles.

At 600 MHz, each clock is 1.667 ns. So 10 clocks would be 16.67 ns.
 
At 600mhz 6 nano seconds is 10 clock cycles.
How do you figure that?

uint32_t cycles = ((600000000>>16) * 6) / (1000000000UL>>16);
simplifies to:
uint32_t cycles = ((9155) * 6) / 15258;
which evaluates to 3.

This code is only intended to delay for a period of time, not any specific number of instructions (since they don't map "cleanly" to cycles).
 