A Higher Resolution Micros() For The T4.0

Status
Not open for further replies.

Arctic_Eddie

Well-known member
Is it possible or feasible to create another clock function for the T4 with higher resolution? It might be called Nanos() with a 10 nsec resolution. I envision something like F_CLK / 6 with overflow being accumulated in a uint64_t variable. I tried a simple sketch with that variable size but it won't work with Serial.print(). The error mentions overloaded function. I assume that means the print function cannot accept a 64-bit value.

I'm not at all familiar with the T4 internal registers so would not attempt something like this.
 
Yup, the cycle counter is the answer. But it's only 32 bits, so it rolls over every ~7 seconds when running at 600 MHz.
 
I knew it would not be feasible with 32-bits. A 64-bit value counting 10 nsec would last ~5849 years. Is there any other variable size between 32 and 64 bits that can be handled? Would the construction of micros() offer any clue as how to create nanos()?
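
The rollover figures quoted in this thread can be checked with a few lines of arithmetic. This is a minimal sketch in plain C++ that runs on a PC; the helper names are made up for illustration and are not part of the Teensy core, and the year figure assumes 365-day years.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Seconds until a counter that is `bits` wide wraps when it ticks at `tick_hz`.
// Hypothetical helper for illustration only.
double rollover_seconds(unsigned bits, double tick_hz) {
    assert(bits <= 64 && tick_hz > 0.0);
    return std::ldexp(1.0, static_cast<int>(bits)) / tick_hz;  // 2^bits / tick_hz
}

// The same span expressed in 365-day years.
double rollover_years(unsigned bits, double tick_hz) {
    return rollover_seconds(bits, tick_hz) / (365.0 * 24.0 * 3600.0);
}
```

`rollover_seconds(32, 600e6)` comes out at roughly 7.16 s, and `rollover_years(64, 100e6)` (a 10 nsec tick) at roughly 5849 years, which matches the numbers above.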
 
There is simply no choice, as the Cyclecounter is hardware.
 
The cycle counter is probably crucial for internal operation, so changing its value might kill a program. I would expect the value at that address, 0xE0001004, to be related to micros().

I guess we'll have to wait for a Super Teensy 64-bit 1GHz product from Paul.
 
Even if overflow of that 32-bit cycle counter can trigger increment of another counter, you still won't have a coherent 64-bit counter with nanosecond resolution. Best to use the cycle counter for measuring short periods where you need the resolution, and find or define another one with lower frequency for measuring longer periods.
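
For short periods, plain unsigned 32-bit subtraction gives the right elapsed count even across one rollover, provided the interval is shorter than one full wrap (about 7 s at 600 MHz). Here is a minimal host-side sketch; `elapsed_cycles` and `cycles_to_ns` are illustrative names, and on the Teensy the two samples would presumably come from the cycle counter.

```cpp
#include <cassert>
#include <cstdint>

// Cycles between two samples of a free-running 32-bit counter.
// Modulo-2^32 arithmetic absorbs a single wrap between the samples.
uint32_t elapsed_cycles(uint32_t start, uint32_t end) {
    return end - start;
}

// Convert a 32-bit cycle count to nanoseconds using 64-bit integer math.
// cycles * 1e9 stays below 2^64 for any 32-bit cycle count.
uint64_t cycles_to_ns(uint32_t cycles, uint32_t cpu_hz) {
    assert(cpu_hz > 0);
    return (static_cast<uint64_t>(cycles) * 1000000000ULL) / cpu_hz;
}
```

The subtraction trick only hides one wrap, so anything longer than a few seconds still needs a slower timer or a 64-bit extension.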
 
I see from the ARM reference in the previous link that all the DWT registers are 32 bits wide, judging by the 4-byte address step between items.

And just for kicks, I wanted to prove to myself that 64-bit integers actually exist. Here's a sketch to count overflows using a modulo detection scheme. With a divisor of 0x100'0000, ctr2 is 256 and 0x1000'0000 gives a value of 16. However, you can't print a 64-bit value with the Serial function as it's not one of the allowed input types.

Code:
// Test a 64-bit integer to see if it works

uint64_t ctr1 = 0;
uint16_t ctr2 = 0;

#define TEENSY_DELAY    3000        // Serial port does not start immediately
#define BAUD            115200      // How fast the port will output
#define LED_PIN         13          // My blinker

void setup() 
{
    Serial.begin( BAUD );
    while ( !Serial && ( millis() < TEENSY_DELAY ) );   // Wait for serial connect
    pinMode( LED_PIN, OUTPUT );
    digitalWriteFast( LED_PIN, LOW );
}

void loop() 
{
    // Count through the full 32-bit range using a 64-bit loop index
    for( uint64_t i = 0; i <= 0xffff'ffff; i++ )
    {
        ctr1++;
        if( ctr1 % 0x100'0000 == 0  )   // Check for lower order overflow
        {
            // Toggle LED at each overflow
            digitalWriteFast( LED_PIN, !digitalReadFast( LED_PIN ) );
            ctr2++;                     // Count overflows
        }
    }
    Serial.println( ctr2 );             // Proof
    while( 1 ) {};                      // Stall
}
 
Though only 32 bits wide, the GPT timers can run at 24 MHz (and even higher with some fiddling that will also affect the PIT timers).

I think the T4 quad timer can cascade (roll over) counts into the next timer register, so you can get 32-, 48-, or 64-bit counting. See
https://github.com/manitou48/teensy4/blob/master/qtmr_cascade.ino
and maybe you can read the registers atomically.

There may be other T4 timers that cascade.

Also, in a timer overflow ISR you could increment your own 32-bit extension, but the rollover race condition is messy.
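
One common way to tame that rollover race is to read the software high word, then the hardware count, then the high word again, and retry if the high word changed in between. The sketch below is host-side C++ with made-up names (`read_extended`, `read_low`). It assumes the overflow ISR runs promptly and is not blocked by the reader; that assumption needs checking on real hardware.

```cpp
#include <cassert>
#include <cstdint>

// `high` is assumed to be incremented by the timer-overflow ISR.
// `read_low` is any callable that returns the current 32-bit hardware count.
template <typename ReadLow>
uint64_t read_extended(const volatile uint32_t& high, ReadLow read_low) {
    uint32_t h1, lo, h2;
    do {
        h1 = high;        // high word before sampling the hardware
        lo = read_low();  // hardware count
        h2 = high;        // high word after sampling
    } while (h1 != h2);   // an overflow landed in between: sample again
    return (static_cast<uint64_t>(h1) << 32) | lo;
}
```

If the reader runs with interrupts masked, or at a higher priority than the overflow ISR, this loop cannot see the overflow and a different scheme is needed.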
 
Thanks for the link. I don't quite understand it all, but it's encouraging that these kinds of things can be done.

How would one go about reading a DWT register at address 0xE0001004? I'm guessing a 32-bit pointer aimed at that address.
 
this is pre-defined, me thinks
#define ARM_DWT_CYCCNT *(volatile uint32_t *)0xE0001004
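
To make the pointer idea concrete, here is a minimal sketch of timing a piece of code through a volatile pointer to a free-running counter. `measure_cycles` is a hypothetical helper, not a Teensy core function. On a PC the pointer has to aim at an ordinary variable standing in for the register; on the T4 it would presumably aim at 0xE0001004, which is what the define above already does.

```cpp
#include <cassert>
#include <cstdint>

// Run `work` and return how many counter ticks passed while it ran.
// `counter` points at a free-running 32-bit register (or a stand-in variable).
template <typename Fn>
uint32_t measure_cycles(const volatile uint32_t* counter, Fn work) {
    assert(counter != nullptr);
    const uint32_t start = *counter;  // volatile read: a real load every time
    work();
    return *counter - start;          // unsigned math tolerates one wrap
}
```

The `volatile` qualifier is what stops the compiler from caching the first read and returning zero.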
 
Guess I can't make any use of that location.

Why do you think that?
Sure, a counter running at 600 MHz will never have the same value at reset. But I can't think of a reason why that would be needed.
Printing: yes, Serial.print does not support 64-bit values.
 
Is there any other variable size between 32 and 64 bits that can be handled?

The PIT timers might have a way to use 2 of them to get a 64 bit count. I have not personally tried to use this feature, so the best I can do is mention I've seen it in the reference manual.


I guess we'll have to wait for a Super Teensy 64-bit 1GHz product from Paul.

The upcoming 1 GHz chip will still be 32 bits. ;)
 
Here is a sketch that chains the PIT timers to get 64 bits, and I think (check the reference manual) the read of the timer registers is atomic:
https://github.com/manitou48/teensy4/blob/master/pit_micros64.ino
The PIT runs at 24 MHz by default, but can be configured for higher speeds (fiddling affects the GPT as well).

Errata: https://www.nxp.com/docs/en/nxp/errata/IMXRT1060CE.pdf

printf will handle 64-bit values: Serial.printf("%llu us\n", pit_cycles() / 24);
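
The same printf-style formatting can be tried on a PC. This minimal sketch formats a 64-bit value into a string with the standard `PRIu64` macro; `format_u64` is a made-up helper name. On the Teensy the resulting text could then presumably be handed to Serial.print as an ordinary string.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Format an unsigned 64-bit value as decimal text.
std::string format_u64(uint64_t value) {
    char buf[21];  // 20 digits for 2^64 - 1 plus the terminating NUL
    const int n = std::snprintf(buf, sizeof buf, "%" PRIu64, value);
    assert(n > 0 && static_cast<size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<size_t>(n));
}
```

Using `PRIu64` instead of a hard-coded `%llu` keeps the format string correct on platforms where `uint64_t` is not `unsigned long long`.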
 
Here is a working example using the cycle counter. It first checks whether an overflow happened since the last call. If so, it increments the upper 32 bits of a 64-bit tracking variable to reflect that. This of course only works if you call nanos() at least once every 7 s. Usually, placing a dummy call to nanos() in loop() should be enough. If your sketch cannot guarantee that, you can always place a call to nanos() in yield() or have a timer call nanos() every 5 s or so, as shown in the example.

The following code is a quick proof of principle only. I didn't test it much, but it seems to work OK. It can certainly be optimized, and it should have some reentrancy protection if you want to use it in a real-life project.

Code:
uint64_t nanos()
{
    static uint32_t oldCycles = ARM_DWT_CYCCNT;
    static uint64_t highValue = 0;

    uint32_t newCycles = ARM_DWT_CYCCNT;
    if (newCycles < oldCycles)
    {
        highValue += 0x0000'0001'0000'0000;
    }
    oldCycles = newCycles;
    return (highValue | newCycles) * (1E9/F_CPU);
}



void setup()
{
    (new IntervalTimer())->begin([] { nanos(); }, 5'000'000);  // call nanos every 5s in the background. Only needed if your sketch doesn't call nanos at least once per 7s
}

void loop()
{
    Serial.printf("%" PRIu64 " ns\n", nanos());
    delay(500);
}
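
As a variation on the proof of principle above, the wrap handling can be separated from the hardware so it can be checked on a PC, and the conversion to nanoseconds can be done in integer math rather than floating point. This is a sketch only; `CycleExtender` and `cycles64_to_ns` are made-up names. On the Teensy the read of ARM_DWT_CYCCNT and the call to `update()` would presumably sit inside a short interrupts-off section to get the reentrancy protection mentioned above (an assumption to verify).

```cpp
#include <cassert>
#include <cstdint>

// Extends a free-running 32-bit count to 64 bits. update() must be called at
// least once per wrap of the 32-bit counter, and never from two contexts at once.
class CycleExtender {
public:
    explicit CycleExtender(uint32_t initial) : last_(initial) {}

    uint64_t update(uint32_t now) {
        if (now < last_) {
            high_ += 1ULL << 32;  // the 32-bit counter wrapped since the last call
        }
        last_ = now;
        return high_ | now;
    }

private:
    uint32_t last_;
    uint64_t high_ = 0;
};

// Integer conversion of a 64-bit cycle count to nanoseconds.
// Splitting into whole seconds and a remainder keeps every product below 2^64.
uint64_t cycles64_to_ns(uint64_t cycles, uint32_t cpu_hz) {
    assert(cpu_hz > 0);
    const uint64_t seconds = cycles / cpu_hz;
    const uint64_t rest = cycles % cpu_hz;
    return seconds * 1000000000ULL + (rest * 1000000000ULL) / cpu_hz;
}
```

The integer version avoids the rounding that creeps in once the cycle count exceeds the 53 bits a double can hold exactly.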
 
For Teensy 3 you need to enable the cycle counter in setup():
Code:
    ARM_DEMCR |= ARM_DEMCR_TRCENA;   // enable debug/trace
    ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;   // enable

Also see WIKI
 