Measuring precise CPU time.

Status
Not open for further replies.

Kuba0040

Well-known member
Hello,
I am doing some optimization work on my Teensy 3.2 synth (unrelated to the audio library). I would like to test how much faster some ways of doing things are than others. Basically, is there a way to measure precisely (individual CPU cycles would be great) how long a piece of code takes to run? Something similar to clock() and clock_t in C++, for example. If there is such a way, please let me know.
Thank You.
 
Yes. Use ARM_DWT_CYCCNT.
This is a 32-bit ARM CPU register that counts CPU cycles.
It may need to be enabled first (see the next post by luni ;) )
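
To time one piece of code, the usual pattern is to read the counter before and after it and subtract. Below is a minimal sketch of that pattern. measureCycles is a hypothetical helper, not part of the Teensy core, and it takes the counter read as a callable so that the same code also compiles off-target with a fake counter.

```cpp
#include <cstdint>

// Hypothetical helper (an assumption for illustration, not a Teensy API):
// runs fn() once and returns how far the counter advanced while it ran.
// readCounter is any callable that returns the current 32-bit counter value.
template <typename ReadCounter, typename Fn>
uint32_t measureCycles(ReadCounter readCounter, Fn fn)
{
    const uint32_t start = readCounter();
    fn();
    const uint32_t end = readCounter();
    return end - start;   // unsigned subtraction stays correct across one wrap
}

// On a Teensy (with the counter enabled) the call would look roughly like:
//   uint32_t cycles = measureCycles([] { return ARM_DWT_CYCCNT; },
//                                   [] { codeUnderTest(); });   // codeUnderTest is a placeholder
```

The result includes the cost of the two counter reads themselves, and any interrupt that fires in between adds to it, so it is worth measuring an empty fn() once and repeating each measurement a few times.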
 
Yes, there is a cycle counter. Here is an example:

Code:
void setup()
{
    // The following 2 lines are only necessary for T3.0, T3.1 and T3.2
    ARM_DEMCR    |= ARM_DEMCR_TRCENA;         // enable debug/trace
    ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;   // enable cycle counter
}

void loop()
{
    uint32_t cnt = ARM_DWT_CYCCNT;
    float     ns = cnt * 1E9f/F_CPU;
    Serial.printf("CYCCNT: %10u (%0.6g ns)\n",cnt, ns);

    delay(1000);
}

And here is some more information: https://github.com/TeensyUser/doc/wiki/Using-the-cycle-counter, including how to extend the counter to 64 bits.
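
For a rough idea of what a 64-bit extension can look like, here is one common software approach: keep the upper 32 bits in a variable and bump it whenever a new reading is smaller than the previous one. This is a sketch under assumptions and not necessarily the method the wiki page uses. CycleCounter64 is a hypothetical name, and the class is not interrupt-safe as written.

```cpp
#include <cstdint>

// Hypothetical helper: widens a free-running 32-bit counter to 64 bits.
// Assumption: update() is called at least once per wrap period of the
// 32-bit counter (2^32 cycles), otherwise a wrap goes unnoticed.
class CycleCounter64
{
public:
    // now: the current 32-bit reading (ARM_DWT_CYCCNT on a Teensy).
    uint64_t update(uint32_t now)
    {
        if (now < last_) {   // the reading went backwards, so the counter wrapped
            ++high_;
        }
        last_ = now;
        return (static_cast<uint64_t>(high_) << 32) | now;
    }

private:
    uint32_t last_ = 0;   // previous 32-bit reading
    uint32_t high_ = 0;   // number of wraps seen so far
};
```

On the board this would be used as counter.update(ARM_DWT_CYCCNT), called often enough (for example from loop() or a periodic timer) that no wrap is missed.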
 
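I found that this "if()" in setup works without having to worry about the difference between T_3.x and T_4.x, or whether the counter is already running: two back-to-back reads return the same value only when the counter is stopped. I never tested whether restarting a running counter hurts anything, and this check avoids it: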
Code:
void setup()
{
  if ( ARM_DWT_CYCCNT == ARM_DWT_CYCCNT ) { // two equal reads: counter is not running, so enable it
    ARM_DEMCR    |= ARM_DEMCR_TRCENA;         // enable debug/trace
    ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;   // enable cycle counter
  }
  Serial.begin(115200);
  while (!Serial && millis() < 4000 );
  Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
}

void loop()
{
  static uint32_t cntLast = ARM_DWT_CYCCNT;
  uint32_t cnt = ARM_DWT_CYCCNT;
  float     ns = cnt * 1E9f / F_CPU;
  Serial.printf("Cycles in delay(1000) %10u\tCYCCNT: %10u (%0.6g ns)\n", cnt - cntLast, cnt, ns);
  cntLast = cnt;
  delay(1000);
}

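The math of "cnt - cntLast" with uint32_t variables handles the wrap of the 32-bit values and always gives the correct difference. The raw CYCCNT value printed next to it overflows about every 7 loop cycles (2^32 cycles at roughly 600 million cycles per second is about 7.2 seconds); in the output below it wraps right after the 4083979776 line.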

Code:
T:\tCode\cycCnt\CycPerSec\CycPerSec.ino Jul 10 2021 09:13:14
Cycles in delay(1000)          4	CYCCNT:  483948218 (8.0658e+08 ns)
Cycles in delay(1000)  600004545	CYCCNT: 1083952763 (1.80659e+09 ns)
Cycles in delay(1000)  600005400	CYCCNT: 1683958163 (2.8066e+09 ns)
Cycles in delay(1000)  600005413	CYCCNT: 2283963576 (3.80661e+09 ns)
Cycles in delay(1000)  600005387	CYCCNT: 2883968963 (4.80662e+09 ns)
Cycles in delay(1000)  600005400	CYCCNT: 3483974363 (5.80662e+09 ns)
Cycles in delay(1000)  600005413	CYCCNT: 4083979776 (6.80663e+09 ns)
Cycles in delay(1000)  600005387	CYCCNT:  389017867 (6.48363e+08 ns)
Cycles in delay(1000)  600005413	CYCCNT:  989023280 (1.64837e+09 ns)
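
The wrap in this output can be checked by hand, using the two readings around the point where CYCCNT drops from 4083979776 to 389017867. The sketch below does that check at compile time and also shows the cycles-to-nanoseconds conversion in double precision. elapsedCycles and cyclesToNs are hypothetical helper names, and the 600 MHz clock in the usage comment is an assumption read off the roughly 600 million cycles per second shown above.

```cpp
#include <cstdint>

// Hypothetical helper: difference between two readings of a free-running
// 32-bit counter. Unsigned arithmetic is modulo 2^32, so the result is
// correct as long as fewer than 2^32 cycles passed between the readings.
constexpr uint32_t elapsedCycles(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}

// Hypothetical helper: converts a cycle count to nanoseconds for a given
// CPU clock in Hz (F_CPU on a Teensy). cpuHz must be non-zero.
constexpr double cyclesToNs(uint32_t cycles, uint32_t cpuHz)
{
    return cycles * 1e9 / cpuHz;
}

// The two readings around the wrap in the output above.
static_assert(elapsedCycles(4083979776u, 389017867u) == 600005387u,
              "difference across the wrap matches the printed value");

// Example: cyclesToNs(600u, 600000000u) is 1000.0, i.e. 600 cycles at an
// assumed 600 MHz clock take one microsecond.
```

Converting the difference rather than the raw counter value avoids the jump in the printed ns figure when the counter wraps.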
 