Interesting loop timing on Teensy 4.0

Status
Not open for further replies.

ossi

Well-known member
To get an insight into the inner workings of the Teensy 4.0, I attached an oscilloscope to the LED pin. Then I ran the following program; I want to understand what happens in its inner for loop.
Code:
#include "imxrt.h"
extern "C" uint32_t set_arm_clock(uint32_t frequency);

#define led 13
#define MHz 1e6
#define ns 1e-9

double xx ;

void branchPredictionOff(){
  SCB_CCR &= ~SCB_CCR_BP;
  Serial.println("set branch prediction off!") ;
  }

void setup() {
  pinMode(led, OUTPUT);
  Serial.begin(115200);  
  while (!Serial) {}  // wait until the USB serial monitor is opened
  Serial.print("Hello teensy40jumpPredictTest1Posting1...\n"); 
  set_arm_clock(100*MHz); // better scope picture
  branchPredictionOff() ;
  while(1){
    xx=0.0 ; 
    for(int k=0 ; k<20 ; k++){
      CORE_PIN13_PORTSET = CORE_PIN13_BITMASK; // here-1
      xx=xx+ 1.0 ;
      CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK; // here-2
      }
    Serial.printf("x=%15.10f\n",xx) ;
    delay(50) ;
    }
  }

/*   
loop is compiled as follows:
  d4: eeb7 6b00   vmov.f64  d6, #112  ; 0x3f800000  1.0
  d8: 2314        movs  r3, #20
  da: ed9f 7b0b   vldr  d7, [pc, #44] ; 108 <setup+0x88>      xx=0.0 ; 
  de: 3b01        subs  r3, #1                                for(int k=0 ; k<20 ; k++){
  e0: f8c4 5084   str.w r5, [r4, #132]  ; 0x84                  CORE_PIN13_PORTSET = CORE_PIN13_BITMASK; // here-1
  e4: ee37 7b06   vadd.f64  d7, d7, d6                          xx=xx+ 1.0 ;
  e8: f8c4 5088   str.w r5, [r4, #136]  ; 0x88                  CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK; // here-2
  ec: d1f7        bne.n de <setup+0x5e>                         }
*/

        
void loop(){
  }

The oscilloscope shows the following picture:
[Oscilloscope screenshot: SCR04.PNG]
(I lowered the CPU clock to 100 MHz to get a clearer scope picture.)
The picture shows that the first loop iterations take varying amounts of time before execution stabilizes at 40 ns per iteration, i.e. 4 cycles per iteration at 100 MHz.
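The steady-state figure can be cross-checked against the disassembly. A minimal sketch, assuming the helper names below (they are hypothetical, not part of the original sketch or the Teensy core): it converts the 40 ns period into cycles at 100 MHz and relates that to the five instructions in the loop body at de..ec (subs, str.w, vadd.f64, str.w, bne.n). Five instructions in 4 cycles is 1.25 instructions per cycle, which is presumably possible because the Cortex-M7 is a dual-issue core that can retire some instruction pairs together.

```cpp
#include <cassert>

// Hypothetical helper: convert a period read off the scope into CPU clock
// cycles, using cycles = period [s] * f_clk [Hz].
double cyclesPerIteration(double periodNs, double clockHz) {
    assert(periodNs > 0.0 && clockHz > 0.0);
    return periodNs * 1e-9 * clockHz;
}

// Hypothetical helper: average instructions per cycle for a loop body of
// `instructions` instructions that takes `cycles` cycles per iteration.
double instructionsPerCycle(int instructions, double cycles) {
    assert(instructions > 0 && cycles > 0.0);
    return instructions / cycles;
}

// Values taken from the post: set_arm_clock(100*MHz), the 40 ns steady-state
// period, and the five instructions between addresses de and ec.
constexpr double kClockHz = 100e6;
constexpr double kPeriodNs = 40.0;
constexpr int kLoopInstructions = 5;
```

For example, `cyclesPerIteration(kPeriodNs, kClockHz)` gives about 4.0 (up to floating-point rounding), and `instructionsPerCycle(kLoopInstructions, 4.0)` gives 1.25.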
What is happening in the first iterations? I suspected branch prediction, so I tried to switch it off (is my method OK?), but the first iterations still behave differently. What is the explanation? Could it be the data cache filling?
 