To get some insight into the inner workings of the Teensy 4.0, I attached an oscilloscope to the LED pin. I then ran the following program, and I want to understand what happens in the inner for loop.
The oscilloscope shows the following picture:
(I changed the CPU clock to 100 MHz to get a better scope picture.)
The picture shows that the first few loop iterations take varying amounts of time before execution stabilizes at 40 ns per iteration, i.e. 4 cycles per iteration at 100 MHz.
What is happening in those first iterations? I suspected branch prediction, so I tried to switch it off (is my method OK?), but the first iterations still behave differently. What is the explanation? Could it be data-cache filling?
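For reference, the cycle count quoted above is just the scope period multiplied by the clock frequency. A minimal sketch of that arithmetic in plain C++ follows; the helper name `nsToCycles` is made up for illustration and is not part of the Teensy core.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the Teensy core): convert a period read
// off the scope, in nanoseconds, into CPU cycles for a clock given in MHz.
// cycles = ns * MHz / 1000, rounded to the nearest integer.
constexpr uint32_t nsToCycles(uint32_t periodNs, uint32_t clockMHz) {
  return (periodNs * clockMHz + 500u) / 1000u;
}

// 40 ns per iteration at the 100 MHz clock used here is 4 cycles.
static_assert(nsToCycles(40, 100) == 4, "40 ns at 100 MHz is 4 cycles");
```

At the Teensy 4.0 default clock of 600 MHz, the same 4 cycles would last only about 6.7 ns per iteration, which is much harder to resolve on the scope.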
Code:
#include "imxrt.h"
extern "C" uint32_t set_arm_clock(uint32_t frequency);

#define led 13
#define MHz 1e6
#define ns  1e-9

double xx;

// Attempt to switch off branch prediction by clearing the BP bit in SCB_CCR.
void branchPredictionOff() {
  SCB_CCR &= ~SCB_CCR_BP;
  Serial.println("set branch prediction off!");
}

void setup() {
  pinMode(led, OUTPUT);
  Serial.begin(115200);
  while (!Serial) {}  // wait until the serial port is open
  Serial.print("Hello teensy40jumpPredictTest1Posting1...\n");
  set_arm_clock(100 * MHz);  // better scope picture
  branchPredictionOff();
  while (1) {
    xx = 0.0;
    for (int k = 0; k < 20; k++) {
      CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;    // here-1
      xx = xx + 1.0;
      CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK;  // here-2
    }
    Serial.printf("x=%15.10f\n", xx);
    delay(50);
  }
}
/*
The loop is compiled as follows (source line shown on the right):

  d4: eeb7 6b00   vmov.f64 d6, #112         ; 0x3f800000 1.0
  d8: 2314        movs     r3, #20
  da: ed9f 7b0b   vldr     d7, [pc, #44]    ; 108 <setup+0x88>   xx=0.0 ;
  de: 3b01        subs     r3, #1                                for(int k=0 ; k<20 ; k++){
  e0: f8c4 5084   str.w    r5, [r4, #132]   ; 0x84               CORE_PIN13_PORTSET = CORE_PIN13_BITMASK; // here-1
  e4: ee37 7b06   vadd.f64 d7, d7, d6                            xx=xx+ 1.0 ;
  e8: f8c4 5088   str.w    r5, [r4, #136]   ; 0x88               CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK; // here-2
  ec: d1f7        bne.n    de <setup+0x5e>                       }
*/

void loop() {
}