Disable instruction cache on Teensy 4.0

Status
Not open for further replies.

ossi

Well-known member
I wanted a program that demonstrates cache influence. To demonstrate behaviour without caching I put the program into FLASHMEM and disable the instruction cache with a routine I found in CMSIS:core_cm7.h.

Code:
#define led 13
void SCB_DisableICache (void){
    asm("dsb");
    asm("isb");
    SCB_CCR &= ~(uint32_t)SCB_CCR_IC  ;  /* disable I-Cache */
    SCB_CACHE_ICIALLU = 0UL;             /* invalidate I-Cache */
    asm("dsb");
    asm("isb");
}

double xx ;

FLASHMEM void setup() {
  pinMode(led, OUTPUT);
  Serial.begin(115200);  
  while(!Serial){} ;
  Serial.print("Hello ARTICLEteensy40disableIcacheDemo1forumPost...\n"); 
  Serial.printf("setup=%08XH\n",(unsigned int)setup) ; // cast so the argument matches %X
  SCB_DisableICache();   
  
  // if the following line is deleted a loop cycle needs 130ns, if this line is enabled a loop needs 6.6ns
  Serial.printf("SCB_CCR & SCB_CCR_IC)=%08XH\n", SCB_CCR & SCB_CCR_IC ) ; 
  
  Serial.printf("wait 5 secs...") ; delay(5000) ;
  while(1){
    xx=0.0 ; 
    for(int k=0 ; k<20 ; k++){
      CORE_PIN13_PORTSET = CORE_PIN13_BITMASK; // here-1
      xx=xx+ 1.0 ;
      CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK; // here-2
      }
    Serial.printf("x=%15.10f\n",xx) ;
    delay(50) ;
    }
  }
        
void loop(){
  }
The behaviour of the program is very sensitive to small changes. With the marked Serial.printf line enabled, one loop cycle takes 6.6ns. That can only be explained by caching, so in this case the instruction cache apparently is not disabled. With the line removed, the loop is much slower (130ns). That is probably the slow instruction fetch from FLASHMEM.

What am I doing wrong, such that the instruction cache is not disabled under all circumstances?
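
To put the two timings in perspective, it helps to express them in core clock cycles. The snippet below is only a back-of-the-envelope sketch: it assumes the Teensy 4.0 default core clock of 600 MHz, and the helper name nsToCycles is made up for illustration. Under that assumption 6.6ns is about 4 cycles per loop pass and 130ns is about 78 cycles.

```cpp
// Assumption: the Teensy 4.0 runs at its default core clock of 600 MHz.
constexpr double kCpuHz = 600e6;

// Hypothetical helper: convert a duration in nanoseconds to the nearest
// whole number of core clock cycles.
constexpr long nsToCycles(double ns, double cpuHz = kCpuHz) {
    return static_cast<long>(ns * 1e-9 * cpuHz + 0.5);
}

// The two loop times reported in the post above.
constexpr long kFastLoopCycles = nsToCycles(6.6);    // line enabled
constexpr long kSlowLoopCycles = nsToCycles(130.0);  // line deleted
```

A loop pass of roughly 4 cycles is hard to reconcile with uncached instruction fetches from external flash, which is consistent with the suspicion that the I-cache is still active in the fast case.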
 
Interesting.
I don't know.
But in this example the data-cache plays a role, too. The disassembly shows a number of constants:
Code:
[...]
60001350:    20001085     .word    0x20001085
60001354:    20001280     .word    0x20001280
60001358:    20001050     .word    0x20001050
6000135c:    20003288     .word    0x20003288
60001360:    20000000     .word    0x20000000
60001364:    60001291     .word    0x60001291
60001368:    20000038     .word    0x20000038
6000136c:    20000650     .word    0x20000650
60001370:    e000ed14     .word    0xe000ed14
60001374:    e000ef50     .word    0xe000ef50
60001378:    20000048     .word    0x20000048
6000137c:    20000068     .word    0x20000068
60001380:    20001040     .word    0x20001040
60001384:    42004000     .word    0x42004000
60001388:    20000078     .word    0x20000078
So, as always, things are not as we think they should be.
Maybe play with compiler switches, for example -mslow-flash-data, which avoids these literal constants.
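
The words in the listing above sit at 0x6000135x, i.e. in flash right behind the code, so each one is fetched with a data load rather than an instruction fetch. To see what they point at, here is a small sketch that sorts the values into the default ARMv7-M address map regions. The enum and function names are made up for illustration. On the Teensy 4.0, as far as I can tell, the 0x2000xxxx values are RAM addresses, 0x60001291 points back into flash, 0xe000ed14 and 0xe000ef50 are the SCB_CCR and ICIALLU registers, and 0x42004000 is in peripheral space (presumably the GPIO block used for pin 13).

```cpp
#include <cstdint>

// Regions of the default ARMv7-M memory map (names are illustrative).
enum class Region { Code, Sram, Peripheral, External, SystemPpb, Other };

// Classify a 32-bit address by the architectural region it falls into.
constexpr Region classify(std::uint32_t addr) {
    return addr < 0x20000000u ? Region::Code        // 0x00000000-0x1FFFFFFF
         : addr < 0x40000000u ? Region::Sram        // 0x20000000-0x3FFFFFFF
         : addr < 0x60000000u ? Region::Peripheral  // 0x40000000-0x5FFFFFFF
         : addr < 0xA0000000u ? Region::External    // 0x60000000-0x9FFFFFFF
         : (addr >= 0xE0000000u && addr < 0xE0100000u)
               ? Region::SystemPpb                  // private peripheral bus
               : Region::Other;
}
```

For example, classify(0x60001291) gives Region::External, which is where the flash is mapped on this board, and classify(0xe000ed14) gives Region::SystemPpb.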
 
Yea, but I don't know why adding a line before the inner loop influences the speed of the loop.
The disassembly is the same.

Could be a read-ahead, but that much? And even that wouldn't explain it.
 
The effect can also be demonstrated without attaching an oscilloscope to the LED pin, using the following program. It prints the loop execution time, which changes when one line is switched on or off.

Code:
#define led 13

void SCB_DisableICache (void){
    asm("dsb");
    asm("isb");
    SCB_CCR &= ~(uint32_t)SCB_CCR_IC  ;  /* disable I-Cache */
    SCB_CACHE_ICIALLU = 0UL;                     /* invalidate I-Cache */
    asm("dsb");
    asm("isb");
}

double xx ;

void abort(){
  Serial.println("abort()...") ; 
  while(1){} ; 
  }

FLASHMEM void setup() {
  pinMode(led, OUTPUT);
  Serial.begin(115200);  
  while(!Serial){} ;
  Serial.print("Hello ARTICLEteensy40disableIcacheDemo1...\n"); 
  SCB_DisableICache();   
  //Serial.printf("wait 5 secs...\n") ; delay(5000) ; // without this line: 9us  with this line: 2us
 
  int kk=0 ;
  while(1){
    kk++ ;
    if(kk==10){ abort() ; }
    uint32_t start=micros() ; // micros() returns an unsigned 32-bit value
    for(int k=0 ; k<50 ; k++){
      CORE_PIN13_PORTSET = CORE_PIN13_BITMASK; // here-1
      CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK; // here-2
      }
    Serial.printf("time=%lu us\n",(unsigned long)(micros()-start)) ;
    delay(50) ;
    }
  }
       
void loop(){
  }
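
One caveat with this measurement: micros() only resolves whole microseconds, so 50 loop passes in 2us versus 9us is a fairly coarse reading. A finer option, assuming the Teensy core's cycle counter register ARM_DWT_CYCCNT is running (I believe the Teensy 4 startup code enables it, but treat that as an assumption), is to read the counter before and after the loop and do the arithmetic below. The helper names are made up for illustration, and the snippet only shows the arithmetic, not the register access.

```cpp
#include <cstdint>

// Elapsed cycles between two readings of a free-running 32-bit counter.
// Unsigned subtraction gives the right answer across a single wrap-around.
constexpr std::uint32_t elapsedCycles(std::uint32_t start, std::uint32_t end) {
    return end - start;
}

// Average cycles spent per loop pass.
constexpr double cyclesPerIteration(std::uint32_t cycles,
                                    std::uint32_t iterations) {
    return static_cast<double>(cycles) / iterations;
}

// On the Teensy the readings would come from something like:
//   uint32_t c0 = ARM_DWT_CYCCNT;  /* ... loop ... */  uint32_t c1 = ARM_DWT_CYCCNT;
// (assumption: the counter is enabled and counts core clock cycles)
```

As a sanity check against the numbers in the comment above: 2us at an assumed 600 MHz is 1200 cycles, so cyclesPerIteration(1200, 50) gives 24 cycles per pass.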
 
My cheap scope had a hard time finding the right trigger point and other settings automatically.
I had to switch to manual mode.
Thanks for this scope test :)
 