
Thread: Disable instruction cache on Teensy 4.0

  1. #1
    ossi (Senior Member)

    Disable instruction cache on Teensy 4.0

    I wanted a program that demonstrates the influence of the cache. To show the behaviour without caching, I put the program into flash with FLASHMEM and disabled the instruction cache with a routine I found in CMSIS core_cm7.h.

    Code:
    #define led 13
    
    // Adapted from CMSIS core_cm7.h: disable and invalidate the instruction cache.
    // "asm volatile" with a "memory" clobber mirrors CMSIS __DSB()/__ISB().
    void SCB_DisableICache(void) {
        asm volatile("dsb" ::: "memory");
        asm volatile("isb" ::: "memory");
        SCB_CCR &= ~(uint32_t)SCB_CCR_IC;   /* disable I-Cache */
        SCB_CACHE_ICIALLU = 0UL;            /* invalidate I-Cache */
        asm volatile("dsb" ::: "memory");
        asm volatile("isb" ::: "memory");
    }
    
    double xx;
    
    FLASHMEM void setup() {   // FLASHMEM: setup() executes directly from flash
      pinMode(led, OUTPUT);
      Serial.begin(115200);
      while (!Serial) {}      // wait for the USB serial monitor
      Serial.print("Hello ARTICLEteensy40disableIcacheDemo1forumPost...\n");
      Serial.printf("setup=%08XH\n", (unsigned int)setup);   // address of setup()
      SCB_DisableICache();
    
      // With the following line present, one loop iteration takes 6.6 ns;
      // if it is deleted, one iteration takes 130 ns.
      Serial.printf("SCB_CCR & SCB_CCR_IC=%08XH\n", (unsigned int)(SCB_CCR & SCB_CCR_IC));
    
      Serial.printf("wait 5 secs...");
      delay(5000);
      while (1) {
        xx = 0.0;
        for (int k = 0; k < 20; k++) {
          CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;    // here-1: LED pin high
          xx = xx + 1.0;
          CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK;  // here-2: LED pin low
        }
        Serial.printf("x=%15.10f\n", xx);
        delay(50);
      }
    }
    
    void loop() {
    }
    The behaviour of the program is very sensitive to small changes. With the marked Serial.printf line enabled, one loop iteration takes 6.6 ns. That can only be explained by caching, so in this case the instruction cache apparently is not disabled. With the line deleted, the loop is much slower (130 ns per iteration), probably because of the slow access to FLASHMEM.

    What am I doing wrong, so that the instruction cache is not disabled under all circumstances?
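    To put the two timings from the code comment into perspective, they can be converted into CPU clock cycles. This is only a sketch: it assumes the Teensy 4.0 default core clock of 600 MHz (F_CPU can be changed), and nsToCycles() is a hypothetical helper, not a Teensyduino function.

```cpp
// Assumption: Teensy 4.0 running at its default 600 MHz core clock.
constexpr double kAssumedCpuMHz = 600.0;

// Hypothetical helper (not a Teensyduino API).
// cycles = t[ns] * 1e-9 s * f[MHz] * 1e6 Hz = t[ns] * f[MHz] / 1000
constexpr double nsToCycles(double ns, double cpuMHz) {
  return ns * cpuMHz / 1000.0;
}

// Per-iteration loop times taken from the code comment in post #1.
constexpr double kWithLineNs = 6.6;       // Serial.printf(SCB_CCR ...) line present
constexpr double kWithoutLineNs = 130.0;  // line deleted

constexpr double kWithLineCycles = nsToCycles(kWithLineNs, kAssumedCpuMHz);        // about 3.96
constexpr double kWithoutLineCycles = nsToCycles(kWithoutLineNs, kAssumedCpuMHz);  // 78
```

    So 6.6 ns is about 4 cycles per iteration and 130 ns is 78 cycles. About 4 cycles for a loop with two GPIO stores, a floating-point add and the loop counter is consistent with the instructions being served from a cache or buffer rather than fetched from flash each time. On the Teensy itself the values could be printed from setup(), e.g. with Serial.printf("%.2f cycles\n", kWithLineCycles).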

  2. #2
    KurtE (Senior Member+)
    Is this more or less a duplicate thread of your other one: https://forum.pjrc.com/threads/59371...the-Teensy-4-0

  3. #3
    Frank B (Senior Member+)
    Quote Originally Posted by ossi: [post #1, quoted in full]
    Interesting. I don't know.
    But in this example the data cache plays a role, too. The disassembly shows a number of constants (a literal pool), which the CPU reads as data:
    Code:
    [...]
    60001350:    20001085     .word    0x20001085
    60001354:    20001280     .word    0x20001280
    60001358:    20001050     .word    0x20001050
    6000135c:    20003288     .word    0x20003288
    60001360:    20000000     .word    0x20000000
    60001364:    60001291     .word    0x60001291
    60001368:    20000038     .word    0x20000038
    6000136c:    20000650     .word    0x20000650
    60001370:    e000ed14     .word    0xe000ed14
    60001374:    e000ef50     .word    0xe000ef50
    60001378:    20000048     .word    0x20000048
    6000137c:    20000068     .word    0x20000068
    60001380:    20001040     .word    0x20001040
    60001384:    42004000     .word    0x42004000
    60001388:    20000078     .word    0x20000078
    So, as always, things are not as simple as we think they should be.
    Maybe play with compiler switches, for example -mslow-flash-data, which minimizes these literal-pool loads.
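    To illustrate what those constants are, the sketch below classifies 32-bit addresses by i.MX RT1062 memory region. The region boundaries are an assumption based on the i.MX RT1060 reference manual memory map, and regionOf() is a hypothetical helper. Most of the constants point into DTCM (0x2000xxxx), 0x60001291 lies in the FlexSPI flash, 0x42004000 is presumably the GPIO7 block behind CORE_PIN13_PORTSET, and 0xE000ED14 and 0xE000EF50 are the Cortex-M7 registers SCB_CCR and SCB_CACHE_ICIALLU that the program accesses.

```cpp
#include <cstdint>

// Coarse i.MX RT1062 memory regions. The ranges are an assumption taken
// from the i.MX RT1060 reference manual memory map; verify before relying on them.
enum class Region { ITCM, DTCM, OCRAM, FastGPIO, FlexSPIFlash, CortexPPB, Other };

// Hypothetical helper: classify a 32-bit address by memory region.
constexpr Region regionOf(std::uint32_t addr) {
  if (addr <= 0x0007FFFFu) return Region::ITCM;
  if (addr >= 0x20000000u && addr <= 0x2007FFFFu) return Region::DTCM;
  if (addr >= 0x20200000u && addr <= 0x2027FFFFu) return Region::OCRAM;
  if (addr >= 0x42000000u && addr <= 0x4200FFFFu) return Region::FastGPIO;      // GPIO6..GPIO9
  if (addr >= 0x60000000u && addr <= 0x6FFFFFFFu) return Region::FlexSPIFlash;
  if (addr >= 0xE0000000u && addr <= 0xE00FFFFFu) return Region::CortexPPB;    // SCB, NVIC, cache ops
  return Region::Other;
}

// Two of the literal-pool constants are Cortex-M7 system control registers.
constexpr std::uint32_t kScbCcr = 0xE000ED14u;      // SCB_CCR (cache enable bits)
constexpr std::uint32_t kScbIciallu = 0xE000EF50u;  // SCB_CACHE_ICIALLU (invalidate I-cache)
```

    The constants themselves are stored in flash (0x60001350 onward), so every load of one is a data read from FlexSPI, which is where the data cache comes in.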

  4. #4
    Frank B (Senior Member+)
    Hm, no, the inner loop does not access these consts.
    I don't know.

  5. #5
    manitou (Senior Member+)
    Again, referring to https://forum.pjrc.com/threads/54711...l=1#post197435:
    Paul speculates that there may be bus buffers associated with the FlexSPI flash.

  6. #6
    Frank B (Senior Member+)
    Yes, but I don't know why adding a line before the inner loop changes the speed of the loop.
    The disassembly of the loop is the same.

    It could be a read-ahead, but that much? And even that wouldn't explain it.

  7. #7
    ossi (Senior Member)
    The effect can also be demonstrated without attaching an oscilloscope to the LED pin, using the following program. It prints the loop execution time, which changes when one line is switched on or off.

    Code:
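    #define led 13
    
    // Adapted from CMSIS core_cm7.h: disable and invalidate the instruction cache.
    // "asm volatile" with a "memory" clobber mirrors CMSIS __DSB()/__ISB().
    void SCB_DisableICache(void) {
        asm volatile("dsb" ::: "memory");
        asm volatile("isb" ::: "memory");
        SCB_CCR &= ~(uint32_t)SCB_CCR_IC;   /* disable I-Cache */
        SCB_CACHE_ICIALLU = 0UL;            /* invalidate I-Cache */
        asm volatile("dsb" ::: "memory");
        asm volatile("isb" ::: "memory");
    }
    
    double xx;
    
    // Stop after a few measurements. Named stopDemo() rather than abort()
    // so that it does not replace the C library's abort().
    void stopDemo() {
      Serial.println("stopDemo()...");
      while (1) {}
    }
    
    FLASHMEM void setup() {   // FLASHMEM: setup() executes directly from flash
      pinMode(led, OUTPUT);
      Serial.begin(115200);
      while (!Serial) {}      // wait for the USB serial monitor
      Serial.print("Hello ARTICLEteensy40disableIcacheDemo1...\n");
      SCB_DisableICache();
      //Serial.printf("wait 5 secs...\n"); delay(5000); // without this line: 9us  with this line: 2us
    
      int kk = 0;
      while (1) {
        kk++;
        if (kk == 10) { stopDemo(); }
        uint32_t start = micros();   // micros() is unsigned; avoid int overflow/sign issues
        for (int k = 0; k < 50; k++) {
          CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;    // here-1
          CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK;  // here-2
        }
        Serial.printf("time=%lu us\n", (unsigned long)(micros() - start));
        delay(50);
      }
    }
    
    void loop() {
    }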
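    The comment on the commented-out line gives 9 us without it and 2 us with it for the 50-iteration loop. The sketch below turns these readings into time and cycles per iteration; it assumes a 600 MHz core clock (the Teensy 4.0 default), and the helper names are hypothetical. Note that micros() returns whole microseconds, so the 2 us reading in particular is only resolved to about ±1 us.

```cpp
// Assumption: Teensy 4.0 running at its default 600 MHz core clock.
constexpr double kAssumedCpuMHz = 600.0;

// Hypothetical helper: average time per loop iteration in nanoseconds.
constexpr double nsPerIteration(double totalUs, int iterations) {
  return totalUs * 1000.0 / iterations;
}

// Hypothetical helper: convert nanoseconds to core clock cycles.
constexpr double nsToCycles(double ns, double cpuMHz) {
  return ns * cpuMHz / 1000.0;
}

constexpr int kIterations = 50;                               // for (k = 0; k < 50; k++)
constexpr double kSlowNs = nsPerIteration(9.0, kIterations);  // without the delay line
constexpr double kFastNs = nsPerIteration(2.0, kIterations);  // with the delay line
constexpr double kSlowCycles = nsToCycles(kSlowNs, kAssumedCpuMHz);
constexpr double kFastCycles = nsToCycles(kFastNs, kAssumedCpuMHz);
```

    This gives 180 ns (108 cycles) versus 40 ns (24 cycles) per iteration. For finer resolution on the Teensy itself, the loop could be timed with the DWT cycle counter (ARM_DWT_CYCCNT in Teensyduino) instead of micros().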

  8. #8
    Frank B (Senior Member+)
    My cheap scope had a hard time finding the right trigger point and other settings automatically.
    I had to switch to manual mode.
    Thanks for this scope test.
