Unexpected behavior due to optimization?

Status
Not open for further replies.

ossi

Well-known member
I wanted to measure execution timing of floating point cos routine using the following program:
Code:
int led = 13;

void setup() {         
  pinMode(led, OUTPUT); 
  delay(1000) ;
  Serial.begin(9600);  
  Serial.print("Hello World ");  
  }

int32_t time3() {
  int32_t k ;
  float sum,d ;
  sum=0.0 ; d=0.0 ;
  for(k=0 ; k<1000 ; k++){
    sum += cos(d) ;
    d += 0.00001 ;
    }
  return sum ;  
  }

int32_t dummy ;

void loop2(){
  while(1){
    digitalWrite(led, HIGH);
    dummy = time3() ;
    digitalWrite(led, LOW);
    delay(5) ;
    }
  }

void loop() {
  loop2() ;
  }

The idea was to measure the on-time of pin 13 using a scope. The routine time3 is called between setting and clearing the LED. The strange thing is that I measure an on-time of 170 ns. That is far too short for the code to have executed. If I make sum and d volatile, I measure an on-time on the order of milliseconds, which seems OK. But why does volatile have an influence here?

Could it be that the compiler optimizes in the following way: the function time3 does not depend on any input value, so computing the result once and then caching it would be valid.
So the function is executed only once and the cached result is used from then on?
 
You could try d=random(min, max) instead of 0 (I'll let you decide which min and max are best), so that the MCU has to do the math every single time in your infinite loop.
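A rough sketch of that idea, keeping the loop body from the original post and changing only where the start value comes from. The name time3_from and the random() range in the comment are made up for illustration:

```cpp
#include <math.h>
#include <stdint.h>

// Same loop as time3() in the first post, but the start angle is a parameter.
// If the caller passes a value that is only known at run time, the compiler
// cannot work out the sum in advance.
float time3_from(float start) {
  float sum = 0.0;
  float d = start;
  for (int32_t k = 0; k < 1000; k++) {
    sum += cos(d);
    d += 0.00001;
  }
  return sum;
}

// On the board, the timed section in loop2() might then look like this
// (the range passed to random() is only a guess):
//   float start = random(0, 1000) * 0.001f;  // picked before the pin goes HIGH
//   digitalWrite(led, HIGH);
//   dummy = time3_from(start);
//   digitalWrite(led, LOW);
```

Picking the start value before setting the pin keeps the time spent in random() out of the measured pulse.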
 
Yes, the compiler is amazingly good at optimizing away math when all the inputs are constants known at compile time.

Also consider using cosf(). The normal cos() function, without the "f", uses 64 bit double which is much slower.

While not an issue on Teensy 3.x, for the sake of correctness you should probably also append "f" to all those floating point constants. By default the compiler treats float constants as 64 bit double, so the default behavior of adding a constant to a 32 bit float is to promote the whole thing to slower 64 bit math, then spend extra time converting back to 32 bits to store the result. With Teensy 3.x we use a non-standard compiler option to treat all float constants as 32 bits, so this code runs fast. But if run on other boards which use the default compiler behavior, this program will end up using 64 bit double for nearly everything.
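To see which expressions end up as float and which as double, something like the following can help. It is only a sketch and assumes default compiler settings, so it shows what other boards would do; with the Teensy 3.x option mentioned above, the first value may come out different. The constant names are made up:

```cpp
#include <math.h>
#include <stddef.h>

// Size in bytes of the value each expression produces.
// sizeof does not evaluate its operand, so nothing here runs at run time.
const size_t kMixedStep = sizeof(1.0f + 0.00001);   // unsuffixed constant is a double
const size_t kFloatStep = sizeof(1.0f + 0.00001f);  // 'f' suffix keeps it float
const size_t kCosfValue = sizeof(cosf(1.0f));       // cosf takes and returns float
const size_t kCosValue  = sizeof(cos(1.0));         // cos on a double returns double
```

Printing these four values and comparing them with sizeof(float) and sizeof(double) shows where the promotion happens. On targets where double is no wider than float, all four are the same size.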
 
Thanks a lot for your helpful comments. I just got my Teensy 3.6 this morning. I am now learning a lot. I am very glad to have a board with floating-point capability without an operating system. Expect further questions from me soon.
 
When profiling performance, always make sure your calculations have side effects (with minimal overhead) to prevent the compiler from optimizing out your profiling code. For example, keep accumulating the results and print the accumulated result at the end. In your case the compiler can see that dummy isn't used and that time3() doesn't produce other side effects, so it can be optimized out completely. Just printing dummy would prevent that optimization.
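A minimal sketch of the accumulate-and-print pattern. The names cos_sum and run_benchmark are made up, and printf stands in for Serial.println so the code also builds off the board:

```cpp
#include <math.h>
#include <stdint.h>
#include <stdio.h>

// Work under test: sum of 1000 cosine values starting at 'start'.
static float cos_sum(float start) {
  float sum = 0.0f;
  float d = start;
  for (int32_t k = 0; k < 1000; k++) {
    sum += cosf(d);
    d += 0.00001f;
  }
  return sum;
}

// Runs the work 'runs' times and folds every result into one total.
// Printing the total at the end is the side effect: the results are now
// used, so the compiler cannot discard the calls as dead code.
float run_benchmark(int runs, float start) {
  float total = 0.0f;
  for (int r = 0; r < runs; r++) {
    total += cos_sum(start + 0.01f * (float)r);  // each run gets a different input
  }
  printf("checksum: %f\n", (double)total);  // on the board: Serial.println(total)
  return total;
}
```

The print happens once, after all the runs, so its cost stays out of the timed part. Note that a side effect alone does not stop the compiler from precomputing results when every input is a compile-time constant, so start should still come from something only known at run time.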
 