Best way to do complex calculation ?

Status
Not open for further replies.

MattH

Active member
Hi everyone,

I am trying to apply a mathematical model (5th order polynomial) to a sensor output value on a Teensy3.6.
I know there is only a 2-decimal precision and that the usual workaround is to multiply by several power of 10. I have been changing things over, variable types, multiplying, dividing but in the end I am either loosing precision or overflowing... I am not sure I make the best use of variable type either ( I know that on Teensy 3.X there are some 64-bit variables)

Any advice ?

Many thanks

Code:
void setup() {
  Serial.begin(9600);
}

void loop() {
  int rawMin=5;    
  int rawMax=700;  
  double calMin=calibrate_sensor(rawMin); 
  double calMax=calibrate_sensor(rawMax); 
  Serial.println(calMin); // calibrated value should be close to 0.001
  Serial.println(calMax); // calibrated value should be close to 3.927
}

double calibrate_sensor(int sensorOutput){
  float a=0.042866;
  float b=0.321221;
  float c=0.875095;
  float d=1.249905;
  float e=1.610887;
  float f=1.529406;
  double mean=540.4259;
  double std=212.4908;
  double x=(sensorOutput-mean)/std; // transpose
  double result=a*pow(x,5)+b*pow(x,4)+c*pow(x,3)+d*pow(x,2)+e*x+f;
  return result;
}

I know that float a=0.042866 is not right as it truncates a to 0.04
 
float will give you about 7 digits of precision. It is the Serial.print that is only showing you 2.
Try:
Code:
  Serial.println(calMin,7); // calibrated value should be close to 0.001
  Serial.println(calMax,7);

Pete
 
Even though double has 64 bits of precision, Serial.print() uses only 32 bit float, so the extra precision will be used for the computation but then lost just before printing.

If you care about speed, 32 bit float is much faster because Teensy 3.6 has a 32 bit FPU. But first get more digits printing to see if you really need double in the calculation. Then try all float (if you care about speed) and see how the result compares.
 
As as do care about speed (I have 16 sensors and I am hoping to get real-time-ish measures) I tried some optimisation using my Teensy 3.2 (should be faster on the 3.6):
-pre-declaring the constants
-using pointers (first time for me !)

Results :
conventional method: 983 us
pre-declared: 986 us
pointers: 986 us

I assume there is no difference between conventional and pre-declared because of some kind of compiler optimisation that would generate pretty much the same assembler code in the end but I am surprised about the pointers. I am doing that wrong ?

Cheers

Code:
  float A=0.042866;
  float B=0.321221;
  float C=0.875095;
  float D=1.249905;
  float E=1.610887;
  float F=1.529406;
  float MEAN=540.4259;
  float STD=212.4908;

  float *pa;
  float *pb;
  float *pc;
  float *pd;
  float *pe;
  float *pf;
  float *pmean;
  float *pstd;

void setup() {
  Serial.begin(9600);
  pa=&A;
  pb=&B;
  pc=&C;
  pd=&D;
  pe=&E;
  pf=&F;
  pmean=&MEAN;
  pstd=&STD;

}

void loop() {
  int rawMin=5;    
  int rawMax=700; 
  unsigned long startTime1=micros(); 
  float calMin1=calibrate_sensor(rawMin); 
  float calMax1=calibrate_sensor(rawMax); 
  unsigned long elapsedTime1=micros()-startTime1;

  unsigned long startTime2=micros(); 
  float calMin2=calibrate_sensorPredeclared(rawMin); 
  float calMax2=calibrate_sensorPredeclared(rawMax); 
  unsigned long elapsedTime2=micros()-startTime2;
  
  unsigned long startTime3=micros(); 
  float calMin3=calibrate_sensorPoint(rawMin); 
  float calMax3=calibrate_sensorPoint(rawMax); 
  unsigned long elapsedTime3=micros()-startTime3;
  
  Serial.println("simple");
  Serial.println(calMin1,4); // calibrated value should be close to 0.001
  Serial.println(calMax1,4); // calibrated value should be close to 3.927
  Serial.println(elapsedTime1);
  Serial.println("pre-declared const");
  Serial.println(calMin2,4); // calibrated value should be close to 0.001
  Serial.println(calMax2,4); // calibrated value should be close to 3.927
  Serial.println(elapsedTime2);
  Serial.println("pointer");
  Serial.println(calMin3,4); // calibrated value should be close to 0.001
  Serial.println(calMax3,4); // calibrated value should be close to 3.927
  Serial.println(elapsedTime3);
  Serial.println("_______");
  delay(1000);
}


float calibrate_sensor(int sensorOutput){
  float a=0.042866;
  float b=0.321221;
  float c=0.875095;
  float d=1.249905;
  float e=1.610887;
  float f=1.529406;
  float mean=540.4259;
  float std=212.4908;
  float x=(sensorOutput-mean)/std; // transpose
  float result=a*pow(x,5)+b*pow(x,4)+c*pow(x,3)+d*pow(x,2)+e*x+f;
  return result;
}

float calibrate_sensorPredeclared(int sensorOutput){
  float x=(sensorOutput-MEAN)/STD; // transpose
  float result=A*pow(x,5)+B*pow(x,4)+C*pow(x,3)+D*pow(x,2)+E*x+F;
  return result;
}

float calibrate_sensorPoint(int sensorOutput){
  float x=(sensorOutput-*pmean)/ *pstd; // transpose
  float result=*pa*pow(x,5)+*pb*pow(x,4)+*pc*pow(x,3)+*pd*pow(x,2)+*pe*x+*pf;
  return result;
}
 
Nice - the sketch ran as posted! Yes, the T_3.6 will be faster ...

Here is what a T_3.6 looks like with Native FPU first at 96 MHz then again at 180 MHz:
simple
0.0010
3.9270
597
pre-declared const
576
pointer
579
_______
simple
324
pre-declared const
307
pointer
319
_______
Removed Result values that were all the same.

Note: a pre declared const would be like this :: const float A=0.042866;
Compiler may have seen them as unchanged - only saved 2us.
 
Thanks defragster,
I didn't know whether a pointer could point to a const (thinking about it, there is no reason why not) that's why I left the "const" out of the script. Thanks for confirming it's OK.

Do you have any advice on how to gain a few more us ? Obviously there will be a physical limit but I am also interested in learning how to optimise the speed of my code just for good practice. I thought pointers would be the next step but it's disappointing...
 
I am not sure I am understanding that correctly:
Native FPU first at 96 MHz then again at 180 MHz
Is it possible to programatically change the clock speed ? Is T3.6 clocked by default to its maximum 180MHz ?
Cheers
 
Normally, you should be able to increase greatly the speed of your code by omitting the slow pow() or powf() functions. Rewrite your polynome ax^5 + bx^4 + cx^3 + dx^2 + ex +f using the Horner’s method to ((((ax + b)x + c)x + d)x + e)x + f which only uses alternating simple multiplications and additions.
 
@Thermingenieur: Impressive !!! and so simple... c'est pour ça qu'il y a des ingénieurs !

I get down to a single us ! How is that even possible ?! Great !




Code:
  const float A=0.042866;
  const float B=0.321221;
  const float C=0.875095;
  const float D=1.249905;
  const float E=1.610887;
  const float F=1.529406;
  const float MEAN=540.4259;
  const float STD=212.4908;
  const float *pa;
  const float *pb;
  const float *pc;
  const float *pd;
  const float *pe;
  const float *pf;
  const float *pmean;
  const float *pstd;

void setup() {
  Serial.begin(9600);
  pa=&A;
  pb=&B;
  pc=&C;
  pd=&D;
  pe=&E;
  pf=&F;
  pmean=&MEAN;
  pstd=&STD;
}

void loop() {
  int rawMin=5;    
  int rawMax=700; 
  unsigned long startTime1=micros(); 
  float calMin1=calibrate_sensor(rawMin); 
  float calMax1=calibrate_sensor(rawMax); 
  unsigned long elapsedTime1=micros()-startTime1;

  unsigned long startTime2=micros(); 
  float calMin2=calibrate_sensorPredeclared(rawMin); 
  float calMax2=calibrate_sensorPredeclared(rawMax); 
  unsigned long elapsedTime2=micros()-startTime2;
  
  unsigned long startTime3=micros(); 
  float calMin3=calibrate_sensorPoint(rawMin); 
  float calMax3=calibrate_sensorPoint(rawMax); 
  unsigned long elapsedTime3=micros()-startTime3;

  unsigned long startTime4=micros(); 
  float calMin4=calibrate_sensorHorner(rawMin); 
  float calMax4=calibrate_sensorHorner(rawMax); 
  unsigned long elapsedTime4=micros()-startTime4;

  
  Serial.println("simple");
  Serial.println(calMin1,4); // calibrated value should be close to 0.001
  Serial.println(calMax1,4); // calibrated value should be close to 3.927
  Serial.println(elapsedTime1);
  Serial.println("pre-declared const");
  Serial.println(calMin2,4); // calibrated value should be close to 0.001
  Serial.println(calMax2,4); // calibrated value should be close to 3.927
  Serial.println(elapsedTime2);
  Serial.println("pointer");
  Serial.println(calMin3,4); // calibrated value should be close to 0.001
  Serial.println(calMax3,4); // calibrated value should be close to 3.927
  Serial.println(elapsedTime3);
  Serial.println("Thermingenieur");
  Serial.println(calMin4,4); // calibrated value should be close to 0.001
  Serial.println(calMax4,4); // calibrated value should be close to 3.927
  Serial.println(elapsedTime4);
  Serial.println("_______");
  delay(1000);
}


float calibrate_sensor(int sensorOutput){
  float a=0.042866;
  float b=0.321221;
  float c=0.875095;
  float d=1.249905;
  float e=1.610887;
  float f=1.529406;
  float mean=540.4259;
  float std=212.4908;
  float x=(sensorOutput-mean)/std; // transpose
  float result=a*pow(x,5)+b*pow(x,4)+c*pow(x,3)+d*pow(x,2)+e*x+f;
  return result;
}

float calibrate_sensorPredeclared(int sensorOutput){
  float x=(sensorOutput-MEAN)/STD; // transpose
  float result=A*pow(x,5)+B*pow(x,4)+C*pow(x,3)+D*pow(x,2)+E*x+F;
  return result;
}

float calibrate_sensorPoint(int sensorOutput){
  float x=(sensorOutput-*pmean)/ *pstd; // transpose
  float result=*pa*pow(x,5)+*pb*pow(x,4)+*pc*pow(x,3)+*pd*pow(x,2)+*pe*x+*pf;
  return result;
}

float calibrate_sensorHorner(int sensorOutput){
  float x=(sensorOutput-MEAN)/STD; // transpose
  float result=((((A*x + B)*x + C)*x + D)*x + E)*x + F;
  return result;
}
 
Glad I could help but one doesn’t need to be an engineer for that. I don’t know how it is nowadays, but about 40 years ago, the Horner’s method was 10th grade stuff for everybody.
 
Well, I was in 10th grade "only" 25 years ago but Horner's method was still around. I guess I was too excited about using exotic functions (I am new to programming) and eager to write complex code that I forgot the basic stuff. Thanks for reminding me of this.
I am sure this would be a good lesson for many others: Best way to do complex calculation ? Think about your maths before your pointers ;)
Cheers
 
I get down to a single us ! How is that even possible ?! Great !
Really? I tried your code with a teensy 3.2 which does not have a floating point unit and I also get times < 1µs. I might be wrong but this seems to be way too fast for all that floating point operations. Are you sure that the compiler doesn't optimize the complete calculation away? Since you are calling calibrate_sensorHorner() in loop() with constant values it might well be that the compiler chooses to calculate it once and then just outputs this stored value. Another possibility: Since the Horner scheme is so simple the compiler can easily calculate the result at compile time if it detects that all the used numbers are constant.

I may be wrong, but but it might be a good idea to determine the calculation time with real measured sensor values not just constants.
 
Interesting point. Easily checked I replaced
Code:
int rawMin=5;    
int rawMax=700;
by
Code:
int rawMin=random(5, 700);    
int rawMax=random(5, 700);

It is slightly slower: 8us
I thought that the same bias would apply to the other calculation methods but their speeds remain unchanged.
 
8µs makes much more sense than below 1µs :).

I thought that the same bias would apply to the other calculation methods but their speeds remain unchanged.
The compiler can not know what pow() is doing. Since pow() might as well return different values every time you call it (even with the same constant arguments) the compiler can not optimize the call away and you don't get any difference if you switch from constants to variable function arguments.
 
I am not sure I am understanding that correctly:
Is it possible to programatically change the clock speed ? Is T3.6 clocked by default to its maximum 180MHz ?
Cheers

That was the results of two separate compile and uploads. I put a while(1); to stop it at the end of loop() - ran at 96 then 180.

I didn't look at the math involved - just wanted to compare the FPU results. Assuming your T_3.2 was at 96 - it was under 2X as fast?
 
To fully leverage the FPU, you'd need to change those pow() to powf(). The regular pow() function uses 64 bit double, which forces more of the surrounding math to also be done with slow 64 bit double.
 
Status
Not open for further replies.
Back
Top