double precision maths queries ...

Status
Not open for further replies.

ninja2

Well-known member
I'm developing a teensy sketch and a MATLAB script in parallel, but my limited experience with high-precision/high-accuracy maths has left me puzzled by the following differences in calculated and/or reported values.

MATLAB uses double precision by default. My "equivalent" teensy sketch also uses doubles.

When I define an angle constant in degrees, convert it to radians and take the tangent, the results match exactly to 6 decimal places, but they differ in the less significant digits.

angle = -66.75828°
tan() = -2.3284907725008 (MATLAB)
tan() = -2.328490478388 (teensy)

Even the initial conversion to radians is similarly 'inaccurate'. I was expecting (and need) much more accuracy!

I figured maybe it was just the output/printing method, so I tested several (including dtostrf). The best I got was exact equivalence (between MATLAB and teensy) to 8 digits.

Looking for clues as to what I am missing?
Any and all solutions gladly accepted :)

Below are:
1) simple teensy sketch
2) output from the sketch
3) output from MATLAB script

Code:
#define LED LED_BUILTIN
#include <Streaming.h>                                  // NB: with CJ edits to avoid SdFat/endl conflict

const double D2R = DEG_TO_RAD;                          // degrees to radians
const double angle = -66.75828 * D2R;                   // (radians) +ve for Northern hemisphere, -ve for Southern
const double angle2 = -1.16515178897; 

char valBuffer[25] = {0}; 
const double dnum = -1.01234567890123456789;                  // width:23   precision:20  

void setup() {
  Serial.begin(115200);
  pinMode(LED, OUTPUT);                                 // LED pin must be an output before toggling it
  while (!Serial && (millis() <= 4000)) {               // wait for Serial to open, but only for 4 secs
    digitalWriteFast(LED, !digitalReadFast(LED));       // toggle LED
    delay(50);                                          // rapidly!
  }
  digitalWrite(LED, LOW);                               // then off
  Serial << "######## t3Dtest ########\n"; 
  Serial << F(" * Serial open, millis: ") << millis() << '\n';
  Serial << "------ ------- -------" << '\n';
  Serial << '\n';
  Serial << "** angle A ..." << '\n'; 
  Serial << "input A:\t-66.75828(degrees) * D2R" << '\n'; 
  Serial << "A:\t";     Serial.print(angle,12); Serial << " (radians)"<< '\n'; 
  Serial << "tan():\t"; Serial.print(tan(angle),12); Serial << '\n'; 
  Serial << '\n';
  Serial << "input A2:\t-1.16515178897 (radians)" << '\n'; 
  Serial << "A2:\t";    Serial.print(angle2,12); Serial << '\n'; 
  Serial << "tan():\t"; Serial.print(tan(angle2),12); Serial << '\n'; 
  Serial << '\n';
  Serial << "------ ------- -------" << '\n';
  Serial << "** dnum ..." << '\n'; 
  Serial << "input:\t-1.01234567890123456789" << '\n'; 
  dtostrf(dnum, 23, 20, valBuffer); // double in, width, precision, buffer
  //dtostrf(dnum, 14, 11, valBuffer); // double in, width, precision, buffer
  Serial << "dtostrf:\t"; Serial.println(valBuffer);
  Serial << ".print:\t";  Serial.println(dnum,20); 
  Serial << "_FLOAT:\t" << _FLOAT(dnum,20) << '\n'; 
  Serial << "------ end setup -------" << '\n';
  }

void loop() {
  }

Output from the sketch
######## t3Dtest ########
* Serial open, millis: 485
------ ------- -------

** angle A ...
input A: -66.75828(degrees) * D2R
A: -1.165151743170 (radians)
tan(): -2.328490478388

input A2: -1.16515178897 (radians)
A2: -1.165151834488
tan(): -2.328491064822

------ ------- -------
** dnum ...
input: -1.01234567890123456789
dtostrf: -1.01234567165374755859
.print: -1.012345671653747
_FLOAT: -1.012345671653747
------ end setup -------

Output from MATLAB script

Dtest
_angle = -66.75828º
radian = -1.16515178897
_tan() = -2.3284907725
tand() = -2.3284907725
>>
 
What you are missing is that the Teensy's floating-point constants are single precision, not double precision, so each constant is rounded to the 32-bit floating-point format before it is ever stored in a double.
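The effect can be reproduced without the Teensy. The following host-side C++ sketch is an illustration, not code from the thread: the hypothetical helper as_float_constant rounds a constant to float and widens it back to double, which is assumed to be what happens to each unsuffixed literal on the Teensy build. In the original sketch both the angle and DEG_TO_RAD go through that step, while the multiplication and tan() still run in double.

```cpp
#include <cmath>

// Hypothetical helper: a constant that was stored as a 32-bit float and then
// widened to double, as an unsuffixed literal is assumed to be on the Teensy build.
double as_float_constant(double value) {
  return static_cast<double>(static_cast<float>(value));
}

const double kDegToRad = 0.017453292519943295769;  // pi / 180
const double kAngleDeg = -66.75828;                // degrees

// Both constants at full double precision (what MATLAB computes).
double radians_double() { return kAngleDeg * kDegToRad; }

// Both constants rounded to float first (what the Teensy sketch computes);
// the product itself is still formed in double.
double radians_float_constants() {
  return as_float_constant(kAngleDeg) * as_float_constant(kDegToRad);
}
```

With these definitions, std::tan(radians_float_constants()) should land on the Teensy's -2.328490478388 and std::tan(radians_double()) on MATLAB's -2.3284907725008, to well beyond the printed digits.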

The reason for this is that the Teensy 3.5/3.6 support 32-bit floating point in hardware but not 64-bit; 64-bit floating point is done via software emulation. The build uses a GCC switch (-fsingle-precision-constant) that makes unsuffixed constants 32-bit, so that a double-precision constant does not force an otherwise single-precision expression to be promoted to (slow, emulated) double precision.
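The promotion that the switch avoids can be seen with standard C++ type traits. This host-side sketch assumes an ordinary compiler invocation without -fsingle-precision-constant, where an unsuffixed literal such as 2.5 is a double: multiplying a float by it yields a double (which a Teensy 3.x would have to emulate in software), while multiplying by 2.5f stays in float.

```cpp
#include <type_traits>
#include <utility>

// Result type of float * <literal>, determined at compile time.
using UnsuffixedProduct = decltype(std::declval<float>() * 2.5);   // 2.5 is a double literal
using SuffixedProduct   = decltype(std::declval<float>() * 2.5f);  // 2.5f is a float literal

// On a standard host build both of these are true.
constexpr bool kUnsuffixedPromotesToDouble =
    std::is_same<UnsuffixedProduct, double>::value;
constexpr bool kSuffixedStaysFloat =
    std::is_same<SuffixedProduct, float>::value;
```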

If you add an 'L' (or 'l') suffix to a constant, it becomes long double instead of float. On these ARM targets long double has the same 64-bit representation as double, so long double arithmetic is done exactly the same way as double.

Ultimately part of this goes back to the original C compiler on the PDP-7 and then PDP-11. The PDP-11 tended to prefer doing arithmetic in double precision, so the C language was created where everything was assumed to be double, and constants were also double by default.
 
Thanks Michael. I've got that partly working now, but the following still gives inaccurate values.
How should I do this?

Code:
const long double D2R = DEG_TO_RAD;
const long double angle = -66.75828;
long double inclination = 0;

void setup(){
  // ...
  inclination = angle * D2R; 
}
 
As I said, you need to add an 'L' suffix to all floating-point constants (at least all constants that are not simple integers and cannot be represented exactly in 32-bit float):

Code:
const long double D2R = DEG_TO_RAD;
const long double angle = -66.75828L;
long double inclination = 0;

void setup(){
  // ...
  inclination = angle * D2R; 
}
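Which constants need the suffix can be checked mechanically: a constant is safe without 'L' only if rounding it to float and back leaves it unchanged. The helper below is a hypothetical host-side check, not part of any Teensy API.

```cpp
// Hypothetical check: true if `value` survives a round trip through 32-bit
// float unchanged, i.e. an unsuffixed literal with this value loses nothing.
// Assumes `value` is within the range of float.
bool exactly_representable_as_float(double value) {
  return static_cast<double>(static_cast<float>(value)) == value;
}
```

Small integers and short binary fractions such as 2.0, 0.5 or -1.5 pass; -66.75828, 0.1 and the DEG_TO_RAD value do not, and neither does 16777217.0 (2^24 + 1), which needs 25 significant bits.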

Otherwise this declaration:

Code:
const long double angle = -66.75828;

is interpreted as:

Code:
const long double angle = (long double) (float) -66.75828f;
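To see how much that conversion costs, the same declaration can be spelled out on a desktop machine. This host-side sketch uses hypothetical names: it makes the float step explicit and measures the error against the 'L'-suffixed value.

```cpp
#include <cmath>

// The declaration above with its implicit conversions spelled out: the
// literal is rounded to float first, then widened to long double.
const long double kAsInterpreted =
    static_cast<long double>(static_cast<float>(-66.75828L));
// What was intended: the literal kept at (at least) double precision.
const long double kIntended = -66.75828L;

// Error introduced by the float step, in degrees and after conversion to radians.
long double error_degrees() { return std::fabs(kAsInterpreted - kIntended); }
long double error_radians() {
  return error_degrees() * 0.017453292519943295769236907684886L;
}
```

The error is about 2.1e-6 degrees, or roughly 3.7e-8 radians, which accounts for most of the roughly 4.6e-8 radian discrepancy in the first post; the rest comes from DEG_TO_RAD being rounded the same way.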
 
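DEG_TO_RAD is defined in wiring.h without the "L" suffix:

Code:
#define DEG_TO_RAD 0.017453292519943295769236907684886

so it is demoted to float like any other unsuffixed constant. Use your own version with the suffix instead.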
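Putting both fixes together, a host-side version of the calculation might look like the following. MY_DEG_TO_RAD is a hypothetical name chosen so it does not clash with the core's DEG_TO_RAD; on the Teensy the same declarations would go in the sketch.

```cpp
#include <cmath>

// Own degrees-to-radians constant WITH the 'L' suffix, so it is not demoted
// to float under -fsingle-precision-constant. (Hypothetical name.)
#define MY_DEG_TO_RAD 0.017453292519943295769236907684886L

const long double D2R   = MY_DEG_TO_RAD;
const long double angle = -66.75828L;  // degrees, also suffixed

// Inclination in radians and its tangent, both kept in (long) double.
long double inclination()     { return angle * D2R; }
long double inclination_tan() { return std::tan(inclination()); }
```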
 