# Thread: double precision maths queries ...

1. ## double precision maths queries ...

I'm developing a Teensy sketch and a MATLAB script in parallel, but my limited experience with high precision/high accuracy maths has led me to the following differences in calculated and/or reported values.

MATLAB uses double precision by default. My "equivalent" teensy sketch also uses doubles.

When I define an angle constant in degrees, convert it to radians and take the tangent the resulting answers are an exact match to 6 decimal places, but they vary in higher precision digits.

angle = -66.75828°
tan() = -2.3284907725008 (MATLAB)
tan() = -2.328490478388 (teensy)

Even the initial conversion to radians is similarly 'inaccurate'. I was expecting (and need) much more accuracy!

I figured maybe it was just the output/printing method, so I tested several (including dtostrf). The best I got was exact equivalence (between MATLAB and teensy) to 8 digits.

Looking for clues as to what I am missing?
Any and all solutions gladly accepted. Below are:
1) simple teensy sketch
2) output from the sketch
3) output from MATLAB script

Code:
```
#define LED LED_BUILTIN
#include <Streaming.h>                                  // NB: with CJ edits to avoid SdFat/endl conflict

const double D2R = DEG_TO_RAD;                          // assumed definition: D2R was not defined in the posted sketch
const double angle = -66.75828 * D2R;                   // (radians) +ve for Northern hemisphere, -ve for Southern
const double angle2 = -1.16515178897;

char valBuffer[32] = {0};                               // dtostrf() needs room for width 23 plus the terminating NUL
const double dnum = -1.01234567890123456789;            // width:23   precision:20

void setup() {
  Serial.begin(115200);
  while (!Serial && (millis() <= 4000)) {               // wait for Serial to open, but only for 4 secs
    delay(50);                                          // rapidly!
  }
  digitalWrite(LED, LOW);                               // then off
  Serial << "######## t3Dtest ########\n";
  Serial << F(" * Serial open, millis: ") << millis() << '\n';
  Serial << "------ ------- -------" << '\n';
  Serial << '\n';
  Serial << "** angle A ..." << '\n';
  Serial << "input A:\t-66.75828(degrees) * D2R" << '\n';
  Serial << "A:\t";     Serial.print(angle, 12); Serial << " (radians)" << '\n';
  Serial << "tan():\t"; Serial.print(tan(angle), 12); Serial << '\n';
  Serial << '\n';
  Serial << "input A2:\t-1.16515178897 (radians)" << '\n';
  Serial << "A2:\t";    Serial.print(angle2, 12); Serial << '\n';
  Serial << "tan():\t"; Serial.print(tan(angle2), 12); Serial << '\n';
  Serial << '\n';
  Serial << "------ ------- -------" << '\n';
  Serial << "** dnum ..." << '\n';
  Serial << "input:\t-1.01234567890123456789" << '\n';
  dtostrf(dnum, 23, 20, valBuffer);      // double in, width, precision, buffer
  // dtostrf(dnum, 14, 11, valBuffer);   // double in, width, precision, buffer
  Serial << "dtostrf:\t"; Serial.println(valBuffer);
  Serial << ".print:\t";  Serial.println(dnum, 20);
  Serial << "_FLOAT:\t" << _FLOAT(dnum, 20) << '\n';
  Serial << "------ end setup -------" << '\n';
}

void loop() {
}
```
Output from the sketch
######## t3Dtest ########
* Serial open, millis: 485
------ ------- -------

** angle A ...
input A: -66.75828(degrees) * D2R
tan(): -2.328490478388

A2: -1.165151834488
tan(): -2.328491064822

------ ------- -------
** dnum ...
input: -1.01234567890123456789
dtostrf: -1.01234567165374755859
.print: -1.012345671653747
_FLOAT: -1.012345671653747
------ end setup -------

Output from MATLAB script

Dtest
_angle = -66.75828º
_tan() = -2.3284907725
tand() = -2.3284907725

2. What you are missing is that the Teensy's floating point constants are single precision and not double precision. So the constants are truncated to the 32-bit floating point format.

The reason for this is that the Teensy 3.5/3.6 support 32-bit floating point in hardware but not 64-bit; 64-bit floating point is done via software emulation. The build uses the GCC switch that makes unsuffixed constants 32-bit (`-fsingle-precision-constant`), so that a double precision constant does not force an otherwise single precision expression into double precision.

If you add an 'L' (or 'l') suffix to all constants, it will make these constants long double instead of float. On the ARM systems, long double has the same representation as double, so long double will be done the same way as double.
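The effect can be reproduced away from the board. The following is a minimal sketch in standard C++, assuming a desktop compiler where unsuffixed constants stay double; the explicit `f` suffix imitates what the Teensy build does to unsuffixed constants, and the function names are made up for illustration.

```cpp
#include <cmath>

// Teensy-like case: both constants pass through 32-bit float first.
// The f suffix here stands in for the single-precision-constant build switch.
double tanFromFloatConstants() {
    const double d2r = 0.017453292519943295769f;  // float constant, widened to double
    const double angle = -66.75828f * d2r;        // float constant times double
    return std::tan(angle);
}

// Desktop/MATLAB-like case: the constants keep full double precision.
double tanFromDoubleConstants() {
    const double d2r = 0.017453292519943295769;
    const double angle = -66.75828 * d2r;
    return std::tan(angle);
}
```

Printing both results with 12 decimals should show agreement to only about six decimal places, much like the MATLAB and Teensy values in the first post. The double version of `tan()` is used in both cases, so the difference comes entirely from the constants.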

Ultimately, part of this goes back to the original C compiler on the PDP-7 and then the PDP-11. The PDP-11 tended to prefer doing arithmetic in double precision, so the C language was created with floating point arithmetic assumed to be double, and constants were also double by default.

3. Thanks Michael. I've got that working in part now, but the following still gives inaccurate values.
How should I do this?

Code:
```
const long double D2R = DEG_TO_RAD;
const long double angle = -66.75828;
long double inclination = 0;

void setup(){
  .
  .
  inclination = angle * D2R;
}
```

4. Originally Posted by ninja2:
Thanks Michael. I've got that working in part now, but the following still gives inaccurate values.
How should I do this?

Code:
```
const long double D2R = DEG_TO_RAD;
const long double angle = -66.75828;
long double inclination = 0;

void setup(){
  .
  .
  inclination = angle * D2R;
}
```
As I said, you need to add an 'L' suffix to all constants (at least to all constants that are not simple integers or that do not fit in 32 bits):

Code:
```
const long double D2R = DEG_TO_RAD;
const long double angle = -66.75828L;
long double inclination = 0;

void setup(){
  .
  .
  inclination = angle * D2R;
}
```
Otherwise this declaration:

Code:
`const long double angle = -66.75828;`
is interpreted as:

Code:
`const long double angle = (long double) (float) -66.75828f;`
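The same round trip through `float` accounts for the `dnum` line in the sketch output from the first post. As a rough illustration (a desktop sketch, with `snprintf` standing in for `dtostrf` and hypothetical helper names), the value that actually gets stored can be printed like this:

```cpp
#include <cstdio>
#include <string>

// Format a value with a fixed number of decimals (desktop stand-in for dtostrf).
std::string formatFixed(double value, int decimals) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*f", decimals, value);
    return std::string(buf);
}

// What is stored when the constant is first reduced to a 32-bit float,
// as happens to an unsuffixed constant in the Teensy build.
double dnumTruncated() {
    return static_cast<double>(-1.01234567890123456789f);
}

// What is stored when the constant keeps double precision.
double dnumFull() {
    return -1.01234567890123456789;
}
```

`formatFixed(dnumTruncated(), 20)` gives `-1.01234567165374755859`, the same digits as the `dtostrf` line in the sketch output, while `formatFixed(dnumFull(), 12)` gives `-1.012345678901`. Only the first eight or so significant digits survive the trip through `float`.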

5. `DEG_TO_RAD` in wiring.h is missing the "L" suffix, so use your own version.

6. got all that, great stuff
no more 'double trouble'
thanks all
