
Thread: double precision maths queries ...

  1. #1
    Senior Member ninja2's Avatar
    Join Date
    Aug 2016
    Location
    Adelaide, Australia
    Posts
    151

    double precision maths queries ...

    I'm developing a Teensy sketch and a MATLAB script in parallel, but my limited experience with high-precision/high-accuracy maths has left me puzzled by the following differences in calculated and/or reported values.

    MATLAB uses double precision by default. My "equivalent" teensy sketch also uses doubles.

    When I define an angle constant in degrees, convert it to radians and take the tangent, the results match exactly to 6 decimal places but differ in the higher-precision digits.

    angle = -66.75828
    tan() = -2.3284907725008 (MATLAB)
    tan() = -2.328490478388 (teensy)

    Even the initial conversion to radians is similarly 'inaccurate'. I was expecting (and need) much more accuracy!

    I figured maybe it was just the output/printing method, so I tested several (including dtostrf). The best I got was exact equivalence (between MATLAB and teensy) to 8 digits.

    Looking for clues as to what I am missing?
    Any and all solutions gladly accepted

    Below are:
    1) simple teensy sketch
    2) output from the sketch
    3) output from MATLAB script

    Code:
    #define LED LED_BUILTIN
    #include <Streaming.h>                                  // NB: with CJ edits to avoid SdFat/endl conflict
    
    const double D2R = DEG_TO_RAD;                          // degrees to radians
    const double angle = -66.75828 * D2R;                   // (radians) +ve for Northern hemisphere, -ve for Southern
    const double angle2 = -1.16515178897; 
    
    char valBuffer[25] = {0}; 
    const double dnum = -1.01234567890123456789;                  // width:23   precision:20  
    
    void setup() {
      Serial.begin(115200);
      while (!Serial && (millis() <= 4000)){                // wait for Serial to open, but only for 4 secs
        digitalWriteFast(LED,!digitalReadFast(LED));        // toggle LED
        delay(50);}                                         // rapidly!
      digitalWrite(LED, LOW);                               // then off
      Serial << "######## t3Dtest ########\n"; 
      Serial << F(" * Serial open, millis: ") << millis() << '\n';
      Serial << "------ ------- -------" << '\n';
      Serial << '\n';
      Serial << "** angle A ..." << '\n'; 
      Serial << "input A:\t-66.75828(degrees) * D2R" << '\n'; 
      Serial << "A:\t";     Serial.print(angle,12); Serial << " (radians)"<< '\n'; 
      Serial << "tan():\t"; Serial.print(tan(angle),12); Serial << '\n'; 
      Serial << '\n';
      Serial << "input A2:\t-1.16515178897 (radians)" << '\n'; 
      Serial << "A2:\t";    Serial.print(angle2,12); Serial << '\n'; 
      Serial << "tan():\t"; Serial.print(tan(angle2),12); Serial << '\n'; 
      Serial << '\n';
      Serial << "------ ------- -------" << '\n';
      Serial << "** dnum ..." << '\n'; 
      Serial << "input:\t-1.01234567890123456789" << '\n'; 
      dtostrf(dnum, 23, 20, valBuffer); // double in, width, precision, buffer
      //dtostrf(dnum, 14, 11, valBuffer); // double in, width, precision, buffer
      Serial << "dtostrf:\t"; Serial.println(valBuffer);
      Serial << ".print:\t";  Serial.println(dnum,20); 
      Serial << "_FLOAT:\t" << _FLOAT(dnum,20) << '\n'; 
      Serial << "------ end setup -------" << '\n';
      }
    
    void loop() {
      }
    Output from the sketch
    ######## t3Dtest ########
    * Serial open, millis: 485
    ------ ------- -------

    ** angle A ...
    input A: -66.75828(degrees) * D2R
    A: -1.165151743170 (radians)
    tan(): -2.328490478388

    input A2: -1.16515178897 (radians)
    A2: -1.165151834488
    tan(): -2.328491064822

    ------ ------- -------
    ** dnum ...
    input: -1.01234567890123456789
    dtostrf: -1.01234567165374755859
    .print: -1.012345671653747
    _FLOAT: -1.012345671653747
    ------ end setup -------

    Output from MATLAB script

    Dtest
    _angle = -66.75828
    radian = -1.16515178897
    _tan() = -2.3284907725
    tand() = -2.3284907725
    >>

  2. #2
    Senior Member+ MichaelMeissner's Avatar
    Join Date
    Nov 2012
    Location
    Ayer Massachussetts
    Posts
    3,310


    What you are missing is that the Teensy's floating point constants are single precision and not double precision. So the constants are truncated to the 32-bit floating point format.

    The reason for this is that the Teensy 3.5/3.6 support 32-bit floating point in hardware but not 64-bit; 64-bit floating point is done via software emulation. The GCC switch that makes constants 32-bit ensures that a floating point constant does not force an otherwise single precision expression into double precision.

    If you add an 'L' (or 'l') suffix to all constants, it will make these constants long double instead of float. On ARM systems, long double has the same representation as double, so long double arithmetic is performed the same way as double arithmetic.

    Ultimately, part of this goes back to the original C compiler on the PDP-7 and then the PDP-11. The PDP-11 tended to prefer doing arithmetic in double precision, so the C language was designed with floating point arithmetic assumed to be double, and constants were also double by default.
    Last edited by MichaelMeissner; 04-06-2017 at 01:32 PM.

  3. #3
    Senior Member ninja2's Avatar
    Join Date
    Aug 2016
    Location
    Adelaide, Australia
    Posts
    151
    Thanks Michael. I've got that working in part now, but the following still gives inaccurate values.
    How should I do this?

    Code:
    const long double D2R = DEG_TO_RAD;
    const long double angle = -66.75828;
    long double inclination = 0;
    
    void setup(){
    .
    .
      inclination = angle * D2R; 
    }

  4. #4
    Senior Member+ MichaelMeissner's Avatar
    Join Date
    Nov 2012
    Location
    Ayer Massachussetts
    Posts
    3,310
    Quote: Originally posted by ninja2
    Thanks Michael. I've got that working in part now, but the following still gives inaccurate values.
    How should I do this?

    Code:
    const long double D2R = DEG_TO_RAD;
    const long double angle = -66.75828;
    long double inclination = 0;
    
    void setup(){
    .
    .
      inclination = angle * D2R; 
    }
    As I said, you need to add an 'L' suffix to all constants (at least to every constant that is not a simple integer that fits in 32 bits):

    Code:
    const long double D2R = DEG_TO_RAD;
    const long double angle = -66.75828L;
    long double inclination = 0;
    
    void setup(){
    .
    .
      inclination = angle * D2R; 
    }
    Otherwise this declaration:

    Code:
    const long double angle = -66.75828;
    is interpreted as:

    Code:
    const long double angle = (long double) (float) -66.75828f;
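The effect of that implicit float conversion can be sketched on any desktop C++ compiler by writing the cast out explicitly. This is only an illustration: the function names are hypothetical, and the 'f' suffix stands in for what the thread says the Teensy build does to unsuffixed constants.

```cpp
#include <cassert>
#include <cmath>

// The literal as the thread describes it being compiled on the Teensy:
// rounded to a 32-bit float first, then widened again.
double angleViaFloat() {
    return static_cast<double>(-66.75828f);
}

// The same literal with the 'L' suffix, which keeps it out of single precision.
double angleViaLongDouble() {
    return static_cast<double>(-66.75828L);
}

// Relative error introduced by the single-precision rounding
// (hypothetical helper, for illustration only).
double relativeError() {
    return std::fabs((angleViaFloat() - angleViaLongDouble()) / angleViaLongDouble());
}
```

A float carries a 24-bit significand, so the relative error stays below about 6e-8. That is consistent with the sketch and MATLAB agreeing to only 7 or 8 significant digits.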

  5. #5
    Senior Member
    Join Date
    Jan 2013
    Posts
    843
    #define DEG_TO_RAD 0.017453292519943295769236907684886
    in wiring.h is missing the "L" suffix, so it is also treated as single precision. Define your own version instead.
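A minimal sketch of such a replacement, assuming nothing beyond standard C++. The names DEG_TO_RAD_L, degreesToRadians and tanDegrees are made up for this example and are not part of wiring.h.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical replacement for the DEG_TO_RAD macro: the same digits as quoted
// above, plus the 'L' suffix so the constant is not reduced to single precision.
const long double DEG_TO_RAD_L = 0.017453292519943295769236907684886L;

// Convert degrees to radians without passing through a float constant.
long double degreesToRadians(long double degrees) {
    return degrees * DEG_TO_RAD_L;
}

// Tangent of an angle given in degrees.
long double tanDegrees(long double degrees) {
    return std::tan(degreesToRadians(degrees));
}
```

Called as tanDegrees(-66.75828L), with the 'L' on the literal as well, this should land close to the MATLAB figures from the first post.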

  6. #6
    Senior Member ninja2's Avatar
    Join Date
    Aug 2016
    Location
    Adelaide, Australia
    Posts
    151
    got all that, great stuff
    no more 'double trouble'
    thanks all
