Precision of arithmetic in C (Teensy 4.0 double vs. float)

Status
Not open for further replies.

ossi

Well-known member
Code:
float f1, f2;
f2 = f1 + f2;
f2 = sin(f1);
Is it right that float arithmetic in C is generally done with double precision? Looking at the code generated for the Teensy 4.0 I get this impression. Is it right then that in the above statement the sin function and the addition are evaluated with double precision? That would mean that using float variables instead of double variables would make the code slower due to the necessary type conversions? Is this issue fixed by some specification of C?
 
To force float over double, you can use float constants (1.77f) and the float version of math functions ( sinf() ). T4 has hardware double and float. T3.5/3.6 has only single-precision FPU hardware and uses a gcc compile option to coerce floating point constants to float (even without the trailing f). And you can always cast variables or constants -- (float) f1
 

In the original C, as created by Dennis Ritchie on the PDP-11 (after its predecessor B on the PDP-7), the language was biased towards everything being 'double'. As I understand it, the PDP-11 that they were using had a floating point unit that was either in 64-bit floating point mode or in 32-bit floating point mode, and the C runtime environment always ran in double-precision mode. If you used 'float' it would be converted automatically to 'double' to do any calculations (similar to the way 'short' and 'char' are logically promoted to 'int').

Thus the rule in the C standard that floating point constants without a 'f' or 'l' suffix would be 'double'.

When the X3J11 committee was established (BTW, I was on the X3J11 committee from the original meeting through the publication of the ANSI C89 and then ISO C90 standards), one of the many changes that were made from K&R C to ANSI/ISO C was to establish that a binary operator whose arguments were both 'float' logically did the calculation in 'float'.

So doing:
Code:
    float f1, f2;
    // ...
    f2 = f1 + f2;

Does the calculation in single precision. On the Teensy 3.5 and 3.6 this is important because the chip only has support for doing 'float' calculations in hardware, and 'double' is done via software emulation. The Teensy LC/3.2 does not have hardware floating point at all, but generally the single precision emulation routines are faster and smaller than the double precision emulation routines.

This means that the code:
Code:
    float f1, f2;
    // ...
    f1 = f2 + 1.0;

must logically be done as:

Code:
    float f1, f2;
    double tmp1, tmp2;
    // ...
    tmp1 = (double) f2;
    tmp2 = tmp1 + 1.0;
    f1 = (float) tmp2;
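
The result types behind these rules can be checked at compile time. This is only a minimal sketch, assuming a C11 compiler (for `_Generic`) and a build without '-fsingle-precision-constant'; the `TYPE_NAME` macro and the helper function names are made up for this illustration:

```c
/* Name the type of an expression. _Generic does not evaluate its
   controlling expression, so no arithmetic is actually performed here. */
#define TYPE_NAME(expr) _Generic((expr), \
    float: "float",                      \
    double: "double",                    \
    long double: "long double",          \
    default: "other")

/* float + float: both operands are float, so the result type is float. */
const char *type_of_float_plus_float(void)
{
    float f1 = 1.0f, f2 = 2.0f;
    return TYPE_NAME(f1 + f2);
}

/* float + 1.0: the unsuffixed constant is double, so f2 is converted
   to double and the result type is double. */
const char *type_of_float_plus_double_constant(void)
{
    float f2 = 2.0f;
    return TYPE_NAME(f2 + 1.0);
}

/* float + 1.0f: the 'f' suffix keeps the whole expression in float. */
const char *type_of_float_plus_float_constant(void)
{
    float f2 = 2.0f;
    return TYPE_NAME(f2 + 1.0f);
}
```

This only reports the type the language assigns to each expression. A compiler is still allowed to evaluate in wider precision internally (see FLT_EVAL_METHOD in <float.h>).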

Going on to the math routines, the main routines are specified as taking 'double' arguments and returning a 'double' result. So:
Code:
    float f1, f2;
    // ...
    f1 = sin (f2);

will logically be done as:
Code:
    float f1, f2;
    double tmp1, tmp2;
    // ...
    tmp1 = (double) f2;
    tmp2 = sin (tmp1);
    f1 = (float) tmp2;

But the X3J11 committee also added a parallel set of math functions with a 'f' suffix that take 'float' arguments and return a 'float' result. Thus:
Code:
    float f1, f2;
    // ...
    f1 = sinf (f2);

is normally much faster because it doesn't have to convert to 'double' and on the Teensy 3.5/3.6 it can be done with hardware instructions. Some implementations of the single precision math library will switch to double precision internally in places where they need more precision (and for the double functions, implement higher precision internally).
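
The difference between the two calls shows up in the type of the call expression itself. This is a rough sketch in plain C, assuming a C11 compiler; `TYPE_NAME` and the two helper names are hypothetical. In C++ sketches, `<cmath>` also provides overloads of `sin` for `float`, so the picture can differ there.

```c
#include <math.h>

/* Name the type of an expression without evaluating it. */
#define TYPE_NAME(expr) _Generic((expr), \
    float: "float",                      \
    double: "double",                    \
    default: "other")

/* sin() is declared as double sin(double): the float argument is
   converted to double and the call expression has type double. */
const char *type_of_sin_of_float(void)
{
    float f2 = 0.5f;
    return TYPE_NAME(sin(f2));
}

/* sinf() is declared as float sinf(float): no conversion is needed
   and the call expression has type float. */
const char *type_of_sinf_of_float(void)
{
    float f2 = 0.5f;
    return TYPE_NAME(sinf(f2));
}
```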

The AVR compiler for the original Arduino processors (such as the 328p used in the Uno) complicated things because it made 'double' the same size as 'float'. This actually is illegal C/C++, but it might be understandable, since the AVR processors are only 8-bit processors.

Because the Teensy 3.5/3.6 only have single precision floating point hardware, and because of the historical baggage of the AVR processors, Paul added the compiler option '-fsingle-precision-constant' to Teensy LC/3.x builds, which says that floating point constants without a suffix are treated as 'float' instead of 'double'. This helped a lot of code developed on the Arduino Uno, but of course failed for non-Arduino code that assumed it had more precision. You can't win everything, and the law of unintended consequences will come back to bite you.
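
To illustrate why code that expects double precision constants can break under that option, here is a small sketch of the rounding involved. It writes the 'f' suffix explicitly to mimic what the option does to an unsuffixed 0.1, so it should behave the same with or without the flag. The function names are hypothetical.

```c
/* 0.1 rounded to float first, then widened: what a double would hold
   if the constant were treated as single precision. */
double tenth_via_float(void)
{
    return (double)0.1f;
}

/* 0.1 kept at (at least) double precision via a long double literal. */
double tenth_as_double(void)
{
    return (double)0.1L;
}

/* Absolute difference between the two, roughly 1.5e-9. */
double constant_rounding_error(void)
{
    double diff = tenth_via_float() - tenth_as_double();
    return diff < 0.0 ? -diff : diff;
}
```

An error of that size is harmless for most sensor code, but it is far larger than the rounding error of a true double constant, which is the kind of surprise described above.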

Now, the Teensy 4.0 has hardware support for both single and double precision. Paul has dropped the '-fsingle-precision-constant' option for Teensy 4.0 builds. So there is less difference between single and double precision. I haven't looked at the instruction times, but I would imagine add/subtract might be roughly the same speed, and single precision divide is likely to be faster than double precision divide (double precision multiply might be the same speed as single precision or it might be slower). The single precision math functions are generally faster because the library routine has to use fewer operations to calculate the result to the desired accuracy.

The best practices are:
  • Always use the 'f' suffix on single precision constants to keep the calculation in single precision;
  • There is no suffix for double precision, but if you need the precision, use 'l' (lower case L) or 'L' and an explicit cast: (double)(1.2L);
  • If you use the math functions, use the single precision version with the 'f' suffix.
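
A short sketch of the first two practices, using a degrees-to-radians helper as the example. The function names are made up for illustration, and the long double literal plus cast follows the suggestion in the list above for keeping a constant in double precision.

```c
/* Single precision: every constant carries the 'f' suffix, so the
   whole expression stays in float. */
float deg_to_rad(float deg)
{
    return deg * (3.14159265f / 180.0f);
}

/* Double precision: long double literals with an explicit cast, so the
   constants are not written as unsuffixed literals. */
double deg_to_rad_double(double deg)
{
    return deg * ((double)(3.141592653589793L) / (double)(180.0L));
}
```

For example, deg_to_rad(180.0f) gives pi to float precision, while deg_to_rad_double(180.0) gives pi to double precision.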

I should mention that while I do understand things at the compiler/standards level, I have not looked at the ARM instructions or the ARM backend of the compiler, so I am interpolating results.
 