64bit and IEEE754 representation and math

Epyon

Well-known member
I have a decimal value of 1607154700 represented as a 64 bit unsigned data type. I can access this value one byte at a time. On the Teensy, I want to reconstruct these bytes into the original 64 bit data type and then do some math on it and print it human readable before storing it. However, my knowledge of how to handle these data types fails me. If I look up threads on the forum they are often from before 2015, and that year the native double types apparently changed from 64 bit to 32 bit.

Below is my test sketch. data[] is the array of bytes I receive and that I have to combine to a 64 bit integer. multiplier is a floating point variable to multiply the integer with, result stores the result and is printed and saved to SD. All these variables are dynamic, in my main sketch data[] and multiplier can change every loop.

I also added tmp to check if the byte-to-int conversion yields the same result.
Code:
uint64_t tmp64u;

byte data[] = {0x41, 0xA3, 0x29, 0x02, 0x8F, 0x45, 0xF2, 0xD8};
uint64_t tmp = 0x41A329028F45F2D8;

float multiplier = 0.1L;
float result = 0.0L;

void setup() {
  Serial.begin(9600);
  delay(3000);
  Serial.println("Start");
  for (int q = 0; q < 8; q++) { //copy and shift the bytes into a temp var
    tmp64u |= data[q];
    if (q < 7) {
      tmp64u = tmp64u << 8;
    }
  }
  Serial.printf("%llu,%lli,%lle\n", tmp64u, tmp64u, tmp64u);
  Serial.printf("%llu,%lli,%lle\n", tmp, tmp, tmp);
  
  result = tmp64u * multiplier;
  Serial.printf("%llu,%lli,%lle\n", result, result, result);
  
  result = tmp * multiplier;
  Serial.printf("%llu,%lli,%lle\n", result, result, result);
}

void loop() {
  // put your main code here, to run repeatedly:

}

This is my output:
Code:
Start
4729669124639552216,4729669124639552216,1.607274e+08
4729669124639552216,4729669124639552216,1.607274e+08
4871277705657581568,4871277705657581568,4.729669e+17
4871277705657581568,4871277705657581568,4.729669e+17
Combining the bytes works. If I print them in scientific notation, it also appears to work. Printing them as decimals yields strange results. If I perform some math on them, even the scientific notation doesn't work anymore.

How can I print the value human readable (1607154700) and perform math on them correctly? I don't care for speed (I only calculate the result every few seconds) but I need the precision.

This sketch runs on a T3.2 with Arduino 1.6.11 and TD 1.3b3.
 
The Teensy build process includes the option -fsingle-precision-constant, which has the effect that all floating point constants are treated as single precision constants. The reason is that under the default C/C++ rules floating point constants are double precision, and if you combine a single precision value and a double precision value (i.e. the constant), the compiler will force the expression to be done in double precision. This is great on systems that have double precision hardware, like the general purpose systems used to run Linux, Windows, or Mac code, but it is problematic on systems like the Teensy 3.5/3.6 that only have single precision hardware.

If you are curious, this harkens back to the second machine C targeted (i.e. the venerable PDP-11), which had both single/double precision hardware on some systems, but you had to do a mode switch to switch the runtime into single precision mode, do the calculation, and switch back. It was faster to do all calculations in double precision format.

This means that if you store a floating point constant into a double, the value only carries single precision (a 24-bit significand instead of 53 bits).

The 'work around' is to use a 'L' suffix on floating point constants, which forces the type to long double. On the ARM platform, long double has the same format as double, but other systems have long double that has more precision (but may be slower than double).

The type correct way to write such constants is to use the 'L' suffix, but either use a const definition or an explicit cast back to double:

Code:
const double PI = 3.14159265358979323846L;

    // or

#define PI ((double)3.14159265358979323846L)
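
To make the precision loss concrete, here is a minimal sketch in plain C++ (it also compiles on a desktop). It assumes, as described above, that an unsuffixed constant ends up as a single precision value under -fsingle-precision-constant, and it mimics that with an explicit 0.1f. The function names are made up for illustration.

```cpp
#include <cmath>
#include <cstdio>

// What an unsuffixed 0.1 effectively becomes when constants are single
// precision (assumption: this mirrors -fsingle-precision-constant).
double tenth_from_single() { return static_cast<double>(0.1f); }

// 'L' suffix plus an explicit cast back to double keeps full double precision.
double tenth_from_long_double() { return static_cast<double>(0.1L); }

// Print both values with 17 significant digits so the difference is visible.
void print_tenths() {
  std::printf("%.17g\n", tenth_from_single());
  std::printf("%.17g\n", tenth_from_long_double());
}
```

The first value agrees with 0.1 only to about eight significant digits, the second to about sixteen.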
 
In addition to Michael's comment,
I would simply look into the representation using the following program:

Code:
typedef union
{ uint64_t ulongVal;
  uint8_t byteVal[8];
//  float64_t doubleVal;
  double doubleVal; //for Arduino IDE
} testU_t;

testU_t test;

void setup()
{
  // put your setup code here, to run once:
  //
	pinMode(13,OUTPUT);
	while(!Serial);
	Serial.printf("Teensy\n\r");

	test.ulongVal = 1ll<<32;
	Serial.printf("%x %x %x %x %x %x %x %x\r\n", test.byteVal[0],test.byteVal[1],test.byteVal[2],test.byteVal[3],
			 test.byteVal[4],test.byteVal[5],test.byteVal[6],test.byteVal[7]);

	test.ulongVal = 1ll<<48;
	Serial.printf("%x %x %x %x %x %x %x %x\r\n", test.byteVal[0],test.byteVal[1],test.byteVal[2],test.byteVal[3],
			 test.byteVal[4],test.byteVal[5],test.byteVal[6],test.byteVal[7]);

	test.doubleVal = 1.0;
	Serial.printf("%x %x %x %x %x %x %x %x\r\n", test.byteVal[0],test.byteVal[1],test.byteVal[2],test.byteVal[3],
			 test.byteVal[4],test.byteVal[5],test.byteVal[6],test.byteVal[7]);

	test.doubleVal = -1.0;
	Serial.printf("%x %x %x %x %x %x %x %x\r\n", test.byteVal[0],test.byteVal[1],test.byteVal[2],test.byteVal[3],
			 test.byteVal[4],test.byteVal[5],test.byteVal[6],test.byteVal[7]);

	test.doubleVal = 1.0/2;
	Serial.printf("%x %x %x %x %x %x %x %x\r\n", test.byteVal[0],test.byteVal[1],test.byteVal[2],test.byteVal[3],
			 test.byteVal[4],test.byteVal[5],test.byteVal[6],test.byteVal[7]);

	test.doubleVal = -1.0/2;
	Serial.printf("%x %x %x %x %x %x %x %x\r\n", test.byteVal[0],test.byteVal[1],test.byteVal[2],test.byteVal[3],
			 test.byteVal[4],test.byteVal[5],test.byteVal[6],test.byteVal[7]);

	test.doubleVal = 1.0/16;
	Serial.printf("%x %x %x %x %x %x %x %x\r\n", test.byteVal[0],test.byteVal[1],test.byteVal[2],test.byteVal[3],
			 test.byteVal[4],test.byteVal[5],test.byteVal[6],test.byteVal[7]);

	test.doubleVal = -1.0/16;
	Serial.printf("%x %x %x %x %x %x %x %x\r\n", test.byteVal[0],test.byteVal[1],test.byteVal[2],test.byteVal[3],
			 test.byteVal[4],test.byteVal[5],test.byteVal[6],test.byteVal[7]);

	test.doubleVal = 1.0/128;
	Serial.printf("%x %x %x %x %x %x %x %x\r\n", test.byteVal[0],test.byteVal[1],test.byteVal[2],test.byteVal[3],
			 test.byteVal[4],test.byteVal[5],test.byteVal[6],test.byteVal[7]);

	test.doubleVal = -1.0/128;
	Serial.printf("%x %x %x %x %x %x %x %x\r\n", test.byteVal[0],test.byteVal[1],test.byteVal[2],test.byteVal[3],
			 test.byteVal[4],test.byteVal[5],test.byteVal[6],test.byteVal[7]);


}

void loop()
{
	// put your main code here, to run repeatedly:
}

which produces as output
Code:
Teensy
0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 f0 3f
0 0 0 0 0 0 f0 bf
0 0 0 0 0 0 e0 3f
0 0 0 0 0 0 e0 bf
0 0 0 0 0 0 b0 3f
0 0 0 0 0 0 b0 bf
0 0 0 0 0 0 80 3f
0 0 0 0 0 0 80 bf

In particular, you can see that float/double have a completely different internal representation from integers, so printing an integer with the %e format without casting it first does not work.
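
As a hedged illustration of the casting point, the sketch below formats a 64-bit integer two ways with the standard snprintf: as decimal text with %llu, and in scientific notation after an explicit conversion to double. The helper names are hypothetical. On a Teensy the same format strings could presumably be handed to Serial.printf, since the output earlier in the thread shows it accepting %llu.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Decimal text for a 64-bit unsigned integer: %llu with a matching cast.
std::string format_u64(uint64_t v) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "%llu", static_cast<unsigned long long>(v));
  return buf;
}

// Scientific notation: convert the integer to double first, then use %e.
std::string format_sci(uint64_t v) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "%e", static_cast<double>(v));
  return buf;
}
```

The key point is that each format specifier gets an argument of exactly the type it expects; passing a uint64_t where %e expects a double makes printf reinterpret the raw bits.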
 
All variables have to be dynamic, in my main sketch they change every loop. So I can't declare them const. Every few loops I receive new bytes to be combined into a 64 bit integer that has to be multiplied by a new floating point multiplier, the result has to be printed through serial and stored on SD.

I've edited my first post with some additional information.

I need precision more than speed. I only do one calculation every few seconds or so, no need to optimise for speed. I've added the L suffix during variable declaration to force the calculation to be done in double precision, but the results aren't right.
 
Arduino IDE doesn't know the float64_t type. That's why I used a regular float result and initialised it with 0.0L, in the hope it would use a 64 bit type. Then I would multiply the 64 bit integer tmp64u with a 64 bit float multiplier which should yield a 64 bit float.
 
Replacing float64_t by double gives the same result (now tested with Arduino IDE 1.6.9).

(The original cut and paste missed the typedef union line at the beginning; now edited.)
 
'double' is and always has been a 64-bit on Teensy 3.x. What is that serialized value you are receiving? Is it an integer or a float? '1607154700' that you mentioned is nowhere to be seen in your code.
 
It should be an unsigned integer (but the documentation about the instrument passing me the byte array is scarce). 0x41A329028F45F2D8 is the hex representation of the value, but it yields different decimal results when using various online tools.
 
0x41A329028F45F2D8 is not a simple integer representation of 1607154700, and it's not the IEEE754 floating point encoding of that value either.

If you interpret 0x41A329028F45F2D8 as double, it's 160727367.637.
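
For anyone who wants to check that number by hand, here is a rough sketch that splits a 64-bit pattern into the IEEE 754 binary64 fields (1 sign bit, 11 exponent bits biased by 1023, 52 fraction bits) and rebuilds the value. The struct and function names are made up for this example, and only normal numbers are handled.

```cpp
#include <cmath>
#include <cstdint>

// Fields of an IEEE 754 binary64 bit pattern.
struct Binary64Fields {
  bool negative;      // bit 63
  int exponent;       // bits 62..52, biased by 1023
  uint64_t fraction;  // bits 51..0
};

Binary64Fields split_binary64(uint64_t bits) {
  Binary64Fields f;
  f.negative = (bits >> 63) != 0;
  f.exponent = static_cast<int>((bits >> 52) & 0x7FF);
  f.fraction = bits & ((1ULL << 52) - 1);
  return f;
}

// Rebuild the value for normal numbers only (exponent field 1..2046).
// Zero, subnormals, infinity and NaN are deliberately not handled here.
double value_of_normal(const Binary64Fields& f) {
  uint64_t significand = f.fraction | (1ULL << 52);  // restore the implicit leading 1
  double magnitude =
      std::ldexp(static_cast<double>(significand), f.exponent - 1023 - 52);
  return f.negative ? -magnitude : magnitude;
}
```

For 0x41A329028F45F2D8 the exponent field is 0x41A = 1050, so the scale is 2^27, and the rebuilt value comes out at roughly 160727367.637.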
 
The point was to declare your constants using const (which gives the type) or to use an explicit cast to (double) to get the value into the proper type.
 
I modified your sketch and get correct data
Code:
uint64_t tmp64u;
byte data[] = {0x41, 0xA3, 0x29, 0x02, 0x8F, 0x45, 0xF2, 0xD8};
uint64_t tmp = 0x41A329028F45F2D8;

double multiplier = 1.0;
double result = 0.0;

void setup() {
  while(!Serial);
  Serial.println("Start");
  for (int q = 0; q < 8; q++) { //copy and shift the bytes into a temp var
    tmp64u |= data[q];
    if (q < 7) {
      tmp64u = tmp64u << 8;
    }
  }
  Serial.printf("%llx,%lli, %llx\n", tmp64u, tmp64u, (double)tmp64u);
  Serial.printf("%llx,%lli, %llx\n", tmp, tmp, (double)tmp);
  
  result = tmp64u * multiplier;
  Serial.printf("%llx,%lle\n", result, result);
  
  result = tmp * multiplier;
  Serial.printf("%llx,%lle\n", result, result);
}

void loop() {
  // put your main code here, to run repeatedly:

}
resulting in
Code:
Start
41a329028f45f2d8,4729669124639552216, 43d068ca40a3d17d
41a329028f45f2d8,4729669124639552216, 43d068ca40a3d17d
43d068ca40a3d17d,4.729669e+18
43d068ca40a3d17d,4.729669e+18

I set multiplier to 1.0 so one can compare the results.

You can also see the different internal representation of integer and double (note: double is NOT a synonym of 64 bit integer).
 
Yes, the manual of the device providing the bytewise data defined it as a double. But afaik it should not contain any digits after the radix, therefore I called it 'integer'. I need to multiply it by a value that will make it have digits after the radix (making it a floating point), and I was under the impression that 754 was the only method of representing 64 bit floating point values on Teensy.

I can't replace uint64_t tmp64u by double tmp64u or long double, the compiler doesn't accept the bit shift of the separate bytes.

The device providing the data is a live energy meter, the value it displayed on its screen was 1607154700, but it could have incremented a bit during my tests.

The documentation is very scarce. 32 bit values are called 'float' in the documentation; there I join 4 bytes and cast them to float, and this works. But 64 bit doesn't work this way.
 
If it is a double, you do this and get a usable value:
Code:
unsigned char data_rev[] = {0xD8, 0xF2, 0x45, 0x8F, 0x02, 0x29, 0xA3, 0x41};
double dv;
memcpy(&dv, data_rev, sizeof(double));
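
If the bytes arrive most significant byte first, as in the data[] array from the first post, a variant that avoids writing the array out in reverse could look like the sketch below. It assumes a 64-bit IEEE 754 double and that integers and doubles share the same byte order on the target; the function name is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");

// Build a double from 8 bytes received most significant byte first.
double double_from_be_bytes(const uint8_t bytes[8]) {
  uint64_t bits = 0;
  for (int i = 0; i < 8; ++i) {
    bits = (bits << 8) | bytes[i];  // shifting is independent of host byte order
  }
  double value;
  std::memcpy(&value, &bits, sizeof value);  // reinterpret the bit pattern
  return value;
}
```

With the bytes 0x41, 0xA3, 0x29, 0x02, 0x8F, 0x45, 0xF2, 0xD8 this should give about 160727367.637, and the result can then be multiplied by a double multiplier as usual.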
 
It is NOT in IEEE 754R format if it uses this encoding. Except for unnormalized numbers (and Infinity/Nan), ALL numbers always have the top bit set in the mantissa and that bit (called the hidden bit) is not actually in the stored number. This gives you 1 more bit of precision for the normal numbers.
 
If it is a double, you do this and get a usable value:
Code:
unsigned char data_rev[] = {0xD8, 0xF2, 0x45, 0x8F, 0x02, 0x29, 0xA3, 0x41};
double dv;
memcpy(&dv, data_rev, sizeof(double));
This seems to work, I'll try it tomorrow on live data.

Didn't think of memcpy of course :( . Strange that shifting the bytes manually into a double didn't work though.
 
You didn't shift them into a double, you shifted them into an integer. You can't shift them into a double - there are no floating point shifts.

You could cast a uint64_t* to a double* and get a similar effect to the memcpy, but that is asking for trouble since it breaks the C++ aliasing rules (you may get mis-compiled code).

Code:
// don't do this
uint64_t tmp = 0x41A329028F45F2D8;
dv = *((double*)&tmp);
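
A safe alternative to the pointer cast is a pair of small memcpy helpers, sketched here under the assumption that double is 64 bits wide. The names are illustrative; compilers typically reduce such a fixed-size memcpy to a plain register move.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");

// Reinterpret a 64-bit pattern as a double without violating aliasing rules.
double bits_to_double(uint64_t bits) {
  double value;
  std::memcpy(&value, &bits, sizeof value);
  return value;
}

// The reverse direction: look at the raw bits of a double.
uint64_t double_to_bits(double value) {
  uint64_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  return bits;
}
```

For example, double_to_bits(1.0) gives 0x3FF0000000000000, and converting the integer 0x41A329028F45F2D8 to double and inspecting it gives 0x43D068CA40A3D17D, matching the output shown earlier in the thread.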
 
Okay, it works. Thanks tni, Michael, WMXZ!

So to be clear, double is always floating point, but not in 754 representation? Unsigned long long would then be the integer uint64_t (on Teensy)? Is there a reason a 64 bit double can be printed through normal Serial.print() and a 64 bit integer not?
 
On the ARM platform, double is always in IEEE 754R representation; on other platforms, it might not be in the IEEE 754R 64-bit binary format. However, it is getting rarer not to use IEEE 754R format. There are also some chips that use IEEE 754R format but do not adhere to all of the special features of IEEE 754R, like negative 0, +/- infinity, NaN (not a number), denormal number support, rounding control, etc.

The Arduino developers did not add a method to the Print class for 64-bit integers on ARM platforms. The AVR processors like the ATmega328P used in the Arduino Uno are 8-bit processors and do not have 64-bit integer support.
 
AVR GCC has had 64-bit integer support forever (Arduino hasn't included it in the documentation). It is however extremely slow and the code size is huge.
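
Since Print has no 64-bit integer overload, one possible workaround is to convert the value to decimal text yourself and print the resulting string. The helper below is only a sketch with a made-up name; it relies on nothing beyond 64-bit division and remainder.

```cpp
#include <cstdint>

// Write v as decimal text into out (needs room for 21 chars: 20 digits + '\0').
// Returns out so the call can be passed straight to a print function.
char* u64_to_decimal(uint64_t v, char* out) {
  char tmp[20];
  int n = 0;
  do {
    tmp[n++] = static_cast<char>('0' + v % 10);  // least significant digit first
    v /= 10;
  } while (v != 0);
  for (int i = 0; i < n; ++i) {
    out[i] = tmp[n - 1 - i];  // reverse into the caller's buffer
  }
  out[n] = '\0';
  return out;
}
```

In a sketch this could be used as `char buf[21]; Serial.println(u64_to_decimal(tmp64u, buf));`.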
 