There is no issue, as long as you explicitly specify which byte order you use in the transfers.
On some host hardware architectures, accessing an unaligned float may not work, and byte order conversion may be needed. I recommend handling both in a single accessor function. For example:
Code:
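#include <stdint.h>
/* Reads a little-endian IEEE-754 Binary32 value from any (even unaligned) address. */
static inline float get_float32le(const void *ptr)
{
    const unsigned char *const data = ptr;
    union {
        float f;
        uint32_t u;
    } temp;
    /* Assemble the 32-bit pattern byte by byte, least significant byte first. */
    temp.u = data[0] | ((uint32_t)(data[1]) << 8) | ((uint32_t)(data[2]) << 16) | ((uint32_t)(data[3]) << 24);
    /* Reinterpret the bits as a float (type punning through the union). */
    return temp.f;
}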
Instead of *(float *)(mybuffer + offset), you use get_float32le(mybuffer + offset). This is equivalent to Python struct.unpack("<f", mybuffer[offset:offset+4]).
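To make the replacement concrete, here is a small sketch that decodes consecutive little-endian floats starting at an odd, and therefore unaligned, offset in a byte buffer. The helper decode_float32le() and the sample bytes are hypothetical; get_float32le() is repeated from above so the block compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* get_float32le() from above, repeated so this sketch is self-contained. */
static inline float get_float32le(const void *ptr)
{
    const unsigned char *const data = ptr;
    union {
        float f;
        uint32_t u;
    } temp;
    temp.u = data[0] | ((uint32_t)(data[1]) << 8) | ((uint32_t)(data[2]) << 16) | ((uint32_t)(data[3]) << 24);
    return temp.f;
}

/* Hypothetical helper: decodes 'count' little-endian floats that start at an
   arbitrary (possibly unaligned) byte offset into a properly aligned array. */
static void decode_float32le(float *dst, const unsigned char *buffer,
                             size_t offset, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = get_float32le(buffer + offset + 4 * i);
}
```

In the test data, the byte sequences 00 00 80 3F and 20 98 7F 47 are the little-endian encodings of 1.0 and 65432.125, placed after a one-byte tag so that neither is 4-byte aligned.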
While the above function looks "slow", GCC on x86-64 with -O2 (as is common and recommended) optimizes it to a single MOVSS SSE/AVX machine instruction. (Verified on GCC 7.5.0.)
The code itself is portable, and will work on any architecture where 'float' is IEEE-754 Binary32.
Teensy 3.x and 4.x use IEEE-754 Binary32 in little-endian byte order, so as long as you observe correct alignment, you don't need to do anything special.
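If you want the build to fail on a target where the Binary32 assumption does not hold, a compile-time check along these lines is one option. This is a sketch, not part of the original code: the <float.h> parameters below match the Binary32 format (radix 2, 24-bit significand, exponent range), but they do not prove full IEEE-754 semantics, nor that float and uint32_t share a byte order.

```c
#include <float.h>
#include <stdint.h>

/* Sanity checks (C11 _Static_assert): float must have the Binary32 layout
   that get_float32le() and set_float32le() rely on. */
_Static_assert(sizeof (float) == sizeof (uint32_t), "float is not 32 bits wide");
_Static_assert(FLT_RADIX == 2 && FLT_MANT_DIG == 24, "float significand does not match Binary32");
_Static_assert(FLT_MAX_EXP == 128 && FLT_MIN_EXP == -125, "float exponent range does not match Binary32");
```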
The inverse, storing a float in little-endian byte order to an unaligned buffer (equivalent to Python struct.pack("<f", value)), is e.g.
Code:
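#include <stdint.h>
/* Stores value as a little-endian IEEE-754 Binary32 at any (even unaligned) address. */
static inline void set_float32le(void *dst, const float value)
{
    unsigned char *const data = dst;
    const union {
        float f;
        uint32_t u;
    } temp = { .f = value };
    /* Least significant byte first; each assignment keeps only the low 8 bits. */
    data[0] = temp.u;
    data[1] = temp.u >> 8;
    data[2] = temp.u >> 16;
    data[3] = temp.u >> 24;
}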
but GCC tends to compile it to less optimal code, not just a single store.
Over a decade ago, I wrote some routines in Fortran and C to store and access IEEE-754 Binary64 ("double") data in arbitrary byte order, using a "prototype" value for both byte order and format identification.
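For example, 65432.125 in IEEE-754 Binary32 ("float") is 0_10001110_(1)11111111001100000100000 in binary (sign bit, 8-bit biased exponent 142 = 127 + 15, and the 23 stored mantissa bits, with the implicit leading 1 shown in parentheses), and corresponds to 32-bit unsigned integer 1199544352 = 0x477F9820 if the floating-point and integer byte orders are the same.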
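The bit pattern above can be checked on the host with a short sketch. float32_bits() and float32_fields() are hypothetical helpers; they use memcpy() rather than a union, and the integer they return equals 0x477F9820 only where float and uint32_t share a byte order, as noted above.

```c
#include <stdint.h>
#include <string.h>

/* Returns the Binary32 bit pattern of value as an integer
   (assumes float and uint32_t have the same byte order). */
static uint32_t float32_bits(const float value)
{
    uint32_t u;
    memcpy(&u, &value, sizeof u);
    return u;
}

/* Splits the pattern into the sign bit, the biased exponent, and the
   23 stored mantissa bits (the implicit leading 1 is not stored). */
static void float32_fields(const float value, unsigned *sign,
                           unsigned *exponent, uint32_t *mantissa)
{
    const uint32_t u = float32_bits(value);
    *sign = (unsigned)(u >> 31);
    *exponent = (unsigned)((u >> 23) & 0xFF);
    *mantissa = u & UINT32_C(0x7FFFFF);
}
```

For 65432.125f this yields sign 0, exponent 142, and mantissa 0x7F9820, which is 11111111001100000100000 in binary.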
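Counting only the orders produced by swapping adjacent bytes, adjacent 16-bit units, and (for 64-bit values) the 32-bit halves, there are four possible byte orders for 32-bit values and eight for 64-bit values. Floating-point accessors that can use any of these byte orders are e.g.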
Code:
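#include <stdint.h>
/* Reads a Binary32 value stored in byte order 'order' (0..3), relative to
   the host: bit 0 swaps adjacent bytes, bit 1 swaps the 16-bit halves.
   Order 0 is the host's native byte order. */
float float32(const void *const src, const unsigned char order)
{
    const unsigned char *const data = src;
    union {
        float f;
        uint32_t u;
        unsigned char c[4];
    } temp = { .c = { data[0], data[1], data[2], data[3] } };
    if (order & 1)
        temp.u = ((temp.u & 0x00FF00FF) << 8)
               | ((temp.u >> 8) & 0x00FF00FF);
    if (order & 2)
        temp.u = ((temp.u & 0x0000FFFF) << 16)
               | ((temp.u >> 16) & 0x0000FFFF);
    return temp.f;
}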
/* Reads a Binary64 value stored in byte order 'order' (0..7): bit 0 swaps
   adjacent bytes, bit 1 adjacent 16-bit units, bit 2 the 32-bit halves. */
double float64(const void *const src, const unsigned char order)
{
    const unsigned char *const data = src;
    union {
        double f;
        uint64_t u;
        unsigned char c[8];
    } temp = { .c = { data[0], data[1], data[2], data[3], data[4], data[5], data[6], data[7] } };
    if (order & 1)
        temp.u = ((temp.u & UINT64_C(0x00FF00FF00FF00FF)) << 8)
               | ((temp.u >> 8) & UINT64_C(0x00FF00FF00FF00FF));
    if (order & 2)
        temp.u = ((temp.u & UINT64_C(0x0000FFFF0000FFFF)) << 16)
               | ((temp.u >> 16) & UINT64_C(0x0000FFFF0000FFFF));
    if (order & 4)
        temp.u = ((temp.u & UINT64_C(0x00000000FFFFFFFF)) << 32)
               | ((temp.u >> 32) & UINT64_C(0x00000000FFFFFFFF));
    return temp.f;
}
which can be used for data access, but more importantly can be used to test if the prototype values are recognized:
Code:
int float32_endian(const void *src, const float prototype)
{
    int order;
    for (order = 0; order < 4; order++)
        if (float32(src, order) == prototype)
            return order;
    return -1;
}
int float64_endian(const void *src, const double prototype)
{
    int order;
    for (order = 0; order < 8; order++)
        if (float64(src, order) == prototype)
            return order;
    return -1;
}
Both functions return the byte order that parses the value as the prototype, or -1 if the format does not match. Simples!
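A writer counterpart is handy for producing test files in a foreign byte order. The sketch below, set_float32(), is a hypothetical name and not one of the original routines. It relies on the fact that the two swaps used by float32() are each their own inverse and commute, so applying the same swaps before storing produces bytes that float32() with the same order decodes back to the original value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical writer: stores value in byte order 'order' (0..3), using the
   same swap convention as float32(). */
static void set_float32(void *const dst, const float value, const unsigned char order)
{
    union {
        float f;
        uint32_t u;
        unsigned char c[4];
    } temp = { .f = value };
    if (order & 1)  /* swap adjacent bytes */
        temp.u = ((temp.u & 0x00FF00FF) << 8)
               | ((temp.u >> 8) & 0x00FF00FF);
    if (order & 2)  /* swap the 16-bit halves */
        temp.u = ((temp.u & 0x0000FFFF) << 16)
               | ((temp.u >> 16) & 0x0000FFFF);
    memcpy(dst, temp.c, 4);
}
```

In memory terms, stored byte i is native byte i XOR order. Because the four bytes of 65432.125f are all distinct, storing it with set_float32(buf, 65432.125f, 3) and then calling float32_endian(buf, 65432.125f) should report order 3.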
For bulk data conversion, I wrote optimized versions for the unchanged byte order (0, basically memmove()) and the fully reversed byte order (~0), with the above "slow" version handling all other byte orders, since I have never encountered them in the wild (but who knows, they might exist). Even that version is not slow on a human scale, though: the difference is whether accessing a gigabyte of data takes no appreciable time or a fraction of a second.
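The dispatch described above might look roughly like this for 32-bit floats. It is a sketch, not the original routines: float32_bulk() is a hypothetical name, the reversed case uses the GCC/Clang builtin __builtin_bswap32(), and the remaining orders use the byte-index form of the same swaps (native byte i comes from stored position i XOR order).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bulk decoder: converts n Binary32 values stored in byte order
   'order' (0..3, same convention as float32()) into native floats at dst. */
static void float32_bulk(float *dst, const void *src, const size_t n,
                         const unsigned char order)
{
    const unsigned char *in = src;
    if ((order & 3) == 0) {
        /* Unchanged byte order: a plain copy. */
        memmove(dst, src, n * sizeof (float));
        return;
    }
    for (size_t k = 0; k < n; k++, in += 4) {
        unsigned char native[4];
        if ((order & 3) == 3) {
            /* Fully reversed byte order: one byte swap per value. */
            uint32_t u;
            memcpy(&u, in, 4);
            u = __builtin_bswap32(u);
            memcpy(native, &u, 4);
        } else {
            /* Any other order: byte i comes from position i XOR order. */
            for (unsigned i = 0; i < 4; i++)
                native[i] = in[i ^ (order & 3)];
        }
        memcpy(&dst[k], native, 4);
    }
}
```

Copying through memcpy() keeps every access byte-granular, so src does not need to be aligned.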
This may be relevant to some Teensyduino developers. If you implement a Teensy gadget that stores binary data to files on an SD card, you might wish to make the format portable by adding a header that contains a suitable prototype value for each type (float, uint32_t, et cetera) you use, and by ordering the elements in the structure so that each is aligned to its size and no padding is needed. Then, an application that processes or converts those files can trivially check and compensate for the byte order, if it ever happens to differ from what the Teensy uses, without any sort of #if ... #endif preprocessor macro shenanigans. Even the above "slow" arbitrary-byte-order accessor functions are so fast on current computers that accessing a few million entries takes an insignificant fraction of a second, far less than a human can perceive; and the code itself is perfectly portable and byte-order-agnostic, so you can simply stop worrying about it.
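As an illustration of such a header, here is a minimal sketch. The struct name, field names, and the integer prototype 0x01020304 are all hypothetical; 65432.125 is the prototype from earlier. Members are ordered largest first so the _Static_assert checks can confirm there is no padding, and the reader checks only the two 32-bit prototypes (the double would be checked separately, e.g. with float64_endian()).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header: one prototype value per stored type. */
struct log_header {
    double   proto_f64;  /* 65432.125 */
    uint32_t proto_u32;  /* 0x01020304 */
    float    proto_f32;  /* 65432.125f */
};
_Static_assert(offsetof(struct log_header, proto_u32) == 8, "unexpected padding");
_Static_assert(offsetof(struct log_header, proto_f32) == 12, "unexpected padding");
_Static_assert(sizeof (struct log_header) == 16, "unexpected padding");

/* Writer side (e.g. on the Teensy): prototypes in native byte order. */
static void log_header_init(struct log_header *h)
{
    h->proto_f64 = 65432.125;
    h->proto_u32 = UINT32_C(0x01020304);
    h->proto_f32 = 65432.125f;
}

/* Reader side: returns the 32-bit byte order (0..3; native byte i is read from
   position i XOR order) that decodes both 32-bit prototypes, or -1 if none does. */
static int log_header_order32(const unsigned char *raw)
{
    for (unsigned order = 0; order < 4; order++) {
        unsigned char u_bytes[4], f_bytes[4];
        uint32_t u;
        float f;
        for (unsigned i = 0; i < 4; i++) {
            u_bytes[i] = raw[8 + (i ^ order)];
            f_bytes[i] = raw[12 + (i ^ order)];
        }
        memcpy(&u, u_bytes, 4);
        memcpy(&f, f_bytes, 4);
        if (u == UINT32_C(0x01020304) && f == 65432.125f)
            return (int)order;
    }
    return -1;
}
```

A file written on a host of the opposite byte order would have each 4-byte field reversed, which this check reports as order 3; a file that is not in this format at all yields -1.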
As JPEG and PNG files demonstrate, using binary data does not need to mean "unportable" or "hardware-specific". It takes just a bit of thinking beforehand, and verifying that the code works as intended.
(My routines were used to let a distributed molecular dynamics simulation with a couple of hundred million atoms store snapshots of the system locally, with minimal delay to the simulation itself and with individual files kept to a manageable size. A helper library (in Fortran and in C), and a helper utility built on that library, let the user slice the system in time and/or space and output the slice in a standard format, while accessing the data spread over a large number of files. The entire dataset was just under two terabytes, if I recall correctly.)