Data type with Teensy 4.0

Yes, thanks. I was wondering because I have code that uses double datasets, and when I run it on the Teensy it gives different results than on the ESP32 for the same parameters and dataset.
 
The Teensy 4.x does double math in hardware (where possible), and there is no sign that it works incorrectly.
Can you post your code, please? I'll run it on both the Teensy and the ESP32.
 
Here is a simple float/double sketch that I've run on the ESP32 and Teensy 4. Change the #if 0 to #if 1 to select float; as posted (#if 0) it uses double. Try it on your hardware.
Code:
// ll2utm
//  double/float performance test
// from Tom.java (wrong) convert.php   UTM.pm

#define REPS 1000

#if 0
#define FLTDBL float
#define MSG "float "
#define PI (3.141592653589793f)
#define SIN sinf
#define COS cosf
#define TAN tanf
#define SQRT sqrtf
#define POW powf
#else
#define FLTDBL double
#define MSG "double "
#define PI (3.141592653589793)
#define SIN sin
#define COS cos
#define TAN tan
#define SQRT sqrt
#define POW pow
#endif

FLTDBL lat = 35.0;       // expect 226202E 3877157N 17
FLTDBL lon = -84.0;
FLTDBL  UTMEasting, UTMNorthing;
FLTDBL Lat, Long;
int zone;

void ll2utm(FLTDBL Lat, FLTDBL Long) {
  //converts lat/long to UTM coords.  Equations from USGS Bulletin 1532
  //East Longitudes are positive, West longitudes are negative.
  //North latitudes are positive, South latitudes are negative
  //Lat and Long are in decimal degrees
  //Does not take into account the special UTM zones between 0 degrees and
  //36 degrees longitude above 72 degrees latitude and a special zone 32
  //between 56 degrees and 64 degrees north latitude
  //Written by Chuck Gantz- chuck.gantz@globalstar.com
  FLTDBL deg2rad = PI / 180;
  FLTDBL rad2deg = 180.0 / PI;

  //	FLTDBL a = 6378206.4;			// nad27
  //	FLTDBL eccSquared = 0.006768658;
  FLTDBL a = 6378137;			// wgs84/nad83
  FLTDBL eccSquared = 0.00669438;
  FLTDBL k0 = 0.9996;

  FLTDBL LongOrigin;
  FLTDBL eccPrimeSquared;
  FLTDBL N, T, C, A, M;

  FLTDBL LatRad = Lat * deg2rad;
  FLTDBL LongRad = Long * deg2rad;
  FLTDBL LongOriginRad;

  //compute the UTM Zone from the latitude and longitude
  zone = (int)((Long + 180) / 6) + 1;
  // zone exceptions use the Lat/Long arguments, not the globals lat/lon
  if ( Lat >= 56.0 && Lat < 64.0 && Long >= 3.0 && Long < 12.0 )
    zone = 32;
  // Special zones for Svalbard.
  if (Lat >= 72.0 && Lat < 84.0 )
  {
    if ( Long >= 0.0 && Long < 9.0 )
      zone = 31;
    else if ( Long >= 9.0 && Long < 21.0 )
      zone = 33;
    else if ( Long >= 21.0 && Long < 33.0 )
      zone = 35;
    else if ( Long >= 33.0 && Long < 42.0 )
      zone = 37;
  }
  LongOrigin = ( zone - 1 ) * 6 - 180 + 3; // +3 puts origin in middle of zone
  LongOriginRad  = LongOrigin * deg2rad;
  eccPrimeSquared = (eccSquared) / (1 - eccSquared);

  N = a / SQRT(1 - eccSquared * SIN(LatRad) * SIN(LatRad));
  T = TAN(LatRad) * TAN(LatRad);
  C = eccPrimeSquared * COS(LatRad) * COS(LatRad);
  A = COS(LatRad) * (LongRad - LongOriginRad);

  M = a * ((1	- eccSquared / 4		- 3 * eccSquared * eccSquared / 64 -
            5 * eccSquared * eccSquared * eccSquared / 256) * LatRad
           - (3 * eccSquared / 8	+ 3 * eccSquared * eccSquared / 32	+
              45 * eccSquared * eccSquared * eccSquared / 1024) * SIN(2 * LatRad)
           + (15 * eccSquared * eccSquared / 256 + 45 * eccSquared * eccSquared * eccSquared / 1024) * SIN(4 * LatRad)
           - (35 * eccSquared * eccSquared * eccSquared / 3072) * SIN(6 * LatRad));

  UTMEasting = (FLTDBL)(k0 * N * (A + (1 - T + C) * A * A * A / 6
                                  + (5 - 18 * T + T * T + 72 * C - 58 * eccPrimeSquared) * A * A * A * A * A / 120)
                        + 500000.0);

  UTMNorthing = (FLTDBL)(k0 * (M + N * TAN(LatRad) * (A * A / 2 + (5 - T + 9 * C + 4 * C * C) * A * A * A * A / 24
                               + (61 - 58 * T + T * T + 600 * C - 330 * eccPrimeSquared) * A * A * A * A * A * A / 720)));
  if (Lat < 0)
    UTMNorthing += 10000000.0; //10000000 meter offset for southern hemisphere
}


void utm2ll()
{
  //   adapted from Chuck Gantz- chuck.gantz@globalstar.com
  FLTDBL deg2rad = PI / 180;
  FLTDBL rad2deg = 180.0 / PI;

  FLTDBL k0 = 0.9996;
  //	FLTDBL a = 6378206.4;			// nad27
  //	FLTDBL eccSquared = 0.006768658;
  FLTDBL a = 6378137;			// wgs84/nad83
  FLTDBL eccSquared = 0.00669438;
  FLTDBL eccPrimeSquared;
  FLTDBL e1 = (1 - SQRT(1 - eccSquared)) / (1 + SQRT(1 - eccSquared));
  FLTDBL N1, T1, C1, R1, D, M;
  FLTDBL LongOrigin;
  FLTDBL mu, phi1, phi1Rad;
  FLTDBL x, y;
  FLTDBL ZoneNumber;

  x = UTMEasting - 500000.0; //remove 500,000 m offset for longitude
  y = UTMNorthing;

  ZoneNumber = zone;

  LongOrigin = (ZoneNumber - 1) * 6 - 180 + 3; //+3 puts origin in middle of zone

  eccPrimeSquared = (eccSquared) / (1 - eccSquared);

  M = y / k0;
  mu = M / (a * (1 - eccSquared / 4 - 3 * eccSquared * eccSquared / 64 - 5 * eccSquared * eccSquared * eccSquared / 256));

  phi1Rad = mu	+ (3 * e1 / 2 - 27 * e1 * e1 * e1 / 32) * SIN(2 * mu)
            + (21 * e1 * e1 / 16 - 55 * e1 * e1 * e1 * e1 / 32) * SIN(4 * mu)
            + (151 * e1 * e1 * e1 / 96) * SIN(6 * mu);
  phi1 = phi1Rad * rad2deg;

  N1 = a / SQRT(1 - eccSquared * SIN(phi1Rad) * SIN(phi1Rad));
  T1 = TAN(phi1Rad) * TAN(phi1Rad);
  C1 = eccPrimeSquared * COS(phi1Rad) * COS(phi1Rad);
  R1 = a * (1 - eccSquared) / POW(1 - eccSquared * SIN(phi1Rad) * SIN(phi1Rad), 1.5);
  D = x / (N1 * k0);

  Lat = phi1Rad - (N1 * TAN(phi1Rad) / R1) * (D * D / 2 - (5 + 3 * T1 + 10 * C1 - 4 * C1 * C1 - 9 * eccPrimeSquared) * D * D * D * D / 24
        + (61 + 90 * T1 + 298 * C1 + 45 * T1 * T1 - 252 * eccPrimeSquared - 3 * C1 * C1) * D * D * D * D * D * D / 720);
  Lat = Lat * rad2deg;

  Long = (D - (1 + 2 * T1 + C1) * D * D * D / 6 + (5 - 2 * C1 + 28 * T1 - 3 * C1 * C1 + 8 * eccPrimeSquared + 24 * T1 * T1)
          * D * D * D * D * D / 120) / COS(phi1Rad);
  Long = LongOrigin + Long * rad2deg;
}

void setup() {
  Serial.begin(9600);
  while (!Serial);
}

void loop() {
  uint32_t t;


  t = micros();
  for (int i = 0; i < REPS; i++) {
    ll2utm(lat, lon);
    utm2ll();
  }
  t = micros() - t;
  //  Serial.printf("%f %f  %.0fE %.0fN %d %f %f %d us\n",
  //                lat, lon, ceil(UTMEasting), ceil(UTMNorthing), zone, Lat, Long, t);
  FLTDBL err = SQRT((lat - Lat) * (lat - Lat) + (lon - Long) * (lon - Long)); // error in the selected type
  Serial.print(MSG); Serial.print(REPS); Serial.print(" reps  ");
  Serial.print(t);  Serial.print(" us   err ");
  Serial.println(err, 6);
  delay(5000);
}
With optimization set to "Faster" on the T4 and -O2 on the ESP32, the error calculation looks the same for me. The ESP32 has hardware float only (double is done in software); the T4 has hardware double and float.
Code:
ESP32 -O2
 float 1000 reps  30516 us   err 0.000004    
 double 1000 reps  253389 us   err 0.000000

T4  Faster
 float 1000 reps  4238 us   err 0.000004
 double 1000 reps  7758 us   err 0.000000

You can uncomment the Serial.printf() to see the lat/lon and UTM results.
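A note on the float error: 0.000004 degrees is on the order of one float ULP (unit in the last place) at these magnitudes (lat = 35.0, |lon| = 84.0), which suggests ordinary float rounding rather than a hardware problem. A minimal host-side C++ sketch, not from the thread, that computes the ULP with std::nextafter (function names are illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between x and the next larger representable float (one ULP at x).
float floatUlp(float x) {
  return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Same for double.
double doubleUlp(double x) {
  return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}

// Exact values for IEEE-754 float and double.
void checkUlps() {
  assert(floatUlp(35.0f) == std::ldexp(1.0f, -18));  // about 3.8e-6
  assert(floatUlp(84.0f) == std::ldexp(1.0f, -17));  // about 7.6e-6
  assert(doubleUlp(84.0) == std::ldexp(1.0, -46));   // about 1.4e-14
}
```

The double ULP at 84 is about 1.4e-14, far below the six decimals printed, which is consistent with the double runs showing err 0.000000.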

... more performance comparisons or here
 
@manitou, what do you mean by "ESP32 has hardware float, T4 has hardware double and float"?
 
Does the Teensy 4.0 handle double datasets?
The page https://www.pjrc.com/store/teensy41.html

says: "Float point math unit, 64 & 32 bits".
Does this mean that it only handles float datasets?

All Teensy LC, 3.x, and 4.x microprocessors have both 32-bit float and 64-bit double. The long double is also 64 bits. In terms of how they handle the floating point types:

  • Teensy LC, 3.0, 3.1, and 3.2: both float and double are emulated in software. Floating constants without a suffix are treated as float.
  • Teensy 3.5 and 3.6: float is done in hardware, double is emulated in software. Floating constants without a suffix are treated as float.
  • Teensy 4.0, 4.1, and MicroMod: both float and double are done in hardware. Floating constants without a suffix are treated as double.

In the original AVR Arduino processors like the 328p that is used in the Arduino UNO and the 32u4 used in the Arduino Leonardo and the Teensy 2.0/2.0++, both float and double are 32-bits, and are emulated in software.
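If code depends on a 64-bit double, a compile-time check is one way to catch the AVR case, where double is silently only 32 bits. A minimal sketch, assuming a C++11 compiler; the constant names are illustrative. It only checks the size and format of the types; it cannot tell whether double math is done in hardware or emulated in software.

```cpp
#include <cfloat>

// Refuse to build on targets where double is only 32 bits (such as the AVR
// boards described above) instead of silently losing precision.
static_assert(sizeof(double) == 8, "this code needs a 64-bit double");
static_assert(DBL_MANT_DIG == 53, "expected an IEEE-754 binary64 double");
static_assert(sizeof(float) == 4 && FLT_MANT_DIG == 24,
              "expected an IEEE-754 binary32 float");

// Decimal digits that survive a text -> number -> text round trip.
constexpr int kFloatDigits = FLT_DIG;   // 6 for binary32
constexpr int kDoubleDigits = DBL_DIG;  // 15 for binary64
```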

All ARM-based microprocessors should have both 32-bit float and 64-bit double types. Whether the types are emulated in software or done in hardware depends on which microprocessor is used. And whether floating-point constants are treated as 32-bit or 64-bit may depend on the compilation defaults chosen by whoever set up the IDE.
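The constant rule matters for speed as well as precision. Assuming default GCC/Clang settings, where an unsuffixed constant such as 0.9996 is a double, a float expression that touches one is promoted to double, which is software-emulated on a chip whose FPU only handles float. GCC's -fsingle-precision-constant option is one way a build can change that default. A minimal sketch; the function names are hypothetical:

```cpp
#include <cassert>
#include <type_traits>

// Assumes default GCC/Clang behavior: an unsuffixed constant is a double.

// float * double: the multiply is done in double, then narrowed to float.
float scaleMixed(float x) { return static_cast<float>(x * 0.9996); }

// float * float: the f suffix keeps the whole expression in float.
float scaleFloat(float x) { return x * 0.9996f; }

static_assert(std::is_same<decltype(1.0f * 0.9996), double>::value,
              "unsuffixed constant promotes the expression to double");
static_assert(std::is_same<decltype(1.0f * 0.9996f), float>::value,
              "f-suffixed constant keeps the expression in float");

void checkConstants() {
  // 0.9996f is 0.9996 rounded to a 24-bit significand, so it is a
  // different value from the 53-bit double constant 0.9996.
  assert(static_cast<double>(0.9996f) != 0.9996);
}
```

Writing the suffix explicitly (0.9996f for float, no suffix for double) makes the intent visible in the source instead of leaving it to the build settings.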
 
@Frank B, no, it's the same code; I'm just changing the microcontroller.

Yes, and?
If it's a buffer overflow, for example, or an array index out of range, or whatever, the memory contents are just random.
It's still a bug.
Manitou showed you that the results are the same.
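Reading past the end of an array is undefined behavior in C and C++: the value returned depends on whatever happens to be stored next in memory, and that layout differs between a Teensy build and an ESP32 build, so the same source can print different numbers on each board. One way to surface such bugs while porting is to route suspicious reads through a bounds check. A minimal sketch; readChecked is a hypothetical helper, not an Arduino or Teensy API:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: copy data[index] into out only if index is in range.
// Returns false instead of reading past the end of the array.
template <typename T, std::size_t N>
bool readChecked(const T (&data)[N], std::size_t index, T &out) {
  if (index >= N) {
    return false;  // out of range: report it rather than read random memory
  }
  out = data[index];
  return true;
}

void checkReadChecked() {
  const double samples[3] = {1.5, 2.5, 3.5};
  double value = 0.0;
  assert(readChecked(samples, 2, value) && value == 3.5);
  assert(!readChecked(samples, 3, value));  // one past the end is rejected
  assert(value == 3.5);                     // out is left untouched
}
```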
 
I’ve just found that testing two 64-bit integers for equality sometimes returns true when they are different! (On Teensy 4.1)
To fix the error, I had to test the lower 32 bits ‘&&’ the higher 32 bits. This fixed it!
I wonder why?

Malcolm Messiter
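For reference, the two tests are mathematically equivalent: two 64-bit values are equal exactly when both their low and high 32-bit halves are equal. So with a correctly working compiler, splitting the comparison should never change the result, and a difference in behavior points somewhere else. A minimal host-side sketch that checks this; the helper names are hypothetical, and shifts are used instead of the project's union because its definition isn't shown:

```cpp
#include <cassert>
#include <cstdint>

// Lower and upper 32-bit halves of a 64-bit value.
inline uint32_t low32(uint64_t v) { return static_cast<uint32_t>(v); }
inline uint32_t high32(uint64_t v) { return static_cast<uint32_t>(v >> 32); }

// Direct 64-bit comparison.
inline bool equal64(uint64_t a, uint64_t b) { return a == b; }

// The workaround from the post: compare both halves separately.
inline bool equalHalves(uint64_t a, uint64_t b) {
  return low32(a) == low32(b) && high32(a) == high32(b);
}

void checkEquivalence() {
  const uint64_t base = 0x123456789ABCDEF0ULL;
  // Differs only in the low half, only in the high half, and not at all.
  const uint64_t others[] = {base ^ 1u, base ^ (1ULL << 40), base};
  for (uint64_t other : others) {
    assert(equal64(base, other) == equalHalves(base, other));
  }
}
```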
 
Please provide a runnable sketch that demonstrates the problem.

What IDE and OS are you using? What optimizations (compiler settings) are you using?
 
Here’s the code:

Code:
void CompareModelsIDs(){ // The saved MacAddress is compared with the one just received from the model ... etc ...
    
    uint8_t SavedModelNumber = ModelNumber;
    if (ModelMatched) return; // must not change when model connected
    GotoFrontView();
    RestoreBrightness();
    if (ModelIdentified) {                                                //  We have both bits of Model ID?      
        if ((ModelsMacUnion.Val32[0] == ModelsMacUnionSaved.Val32[0]) && (ModelsMacUnion.Val32[1] == ModelsMacUnionSaved.Val32[1])) // heer  
            {
                if (AnnounceConnected) {
                    if (AutoModelSelect){
                        PlaySound(MMMATCHED); 
                        DelayWithDog(1500);
                    }   
                }
                ModelMatched = true;                                      //  It's a match so start flying!
                return;
            } else {
                if (AutoModelSelect)
                { //  It's not a match so search for it.
                    ModelNumber = 0;
                    while ((ModelMatched == false) && (ModelNumber < MAXMODELNUMBER - 1)) {   //  Try to match the ID with a saved one
                        ++ModelNumber;
                        ReadOneModel(ModelNumber);
                         if  ((ModelsMacUnion.Val32[0] == ModelsMacUnionSaved.Val32[0]) && (ModelsMacUnion.Val32[1] == ModelsMacUnionSaved.Val32[1])){
                            ModelMatched = true;
                        }
                    }
                    if (ModelMatched) {                                       //  Found it!    
                        UpdateModelsNameEveryWhere();                         //  Use it.
                        if (AnnounceConnected) 
                        {
                            PlaySound(MMFOUND);
                            DelayWithDog(1500);
                        }
                        SaveAllParameters();                                  //  Save it
                        GotoFrontView();
                        
                    }else{                                           
                        ModelNumber = SavedModelNumber; //  Not found, so bind to the restored selected one
                        ReadOneModel(ModelNumber);
                        BindNow();
                        if (AutoModelSelect)
                        {
                            PlaySound(MMSAVED); 
                            DelayWithDog(1700);
                        }   
                    }
                } 
                if (!AutoModelSelect) 
                {     
                    BindNow(); 
                }
        }
    }
    
}
64-bit integer compare issue


I’m using the latest macOS on an M1 Max, with PlatformIO in VS Code.
I’ll test in a tiny sketch to see if it can be demonstrated.
I suspect it’s comparing the higher 32 bits and ignoring the lower.
 
No Teensy 4 compare failures with this sketch with arduino 1.8.19 and teensyduino 1.58
Code:
static uint64_t cnt = 0;

void check(uint64_t x, uint64_t y) {
  if (x != y) {
    Serial.printf("ERROR %llx %llx  cnt = %lld\n", x, y, cnt);
  }
}
void setup() {
  while (!Serial);
  uint64_t x = 0x1234567812345678;
  Serial.printf("x = %llx\n", x);

}

void loop() {
  uint64_t a, b;

  a = (uint64_t)(uint32_t)random() << 32 | (uint32_t)random();
  b = a;
  if (++cnt % 1000000 == 0) b++; // trigger fail
  check(a, b);

}
Every millionth iteration, a failure is forced with b++.
 
My similar tiny test also refused to fail!
Something as yet untraced is going on here.
I apologise if I wasted your time.
Incidentally, the unique MAC address of each Teensy 4.0 (one in each model) is used to identify the model. Of course it’s not a full 64-bit number; I think it’s only 48 bits stored in 64. The transmitter loads the needed parameters on identifying the model, so it’s a pity to get the wrong one!
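When a 48-bit MAC address is stored in a 64-bit integer, both sides of the comparison must pack it the same way, including leaving the unused upper 16 bits at the same value (zero here), or identical MACs will compare unequal. A minimal sketch of one possible packing; macToU64 is a hypothetical helper and the MAC bytes are made up:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack a 6-byte MAC address into the low 48 bits of a
// uint64_t, first byte most significant. The upper 16 bits are always zero.
uint64_t macToU64(const uint8_t mac[6]) {
  uint64_t id = 0;
  for (int i = 0; i < 6; i++) {
    id = (id << 8) | mac[i];
  }
  return id;
}

void checkMacPacking() {
  const uint8_t macA[6] = {0x02, 0x11, 0x22, 0x33, 0x44, 0x55};
  const uint8_t macB[6] = {0x02, 0x11, 0x22, 0x33, 0x44, 0x56};
  const uint64_t a = macToU64(macA);
  const uint64_t b = macToU64(macB);
  assert(a == 0x021122334455ULL);
  assert((a >> 48) == 0);  // only 48 of the 64 bits are used
  assert(a != b);          // one different byte gives a different ID
}
```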
 
No Teensy 4 compare failures with this sketch with arduino 1.8.19 and teensyduino 1.58

Code in msg #19 is not a valid test. The compiler is able to notice variable b was assigned from variable a, so it knows they will always be equal.

Looking at the generated assembly, loop() has only 1 conditional test, at address 4de.

Code:
000004b0 <loop>:

void loop() {
     4b0:       b5f0            push    {r4, r5, r6, r7, lr}
     4b2:       b085            sub     sp, #20
  uint64_t a, b;

  a = (uint64_t)(uint32_t)random() << 32 | (uint32_t)random();
     4b4:       f002 fe7e       bl      31b4 <random>
     4b8:       4607            mov     r7, r0
     4ba:       f002 fe7b       bl      31b4 <random>
  b = a;
  if (++cnt % 1000000 == 0) b++; // trigger fail
     4be:       4910            ldr     r1, [pc, #64]   ; (500 <loop+0x50>)
     4c0:       4a10            ldr     r2, [pc, #64]   ; (504 <loop+0x54>)
     4c2:       e9d1 5400       ldrd    r5, r4, [r1]
     4c6:       3501            adds    r5, #1
     4c8:       f144 0400       adc.w   r4, r4, #0
     4cc:       e9c1 5400       strd    r5, r4, [r1]
  a = (uint64_t)(uint32_t)random() << 32 | (uint32_t)random();
     4d0:       4606            mov     r6, r0
  if (++cnt % 1000000 == 0) b++; // trigger fail
     4d2:       2300            movs    r3, #0
     4d4:       4628            mov     r0, r5
     4d6:       4621            mov     r1, r4
     4d8:       f001 fc8a       bl      1df0 <__aeabi_uldivmod>
     4dc:       4313            orrs    r3, r2
     4de:       d10c            bne.n   4fa <loop+0x4a>
     4e0:       1c73            adds    r3, r6, #1
    Serial.printf("ERROR %llx %llx  cnt = %lld\n", x, y, cnt);
     4e2:       9300            str     r3, [sp, #0]
  if (++cnt % 1000000 == 0) b++; // trigger fail
     4e4:       f147 0300       adc.w   r3, r7, #0
    Serial.printf("ERROR %llx %llx  cnt = %lld\n", x, y, cnt);
     4e8:       9301            str     r3, [sp, #4]
     4ea:       4907            ldr     r1, [pc, #28]   ; (508 <loop+0x58>)
     4ec:       4807            ldr     r0, [pc, #28]   ; (50c <loop+0x5c>)
     4ee:       9502            str     r5, [sp, #8]
     4f0:       9403            str     r4, [sp, #12]
     4f2:       4632            mov     r2, r6
     4f4:       463b            mov     r3, r7
     4f6:       f000 ffdb       bl      14b0 <Print::printf(char const*, ...)>
  check(a, b);

     4fa:       b005            add     sp, #20
     4fc:       bdf0            pop     {r4, r5, r6, r7, pc}
     4fe:       bf00            nop
     500:       1fff0eb8        .word   0x1fff0eb8
     504:       000f4240        .word   0x000f4240
     508:       00008bdc        .word   0x00008bdc
     50c:       1fff0738        .word   0x1fff0738

You can see that the conditional test just skips past the rest of the code when cnt % 1000000 is not zero. The rest of the code doesn't do any conditional test. The check() function gets inlined, and the compiler discards the comparison of a and b. In fact, you can see variable b doesn't even exist in the compiled code.

The compiler optimizer is very good at following data dependency and removing redundant code!
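Besides volatile, another way to keep the optimizer from proving that b is just a copy of a is an empty inline-assembly statement that claims to modify the variable. This is GCC/Clang-specific syntax, and the sketch below (function names hypothetical) only illustrates the idea; it is not something used in the thread:

```cpp
#include <cassert>
#include <cstdint>

// Empty GCC/Clang inline asm that tells the compiler "value may have changed".
// It emits no instructions, but the optimizer can no longer assume it knows
// the value, so later comparisons must be done at run time.
inline void hideFromOptimizer(uint64_t &value) {
  asm volatile("" : "+r"(value));
}

// Returns true when b differs from a after the optional forced change.
bool copyDiffers(uint64_t a, bool forceMismatch) {
  uint64_t b = a;
  if (forceMismatch) b++;
  hideFromOptimizer(a);
  hideFromOptimizer(b);
  return a != b;
}

void runBarrierChecks() {
  assert(!copyDiffers(0x1234567812345678ULL, false));
  assert(copyDiffers(0x00000000FFFFFFFFULL, true));  // carry into the high half
}
```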
 
The full code is over 12,000 lines.
It's all on GitHub as “LockDownRadioControl”.

I can take a quick look. But only quickly.

Please give me a link to your GitHub repository. If it has more than one branch, tell me exactly which one I need to check out.

Please also be specific about exactly which line in which file has the conditional test you believe the compiler is not implementing properly.

I will try compiling your code and then find that place in the generated assembly. I can do this pretty quickly. But I'm not going to go hunting and searching just to find your code on GitHub and then try to match it up to the code sample you gave on this thread. I need you to be 100% clear about the exact place in the 12,000 lines you believe is wrongly implemented.
 
My mistake!


It’s OK! I have traced the error. It was mine. But not in the code! It was operator error! And I was the operator! Because I had moved receivers around between models, of course the Teensy 4.0, complete with its unique MAC address, moved too. I’d forgotten I’d moved it. So the transmitter wrongly identified the model. All is now under control and I apologise for being an idiot!
 
Code in msg #19 is not a valid test. The compiler is able to notice variable b was assigned from variable a, so it knows they will always be equal.

Dang. Though the question is moot, maybe using static volatile for a and b will avoid the compiler optimizations:

Code:
static uint64_t cnt = 0;
static volatile uint64_t a, b;

void check(uint64_t x, uint64_t y) {
  if (x != y) {
    Serial.printf("ERROR %llx %llx  cnt = %lld\n", x, y, cnt);
  }
}
void setup() {
  while (!Serial);
  uint64_t x = 0x1234567812345678;
  Serial.printf("x = %llx\n", x);

}

void loop() {
  

  a = (uint64_t)(uint32_t)random() << 32 | (uint32_t)random();
  b = a;
  if (++cnt % 1000000 == 0) b++; // trigger fail
  check(a, b);

}

Code:
  check(a, b);
     112:   e9d7 2300   ldrd    r2, r3, [r7]
     116:   e9d6 6700   ldrd    r6, r7, [r6]
  if (x != y) {
     11a:   42bb        cmp r3, r7
     11c:   bf08        it  eq
     11e:   42b2        cmpeq   r2, r6
     120:   d007        beq.n   132 <loop+0x72>
    Serial.printf("ERROR %llx %llx  cnt = %lld\n", x, y, cnt);

The test sketch also suffers from: "absence of proof is not proof of absence."
 
But not in the code! It was operator error!

Glad you found the problem.



maybe using static volatile for a and b will avoid the compiler optimizations;

Yes, that seems to work. Easy to see the full 64 bits really are compared, and nice to see the compiler makes use of the IT (If Then) instruction for efficient usage of the CPU's pipeline.

Just adding volatile on the local variables might also be enough.
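A host-side sketch of that suggestion, assuming the same test logic with volatile locals instead of static volatile globals. random() and Serial are replaced by a function argument and a return value so it runs anywhere, and checkOnce is a hypothetical name. Because every access to a volatile object has to be performed, the compiler should not be able to fold a != b into a constant:

```cpp
#include <cassert>
#include <cstdint>

// Same logic as the sketch's loop(), but with volatile locals.
// Returns true when a and b differ, which should only happen on the
// forced failure every millionth call.
bool checkOnce(uint64_t &cnt, uint64_t value) {
  volatile uint64_t a = value;
  volatile uint64_t b = a;  // a real read of a and a real write of b
  if (++cnt % 1000000 == 0) {
    b = b + 1;  // trigger fail
  }
  return a != b;  // both loads must actually happen
}

void runVolatileChecks() {
  uint64_t cnt = 0;
  assert(!checkOnce(cnt, 0x1234567812345678ULL));  // cnt becomes 1: no mismatch
  cnt = 999999;
  assert(checkOnce(cnt, 0x1234567812345678ULL));   // cnt becomes 1000000: forced
}
```

b = b + 1 is used instead of b++ because increment and decrement of volatile variables are deprecated in C++20.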
 