Should We Bother Using Integer Math with Newer Teensies? Floating Point is Easier.

NewLinuxFan

I noticed that the Teensy 4.0 and 4.1 have a hardware floating point unit (FPU). With Arduinos I remember that it was always faster to use integer math, but does it really matter with the Teensy 4s? C libraries that do advanced numerical analysis are commonly written for floating point, and it would be extra coding work to adapt them to integer math while watching for potential rollover and truncation issues. Even correcting Arduino's map function to get even spacing is much simpler using floating point than writing more advanced map functions that use integer math. Do we need to bother with tedious integer coding on the newer, more advanced Teensys? Some people might say it's easy (maybe for you it is), but I think dealing with that stuff sucks and distracts my focus from the projects that I'm working on.

As an example, here are three improvements on the map function, copied from the Arduino forum:

C++:
float floatMap(float x, float in_min, float in_max, float out_min, float out_max) // just switching longs with floats
{
  return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
}

long idealMap(long x, long in_min, long in_max, long out_min, long out_max) // written by PJRC
{
  long in_range = in_max - in_min;
  long out_range = out_max - out_min;
  if (in_range == 0) return out_min + out_range / 2;
  // compute the numerator
  long num = (x - in_min) * out_range;
  // before dividing, add extra for proper round off (towards zero)
  if (out_range >= 0)
  {
    num += in_range / 2;
  } else
  {
    num -= in_range / 2;
  }
  // divide by input range and add output offset to complete map() compute
  long result = num / in_range + out_min;
  // fix "a strange behaviour with negative numbers" (see ArduinoCore-API issue #51)
  //   this step can be deleted if you don't care about non-linear output
  //   behavior extrapolating slightly beyond the mapped input & output range
  if (out_range >= 0)
  {
    if (in_range * num < 0) return result - 1;
  } else {
    if (in_range * num >= 0) return result + 1;
  }
  return result;
}

long ifloor(long n, long d)
{
  return ((n % d) < 0L) ? (n / d - 1) : n / d;
}

long mapIntervals(long x, long in_min, long in_max, long out_min, long out_max) // written by Phoenix Williams
{
  if (in_min == in_max) return 0x7FFFFFFF; // slope is infinite; return max (long)
  long x1, x2, y1, y2;
  if (((in_max - in_min) < 0) != ((out_max - out_min) < 0)) {
    // Slope is negative
    x1 = min(in_min, in_max) - 1;
    x2 = max(in_min, in_max);
    y1 = max(out_min, out_max) + 1;
    y2 = min(out_min, out_max);
  }
  else {
    // Slope is positive
    x1 = min(in_min, in_max);
    x2 = max(in_min, in_max) + 1;
    y1 = min(out_min, out_max);
    y2 = max(out_min, out_max) + 1;
  }
  long dx = x2 - x1;
  long dy = y2 - y1;
  return ifloor((x - x1) * dy + y1 * dx, dx);
}
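
To make the "even spacing" problem concrete, here is a rough desktop-testable sketch that counts how many 10-bit inputs (0..1023) land on each output 0..4. The names arduinoMap, floatMapIntervals and countOutputs are made up for this illustration, and the "+1 on both upper bounds, then floor" trick is an assumption about how floatMap would be used for even spacing; it only handles ascending ranges.

```cpp
#include <cmath>
#include <vector>

// Classic Arduino-style integer map(): truncating division.
long arduinoMap(long x, long in_min, long in_max, long out_min, long out_max)
{
  return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
}

// floatMap() as posted above.
float floatMap(float x, float in_min, float in_max, float out_min, float out_max)
{
  return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
}

// Hypothetical helper (assumption, ascending ranges only): treat both ranges
// as half-open intervals by extending the upper bounds by one, then round down.
long floatMapIntervals(long x, long in_min, long in_max, long out_min, long out_max)
{
  float y = floatMap((float)x, (float)in_min, (float)(in_max + 1),
                     (float)out_min, (float)(out_max + 1));
  return (long)std::floor(y);
}

// Count how many inputs 0..1023 land on each output 0..4 for a given mapper.
std::vector<int> countOutputs(long (*mapper)(long, long, long, long, long))
{
  std::vector<int> counts(5, 0);
  for (long x = 0; x <= 1023; x++) {
    counts[mapper(x, 0, 1023, 0, 4)]++;
  }
  return counts;
}
```

With the classic formula the five outputs get 256, 256, 256, 255 and 1 inputs (only x = 1023 reaches 4). With the float interval version they get 205, 205, 205, 205 and 204, which is about as even as 1024 inputs can be split five ways.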
 
As with almost any topic in computers, the devil is in the details. And, it depends. Is it much easier to use the FPU? It sure is. So, that's basically what I do on a Teensy 4/4.1/MM. It's there, may as well use it. However, a few points:

1. FPU instructions are not necessarily or even usually one cycle. They take time. They take less time than emulating an FPU in software but more time than integer math. As such, quite often it actually still is faster to do all integer math. It isn't necessarily more accurate or as flexible but it is faster.

2. HOWEVER, the FPU and the ALU (arithmetic logic unit) are separate hardware. As such, it is possible to use both at the same time. So, using the FPU is sort of like having a separate processor that runs in lockstep with the main one and can do floating point calculations for you. Because of this, you may be able to get FPU instructions nearly "for free" as you continue to fill the ALU pipeline at the same time.

3. The FPU has its own registers so using FPU ops could lower register contention on the rest of the processor which could speed things up.

Taken together with the myriad other corner cases, it may or may not be faster. But using the FPU could be faster, and it's way easier to let the hardware auto-range where the decimal point should be.

TL;DR - If you're actually curious, benchmark it both ways and see which one is faster out of the box. Then, if you really want to go into the weeds, try to hand write some assembly with the hardware in mind to try to maximize throughput out of all the hardware. It won't be simple.
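
For the "benchmark it both ways" part, a minimal sketch might look like the following. Everything here is an assumption for illustration: the names scaleInt, scaleFloat and bench are made up, the workload (scaling 0..1023 down to 0..4) is arbitrary, and std::chrono is used only so it runs on a desktop. On a Teensy you would more likely time it with micros() or the CPU cycle counter.

```cpp
#include <chrono>

struct BenchResult {
  long checksum;        // sum of all outputs, so the work cannot be optimized away
  double microseconds;  // wall-clock time for the whole run
};

// The same scaling done with integer math and with float math.
long scaleInt(long x)   { return x * 5 / 1024; }
long scaleFloat(long x) { return (long)((float)x * 5.0f / 1024.0f); }

// Run fn over inputs 0..1023, 'passes' times, and time the whole loop.
BenchResult bench(long (*fn)(long), int passes)
{
  volatile long input = 0;  // volatile keeps the compiler from precomputing results
  long checksum = 0;
  auto start = std::chrono::steady_clock::now();
  for (int p = 0; p < passes; p++) {
    for (long x = 0; x < 1024; x++) {
      input = x;
      checksum += fn(input);
    }
  }
  auto stop = std::chrono::steady_clock::now();
  std::chrono::duration<double, std::micro> elapsed = stop - start;
  return { checksum, elapsed.count() };
}
```

Comparing bench(scaleInt, 100) against bench(scaleFloat, 100) gives two timings plus two checksums. If the checksums differ, the two versions are not computing the same thing and the timing comparison is meaningless, so check that first.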
 
I must confess my low enthusiasm for more fiddling with the Arduino map() function. I have poured way too much time into map() over the years.

Yes, replacing it with an all-float implementation might be "simpler". But the "simplest" course of action is to leave it alone.

Today on Teensy 4 we have a C++ template that does indeed use float or double, and returns float or double, when the input number is float or double. But when the input variable is any integer type, the return type is long (same as Arduino's map function) and the math is done using that complicated "ideal" version.

Whether the float version gives the same round off behavior as the "ideal" integer version in all cases would need to be thoroughly tested. There are a lot of cases many people don't initially think about, like mapping a range of negative numbers to positive, ranges that span both positive and negative, whether inputs beyond the mapped input range properly extrapolate to the corresponding larger output range, proper round off, and so on...

On Teensy 4, the float version probably is faster. Maybe a lot faster? I haven't done benchmarking. Would be really interesting to see, if anyone wants to put that work in. But personally, I'm exhausted when it comes to Arduino map(). Just writing this reply is using up the very last of my energy for map().

If the float version does turn out to be faster, anyone can get that speed on Teensy 4 simply by typecasting the input variable to float. The C++ template automatically does all the math as float in that case. Or if you typecast to double, it uses 64 bit double and returns a double result.
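
The type-based dispatch described above can be sketched roughly like this. This is not the actual Teensy core code: sketchMap is a made-up name, and a plain truncating formula stands in for the much longer "ideal" integer version. It only shows how the type of the first argument can select both the math and the return type.

```cpp
#include <type_traits>

// Integer input: math and result are long. A plain truncating formula stands
// in here for the much longer "ideal" integer version.
template <typename T, typename A, typename B, typename C, typename D>
typename std::enable_if<std::is_integral<T>::value, long>::type
sketchMap(T x, A in_min, B in_max, C out_min, D out_max)
{
  long imin = (long)in_min, imax = (long)in_max;
  long omin = (long)out_min, omax = (long)out_max;
  return ((long)x - imin) * (omax - omin) / (imax - imin) + omin;
}

// Floating point input: math and result keep the input's type (float or double).
template <typename T, typename A, typename B, typename C, typename D>
typename std::enable_if<std::is_floating_point<T>::value, T>::type
sketchMap(T x, A in_min, B in_max, C out_min, D out_max)
{
  T imin = (T)in_min, imax = (T)in_max;
  T omin = (T)out_min, omax = (T)out_max;
  return (x - imin) * (omax - omin) / (imax - imin) + omin;
}
```

With this sketch, sketchMap(512, 0, 1024, 0, 5) does integer math and returns the long 2, while sketchMap((float)512, 0, 1024, 0, 5) does float math and returns 2.5f. Casting the first argument to double selects double math the same way.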
 
There's one more thing to consider when weighing up the performance of integer math vs floating point: registers that get saved when an exception (aka interrupt) occurs.
The T4's CPU has a mechanism (lazy stacking of the floating point registers) that mostly avoids the penalty of saving the FPU registers, but that goes out the window if you use FPU code inside ISRs.
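
One common way to keep FPU code out of ISRs is to have the interrupt handler store only raw integers and do the float conversion later from the main loop. The sketch below is a rough illustration under assumptions: the names onSampleIsr and readVoltage are hypothetical, the 10-bit range and 3.3 V reference are made up, and a real ISR takes no arguments and would read the raw value from a peripheral instead.

```cpp
#include <cstdint>

volatile uint16_t latestRaw = 0;    // written by the ISR, read by the main loop
volatile bool sampleReady = false;  // set by the ISR when a new sample arrives

// Hypothetical ISR body: integer-only, so no floating point math is written here.
void onSampleIsr(uint16_t raw)
{
  latestRaw = raw;
  sampleReady = true;
}

// Called from loop(): the float conversion happens outside interrupt context.
// A real sketch may need noInterrupts()/interrupts() around the two reads if
// the flag and the value must stay consistent with each other.
bool readVoltage(float &volts)
{
  if (!sampleReady) return false;
  uint16_t raw = latestRaw;
  sampleReady = false;
  volts = raw * (3.3f / 1023.0f);  // assumed 10-bit reading, 3.3 V reference
  return true;
}
```

Whether this is worth the trouble depends on how often the interrupt fires, so it is another thing to measure before restructuring code around it.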
 
In general, if the processor has an FPU, use it... As I understand it, with dual-issue support the T4 can do single-precision float ops at up to 1.2 GFLOPS and double-precision at up to 0.6 GFLOPS. Assuming the compiler gets everything just right, of course...
 