fast pow function


Projectitis

Well-known member
Hi all,

I found this fast "pow" approximation function for floats:
Code:
inline float fastPow(float a, float b) {
    union {
        float d;
        int x;
    } u = { a };
    u.x = (int)(b * (u.x - 1072632447) + 1072632447);
    return u.d;
}

There is also this related fastPow for doubles (which I haven't tried as I'm working with floats):
Code:
inline double fastPow(double a, double b) {
  union {
    double d;
    int x[2];
  } u = { a };
  u.x[1] = (int)(b * (u.x[1] - 1072632447) + 1072632447);
  u.x[0] = 0;
  return u.d;
}

Here is an example where I am raising 2 to the power of a random integer from 1 to 10 (just for testing - I realise bit shifting would be much faster for powers of two).
fastPow (incorrect) is on the left and pow (correct) on the right.
Code:
2.00    2.00
2.26    4.00
3.32    64.00
2.53    8.00
2.26    4.00
2.26    4.00
2.00    2.00
3.85    256.00
4.23    512.00
2.26    4.00
3.06    32.00
2.00    2.00
3.59    128.00

I realise that fastPow is only an approximation and will have error, but the results I'm getting are wildly inaccurate.
Does anyone have any idea why fastPow is not working? Could it be an endian issue?

Full code to reproduce:
Code:
#include <math.h>

inline float fastPow(float a, float b) {
	union {
		float d;
		int x;
	} u = { a };
	u.x = (int)(b * (u.x - 1072632447) + 1072632447);
	return u.d;
}

void setup() {
	// put your setup code here, to run once:
	Serial.begin(9600);
	delay(1000);

	uint32_t i;
	float f, ans;
	
	Serial.println(F("Compare: fastPow, pow"));
	for (i=0; i<100; i++){
		f = random(1,11);
		ans = fastPow( 2, f );
		Serial.print(ans);
		Serial.print(F("    "));
		ans = pow( 2, f );
		Serial.println(ans);
	}
}

void loop() {
  // put your main code here, to run repeatedly:
	delay(100);
}
 

I found this fast "pow" approximation function for floats:
No, you didn't. You found a fast "pow" approximation for double-precision floating-point numbers, whose representation matches IEEE-754 binary64, on a little-endian architecture (little-endian byte order for both integer and floating-point types, that is).

(No, I'm not trying to be snarky: I'm trying to make a very precise point about floating-point types and this kind of approximation function, which manipulates the representation of floating-point values. Do read on, because this line of investigation leads to a solution, and that solution is shown below.)

The integer constant 1072632447 is 00111111 11101111 00010010 01111111 in binary. Read as the high 32 bits of an IEEE-754 binary64 (with the low 32 bits zero), it is 0 01111111110 (1)1111 00010010 01111111 [ 00000000 00000000 00000000 00000000 ], describing the numeric value 0.971007823944091796875.
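
If you want to check that decoding for yourself, here is a quick host-side program (just a sketch: it assumes the machine you run it on has an IEEE-754 binary64 double and little-endian byte order, like the original approximation does); it should print roughly 0.971007823944091796875.
Code:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Low 32 bits zero, high 32 bits 1072632447, exactly as the
       double version of fastPow leaves them. */
    uint32_t words[2] = { 0u, 1072632447u };
    double d;
    memcpy(&d, words, sizeof d);   /* reinterpret the 64-bit pattern as a double */
    printf("%.21f\n", d);
    return 0;
}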

The nearest corresponding value in binary32 is the same 0.971007823944, which has bit pattern 0 01111110 (1)11110001001001111111000. As an unsigned 32-bit integer, that bit pattern corresponds to value 1064866808. If you use that in your fastPow function, i.e.
Code:
#include <stdint.h>

float fastpow(float a, float b)
{
    union {
        float    f;
        uint32_t u;
    } temp;
    temp.f = a;
    temp.u = (uint32_t)(b * (float)(temp.u - 1064866808) + 1064866808);
    return temp.f;
}
you get much more reasonable approximations: for example, fastpow(2.0f, 2.0f) ≃ 4.23.
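
In case it is useful, here is the test loop from the first post with the corrected function dropped in (only a sketch: it simply prints the approximation next to powf() for comparison):
Code:
#include <stdint.h>

inline float fastpow(float a, float b)
{
    union {
        float    f;
        uint32_t u;
    } temp;
    temp.f = a;
    temp.u = (uint32_t)(b * (float)(temp.u - 1064866808) + 1064866808);
    return temp.f;
}

void setup() {
    Serial.begin(9600);
    delay(1000);

    Serial.println(F("Compare: fastpow, powf"));
    for (uint32_t i = 0; i < 20; i++) {
        float f = random(1, 11);          // random exponent, 1 to 10
        Serial.print(fastpow(2.0f, f));   // approximation (left)
        Serial.print(F("    "));
        Serial.println(powf(2.0f, f));    // reference (right)
    }
}

void loop() {
    delay(100);
}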

Obviously, I don't know whether 0.971007823944 is the best magic constant to use for this. I simply took it as-is from the binary64 approximation. If you have a specific range of values, it should be possible to optimize it a bit. You can even write the exploratory function as
Code:
float fastpowmagic(const float a, const float b, const float c)
{
    union {
        float    f;
        uint32_t u;
    } temp, magic;
    magic.f = c;
    temp.f = a;
    temp.u = (uint32_t)(b * (float)(temp.u - magic.u) + magic.u);
    return temp.f;
}
where the third parameter is the magic constant, and is probably slightly less than one.
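
If you do want to tune it, one possible approach (only a sketch: the base/exponent grid and the candidate step below are made-up values for illustration, and find_magic is just a name I picked) is to scan candidate constants and keep the one with the smallest worst-case relative error against powf():
Code:
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* fastpowmagic() exactly as above. */
float fastpowmagic(const float a, const float b, const float c)
{
    union {
        float    f;
        uint32_t u;
    } temp, magic;
    magic.f = c;
    temp.f = a;
    temp.u = (uint32_t)(b * (float)(temp.u - magic.u) + magic.u);
    return temp.f;
}

/* Worst-case relative error of fastpowmagic() over a small test grid. */
static float worst_error(const float c)
{
    float worst = 0.0f;
    for (float a = 1.0f; a <= 4.0f; a += 0.25f)
        for (float b = 0.5f; b <= 4.0f; b += 0.25f) {
            const float err = fabsf(fastpowmagic(a, b, c) - powf(a, b)) / powf(a, b);
            if (err > worst)
                worst = err;
        }
    return worst;
}

int main(void)
{
    /* Scan candidate magic constants slightly below 1.0. */
    float best_c = 0.971007824f;
    float best_err = worst_error(best_c);
    for (float c = 0.95f; c < 1.0f; c += 0.0005f) {
        const float err = worst_error(c);
        if (err < best_err) {
            best_err = err;
            best_c = c;
        }
    }
    printf("best magic: %.9f (worst relative error %.4f)\n", best_c, best_err);
    return 0;
}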

The fastpow() and fastpowmagic() functions shown in this post work on all architectures where float uses the IEEE-754 binary32 storage format (basically all, as of 2018), and the byte order is the same for both integers and floating-point types (again, basically all).
 
Thanks for the lesson, very helpful!

It does appear, though, that the author of the "inline float fastPow" function I posted above was attempting to write fastpow for 32-bit float - the differences (critical though they may be?) are that you are casting to unsigned 16bit int whereas they casted to signed, and they used a different magic constant.
 
you are casting to unsigned 16bit int
No, to unsigned 32-bit integer.

whereas they casted to signed
The union type should have an unsigned integer member of the appropriate size (uint32_t for float, uint64_t for double), because that way the binary representation is unambiguous.
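
For example, the double variant quoted earlier could use a single uint64_t member, so the high 32 bits are taken with a shift instead of guessing which element of the int x[2] array they live in (a sketch only: fastPow64 is just a name I picked, it still assumes double is IEEE-754 binary64, and the magic constant is the binary64 one):
Code:
#include <stdint.h>

inline double fastPow64(double a, double b)
{
    union {
        double   d;
        uint64_t u;
    } v;
    v.d = a;
    /* Work on the high 32 bits of the representation, as the original
       double version does, but extract them with a shift rather than
       relying on the word order of an int array. */
    int32_t hi = (int32_t)(v.u >> 32);
    hi = (int32_t)(b * (hi - 1072632447) + 1072632447);
    v.u = (uint64_t)(uint32_t)hi << 32;   /* low 32 bits cleared, as before */
    return v.d;
}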

Even though temp.u is unsigned, the expression (temp.u - 1064866808) still follows the usual arithmetic conversions (see C11 6.3.1.8 for details): because uint32_t has at least the rank of int on these targets, the signed constant is converted to the unsigned type and the subtraction is evaluated in unsigned arithmetic. For the bit patterns that occur here (a at or above the magic value, so temp.u is never smaller than the constant) this gives exactly the same numeric result as the original signed version.

and they used a different magic constant.
The magic constant is the only difference, if you only consider the results the function yields.
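
To make that concrete: here is the function from the first post with nothing changed except the constant (a sketch; for the positive test arguments above it produces the same output as the uint32_t version):
Code:
inline float fastPow(float a, float b) {
    union {
        float d;
        int x;
    } u = { a };
    // Identical to the original, except 1064866808 (the binary32
    // constant) replaces 1072632447 (the binary64 constant).
    u.x = (int)(b * (u.x - 1064866808) + 1064866808);
    return u.d;
}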

I have a pretty defensive coding style, as I'm used to working with different architectures and OSes (except for Windows, which I don't use and haven't used in over a decade), so the other changes are basically stylistic, although I'd claim they make the code more robust and portable.
 