Forum Rule: Always post complete source code & details to reproduce any issue!
Results 1 to 6 of 6

Thread: fast pow function

  1. #1
    Senior Member Projectitis's Avatar
    Join Date
    Feb 2018
    Location
    New Zealand
    Posts
    146

    fast pow function

    Hi all,

    I found this fast "pow" approximation function for floats:
    Code:
    inline float fastPow(float a, float b) {
        union {
            float d;
            int x;
        } u = { a };
        u.x = (int)(b * (u.x - 1072632447) + 1072632447);
        return u.d;
    }
    There is also this related fastPow for doubles (which I haven't tried as I'm working with floats):
    Code:
    inline double fastPow(double a, double b) {
      union {
        double d;
        int x[2];
      } u = { a };
      u.x[1] = (int)(b * (u.x[1] - 1072632447) + 1072632447);
      u.x[0] = 0;
      return u.d;
    }
    here is an example where I am raising 2 to the power of a random number 1 <= 10 (just for testing - I realise bit shifting is much faster).
    fastPow (incorrect) is on the left and pow (correct) on the right.
    Code:
    2.00    2.00
    2.26    4.00
    3.32    64.00
    2.53    8.00
    2.26    4.00
    2.26    4.00
    2.00    2.00
    3.85    256.00
    4.23    512.00
    2.26    4.00
    3.06    32.00
    2.00    2.00
    3.59    128.00
    I realise that fastPow is only an approximation and will have error, but the results I'm getting are wildly inaccurate.
    Does anyone have any idea why fastPow is not working? Could it be an endian issue?

    Full code to reproduce:
    Code:
    #include <Math.h>
    
    inline float fastPow(float a, float b) {
    	union {
    		float d;
    		int x;
    	} u = { a };
    	u.x = (int)(b * (u.x - 1072632447) + 1072632447);
    	return u.d;
    }
    
    void setup() {
    	// put your setup code here, to run once:
    	Serial.begin(9600);
    	delay(1000);
    
    	uint32_t i;
    	float f, ans;
    	
    	Serial.println(F("Compare: fastPow, pow"));
    	for (i=0; i<100; i++){
    		f = random(1,11);
    		ans = fastPow( 2, f );
    		Serial.print(ans);
    		Serial.print(F("    "));
    		ans = pow( 2, f );
    		Serial.println(ans);
    	}
    }
    
    void loop() {
      // put your main code here, to run repeatedly:
    	delay(100);
    }

  2. #2
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,250
    Maybe putting a L after the constants helps

  3. #3
    Senior Member
    Join Date
    Feb 2015
    Location
    Finland
    Posts
    118
    Quote Originally Posted by Projectitis View Post
    I found this fast "pow" approximation function for floats:
    No, you didn't. You found a fast "pow" approximation for double-precision floating-point numbers, whose representation matches IEEE-754 binary64, on a little-endian byte order architecture (little-endian byte order for both integer and floating-point types, that is).

    (No, I'm not trying to be snarky: I'm trying to make a very precise point about floating-point types, and this type of approximation functions that manipulate the representation of floating-point values. Do read on, because this line of investigation leads to a solution, and that solution is shown below.)

    The integer constant 1072632447 is 00111111 11101111 00010010 01111111 in binary. Comparing to an IEEE-754 binary64, it can be read as 0 01111111110 (1)1111 00010010 01111111 [ 00000000 00000000 00000000 00000000 ], describing numeric value 0.971007823944091796875.

    The nearest corresponding value in binary32 is 0.971007823944, which has bit pattern 0 011111101 (1)1110001001001111111000. As an unsigned 32-bit integer, that bit pattern corresponds to value 1064866808. If you use that in your fastPow function, i.e.
    Code:
    #include <stdint.h>
    
    float fastpow(float a, float b)
    {
        union {
            float    f;
            uint32_t u;
        } temp;
        temp.f = a;
        temp.u = (uint32_t)(b * (float)(temp.u - 1064866808) + 1064866808);
        return temp.f;
    }
    you get much more reasonable approximations: for example, fastpow(2.0f, 2.0f) ≃ 4.23.

    Obviously, I don't know whether 0.971007823944 is the best magic constant to use for this. I simply took it as-is from the binary64 approximation. If you have a specific range of values, it should be possible to optimize it a bit. You can even write the exploratory function as
    Code:
    float fastpowmagic(const float a, const float b, const float c)
    {
        union {
            float    f;
            uint32_t u;
        } temp, magic;
        magic.f = c;
        temp.f = a;
        temp.u = (uint32_t)(b * (float)(temp.u - magic.u) + magic.u);
        return temp.f;
    }
    where the third parameter is the magic constant, and is probably slightly less than one.

    The fastpow() and fastpowmagic() functions shown in this post work on all architectures where float uses the IEEE-754 binary32 storage format (basically all, as of 2018), and the byte order is the same for both integers and floating-point types (again, basically all).

  4. #4
    Senior Member Projectitis's Avatar
    Join Date
    Feb 2018
    Location
    New Zealand
    Posts
    146
    Thanks for the lesson, very helpful!

    It does appear, though, that the author of the "inline float fastPow" function I posted above was attempting to write fastpow for 32-bit float - the differences (critical though they may be?) are that you are casting to unsigned 16bit int whereas they casted to signed, and they used a different magic constant.

  5. #5
    Senior Member
    Join Date
    Feb 2015
    Location
    Finland
    Posts
    118
    Quote Originally Posted by Projectitis View Post
    you are casting to unsigned 16bit int
    No, to unsigned 32-bit integer.

    Quote Originally Posted by Projectitis View Post
    whereas they casted to signed
    The union type should have an unsigned integer member of the appropriate size (uint32_t for float, uint64_t for double), because that way the binary representation is unambiguous.

    Even though temp.u is unsigned, (temp.u - 1064866808) yields a signed integer value, due to C arithmetic conversions (integer promotion rules, to be specific; see C11 6.3.1.8 for details) and int having at least as high rank as uint32_t does.

    Quote Originally Posted by Projectitis View Post
    and they used a different magic constant.
    The magic constant is the only difference, if you only consider the results the function yields.

    I have a pretty defensive coding style, as I'm used to working with different architectures and OSes (except for Windows, which I don't use, and haven't used in over a decade), so the other changes are basically stylistic. Except that I claim my code is more robust and portable.

  6. #6
    Senior Member Projectitis's Avatar
    Join Date
    Feb 2018
    Location
    New Zealand
    Posts
    146
    Quote Originally Posted by Nominal Animal View Post
    No, to unsigned 32-bit integer.
    Oh, sorry, that's what I meant

    Well - thank you. Much appreciated. I'll pay around with the magic constant and attempt to get a more accurate approximation for the range of base/exponent for my situation.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •