Forum Rule: Always post complete source code & details to reproduce any issue!

Thread: Exponential Moving Average - How does it work?

  1. #1

    Exponential Moving Average - How does it work?

    Hello everyone

    I am using an exponential moving average to smooth the readings from an LSM303DLHC compass module and it seems to work. However, I do not understand how it works. Here is the code (reading = raw reading from the module):

    Code:
    uint16_t headingTotal = 0;
    uint8_t K = 2;
    
    int16_t headingAvg(int16_t reading)
    {
      headingTotal += reading;  
      int16_t average = headingTotal + (1 << (K - 1));
      if (reading < 0) {
        average -= 1;
      }
      headingTotal -= average;
      return average >> K;
    }
    To me, the filter: 1) adds the latest value to the headingTotal, 2) adds 2 to headingTotal to give average, 3) subtracts 1 from average if value < 0, 4) subtracts average from headingTotal and finally 5) divides average by 4 to produce the smoothed value. What is the point of adding 2 to headingTotal? How does subtracting 1 from average allow for signed integers?

    I have tried going through the maths on the website where the filter came from, but it is too much for my poor head:

    https://tttapa.github.io/Pages/Mathe...mentation.html
    https://tttapa.github.io/Pages/Mathe...g-Average.html

    Can someone please explain it in simple terms? Thanks.

  2. #2
    Senior Member
    Join Date
    Apr 2014
    Location
    Germany
    Posts
    1,224
    I'm using exponential smoothing quite often. It is very convenient for smoothing values on the fly without needing a buffer. The version you are using seems to be an integer-optimized variant of the usual float algorithm.

    Code:
     
    constexpr float k = 0.1f;
    
    T_avg = T_avg + k * (T - T_avg);
    In this version it is much easier to understand how it works. You'd call this whenever you get a new value T. The algorithm simply calculates the difference of the new value T to the current average T_avg. It then adds a fraction of this difference to the average.

    If, for example, you start with T_avg = 0 and call the algorithm repeatedly with a constant T = 1, you will see T_avg slowly rise. The more it rises, the smaller the difference between T_avg and T, and the slower it rises. It can never quite reach T, but it gets arbitrarily close. (It follows an exponential curve, hence the name.)

    The smaller k is, the stronger the smoothing, but the longer it takes to settle to a constant input. The larger k is (with k < 1), the less smoothing you get. For k = 1 the average simply follows the current value T; for k = 0 the average does not change at all.
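Here is the same update rule as a runnable Python sketch (the constants are chosen only for demonstration):

```python
k = 0.1
T_avg = 0.0
history = []
for _ in range(100):
    T = 1.0                          # constant input
    T_avg = T_avg + k * (T - T_avg)  # the exponential smoothing update
    history.append(T_avg)
# T_avg rises quickly at first, then ever more slowly towards 1.0,
# without ever quite reaching it
```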

    Hope that helps.

  3. #3
    Senior Member
    Join Date
    Jul 2020
    Posts
    477
    I'd second that way of coding it as being clear, but it presupposes either fixed-point fractions or floats.

    [ The original code uses just integer arithmetic and has the problems this creates (needing a scaling down
    with a shift, bias due to truncation rather than rounding, and the DC gain isn't immediately obvious). ]

    If you have a long time constant (i.e. a small value of k), it's best to treat the first sample specially and
    set the average directly from it, otherwise it will take a while to settle down.

    Technically this is a first-order digital low-pass filter, equivalent to an RC circuit(*) or such. The single real pole is at
    1-k inside the unit-circle and the gain is scaled to unity at DC. So for instance if k = 0.5, the pole at 0.5 will give a
    3-fold attenuation at the Nyquist limit (alternating 1,-1 samples for instance), as 0.5 is three times closer to +1 than -1.

    For small k the roll-off frequency is about k·Fs / (2·pi), where Fs is the sample rate, which means that if you want
    a roll-off at f (<< Fs), choose k = 2·pi·f / Fs. In the presence of white noise the noise power is reduced by a
    factor of about pi / k, and thus the noise amplitude by about √(pi/k).

    (*) more accurately for small values of k.
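These numbers can be sanity-checked directly from the filter's transfer function (a sketch; for the update y += k·(x - y) the response is H(z) = k / (1 - (1-k)/z), evaluated on the unit circle):

```python
from math import pi

def gain(k, z):
    # magnitude response of y[n] = y[n-1] + k*(x[n] - y[n-1]),
    # i.e. H(z) = k / (1 - (1 - k)/z), at a point z on the unit circle
    return abs(k / (1 - (1 - k) / z))

k = 0.5
dc  = gain(k, 1.0)    # gain at DC (z = +1): unity
nyq = gain(k, -1.0)   # gain at Nyquist (z = -1): k/(2-k) = 1/3 for k = 0.5

# picking k for a desired roll-off frequency f << Fs:
Fs, f = 1000.0, 10.0
k_target = 2 * pi * f / Fs   # about 0.063
```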

  4. #4
    Senior Member
    Join Date
    Feb 2015
    Location
    Finland
    Posts
    224
    No, OP's code is not right. A correct implementation would be
    Code:
    constexpr int_fast16_t fractional_bits = 2;
    int_fast16_t heading = 0;
    
    int_fast16_t average_heading(int_fast16_t sample)
    {
        int_fast16_t average;
        heading += sample;
        average = (heading - (heading < 0) + (1 << (fractional_bits - 1))) >> fractional_bits;
        heading -= average;
        return average;
    }
    This implements the following maths:
            heading = heading + (sample / 2^fractional_bits)
            average = round(heading)
            heading = heading - average
            (returns average)

    The way it does this, is having heading be a fixed point number with fractional_bits bits for the fractional part. Essentially, although it is obviously an integer, it represents a mathematical value heading / 2^fractional_bits.

    The tricky part is how average is rounded from heading, noting that average is a normal integer, and not a fixed-point value.

    The (heading < 0) part evaluates to 1 if heading is less than zero, and to zero otherwise. This is standard in C and C++: the result of a comparison decays to integer 1 if true, and to 0 if false.

    The (1 << (fractional_bits - 1)) represents the mathematical value 0.5 in this fixed point number.

    The final >> fractional_bits divides the left side by 2^fractional_bits, rounding down (towards negative infinity). For positive values that is towards zero; for negative values, towards the next more negative integer. (As an example, -4>>2 == -3>>2 == -2>>2 == -1>>2 == -1, but -5>>2 == -2.)

    To convert from rounding down to rounding towards nearest integer, we need to add half; and subtract the unit in the least significant place if negative.

    So, the -(heading < 0) fixes the asymmetry when shifting negative integers right; and the +(1<<(fractional_bits-1)) turns the bit shift into rounding operation (because it adjusts the rounding down to rounding up from half).

    Final note: int16_t is sub-optimal on 32-bit Teensies (LC, 3.x, 4.x). The int_fast16_t is a standard type (although many programmers don't know about it), that is at least 16-bit, but larger if the current hardware prefers larger words. On 32-bit architectures, it is usually 32 bits.




    I do not expect anyone to take my word for this, though. I suggest you compile and run the following C program on your computer:
    Code:
    #include <stdlib.h>
    #include <inttypes.h>
    #include <string.h>
    #include <stdio.h>
    #include <math.h>
    
    int main(int argc, char *argv[])
    {
        int  n, nmin, nmax, bits, v;
        double b, d;
        char dummy;
    
        if (argc != 4 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help") || !strcmp(argv[1], "/?")) {
            const char *argv0 = (argc > 0 && argv && argv[0] && argv[0][0]) ? argv[0] : "(this)";
            fprintf(stderr, "\n");
            fprintf(stderr, "Usage: %s [ -h | --help | /? ]\n", argv0);
            fprintf(stderr, "       %s BITS MIN MAX\n", argv0);
            fprintf(stderr, "\n");
            fprintf(stderr, "This program prints\n");
            fprintf(stderr, "    (N - (N < 0) + (1 << (BITS - 1))) >> BITS\n");
            fprintf(stderr, "for N = MIN .. MAX, inclusive.\n");
            fprintf(stderr, "\n");
            return EXIT_FAILURE;
        }
    
        if (sscanf(argv[1], " %d %c", &bits, &dummy) != 1 || bits < 1) {
            fprintf(stderr, "%s: BITS must be a positive integer.\n", argv[1]);
            return EXIT_FAILURE;
        }
        if (sscanf(argv[2], " %d %c", &nmin, &dummy) != 1) {
            fprintf(stderr, "%s: MIN must be an integer.\n", argv[2]);
            return EXIT_FAILURE;
        }
        if (sscanf(argv[3], " %d %c", &nmax, &dummy) != 1 || nmax < nmin) {
            fprintf(stderr, "%s: MAX must be an integer, at least as large as MIN.\n", argv[3]);
            return EXIT_FAILURE;
        }
    
        b = pow(2.0, bits);
    
        for (n = nmin; n <= nmax; n++) {
            v = (n - (n < 0) + (1 << (bits - 1))) >> bits;
            d = round( (double)n / b );
            printf("%d %d %.0f\n", n, v, d);
        }
    
        return EXIT_SUCCESS;
    }
    Supplying the number of fractional bits to consider, and the minimum and maximum integer value n, it shows the values of n, (n - (n < 0) + (1 << (bits-1))) >> bits, and round(n / 2^bits). You'll see that the second and third columns match, i.e. the odd-looking integer expression really does round correctly.

    (You can safely consider -0 == 0.)

    For example, saving the above as ints.c and compiling it in Linux using gcc -Wall -Wextra -O2 ints.c -lm -o ints and running ./ints 2 -17 17 outputs
    Code:
    -17 -4 -4
    -16 -4 -4
    -15 -4 -4
    -14 -4 -4
    -13 -3 -3
    -12 -3 -3
    -11 -3 -3
    -10 -3 -3
    -9 -2 -2
    -8 -2 -2
    -7 -2 -2
    -6 -2 -2
    -5 -1 -1
    -4 -1 -1
    -3 -1 -1
    -2 -1 -1
    -1 0 -0
    0 0 0
    1 0 0
    2 1 1
    3 1 1
    4 1 1
    5 1 1
    6 2 2
    7 2 2
    8 2 2
    9 2 2
    10 3 3
    11 3 3
    12 3 3
    13 3 3
    14 4 4
    15 4 4
    16 4 4
    17 4 4
    which shows this expression correctly rounds (at least for bits == 2 over this range). It works for other fractional bit counts too, from 1 to 30 (on architectures with 32-bit ints) or thereabouts, but proving it all would make this post overly long.
    Last edited by Nominal Animal; 07-18-2020 at 03:36 PM.

  5. #5
    @Nominal Animal

    Brilliant answer, thank you so much for your time. My understanding is a lot clearer now. Also, I swapped your implementation for the one I was using and it works much better (I don't think mine was actually filtering anyway, it just appeared to be). Using fractional_bits = 8 seems to be the best compromise between speed and stability.
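    For anyone curious, here is a Python trace of my original routine (a sketch; the uint16_t/int16_t wraparound is emulated with masks), which shows why it only appeared to filter:

```python
def heading_avg_original(reading, total):
    # trace of the routine from post #1 (K = 2), with the C fixed-width
    # behaviour emulated: 'total' plays the role of the uint16_t headingTotal
    total = (total + reading) & 0xFFFF      # headingTotal += reading
    average = (total + 2) & 0xFFFF          # int16_t average = headingTotal + (1 << (K - 1))
    if average >= 0x8000:
        average -= 0x10000                  # reinterpret as signed 16-bit
    if reading < 0:
        average -= 1
    total = (total - average) & 0xFFFF      # headingTotal -= average
    return average >> 2, total              # return average >> K

outs, total = [], 0
for _ in range(5):
    out, total = heading_avg_original(100, total)
    outs.append(out)
# every call returns 25 == 100 >> 2: the state cancels out each time,
# so nothing is actually smoothed
```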

    You are right - I hadn't heard of the int_fast16_t before!

    Thanks also to luni and MarkT. It all helps!


  6. #6
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,914
    Quote Originally Posted by Nominal Animal View Post
    No, OP's code is not right. A correct implementation would be
    .....
    Not sure if it is the same, but I implement exponential averaging as

    float:
    avg = avg + aa*(x - avg), where aa is a small number, 0 < aa < 1, and x is the input

    integer:
    avg = (avg*A + (x - avg))/A, where A is an integer > 1; in particular A = 1/aa.

    To speed it up I choose A to be a power of 2, so I can use right and left shifts.
    So, instead of, say, A = 1000, I use A = 1024 and the formula becomes
    avg = ((avg << 10) + (x - avg)) >> 10

    Obviously, it may be necessary to use say int32_t to average int16_t values, etc.
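    A quick Python sketch (values chosen only for illustration) checks that the shift form, with the parentheses C's operator precedence requires, matches the division form:

```python
A = 1024   # power-of-two divisor, so the division can become a shift

def ema_div(avg, x):
    # avg = (avg*A + (x - avg)) / A; Python's // floors, which matches
    # C's integer division for the non-negative values used here
    return (avg * A + (x - avg)) // A

def ema_shift(avg, x):
    # the same update with shifts; note the parentheses around (avg << 10),
    # since in C '<<' binds less tightly than '+'
    return ((avg << 10) + (x - avg)) >> 10

avg_d = avg_s = 0
for _ in range(100):
    avg_d = ema_div(avg_d, 50000)
    avg_s = ema_shift(avg_s, 50000)
# both forms stay in lockstep as they climb towards the input value
```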

  7. #7
    Senior Member
    Join Date
    Feb 2015
    Location
    Finland
    Posts
    224
    Exponential decay filtering,
            average = C·sample + (1-C)·average
    (where C is a real coefficient smaller than 1) is different to exponential moving average.

    Their main difference is that the decay filter retards the signal (peaks in the filtered signal occur later than in the samples), whereas the windowed filter does not.

    This particular exponential moving average filter is cheap to calculate, but if the source is noisy, a small amount of noise passes through (a few units in the least significant place). You can mitigate that by averaging a few samples, so the signal retardation is just a few samples, but the output is smooth.

  8. #8
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    733
    If speed is an issue, do use a power of two. But keep your code more readable and let the compiler worry about replacing division/multiply with a shift.

  9. #9
    Senior Member
    Join Date
    Jul 2020
    Posts
    477
    Quote Originally Posted by Nominal Animal View Post
    Exponential decay filtering,
            average = C·sample + (1-C)·average
    (where C is a real coefficient smaller than 1) is different to exponential moving average.
    That's 2 multiplies, whereas the "average += k * (sample-average)" approach is only one multiply and doesn't risk problems
    when you change C and forget to change 1-C to match. If k is a negative power of 2, it's one shift, no multiplies.

  10. #10
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    733
    > that's 2 multiplies, whereas the "average += k * (sample-average)" approach is only one multiply

    In yet another case of it not being wise to outguess the compiler, the mathematically equivalent "average = C·sample + (1-C)·average" actually runs faster (T4.1, floating point). And there is no need to ever change 1-C when C changes. Even with integers (about the same speed, so why bother), one can just rearrange the formula slightly to avoid (1-C).
    Last edited by jonr; 07-19-2020 at 02:04 AM.

  11. #11
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,914
    So, I learned

    Quote Originally Posted by WMXZ View Post

    integer:
    avg = (avg*A + (x-avg))/A, where A is an integer >1. in particular A = 1/aa.
    is, with tmp = avg*A - avg, (I think) equivalent to

    tmp = tmp + x
    avg = tmp/A
    tmp = tmp - avg

    https://tttapa.github.io/Pages/Mathe...mentation.html
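    A quick Python check of the claimed one-step equivalence (a sketch; Python's // floors, which matches C's / only for non-negative values, so the values here are kept non-negative):

```python
from random import randint

A = 1024
for _ in range(1000):
    avg = randint(0, 4000)
    x = randint(0, 4000)
    avg_direct = (avg * A + (x - avg)) // A   # the closed-form update
    tmp = avg * A - avg                       # seed tmp = avg*A - avg
    tmp += x                                  # tmp = tmp + x
    avg_tmp = tmp // A                        # avg = tmp / A
    tmp -= avg_tmp                            # tmp = tmp - avg
    assert avg_direct == avg_tmp
```

    Note that carrying tmp across calls (instead of re-deriving it from avg each time) additionally keeps the division remainder in the state, which is the fixed-point refinement the linked page makes.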

  12. #12
    Senior Member
    Join Date
    Jul 2020
    Posts
    477
    Quote Originally Posted by jonr View Post
    > that's 2 multiplies, whereas the the "average += k * (sample-average)" approach is only one multiply

    In yet another case of it not being wise to outguess the compiler
    That's one compiler, at one optimization setting, on a machine with hardware floating point and super-scalar dual-issue;
    it's not wise to assume this is generally true.

  13. #13
    Senior Member
    Join Date
    Feb 2015
    Location
    Finland
    Posts
    224
    Quote Originally Posted by MarkT View Post
    That's 2 multiplies, whereas the "average += k * (sample-average)" approach is only one multiply and doesn't risk problems
    when you change C and forget to change 1-C to match. If k is a negative power of 2, it's one shift, no multiplies.
    Eh, what? I was describing a mathematical formula, not suggesting code.

    There is an important distinction here, by the way.

    The often used mathematical form

            state = (1 - weight)·state + weight·sample
            Return state

    is not exactly the same as

            state = state + weight·sample
            average = round(state)
            state = state - weight·average
            Return average

    The first one is equivalent to
            state = state + weight·(sample - state)
    which you can trivially verify by just reordering the terms.
    I call this the exponential decay filter.

    The second one (that returns average), called here the Exponential Moving Average, is similar, but not exactly the same: in this one, the state is a sum of two parts, the first part being equivalent to the state in the first form, and the second part being the fractional bits of the average scaled by weight. This causes the average to cycle between two integer values, even for a constant incoming sample value. For example, when every sample = 2.5 and weight = 0.5, average = 2, 3, 2, 3, 2, 3, ... . Similarly, if sample varies between 9.5 and 10.5 (so an average of 10, with ±0.5 of noise), average won't stay at 10, but occasionally dips to 9 or 11. This has both upsides and downsides; for example, you could say it is both more precise and more noisy.

    This effect remains, even when sample is an integer, but has any noise.
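    That cycling is easy to check numerically; a Python sketch (note that Python's round() rounds halves to even, which only affects the phase of the cycle):

```python
weight, sample = 0.5, 2.5
state = 0.0
outputs = []
for _ in range(12):
    state += weight * sample       # state = state + weight*sample
    average = round(state)         # average = round(state)
    state -= weight * average      # state = state - weight*average
    outputs.append(average)
# after a brief settling phase the output alternates 2, 3, 2, 3, ...
# for the constant input 2.5
```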




    When you deal with a formula like (1-w)·a + w·b, the form I recommend one uses depends on the type of the variables used.

    When using integer types, the form a+w·(b-a) is recommended, because it uses only a single multiplication, but also because you can avoid integer overflow by using a+w·(b-a) when b>a, and a-w·(a-b) otherwise. The two are mathematically equivalent.

    When using floating-point types, the form (1-w)·a + w·b is recommended, because it is numerically more robust. In particular, the form a + w·(b-a) risks catastrophic cancellation when b and a differ a lot in magnitude. The first form is robust against that cancellation: if you increase w from 0 to 1, the value of the expression starts at a and ends at b, even if they differ a lot in magnitude. The second form, if a is much greater than b in magnitude, just goes from a to zero.
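    A short float experiment illustrates the difference between the two forms (a sketch; the extreme magnitudes are chosen only to make the cancellation visible):

```python
def lerp_sub(a, b, w):
    # a + w*(b - a): one multiply, but (b - a) can cancel catastrophically
    return a + w * (b - a)

def lerp_two(a, b, w):
    # (1 - w)*a + w*b: two multiplies, but exact at both endpoints
    return (1 - w) * a + w * b

a, b = 1.0e16, 1.0
r_sub = lerp_sub(a, b, 1.0)   # should be exactly b, but (b - a) rounds to -a
r_two = lerp_two(a, b, 1.0)   # exactly b
```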

  14. #14
    Quote Originally Posted by Nominal Animal View Post
    No, OP's code is not right. A correct implementation would be ...
    Indeed, @xenington where did you get that code from? The pages you linked to seem to list the correct code (https://tttapa.github.io/Pages/Mathe...mentation-in-c).

    Quote Originally Posted by Nominal Animal View Post
    There is an important distinction here, by the way.

    The often used mathematical form

            state = (1 - weight)·state + weight·sample
            Return state

    is not exactly the same as

            state = state + weight·sample
            average = round(state)
            state = state - weight·average
            Return average
    Where does the second formulation come from? The code in the original article implements the following:

            state = state + sample
            average = state * weight
            state = state - average
            Return average

    The derivation here shows the equivalence with your first form, using a change of variables that introduces a new variable z. The variable z is essentially a fixed-point representation of the original state y: z = y/weight. The actual state that's saved in the code is state - average = state - state*weight = (1-weight)*state. In the next iteration, the input sample is added (no need to multiply by weight, since state is already scaled by weight), and that results in the same formula as your first mathematical form.
    This re-formulation is done to make the integer implementation more efficient.

    You can also quickly verify the equivalence numerically:
    Code:
    import numpy as np
    
    N = 100
    output_1 = np.zeros((N,))
    output_2 = np.zeros((N,))
    
    state_1 = 0
    state_2 = 0
    
    weight = .25
    
    rng = np.random.default_rng()
    
    for i in range(N):
        sample = rng.uniform(0, 1024)
    
        # 1
        state_1 = (1 - weight) * state_1 + weight * sample
        output_1[i] = state_1
    
        # 2
        state_2 += sample
        average = state_2 * weight
        state_2 = state_2 - average
        output_2[i] = average
    
    import matplotlib.pyplot as plt 
    
    plt.step(np.arange(N), output_1, label='1')
    plt.step(np.arange(N), output_2, label='2')
    plt.legend()
    plt.show()
    Pieter

  15. #15
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    733
    The "Exponential Moving Average filter" implementation using integers is slick - on a T4, about 2x faster than the simple float version.

  16. #16
    Senior Member
    Join Date
    Feb 2015
    Location
    Finland
    Posts
    224
    Quote Originally Posted by PieterP View Post
    The code in the original article implements the following:

            state = state + sample
            average = state * weight
            state = state - average
            Return average
    Perhaps that was your intention, but in fact the C++ implementation includes the rounding operation.

    Code to implement the above is different: you need to calculate average in fixed point as well, and round or truncate it only after the subtraction.

    As it stands, the rounding introduces quantization noise, because it (rounding) affects the state. If it didn't, it'd be a different matter.

  17. #17
    Senior Member
    Join Date
    Feb 2015
    Location
    Finland
    Posts
    224
    Here's a simple Python3 test bench anyone can experiment with:
    Code:
    from random import Random
    from math import sin, pi
    
    class Source:
        __slots__ = ('prng', 'phase')
    
        def __init__(self, *args, **kwargs):
            self.prng = Random()
            self.phase = 0.0
    
        def __call__(self):
            self.phase = self.phase + pi / 127.0
            return round(17.5 * sin(self.phase) + 7.3 * sin(self.phase * 3.1) + self.prng.uniform(-5.0, +5.0))
    
    
    class Filter:
        __slots__ = ()
    
        def __init__(self, *args, **kwargs):
            pass
    
        def __call__(self, sample):
            return round(sample)
    
    
    class ExponentialFilter(Filter):
        __slots__ = ('state', 'weight')
    
        def __init__(self, weight, state=0):
            self.weight = float(weight)
            self.state = float(state)
    
        def __call__(self, sample):
            self.state = self.state + sample
            average = self.state * self.weight
            self.state = self.state - average
            return round(average)
    
    
    class RoundingFilter(Filter):
        __slots__ = ('state', 'weight')
    
        def __init__(self, weight, state=0):
            self.weight = float(weight)
            self.state = float(state)
    
        def __call__(self, sample):
            self.state = self.state + sample
            average = round(self.state * self.weight)
            self.state = self.state - average
            return average
    
    
    class DecayFilter(Filter):
        __slots__ = ('state', 'weight')
    
        def __init__(self, weight, state=0):
            self.weight = float(weight)
            self.state = float(state)
    
        def __call__(self, sample):
            self.state = (1 - self.weight)*self.state + self.weight*sample
            return round(self.state)
    
    
    if __name__ == '__main__':
        sample = Source()
        weight = 0.125
        filter1 = DecayFilter(weight)
        filter2 = ExponentialFilter(weight)
        filter3 = RoundingFilter(weight)
    
        print("#n x Decay(x) Exponential(x) Rounding(x)")
    
        N = 256
        n12 = 0
        n13 = 0
        n23 = 0
        for i in range(0, N):
            x = sample()
            y1 = filter1(x)
            y2 = filter2(x)
            y3 = filter3(x)
    
            n12 += (y1 == y2)
            n13 += (y1 == y3)
            n23 += (y2 == y3)
    
            print("%3d %3.0f   %3.0f %3.0f %3.0f" % (i, x, y1, y2, y3))
    
        print("# Decay(x) == Exponential(x) in %d of %d samples" % (n12, N))
        print("# Decay(x) == Rounding(x) in %d of %d samples" % (n13, N))
        print("# Exponential(x) == Rounding(x) in %d of %d samples" % (n23, N))
    The Source produces the original signal: a sum of two sinusoids, the slower with a period of 254 samples and the faster at 3.1 times that frequency, with amplitudes 17.5 and 7.3, respectively, plus uniform random noise between -5 and +5 on top.

    Decay and Exponential are the two equivalent mathematical forms for the exponential filter; Rounding implements the "exponential moving average" filter that rounds average before subtracting it from the state.

    You'll see that Decay and Exponential always match (feel free to change the Source form!), but the Rounding filter occasionally differs. For this particular signal and noise level, about 8% of the filtered samples differ by ±1.

    The output format is suitable for plotting in e.g. gnuplot:
    Code:
    plot 'output' u 1:2 t 'Signal' w lines lc -1, '' u 1:3 t 'Decay' w lines lc 1, '' u 1:4 t 'Exponential' w lines lc 2, '' u 1:5 t 'Rounding' w lines lc 3

  18. #18
    Ah, I see what you mean.
    The difference is of course that both 'Decay' and 'Exponential' use floating point states and operations, whereas the C++ implementation uses only integer arithmetic, so it's not a surprise that the output differs some of the time.

    The reason for the integer implementation is that it's significantly faster than the floating point implementation, especially in MCUs without hardware float support.

    When using integers, rounding the average before subtracting it from the state yields much better results than simply using a flooring division:
    Code:
    import numpy as np
    
    bits = 4
    N = 64
    output_1 = np.zeros((N,))
    output_2 = np.zeros((N,))
    output_3 = np.zeros((N,))
    output_4 = np.zeros((N,))
    
    state_1 = 0
    state_3 = 0
    state_4 = 0
    
    weight = .25
    recip_weight = int(1 / weight)
    
    rng = np.random.default_rng()
    
    for i in range(N):
        sample = rng.integers(2**bits)
    
        # 1: float reference
        state_1 = (1 - weight) * state_1 + weight * sample
        output_1[i] = state_1
    
        # 2: round(float reference)
        output_2[i] = round(output_1[i])
    
        # 3: integer state without rounding, output with rounding
        state_3 += sample
        average_round = (state_3 + recip_weight // 2) // recip_weight
        average = state_3 // recip_weight
        state_3 = state_3 - average
        output_3[i] = average_round
    
        # 4: integer state with rounding, output with rounding
        state_4 += sample
        average_round = (state_4 + recip_weight // 2) // recip_weight
        state_4 = state_4 - average_round
        output_4[i] = average_round
    
    mae12 = np.abs(output_1 - output_2).mean()
    mae13 = np.abs(output_1 - output_3).mean()
    mae14 = np.abs(output_1 - output_4).mean()
    print(f'{mae12=}')
    print(f'{mae13=}')
    print(f'{mae14=}')
    mse23 = (np.square(output_2 - output_3)).mean()
    mse24 = (np.square(output_2 - output_4)).mean()
    print(f'{mse23=}')
    print(f'{mse24=}')
    err_cnt23 = 100. * np.count_nonzero(output_2 - output_3) / N
    err_cnt24 = 100. * np.count_nonzero(output_2 - output_4) / N
    print(f'{err_cnt23=}%')
    print(f'{err_cnt24=}%')
    
    import matplotlib.pyplot as plt 
    
    plt.step(np.arange(N), output_1, where='post',
             label='1: float reference')
    plt.step(np.arange(N), output_2, where='post',
             label='2: round(float reference)')
    plt.step(np.arange(N), output_3, '--', where='post',
             label='3: integer state without rounding, output with rounding')
    plt.step(np.arange(N), output_4, '--', where='post',
             label='4: integer state with rounding, output with rounding')
    plt.legend()
    plt.xlim((0, N-1))
    plt.ylim((0, 2**bits))
    plt.show()
    [Attached image: Figure_1.png]
    The blue line is the "exact" floating point output. The orange line is the "exact" output rounded to the nearest integer. This orange line is used as the reference to compare the integer implementations with.

    The first integer implementation is (if I understood your reply correctly) what you propose: updating the state without rounding, only rounding the output.
    It is shown as the green line in the graph, and is often too high (because the flooring division doesn't subtract enough from the state, about half of the time).

    The second integer implementation first rounds the output, and then subtracts it from the state, it is the same as the C++ implementation proposed in the Exponential Moving Average pages. It's shown in red on the graph. As you can see, it follows the orange reference much more accurately.

    The code I posted above also computes the mean squared error between the two integer implementations and the rounded floating point reference. If you simulate with a uniform random input for a long time, you'll see that it settles to an MSE of 0.500 for the first integer implementation and an MSE of 0.063 for the second implementation (the one that rounds before updating the state). This corresponds to an incorrect output 50% of the time vs 6.3% of the time, since the error is always ±1.

    I'm sure you could use more fractional bits in the fixed point representation to get a more accurate result, but that requires more shifting (which is slower, especially on AVR) and larger integer types. Besides, the second implementation is accurate enough for 99% of smoothing applications, occasionally being one bit off on a 10-bit ADC value is usually not an issue.

    Especially if you compare the mean absolute error between the rounded floating point output and the "exact" floating point output and the mean absolute error between the second integer implementation and the "exact" floating point output:
    Code:
    mae12=0.24995945723741
    mae13=0.5026141932125929
    mae14=0.255913320189325
    The additional error introduced by using an integer state variable instead of a floating point state variable is negligible (0.250 vs 0.256).
    Last edited by PieterP; 07-21-2020 at 01:54 PM.

  19. #19
    Senior Member
    Join Date
    Feb 2015
    Location
    Finland
    Posts
    224
    Do ignore my post #7: when I wrote that, I was obviously confusing a sliding window filter with the exponential moving average. (I'd edit it to correct it, but the grace period is over.)

    Quote Originally Posted by PieterP View Post
    When using integers, rounding the average before subtracting it from the state yields much better results than simply using a flooring division
    I generated some signals and did an FFT from them, but couldn't really see any difference; the mean squared error to the "exact" (floating-point) filtered value is smaller, but since even the truncating version is within ±1, I am not certain if it is meaningful in the real world. (And I mean literally not sure: I do not know.)

    Would I use rounding myself? Sure; but, if I was running out of CPU oomph, I'd use truncation instead.

    There was one thing in your derivation that kept me scratching my head. The recurrence relation is
            s_i = s_{i-1} + weight·(x_i - s_{i-1})
    for 0 < weight < 1. In fixed-point format with Q fractional bits in the internal state S, that becomes
            S_i = S_{i-1} + x_i - round(S_{i-1} / 2^Q)
            s_i = round(S_i / 2^Q)
    It took me a while (until now, that is) to verify that it is equivalent to your
            S_i = Z_{i-1} + x_i
            s_i = round(S_i / 2^Q)
            Z_i = S_i - s_i
    for i ≥ 1 iff S_0 = 0, because your internal state Z_i is a fractional subtraction "ahead" of the fixed-point filtered value S_i. Clever!
    If one initializes S_0 = x_0·2^Q and Z_0 = x_0·2^Q - x_0, then the filters are equivalent.
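    A quick numeric check of this equivalence (a Python sketch, using the rounding trick from post #4 as the round() step; Python's arithmetic right shift on negatives matches the usual C behaviour):

```python
from random import randint, seed

seed(12345)
Q = 4                                    # fractional bits in the state

def rnd(v):
    # round v / 2**Q to nearest, halves away from zero (post #4's trick)
    return (v - (v < 0) + (1 << (Q - 1))) >> Q

S = 0          # form 1: subtract the rounded output inside the update
Z = 0          # form 2: carry Z_i = S_i - s_i between steps
matches = 0
for _ in range(1000):
    x = randint(-500, 500)
    S = S + x - rnd(S)       # S_i = S_{i-1} + x_i - round(S_{i-1} / 2^Q)
    s1 = rnd(S)              # s_i = round(S_i / 2^Q)
    Si = Z + x               # S_i = Z_{i-1} + x_i
    s2 = rnd(Si)             # s_i = round(S_i / 2^Q)
    Z = Si - s2              # Z_i = S_i - s_i
    matches += (s1 == s2)
# with both states started at zero, the two forms agree on every sample
```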

    Quote Originally Posted by PieterP View Post
    I'm sure you could use more fractional bits in the fixed point representation to get a more accurate result, but that requires more shifting (which is slower, especially on AVR) and larger integer types. Besides, the second implementation is accurate enough for 99% of smoothing applications, occasionally being one bit off on a 10-bit ADC value is usually not an issue.
    True.

    You can do unsigned integer arithmetic, too, if you offset the sample by the magnitude of the largest negative sample:
    Code:
    #include <stdint.h>
    
    const uint_fast8_t  filter16_shift = 5;
    uint_fast16_t       filter16_state = 32768 - (32768 >> filter16_shift); /* == zero */
    /* Initial state for x is (32768 - x + (x<<filter16_shift) - (32768>>filter16_shift)) */
    
    int_fast16_t  filter16(int_fast16_t  sample)
    {
        const uint_fast16_t  value = sample + (32768 >> filter16_shift);
        filter16_state += value;
        uint_fast16_t  result = (filter16_state + (1 << (filter16_shift - 1))) >> filter16_shift;
        filter16_state -= result;
        return result - (32768 >> filter16_shift);
    }
    which on GCC-AVR 5.4.0 with -O2 compiles to
    Code:
    filter16(int):
            lds  r18, filter16_state
            lds  r19, filter16_state+1
            subi r19, -4
            add  r18, r24
            adc  r19, r25
            mov  r24, r18
            mov  r25, r19
            adiw r24, 16
            lsr  r25
            ror  r24
            swap r25
            swap r24
            andi r24, 0x0f
            eor  r24, r25
            andi r25, 0x0f
            eor  r24, r25
            sub  r18, r24
            sbc  r19, r25
            sts  filter16_state+1, r19
            sts  filter16_state, r18
            subi r25, 4
            ret
    
    filter16_state:
            .word   31744
    and something ridiculously fast on ARMs.

    This produces the exact same results (including correct rounding), and is suitable for filtering samples between -1024 and 1023, inclusive, without overflowing.
    (In general, sequences of samples between -(32768>>filter16_shift) and ((32768>>filter16_shift)-1), inclusive, cannot overflow.)
    The smaller shifts (4, 3, 2, or 1) yield even shorter/faster code (but not by much).

    This could be turned into a C++ template that takes the minimum sample value as a parameter and uses it as the offset (replacing (32768>>filter16_shift)), for anyone who has an asymmetric sample range but still wants to do this filtering.
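    As a sketch of that idea (names invented here, only lightly sanity-checked): a class template taking the shift and the minimum expected sample value, with the usable sample range being roughly MinSample to MinSample + (65536 >> Shift) - 1, inclusive:
    Code:
```cpp
#include <cstdint>

// Sketch: generalizes the fixed (32768 >> shift) offset to -MinSample.
// Samples from roughly MinSample to MinSample + (65536 >> Shift) - 1
// can be filtered without the 16-bit state wrapping incorrectly.
template <unsigned Shift, int16_t MinSample>
class OffsetEMA {
    static_assert(Shift >= 1 && Shift <= 14, "Shift out of range");
  public:
    // start the filter already settled at `initial`
    explicit OffsetEMA(int16_t initial)
        : state(uint16_t(((initial - MinSample) << Shift) - initial)) {}

    int16_t operator()(int16_t sample) {
        state += uint16_t(sample);   // modulo-2^16 arithmetic on purpose
        int16_t result =
            int16_t((state + (1u << (Shift - 1))) >> Shift) + MinSample;
        state -= uint16_t(result);
        return result;
    }
  private:
    uint16_t state;  // must be exactly 16 bits for the modulo trick
};

// helper: feed a constant input and return what the filter settles to
int16_t settle(int16_t v, int steps) {
    OffsetEMA<4, -1024> f(0);
    int16_t y = 0;
    for (int i = 0; i < steps; ++i) y = f(v);
    return y;
}
```
    Because the rounding error is fed back through the state, a constant input is reproduced exactly once the filter settles.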

  20. #20
    Quote Originally Posted by PieterP View Post
    Indeed, @xenington where did you get that code from? The pages you linked to seem to list the correct code (https://tttapa.github.io/Pages/Mathe...mentation-in-c).
    Hi Pieter

    I tried to insert the signed division implementation into the C++ example (for unsigned division):

    Code:
    Implementation of Signed Division by a Power of Two
    
    constexpr unsigned int K = 3;
    
    signed int div_s1(signed int val) {
        int round = val + (1 << (K - 1));
        if (val < 0)
            round -= 1;
        return round >> K;
    }
    Code:
    Implementation in C++
    
    We now have everything in place to write an implementation of the EMA in C++:
    EMA.cpp
    
    #include <stdint.h>
    
    template <uint8_t K, class uint_t = uint16_t>
    class EMA {
      public:
        uint_t operator()(uint_t x) {
            z += x;
            uint_t y = (z + (1 << (K - 1))) >> K;
            z -= y;
            return y;
        }
    
        static_assert(
            uint_t(0) < uint_t(-1),  // Check that `uint_t` is an unsigned type
            "Error: the uint_t type should be an unsigned integer, otherwise, "
            "the division using bit shifts is invalid.");
    
      private:
        uint_t z = 0;
    };
    It didn't work because I didn't really understand what I was doing. The result was a simple division and not an average.
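    For anyone following along, here is a sketch of one way the two snippets can be combined, keeping the error-feedback state z and folding the signed rounding correction into operator(). This is illustrative only, not the website's code, and it assumes >> on a negative signed value is an arithmetic shift (true on gcc/clang/avr-gcc, but only guaranteed by the standard from C++20):
    Code:
```cpp
#include <stdint.h>

// Sketch: error-feedback EMA with a signed state and the conditional
// rounding correction for negative dividends folded in.
template <uint8_t K, class int_t = int16_t>
class SignedEMA {
  public:
    int_t operator()(int_t x) {
        z += x;                              // z accumulates input minus output
        int_t round = z + (int_t(1) << (K - 1));
        if (z < 0)                           // the signed-division correction:
            round -= 1;                      // round halves away from zero
        int_t y = round >> K;                // arithmetic shift == floor division
        z -= y;                              // keep the rounding error in z
        return y;
    }
  private:
    int_t z = 0;
};

// helper: feed a constant input and return what the filter settles to
template <uint8_t K>
int settle_signed(int v, int steps) {
    SignedEMA<K, int> f;
    int y = 0;
    for (int i = 0; i < steps; ++i) y = f(v);
    return y;
}
```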

  21. #21
    Quote Originally Posted by Nominal Animal View Post
    I generated some signals and did an FFT from them, but couldn't really see any difference; the mean squared error to the "exact" (floating-point) filtered value is smaller, but since even the truncating version is within ±1, I am not certain if it is meaningful in the real world. (And I mean literally not sure: I do not know.)

    Would I use rounding myself? Sure; but, if I was running out of CPU oomph, I'd use truncation instead.
    For truly random inputs, it probably doesn't matter, but the problem with truncation is that it makes it impossible to reach zero again (assuming non-negative inputs).

    Using a rectangular input from 0 to 16 and back to 0, for instance:
    [Attachment: Figure_2.png — plot of a rectangular 0 → 16 → 0 input filtered by the truncating and rounding variants]
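    The effect can be reproduced numerically. In the sketch below, "truncating" means the naive y += (x - y) / 2^K form (a common way this filter gets written in C; note that C integer division truncates toward zero), while the rounding filter uses the error-feedback form from this thread. After stepping the input from 16 back down to 0, the truncating filter sticks at 15 forever, while the rounding one decays all the way to 0:
    Code:
```cpp
#include <cstdint>

constexpr int K = 4;  // arbitrary choice for illustration

int truncating_step_down(int start, int steps) {
    int y = start;
    for (int i = 0; i < steps; ++i)
        y += (0 - y) / (1 << K);  // -15/16 truncates to 0, so y sticks at 15
    return y;
}

int rounding_step_down(int start, int steps) {
    unsigned z = unsigned(start << K) - unsigned(start);  // state for avg == start
    unsigned y = unsigned(start);
    for (int i = 0; i < steps; ++i) {
        // new input is 0, so z only loses the rounded output each step
        y = (z + (1u << (K - 1))) >> K;
        z -= y;
    }
    return int(y);
}
```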

    Quote Originally Posted by Nominal Animal View Post
    You can do unsigned integer arithmetic, too, if you offset the sample by the magnitude of the largest negative sample:
    Nice, that's indeed a better solution than the conditional correction for negative values.

    I think you can even optimize it further by adding/subtracting the offset only once:
    Code:
    const uint_fast8_t  filter16_shift   = 5;
    uint_fast16_t       filter16_state_2 = 32768; /* == zero */
    
    int_fast16_t  filter16_2(int_fast16_t  sample)
    {
        const uint_fast16_t  value = sample;
        filter16_state_2 += value;
        uint_fast16_t  result = (filter16_state_2 + (1 << (filter16_shift - 1))) >> filter16_shift;
        result -= 32768 >> filter16_shift;
        filter16_state_2 -= result;
        return result;
    }
    It saves a single subtraction: https://godbolt.org/z/14457Y
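    A quick brute-force cross-check (a sketch, with the state passed in explicitly so both variants can be reset and run side by side; names invented here) can confirm the two produce identical outputs over the whole valid sample range:
    Code:
```cpp
#include <cstdint>

constexpr unsigned shift = 5;

int16_t filter_a(uint16_t& state, int16_t sample) {
    // original: offset added to the sample, removed from the result
    state += uint16_t(sample + (32768 >> shift));
    uint16_t result = (state + (1u << (shift - 1))) >> shift;
    state -= result;
    return int16_t(result - (32768 >> shift));
}

int16_t filter_b(uint16_t& state, int16_t sample) {
    // variant above: offset removed from the result only
    state += uint16_t(sample);
    uint16_t raw = (state + (1u << (shift - 1))) >> shift;
    int16_t result = int16_t(raw - (32768 >> shift));  // may be negative
    state -= uint16_t(result);
    return result;
}

bool filters_agree(int lo, int hi) {
    uint16_t sa = 32768 - (32768 >> shift);  // state meaning "zero", version a
    uint16_t sb = 32768;                     // state meaning "zero", version b
    for (int x = lo; x <= hi; ++x)           // feed a full ramp of samples
        if (filter_a(sa, int16_t(x)) != filter_b(sb, int16_t(x)))
            return false;
    return true;
}
```
    The two states simply differ by the constant offset (32768 >> shift), so the outputs match step for step.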

    Or as a C++ class that handles both signed and unsigned integers: https://godbolt.org/z/nj54Gs

    Or as an Arduino sketch for OP:
    Code:
    #include <type_traits> // std::make_unsigned_t, is_unsigned
    #include <limits>      // std::numeric_limits
    #include <cstdint>     // uint_fast16_t
    
    template <uint8_t K,
              class input_t = uint_fast16_t,
              class state_t = std::make_unsigned_t<input_t>>
    class EMA {
      public:
        EMA(input_t initial = input_t(0))
          : state(zero + (initial << K) - initial) {}
    
        input_t operator()(input_t input) {
          state         += state_t(input);
          state_t output = (state + half) >> K;
          output        -= zero >> K;
          state         -= output;
          return input_t(output);
        }
    
      constexpr static state_t 
        max_state = std::numeric_limits<state_t>::max(),
        zero      = std::is_unsigned<input_t>::value ? 0 : max_state / 2 + 1,
        half      = state_t(1) << (K - 1);
        
      static_assert(std::is_unsigned<state_t>::value, 
                    "State type should be unsigned");
    
      private:
        state_t state;
    };
    
    // -----
    
    EMA<5, int_fast16_t> filter;
    
    int_fast16_t sample() {
      return 2 * analogRead(A1) - 1024;
    }
    
    void setup() {
      Serial.begin(115200);
      while (!Serial);
    }
    
    void loop() {
      const unsigned long period = 1000; // 1000 µs = 1 kHz
      static unsigned long timer = micros() - period;
      if (micros() - timer >= period) {
        Serial.println(filter(sample()));
        timer += period;
      }
    }

  22. #22
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    733
    Be careful when using 16 bits - the input sample range is limited. For example, -1024 to 1023 (??) for the code in #21. Some comments or checks would be a good idea.
    Last edited by jonr; 07-22-2020 at 07:54 PM.

  23. #23
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,914
    If the data are 16-bit, then IMO the state variable (effectively sample << K) should be declared as 32-bit.

  24. #24
    Senior Member
    Join Date
    Feb 2015
    Location
    Finland
    Posts
    224
    Note: It looks like the state has to be an exact-width type (so uintN_t and not uint_fastN_t), as any extra bits will mess up the modulo arithmetic.

    The 16-bit state is intended for input samples within an N-bit range and Q-bit shift, where N+Q ≤ 16. On 8-bit microcontrollers, this yields much, much faster code than using 32-bit state.

    On 32-bit architectures,
    Code:
    #include <stdint.h>
    
    static const uint_fast8_t  filter32_shift = 4;
    uint32_t                   filter32_state;
    
    void  filter32_init(const int32_t  value)
    {
        filter32_state = (UINT32_C(1) << 31) - value + (value << filter32_shift); /* note: plain 1 << 31 would overflow a signed int */
    }
    
    int32_t  filter32(const int32_t  sample)
    {
        filter32_state += sample;
        uint_fast32_t  result = ((filter32_state + (1 << (filter32_shift - 1))) >> filter32_shift) - (1 << (31 - filter32_shift));
        filter32_state -= result;
        return (int32_t)result;
    }
    for example arm-gcc-7.3 -O2 produces
    Code:
    filter32_init:
            rsb     r0, r0, r0, lsl #4
            ldr     r3, .L3
            add     r0, r0, #-2147483648
            str     r0, [r3]
            bx      lr
    .L3:
            .word   filter32_state
    
    filter32:
            ldr     r2, .L6
            ldr     r3, [r2]
            add     r3, r0, r3
            add     r0, r3, #8
            lsr     r0, r0, #4
            add     r0, r0, #-134217728
            sub     r3, r3, r0
            str     r3, [r2]
            bx      lr
    .L6:
            .word   filter32_state
    and marking the functions static inline should mean the per-sample filtering overhead is minimal, just a handful of cycles.

    On the other hand, on avr-gcc-5.4.0, we get
    Code:
    filter32_init(long):
            push r12
            push r13
            push r14
            push r15
            push r16
            push r17
            mov r16,r22
            mov r17,r23
            mov r18,r24
            mov r19,r25
            lsl r16
            rol r17
            rol r18
            rol r19
            lsl r16
            rol r17
            rol r18
            rol r19
            lsl r16
            rol r17
            rol r18
            rol r19
            lsl r16
            rol r17
            rol r18
            rol r19
            mov r12,r16
            mov r13,r17
            mov r14,r18
            mov r15,r19
            sub r12,r22
            sbc r13,r23
            sbc r14,r24
            sbc r15,r25
            sts filter32_state,r12
            sts filter32_state+1,r13
            sts filter32_state+2,r14
            sts filter32_state+3,r15
            pop r17
            pop r16
            pop r15
            pop r14
            pop r13
            pop r12
            ret
    
    filter32(long):
            push r16
            push r17
            lds r16,filter32_state
            lds r17,filter32_state+1
            lds r18,filter32_state+2
            lds r19,filter32_state+3
            add r16,r22
            adc r17,r23
            adc r18,r24
            adc r19,r25
            mov r23,r19
            mov r22,r18
            mov r21,r17
            mov r20,r16
            subi r20,-8
            sbci r21,-1
            sbci r22,-1
            sbci r23,-1
            mov r25,r23
            mov r24,r22
            mov r23,r21
            mov r22,r20
            ldi r20,4
            1:
            lsr r25
            ror r24
            ror r23
            ror r22
            dec r20
            brne 1b
            sub r16,r22
            sbc r17,r23
            sbc r18,r24
            sbc r19,r25
            sts filter32_state,r16
            sts filter32_state+1,r17
            sts filter32_state+2,r18
            sts filter32_state+3,r19
            pop r17
            pop r16
            ret
    
    filter32_state:
            .zero   4
    which is not really something I'd like to use.

    The equivalent, limited to a (16 − filter16_shift)-bit sample range with minimum sample value -filter16_offset (you can set that to whatever 16-bit value you need before initializing the filter), is
    Code:
    #include <stdint.h>
    
    static const uint_fast8_t  filter16_shift = 4;
    static const int16_t       filter16_offset = 1 << (15 - filter16_shift);
    static const uint16_t      filter16_half = 1 << (filter16_shift - 1);
    uint16_t                   filter16_state;
    
    void  filter16_init(const int16_t  value)
    {
        filter16_state = ((value + filter16_offset) << filter16_shift) - value;
    }
    
    int16_t  filter16(const int16_t  sample)
    {
        filter16_state += sample;
        uint_fast16_t  result = ((filter16_state + filter16_half) >> filter16_shift) - filter16_offset;
        filter16_state -= result;
        return (int16_t)result;
    }
    which compiles on avr-gcc-5.4 -O2 to a much nicer
    Code:
    filter16_init(int):
            mov r18,r24
            mov r19,r25
            subi r19,-8
            swap r18
            swap r19
            andi r19,0xf0
            eor r19,r18
            andi r18,0xf0
            eor r19,r18
            mov r20,r18
            mov r21,r19
            sub r20,r24
            sbc r21,r25
            sts filter16_state+1,r21
            sts filter16_state,r20
            ret
    
    filter16(int):
            lds r18,filter16_state
            lds r19,filter16_state+1
            add r18,r24
            adc r19,r25
            mov r24,r18
            mov r25,r19
            adiw r24,8
            swap r25
            swap r24
            andi r24,0x0f
            eor r24,r25
            andi r25,0x0f
            eor r24,r25
            subi r25,8
            sub r18,r24
            sbc r19,r25
            sts filter16_state+1,r19
            sts filter16_state,r18
            ret
    
    filter16_state:
            .zero   2
    which is much, much preferable.

    I did verify the above produce the correct results, but since this is just idle exploration, brainfarts/typos may have slipped in.
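    One way to guard against such brainfarts is a brute-force comparison against a plain wide-integer reference. A sketch (all names invented here; it assumes >> on a negative signed value is an arithmetic shift, as gcc and clang implement it):
    Code:
```cpp
#include <cstdint>

// Brute-force check: the 16-bit modulo-arithmetic filter vs. a plain
// 32-bit error-feedback reference, fed identical pseudo-random samples.
constexpr unsigned ref_shift  = 4;
constexpr int      ref_offset = 1 << (15 - ref_shift);  // 2048
constexpr unsigned ref_half   = 1u << (ref_shift - 1);

struct Narrow16 {                  // the 16-bit version from above
    uint16_t state;
    explicit Narrow16(int16_t v)
        : state(uint16_t(((v + ref_offset) << ref_shift) - v)) {}
    int16_t operator()(int16_t sample) {
        state += uint16_t(sample);
        int16_t result = int16_t((state + ref_half) >> ref_shift) - ref_offset;
        state -= uint16_t(result);
        return result;
    }
};

struct WideRef {                   // same recurrence, no modulo tricks
    int32_t z;                     // z = (avg << shift) - avg
    explicit WideRef(int32_t v) : z(v * (1 << ref_shift) - v) {}
    int32_t operator()(int32_t sample) {
        z += sample;
        int32_t y = (z + int32_t(ref_half)) >> ref_shift;  // round half up
        z -= y;
        return y;
    }
};

bool matches_reference(int16_t init, int n, uint32_t seed) {
    Narrow16 a(init);
    WideRef  b(init);
    for (int i = 0; i < n; ++i) {
        // xorshift32: cheap PRNG; samples limited to [-1024, 1023]
        seed ^= seed << 13; seed ^= seed >> 17; seed ^= seed << 5;
        int16_t x = int16_t(int(seed % 2048) - 1024);
        if (a(x) != b(x)) return false;
    }
    return true;
}
```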

  25. #25
    Senior Member
    Join Date
    Feb 2015
    Location
    Finland
    Posts
    224
    In C (as opposed to C++), one can use the following pattern. First, filter16.h:
    Code:
    #if !defined(FILTER_NAME)
    #error Define FILTER_NAME as the name of the filter.
    #elif !defined(FILTER_MIN_SAMPLE) || !defined(FILTER_MAX_SAMPLE) || (FILTER_MIN_SAMPLE) >= (FILTER_MAX_SAMPLE)
    #error Define FILTER_MIN_SAMPLE and FILTER_MAX_SAMPLE as the minimum and maximum samples to be filtered, inclusive.
    #elif !defined(FILTER_SHIFT)
    #error Define FILTER_SHIFT as the power of two the filter implements.
    #elif FILTER_SHIFT < 1 || FILTER_SHIFT > 14
    #error FILTER_SHIFT must be a small positive number!
    #elif (((FILTER_MAX_SAMPLE - (FILTER_MIN_SAMPLE)) << FILTER_SHIFT) > 65535)
    #error Sample range and filter shift exceed 16 bits.
    #elif FILTER_MIN_SAMPLE < -32768
    #error Minimum sample value is too small.
    #elif (FILTER_MAX_SAMPLE - (FILTER_MIN_SAMPLE)) > 65535
    #error Sample range does not fit in 16 bits.
    #else
    
    #define MERGE2_(a, b)  a ## b
    #define MERGE2(a, b)   MERGE2_(a, b)
    
    #define FILTER_STATE   MERGE2(FILTER_NAME, _state)
    #define FILTER_INIT    MERGE2(FILTER_NAME, _init)
    
    #if FILTER_MIN_SAMPLE < 0
    #define FILTER_SAMPLE_TYPE  int16_t
    #if FILTER_MAX_SAMPLE > 32767
    #error Your sample values do not fit in 16 bits.
    #endif
    #else
    #define FILTER_SAMPLE_TYPE  uint16_t
    #if FILTER_MAX_SAMPLE > 65535
    #error Your sample values do not fit in 16 bits.
    #endif
    #endif
    
    static uint16_t  FILTER_STATE;
    
    static inline void FILTER_INIT (FILTER_SAMPLE_TYPE value)
    {
        FILTER_STATE = ((value - (FILTER_MIN_SAMPLE)) << FILTER_SHIFT) - value;
    }
    
    static inline FILTER_SAMPLE_TYPE FILTER_NAME (FILTER_SAMPLE_TYPE value)
    {
        FILTER_STATE += value;
        uint16_t  result = ((FILTER_STATE + (1 << (FILTER_SHIFT - 1))) >> FILTER_SHIFT) + (FILTER_MIN_SAMPLE);
        FILTER_STATE -= result;
        return (FILTER_SAMPLE_TYPE)result;
    }
    
    #undef   FILTER_SAMPLE_TYPE
    #undef   FILTER_STATE
    #undef   FILTER_INIT
    #undef   MERGE2_
    #undef   MERGE2
    #endif
    #undef   FILTER_MIN_SAMPLE
    #undef   FILTER_MAX_SAMPLE
    #undef   FILTER_SHIFT
    #undef   FILTER_NAME
    Then, if one wants a 16-bit filter with shift 5 that can handle samples from 0 to 2047, inclusive, and another with shift 3 that can handle samples from -15 to 4402, inclusive:
    Code:
    #include <stdint.h>
    
    #define  FILTER_NAME        filter1
    #define  FILTER_MIN_SAMPLE  0
    #define  FILTER_MAX_SAMPLE  2047
    #define  FILTER_SHIFT       5
    #include "filter16.h"
    
    #define  FILTER_NAME        filter2
    #define  FILTER_MIN_SAMPLE  -15
    #define  FILTER_MAX_SAMPLE  4402
    #define  FILTER_SHIFT       3
    #include "filter16.h"
    The first #include causes the state variable filter1_state and functions filter1_init() and filter1() to be defined, and the second filter2_state, filter2_init(), and filter2(). They are local to the current compilation unit (because the functions are marked static inline), but that way the compiler should have the best opportunities for optimization.

    This may look like gobbledygook, but is actually just standard C preprocessor shenanigans. I'm not sure why anyone would use this instead of C++ templates, though, except that I do like how I can do quite careful compile-time checks: this will error out if the specified sample range or shift is problematic. You could easily add bounds checking/limiting, for example, if say FILTER_BOUNDS is defined when the file is included.

    (If it matters, I consider this so obvious it is not copyrightable, but if anyone disagrees, then it is licensed under CC0-1.0, i.e. in Public Domain.)
