You're probably going to have to replace these floating point numbers with fixed point.

Code:

```
float mu = 0.0001;
float mu0 = 0;
float psi = 0.0001;
float w[numTaps];
float *pw; // pointer to w
float yhat = 0;
float xtdl = 0;
```

Before beginning this journey, you really should try to verify the math is correct with the floats. After converting to fixed point, it's much harder to change the math. In fact, some sorts of algorithm changes are probably best done by going back to floats, and then redoing the float to fixed conversion after you're sure the algorithm is correct.

The first step is determining the numerical ranges your floats actually use. The w[] array is probably the most important. There are 2 ways to do this: carefully analyze the equations, or just run a large amount of real data and collect the min/max with a little extra code. In extreme cases, you might log all the numbers and plot a histogram, to learn whether the numbers are uniformly distributed or if there's some unusual usage, like really large numbers in some cases and very small ones in others.
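A minimal sketch of the "little extra code" approach to collecting min/max. The `RangeTracker` type and the `range_init` / `range_update` names are made up for illustration, they aren't from your code or any library:

```c
#include <float.h>

// Hypothetical helper: remembers the smallest and largest value seen so far.
typedef struct {
    float min;
    float max;
} RangeTracker;

// Start with an "empty" range so the first update always sets both ends.
static void range_init(RangeTracker *r) {
    r->min = FLT_MAX;
    r->max = -FLT_MAX;
}

// Call this on every value you want to measure, e.g. each w[i] after it's updated.
static void range_update(RangeTracker *r, float v) {
    if (v < r->min) r->min = v;
    if (v > r->max) r->max = v;
}
```

You'd call `range_update()` inside the loop that updates w[], then print `min` and `max` every so often while feeding in real audio.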

After you know the range, you choose an integer format that will represent the number. For example, if you learn those w[] coefficients range from -1.7 to +1.7 with fairly uniform distribution, perhaps you'd choose a 3.13 format. The top 3 bits of an int16_t would represent the signed whole number part, +3 to -4. The bottom 13 bits represent the fraction. So the number 1.4 would be 001 in the top 3 bits, and 0110011001101 (integer 3277) in the lower 13 bits. To understand the fractional part, consider 13 bits is 8192 integers. Divide 3277 by 8192 and you get 0.4000244. Or if you divide the whole 16 bit number (11469) by 8192, you get 1.4000244. Of course, you can choose scaling by any arbitrary integer, but powers of 2 are commonly used so you know certain bits are the whole number part and the other bits are the fraction part.
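Here's a rough sketch of converting to and from the 3.13 format described above. The names `float_to_q13`, `q13_to_float` and `Q13_ONE` are just placeholders I picked, and the saturation on overflow is my assumption about what you'd want:

```c
#include <stdint.h>

#define Q13_FRAC_BITS 13
#define Q13_ONE (1 << Q13_FRAC_BITS)   // 8192, the integer that represents 1.0

// Convert a float to 3.13 fixed point, rounding to nearest.
// Values outside the int16_t range are clamped rather than wrapping around.
static int16_t float_to_q13(float v) {
    float scaled = v * (float)Q13_ONE;
    if (scaled >= 32767.0f) return INT16_MAX;
    if (scaled <= -32768.0f) return INT16_MIN;
    return (int16_t)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

// Convert 3.13 fixed point back to a float, handy for debug printing.
static float q13_to_float(int16_t q) {
    return (float)q / (float)Q13_ONE;
}
```

With this, `float_to_q13(1.4f)` gives 11469, matching the worked example.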

Generally you'd try to use 16 bit numbers for the coefficients, if the round-off errors are tolerable. Usually you'd use int32_t or even int64_t for calculated values. For example:

Code:

` yhat = yhat+x[j+h]*pw[gg-j];`

For this code, you'd almost certainly use int64_t for yhat. The multiply will produce an int32_t, and when you add 128 of them together the result can easily become larger than 32 bits. Fortunately, Teensy has fast instructions for accumulating to 64 bit integers.

Then after the loop, you'd add code to convert the 64 bit result to something more useful. Often the conversion is as simple as right shifting to throw away the lower bits.
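Something like this is what I mean, as a sketch only. It assumes x[] holds 16 bit samples and w[] holds coefficients in 3.13 format, and it uses simple indexing rather than your x[j+h] and pw[gg-j] indexing. The function name `fir_q13` is made up:

```c
#include <stdint.h>

// Multiply-accumulate n samples against n coefficients.
// Assumption: w[] is in 3.13 format, so each product carries 13 extra
// fraction bits compared to the samples in x[].
static int32_t fir_q13(const int16_t *x, const int16_t *w, int n) {
    int64_t acc = 0;   // 64 bits, so the sum of many 32 bit products can't overflow
    for (int i = 0; i < n; i++) {
        acc += (int32_t)x[i] * (int32_t)w[i];
    }
    // Throw away the 13 fraction bits to get back to the sample scale.
    // Right shifting a negative number is implementation-defined in C,
    // but gcc on ARM does an arithmetic shift, which is what we want.
    return (int32_t)(acc >> 13);
}
```

Depending on where the result goes next, you may also need to clamp it to the int16_t range before storing it as a sample.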

Usually when doing this sort of conversion, it helps to keep the float code and do both calculations. That's even slower, but the idea is you can do them both and add a little code to compare if they ended up with the same answer (plus or minus a tiny extra error due to round-off issues). Then when it's all working nicely, delete or comment out the float code.
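The comparison can be as simple as this sketch. The name `close_enough` and the tolerance parameter are my own invention, and you'd have to pick a tolerance that suits your data:

```c
#include <stdint.h>

// Returns 1 if the fixed point result matches the float reference within tol.
// frac_bits is how many fraction bits the fixed point value has (13 for 3.13).
static int close_enough(float ref, int32_t fixed, int frac_bits, float tol) {
    float approx = (float)fixed / (float)((int32_t)1 << frac_bits);
    float diff = ref - approx;
    if (diff < 0.0f) diff = -diff;
    return diff <= tol;
}
```

For example, `close_enough(1.4f, 11469, 13, 0.001f)` passes, because the round-off error is only about 0.000024. When it returns 0, print both values so you can see which step of the math went wrong.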

After the integer stuff is working, there's a second round of possible optimizations you can do with the Cortex-M4 DSP extensions. But that's quite complex, so usually your first goal should be converting from float to all int in the simplest way you can.

Hopefully this helps.

Do you believe this echo cancel might be worth contributing to the audio library? Is the code open source?