IFFT and the Audio Library

Ray.E

Member
Hi All,
I've been attempting to add a basic IFFT stage to the fft1024 library function to see how it sounds, but it seems harder than I thought it would be. I've tried a few different approaches, but the general idea is to do it in place on the original fft1024 buffer result, adding a second arm instance, its initialisation, the function call, etc.
I'm just not getting sensible results, though, including always getting both real and imaginary components back, which can't be right. The compiler also seems antsy about having two arm_cfft data structures declared in the .h file at the same time (with different pointer names, though).
Maybe the whole in-place approach is misguided? It would be a lot more efficient for many purposes, though.

I'm not advanced in C or DSP programming, so rather than embarrass myself with the details of my failures, at this stage can I ask whether anyone has looked at doing an IFFT based on the Audio Library?

Any examples of this or a similar thing being done would be a great help.

If not, I can provide more details on what I have done so far.

Thank you.
 
You need to realize that the FFT -> IFFT sequence operates strictly on complex-to-complex data, while the audio library FFT functions compute only the magnitude of the FFT. In order to use an IFFT you need to provide both the real and imaginary components of the FFT output.

There's another problem: both the 256 and the 1024 FFTs in the audio library overlap the input blocks, which is not suitable for playback as a continuous output stream.

I'm currently writing an extensive FFT library, so if you have a need for FFT processing let me know and I may be able to help with what I have already...
 
Thank you very much, Derek. I have been working with the real and imaginary components of the FFT output (making mods to the fft1024.cpp code), but I hadn't thought of the AudioStream input overwriting the in-place buffer while I was trying to do the IFFT (and vice versa, presumably).
My needs are based around access to streamed raw FFT data in real time, using it to achieve extreme time-stretching (up to and including a clean audio "freeze" of a "grain"), blending bins between FFT frames (e.g. blurred transitions), pitch manipulation, etc., with (near) real-time audio output via IFFT. The ultimate aim is real-time music performance control. Sounds simple; what could go wrong?
I can do audio input and output via the Audio Library input and output queue objects, so access to reasonably raw FFT and IFFT functions would be the perfect magic keys. Hopefully the new Teensys will help in keeping up with the speeds needed.

Access to any of your work in this area would be very much appreciated. Thanks again.
 
Here's another problem that could, and perhaps should, be a deal breaker for using the arm_math FFT functions (as the audio library does) for signal processing: they suffer a VERY significant loss of output precision that grows with the FFT length.

For example, the arm_cfft_radix4_q15() function used in the audio objects takes a q1.15 input and produces q9.7 output (only 7 fractional bits) for a 256-point FFT, and only q11.5 output (5 fractional bits) for the 1024-point FFT! There's nothing you can do about it; it's described in the documentation on keil.com.

Here's a simple sketch to demonstrate this:
Code:
//----------------------------------------------------------------------
// Simple sketch to demonstrate the loss of precision in the
// arm_math FFT functions,
// See the documentation at keil.com
//----------------------------------------------------------------------
#define FFTlength 1024             // Use 256 or 1024 (must be radix-4)
#include "arm_math.h"
int16_t buffer[2*FFTlength];
arm_cfft_radix4_instance_q15 forwardFFT;
arm_cfft_radix4_instance_q15 inverseFFT;
//----------------------------------------------------------------------
void setup() {
  Serial.begin(9600);
  delay(2000);
  arm_cfft_radix4_init_q15(&forwardFFT, FFTlength, 0, 1);     // Initialize forward FFT
  arm_cfft_radix4_init_q15(&inverseFFT, FFTlength, 1, 1);     // Initialize inverse FFT
  for (int i=0; i<2*FFTlength; i+=2){
    buffer[i]   = int16_t(sin(2*3.14159*i/32)*32767);   // Real data in q1.15 format
    buffer[i+1] = 0;                                    // Imaginary data = 0
  }
// Take forward and inverse FFTs and output the data  
  arm_cfft_radix4_q15(&forwardFFT, buffer);             // Forward FFT
  arm_cfft_radix4_q15(&inverseFFT, buffer);             // inverse FFT
// Print real and imaginary outputs - shows the loss of precision
// (8 bits for length 256, 10 bits for length 1024)
  for (int i=0; i<2*FFTlength; i+=2){
    Serial.print(buffer[i]); Serial.print("\t"); Serial.println(buffer[i+1]);
  }
} 
//-------------------------------------------------------------------------------------
void loop() {}
//-------------------------------------------------------------------------------------
If you run this you will find that the sinusoidal output lies within -128 <= output < 128 if FFTlength = 256, and within -32 <= output < 32 for FFTlength = 1024, while the input lies within -32768 <= input < 32768. Hardly acceptable!
FWIW - I have tried the radix-2 functions and they are even worse...
 
Thank you very much for the code. It was such a relief to see the core elements done so efficiently and clearly! I've been getting so bogged down in the AudioStream and library infrastructure that I was losing sight of the woods for the trees.

Yes, I see what you mean about the loss of resolution. I had seen the documentation on the q11.5 format but I wasn't sure, and besides, when I looked at the actual data after an FFT during testing, it didn't look at all like q11.5 format, so I had some hope from that as well.

As I understand it, to get back to the q1.15 format for input to the IFFT you would need to shift 10 bits left, but from your code I can clearly see the output is actually already converted to q1.15 by the arm_cfft function. And, as your code demonstrates, the shift is not required before the IFFT function. I think maybe the documentation on this is not very clearly written.

Anyway, 6-bit depth for audio is not good, although I'll still give it a go to see what it sounds like. Maybe I can make a feature of the problem somehow. Rock and Roll!


Key Question: Do you think there is a way around this using Teensy with or without the Audio board? Maybe some sort of 32 bit/floating point approach? Or using some alternative to the arm_math functions?



Just a quick question on your code: the name 'buffer' is used, presumably defined somewhere in the library.
Can I have more than one of these in a sketch, or can there only be one arm_cfft 'buffer' per sketch?
Is there a danger that, if say the fft1024 object is used in the sketch with an input data stream, it could overwrite this same 'buffer'?

Thank you again for your help. It's very much appreciated.
 
Use the "Serial Plotter" in the Arduino IDE for this sketch, which will plot the original and IFFT output waveforms together. It adds two waveforms together, takes the FFT -> IFFT, and rescales the output. Play with the amplitude to see what the output precision does to the output waveform.
Code:
#define FFTlength 1024
#include "arm_math.h"


// Real and Imaginary buffer structure
typedef struct {
  int16_t re;
  int16_t im;
} buffer_t;

buffer_t  originalBuffer[FFTlength]  __attribute__ ((aligned (4)));
buffer_t  fftInputBuffer[FFTlength]  __attribute__ ((aligned (4)));
buffer_t  fftBuffer300Hz[FFTlength]  __attribute__ ((aligned (4)));
buffer_t  fftBuffer900Hz[FFTlength]  __attribute__ ((aligned (4)));

arm_cfft_radix4_instance_q15 forwardFFT;
arm_cfft_radix4_instance_q15 inverseFFT;

/*
 * Mess with these values.
 */
#define AMPLITUDE     14000
#define FREQUENCY_300 300
#define FREQUENCY_900 900
#define SAMPLE_RATE   44100
//----------------------------------------------------------------------
void setup() {
  while (!Serial);
  delay(100);
  arm_cfft_radix4_init_q15(&forwardFFT, FFTlength, 0, 1);   // Initialize forward FFT
  arm_cfft_radix4_init_q15(&inverseFFT, FFTlength, 1, 1);   // Initialize inverse FFT

  // buffer, frequency, amplitude, sample rate
  createWaveform(fftBuffer300Hz, FREQUENCY_300, AMPLITUDE, SAMPLE_RATE);
  createWaveform(fftBuffer900Hz, FREQUENCY_900, AMPLITUDE, SAMPLE_RATE);

  addWaveforms(fftBuffer300Hz, fftBuffer900Hz, fftInputBuffer);
  copyWaveform(fftInputBuffer, originalBuffer);

  arm_cfft_radix4_q15(&forwardFFT, (int16_t*)fftInputBuffer);   // Forward FFT
  arm_cfft_radix4_q15(&inverseFFT, (int16_t*)fftInputBuffer);   // Inverse FFT

  // Plot the original against the rescaled round-trip output
  for (int i = 0; i < FFTlength; i += 1) {
    Serial.print(originalBuffer[i].re);
    Serial.print(" ");
    Serial.println(fftInputBuffer[i].re << 10);
    delay(10);
  }
}
//-------------------------------------------------------------------------------------
void loop() {}
//-------------------------------------------------------------------------------------
void createWaveform(buffer_t* buffer, float freq, uint16_t amplitude, uint32_t sampleRate) {
  for (int i = 0; i < FFTlength; i += 1) {
    buffer[i].re = int16_t(sin((float)freq * 2 * (float)3.14159 * i / sampleRate) * amplitude); // Real data in q1.15 format
    buffer[i].im = 0;
  }
}
//-------------------------------------------------------------------------------------
void addWaveforms(buffer_t* src1, buffer_t* src2, buffer_t* dst) {
  for (int i = 0; i < FFTlength; i += 1) {
    dst[i].re = src1[i].re + src2[i].re;
    dst[i].im = src1[i].im + src2[i].im;
  }
}
//-------------------------------------------------------------------------------------
void copyWaveform(buffer_t* src, buffer_t* dst) {
  for (int i = 0; i < FFTlength; i += 1) {
    dst[i].re = src[i].re;
    dst[i].im = src[i].im;
  }
}

 
I may have been a bit misleading in my original comments on the loss of resolution in the arm_math FFT functions. Let me clarify. What we want, for a data sequence {f_n}, is that
IFFT(FFT({f_n})) = {f_n}
in other words, that the original data is returned. With the arm_math functions this does not happen: the returned sequence after the IFFT is scaled in value and suffers from a lack of resolution. So the question is: what's going on, and what can be done about it?

The definition of the DFT/FFT transforms are
DFT: F_m = sum_n( f_n * exp(-j*2*pi*m*n/N) ),    IFFT: f_n = (1/N) * sum_m( F_m * exp(j*2*pi*m*n/N) )
for a sequence of length N, and where j = sqrt(-1).

There is generally not a problem with floating point evaluation of either the FFT or the IFFT, but fixed point computation is an entirely different story! Let's take a very simple example: compute the FFT of a constant (DC) sequence of length N with value f_dc. If you compute F_0 from the formula, you will find F_0 = N*f_dc. For example, if N = 256 and f_dc = 10000, then F_0 = 2,560,000. There is no way that int16 arithmetic can handle this internally - or return an int16 result!

This is the problem that the arm_math FFT functions face, and why arm_cfft_radix4_q15() with N=256, finds it necessary to internally scale the output from q1.15 down to q9.7. You still get 16 bits of precision in the answer, but the lost 8 bits can be very significant in your subsequent processing, and you need to keep track of the "q" value as you go along.

The arm_cfft_radix4_q15() implementation of the IFFT is an example of where things can get messed up, and where the loss of precision takes place. In our above example with F_0 = 2,560,000, the forward arm_cfft_radix4_q15() will return F_0 = 10000, and when that is input to the IFFT the value returned is in error.

I have written many FFT routines over the years, and have found that with int16 inputs you need to do some of the internal computations in int64 to prevent overflow. I'm currently working on a pair of Teensy functions that will take int16 inputs and do the internal arithmetic in int32 and int64. Right now I am looking to see if I can speed them up using Paul's dsp library. I'll post it if it looks good. The speed is currently comparable with arm_math q31 FFT.
 
This is very interesting. Thank you for taking the time to look at this area of Teensy programming. Is this primarily a function of fixed-point maths? The Teensy 3.6 is floating point, I think; will that make a difference? Or is it a function of word length? I know that in the audio world a lot of processing is done with 64-bit words, or greater!
 
Is this new "better" fft library an Audio object? Right now 1024 fft object takes about 50% of the cpu cycles.
 
This sounds very promising for me, Derek. I tried the 6-bit-depth sound and it just doesn't cut it for the quality of sound I want. Maybe the new Teensy 3.6 FPU can also help with speed? Single precision only, I gather.

Please keep me in the loop with any code you make available.
 
Maybe the new Teensy 3.6 FPU can also help with speed? Single precision only, I gather.

I just did a bunch of benchmarking of FFT routines on a variety of different platforms, including Arduino and Teensy 3.2. I also did the same tests on a K66 board, which is the same chip as is on the Teensy 3.6. I did the tests for both integer (int16 and int32) and floating point data (float32) types.

https://openaudio.blogspot.com/2016/09/benchmarking-fft-speed.html

Given that the IFFT and FFT are the same computational load, I think you'll find that my results answer your question... the K66 (i.e., Teensy 3.6) should totally rock your FFT/IFFT tasks.

Chip
 
Hi all.

So, this and a similar thread sort of ended late last year. Wondering if there was any conclusion about the ability of T3.6 (K66) to do real time audio processing in the frequency domain. And, could it be implemented in such a way that (at most) users would only have to create standard Audio library objects to take input data from an FFT object, manipulate it, and then output it to an IFFT object during their update cycle?

Don't really have an application at this time. I just find the math interesting, especially reading about the implications of integer versus float versus (theoretically) infinite-precision math, written by folks with extensive experience implementing these algorithms.
 
I'd love to do pitch shifting, and if this could work with the 12 ms-or-so delay of the current FFT1024 audio object, that would be great.
A pair of library objects would be ideal.

I've found that the time taken by a current FFT1024 cycle is half the time taken to collect 1024 samples (at 44,100 Hz this takes 23 ms, and Paul reuses the last half of one sample set as the first half of the next, to get an FFT every 11.5 ms). The FFT calculations themselves seem to me minuscule.

So, I wonder: how much would the FFT --> IFFT be delayed if the math calcs were insignificant?
Just the 11-12 ms? Could the first sample in be played out roughly 12 ms after it arrives?
 
So, I wonder: how much would the FFT --> IFFT be delayed if the math calcs were insignificant?
Just the 11-12 ms? Could the first sample in be played out roughly 12 ms after it arrives?

This is the best description I've seen of the algorithm.

http://www.guitarpitchshifter.com/pitchshifting.html

Using four 75% overlapped 1024 point FFTs and IFFTs would seem to suggest a latency somewhere between 1024 to 2048 samples, or about 2X to 4X your desired amount. Or maybe I'm taking an overly pessimistic view of this? Maybe the average delay is 512 in the first FFT, and the other 3 could be 768 and 1024 and 1280 samples? I honestly don't know exactly how this algorithm performs, but I'm pretty sure it's not simple "linear phase" where all signals propagate through with uniform delay for all frequency components.
 
This is the best description I've seen of the algorithm.

http://www.guitarpitchshifter.com/pitchshifting.html

Using four 75% overlapped 1024 point FFTs and IFFTs would seem to suggest a latency somewhere between 1024 to 2048 samples, or about 2X to 4X your desired amount. Or maybe I'm taking an overly pessimistic view of this? Maybe the average delay is 512 in the first FFT, and the other 3 could be 768 and 1024 and 1280 samples? I honestly don't know exactly how this algorithm performs, but I'm pretty sure it's not simple "linear phase" where all signals propagate through with uniform delay for all frequency components.

Yes, the phase shift is linear (180°) for all frequencies.
 
Use the "Serial Plotter" in the Arduino IDE for this sketch, which will plot the original and IFFT output waveforms together. [Quoted sketch omitted; it is the same code posted above.]

I tested this code and it works nicely. I'm just wondering how to feed it with a signal from analogRead and plot the result. I still don't really understand all the functions and don't really know how to rewrite them. Could you give me some ideas?
Thx.
 