Audio Library and 4096 point FFT on Teensy 3.6

Status
Not open for further replies.
Heyo, so after my Teensy 3.2 randomly decided to fling off into outer space without leaving a notice as to why, I figured I'd give the 3.6 a shot.

Particularly intriguing was the speed and the support for CMSIS 4.5. I did what was mentioned in the "Excellent results with Floating Point FFT/IFFT Processing and Teensy 3.6" thread Here and Here

I however, can not get the 4096-point fft working using these new functions, I am probably doing everything wrong because I have no idea what I am doing, I'm trying though.

I simply went and modified analyze_fft1024.cpp and analyze_fft1024, and added #include "analyze_fft4096.h" to Audio.h

The code compiles and uploads without errors, but it just does nothing. Very enjoyable to troubleshoot.

Sketch:
Code:
#include <Audio.h>
#include <avr/power.h>

AudioInputAnalog         adc1(A2);
AudioAnalyzeFFT4096      fft1;
AudioConnection          patchCord2(adc1, fft1);


void setup() {
  AudioMemory(16);
  Serial.begin(9600);
  fft1.averageTogether(8);
}

void loop() { 
  if (fft1.available()) {
    for (uint16_t i = 0; i < 4096; i += 1) {
      int32_t n = fft1.read(i);
      Serial.print(n);
      Serial.print(" ");
    }
    Serial.println();
  }
}

analyze_fft4096.cpp:
Code:
/* Audio Library for Teensy 3.X
 * Copyright (c) 2014, Paul Stoffregen, paul@pjrc.com
 *
 * Development of this audio library was funded by PJRC.COM, LLC by sales of
 * Teensy and Audio Adaptor boards.  Please support PJRC's efforts to develop
 * open source software by purchasing Teensy or other PJRC products.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice, development funding notice, and this permission
 * notice shall be included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */

#include "analyze_fft4096.h"
#include "sqrt_integer.h"
#include "utility/dspinst.h"


// 140312 - PAH - slightly faster copy
static void copy_to_fft_buffer(void *destination, const void *source)
{
	const uint32_t *src = (const uint32_t *)source;
	uint32_t *dst = (uint32_t *)destination;

	for (int i=0; i < AUDIO_BLOCK_SAMPLES; i++) {
		*dst++ = *src++;  // real sample plus a zero for imaginary
	}
}

static void apply_window_to_fft_buffer(void *buffer, const void *window)
{
	int32_t *buf = (int32_t *)buffer;
	const int32_t *win = (int32_t *)window;;

	for (int i=0; i < 4096; i++) {
		int32_t val = *buf * *win++;
		//*buf = signed_saturate_rshift(val, 16, 15);
		*buf = val >> 15;
		buf += 2;
	}

}

void AudioAnalyzeFFT4096::update(void)
{
	audio_block_t *block;

	block = receiveReadOnly();
	if (!block) return;

#if defined(KINETISK)
	switch (state) {
	case 0:
		blocklist[0] = block;
		state = 1;
		break;
	case 1:
		blocklist[1] = block;
		state = 2;
		break;
	case 2:
		blocklist[2] = block;
		state = 3;
		break;
	case 3:
		blocklist[3] = block;
		state = 4;
		break;
	case 4:
		blocklist[4] = block;
		state = 5;
		break;
	case 5:
		blocklist[5] = block;
		state = 6;
		break;
	case 6:
		blocklist[6] = block;
		state = 7;
		break;
	case 7:
		blocklist[7] = block;
		state = 8;
		break;
	case 8:
		blocklist[8] = block;
		state = 9;
		break;
	case 9:
		blocklist[9] = block;
		state = 10;
		break;
	case 10:
		blocklist[10] = block;
		state = 11;
		break;
	case 11:
		blocklist[11] = block;
		state = 12;
		break;
	case 12:
		blocklist[12] = block;
		state = 13;
		break;
	case 13:
		blocklist[13] = block;
		state = 14;
		break;
	case 14:
		blocklist[14] = block;
		state = 15;
		break;
	case 15:
		blocklist[15] = block;
		state = 16;
		break;
	case 16:
		blocklist[16] = block;
		state = 17;
		break;
	case 17:
		blocklist[17] = block;
		state = 18;
		break;
	case 18:
		blocklist[18] = block;
		state = 19;
		break;
	case 19:
		blocklist[19] = block;
		// TODO: perhaps distribute the work over multiple update() ??
		//       github pull requsts welcome......
		copy_to_fft_buffer(buffer+0x000, blocklist[0]->data);
		copy_to_fft_buffer(buffer+0x100, blocklist[1]->data);
		copy_to_fft_buffer(buffer+0x200, blocklist[2]->data);
		copy_to_fft_buffer(buffer+0x300, blocklist[3]->data);
		copy_to_fft_buffer(buffer+0x400, blocklist[4]->data);
		copy_to_fft_buffer(buffer+0x500, blocklist[5]->data);
		copy_to_fft_buffer(buffer+0x600, blocklist[6]->data);
		copy_to_fft_buffer(buffer+0x700, blocklist[7]->data);
		copy_to_fft_buffer(buffer+0x800, blocklist[8]->data);
		copy_to_fft_buffer(buffer+0x900, blocklist[9]->data);
		copy_to_fft_buffer(buffer+0x1000, blocklist[10]->data);
		copy_to_fft_buffer(buffer+0x1100, blocklist[11]->data);
		copy_to_fft_buffer(buffer+0x1100, blocklist[12]->data);
		copy_to_fft_buffer(buffer+0x1200, blocklist[12]->data);
		copy_to_fft_buffer(buffer+0x1300, blocklist[13]->data);
		copy_to_fft_buffer(buffer+0x1400, blocklist[14]->data);
		copy_to_fft_buffer(buffer+0x1500, blocklist[15]->data);
		copy_to_fft_buffer(buffer+0x1600, blocklist[16]->data);
		copy_to_fft_buffer(buffer+0x1700, blocklist[17]->data);
		copy_to_fft_buffer(buffer+0x1800, blocklist[18]->data);
		copy_to_fft_buffer(buffer+0x1900, blocklist[19]->data);
		if (window) apply_window_to_fft_buffer(buffer, window);
		arm_cfft_radix4_q31(&fft_inst, buffer);
		// TODO: support averaging multiple copies
		for (int i=0; i < 2048; i++) {
			uint32_t tmp = *((uint32_t *)buffer + i); // real & imag
			uint32_t magsq = multiply_16tx16t_add_16bx16b(tmp, tmp);
			output[i] = sqrt_uint32_approx(magsq);
		}
		outputflag = true;
		release(blocklist[0]);
		release(blocklist[1]);
		release(blocklist[2]);
		release(blocklist[3]);
		release(blocklist[4]);
		release(blocklist[5]);
		release(blocklist[6]);
		release(blocklist[7]);
		release(blocklist[8]);
		release(blocklist[9]);
		blocklist[0] = blocklist[10];
		blocklist[1] = blocklist[11];
		blocklist[2] = blocklist[12];
		blocklist[3] = blocklist[13];
		blocklist[4] = blocklist[14];
		blocklist[5] = blocklist[15];
		blocklist[6] = blocklist[16];
		blocklist[7] = blocklist[17];
		blocklist[8] = blocklist[18];
		blocklist[9] = blocklist[19];
		state = 10;
		break;
	}
#else
	release(block);
#endif
}
analyze_fft4096.h
Code:
/* Audio Library for Teensy 3.X
 * Copyright (c) 2014, Paul Stoffregen, paul@pjrc.com
 *
 * Development of this audio library was funded by PJRC.COM, LLC by sales of
 * Teensy and Audio Adaptor boards.  Please support PJRC's efforts to develop
 * open source software by purchasing Teensy or other PJRC products.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice, development funding notice, and this permission
 * notice shall be included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */

#ifndef analyze_fft4096_h_
#define analyze_fft4096_h_

#include "Arduino.h"
#include "AudioStream.h"
#include "arm_math.h"

// windows.c
extern "C" {
extern const int32_t AudioWindowHanning4096[];
}

class AudioAnalyzeFFT4096 : public AudioStream
{
public:
	AudioAnalyzeFFT4096() : AudioStream(1, inputQueueArray),
	  window(AudioWindowHanning4096), state(0), outputflag(false) {
		arm_cfft_radix4_init_q31(&fft_inst, 4096, 0, 1);
	}
	bool available() {
		if (outputflag == true) {
			outputflag = false;
			return true;
		}
		return false;
	}
	float read(unsigned int binNumber) {
		if (binNumber > 2047) return 0.0;
		return (float)(output[binNumber]) * (1.0 / 16384.0);
	}
	float read(unsigned int binFirst, unsigned int binLast) {
		if (binFirst > binLast) {
			unsigned int tmp = binLast;
			binLast = binFirst;
			binFirst = tmp;
		}
		if (binFirst > 2047) return 0.0;
		if (binLast > 2047) binLast = 2047;
		uint32_t sum = 0;
		do {
			sum += output[binFirst++];
		} while (binFirst <= binLast);
		return (float)sum * (1.0 / 16384.0);
	}
	void averageTogether(uint8_t n) {
		// not implemented yet (may never be, 86 Hz output rate is ok)
	}
	void windowFunction(const int32_t *w) {
		window = w;
	}
	virtual void update(void);
	uint32_t output[2048] __attribute__ ((aligned (4)));
private:
	void init(void);
	const int32_t *window;
	audio_block_t *blocklist[8];
	int32_t buffer[8192] __attribute__ ((aligned (4)));
	//uint32_t sum[2048];
	//uint8_t count;
	uint8_t state;
	//uint8_t naverage;
	volatile bool outputflag;
	audio_block_t *inputQueueArray[1];
	arm_cfft_radix4_instance_q31 fft_inst;
};

#endif

I am aware there are other methods to implement this, but honestly, the audio library makes using this stuff so exceptionally convenient.
 
The current version CMSIS-DSP used with Teensy firmware does not support 4096 point FFT only 16, 64, 256, 1024.
 
Actually I don't think the current version of CMSIS-DSP complex FFT's support beyond 1024 point FFT, you need to use the REAL version.
 

From the link you provided under "arm_cfft_radix4_init_q31" which you use in your code:
The parameter fftLen Specifies length of CFFT/CIFFT process. Supported FFT Lengths are 16, 64, 256, 1024.

You need to use the new structures that CMSIS provides, I believe the old ones are still useable for legacy code. The new CFFT has the 4096 point FFT but you will have figure out how to use them.
 
Oh jesus you are right, I didn't even pay attention to that, I just went in in there zombie mode changing all the stuff over.

Thanks for the hint
 
Yeah I've got that, forgot to mention that one in my earlier post, Thanks for paying attention though.
I was looking at that whole setup earlier, thought I will implement it, and then completely ignored that thought and went on to brainless editing.
Dropped what I was doing for today, no brain activity possible right now. Will investigate this whole thing at a later time when I am more focused.
 
Can you guys give the 256 point FFT a try in the newer library? How much code space does it consume?

The main reason we're still on version 1.1.0 is a massive increase in code size for even the smallest FFTs.
 
@Paul
Code:
#include <Audio.h>
#include <avr/power.h>

AudioInputAnalog         adc1(A2);
AudioAnalyzeFFT256       fft1;
AudioConnection          patchCord2(adc1, fft1);


void setup() {
  AudioMemory(16);
  Serial.begin(9600);
  fft1.averageTogether(8);
}

void loop() { 
  if (fft1.available()) {
    for (uint16_t i = 0; i < 4096; i += 1) {
      int32_t n = fft1.read(i);
      Serial.print(n);
      Serial.print(" ");
    }
    Serial.println();
  }
}


Sketch uses 48664 bytes (4%) of program storage space. Maximum is 1048576 bytes.
Global variables use 13228 bytes (5%) of dynamic memory, leaving 248916 bytes for local variables. Maximum is 262144 bytes.

This is with the above mentioned files exchanged
 
Using CMSIS-DSP v1.4.5:

FFT256 worked with less RAM and much less FLASH than v1.1.0 dsp, FFT example:
Sketch uses 26,792 bytes (10%) of program storage space. Maximum is 262,144 bytes.
Global variables use 9,524 bytes (14%) of dynamic memory, leaving 56,012 bytes for local variables. Maximum is 65,536 bytes.

Actually FFT1024 used less cpu by 10% on my T3.2 96 MHz.

Also just compiled all the examples in the Effects folder with no issues.
 
I second this, if I compile the same sketch as above with the old CMSIS is does this
Sketch uses 99072 bytes (9%) of program storage space. Maximum is 1048576 bytes.
Global variables use 13228 bytes (5%) of dynamic memory, leaving 248916 bytes for local variables. Maximum is 262144 bytes.
 
Ok, I've put a CMSIS DSP update on my todo list. Realistically, it's not going to happen until after USB host is stable and I've looked into a couple dozen problems that have been accumulating on my list while focusing on the USB host stuff.

Does 1.4.5 use Apache license? Or is a newer version needed for that?
 
Not sure what copyright this is: (from arm_math.h -> v1.4.5)
/* ----------------------------------------------------------------------
* Copyright (C) 2010-2015 ARM Limited. All rights reserved.
*
* $Date: 20. October 2015
* $Revision: V1.4.5 b
*
* Project: CMSIS DSP Library
* Title: arm_math.h
*
* Description: Public header file for CMSIS DSP Library
*
* Target Processor: Cortex-M7/Cortex-M4/Cortex-M3/Cortex-M0
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* - Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* - Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* - Neither the name of ARM LIMITED nor the names of its contributors
* may be used to endorse or promote products derived from this
* software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
* -------------------------------------------------------------------- */
On a side note the newest CMSIS-DSP v1.5.1, I couldn't get it to work. It looks like they changed quite a lot so I would recommend sticking with v1.4.5 since it's almost drop in replacement though I must say I haven't looked all the dsp functions that the Audio library uses to see if there are conflicts but if I find any I'll update this thread.
 
What's the timeframe for updating CMSIS version? I don't have any particular need for newer versions, but I guess it would be nice to know what version's docs I should be reading, or if an update is forthcoming shortly, if I should just hold off until it is done.
 
duff,

What did you do to get the teensyduino environment to use the newer version of the CMSIS-DSP v1.4.5?

I'm very new to Arduino and one of the biggest challenges is figuring out which library is actually being used?
I seem to have quite a few versions of CMSIS-DSP on my PC.

Also, do you think a T3.6 will be have the hp to do 4096 lines of FFT with Paul's Audio Shield?

Thanks
 
duff,
What did you do to get the teensyduino environment to use the newer version of the CMSIS-DSP v1.4.5?
Thanks
There is a post on the forum on how you can update the CMSIS version for Arduino, I don't have it right now but it's their, maybe in the tips and tricks section?

duff,
I'm very new to Arduino and one of the biggest challenges is figuring out which library is actually being used?
I seem to have quite a few versions of CMSIS-DSP on my PC.
Thanks
The CMSIS files are all located internally in each copy of Arduino you have, those get used for each the version of Arduino/Teensyduino you have nothing externally.


duff,
Also, do you think a T3.6 will be have the hp to do 4096 lines of FFT with Paul's Audio Shield?
Thanks
I don't know but I assume the stock CMSIS version used in the Audio library would probably take up most of the CPU cycles on the T3.6. This would require a rewrite of the FFT Object also, not impossible though.
 
Hi!
I use a newer CMSIS lib for the Teensy Convolution SDR for the floating point versions of complex FFT.

Find a description how to install it here:
https://forum.pjrc.com/threads/4059...Defined-Radio)?p=129081&viewfull=1#post129081

I think, originally it was you, Duff, if I remember correctly, who pointed me to this possibility of using the new lib! Thanks again for that!

However, please note that you cannot use the floating point versions together with the audio lib (which uses integers). You have to first queue your audio data with the audio lib queue function and then process everything "by hand" in your code.
But the new CMSIS functions are very powerful, a 4096 point FFT should be possible with 44.1ksps processing. (but never use double, use floats and the corresponding functions, eg. cosf, sqrtf etc. :))
I do not know why you need a 4096 point FFT, but in most cases, this would be pure overkill, but if you have a good reason?

In my Teensy Convolution SDR I run the radio (in real-time) with a (floating-point) 1024-point FFT and subsequent inverse FFT1024 with all sorts of simultaneous audio processing, filtering, demodulation, noise reduction, decimation/interpolation and stuff in 96ksps and it works very well with about 60-70% processor load in most situations.

Good luck,
Frank

https://github.com/DD4WH/Teensy-ConvolutionSDR
 
Brian also has written an excellent module for doing convolution.

https://github.com/bmillier/Teensy-FFT-Convolution-Filter

Maybe that is better suited for you, as it has all the components prepared in audio lib style, you would just have to take the bits and put them together for your purpose of a 4096-point FFT.
He also takes the integer samples from the audio lib, converts them to float, does the FFT (in his case also a complex multiplication and the inverse FFT) and the converts back from float to int.

You would also need the new CMSIS lib for this module.
 
Frank,

Thanks for the response. You have definitely helped point me in the right direction.

I think I need to use the Zoom FFT feature that you utilize in your SDR project. I'm trying to get a 10Hz resolution within a 1k to 8kHz frequency window.
 
Hi gohlhausen,

thanks for your work!

Did you test, how your FFT performs if applied to audio? In my experience, when used for convolution (FFT --> complex-multiply with filter mask --> inverse FFT), the fixed point versions of the recent version of CMSIS do not produce acceptable results (distorted audio). However, if I use the floating point version from a never CMSIS version, the audio is perfectly OK.

For what purpose do you use such a big FFT? You use a large proportion of your resources for the calculation of the FFT alone. For audio filtering purposes, a 1024-point-FFT is big enough for most purposes. If you use the FFT for analysis and high frequency resolution, I would suggest to consider "ZoomFFT" as described by Rick Lyons in "Understanding Digital Signal Processing". It can deliver extremely high frequency resolution for a small frequency window of your choice and the resource use is much lower. I get frequency resolutions in the 0.1Hz-range using ZoomFFT and 96ksps sample rate with the T3.6 and audio board:

https://github.com/df8oe/UHSDR/wiki/Spectrum-display-Magnify-mode-=-Zoom-FFT

https://github.com/DD4WH/Teensy-ConvolutionSDR

For audio filtering/speaker simulations with large filter responses and low latency you could also try uniformly partitioned convolution, which uses relatively small FFT sizes for low latency and partitions the long filter responses into pieces.

But maybe you have a different application where you really need such a large FFT size, would be interesting to know what you are doing with your FFT and how the fixed point version performs in real-time with your application.
 
Status
Not open for further replies.
Back
Top