Excellent results with Floating Point FFT/IFFT Processing and Teensy 3.6

Status
Not open for further replies.
* 120 MHz: plays OK
* 180 MHz: plays OK
* 192 MHz: plays OK
* 216 MHz: the song is recognizable but the pitches warble hilariously. Funny! And, it stopped playing unexpectedly part way through.
* 240 MHz: plays OK

OK, I see this "funny" pitch too (and at 240 MHz, in some cases, a "crash").
I guess there is a problem with the I2S device at high speeds, which is more obvious with I2S input.
 
I don't care so much about the overclocking...I'd be quite happy with audio pass-through working correctly at 180 MHz.
 
It is not about overclocking. For me, it indicates that there might be a problem at lower frequencies, too. There is almost no safety margin for higher speeds, and I guess that is not intentional.

Let's hope that we find a way to get I2S input working at more than 120 MHz.
 
Very interesting results with floating point FFT & IFFT, Ray! I ordered a Teensy 3.5 yesterday (as 3.6 was sold out ;-)) and hope it will arrive soon.

I am not a DSP expert at all, but I wonder if the accuracy of the FFT/IFFT procedure could be better even now, if we used the recent/newest arm_math routines! The version in Teensyduino is very old (version 1.1.0 of 2012 !!!) and the arm_cfft_radix4 should not be used any more ("deprecated" according to the Keil homepage: https://www.keil.com/pack/doc/CMSIS...f_f_t.html#gaf336459f684f0b17bfae539ef1b1b78a). Instead the arm_cfft_f32 should be used, which is included in the newer versions of the Keil CMSIS pack.

It can make a huge difference to use the new arm_cfft_f32 in my experience: I have used the old radix 4 ARM floating point routine to estimate carrier frequency in the mcHF software defined radio (on an STM32F4) and it was not as accurate as I hoped for. However, after changing to the new arm_cfft_f32 routine (in the newer Keil pack), the accuracy increased dramatically! Do not ask me why, but that was my experience . . . ;-).
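
For readers unfamiliar with the approach, here is a minimal sketch of how a carrier frequency can be read off a spectrum: find the strongest bin and multiply its index by the bin spacing fs/N. This is not the mcHF code. The function name `estimateCarrierHz` is made up for illustration, and a naive DFT stands in for arm_cfft_f32 so that the example is self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from CMSIS or mcHF): find the strongest bin of a
// naive O(N^2) DFT of a real signal and convert its index to a frequency in Hz.
double estimateCarrierHz(const std::vector<double>& x, double sampleRateHz) {
    assert(!x.empty() && sampleRateHz > 0.0);
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::size_t bestBin = 0;
    double bestMag = -1.0;
    // For real input only bins 0..N/2 carry independent information.
    for (std::size_t k = 0; k <= n / 2; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t i = 0; i < n; ++i) {
            const double phase = -2.0 * pi * double(k) * double(i) / double(n);
            acc += x[i] * std::polar(1.0, phase);
        }
        if (std::abs(acc) > bestMag) {
            bestMag = std::abs(acc);
            bestBin = k;
        }
    }
    // Adjacent bins are fs/N apart, so bin k corresponds to k * fs / N.
    return double(bestBin) * sampleRateHz / double(n);
}
```

With a 1000 Hz tone sampled at 8000 Hz and N = 64, the bin spacing is 125 Hz and the tone falls exactly on bin 8. The result of this simple method is always a multiple of the bin spacing, so tones between bins are only approximated.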

So, is it possible, and do you think it would be worthwhile, to include the newest Keil CMSIS pack in the next version of Teensyduino, so we can use the new cfft?

@Ray: my educated guess is that it will significantly increase the accuracy of your FFT/IFFT test and maybe eliminate more of your spurs.

All the best,

Frank DD4WH
 
Hi,
Just making sure.
This is all referring to the Audio Adaptor Board connected to the Teensy 3.6 (just got my 2 boards :) )?
Thanks,
Nahum.
 
Others had commented that using the latest CMSIS pack (whatever it was at the time) resulted in a huge explosion in the size of the program code, at least as implemented through the Teensy Audio library. It has something to do with all of the const vectors for all the FFT sizes getting included, even if one is only using a single FFT size. The older version of the CMSIS library didn't have this issue, so the older version was retained.

This reply of mine is basically hearsay, though, so a proper investigation (or a proper response by a knowledgeable person) would be a good next step.

Chip
 
For reference, for a 128 length Audio block passthrough via Library 'queue' objects without any FFT/IFFT processing, the latency measures as 9.25 ms (2.9 ms of that would be serialisation delay)

I just did my own first attempt at a latency measurement. For a simple audio passthrough from line_in to headphone_out, I measure 9.27 ms, which is good agreement with what Ray measured.

This is using the default audio library, which has a block size of 128 and a sample rate of 44.1 kHz. As a result, this 9.27 ms corresponds to a latency of about 409 samples, which is kind of an odd number. I might have expected a multiple of 128 samples...so 128, 256, 384, or 512 samples. Any thoughts as to why it might be 409?
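
The arithmetic behind that 409 figure can be sketched as follows. The helper names are hypothetical, and the 44.1 kHz rate is the nominal value quoted above.

```cpp
#include <cassert>

// Convert a measured latency in milliseconds to a sample count.
double latencySamples(double latencyMs, double sampleRateHz) {
    assert(latencyMs >= 0.0 && sampleRateHz > 0.0);
    return latencyMs * 1e-3 * sampleRateHz;
}

// Express a sample count as a (possibly fractional) number of audio blocks.
double latencyBlocks(double samples, int blockSize) {
    assert(blockSize > 0);
    return samples / blockSize;
}
```

For 9.27 ms at 44.1 kHz this gives about 408.8 samples, or roughly 3.19 blocks of 128: three whole blocks (384 samples) plus about 25 samples.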

Chip
 
FWIW, on my very long-term todo list is looking into whether we really need to buffer 2 packets in the output objects. This isn't a simple thing to test, since performance can depend on how much CPU time all the other objects are consuming.

If the output objects could buffer only 1 packet, we'd reduce the total latency by 128 samples.

Also on my long-term list is making more of the library able to scale to smaller buffer sizes. Much of the code already scales to any multiple of 4, 8 or 16, but some places are hard-coded for 128.
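
As a rough illustration of what one fewer buffered block would be worth, here is a small sketch. The function names are made up for this example, and the 9.27 ms baseline is the measurement reported earlier in the thread, not a library constant.

```cpp
#include <cassert>

// Time occupied by one audio block, in milliseconds.
double blockDurationMs(int blockSize, double sampleRateHz) {
    assert(blockSize > 0 && sampleRateHz > 0.0);
    return 1000.0 * blockSize / sampleRateHz;
}

// Latency left after removing `blocksRemoved` buffered blocks from a measured total.
double reducedLatencyMs(double measuredMs, int blocksRemoved,
                        int blockSize, double sampleRateHz) {
    assert(blocksRemoved >= 0);
    return measuredMs - blocksRemoved * blockDurationMs(blockSize, sampleRateHz);
}
```

At 44.1 kHz a 128-sample block lasts about 2.90 ms, so dropping one block from a 9.27 ms passthrough would leave about 6.37 ms. The same helper shows why smaller blocks matter: a 16-sample block lasts only about 0.36 ms.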
 
As my work is aimed at hearing-aid style audio processing, minimizing latency is crucial. I am likely to investigate and work with these areas of the code. Lucky for me, my work can focus on a particular hardware configuration whereas you likely need to keep a broader spectrum of hardware in mind. Good for me, but a challenge for you.

Chip
 
Yes, we also had that explosion of code size on the STM32F4 when using the newest CMSIS lib (AND using the old functions).

But if you use the new functions and refrain from using the old deprecated functions, the code size is normal again.

So, it seems to me to be an issue of using the NEW version of the CMSIS library together with the OLD functions.

By using the NEW version of the CMSIS library together with the NEW functions it works very well with approx. the same code size (at least in our STM32F4 setup with the mcHF). [source code for that SDR is here: https://github.com/df8oe/mchf-github]

So, I would love to see the new version of the CMSIS lib being used, because I guess it would make a significant contribution to the accuracy of the FFT results. However, I see that this would mean replacing all the old arm_cfft_radix4 functions in the audio lib with the new cfft functions.

If I were to use the new lib here on my machine, is there a way to use the newest CMSIS lib by just copying the newest arm_math.h into the program/arduino/... folder? Or are there other files I would need to substitute/install? Sorry if this is a silly question.

Frank
 
are there other files I would need to substitute/install
I briefly played with the new CMSIS library last night. There's more to installing it than just copying the arm_math.h
There are a couple of other files which either need to be replaced or copied in. arm_common_tables.h, arm_const_structs.h
Even then my test code didn't compile. There's at least one define which needs to be added to teensy36.build.flags.defs in boards.txt
I'll be digging into it more today.

Pete
 
I got it compiled with CMSIS-DSP v1.4.7 on a Teensy 3.2.

Files I used:
  1. libarm_cortexM4l_math.a
  2. arm_math.h
  3. arm_const_structs.h
  4. arm_common_tables.h

In arm_math.h insert this:
Code:
#include <stdint.h>
#define __ASM        __asm
#define __INLINE    inline
#define __STATIC_INLINE    static inline
#define __CORTEX_M    4
#define __FPU_USED    0
#define ARM_MATH_CM4
#include "core_cmInstr.h"
#include "core_cm4_simd.h"

and comment out this in arm_math.h:
Code:
/*#if defined(ARM_MATH_CM7)
  #include "core_cm7.h"
#elif defined (ARM_MATH_CM4)
  #include "core_cm4.h"
#elif defined (ARM_MATH_CM3)
  #include "core_cm3.h"
#elif defined (ARM_MATH_CM0)
  #include "core_cm0.h"
  #define ARM_MATH_CM0_FAMILY
#elif defined (ARM_MATH_CM0PLUS)
  #include "core_cm0plus.h"
  #define ARM_MATH_CM0_FAMILY
#else
  #error "Define according the used Cortex core ARM_MATH_CM7, ARM_MATH_CM4, ARM_MATH_CM3, ARM_MATH_CM0PLUS or ARM_MATH_CM0"
#endif

#undef  __CMSIS_GENERIC         enable NVIC and Systick functions */

Example Sketch:
Code:
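#include "arm_math.h"
#include "arm_const_structs.h"

// Points at one of the precomputed CFFT instances from arm_const_structs.h.
static const arm_cfft_instance_q15 *S;

// 1024 complex q15 samples, interleaved as re, im, re, im, ...
int16_t buffer[2048] __attribute__ ((aligned (4)));

void setup() {
  S = &arm_cfft_sR_q15_len1024;
  // In-place forward FFT (ifftFlag = 0) with bit reversal enabled (bitReverseFlag = 1).
  arm_cfft_q15(S, buffer, 0, 1);
}

void loop() {
  // Nothing to do here; the FFT runs once in setup().
}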

Compile size:
Sketch uses 21,260 bytes (8%) of program storage space. Maximum is 262,144 bytes.
Global variables use 7,640 bytes (11%) of dynamic memory, leaving 57,896 bytes for local variables. Maximum is 65,536 bytes.
 
Sorry, but I cannot see that the quality of FFTs has changed since their invention.
The new CMSIS (which I'm using with modified includes to remove unwanted FFT tables) changed the API slightly and may be more convenient, but IMO it is not more accurate.
 
Here's a copy of the arm_math.h library, with a makefile that compiles the code.

https://github.com/PaulStoffregen/arm_math

So far, this hasn't really been tested much. My long-term intention was (and still is) to replace the .a files we use now with the result of compiling from this source. Eventually, I want to bring in the newer features, and also make some fairly simple modifications like using separate lookup tables, so using a 1024 point FFT doesn't import the huge 4096 point table.
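
To put rough numbers on the lookup-table concern, here is a back-of-the-envelope sketch. It ASSUMES each N-point float32 twiddle table holds 2*N entries of 4 bytes (one complex value per point). That layout and the helper names are assumptions for illustration, not taken from the Teensy or CMSIS sources, and bit-reversal tables are not counted.

```cpp
#include <cassert>
#include <cstddef>

// Size of one float32 value in bytes (fixed here rather than using sizeof).
constexpr std::size_t kBytesPerFloat32 = 4;

// Bytes for one twiddle table, ASSUMING 2*N float32 entries per N-point FFT.
std::size_t twiddleTableBytes(std::size_t fftLen) {
    assert(fftLen > 0);
    return 2 * fftLen * kBytesPerFloat32;
}

// Total bytes if every power-of-two table from minLen to maxLen is linked in.
std::size_t allTablesBytes(std::size_t minLen, std::size_t maxLen) {
    assert(minLen > 0);
    std::size_t total = 0;
    for (std::size_t n = minLen; n <= maxLen; n *= 2) {
        total += twiddleTableBytes(n);
    }
    return total;
}
```

Under that assumption a 4096-point table takes 32,768 bytes against 8,192 bytes for a dedicated 1024-point table, and linking every size from 16 to 4096 would add 65,408 bytes.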
 
I've committed a fix for this "passthrough" problem at higher than 120 MHz clock speeds.

https://github.com/PaulStoffregen/Audio/commit/834028d889f736281e66418a51aeaef4f03daa80


@Paul:
I reported some problems with audio in the K66 beta thread.

Can't get microphone or line-in audio to play straight through.
The code in this message still doesn't work on a new T3.6
msg #936 https://forum.pjrc.com/threads/34808-K66-Beta-Test/page38

Great question. I didn't think to try that.

It turns out that it works at 96 MHz and 120 MHz, but it doesn't work at 144, 168, or 180 MHz.
 
@Paul: I've just installed the fix and tried out the first sketch in msg #936 on both the original K66 beta and a new T3.6
They now both work at 180 and 192 MHz.
The K66 has a lot of static-like clicks (missed packets perhaps?) at 216 and 240 MHz.
The T3.6 has lots of static at 216MHz. When I started the test at 240MHz on the T3.6, it seemed to be OK but then I heard one click and over the next 5 minutes or so the frequency of the clicks has picked up. After ten minutes, it still isn't as frequent as at 216MHz but still not good.

Pete
 
I've been mucking around with these source files and see in arm_cfft_radix4_q15.c that the function arm_radix4_butterfly_q15, which is the guts of the FFT processing, is divided into 3 stages. My thought is to see if I can make each stage its own function so I can spread the stages out in the Audio library FFT update function at "states" 5, 6 and 7. While it's not optimizing the actual FFT code, it could substantially lower the Max Processor Usage, at the cost of delaying completion of the calculation by 3 block times.
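
The idea of running one butterfly stage per update can be sketched in a self-contained way. This is not the CMSIS radix-4 q15 code and not the Audio library's update function: it is a toy 8-point radix-2 float FFT (which happens to have 3 stages too), and the class name `StagedFft8` and its methods are invented for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>

constexpr std::size_t kN = 8;  // toy FFT length: log2(8) = 3 butterfly stages

// Hypothetical illustration: each update() call performs exactly one stage,
// so the cost of the transform is spread over three successive calls.
class StagedFft8 {
public:
    using cplx = std::complex<float>;

    // Load input in bit-reversed order and rewind to the first stage.
    void start(const std::array<cplx, kN>& in) {
        for (std::size_t i = 0; i < kN; ++i) data_[bitrev3(i)] = in[i];
        half_ = 1;
    }

    // Run one butterfly stage; returns true once the whole FFT is complete.
    bool update() {
        if (half_ >= kN) return true;  // idle or already finished
        const float pi = 3.14159265358979f;
        for (std::size_t base = 0; base < kN; base += 2 * half_) {
            for (std::size_t k = 0; k < half_; ++k) {
                const cplx w = std::polar(1.0f, -pi * float(k) / float(half_));
                const cplx a = data_[base + k];
                const cplx b = data_[base + k + half_] * w;
                data_[base + k] = a + b;
                data_[base + k + half_] = a - b;
            }
        }
        half_ *= 2;
        return half_ >= kN;
    }

    // Spectrum; only meaningful after update() has returned true.
    const std::array<cplx, kN>& result() const {
        assert(half_ >= kN);
        return data_;
    }

private:
    // Reverse the 3-bit index (swap bits 0 and 2, keep bit 1).
    static std::size_t bitrev3(std::size_t i) {
        return ((i & 1) << 2) | (i & 2) | ((i & 4) >> 2);
    }

    std::array<cplx, kN> data_{};
    std::size_t half_ = kN;  // kN means "no transform in progress"
};
```

A caller would invoke `start()` when a full input buffer is ready and then call `update()` once per audio block. The peak work per block drops to one stage, and the result becomes available only after the third call, which matches the trade-off described above.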
 
The K66 has a lot of static-like clicks (missed packets perhaps?) at 216 and 240 MHz.

Please start a new thread about this issue. Hopefully you can give me a good test case? Does it happen with only WAV file playing or synthesis? If passthrough is necessary, I can set up tests... but something that runs stand-alone from synthesis or known WAV files is preferable for reproducibility.

I'm not going to hold up the 1.31 release for issues that only happen in overclock modes, but I will put it on my list of issues to investigate later.
 
Might this thread be a good place to discuss . . .

Teensy 3.6 "PassThroughStereo" issue?
...
Just picked up the Teensy 3.6 today ...

All of the sketches so far worked identically, with one exception: "PassThroughStereo", which doesn't seem to get any input at all when compiled for the Teensy 3.6. I noticed that if I drop the clock speed down to 96 MHz or lower, it runs perfectly. Any insight? I'm quite new to this, but given that all the other sketches work perfectly, I thought I'd reach out here for some advice.

I'm using the standard "PassThroughStereo" sketch included in the Teensy library, a Teensy 3.6, and the Teensy Audio Adapter. I'm compiling the sketch using Arduino on Mac.
 
Looks like I got the FFT down to 18% max usage and the data looks good!

Edit: Teensy 3.2 @ 96MHz.
 
Here is the fast version I put together as a library. Well, it is not a faster implementation of the algorithm; I spread the processing of the algorithm out over 3 block updates. The example just prints this FFT's usage.

The output data should be exactly the same as the core Audio library version. I cut out any code for the Teensy LC (M0 processor support), but should be able to add it later.
 
Thanks Duff for the detailed description of how to get the new lib working! Your test sketch works on my Teensy 3.2.

However, I would like to use the CMSIS lib on the new Teensy 3.5. When I try to compile the same sketch on the Teensy 3.5, it does not work; it seems not to find the new functions . . .

"F:\Privat\AMATEURFUNK\Teensy SDR\Teensy 3_5\ARM CMSIS Test 2016_10_24\Duff_Test_sketch/Duff_Test_sketch.ino:10: undefined reference to `arm_cfft_q15'"

I set

#define __FPU_USED 1

in arm_math.h, but that did not help.

Have you got a hint for me?
 