Using CMSIS Version 5.3 with the Teensy 3.6

willie.from.texas · Jun 5, 2018

I haven't seen any discussion on this forum about using CMSIS Version 5.3. I wanted to try the floating-point FFTs in a signal filtering application. I'm familiar with the work discussed in the Software Defined Radio and Guitar Cabinet Simulation threads using CMSIS Version 4, along with some of the reservations about integrating it into the Teensy libraries due to overhead concerns. I was able to integrate the Version 5.3 library into my environment and design a simple overlap-save filtering algorithm using a 200-point impulse response filter. I tried it with 512- and 1024-point transforms and was impressed with the floating-point performance. With 512 points, a forward FFT (using arm_rfft_fast_f32), complex multiply, and inverse FFT took about 500 microseconds in total. With 1024 points, the same sequence took approximately 900 microseconds. As for program storage space, here is the compilation result for my demo program:
Sketch uses 106892 bytes (10%) of program storage space. Maximum is 1048576 bytes.
Global variables use 83612 bytes (31%) of dynamic memory, leaving 178532 bytes for local variables. Maximum is 262144 bytes.

I'm currently working on a real-time version of this filter and have it running at sampling intervals as short as 3 microseconds per sample. The data are digitized using the onboard ADC and filtered, and the filtered results are sent out through the DAC. The phase delay is about 7.8 milliseconds using a 512-point FFT filter.

Here is the code I used to evaluate the performance described above.

/*

Filter a signal containing two frequencies using the overlap-save method.
Uses CMSIS Version 5.3.0 f32 routines on a Teensy 3.6

*/

// Common libraries:
#include <arm_math.h>
#include <arm_const_structs.h>
//#include <ref.h>

#define bufSize 8192
#define numpts 512

#define impulse_length_minus_1 199

static arm_status status;
static arm_rfft_fast_instance_f32 fft_instance;

boolean out_pin; // Use for timing
float32_t input_waveform[bufSize];
float32_t output_waveform[bufSize];
float32_t fft_aray[numpts]; // Array used to store fft result
float32_t window[numpts]; // Array used to store fft of filter data
float32_t cmplx_product[numpts]; // Array used to store complex product of fft result and window
float32_t Ak[numpts]; // Used to store data being sent to the fft routines
float32_t fft_result[numpts]; // Output from inverse fft
float32_t filter[numpts]; // Will contain the impulse that is terminated with zero padding

float32_t impulse[impulse_length_minus_1 + 1] = { // Impulse response for low-pass filter beginning at frequency bin 20 out of 128. Stop band > -100 dB
9.4978700E-08, -1.0099240E-06, -3.9555098E-06, -8.1674212E-06, -1.2233856E-05, -1.4147944E-05, -1.1836013E-05, -3.8747080E-06,
9.7844092E-06, 2.7311901E-05, 4.4937360E-05, 5.7465704E-05, 5.9374463E-05, 4.6323969E-05, 1.6752608E-05, -2.6873229E-05,
-7.7589710E-05, -1.2456752E-04, -1.5493964E-04, -1.5661931E-04, -1.2159463E-04, -4.8950325E-05, 5.3220769E-05, 1.6774265E-04,
2.7052492E-04, 3.3489880E-04, 3.3748708E-04, 2.6447337E-04, 1.1682254E-04, -8.7008106E-05, -3.1198579E-04, -5.1149393E-04,
-6.3594928E-04, -6.4365675E-04, -5.1175053E-04, -2.4469101E-04, 1.2205630E-04, 5.2479704E-04, 8.8124334E-04, 1.1058973E-03,
1.1285844E-03, 9.1242952E-04, 4.6714264E-04, -1.4606171E-04, -8.2023298E-04, -1.4193952E-03, -1.8040441E-03, -1.8610211E-03,
-1.5317851E-03, -8.3265360E-04, 1.3849318E-04, 1.2123019E-03, 2.1748259E-03, 2.8075305E-03, 2.9334421E-03, 2.4602629E-03,
1.4107964E-03, -6.7401987E-05, -1.7177495E-03, -3.2154444E-03, -4.2276791E-03, -4.4827877E-03, -3.8347517E-03, -2.3089211E-03,
-1.1716140E-04, 2.3642611E-03, 4.6536868E-03, 6.2518534E-03, 6.7434843E-03, 5.8931952E-03, 3.7152958E-03, 5.0023185E-04,
-3.2125723E-03, -6.7158793E-03, -9.2593249E-03, -1.0197256E-02, -9.1325789E-03, -6.0274401E-03, -1.2551129E-03, 4.4236323E-03,
9.9633829E-03, 1.4204956E-02, 1.6088985E-02, 1.4876395E-02, 1.0335310E-02, 2.8564602E-03, -6.5310283E-03, -1.6255904E-02,
-2.4406433E-02, -2.9011491E-02, -2.8359337E-02, -2.1302867E-02, -7.4992025E-03, 1.2460427E-02, 3.7058422E-02, 6.4002438E-02,
9.0503604E-02, 1.1364256E-01, 1.3076897E-01, 1.3987339E-01, 1.3987339E-01, 1.3076897E-01, 1.1364256E-01, 9.0503604E-02,
6.4002438E-02, 3.7058422E-02, 1.2460427E-02, -7.4992025E-03, -2.1302867E-02, -2.8359337E-02, -2.9011491E-02, -2.4406433E-02,
-1.6255904E-02, -6.5310283E-03, 2.8564602E-03, 1.0335310E-02, 1.4876395E-02, 1.6088985E-02, 1.4204956E-02, 9.9633829E-03,
4.4236323E-03, -1.2551129E-03, -6.0274401E-03, -9.1325789E-03, -1.0197256E-02, -9.2593249E-03, -6.7158793E-03, -3.2125723E-03,
5.0023185E-04, 3.7152958E-03, 5.8931952E-03, 6.7434843E-03, 6.2518534E-03, 4.6536868E-03, 2.3642611E-03, -1.1716140E-04,
-2.3089211E-03, -3.8347517E-03, -4.4827877E-03, -4.2276791E-03, -3.2154444E-03, -1.7177495E-03, -6.7401987E-05, 1.4107964E-03,
2.4602629E-03, 2.9334421E-03, 2.8075305E-03, 2.1748259E-03, 1.2123019E-03, 1.3849318E-04, -8.3265360E-04, -1.5317851E-03,
-1.8610211E-03, -1.8040441E-03, -1.4193952E-03, -8.2023298E-04, -1.4606171E-04, 4.6714264E-04, 9.1242952E-04, 1.1285844E-03,
1.1058973E-03, 8.8124334E-04, 5.2479704E-04, 1.2205630E-04, -2.4469101E-04, -5.1175053E-04, -6.4365675E-04, -6.3594928E-04,
-5.1149393E-04, -3.1198579E-04, -8.7008106E-05, 1.1682254E-04, 2.6447337E-04, 3.3748708E-04, 3.3489880E-04, 2.7052492E-04,
1.6774265E-04, 5.3220769E-05, -4.8950325E-05, -1.2159463E-04, -1.5661931E-04, -1.5493964E-04, -1.2456752E-04, -7.7589710E-05,
-2.6873229E-05, 1.6752608E-05, 4.6323969E-05, 5.9374463E-05, 5.7465704E-05, 4.4937360E-05, 2.7311901E-05, 9.7844092E-06,
-3.8747080E-06, -1.1836013E-05, -1.4147944E-05, -1.2233856E-05, -8.1674212E-06, -3.9555098E-06, -1.0099240E-06, 9.4978700E-08
};
float pi = 3.1415926535897932f;
float freq1; // Want to keep freq1
float freq2; // Want to filter out freq2

void setup()
{
Serial.begin (230400);

// Set up output ports for timing purposes
PORTC_PCR4 = ( unsigned long ) 0x00000101; // Section 12.5.1, pg. 221 - 222, see Pin Mux Control
// Bit 0 -> Pull Select -> Internal pullup resistor is enabled
// Bit 8 -> ALT1 -> GPIO
PORTC_PCR5 = ( unsigned long ) 0x00000101; // Section 12.5.1, pg. 221 - 222, see Pin Mux Control
// Bit 0 -> Pull Select -> Internal pullup resistor is enabled
// Bit 8 -> ALT1 -> GPIO
GPIOC_PDDR = ( unsigned long ) 0x00000020; // Configure pin 13 as output. See Section 63.3.6. Pin 13 is PTC5 (on-board LED)

// Build waveform for processing
freq1 = 5. * (float)numpts/(float)(impulse_length_minus_1 + 1); //Want to keep freq1
freq2 = 4. * freq1; //Want to filter out freq2. For a 256 point FFT the 3 dB cutoff frequency is at about frequency bin #19

for (int i=0; i < (bufSize); i ++) // Calculate signal to filter over bufSize points
{
input_waveform[i] = sin(2. * pi * freq1 * (float)i/(float)numpts) + sin(2. * pi * freq2 * (float)i/(float)numpts);
}

// Build impulse filter array. Need to zero pad from the end of the impulse to numpts (end of the array).
arm_fill_f32(0.0, filter, numpts);
for (int i = 0; i < impulse_length_minus_1 + 1; i++) filter[i] = impulse[i];
status = arm_rfft_fast_init_f32(&fft_instance, numpts); // Initialize the fft routines
arm_rfft_fast_f32(&fft_instance, filter, window, 0); // Calculate the spectral response of the filter.
GPIOC_PSOR |= (1 << 5); // Section 63.3.2 - Set pin 13 HIGH
out_pin = true;
delay(5000); // Give plenty of time to trigger scope and to select either the Serial Monitor or Serial Plotter
}

void loop()
{
arm_fill_f32(0.0, output_waveform, bufSize);
arm_fill_f32(0.0, input_waveform, impulse_length_minus_1); //Need to zero pad the beginning of the signal so the circular convolution matches a linear convolution.

// This loop for processing input_waveform data completes in about 500 microseconds per pass for numpts = 512
// 900 microseconds per pass for numpts = 1024

for (int i = 0; i <= (bufSize - numpts)/(numpts - impulse_length_minus_1); i++) { // Use for overlap to avoid circular convolution. The bound keeps every numpts-sample read inside input_waveform.
if (out_pin) GPIOC_PCOR |= (1 << 5); // Section 63.3.1 - Set pin 13 LOW -- Use for scope timing
else
GPIOC_PSOR |= (1 << 5); // Section 63.3.2 - Set pin 13 HIGH
out_pin = !out_pin;
arm_copy_f32(input_waveform + i * (numpts - impulse_length_minus_1), Ak, numpts);
arm_rfft_fast_f32(&fft_instance, Ak, fft_aray, 0);
arm_cmplx_mult_cmplx_f32(fft_aray, window, cmplx_product, numpts/2); // Multiplication in frequency domain is a convolution in the time domain
arm_rfft_fast_f32(&fft_instance, cmplx_product, fft_result, 1); // Inverse transform
arm_copy_f32(fft_result + impulse_length_minus_1, output_waveform + i * (numpts - impulse_length_minus_1), numpts - impulse_length_minus_1);
}

for (int i = 0; i < 1000; i++) Serial.println(output_waveform[i]); //Use to drive the serial plotter to display the second set of 500 points

while(1) {}
}

bmillier · Nov 3, 2018

Hi Willie: I am interested in your post, specifically the CMSIS 5.3 part. I am the person who posted the Cabinet Simulation thread on this forum. It used CMSIS 4.5, and I installed the required CMSIS 4.5 libraries according to the instructions in the SDR forum post:
https://forum.pjrc.com/threads/40590-Teensy-Convolution-SDR-(Software-Defined-Radio)?highlight=SDR
I had wondered if it would be useful/possible to upgrade my code to use CMSIS 5 (currently 5.4.0). If I follow the same procedure that is outlined in the post above, but with the corresponding CMSIS 5 files, the compiler throws a whole lot of errors, none of which are meaningful to me.
In your program you have
#include <arm_math.h>
#include <arm_const_structs.h>
which are the same 2 includes that I used in my program. The procedure shown in the above thread calls for some changes to math.h as well as the addition of a few other files:
libarm_cortexM4l_math.a
libarm_cortexM4lf_math.a
to the hardware/tools/arm/arm-none-eabi/lib/ folder
Did you follow the same procedure as the link above? I am using the Audio library for real-time processing of the signal to/from the Audio Shield, where it appears you're not, so possibly that is why your program is working and mine is throwing all the errors.
Also, I am not sure that there is any advantage to be gained from using the CMSIS 5 library instead of the CMSIS 4.5 library. Any thoughts?
Your 7.8 ms latency at 200 FIR taps is virtually identical to the 19.5 ms latency I get with a 513-tap FIR filter, once you factor in that mine has about 2.5X the number of taps.
Thanks

WMXZ · Nov 3, 2018

bmillier said:
I had wondered if it would be useful/possible to upgrade my code to use CMSIS 5 (currently 5.4.0). If I follow the same procedure that is outlined in the post above, but with the corresponding CMSIS 5 files, the compiler throws a whole lot of errors, none of which are meaningful to me.
In your program you have
#include <arm_math.h>
#include <arm_const_structs.h>
which are the same 2 includes that I used in my program. The procedure shown in the above thread calls for some changes to math.h as well as the addition of a few other files:
libarm_cortexM4l_math.a
libarm_cortexM4lf_math.a
to the hardware/tools/arm/arm-none-eabi/lib/ folder

What I do with CMSIS DSP is copy the source and include files of interest into a sub-directory of my sketch, say ./src/CMSIS/.
This way the compiler finds my new include files first, compiles the routines from the local files, and ignores the arm_math.h in the audio library. It is this old arm_math.h file that causes the problems.

I do not add the libarm_cortex files, but use the ad-hoc compiled files (they are fast enough).

bmillier · Nov 3, 2018

WMXZ said:
What I do with CMSIS DSP is copy the source and include files of interest into a sub-directory of my sketch, say ./src/CMSIS/.
This way the compiler finds my new include files first, compiles the routines from the local files, and ignores the arm_math.h in the audio library. It is this old arm_math.h file that causes the problems.

I do not add the libarm_cortex files, but use the ad-hoc compiled files (they are fast enough).

Thanks WMXZ.
But, I'm not sure what you are referring to when you state that arm_math.h is in the Audio library. In my Arduino 1.8.7/Teensyduino 1.44 (and 1.8.5) the Audio library is in folder
arduino-1.8.7/hardware/teensy/avr/libraries/Audio
and that folder does not contain arm_math.h (although many of the audio library files there #include arm_math.h of course)
The arm_math.h exists at
arduino-1.8.7/hardware/teensy/avr/cores/teensy3
This folder is where I placed the arm_common_tables, arm_const_structs and arm_math.h files that I got from CMSIS 4.5 (successfully when I followed the instructions in the SDR post) and lately CMSIS 5.4.0 (unsuccessfully, as mentioned in my initial post)

If I (1) make a CMSIS sub-folder under my sketch folder, and add arm_common_tables, arm_const_structs and arm_math.h files to this folder
(2) rename arm_math.h in the cores/teensy3 folder to something else (to make sure it is not being used by the compiler)
the compiler will throw: in the Audio/analyse_fft256 library routine fatal error: arm_math.h: No such file
Same thing happens if I change my sketch includes to
#include <CMSIS/arm_math.h>
#include <CMSIS/arm_const_structs.h>

or
#include "CMSIS/arm_math.h"
#include "CMSIS/arm_const_structs.h"

So, this isn't working for me, unless I am mis-understanding your post, or doing something else wrong

WMXZ · Nov 3, 2018

bmillier said:
Thanks WMXZ.
But, I'm not sure what you are referring to when you state that arm_math.h is in the Audio library. In my Arduino 1.8.7/Teensyduino 1.44 (and 1.8.5) the Audio library is in folder
arduino-1.8.7/hardware/teensy/avr/libraries/Audio
and that folder does not contain arm_math.h (although many of the audio library files there #include arm_math.h of course)
The arm_math.h exists at
arduino-1.8.7/hardware/teensy/avr/cores/teensy3
This folder is where I placed the arm_common_tables, arm_const_structs and arm_math.h files that I got from CMSIS 4.5 (successfully when I followed the instructions in the SDR post) and lately CMSIS 5.4.0 (unsuccessfully, as mentioned in my initial post)

If I (1) make a CMSIS sub-folder under my sketch folder, and add arm_common_tables, arm_const_structs and arm_math.h files to this folder
(2) rename arm_math.h in the cores/teensy3 folder to something else (to make sure it is not being used by the compiler)
the compiler will throw: in the Audio/analyse_fft256 library routine fatal error: arm_math.h: No such file
Same thing happens if I change my sketch includes to
#include <CMSIS/arm_math.h>
#include <CMSIS/arm_const_structs.h>

or
#include "CMSIS/arm_math.h"
#include "CMSIS/arm_const_structs.h"

So, this isn't working for me, unless I am mis-understanding your post, or doing something else wrong

Sorry, I was writing without double checking.
Anyhow, a local version (i.e. in src/cmsis) takes precedence over the core version.

So #include "src/cmsis/arm_math.h" refers directly to the local arm_math.h file. Note that the src directory in the sketch folder is necessary. AFAIK it could also be lib; all other subdirectories are ignored by Arduino.

BTW I do it in myProcess.h in https://github.com/WMXZ-EU/microSoundRecorder/tree/microSoundRecorder_dev

bmillier · Nov 4, 2018

Thanks WMXZ. I tried this using a folder called src/cmsis in my sketch folder, and put arm_math.h and associated files in there. I modified my sketch to #include "src/cmsis/arm_math.h"
While everything compiles fine, it is still using arm_math.h from the "arduino-1.8.7/hardware/teensy/avr/cores/teensy3" location. I know this because if I rename arm_math.h in that folder, the PJRC Audio library looks for it there and throws an error when it is missing.
Since everything works fine using CMSIS 4.5 files with the Audio library (and my added FFT-convolution filter code merged with it) - using the instructions at https://forum.pjrc.com/threads/40590-Teensy-Convolution-SDR-(Software-Defined-Radio)?highlight=SDR
I guess I'll just stick with that and forget about CMSIS 5. I believe that you are not using the PJRC Audio library, so our situations are different.

WMXZ · Nov 4, 2018

bmillier said:
Thanks WMXZ. I tried this using a folder called src/cmsis in my sketch folder, and put arm_math.h and associated files in there. I modified my sketch to #include "src/cmsis/arm_math.h"
While everything compiles fine, it is still using arm_math.h from the "arduino-1.8.7/hardware/teensy/avr/cores/teensy3" location. I know this because if I rename arm_math.h in that folder, the PJRC Audio library looks for it there and throws an error when it is missing.
Since everything works fine using CMSIS 4.5 files with the Audio library (and my added FFT-convolution filter code merged with it) - using the instructions at https://forum.pjrc.com/threads/40590-Teensy-Convolution-SDR-(Software-Defined-Radio)?highlight=SDR
I guess I'll just stick with that and forget about CMSIS 5. I believe that you are not using the PJRC Audio library, so our situations are different.

The only thing I can imagine is that you use audio.h, which includes other objects that call arm_math.h.
I avoid this and include explicitly the specific audio library files.
Also, the linker will use local libraries first even when it compiles the audio library, I think, or rather hope.

On the other hand, I do not expect the cmsis/dsp routines to have changed significantly from 4.5 to 5.4.
After all, an FFT is a done deal, and in fact cmsis/dsp has not been touched in the last two years or so.

XFer · Nov 4, 2018

You may try copying the "right" version of arm_math.h into the same folder as your sketch (the same folder as the .ino, that is) and then #include "arm_math.h" (double quotes, not angle brackets) as the very first line in your sketch.
This *should* take precedence over other headers.
Unfortunately Arduino keeps changing the way it looks for libraries, headers etc, so it's not straightforward.

Frank B · Nov 4, 2018

Maybe other files are using the "old include", #include <arm_math.h>.
Nevertheless, that's not a problem. The worst case is that these files use a different version, which is more of a theoretical problem. Your code, if you use the new path in your files, will use the new version.
Just tested, because today I needed CMSIS 5 for the minimal SDR https://github.com/FrankBoesing/Minimal-SDR/tree/master/src

bmillier · Nov 5, 2018

@Frank B. Yes, the audio library itself accesses arm_math.h in folder arduino-1.8.7/hardware/teensy/avr/cores/teensy3.
My FFT convolution filter is completely integrated into the audio library (locally anyway). Placing the CMSIS 4.5 arm_math.h in that folder worked perfectly (when modified according to the procedure posted in SDR thread by Jan, mentioned in my initial post) - but trying this with arm_math.h from CMSIS 5 didn't work. I don't need CMSIS 5, I just thought it made sense to keep up to date.
I really don't want to have the CMSIS math files in a sub-folder of my sketch. While my own sketch code could access them there, I doubt that my FFT convolution routine, now part of the Audio library, would use them- it would go to the cores/teensy3 folder I think.
Thanks posters for your input on this problem

willie.from.texas · Nov 13, 2018

Hi bmillier,

Forgive me for not replying to your post sooner but I have been off the net over the last couple of weeks. I’m still on the road for the next week but wanted to direct you to a thread that should give you everything you need to integrate CMSIS V5.2 onto your system. Please let me know if you were successful in its implementation.
Regards,
W
https://forum.pjrc.com/threads/5380...Now-Working-on-Teensy-3-6?p=188532#post188532

bmillier · Nov 16, 2018

Hi willie.from.texas
Thanks Willie. I re-installed Arduino 1.8.5/Teensyduino 1.44 fresh into a new folder, followed the instructions that I had earlier used for CMSIS 4.5 (substituting the CMSIS 5 files from github), and then removed the "core_cm4_simd.h" include from arm_math.h per your linked post. Now my program compiles and runs fine.
best regards

willie.from.texas · Nov 16, 2018

Glad to hear. Thanks for letting me know.
Kind regards,
W
