ARM_MATH library

Kerry

Hello:

I have been successful in including the arm_math.h library and running an example FFT on the Teensy 4.1, and am excited that this capability is now available. Only one question: I cannot find any GOOD documentation on the arm_math library functions. Take, for instance, the FFT functions. The example works, but it appears to use an extern data file with a sample rate on the order of 100 kHz, giving a maximum FFT frequency of about 50 kHz. I cannot see how the sample rate was input to the FFT routines in the first place, or whether it is even possible to change it. Better documentation has to exist that I have not found yet. Anyone??
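The sample rate never enters the CMSIS FFT calls. They turn N samples into N bins and know nothing about time. The rate only decides what each bin means afterwards: bin k sits at k·fs/N, adjacent bins are fs/N apart, and the highest usable bin is at fs/2 (Nyquist). A small host-side C++ sketch of that mapping (the helper names are hypothetical, not CMSIS functions):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Frequency (Hz) represented by FFT bin k, for sample rate fs (Hz) and FFT length n.
// The FFT itself never sees fs; this mapping is applied to its output afterwards.
double binFrequency(uint32_t k, double fs, uint32_t n) {
    return static_cast<double>(k) * fs / static_cast<double>(n);
}

// Spacing between adjacent bins (frequency resolution), in Hz.
double binSpacing(double fs, uint32_t n) {
    return fs / static_cast<double>(n);
}

// Highest frequency representable without aliasing (Nyquist), in Hz.
double nyquist(double fs) {
    return fs / 2.0;
}
```

For example, at fs = 100 kHz with a 1024-point FFT the bins are 97.65625 Hz apart and bin 512 lands at 50 kHz. Changing the sample rate just relabels the same bins.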
 
CMSIS math library

You might check this: https://www.keil.com/pack/doc/CMSIS/DSP/html/arm__math_8h.html. arm_math is part of the CMSIS-DSP library, which is already part of Teensyduino. Searching the forum for CMSIS DSP or FFT will give you quite a bit of info.

https://forum.pjrc.com/threads/71128-Cmsis-5-9-0-cmsis-dsp-1-12?highlight=cmsis+dsp
or
https://forum.pjrc.com/threads/52651-Using-CMSIS-Version-5-3-with-the-Teensy-3-6?highlight=cmsis+dsp

thx, it will take some time to digest it all
 
Realize this is an older thread, but the topic is the same.

Where can I actually find the transform functions in the current Teensy CMSIS?

What FFT sizes are available in f32 for the T4.x? Is it possible to increase the FFT size to 32K or 64K? I'd like to characterize a 16-bit ADC, and large FFTs are required. They need to be floating point to express the dynamic range.

Not that I need this now, but are there any f64 FFTs? The T4 can do the math. From what I understand, at least the newer CMSIS libraries support this.
 
Just looking at arm_math.h in cores\Teensy4, it looks like the FFT length is set in a field of a struct that is passed as a parameter to the functions. The field fftLen is of type uint16_t, so perhaps you can do 32K but not 64K. I've never used it, so I could be wrong.
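The arithmetic behind that guess: a uint16_t holds 0 to 65535, so 32768 (32K) fits, while 65536 (64K) wraps around to 0 when stored. A small portable check (hypothetical helpers); whether the library's init code actually accepts a 32K length is a separate question:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if an FFT length can be stored in a uint16_t field such as fftLen without wrapping.
constexpr bool fitsInFftLen(uint32_t n) {
    return n <= std::numeric_limits<uint16_t>::max();
}

// What a uint16_t field would actually hold after the narrowing conversion (value mod 65536).
constexpr uint16_t storedFftLen(uint32_t n) {
    return static_cast<uint16_t>(n);
}

static_assert(fitsInFftLen(32768), "32K fits in uint16_t");
static_assert(!fitsInFftLen(65536), "64K does not fit");
static_assert(storedFftLen(65536) == 0, "64K silently becomes 0");
```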

There is no f64 in version 1.5.1 of CMSIS-DSP, which is what's in Teensyduino now, but @mjs513 recently posted a way to use the latest version, 1.12. You can download his ZIP file and then search the header files for "fft.*f64".

https://github.com/mjs513/Teensy-DSP-1.12-Updates
 
I realize this is a merry-go-round sort of thing: https://arm-software.github.io/CMSIS-DSP/latest/group__ComplexFFT.html shows the version of CMSIS-DSP is now 1.14.3. At least 1.14.3 shows f64 functions.

I think I found a 4K F32 FFT in 1.12. That's better than a 1K F32 FFT. For ADC characterization it is common to use 16K or larger FFTs. I will see what I can figure out with the 1.12 update. I don't think I could build the 1.14.3 library; I don't even know if I could repeat the 1.12 build! The reason for the huge FFTs is to lower the noise floor enough to see whether there are any ADC-related issues (or problems anywhere else in the analog chain), like spurious responses. Increasing the size by a factor of 4 only reduces the noise floor power by 6 dB. A 16K FFT has a 12 dB lower noise floor than a 1K.
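Those dB numbers come from FFT processing gain: with the total noise power fixed, the noise per bin falls in proportion to N, so going from N1 to N2 points lowers the floor by 10·log10(N2/N1) dB (about 6.02 dB for 4x, 12.04 dB for 16x). For context, the usual ideal-quantizer SNR of an n-bit ADC is 6.02·n + 1.76 dB, and adding 10·log10(N/2) gives roughly how far below a full-scale sine the per-bin floor sits. That assumes ideal quantization noise and ignores the window's noise bandwidth, so treat it as a rough sketch:

```cpp
#include <cassert>
#include <cmath>

// How many dB the per-bin noise floor drops when going from an n1-point to an n2-point FFT.
double noiseFloorImprovementDb(double n1, double n2) {
    return 10.0 * std::log10(n2 / n1);
}

// Ideal SNR (dB) of an n-bit quantizer driven by a full-scale sine.
double idealAdcSnrDb(int bits) {
    return 6.02 * bits + 1.76;
}

// Approximate depth (dB below full scale) of the per-bin noise floor for an
// ideal n-bit ADC and an fftLen-point FFT, ignoring the window's noise bandwidth.
double fftNoiseFloorDbFs(int bits, double fftLen) {
    return idealAdcSnrDb(bits) + 10.0 * std::log10(fftLen / 2.0);
}
```

For a 16-bit ADC this puts the ideal floor around 125 dB below full scale with a 1K FFT, and about 12 dB deeper with 16K.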
 
Wow, they've been doing a lot of releases then. Just to be clear, if you want to try the 1.12 version, you don't have to build the libraries, because @mjs513 included the binaries in his GitHub repo. I haven't tried it, but he provides instructions for copying the new core files and library binaries to the proper locations. Just curious: do you agree that the 1.5.1 version of CMSIS seems to allow you to specify the size? If so, couldn't you use it to do a 16K f32 FFT?
 
Where I get tripped up is the defines, or the specific names used to get the larger FFTs; I don't know what they are called. I will make an attempt at it... A lot of the examples are very specific, with little attempt to generalize them, so I'm left guessing what to do next. Maybe it's because I'm just a hack programmer.
 
Sort of have something running, but the results don't look correct just yet. I cannot get the "more modern way" to work. This is the sequence I use, not the whole program. I'd post it, but it contains code from a different contributor, and I have not gotten permission yet. So these are the salient parts, and they are my code. I used something similar on my Doppler chronograph, which ran on an M4.
Code:
#include <Arduino.h>
#include <arm_math.h>
#include <arm_const_structs.h>  // predefined arm_cfft_sR_f32_lenXXX instances used by arm_cfft_f32()

arm_cfft_radix4_instance_f32 fft_inst;  // radix-4 instance (deprecated but works)

// define buffer size
#define datalen (1024 * 4) // expression needs to be in parentheses!
#define FFT_SIZE datalen

#define NUMBITS 16
#define Vref 2.5000f
#define vbit (Vref / powf(2.0f, NUMBITS))  // volts per ADC count; parenthesized so it expands safely

volatile uint32_t data[datalen]; // set up a buffer for our ADC
volatile uint32_t datapointer = 0; // pointer into data[]

float fftbuf[2*FFT_SIZE];  // interleaved re/im, transformed in place
float outbuf[FFT_SIZE];    // where the fft in dBs ends up
float weight[FFT_SIZE];    // holder for the window coefficients

const float twoPi  =  6.28318531f;
const float fourPi = 12.56637061f;
const float sixPi  = 18.84955592f;

// 4-term Blackman-Harris window (periodic form, i/N); results go into weight[]
void calcBlackmanHarrisWindow(uint16_t fftsize) {
  const float a0 = 0.35875f, a1 = 0.48829f, a2 = 0.14128f, a3 = 0.01168f;
  const float f1 = (float) fftsize;
  for (int i = 0; i < fftsize; i++) {
    float ratio = ((float) i) / f1;      // from Wikipedia Windows
    weight[i] = a0 - a1*cosf(twoPi*ratio) + a2*cosf(fourPi*ratio) - a3*cosf(sixPi*ratio);
  }
}

// Call once, e.g. from setup()
bool initFFT() {
  // forward transform (ifftFlag = 0), output reordered into natural order (bitReverseFlag = 1)
  arm_status st = arm_cfft_radix4_init_f32(&fft_inst, FFT_SIZE, 0, 1);
  if (st != ARM_MATH_SUCCESS) {
    Serial.println("FFT init failed: unsupported FFT_SIZE");
    return false;  // don't run the FFT on an unconfigured instance
  }
  memset(fftbuf, 0, sizeof(fftbuf));  // clear fft buffer memory
  calcBlackmanHarrisWindow(FFT_SIZE); // window function returns values in variable "weight"
  return true;
}

// Call after the ADC samples are in data[] and their mean (in volts) is in "average"
void processFFT(float average) {
  Serial.println("Start of FFT processing");
  Serial.printf("FFT_SIZE = %i\r\n", datalen);
  // fftbuf is 2x bigger than datalen, due to imaginary part
  // data is stored as re[0], im[0], re[1], im[1], ... re[N-1], im[N-1]
  // convert counts to volts, remove the mean, apply the window, store in the even locations
  for (int i = 0; i < datalen; i++) {
    fftbuf[2*i] = ((float) data[i] * vbit - average) * weight[i];
    fftbuf[2*i+1] = 0.0f;  // imaginary part is zero
  }
  Serial.println("Finished putting into fftbuf");
  arm_cfft_radix4_f32(&fft_inst, fftbuf); // do the fft in place
  // newer API, no init call needed (for FFT_SIZE == 4096): arm_cfft_f32(&arm_cfft_sR_f32_len4096, fftbuf, 0, 1);
  Serial.println("Finished FFT");
  arm_cmplx_mag_squared_f32(fftbuf, outbuf, FFT_SIZE/2);  // mag squared, positive-frequency bins only
  Serial.println("Finished mag squared");
  for (int i = 0; i < FFT_SIZE/2; i++) {
    outbuf[i] = 10.0f*log10f(outbuf[i] + 1e-20f);  // 10 log, since mag sq is power; offset avoids log10f(0)
  }
  Serial.println("Computed dBs");

  const int k = 8;  // print 8 samples per line
  for (int i = 0; i < FFT_SIZE/2/k; i++) {
    for (int j = 0; j < k; j++) {
      Serial.print(outbuf[k*i + j]);
      Serial.print(", ");
    }
    Serial.println(); // line break every 8 samples; the extra trailing comma is easy to edit out
  }
}
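When the output "doesn't look right", one way to check it is to push the same interleaved re/im buffer through a slow, direct DFT on a PC and compare bin by bin. This is plain C++, O(N²), meant only for small test vectors; referenceDft and magSquared are hypothetical helpers, not CMSIS functions (magSquared just mirrors what arm_cmplx_mag_squared_f32 computes):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Direct (slow) forward DFT of an interleaved complex buffer:
// in = re[0], im[0], re[1], im[1], ... (the layout the CMSIS cfft functions use).
// Returns an interleaved buffer of the same size, unscaled like a forward CFFT.
std::vector<float> referenceDft(const std::vector<float>& in) {
    const std::size_t n = in.size() / 2;
    const double twoPi = 6.283185307179586;
    std::vector<float> out(2 * n, 0.0f);
    for (std::size_t k = 0; k < n; ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double ang = -twoPi * static_cast<double>(k * t % n) / static_cast<double>(n);
            const double xr = in[2 * t], xi = in[2 * t + 1];
            re += xr * std::cos(ang) - xi * std::sin(ang);  // real part of x[t] * e^{j*ang}
            im += xr * std::sin(ang) + xi * std::cos(ang);  // imaginary part
        }
        out[2 * k] = static_cast<float>(re);
        out[2 * k + 1] = static_cast<float>(im);
    }
    return out;
}

// Magnitude squared of the first `bins` complex values of an interleaved buffer.
std::vector<float> magSquared(const std::vector<float>& cplx, std::size_t bins) {
    std::vector<float> out(bins);
    for (std::size_t i = 0; i < bins; ++i)
        out[i] = cplx[2 * i] * cplx[2 * i] + cplx[2 * i + 1] * cplx[2 * i + 1];
    return out;
}
```

A unit cosine with exactly 3 cycles in 16 samples should give (N/2)² = 64 in bin 3 and essentially zero elsewhere, which is a quick sanity check for the on-target result too.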
From what I can tell, the current version of the CMSIS DSP library supports up to 4K. An 8K or 16K FFT causes the Teensy 4.1 to reset and restart the program. Actually, I don't even know which version of CMSIS this library corresponds to. I do know that the functions have slightly different APIs. What version of CMSIS-DSP is in Teensy now?
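A likely explanation for the 8K/16K resets, hedged because I haven't checked Teensy's exact build: in the CMSIS-DSP versions I'm aware of, arm_cfft_radix4_init_f32 only accepts power-of-4 lengths (16, 64, 256, 1024, 4096) and returns ARM_MATH_ARGUMENT_ERROR for anything else, while arm_cfft_f32 relies on precomputed tables (arm_cfft_sR_f32_len16 through arm_cfft_sR_f32_len4096) that stop at 4096. 8192 is not even a power of 4, so the init fails, and if its return value isn't checked, the FFT runs on a half-configured instance. A portable sketch of those assumed length rules (hypothetical helpers, not CMSIS functions):

```cpp
#include <cassert>
#include <cstdint>

// True if n is a power of two (and nonzero).
constexpr bool isPowerOfTwo(uint32_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// True if n is a power of four: a power of two whose single set bit is at an even position.
constexpr bool isPowerOfFour(uint32_t n) {
    return isPowerOfTwo(n) && (n & 0x55555555u) != 0;
}

// Assumed rule for the deprecated radix-4 f32 init: powers of 4 from 16 to 4096.
constexpr bool radix4LengthOk(uint32_t n) {
    return n >= 16 && n <= 4096 && isPowerOfFour(n);
}

// Assumed rule for arm_cfft_f32's predefined instances: powers of 2 from 16 to 4096.
constexpr bool cfftLengthOk(uint32_t n) {
    return n >= 16 && n <= 4096 && isPowerOfTwo(n);
}
```

Note that 16384 is a power of 4 but is still rejected, because it is past the largest precomputed table.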

What is the better way of using CMSIS DSP? I hope to post the whole code soon; however, it requires an external (custom) board to run. I have a Teensy wired to that board with jumpers on my desk. I also have a T4.0 audio board set up as a 10 kHz source to test this. I have digitized a sine wave at 1 MSPS.
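Not from the thread, but a common practice in ADC testing is coherent sampling: choose the test tone so that a whole, odd number of cycles M fits exactly in the N-sample record, f_in = M·fs/N. The tone then falls exactly on a bin, and spectral leakage (and the window choice) matters much less. With 1 MSPS, a 4K record and a 10 kHz target, that works out to M = 41 and 10009.765625 Hz, assuming the source can be tuned that finely. A sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Whole number of cycles M closest to the desired tone, forced odd so that M shares
// no factor with a power-of-two N (each sample then lands on a distinct phase).
uint32_t coherentCycles(double fDesired, double fs, uint32_t n) {
    const double exact = fDesired * n / fs;
    uint32_t m = static_cast<uint32_t>(std::lround(exact));
    if (m % 2 == 0) {
        // move to the neighbouring odd value on the side of the exact target
        m = (exact > m) ? m + 1 : (m > 1 ? m - 1 : 1);
    }
    return m;
}

// Tone frequency (Hz) that puts exactly m cycles in an n-sample record at rate fs.
double coherentFrequency(uint32_t m, double fs, uint32_t n) {
    return static_cast<double>(m) * fs / static_cast<double>(n);
}
```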
 
Err, made a few mistakes. A dumb indexing problem at the end, in the printing: it should be 8*i + j, not i+j. I also forgot to scale the FFT. That matters, at least if you are comparing noise floors. Still limited to 4K FFTs.
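One common way to do that scaling (my assumption about what's wanted, not something stated in the thread): divide each magnitude-squared bin by (Σw)², the squared sum of the window coefficients. A sine of amplitude A centered on a bin then reads A²/4 whatever the FFT size or window, so tone levels stay comparable while the per-bin noise floor still drops as N grows. The portable sketch below uses the same Blackman-Harris coefficients as calcBlackmanHarrisWindow in the post; the helper names are hypothetical, and binMagSquared (a single-bin direct DFT) is only there to show the normalization works:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Periodic 4-term Blackman-Harris window (ratio = i/N).
std::vector<double> blackmanHarris(std::size_t n) {
    const double twoPi = 6.283185307179586;
    std::vector<double> w(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double r = static_cast<double>(i) / static_cast<double>(n);
        w[i] = 0.35875 - 0.48829 * std::cos(twoPi * r)
             + 0.14128 * std::cos(2 * twoPi * r) - 0.01168 * std::cos(3 * twoPi * r);
    }
    return w;
}

// Normalize a raw |X[k]|^2 by the squared window sum (coherent-gain correction).
double normalizedPower(double magSq, const std::vector<double>& w) {
    double sum = 0.0;
    for (double v : w) sum += v;
    return magSq / (sum * sum);
}

// |X[k]|^2 of the windowed real signal x for a single bin k (direct DFT, test use only).
double binMagSquared(const std::vector<double>& x, const std::vector<double>& w, std::size_t k) {
    const double twoPi = 6.283185307179586;
    const std::size_t n = x.size();
    double re = 0.0, im = 0.0;
    for (std::size_t t = 0; t < n; ++t) {
        const double ang = -twoPi * static_cast<double>(k * t % n) / static_cast<double>(n);
        re += w[t] * x[t] * std::cos(ang);
        im += w[t] * x[t] * std::sin(ang);
    }
    return re * re + im * im;
}
```

Taking 10·log10 of the normalized value then gives dB relative to a 1 V² reference; on the Teensy the same division would go between arm_cmplx_mag_squared_f32 and the log10f loop.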
 
Even the latest CMSIS is limited to 4K FFTs. I wonder why that is. Can't a Teensy/M7 do a bigger one? Is it some sort of register limit? There's lots of RAM. Or is it a matter of keeping the library size under control?
 
I haven't used it myself, but I have a feeling those are convenience functions, and if you want, you can build your own "custom" size. There are also many portable FFT libraries out there, such as https://www.fftw.org/
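One way to get a "custom" size out of fixed-size routines is the standard Cooley-Tukey radix-2 step (my sketch, not something CMSIS provides): split a 2N-point input into even and odd samples, transform each with an N-point FFT, then combine with twiddles: X[k] = E[k] + W^k·O[k] and X[k+N] = E[k] − W^k·O[k], where W = e^(−j2π/(2N)). On a Teensy the two half-size transforms could in principle be 4096-point arm_cfft_f32 calls, giving 8192 points; memory use and speed on the target are untested. Below, std::complex and a slow DFT stand in for the fixed-size FFT so the idea runs on a PC (hypothetical helper names):

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <functional>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Slow direct DFT, standing in for a fixed-size library FFT.
cvec slowDft(const cvec& x) {
    const std::size_t n = x.size();
    const double twoPi = 6.283185307179586;
    cvec out(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
            out[k] += x[t] * std::polar(1.0, -twoPi * static_cast<double>(k * t % n) / n);
    return out;
}

// Forward FFT of length 2N built from two length-N transforms (one radix-2 DIT stage).
cvec doubleSizeFft(const cvec& x, const std::function<cvec(const cvec&)>& halfFft) {
    const std::size_t n2 = x.size(), n = n2 / 2;
    const double twoPi = 6.283185307179586;
    cvec even(n), odd(n);
    for (std::size_t i = 0; i < n; ++i) {
        even[i] = x[2 * i];     // samples 0, 2, 4, ...
        odd[i] = x[2 * i + 1];  // samples 1, 3, 5, ...
    }
    const cvec e = halfFft(even), o = halfFft(odd);
    cvec out(n2);
    for (std::size_t k = 0; k < n; ++k) {
        const std::complex<double> tw = std::polar(1.0, -twoPi * k / n2) * o[k];  // W^k * O[k]
        out[k] = e[k] + tw;
        out[k + n] = e[k] - tw;
    }
    return out;
}
```

Applying the same step twice (four 4K transforms) would reach 16K, at the cost of holding the whole 16K complex buffer plus scratch space in RAM.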
 
FFTW is pretty easy to build on many OSes. I built it in 2009 for the Cell processor, which ran Yellow Dog Linux. I don't know how to do a build for a Teensy. It's definitely not plug and play compared to a standard PC build, since the Teensy has no OS.
 
Just as an FYI, FFTW consists of 53 folders and about 3500 files. I probably don't need 75% of them, but it's not just a couple of files. I'll keep looking into it, but it doesn't seem trivial.
 
I thought I had replied again, but yes, you're right that FFTW is not appropriate. You might look at ArduinoFFT on github. I don't know if it would meet your needs, but at least it would be easy to try.
 