Fast Convolution Filtering with Teensy 4.0 and audio board

Status
Not open for further replies.
@bmillier: Thanks. That's what I expected but it doesn't hurt to verify.
@DD4WH: I would not be keeping the convolution portion. I understand that to be the goal here, but my intent is simply to get an array of floats in the frequency domain, do other work on it (converting a stereo pair into left, center and right channels), and then convert back for output. I was under the impression that the partitioning applied to the FFT as well, so I was more interested in the improved average CPU load than in the improved latency. The closest I will get to convolution will be the application of a raised cosine windowing function. I had worked briefly with the code from the initial post, but as it is in I/Q and not stereo, it did not directly suit my needs. I will have a closer look at Brian's audio object and see if it is better aligned with my purposes. As I am only concerned with the frequency portion and not the phase, a high-resolution FFT/iFFT with conversion to float is really what I'm after. Thank you both!
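For reference, a raised cosine window like the one mentioned above could look roughly like this. This is only a sketch: the post does not say which raised cosine variant is meant, so the symmetric Hann form is assumed here, and the names hann_window and apply_window are made up for illustration. It is plain C++ so it can be checked on a PC before moving it to the Teensy.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Symmetric Hann (raised cosine) window of length n (assumed variant):
//   w[k] = 0.5 * (1 - cos(2*pi*k / (n - 1)))
std::vector<float> hann_window(std::size_t n) {
    assert(n >= 2);
    const double two_pi = 6.283185307179586;
    std::vector<float> w(n);
    for (std::size_t k = 0; k < n; ++k) {
        w[k] = static_cast<float>(0.5 * (1.0 - std::cos(two_pi * k / (n - 1))));
    }
    return w;
}

// Apply the window in place to one block of samples before the FFT.
// The block must hold at least w.size() samples.
void apply_window(float* block, const std::vector<float>& w) {
    for (std::size_t k = 0; k < w.size(); ++k) {
        block[k] *= w[k];
    }
}
```

The window would be computed once at startup and then applied to each input block.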
 
@ Frank. You may be right. But I am doing the same 8 operations at once in the "i" loop. The difference is that you do this
accum[i2] += fftout[k][2 * i + 0] * fmask[j][2 * i + 0] -
fftout[k][2 * i + 1] * fmask[j][2 * i + 1];

and I do
ptr1 = ptr_fftout + kc; // pointer calc
temp1 = *ptr1;
ptr2 = ptr_fmask + jc; // pointer calc
temp2 = *ptr2;
temp3 = *(ptr1 + 1);
temp4 = *(ptr2 + 1);
accum[i2] += ((temp1 * temp2) - (temp3 * temp4));

I am still doing two multiplies to get the accum result. But the compiler has to do multiplications/additions to calculate each of your fftout and fmask array indices. I am just adding the offsets kc, jc to the pointers to these arrays. Each time through the outer loops, I recalculate kc, jc only once per iteration. So, mine should be much faster, but it isn't.
I'm patching the CMSIS call into my code now. I haven't got it right yet though.
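To make the comparison concrete, here are the two styles side by side as a small PC-testable sketch. The sizes (256 complex bins per partition, interleaved re/im) and the function names are assumptions for illustration, and only the real part is accumulated, as in the snippets above. One plausible reason the timings end up so close is that an optimizing compiler usually hoists the index arithmetic out of the inner loop by itself, so both forms can compile to very similar code.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout: 256 complex bins per partition, stored as re, im, re, im, ...
constexpr std::size_t kBins = 256;
constexpr std::size_t kFloatsPerPartition = 2 * kBins;

// Array-index form: every access spells out the full [partition][index] pair.
void accumulate_real_indexed(const float (*fftout)[kFloatsPerPartition],
                             const float (*fmask)[kFloatsPerPartition],
                             std::size_t k, std::size_t j, float* accum_re) {
    for (std::size_t i = 0; i < kBins; ++i) {
        accum_re[i] += fftout[k][2 * i + 0] * fmask[j][2 * i + 0]
                     - fftout[k][2 * i + 1] * fmask[j][2 * i + 1];
    }
}

// Pointer form: the partition offsets are computed once, then the pointers
// simply step through the interleaved data.
void accumulate_real_pointer(const float* fftout_flat, const float* fmask_flat,
                             std::size_t k, std::size_t j, float* accum_re) {
    const float* p1 = fftout_flat + k * kFloatsPerPartition;
    const float* p2 = fmask_flat + j * kFloatsPerPartition;
    for (std::size_t i = 0; i < kBins; ++i, p1 += 2, p2 += 2) {
        accum_re[i] += (p1[0] * p2[0]) - (p1[1] * p2[1]);
    }
}
```

Both functions produce the same accumulator contents, so any speed difference would have to be measured on the target rather than assumed.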
 
Hi Frank: I just replaced all of the code inside the "i" loop in my complex multiply routine by the following:
ptr1 = ptr_fftout + k512;
ptr2 = ptr_fmask + j512;
arm_cmplx_mult_cmplx_f32(ptr1, ptr2, ac2, 256); // ac2 is a temporary holding array
for (int q = 0; q < 512; q += 8) {
    accum[q]     += ac2[q];
    accum[q + 1] += ac2[q + 1];
    accum[q + 2] += ac2[q + 2];
    accum[q + 3] += ac2[q + 3];
    accum[q + 4] += ac2[q + 4];
    accum[q + 5] += ac2[q + 5];
    accum[q + 6] += ac2[q + 6];
    accum[q + 7] += ac2[q + 7];
}
The CMSIS routine does the complex multiply, but I had to add the code at the end to do the accumulate. I used the same method of doing 8 accumulates per loop iteration as you did. Now my routine executes in 1000 us (verified by 'scope), a bit faster than the 1280 us you measured on yours.
So, I am almost 100% certain that the compiler is generating code using the same CMSIS complex multiply routine for your code too.
:D
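For anyone who wants to check results off the Teensy, here is a portable sketch of the multiply-then-accumulate step. cmplx_mult_ref is a hypothetical stand-in that is meant to produce the same element-wise result as arm_cmplx_mult_cmplx_f32 on interleaved re/im data; it is not the CMSIS implementation, and multiply_accumulate is likewise just an illustrative name.

```cpp
#include <cassert>
#include <cstddef>

// Portable stand-in (assumption): element-wise complex multiply on
// interleaved re/im arrays, num_complex complex values per array.
void cmplx_mult_ref(const float* a, const float* b, float* dst,
                    std::size_t num_complex) {
    assert(a && b && dst);
    for (std::size_t n = 0; n < num_complex; ++n) {
        const float ar = a[2 * n], ai = a[2 * n + 1];
        const float br = b[2 * n], bi = b[2 * n + 1];
        dst[2 * n]     = ar * br - ai * bi;  // real part
        dst[2 * n + 1] = ar * bi + ai * br;  // imaginary part
    }
}

// Multiply one spectrum partition by one mask partition and add the
// product into accum. scratch plays the role of the temporary array ac2.
void multiply_accumulate(const float* spectrum, const float* mask,
                         float* scratch, float* accum,
                         std::size_t num_complex) {
    cmplx_mult_ref(spectrum, mask, scratch, num_complex);
    for (std::size_t q = 0; q < 2 * num_complex; ++q) {
        accum[q] += scratch[q];
    }
}
```

On the Teensy, the call to cmplx_mult_ref would be replaced by the CMSIS call shown above, with 256 complex values (512 floats) per partition.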
 
Library Code for the uniformly-partitioned FFT convolution filter/cabinet simulator

Hi Frank: I have uploaded this library code to my Github site, in its own folder:
https://github.com/bmillier/FFT-Convolution-Filter-Uniformly-partitioned/blob/master/README.md

I took your demo program and modified it to use the library function. In the library, I used a few more CMSIS routines and some other fast, low-level copy/clear routines, and consolidated several loops into a single loop.
The execution time is now < 1000 us and the total latency is just 6.8 ms. ;)
I gave credit to Warren Pratt and yourself in the readme file, as well as in the .cpp/h files.
What do you think? Should we contact Paul to offer it up as a part of the official Audio library?
Cheers
 
Hi Brian,

that's excellent news!!! Congrats! I won't have time to look at and test this until the weekend. Very good that you were able to optimize both processor load and latency.

I will write some info about how to calculate coefficients for minimum-phase FIR filtering, and maybe I can make a pull request to your GitHub repo readme file this weekend. Yes, this should be useful as an addition to the audio lib!

All the best,

Frank DD4WH
 
Thanks Frank. I saw the minimum-phase FIR filters in your example code, but didn't know if you needed a lot of computing power on an external PC to generate them. I'll look forward to your comments after testing.
Cheers
 
Brian, your code runs and performs very well! Well done, that's very good!

I added some very small modifications in the example file --> AUDIO_BLOCK_SAMPLES, Serial printing processor load, fixing Serial while command in setup --> should run for your IDE now too!

I put some short comments on minimum phase FIR coeff calculation using MATLAB into my github readme file.

Feel free to copy that into your github readme file.

Have a nice weekend! Frank DD4WH
 
@Frank: Thank you, I'm pleased it ran nicely for you too. I couldn't find the mods mentioned above on my GitHub site, under Pull requests or Issues. Then I went to your site and found them. The Serial while command is better your way. I usually just leave it out, since both the Arduino and VM IDEs re-open the serial monitor immediately after a program download, so you don't lose any initial debug messages you might have at the beginning, even if you don't wait for Serial to become available.
I copied your FIR text to the end of my readme file, although it did not end up with the "bullets" that you used. I'm not sure why.
I am not sure how to offer this as an official library item to Paul S. I think we'd have to start a new thread with a subject "Convolution Filter offered for official Audio library" to get his attention. What do you think?
Hope you enjoy your weekend too, although you have 5-6 hours less of it left than I do, since I'm in North America!
 
@Brian:
I downloaded your library just to have a look. It doesn't compile because BUFFER_SIZE is not defined. Looking through previous messages here, I assume that it should be 128.

Pete
 
@Pete: Good observation. I modified this demo program from Frank DD4WH's original program, in which he defines BUFFER_SIZE as the partition size, which is 128. For some reason I removed that line, probably because my older "filter_convolution.h" file contains the line #define BUFFER_SIZE 128. There, however, it is declared private. I use Visual Micro, and I can see that it is finding the BUFFER_SIZE definition in that .h file. My Arduino IDE also shows no compile error, so it must be getting the definition from there too.
But you're right, the demo program should define BUFFER_SIZE as 128.
Thanks
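A guarded definition in the demo sketch would cover both cases, whether or not another header already defines the macro. This is only a sketch of one way to do it; fft_buffer_floats is a hypothetical helper added to show how the 512-float buffers used earlier in the thread follow from a 128-sample partition. Note that a #define is handled by the preprocessor and is not affected by private/public sections of a class, which would explain why the definition in the old header is still picked up.

```cpp
#include <cassert>

// Partition size used throughout this thread. The guard avoids a
// redefinition clash if an included header already defines BUFFER_SIZE.
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 128
#endif

static_assert(BUFFER_SIZE == 128, "this sketch assumes 128-sample partitions");

// Hypothetical helper: floats in one interleaved complex FFT buffer.
// FFT length is 2 * BUFFER_SIZE complex bins, each bin is (re, im).
constexpr int fft_buffer_floats() {
    return 2 * 2 * BUFFER_SIZE;
}
```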
 
I am not sure how to offer this as an official library item to Paul S. I think we'd have to start a new thread with a subject "Convolution Filter offered for official Audio library" to get his attention. What do you think?

Hi Brian, I recently learned a bit about different licenses: it seems (if I understood correctly) that GPLv3 (the license I use for all my code and which is also used by Warren Pratt for the wdsp lib) is incompatible with the MIT license of the audio library. So there does not seem to be any way to include the Convolution object in the audio lib without changing the license of that part of the audio lib! GPL requires code to remain GPL licensed forever (so, if somebody adds proprietary code to it, this proprietary code also has to be GPL licensed). In contrast, MIT permits adding to the code and subsequently closing it or licensing it under a different license . . .

However, it's not a real problem in practice: everybody can use the code provided that they respect the GPL license requirements.

Have a nice week!

All the best, Frank DD4WH
 
DD4WH, I want to say huge thank you for publishing this work!

I've tried guitar_cabinet_impulse (the default version with 22016 taps) yesterday and it sounds truly amazing!

The only problem now is that I don't have enough memory for anything else :). There is space for Freeverb on one channel, but trying to apply it to the second channel goes beyond the limit. And the versions with fewer taps are not even close to being as awesome as the 22016 taps version. For now I'd like to try using two Teensys, one in I2S master mode and the second one in I2S slave mode: ADC -> T4 Master (for all other effects) -> T4 Slave for the cabinet -> DAC. Not sure if it will work or not, but I certainly don't want to decrease the number of taps, the 22016 version is so beautifully punchy!

Anyway, thank you once again!
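A rough back-of-the-envelope estimate shows why memory gets tight with 22016 taps. This sketch assumes 128-sample partitions and one 512-float row per partition for each spectrum array (the fftout and fmask arrays seen earlier in the thread); the function names are made up, and the real library may allocate more or less than this.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kPartitionSize = 128;                      // samples per block
constexpr std::size_t kFloatsPerPartition = 4 * kPartitionSize;  // 256 complex bins

// Number of partitions needed for an impulse response of the given length.
constexpr std::size_t partitions_for(std::size_t taps) {
    return (taps + kPartitionSize - 1) / kPartitionSize;  // round up
}

// Bytes for one float32 array holding one spectrum row per partition.
constexpr std::size_t spectrum_bytes(std::size_t taps) {
    return partitions_for(taps) * kFloatsPerPartition * sizeof(float);
}
```

Under these assumptions, 22016 taps means 172 partitions and 352,256 bytes per array, so about 700 KB for the two arrays together, which is most of the Teensy 4.0's 1 MB of RAM.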
 
@DD4WH
Could you kindly advise how to modify your code "Real Time PARTITIONED BLOCK CONVOLUTION FILTERING"
(Post #35) to connect to the "Audio Shield for Teensy 4.0" from Antratek?
The individual ADC and DAC boards are hard to get in time.
Thank you,
Cheers
 
No, I just ordered the shield; it should arrive within a few days.
So I just connect the audio shield and the code (#35) should work?

Meanwhile I tried your code for the Teensy3.5 (Frank DD4WH 2016_10_29).
I get the error "'arm_cfft_instance_f32' does not name a type" from line 145 (const static arm_cfft_instance_f32 *S;)
(in contrast, the version (#35) for the Teensy4 compiles well.)

I grew up with AVR-like MCUs, and more recently used the ESPs.
So, working with the Teensys is a new experience for me....
 
Meanwhile I tried your code for the Teensy3.5 (Frank DD4WH 2016_10_29).
I get the error "'arm_cfft_instance_f32' does not name a type" from line 145 (const static arm_cfft_instance_f32 *S;)
(in contrast, the version (#35) for the Teensy4 compiles well.)

Not sure what kind of program you mean that I put together in 2016!? But I never prepared anything for the T3.5!?

If you want to use the ARM CMSIS lib with the T3.6, follow exactly the steps specified below in order to install a newer version of the CMSIS lib (you have to do that in order to use functions like arm_cfft_f32, which are not included in the old CMSIS version, which comes with the standard implementation of Teensyduino):

https://github.com/DD4WH/Teensy-ConvolutionSDR

The T4 uses a newer version of CMSIS, thus the code runs with the standard install of Teensyduino.

Try the convolution code with the T4 and the audio shield when it arrives. If you encounter any problems, you can post again here.

BTW: use this link for the latest Convolution code, DO NOT USE the code from this thread, it does not work correctly . . .
https://github.com/DD4WH/Uniformly_partitioned_convolution
 
Hi,
the T4 audio shield has just arrived.
Now the hurdle is how to connect the pieces together.
I had soldered common header pins to the Teensy T4 board (sticking downwards) before I had the audio shield.
Apparently, it would have been better to use a 14-pin socket.
It's hard to undo if it was done the wrong way.
Like this?
[Attached image: IMG_20200212_135315574[1].jpg]
 
Frank and Tim already said everything necessary for you to eventually succeed, but I think this photograph can back it up [but you only use ONE audio board ;-)]:

https://www.pjrc.com/store/teensy3_audio_quadch.jpg

And yes, I would also think that it would be great to have the audio board silk screen on both sides and a large text indication of the Teensy version it can be used with.
 
Frank and Tim already said everything necessary to eventually come to a success, but I think this photograph can back it up [but you only use ONE audio board ;-)]:
"The headphone socket must be on the same side as the teensy usb connection."
That's weird, since it is totally unexpected given that boards are sold with those (short) header pins oriented downwards (for instance https://opencircuit.shop/Product/Teensy-4-met-headers)

The mounting picture is hard to understand (and it is shown for the T3 (!): https://www.pjrc.com/store/teensy3_audio.html).
The position of the USB socket can only be guessed. Since the USB socket sits symmetrically on the T4 board, it is unclear whether the Reset button points to the breadboard or towards the audio shield.
Is the Reset button located between the T4 and the shield?
Or should the Reset button point to the breadboard?
 