Thread: Fast Convolution Filtering with Teensy 4.0 and audio board

  1. #76
    Junior Member
    Join Date
    Nov 2019
    Location
    Piedmont, OK
    Posts
    2
    @bmillier: Thanks. That's what I expected but it doesn't hurt to verify.
    @DD4WH: I would not be keeping the convolution portion - though I understand that to be the goal here, my intent is simply to get an array of floats in the frequency domain upon which I will do other work before converting back for output; converting a stereo pair into left, center and right channels. I was under the impression that the partitioning applied to the FFT as well and so was interested in the improved average CPU load more than the improved latency. The closest I will get to convolution will be the application of a raised cosine windowing function. I had worked briefly with the code from the initial post, but as it is in I,Q and not stereo it did not directly suit my needs. I will have a closer look at Brian's audio object and see if that will be better aligned to my purposes. As I am only concerned with the frequency portion and not the phase, a high resolution FFT/iFFT with conversion to Float is really what I'm after. Thank you both!
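A raised-cosine window of the kind mentioned above is commonly taken to be the Hann window. The following is only a minimal sketch, assuming a periodic Hann window; the helper names `hann_window` and `apply_window` are hypothetical and do not come from this thread or from the Audio library.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Periodic Hann (raised-cosine) window: w[i] = 0.5 * (1 - cos(2*pi*i/n)).
// Hypothetical helper, for illustration only.
std::vector<float> hann_window(std::size_t n) {
    const double two_pi = 6.283185307179586;
    std::vector<float> w(n);
    for (std::size_t i = 0; i < n; ++i) {
        w[i] = static_cast<float>(0.5 * (1.0 - std::cos(two_pi * i / n)));
    }
    return w;
}

// Multiply a block of samples by the window in place.
// Only the overlapping part is touched if the sizes differ.
void apply_window(std::vector<float>& block, const std::vector<float>& w) {
    for (std::size_t i = 0; i < block.size() && i < w.size(); ++i) {
        block[i] *= w[i];
    }
}
```

With this periodic form, two windows offset by half their length sum to 1 at every sample, which is why it is often paired with 50% overlap-add when going back to the time domain.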

  2. #77
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    507
    @highly: maybe this is what you need?

    https://forum.pjrc.com/threads/58054-CMSIS-5-3-0-and-CMSIS_DSP-1-7-0-on-teensy-4?p=219628&viewfull=1#post219628

    to be honest, I have not understood from your description what you are looking for ;-)

  3. #78
    Senior Member bmillier's Avatar
    Join Date
    Apr 2016
    Location
    Halifax, N.S. Canada
    Posts
    185
    @ Frank. You may be right. But I am doing the same 8 operations at once in the "i" loop. The difference is that you do this
    accum[i2] += fftout[k][2 * i + 0] * fmask[j][2 * i + 0] -
    fftout[k][2 * i + 1] * fmask[j][2 * i + 1];

    and I do
    ptr1 = ptr_fftout + kc; // pointer calc
    temp1 = *ptr1;
    ptr2 = ptr_fmask + jc; // pointer calc
    temp2 = *ptr2;
    temp3 = *(ptr1 + 1);
    temp4 = *(ptr2 + 1);
    accum[i2] += ((temp1 * temp2) - (temp3 * temp4));

    I am still doing two multiplies to get the accum result. But the compiler has to do multiplications/additions to calculate each of your fftout and fmask array indices. I am just adding the offsets kc and jc to the pointers to these arrays, and I recalculate kc and jc only once per iteration of the outer loops. So mine should be much faster, but it isn't.
    I'm patching in the CMSIS call to my code now. I haven't got it right yet though.
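One likely reason the two versions time about the same: optimizing compilers usually rewrite index arithmetic such as `2 * i + 1` into pointer increments on their own (strength reduction), so both forms can end up as very similar machine code. The sketch below puts the two forms side by side for a single partition. It assumes 256 interleaved complex bins per partition and that `i2` in the snippets above means `2 * i`; the function names are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t kBins = 256;        // complex bins per partition (assumed)
constexpr std::size_t kLen  = 2 * kBins;  // interleaved floats: re, im, re, im, ...

// Indexed form: real part of x[i] * h[i], accumulated into acc[2 * i].
void mac_real_indexed(const float* x, const float* h, float* acc) {
    for (std::size_t i = 0; i < kBins; ++i) {
        acc[2 * i] += x[2 * i] * h[2 * i] - x[2 * i + 1] * h[2 * i + 1];
    }
}

// Pointer form: identical arithmetic, stepping three pointers by one complex bin.
void mac_real_pointer(const float* x, const float* h, float* acc) {
    for (std::size_t i = 0; i < kBins; ++i, x += 2, h += 2, acc += 2) {
        const float re1 = x[0];
        const float im1 = x[1];
        const float re2 = h[0];
        const float im2 = h[1];
        *acc += (re1 * re2) - (im1 * im2);
    }
}
```

Comparing the generated assembly for both functions at the optimization level actually used would confirm or refute this for a given toolchain.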

  4. #79
    Senior Member bmillier's Avatar
    Join Date
    Apr 2016
    Location
    Halifax, N.S. Canada
    Posts
    185
    Hi Frank: I just replaced all of the code inside the "i" loop in my complex multiply routine by the following:
    ptr1 = ptr_fftout + k512;
    ptr2 = ptr_fmask + j512;
    arm_cmplx_mult_cmplx_f32(ptr1, ptr2, ac2, 256); // ac2 is a temporary holding array
    for (int q = 0; q < 512; q += 8) {
        accum[q]     += ac2[q];
        accum[q + 1] += ac2[q + 1];
        accum[q + 2] += ac2[q + 2];
        accum[q + 3] += ac2[q + 3];
        accum[q + 4] += ac2[q + 4];
        accum[q + 5] += ac2[q + 5];
        accum[q + 6] += ac2[q + 6];
        accum[q + 7] += ac2[q + 7];
    }
    The CMSIS routine does the complex multiply, but I had to add the code at the end to do the accumulate function. I used the same method of doing 8 accumulates per loop as you did. Now my routine executes in 1000 us (verified on a 'scope), a bit faster than the 1280 us you measured on yours.
    So, I am almost 100% certain that the compiler is generating code using the same CMSIS complex multiply routine with your code too.
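For readers without the CMSIS-DSP library at hand, here is a portable sketch of the multiply-then-accumulate step described above. `cmplx_mult_ref` is a hypothetical stand-in that follows the same interleaved (re, im) layout and "number of complex samples" convention as `arm_cmplx_mult_cmplx_f32`; it is not the CMSIS implementation, and `cmplx_mac` is likewise an illustrative name.

```cpp
#include <cstddef>

// Stand-in for a complex-by-complex multiply on interleaved data:
// dst[n] = a[n] * b[n] for numSamples complex values (2 * numSamples floats).
void cmplx_mult_ref(const float* a, const float* b, float* dst, std::size_t numSamples) {
    for (std::size_t n = 0; n < numSamples; ++n) {
        const float ar = a[2 * n];
        const float ai = a[2 * n + 1];
        const float br = b[2 * n];
        const float bi = b[2 * n + 1];
        dst[2 * n]     = ar * br - ai * bi;  // real part
        dst[2 * n + 1] = ar * bi + ai * br;  // imaginary part
    }
}

// One partition of the convolution: multiply into a scratch buffer,
// then add the product onto the running accumulator.
void cmplx_mac(const float* x, const float* h, float* acc, float* scratch,
               std::size_t numSamples) {
    cmplx_mult_ref(x, h, scratch, numSamples);
    for (std::size_t i = 0; i < 2 * numSamples; ++i) {
        acc[i] += scratch[i];
    }
}
```

On the Teensy, the accumulate loop might also be replaceable by the CMSIS-DSP vector add `arm_add_f32(accum, ac2, accum, 512)`; that substitution is an untested assumption here, not something measured in this thread.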

  5. #80
    Senior Member bmillier's Avatar
    Join Date
    Apr 2016
    Location
    Halifax, N.S. Canada
    Posts
    185

    Library Code for the uniformly-partitioned FFT convolution filter/cabinet simulator

    Hi Frank: I have uploaded this library code to my Github site, in its own folder:
    https://github.com/bmillier/FFT-Conv...ster/README.md

    I took your demo program and modified it to use the library function. In the library, I used a few more CMSIS routines and some other fast, low-level copy/clear routines, and I consolidated several loops into a single loop.
    The execution time is now < 1000 us and the total latency is just 6.8 ms.
    I gave credit to Warren Pratt and yourself in the readme file, as well as in the .cpp/h files.
    What do you think? Should we contact Paul to offer it up as a part of the official Audio library?
    Cheers

  6. #81
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    507
    Hi Brian,

    That's excellent news!!! Congrats! I won't have time to look at and test this until the weekend. Very good that you were able to optimize both processor load and latency.

    I will write some info about how to calculate coefficients for minimal phase FIR filtering and maybe I can make a pull request to your github repo readme file this weekend. Yes, this should be useful as an addition to the audio lib!

    All the best,

    Frank DD4WH

  7. #82
    Senior Member bmillier's Avatar
    Join Date
    Apr 2016
    Location
    Halifax, N.S. Canada
    Posts
    185
    Thanks Frank. I saw the minimal phase FIR filters in your example code, but didn't know if you needed a lot of computing power on an external PC to generate them. I'll look forward to your comments after testing.
    Cheers

  8. #83
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    507
    Brian, your code runs and performs very well! Well done, that's very good!

    I added some very small modifications in the example file --> AUDIO_BLOCK_SAMPLES, Serial printing processor load, fixing Serial while command in setup --> should run for your IDE now too!

    I put some short comments on minimum phase FIR coeff calculation using MATLAB into my github readme file.

    Feel free to copy that into your github readme file.

    Have a nice weekend! Frank DD4WH

  9. #84
    Senior Member bmillier's Avatar
    Join Date
    Apr 2016
    Location
    Halifax, N.S. Canada
    Posts
    185
    @ Frank. Thank you, I'm pleased it ran nicely for you too. I couldn't find the mods mentioned above on my Github site, under Pull requests or Issues. Then I went to your site and found them. The Serial while command is better your way. I usually just leave it out since both the Arduino and VM IDEs re-open the serial monitor immediately after a program download, so you don't lose any initial debug messages you might have at the beginning, even if you don't wait for Serial to become available.
    I copied your FIR text to the end of my readme file, although it did not end up with the "bullets" that you used; not sure why.
    I am not sure how to offer this as an official library item to Paul S. I think we'd have to start a new thread with a subject "Convolution Filter offered for official Audio library" to get his attention. What do you think?
    Hope you enjoy your weekend too, but you have 5-6 hours less of it left than I do, being in N.A.!

  10. #85
    Senior Member
    Join Date
    Nov 2012
    Posts
    1,176
    @Brian:
    I downloaded your library just to have a look. It doesn't compile because BUFFER_SIZE is not defined. Looking through previous messages here, I assume that it should be 128.

    Pete

  12. #87
    Senior Member bmillier's Avatar
    Join Date
    Apr 2016
    Location
    Halifax, N.S. Canada
    Posts
    185
    @Pete: Good observation. I modified this demo program from Frank DD4WH's original program. In it he defines BUFFER_SIZE = partition size, which is 128. For some reason I removed that line, probably because my older "filter_convolution.h" file has the line
    #define BUFFER_SIZE 128
    However, there it is declared private. I use Visual Micro, and I can see that it is finding the BUFFER_SIZE definition from that .h file. My Arduino IDE also shows no compile error, so it must be getting the definition from there too.
    But you're right, the demo program should have BUFFER_SIZE defined as 128.
    Thanks
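One point that may explain the behaviour described above: a `#define` is handled by the preprocessor, so placing it in a `private:` section of a class does not hide it. Any file that includes that header sees the macro, and a sketch that does not include it sees nothing. A guarded definition at the top of the demo sketch is one way to make it compile either way. This is a sketch only; the constant name `kPartitionSize` is hypothetical.

```cpp
// Define BUFFER_SIZE only if no included header has already done so.
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 128  // partition size in samples, as stated in the thread
#endif

// Typed alias for use in ordinary C++ code (hypothetical name).
constexpr int kPartitionSize = BUFFER_SIZE;

// Catch an accidental non-power-of-two value at compile time.
static_assert((BUFFER_SIZE & (BUFFER_SIZE - 1)) == 0,
              "BUFFER_SIZE is expected to be a power of two");
```

If an older header also defines BUFFER_SIZE with a different value, the guard silently keeps whichever definition came first, so it is worth checking the include order.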
