Thread: Fast Convolution Filtering with Teensy 4.0 and audio board

  1. #76
    Junior Member
    Join Date
    Nov 2019
    Location
    Piedmont, OK
    Posts
    2
    @bmillier: Thanks. That's what I expected but it doesn't hurt to verify.
    @DD4WH: I would not be keeping the convolution portion. I understand that convolution is the goal here, but my intent is simply to get an array of floats in the frequency domain, on which I will do other work before converting back for output: converting a stereo pair into left, center and right channels. I was under the impression that the partitioning applied to the FFT as well, so I was interested in the improved average CPU load more than the improved latency. The closest I will get to convolution will be the application of a raised cosine windowing function. I had worked briefly with the code from the initial post, but as it is in I/Q and not stereo, it did not directly suit my needs. I will have a closer look at Brian's audio object and see if that is better aligned with my purposes. As I am only concerned with the frequency portion and not the phase, a high-resolution FFT/iFFT with conversion to float is really what I'm after. Thank you both!
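    For reference, a raised cosine window of the kind mentioned above could look like the following minimal sketch. It assumes the periodic Hann form and 50% block overlap, neither of which is stated in the post; the function names hann_periodic and apply_window are illustrative only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Periodic Hann (raised cosine) window: w[i] = 0.5 * (1 - cos(2*pi*i / n)).
// Assumption: the periodic form is used so that copies shifted by n/2 sum to 1.
std::vector<float> hann_periodic(std::size_t n) {
    const double two_pi = 6.283185307179586;
    std::vector<float> w(n);
    for (std::size_t i = 0; i < n; ++i) {
        w[i] = static_cast<float>(0.5 * (1.0 - std::cos(two_pi * i / n)));
    }
    return w;
}

// Multiply one block of real samples by the window, in place.
// The caller must supply at least w.size() samples.
void apply_window(float* samples, const std::vector<float>& w) {
    for (std::size_t i = 0; i < w.size(); ++i) {
        samples[i] *= w[i];
    }
}
```

    With this form, w[i] + w[i + n/2] equals 1 for every i, so blocks windowed once and overlapped by half add back to the original amplitude.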

  2. #77
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    504
    @highly: maybe this is what you need?

    https://forum.pjrc.com/threads/58054-CMSIS-5-3-0-and-CMSIS_DSP-1-7-0-on-teensy-4?p=219628&viewfull=1#post219628

    To be honest, I have not understood from your description what you are looking for ;-)

  3. #78
    Senior Member bmillier's Avatar
    Join Date
    Apr 2016
    Location
    Halifax, N.S. Canada
    Posts
    182
    @Frank: You may be right. But I am doing the same 8 operations at once in the "i" loop. The difference is that you do this:
    accum[i2] += fftout[k][2 * i + 0] * fmask[j][2 * i + 0] -
                 fftout[k][2 * i + 1] * fmask[j][2 * i + 1];

    and I do
    ptr1 = ptr_fftout + kc; // pointer calc
    temp1 = *ptr1;
    ptr2 = ptr_fmask + jc; // pointer calc
    temp2 = *ptr2;
    temp3 = *(ptr1 + 1);
    temp4 = *(ptr2 + 1);
    accum[i2] += ((temp1 * temp2) - (temp3 * temp4));

    I am still doing two multiplies to get the accum result. But the compiler has to do multiplications/additions to calculate each of your fftout and fmask array indices, whereas I am just adding the offsets kc and jc to the pointers to these arrays. I recalculate kc and jc only once per iteration of the outer loops. So, mine should be much faster, but it isn't.
    I'm patching in the CMSIS call to my code now. I haven't got it right yet though.
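    One plausible reason the pointer version is not faster: an optimizing compiler will usually turn index expressions such as 2 * i + 1 inside a loop into incremented addresses on its own, so both forms can end up as very similar machine code. The sketch below writes the same complex multiply-accumulate both ways. The names cmac_indexed and cmac_pointer are illustrative and not taken from either program, and unlike the lines quoted above the sketch also computes the imaginary part.

```cpp
#include <cstddef>

// Indexed form: accum += a * b for n interleaved complex values (re, im, re, im, ...).
void cmac_indexed(const float* a, const float* b, float* accum, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const float ar = a[2 * i], ai = a[2 * i + 1];
        const float br = b[2 * i], bi = b[2 * i + 1];
        accum[2 * i]     += ar * br - ai * bi;  // real part
        accum[2 * i + 1] += ar * bi + ai * br;  // imaginary part
    }
}

// Pointer-walking form of the same loop: no index arithmetic in the source.
void cmac_pointer(const float* a, const float* b, float* accum, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const float ar = *a++;
        const float ai = *a++;
        const float br = *b++;
        const float bi = *b++;
        *accum++ += ar * br - ai * bi;  // real part
        *accum++ += ar * bi + ai * br;  // imaginary part
    }
}
```

    With a = {1, 2, 3, 4} and b = {5, 6, 7, 8}, both functions add -7 + 16i to the first accumulator bin and -11 + 52i to the second. Comparing the generated assembly for the two would be the way to confirm whether the compiler really treats them alike on the Teensy.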

  4. #79
    Senior Member bmillier's Avatar
    Join Date
    Apr 2016
    Location
    Halifax, N.S. Canada
    Posts
    182
    Hi Frank: I just replaced all of the code inside the "i" loop in my complex multiply routine by the following:
    ptr1 = ptr_fftout + k512;
    ptr2 = ptr_fmask + j512;
    arm_cmplx_mult_cmplx_f32(ptr1, ptr2, ac2, 256); // ac2 is a temporary holding array
    for (int q = 0; q < 512; q = q + 8) {
        accum[q] += ac2[q];
        accum[q + 1] += ac2[q + 1];
        accum[q + 2] += ac2[q + 2];
        accum[q + 3] += ac2[q + 3];
        accum[q + 4] += ac2[q + 4];
        accum[q + 5] += ac2[q + 5];
        accum[q + 6] += ac2[q + 6];
        accum[q + 7] += ac2[q + 7];
    }
    The CMSIS routine does the complex multiply, but I had to add the code at the end to do the accumulate. I used the same method of doing 8 accumulates per loop iteration as you did. Now my routine executes in 1000 us (verified by 'scope), a bit faster than the 1280 us you measured on yours.
    So, I am almost 100% certain that the compiler is generating code using the same CMSIS complex multiply routine with your code too.
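    As a portable illustration of the multiply-then-accumulate step: arm_cmplx_mult_cmplx_f32 takes its length in complex samples, so 256 corresponds to the 512 floats walked by the accumulate loop. The sketch below uses a plain C++ stand-in, ref_cmplx_mult_cmplx_f32 (a hypothetical name, so the code can run on a PC without CMSIS-DSP), and a simple loop instead of the 8-way unrolled one. The constants kBins and kFloats and the function multiply_accumulate are also illustrative.

```cpp
#include <cstdint>

// Hypothetical portable stand-in for CMSIS-DSP arm_cmplx_mult_cmplx_f32:
// dst[n] = a[n] * b[n] for numSamples interleaved complex values.
void ref_cmplx_mult_cmplx_f32(const float* a, const float* b, float* dst,
                              std::uint32_t numSamples) {
    for (std::uint32_t n = 0; n < numSamples; ++n) {
        const float ar = a[2 * n], ai = a[2 * n + 1];
        const float br = b[2 * n], bi = b[2 * n + 1];
        dst[2 * n]     = ar * br - ai * bi;  // real part
        dst[2 * n + 1] = ar * bi + ai * br;  // imaginary part
    }
}

constexpr std::uint32_t kBins = 256;          // complex bins per partition
constexpr std::uint32_t kFloats = 2 * kBins;  // 512 interleaved floats

// Multiply one partition's spectrum by one filter mask, then add the product to accum.
void multiply_accumulate(const float* fftout, const float* fmask, float* accum) {
    float tmp[kFloats];  // temporary holding array, like ac2 in the post
    ref_cmplx_mult_cmplx_f32(fftout, fmask, tmp, kBins);
    for (std::uint32_t q = 0; q < kFloats; ++q) {
        accum[q] += tmp[q];
    }
}
```

    On the Teensy, the stand-in would be replaced by the real CMSIS call, which is where any speed advantage comes from.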

  5. #80
    Senior Member bmillier's Avatar
    Join Date
    Apr 2016
    Location
    Halifax, N.S. Canada
    Posts
    182

    Library Code for the uniformly-partitioned FFT convolution filter/cabinet simulator

    Hi Frank: I have uploaded this library code to my GitHub site, in its own folder:
    https://github.com/bmillier/FFT-Conv...ster/README.md

    I took your demo program and modified it to use the library function. In the library, I used a few more CMSIS routines and some other fast, low-level copy/clear routines, and consolidated several loops into a single loop.
    The execution time is now < 1000 us and the total latency is just 6.8 ms.
    I gave credit to Warren Pratt and yourself in the readme file, as well as in the .cpp/h files.
    What do you think? Should we contact Paul to offer it up as a part of the official Audio library?
    Cheers
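    For readers new to the technique, the frequency-domain core of a uniformly partitioned convolution can be sketched as follows. This is not the library's code. It assumes the input spectra sit in a circular buffer with the newest block at index newest, that filter partition j pairs with the input block j steps older, and that both are stored as flat arrays of interleaved complex floats. The name partitioned_mac and all parameter names are hypothetical.

```cpp
#include <cstddef>

// One output block: accum = sum over j of spectra[(newest - j) mod P] * masks[j].
// spectra, masks: `partitions` consecutive blocks of `bins` interleaved complex values.
// accum: `bins` interleaved complex values, overwritten with the result.
void partitioned_mac(const float* spectra, const float* masks, float* accum,
                     std::size_t partitions, std::size_t bins, std::size_t newest) {
    const std::size_t stride = 2 * bins;  // floats per partition
    for (std::size_t q = 0; q < stride; ++q) {
        accum[q] = 0.0f;
    }
    std::size_t k = newest;  // walks backwards through the circular buffer
    for (std::size_t j = 0; j < partitions; ++j) {
        const float* x = spectra + k * stride;  // offsets computed once per partition
        const float* h = masks + j * stride;
        for (std::size_t i = 0; i < bins; ++i) {
            accum[2 * i]     += x[2 * i] * h[2 * i]     - x[2 * i + 1] * h[2 * i + 1];
            accum[2 * i + 1] += x[2 * i] * h[2 * i + 1] + x[2 * i + 1] * h[2 * i];
        }
        k = (k == 0) ? partitions - 1 : k - 1;  // wrap around
    }
}
```

    A single inverse FFT of accum per audio block then yields the filtered output, which is why the cost per block is dominated by this multiply-accumulate loop.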

  6. #81
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    504
    Hi Brian,

    That's excellent news!!! Congrats! I won't have time to look at and test this until the weekend. Very good that you were able to optimize both processor load and latency.

    I will write some info about how to calculate coefficients for minimal phase FIR filtering, and maybe I can make a pull request to your GitHub repo README file this weekend. Yes, this should be useful as an addition to the audio lib!

    All the best,

    Frank DD4WH

  7. #82
    Senior Member bmillier's Avatar
    Join Date
    Apr 2016
    Location
    Halifax, N.S. Canada
    Posts
    182
    Thanks, Frank. I saw the minimal phase FIR filters in your example code, but didn't know if you needed a lot of computing power on an external PC to generate them. I look forward to your comments after testing.
    Cheers
