-
@bmillier: Thanks. That's what I expected but it doesn't hurt to verify.
@DD4WH: I would not be keeping the convolution portion, though I understand that to be the goal here. My intent is simply to get an array of floats in the frequency domain, on which I will do other work (converting a stereo pair into left, center and right channels) before converting back for output. I was under the impression that the partitioning applied to the FFT as well, so I was more interested in the improved average CPU load than in the improved latency. The closest I will get to convolution is applying a raised-cosine windowing function. I had worked briefly with the code from the initial post, but as it works on I/Q rather than stereo it did not directly suit my needs. I will have a closer look at Brian's audio object and see whether it is better aligned with my purposes. As I am only concerned with the frequency portion and not the phase, a high-resolution FFT/iFFT with conversion to float is really what I'm after. Thank you both!
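For the windowing step mentioned above, here is a minimal sketch. It assumes the raised-cosine window meant is the Hann window; the names make_hann_window and apply_window are made up for illustration and do not come from any code in this thread.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Build a periodic raised-cosine (Hann) window of length n.
// The periodic form divides by n rather than n - 1, so two copies
// offset by n / 2 sum to 1, which suits 50 % overlap-add processing.
std::vector<float> make_hann_window(std::size_t n) {
    const float two_pi = 6.28318530717958647692f;
    std::vector<float> w(n);
    for (std::size_t i = 0; i < n; ++i) {
        w[i] = 0.5f * (1.0f - std::cos(two_pi * static_cast<float>(i) /
                                       static_cast<float>(n)));
    }
    return w;
}

// Multiply one block of samples by the window, in place.
// The caller must make sure block holds at least w.size() floats.
void apply_window(float* block, const std::vector<float>& w) {
    for (std::size_t i = 0; i < w.size(); ++i) {
        block[i] *= w[i];
    }
}
```

The window would be applied to each block before the forward FFT; whether it is also applied after the inverse FFT depends on the overlap scheme chosen.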
-
Senior Member
@highly: Maybe this is what you need?
https://forum.pjrc.com/threads/58054-CMSIS-5-3-0-and-CMSIS_DSP-1-7-0-on-teensy-4?p=219628&viewfull=1#post219628
To be honest, I have not understood from your description what you are looking for ;-)
-
Senior Member
@Frank: You may be right, but I am doing the same 8 operations at once in the "i" loop. The difference is that you do this:
accum[i2] += fftout[k][2 * i + 0] * fmask[j][2 * i + 0] -
             fftout[k][2 * i + 1] * fmask[j][2 * i + 1];
and I do this:
ptr1 = ptr_fftout + kc; // pointer calc
temp1 = *ptr1;
ptr2 = ptr_fmask + jc; // pointer calc
temp2 = *ptr2;
temp3 = *(ptr1 + 1);
temp4 = *(ptr2 + 1);
accum[i2] += ((temp1 * temp2) - (temp3 * temp4));
I am still doing two multiplies to get the accum result. But the compiler has to do multiplications/additions to calculate each of your fftout and fmask array indices, whereas I am just adding the offsets kc and jc to the pointers to these arrays, and I recalculate kc and jc only once per iteration of the outer loops. So mine should be much faster, but it isn't.
I'm patching in the CMSIS call to my code now. I haven't got it right yet though.
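To make the comparison concrete, here is a minimal sketch of the two addressing styles side by side. The function names and the flat one-dimensional arrays are simplifications for illustration, not the actual routines from either program. A likely explanation for the similar timings (not measured here) is that an optimizing compiler turns the index arithmetic of the first form into pointer increments on its own.

```cpp
#include <cstddef>

// Indexed form: real part of a complex product, accumulated per bin.
// a and b hold interleaved complex values (re, im, re, im, ...).
void mac_real_indexed(const float* a, const float* b, float* accum,
                      std::size_t bins) {
    for (std::size_t i = 0; i < bins; ++i) {
        accum[2 * i] += a[2 * i] * b[2 * i] - a[2 * i + 1] * b[2 * i + 1];
    }
}

// Pointer form: the same arithmetic, walking the arrays with pointers.
void mac_real_pointer(const float* a, const float* b, float* accum,
                      std::size_t bins) {
    for (std::size_t i = 0; i < bins; ++i) {
        const float re_a = *a;
        const float im_a = *(a + 1);
        const float re_b = *b;
        const float im_b = *(b + 1);
        *accum += (re_a * re_b) - (im_a * im_b);
        a += 2;      // advance to the next complex value
        b += 2;
        accum += 2;  // only the real slot is written here
    }
}
```

Both functions do the same two multiplies and one subtract per bin, so any speed difference comes only from how the addresses are formed.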
-
Senior Member
Hi Frank: I just replaced all of the code inside the "i" loop in my complex multiply routine by the following:
ptr1 = ptr_fftout + k512;
ptr2 = ptr_fmask + j512;
arm_cmplx_mult_cmplx_f32(ptr1, ptr2, ac2, 256); // ac2 is a temporary holding array
for (int q = 0; q < 512; q += 8) {
  accum[q]     += ac2[q];
  accum[q + 1] += ac2[q + 1];
  accum[q + 2] += ac2[q + 2];
  accum[q + 3] += ac2[q + 3];
  accum[q + 4] += ac2[q + 4];
  accum[q + 5] += ac2[q + 5];
  accum[q + 6] += ac2[q + 6];
  accum[q + 7] += ac2[q + 7];
}
The CMSIS routine does the complex multiply, but I had to add the code at the end to do the accumulate function. I used the same method of doing 8 accumulates per loop as you did. Now my routine executes in 1000 us (verified on a 'scope), a bit faster than the 1280 us you measured on yours.
So, I am almost 100% certain that the compiler is generating code using the same CMSIS complex multiply routine with your code too.
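For readers without the CMSIS-DSP library at hand, the multiply-then-accumulate step can be sketched in portable C++. cmplx_mult_ref below is a hypothetical stand-in that performs the same arithmetic as arm_cmplx_mult_cmplx_f32 without its optimizations; cmplx_mult_accumulate is likewise an illustrative name. Note that the sample count is in complex values, which is why the call above passes 256 while the accumulate loop runs over 512 floats.

```cpp
#include <cstddef>

// Element-wise product of two interleaved complex arrays
// (re, im, re, im, ...). num_samples counts complex values, not floats.
void cmplx_mult_ref(const float* a, const float* b, float* dst,
                    std::size_t num_samples) {
    for (std::size_t n = 0; n < num_samples; ++n) {
        const float ar = a[2 * n], ai = a[2 * n + 1];
        const float br = b[2 * n], bi = b[2 * n + 1];
        dst[2 * n]     = ar * br - ai * bi;  // real part
        dst[2 * n + 1] = ar * bi + ai * br;  // imaginary part
    }
}

// accum += a * b for every complex bin, using a caller-supplied scratch
// buffer of 2 * num_samples floats (the role ac2 plays in the post above).
void cmplx_mult_accumulate(const float* a, const float* b, float* scratch,
                           float* accum, std::size_t num_samples) {
    cmplx_mult_ref(a, b, scratch, num_samples);
    for (std::size_t q = 0; q < 2 * num_samples; ++q) {
        accum[q] += scratch[q];
    }
}
```

The accumulate loop is left unrolled by one here for readability; the 8-way manual unrolling in the post is an optimization on top of the same logic.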
-
Senior Member
Library Code for the uniformly-partitioned FFT convolution filter/cabinet simulator
Hi Frank: I have uploaded this library code to my GitHub site, in its own folder:
https://github.com/bmillier/FFT-Conv...ster/README.md
I took your demo program and modified it to use the library function. In the library, I used a few more CMSIS routines and some other fast, low-level copy/clear routines, and consolidated several loops into a single loop.
The execution time is now < 1000 us and the total latency is just 6.8 ms.
I gave credit to Warren Pratt and yourself in the readme file, as well as in the .cpp/h files.
What do you think? Should we contact Paul to offer it up as a part of the official Audio library?
Cheers
-
Senior Member
Hi Brian,
that's excellent news!!! Congrats! I won't have time to look at and test this until the weekend. Very good that you were able to optimize both processor load and latency.
I will write some info about how to calculate coefficients for minimum-phase FIR filtering, and maybe I can make a pull request to your GitHub repo readme file this weekend. Yes, this should be useful as an addition to the audio lib!
All the best,
Frank DD4WH
-
Senior Member
Thanks, Frank. I saw the minimum-phase FIR filters in your example code, but didn't know whether you needed a lot of computing power on an external PC to generate them. I'll look forward to your comments after testing.
Cheers