Thread: Fast Convolution Filtering with Teensy 4.0 and audio board

  1. #51
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    491
    Just to confirm: minimum phase lowpass (real valued) FIR filters built with MATLAB also work well with the code and seem to exhibit low latency (as far as I can measure) and good filter effects. (latest code is on github)

    I built 512 tap, 1024, 2048, and 4096 tap filters and they all have the same 128 sample delay. So this problem is solved, the partitioned convolution does what it should.
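The idea behind partitioned convolution can be sketched in the time domain. This is only an illustration, not the FFT-based code from the GitHub repository: the impulse response is cut into blocks of `P` taps, each block is convolved with the input separately, and the partial results are summed with a delay of `k * P` samples. The function names `convolve` and `convolve_partitioned` are made up for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Direct (reference) convolution of x with h.
std::vector<double> convolve(const std::vector<double>& x,
                             const std::vector<double>& h) {
    if (x.empty() || h.empty()) return {};
    std::vector<double> y(x.size() + h.size() - 1, 0.0);
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < h.size(); ++j)
            y[i + j] += x[i] * h[j];
    return y;
}

// Same result, computed partition by partition (P taps per partition, P > 0).
// Each partition's output is added into y delayed by the partition's offset.
std::vector<double> convolve_partitioned(const std::vector<double>& x,
                                         const std::vector<double>& h,
                                         std::size_t P) {
    if (x.empty() || h.empty() || P == 0) return {};
    std::vector<double> y(x.size() + h.size() - 1, 0.0);
    for (std::size_t start = 0; start < h.size(); start += P) {
        const std::size_t len = std::min(P, h.size() - start);
        const auto first = h.begin() + static_cast<std::ptrdiff_t>(start);
        const std::vector<double> part(first, first + static_cast<std::ptrdiff_t>(len));
        const std::vector<double> yp = convolve(x, part);
        for (std::size_t k = 0; k < yp.size(); ++k)
            y[start + k] += yp[k];  // delay of `start` samples
    }
    return y;
}
```

In the real-time version each partition is presumably processed in the frequency domain, so only the first partition determines the input-to-output delay. That would fit the observation that the delay stays at 128 samples regardless of the filter length.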

    MATLAB's Filterbuilder has some severe restrictions: minimum phase is only possible with real valued lowpass filters and equiripple design, not with complex valued bandpass filters and choosable window [all of which would be needed for SDR IQ filtering]. Additionally calculating a 4096 tap minimum phase filter needs 31 minutes on a standard 2.4GHz laptop!!!

    Now, the last and most complex problem to solve is the following:

    * we need an algorithm that runs on the Teensy 4.0 to calculate COMPLEX bandpass FIR filter coefficients with minimum phase OR
    * an algorithm that is able to transform linear phase coefficients to minimum phase (that runs on the T4) OR
    * MATLAB code to calculate complex bandpass minimum phase filter coefficients for sizes 512, 1024, 2048, 4096 . . .

    Any volunteer or ideas on that would be very welcome!
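One known approach to the second option (linear phase to minimum phase) is the homomorphic, or cepstral, method: take the log magnitude of the frequency response, transform it to the cepstrum, fold the cepstrum onto its causal half, and transform back. The following is a rough desktop sketch under several assumptions. It uses a naive O(M^2) DFT instead of a real FFT, all function names are hypothetical, and it has only been reasoned through for a small real lowpass. The taps are stored as `std::complex` so the same folding could be tried on complex bandpass coefficients, but that case is untested here.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive DFT for illustration. sign = -1: forward, sign = +1: inverse (scaled by 1/M).
static std::vector<cd> dft(const std::vector<cd>& x, int sign) {
    const std::size_t M = x.size();
    const double PI = std::acos(-1.0);
    std::vector<cd> X(M);
    for (std::size_t k = 0; k < M; ++k) {
        cd acc(0.0, 0.0);
        for (std::size_t n = 0; n < M; ++n) {
            const double ang = sign * 2.0 * PI * double((k * n) % M) / double(M);
            acc += x[n] * cd(std::cos(ang), std::sin(ang));
        }
        X[k] = (sign > 0) ? acc / double(M) : acc;
    }
    return X;
}

// Hypothetical helper: returns taps with (approximately) the same magnitude
// response as h but minimum phase. M is the DFT length, an even number that
// should be much larger than h.size().
std::vector<cd> to_minimum_phase(const std::vector<cd>& h, std::size_t M) {
    std::vector<cd> buf(M, cd(0.0, 0.0));
    for (std::size_t n = 0; n < h.size() && n < M; ++n) buf[n] = h[n];
    std::vector<cd> H = dft(buf, -1);
    // Log magnitude, floored so that exact spectral nulls do not give log(0).
    for (auto& v : H) v = cd(std::log(std::max(std::abs(v), 1e-9)), 0.0);
    std::vector<cd> c = dft(H, +1);  // cepstrum of the magnitude response
    // Fold onto the causal half: keep c[0] and c[M/2], double 1..M/2-1, zero the rest.
    for (std::size_t n = 1; n < M / 2; ++n) c[n] *= 2.0;
    for (std::size_t n = M / 2 + 1; n < M; ++n) c[n] = cd(0.0, 0.0);
    std::vector<cd> Hmin = dft(c, -1);
    for (auto& v : Hmin) v = std::exp(v);
    std::vector<cd> hmin = dft(Hmin, +1);
    hmin.resize(h.size());  // truncate back to the original tap count
    return hmin;
}

// Demo input: Hamming-windowed sinc lowpass, cutoff fc as a fraction of the sample rate.
std::vector<cd> make_linear_phase_lowpass(std::size_t N, double fc) {
    const double PI = std::acos(-1.0);
    std::vector<cd> h(N);
    const double mid = (N - 1) / 2.0;
    for (std::size_t n = 0; n < N; ++n) {
        const double t = n - mid;
        const double s = (t == 0.0) ? 2.0 * fc : std::sin(2.0 * PI * fc * t) / (PI * t);
        const double w = 0.54 - 0.46 * std::cos(2.0 * PI * n / (N - 1));
        h[n] = cd(s * w, 0.0);
    }
    return h;
}

// Index of the largest-magnitude tap, a crude indicator of where the energy sits.
std::size_t peak_index(const std::vector<cd>& h) {
    std::size_t best = 0;
    for (std::size_t n = 1; n < h.size(); ++n)
        if (std::abs(h[n]) > std::abs(h[best])) best = n;
    return best;
}
```

For a 31-tap lowpass, `peak_index` of the linear-phase input is the centre tap (15), while the converted taps should peak much earlier. On a Teensy 4.0 the naive `dft` would have to be replaced by a proper FFT, and whether the result is accurate enough for 4096-tap complex bandpass filters is an open question.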
    Last edited by DD4WH; 11-09-2019 at 10:55 AM.

  2. #52
    Could you explain why 'we need an algorithm that runs on the Teensy 4.0 to calculate COMPLEX bandpass FIR filter coefficients...'?

  3. #53
    Senior Member DD4WH's Avatar
    @tschrama: I see (at least) three different applications for low latency partitioned convolution:

    1.) guitar cabinet impulse response (IR) filtering: that is already possible with the existing code with recorded impulse responses of up to 0.49 sec @44.1ksps sample rate (= 21632 taps) length, so all "standard" IR lengths (512 samples, 1024 samples, 170msec, 400msec, 500msec) can be used [the 500ms version has to be cut to 21632 samples, but should work]. However, I have to change the code, because at the moment the IR has to have zero insertion after every sample. I will make that change in the next few days, so one can enter the impulse response as it is without having to insert zeros.

    2.) low latency low pass filtering with a high number of FIR filter taps: that is already possible, but you have to calculate the coefficients for the FIR filter yourself. The coefficients have to be minimum phase coeffs in order for the filter to work with low latency. If you use linear phase coefficients, the filter works, but the latency is half the filter length. There has to be zero insertion in the filter coeffs as well, but I will work on that (see above). At the moment, I use the MATLAB FilterBuilder option for calculating minimum phase coeffs, but the filter design is restricted to real valued lowpass equiripple filters and is extremely slow, needing 31 minutes to calculate a 4096 tap filter on a 2.4GHz laptop.

    3.) Software defined radio main filtering for the IQ incoming signals: this is not possible with the code at the moment, because it needs complex bandpass FIR coefficients with minimum phase response.

    Sorry for not being clear enough in my last post, but I was talking only about the third option needing complex coeffs :-).
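The latency and length figures quoted in this thread can be checked with a little arithmetic. The helpers below are a minimal sketch with made-up names. They assume the 44.1 ksps sample rate and 128-sample block size mentioned in the posts, and the textbook group delay of (N - 1) / 2 samples for an N-tap linear-phase FIR filter.

```cpp
#include <cstddef>

// Duration of `samples` samples at `sample_rate_hz`, in milliseconds.
constexpr double samples_to_ms(double samples, double sample_rate_hz) {
    return 1000.0 * samples / sample_rate_hz;
}

// Group delay of a linear-phase FIR filter with `taps` coefficients, in samples.
constexpr double linear_phase_delay_samples(std::size_t taps) {
    return (static_cast<double>(taps) - 1.0) / 2.0;
}
```

With these, `samples_to_ms(128, 44100.0)` is about 2.9 ms (the 128-sample delay of the partitioned filter), `samples_to_ms(21632, 44100.0)` is about 490.5 ms (the longest IR), and a 4096-tap linear-phase filter delays the signal by `linear_phase_delay_samples(4096)` = 2047.5 samples, or roughly 46.4 ms.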

  4. #54
    Senior Member DD4WH's Avatar
    The partitioned convolution code has been changed now (is on github), so that any recorded impulse responses can be directly put into the code (without zero insertion).

    If you have impulse response files as WAV files, a quick and easy way to get them into the code is to use Audacity --> Analyze --> Sample Data Export --> specify the number of samples to export and the Linear measurement scale. Then put the coeffs into an .h-file and use them with the code :-). [Use the replace function in Word or any other program to insert commas. BTW: ^p is the code for a line break.]
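As an alternative to search-and-replace in a word processor, a few lines of C++ on the PC can turn the exported text into an array definition. This is only a sketch: the function name `samples_to_header` is made up, and it assumes the export contains one sample value per line, possibly mixed with header or blank lines, which are skipped.

```cpp
#include <cstddef>
#include <cstdlib>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads one number per line from `in` and returns the text of a C array
// definition named `name`. Lines that are not a single number are ignored.
std::string samples_to_header(std::istream& in, const std::string& name) {
    std::vector<double> taps;
    std::string line;
    while (std::getline(in, line)) {
        const char* begin = line.c_str();
        char* end = nullptr;
        const double v = std::strtod(begin, &end);
        if (end == begin) continue;  // no number at the start of this line
        while (*end == ' ' || *end == '\t' || *end == '\r') ++end;
        if (*end != '\0') continue;  // trailing text: treat the line as a header
        taps.push_back(v);
    }
    std::ostringstream out;
    out << std::setprecision(9);
    out << "const float32_t PROGMEM " << name << "[" << taps.size() << "] = {\n";
    for (std::size_t i = 0; i < taps.size(); ++i)
        out << "  " << taps[i] << (i + 1 < taps.size() ? ",\n" : "\n");
    out << "};\n";
    return out.str();
}
```

Feeding the exported file through `samples_to_header(file, "impulse_response")` and saving the result as an .h file gives an array in the same form as the `impulse_response` declaration used in the sketch.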

  5. #55
    I don't see the problem. You can take (measure) the impulse response of any filter and feed it to a convolver.

  6. #56
    And why do those coefficients need to be calculated on a T4? Why not store your IR as a file on a Teensy?

  7. #57
    Ah, I see, software defined radio... needs those complex coefficients and minimum phase.

    But radio isn't bothered by latency, is it? So why are you trying to use a method which is designed for low latency?

  8. #58
    Senior Member DD4WH's Avatar
    I see you already answered some of your questions yourself :-).

    Here are two possible answers to your "latency in SDR" question:

    https://www.n1kdo.com/sdr-delay-measured/

    https://en.wikipedia.org/wiki/QSK_op...full_break-in)

  9. #59
    Thanks! I know a thing or two about MATLAB, IRs and guitar, but please excuse my total lack of radio knowledge.
    Last edited by tschrama; 11-10-2019 at 08:50 PM.

  10. #60
    removed double post...

  11. #61
    Senior Member bmillier's Avatar
    Join Date
    Apr 2016
    Location
    Halifax, N.S. Canada
    Posts
    170
    @Frank. Congratulations- looks like you solved all of the issues, except the one you need for SDR.
    I wondered if you were planning on embedding this routine into a Teensy Audio library object, like I did with your older routine?
    Cheers

  12. #62
    Senior Member DD4WH's Avatar
    Yes, it seems all options work, except the one I am most interested in and which I was originally targeting ;-). If there is any volunteer to put the code into an audio lib LOW LATENCY IMPULSE RESPONSE PLAYER object, that would be perfect!
    I never dealt with audio lib objects, so I am a total beginner in that respect. Also one would have to decide whether this object only works on a T4, and how and where one would store the impulse responses. Additionally: should the algorithm be in Stereo, like it is now? Or should it be MONO? In the latter case, the algorithm would have to be altered internally, which would require some work, but would enable about double the size of IRs [ie. a bit less than one second at 44.1ksps on a Teensy 4.0, with external RAM this could probably be extended a bit, but then processor load would come into play even when overclocked, so more than 2 seconds will probably not be possible even with overclocked T4 with external RAM]

  13. #63
    Senior Member bmillier's Avatar
    Hi Frank: Yes, I thought you were really interested in the SDR filters, and was a bit surprised when you drifted into the guitar cabinet simulation aspect of it. The fact that the guitar IR files were min. latency did help you make progress with the routine, which was a nice coincidence. For the library object, I think it would be possible to use a T3.6 if one kept the tap length shorter, to fit the T3.6's smaller memory. I was thinking exactly the same as you that for guitar cab. simulation, the signal is mono, so it would make sense to code it mono, and increase the max taps by a factor of 2. I don't think external RAM would be an option. The i.MX RT MCU could handle external SRAM using a parallel address/data bus, but not all of those pins are present on the T4. While SPI or QSPI SRAM is possible, it would be too slow for this application, I think. But the 1 sec impulse length is perfectly fine for cabinet simulation, and convolution reverb would work OK for times < 1 sec using your routine (or something like it). Really "lush" reverb is generally several seconds long and that is out of reach for the T4.
    There are some fellows on the forum who could probably embed your code into a Teensy audio library object, as well as tailor it so that it would compile differently to match either a T3.6 or T4. I don't have that expertise currently, but I did code your original routine into a Teensy lib. object, so I guess I could do a T4 version for this one as well. Personally, I think that to be really useful for guitar cabinet simulation, one really has to be able to pick the desired IR file at run-time, probably from an SD card, and load the impulse into sram- instead of having 1 or more hard-coded files in program flash. That is how I did it with the lib object I wrote for your original routine.
    If one of the real programming "Experts" steps up to do the library coding, I'll defer to them. If not, I can certainly give it a try.
    Cheers

  14. #64
    Senior Member DD4WH's Avatar
    Hmm, I have gone through the variables again and I fear we cannot save memory when going to a MONO version. These are the variables used in the Stereo version:

    FFT length is 256
    partitionsize is 128
    nfor is number of blocks --> 169 for an IR of 21632 taps

    Code:
    const float32_t PROGMEM impulse_response[21632];
    float32_t DMAMEM maskgen[512]; // SAME FOR MONO
    float32_t DMAMEM fmask[169][512]; // SAME FOR MONO, however maybe one could cut this into half by intelligent file management: BUT I think that would not make a difference, because this variable has to be either in DMAMEM or in RAM1 . . . so we cannot partition equally anymore
    float32_t DMAMEM fftin[512]; // SAME FOR MONO
    float32_t DMAMEM accum[512]; // SAME FOR MONO
    float  fftout[169][512]; //  SAME FOR MONO, because output of a real-to-complex FFT would have the same size as this complex-to-complex FFT
    
    float32_t DMAMEM float_buffer_L [128];  //  SAME FOR MONO
    float32_t DMAMEM float_buffer_R [128]; //  SAME FOR MONO
    float32_t DMAMEM last_sample_buffer_L [128];  //  SAME FOR MONO
    float32_t DMAMEM last_sample_buffer_R [128]; //  SAME FOR MONO
    We have two very large variables, one is in RAM1 [fftout], the other is in RAM2 [fmask]. So if we cut one of them in half, we still have the other array which has the large size and fills up one part of the RAM. So I do not think it's worth coding a MONO version, what do you think?
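The size of the two large arrays follows directly from the numbers listed above. The sketch below uses hypothetical names and assumes that each 512-float row holds the 256 complex FFT bins as interleaved real/imaginary pairs, and that a `float32_t` occupies 4 bytes.

```cpp
#include <cstddef>

constexpr std::size_t kPartitionSize = 128;                 // samples per block
constexpr std::size_t kFftLength     = 2 * kPartitionSize;  // 256-point FFT
constexpr std::size_t kFloatsPerRow  = 2 * kFftLength;      // 512 floats (re/im pairs)
constexpr std::size_t kBytesPerFloat = 4;                   // float32_t

// Number of partitions ("nfor") needed for an impulse response of `taps` taps,
// rounded up to whole partitions.
constexpr std::size_t partitions_for(std::size_t taps) {
    return (taps + kPartitionSize - 1) / kPartitionSize;
}

// Bytes used by one [nfor][512] float array such as fmask or fftout.
constexpr std::size_t block_array_bytes(std::size_t taps) {
    return partitions_for(taps) * kFloatsPerRow * kBytesPerFloat;
}
```

For 21632 taps this gives 169 partitions and 346,112 bytes for each of `fmask` and `fftout`. As far as I know, the Teensy 4.0 has two 512 KB RAM regions, so each array takes up roughly two thirds of the region it lives in, which is why halving only one of them does not free enough room to double the tap count.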

    I am not sure about where to put the IRs. Maybe for the T4 (no SD card!) hardcoding a nice selection of usable IRs in FLASH is sufficient for an on-stage realtime version of the guitar cabinet simulator? However, for the T3.6, you are right, using an SD card would be useful. Maybe I will test whether the code will run on the T3.6 and how many max taps it can use . . .

  15. #65
    Senior Member bmillier's Avatar
    Hi Frank: Since I wasn't the one who wrote the code, I neglected to think about 1) the filter mask has to be the same size, mono or stereo and 2) that the memory was in 2 distinct blocks (although I only finished my Soundfont synthesizer project a month ago, and had to do the same juggling to get all of its large arrays to fit in using DMA memory). So, you are right, of course, if one went to Mono & tried to increase the number of taps further, the complex filter mask would be the biggest array and would still be in the DMA block. But, if you moved a few of the other DMAMEM arrays out into the "normal" sram, it would free up space in DMAMEM for a larger filter mask. Since FFTout should only be 1/2 the size, due to only 1 channel, I think maybe one could increase the tap size significantly, but not double as I assumed. The other consideration here is that if your routine was made into a Teensy library, the chances are good that other users would want to add other audio library function blocks to the program, for other features. At that point, audio block memory would need to increase. So, if for no other reason, if you went to Mono, the shrinking of the FFTOUT array by 2 would free up some space in "normal" sram for use by the library Audio blocks.
    The IR storage is the other issue. Right now your .h files have the IR array as constants, so they should reside in program flash. You are only loading one of them now, so the space taken up by the others is not used now. I had loaded your 08/11/2019 version from Github; when I compile it, it uses only 105,120 bytes of Flash. So, you could easily add many more IR .h files without filling up the 2 Meg program flash. I know that the T4 transfers the program itself from program flash to SRAM at bootup. I would assume that any constants in program flash would also get transferred to SRAM, BUT I AM NOT CERTAIN OF THIS. Your 08/11/2019 version, which includes only 1 fixed filter mask of 16384 (Marshall 197 impulse response from cabIR.eu), takes 386,640 bytes of SRAM. I don't know whether the 386,640 figure includes the IR constants or not. If it does, you wouldn't be able to store many IR files in Flash at once without overflowing the SRAM once they were transferred (assuming that they are transferred from flash to SRAM).
    That is why I like the SD card route. People using the routine as a library object might not want to go into the library code to specify which .h file to use, or for that matter, want to translate an off-the-shelf IR WAV file into a form that would be suitable for a .h file. The demo program that I wrote for my convolution library will accept an IR file in WAV format, and import it directly into the coefficient array (although my demo only read in 513 taps; strictly speaking you need FFT/2 + 1 samples for an FIR filter).
    Even though the T4 doesn't have the SD card built in, it has the IO pins needed to add an SD card. Also, the Audio Shield itself has an SD card socket on it, and chances are most people, including myself, use that for the CODEC.
    That said, if one could fit 1.5+ MB of IR files into T4 program flash, and if constants are not transferred to sram on bootup, then your method has a lot of advantages.
    Cheers

  16. #66
    Senior Member DD4WH's Avatar
    I have been playing around with the code a couple of days on two hardware setups:

    * T4 and DAC PCM5102a & ADC PCM1808
    * T3.6 and Teensy audio shield rev B

    I have also tried T4 and the Teensy audio shield, but I failed again in getting this setup to run, I get drizzle noise again in all situations.

    Also, now even in the T3.6 and the Teensy audio shield I cannot find a configuration that is satisfying, I always get drizzling noise/artefacts. I already soldered a 100 Ohm resistor into the MCLK line on the audio shield itself, but still the problem is not solved, although I use proper 1cm headers to connect T3.6 and my audio shield. (Yes, I use a stereo microscope for soldering and checking all solder connections ;-))

    I begin to think I have a faulty Teensy audio shield, maybe . . .

    The good thing is, that I found out that the code also can run on the Teensy 3.6 for IRs up to a length of about 7552 taps.

    With the T4, it can now process IRs with a length of up to about 24000 taps, that is a little more than 500msec and has a processor load of 50% with that length.

    So, with the frustrating hardware issues continuing, I will stop development of the low latency partitioned convolution code now, because I cannot proceed further at the moment.

    Mainly because I have no access to the SD card (which would be necessary for loading IRs):

    * with Teensy 3.6 (SD card) I cannot eliminate the noise
    * and with the T4.0 setup the audio is nice, but I cannot use SD card (because my T4.0 will not work with my Teensy audio shield and I have no chance to solder an SD card holder to my T4, did anybody do that already?)

    So I feel I cannot develop this further now, until I have access to a working audio shield that will play with the T4 (rev D audio shields are impossible to obtain in Central Europe at the moment [even Digikey does not have it in stock] and it is unclear to me whether they will exhibit the same noise problems as my rev B audio shield with the T3.6.)

    So, everybody feel free to use the code to build a convolution object or whatever you want. The latest and optimized code is on github.

    @Brian, thanks for your thoughts and comments and measurements and all the help!
    Last edited by DD4WH; Yesterday at 09:48 PM.

  17. #67
    Senior Member
    Join Date
    Nov 2017
    Location
    Belgium
    Posts
    138
    @DD4WH
    Antratek has revD audio shields in stock. They ship from the Netherlands if I'm not mistaken.
    https://www.antratek.com/teensy4-audio-board

  18. #68
    Senior Member bmillier's Avatar
    Hi Frank: You're very welcome!
    It would seem like you must have a bad Teensy Audio shield. It definitely should work with the audio shield mounted directly onto the T3.6 via short headers. I assume you are testing it with something simple like the audio pass-through demo, thus eliminating any code errors. (although you can get things working with your alternate ADC/DAC, so the code must be OK)
    My old audio shield works great on the T4- you can see the length of my wiring in post #14. I only use a 100 ohm resistor on MCLK.
    For your conv. filter testing, I just used the SD card socket on the audio shield. But, for my T4 Soundfont Synth, I used only a PT8211 DAC. For the SD card socket, I hand-wired one up from the footprint on the T4 PCB bottom. I had to use wire-wrap wire for this, but the wiring is about 11 cm long, and that works fine.
    [Attached image: Figure5.jpg]
    Maybe you'll get a new Rev D board from neurofun's vendor suggestion.
    I'll have to take a look at your latest github code. I guess you must be conditionally compiling for the T3.6/T4, as I thought the T3.6 had no separate DMA RAM section, and indeed much less RAM, period.
    Cheers
