Thread: Audio Library and 4096 point FFT on Teensy 3.6

  1. #26
    Junior Member
    Join Date
    Sep 2016
    Posts
    4
    Hi!

    I've just made a quick test with the 4096FFT (using the files goldhausen has provided). It seems to work OK with the ILI9341_t3n library (using the frame buffer); I get 22 FPS.
    I'm using the Audio Library and have no issues with using the 4096FFT.

    In my project I would like to visualize the audio spectrum before and after filtering (20 Hz ... 20 kHz).
    As you can see in the picture, my frequency axis is not linear. I'm trying to get better resolution, especially in the lower frequency range (20 Hz ... 320 Hz), so the 4096FFT comes in handy with a frequency bin resolution of about 11 Hz.
    Although this works, I agree with DD4WH that the "ZoomFFT" would be a much better approach to solve this "low-frequency" problem.

    Is there a "ready to use" ZoomFFT library available that works with the audio library?
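
    For reference, the bin resolution quoted above is simply the sample rate divided by the FFT length. A minimal sketch, assuming the nominal 44.1 kHz sample rate; the helper names are made up for illustration and are not part of the Audio Library:

```cpp
#include <cstdio>

// Hypothetical helper: width of one FFT bin in Hz.
constexpr double binResolutionHz(double sampleRateHz, int fftSize) {
    return sampleRateHz / fftSize;
}

// Hypothetical helper: how many bins fall between loHz and hiHz.
constexpr double binsInRange(double loHz, double hiHz, double sampleRateHz, int fftSize) {
    return (hiHz - loHz) / binResolutionHz(sampleRateHz, fftSize);
}

// Print the comparison for the FFT sizes discussed in this thread.
void printBinResolutions() {
    const double fs = 44100.0;  // assumed nominal sample rate
    const int sizes[] = {256, 1024, 4096};
    for (int n : sizes) {
        std::printf("FFT%d: %.2f Hz per bin, about %.0f bins in 20..320 Hz\n",
                    n, binResolutionHz(fs, n), binsInRange(20.0, 320.0, fs, n));
    }
}
```

    Calling printBinResolutions() shows why the larger FFT helps at the low end: the 20 Hz ... 320 Hz range covers only a handful of bins at 1024 points but several times as many at 4096 points.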

    [Attached image: AudioAmp_Teeny_3.6_FFT4096.jpg]

  2. #27
    Junior Member
    Join Date
    Jul 2019
    Posts
    2
    I'm using this to drive an audio spectrum analyzer that outputs to a 128x64 RGB LED matrix with 128 bands displayed. I wanted more resolution at the low end (0 to 200 Hz), and this made that possible.
    Since I am only viewing the full audio spectrum as bin intensity, not listening to individual bins, it works for me.
    I ran a frequency sweep (20 Hz to 20 kHz) and it displayed the peak properly, so the FFT appears to be working correctly.

    I get about 10 FPS on the LED matrix at 240 MHz, so it is slower than I would like, but I get better low-frequency discrimination.

    This still uses the complex FFT functions (cfft radix4 q15). Looking at the code in CMSIS 4.5, it was already running at 4096 all of the time and just stepping down for smaller sizes. I don't see any difference in the display response characteristics with 4096 vs 1024 (except what is caused by having 4x the bins), which confirms to me that the same functions are being used.

  3. #28
    Junior Member
    Join Date
    Sep 2016
    Posts
    4
    Hi!
    I've just run into a problem with the 4096FFT. Everything works fine as long as I use only one instance of the 4096FFT.
    As soon as I connect a second FFT to the input with "AudioConnection patchCord (i2sInput, 0, fftPre, 0);", neither FFT works anymore.
    I get some strange artifacts (a different spectrum even though I feed in the same signal, a 380 Hz sine wave).
    Does anyone know what could cause this behavior and how I can fix it?

    Running only one 4096FFT:
    [Attached image: Teensy_3.6_1x_4096FFT.jpg]

    Running two 4096FFTs:
    [Attached image: Teensy_3.6_2x_4096FFT.jpg]

    Initialisation Code:
    Code:
    AudioMemory(80);
    fftPre.windowFunction (AudioWindowHanning4096);
    fftPost.windowFunction (AudioWindowHanning4096);
    Spectrum Drawing Function:
    Code:
    for (int x = 0; x < 230; x++)
    {
      static int xPrev = 0, xTemp = 0;
      static float pre0 = 0, pre1 = 0, post0 = 0, post1 = 0;
      int16_t bin     = (20 * powf (4, (float) x     / 46)) / 10.75;
      int16_t binPrev = (20 * powf (4, (float) xPrev / 46)) / 10.75;
      int16_t binTemp = (20 * powf (4, (float) xTemp / 46)) / 10.75;
        
      if (bin != binPrev)
      {
        if (x > 3)
        {
          pre0 = getLevel (fftPre.read(binTemp, binPrev), x);
          pre1 = getLevel (fftPre.read(binPrev, bin), x);
          post0 = getLevel (fftPost.read(binTemp, binPrev), x);
          post1 = getLevel (fftPost.read(binPrev, bin), x);
          //tft.drawLine (xPrev + 25, 77 + pre0, x + 25, 77 + pre1, CYAN);
          tft.drawLine (xPrev + 25, 77 + post0, x + 25, 77 + post1, MAGENTA);
        }
        else
        {
          pre0 = getLevel (fftPre.read(1), x);
          pre1 = getLevel (fftPre.read(1, 2), x);
          post0 = getLevel (fftPost.read(1), x);
          post1 = getLevel (fftPost.read(1, 2), x);
          //tft.drawLine (0 + 25, 77 + pre0, 2 + 25, 77 + pre1, CYAN);
          tft.drawLine (0 + 25, 77 + post0, 2 + 25, 77 + post1, MAGENTA);
        }
        xTemp = xPrev;
        xPrev = x;
      }
    }
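
    The drawing code maps each horizontal pixel to a frequency on a logarithmic axis (20 Hz at x = 0, multiplied by 4 every 46 pixels) and then divides by the bin width to get an FFT bin index. A minimal standalone sketch of the same formula, with function names invented here for illustration:

```cpp
#include <cmath>
#include <cstdint>

// Frequency in Hz shown at horizontal pixel x, using the same mapping as the
// drawing code above: 20 Hz at x = 0, times 4 every 46 pixels.
inline float pixelToFrequencyHz(int x) {
    return 20.0f * std::pow(4.0f, static_cast<float>(x) / 46.0f);
}

// FFT bin index for pixel x. binWidthHz is 10.75 in the code above;
// 44100 / 4096 would be closer to 10.77 (assuming a 44.1 kHz sample rate).
inline int16_t pixelToBin(int x, float binWidthHz = 10.75f) {
    return static_cast<int16_t>(pixelToFrequencyHz(x) / binWidthHz);
}

// Count how many distinct bin indices the first `width` pixels map to.
// Neighbouring pixels at the low end of the axis share a bin, which is why
// the drawing loop only draws when the bin index changes.
inline int countDistinctBins(int width) {
    int count = 0;
    int16_t prev = -1;
    for (int x = 0; x < width; ++x) {
        const int16_t bin = pixelToBin(x);
        if (bin != prev) {
            ++count;
            prev = bin;
        }
    }
    return count;
}
```

    With 230 pixels the axis spans five factors of 4, so it runs from 20 Hz to just under 20 * 4^5 = 20480 Hz.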

  4. #29
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,019
    Perhaps put in memory/CPU usage monitoring checks, first with one and then with both 4096 FFTs enabled.

    What if the second FFT is made 1024 points, or something else less intensive?

    The boards.txt file contains a commented-out 256 MHz speed option for the T3.6. Uncomment these two lines by removing the '#':
    Code:
    #teensy36.menu.speed.256=256 MHz (overclock)
    #teensy36.menu.speed.256.build.fcpu=256000000

  5. #30
    Junior Member
    Join Date
    Sep 2016
    Posts
    4
    OK, I've changed the second FFT to 1024 and that works fine. But this is not really a solution for me, because I really need both signals displayed in "high resolution".
    After changing the CPU clock speed to 256 MHz I got the following error:
    #error "This CPU Clock Speed is not supported by the Audio library";
    Do you know the right values for MCLK_MULT and MCLK_DIV for this CPU frequency?
    Are there some other critical options I can try to change to get it to work?

  6. #31
    Senior Member
    Join Date
    Nov 2012
    Posts
    1,134
    Try increasing the AudioMemory. Maybe AudioMemory(160);

    Pete

  7. #32
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,019
    Post #29 was for diagnostic purposes. The 1024 FFT just shows that the system can run the 4096 in parallel with a 1024, which is a good sign.

    Bad suggestion on the 256 MHz: apparently the math for those clocks hasn't been worked out, and I'm not sure if it can be worked out.

    Along with @el_supremo's suggestion: the FIRST part of post #29 asked you to integrate the memory/CPU usage monitoring checks. That will show whether there is a memory or CPU cycle deficit. There is an example sketch that exposes and prints those values; a few lines of code copied into loop() and printing once per second will give some good info.

  8. #33
    Junior Member
    Join Date
    Sep 2016
    Posts
    4
    Here are the results of the CPU and Memory usage:

    Only one 4096FFT running:
    Code:
    fftPre = 0 (0 max), fftPost = 77 (78 max), All = 78.71 (79.56 max), Memory: 19 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.43 (79.56 max), Memory: 31 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.42 (79.56 max), Memory: 26 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.41 (79.56 max), Memory: 24 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.41 (79.56 max), Memory: 24 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.41 (79.56 max), Memory: 24 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.46 (79.56 max), Memory: 24 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.46 (79.56 max), Memory: 24 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.46 (79.56 max), Memory: 24 (37 max)
    fftPre = 0 (0 max), fftPost = 77 (78 max), All = 78.79 (79.56 max), Memory: 19 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.43 (79.56 max), Memory: 31 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.42 (79.56 max), Memory: 26 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.45 (79.56 max), Memory: 23 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.44 (79.56 max), Memory: 23 (37 max)
    fftPre = 0 (0 max), fftPost = 77 (78 max), All = 78.81 (79.56 max), Memory: 19 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.42 (79.56 max), Memory: 31 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.42 (79.56 max), Memory: 26 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.45 (79.56 max), Memory: 23 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.42 (79.56 max), Memory: 19 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.43 (79.56 max), Memory: 30 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.43 (79.56 max), Memory: 25 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.45 (79.56 max), Memory: 23 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.45 (79.56 max), Memory: 23 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.44 (79.56 max), Memory: 23 (37 max)
    fftPre = 0 (0 max), fftPost = 77 (78 max), All = 78.74 (79.56 max), Memory: 19 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.42 (79.56 max), Memory: 31 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.42 (79.56 max), Memory: 26 (37 max)
    fftPre = 0 (0 max), fftPost = 0 (78 max), All = 1.44 (79.56 max), Memory: 23 (37 max)
    fftPre = 0 (0 max), fftPost = 77 (78 max), All = 78.77 (79.56 max), Memory: 19 (37 max)
    Two 4096FFT running:
    Code:
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.47 (7.37 max), Memory: 23 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.46 (6.73 max), Memory: 23 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.47 (6.93 max), Memory: 24 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.46 (6.88 max), Memory: 24 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (6.98 max), Memory: 25 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.06 max), Memory: 25 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.23 max), Memory: 26 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.37 max), Memory: 26 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.03 max), Memory: 27 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.53 max), Memory: 27 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.05 max), Memory: 28 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.32 max), Memory: 28 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.14 max), Memory: 29 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.09 max), Memory: 29 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (6.88 max), Memory: 30 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.19 max), Memory: 30 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (6.99 max), Memory: 31 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.46 (6.98 max), Memory: 31 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.36 max), Memory: 32 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.34 max), Memory: 32 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.08 max), Memory: 33 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.10 max), Memory: 33 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.33 max), Memory: 34 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.59 max), Memory: 34 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.53 max), Memory: 24 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.45 (7.73 max), Memory: 25 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.45 (7.45 max), Memory: 25 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.14 max), Memory: 26 (37 max)
    fftPre = 0 (79 max), fftPost = 0 (78 max), All = 1.44 (7.55 max), Memory: 26 (37 max)
    It seems that one FFT takes about 80% of the CPU, so two FFTs running at the same time are going to exceed the available CPU time (more than 100%).
    I've changed the AudioMemory to 160 but got the same result. I think memory is not the problem in this case.
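
    To put the percentages into time: a rough budget sketch, assuming 128-sample audio blocks, a nominal 44.1 kHz sample rate, and that the usage figures are percentages of one block period. The function names are made up for illustration:

```cpp
#include <cstdio>

constexpr double kSampleRateHz = 44100.0;  // assumed nominal sample rate
constexpr int kBlockSamples = 128;         // samples per audio block

// Time available to process one audio block, in milliseconds.
constexpr double blockPeriodMs() {
    return 1000.0 * kBlockSamples / kSampleRateHz;
}

// Time an object spends in a block, given its usage in percent of a block period.
constexpr double usageToMs(double usagePercent) {
    return blockPeriodMs() * usagePercent / 100.0;
}

// True if the listed per-block usages (in percent) fit into a single block period.
constexpr bool fitsInOneBlock(double usageA, double usageB, double otherUsage) {
    return usageA + usageB + otherUsage <= 100.0;
}

void printBudget() {
    std::printf("block period: %.2f ms\n", blockPeriodMs());
    std::printf("one 4096 FFT at 78%%: %.2f ms\n", usageToMs(78.0));
    std::printf("two FFTs in the same block fit: %s\n",
                fitsInOneBlock(79.0, 78.0, 1.5) ? "yes" : "no");
}
```

    With the maxima from the log (79 and 78 plus roughly 1.5 for everything else), both FFTs landing in the same block would need well over 100% of one block period.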

    Is there a possibility to split the FFT processing functions or optimize the code to increase the efficiency?

  9. #34
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,019
    A busy CPU is what I expected/feared ... and it shows that memory wasn't an issue as allocated. It would be interesting to see those numbers for the 4096 plus 1024 FFT combo; if the 1024 is 4x faster, that would explain it fitting into the leftover 20%.

    IIRC the FFT works in steps as buffers fill (4 sets of 256 B buffers? For 4096 maybe 4x that, with more buffers in play) and then does one big hit to get the final calculations. If that is indeed the case and the two could somehow be run out of phase, they might share the CPU better. That might also allow end-use processing to work in turn, or out of phase, but it may skew the results if the two need sample-time synchronization.
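
    The out-of-phase idea can be illustrated with a toy model. This is only a sketch under assumptions: each FFT pays its whole cost in a single block once per period (for example 78% once every 4096 / 128 = 32 blocks), and the synchronization concern is ignored. peakLoadPercent is a made-up name, not an Audio Library function:

```cpp
#include <algorithm>
#include <vector>

// Peak per-block CPU load (percent) when two FFT objects each cost
// `fftCostPercent` once every `periodBlocks` blocks and the second one is
// delayed by `offsetBlocks`. `baseLoadPercent` is everything else.
// Assumes periodBlocks > 0 and offsetBlocks >= 0.
inline double peakLoadPercent(int periodBlocks, int offsetBlocks,
                              double fftCostPercent, double baseLoadPercent) {
    std::vector<double> load(periodBlocks, baseLoadPercent);
    load[0] += fftCostPercent;                            // first FFT fires at block 0
    load[offsetBlocks % periodBlocks] += fftCostPercent;  // second FFT fires later
    return *std::max_element(load.begin(), load.end());
}
```

    In this model peakLoadPercent(32, 0, 78.0, 1.5) is far above 100, while peakLoadPercent(32, 16, 78.0, 1.5) stays below it, because the two bursts never land in the same block.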

    There was also alternate algorithm work by one forum user, @KPC, performing partial calculations on each of those 4 steps to spread the workload and reduce the hit on the final pass:
    pjrc.com/threads/27905-1024-point-FFT-30-more-efficient-(experimental)

    There is hope the Teensy 4 will ship soon enough. All else being sufficient, its processing speed should more than double, given the clock speed increase and the more advanced design/resources.

    If the code can be shared and run with minimal hardware ( audio board and ili9341 ) perhaps myself or another beta user could run the code that gave the above results to see what comes of CPU usage.

  10. #35
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,249

    better multi-block processing?

    As an alternative to partial calculation of very large FFTs (here FFT4096), one could run the FFT at a lower priority level (also triggered by a software interrupt). This would need an extension of the audio library, which could be an idea to allow easier multi-block processing.
    (@defragster BTW: the 1024 point FFT is done every 8th block, so a 4096 point FFT would be done every 32nd block)
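
    The block counts follow from the 128-sample audio block size. A small sketch, assuming no overlap between FFT frames, that also shows how low the load would be if the burst could be spread over the whole frame (helper names are illustrative only):

```cpp
constexpr int kBlockSamples = 128;  // samples per audio block

// Number of audio blocks needed to collect one FFT frame (no overlap assumed).
constexpr int blocksPerFft(int fftSize) {
    return fftSize / kBlockSamples;
}

// Average CPU load in percent if the FFT cost were spread evenly over all the
// blocks of one frame instead of being paid in a single block.
constexpr double averageLoadPercent(double peakPercent, int fftSize) {
    return peakPercent / blocksPerFft(fftSize);
}

static_assert(blocksPerFft(1024) == 8, "1024-point FFT: every 8th block");
static_assert(blocksPerFft(4096) == 32, "4096-point FFT: every 32nd block");
```

    For a burst of about 78% in one block, averageLoadPercent(78.0, 4096) is only about 2.4% per block, which is why running the FFT at a lower priority across several blocks looks attractive.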

  11. #36
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    403
    As far as I can see, there is no need for such a large (and processor-intensive) FFT. Of course the T4 could help you, but I am pretty sure that by reducing the load you can do that on a T3.6.

    Your visualization only uses (at most) 128 points, so you would only need a much smaller FFT, say 256 points [producing a resolution of 172 Hz, which is more than enough for your spectrum analyser for the frequencies above about 700 Hz]. The needed resolution for the lower frequencies (< 700 Hz) could be produced by a ZoomFFT.

    Sorry to say, there is no ready-to-use library for that. You would have to lowpass-filter your audio at the desired frequency (689 Hz) and then decimate by your desired factor (decimation by 32 gives a resolution of about 5.4 Hz and matches your cutoff frequency of 689 Hz, if I did the math correctly). Then perform a 256-point FFT on the decimated audio [which is now restricted to DC to 689 Hz and arrives at a sample rate of 44.1 ksps / 32 = 1.38 ksps], and you get 128 real magnitude values at a resolution of 5.4 Hz (5.4 is also the maximum number of frames per second for your display, since each frame needs 256 new samples at 1.38 ksps).
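
    The numbers above, and the filter-then-decimate step, can be sketched in plain C++. This is an illustration under assumptions, not a ready-made Audio Library object: all names are invented here, the lowpass is a simple Hann-windowed sinc FIR, and on the Teensy itself a CMSIS-DSP decimating FIR such as arm_fir_decimate_q15 would probably be the better tool.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Derived numbers for a ZoomFFT stage (all names here are illustrative).
struct ZoomFftParams {
    double decimatedRateHz;  // sample rate after decimation
    double nyquistHz;        // highest representable frequency after decimation
    double binResolutionHz;  // width of one FFT bin
    int usableBins;          // magnitude bins from a real input (fftSize / 2)
};

inline ZoomFftParams zoomFftParams(double sampleRateHz, int decimation, int fftSize) {
    ZoomFftParams p;
    p.decimatedRateHz = sampleRateHz / decimation;
    p.nyquistHz = p.decimatedRateHz / 2.0;
    p.binResolutionHz = p.decimatedRateHz / fftSize;
    p.usableBins = fftSize / 2;
    return p;
}

// Hann-windowed sinc lowpass taps, normalized to unity gain at DC.
// numTaps must be at least 3.
inline std::vector<float> lowpassTaps(int numTaps, double cutoffHz, double sampleRateHz) {
    const double kPi = 3.14159265358979323846;
    const double fc = cutoffHz / sampleRateHz;  // cutoff in cycles per sample
    const double mid = (numTaps - 1) / 2.0;
    std::vector<float> taps(numTaps);
    double sum = 0.0;
    for (int i = 0; i < numTaps; ++i) {
        const double t = i - mid;
        const double sinc = (t == 0.0) ? 2.0 * fc
                                       : std::sin(2.0 * kPi * fc * t) / (kPi * t);
        const double hann = 0.5 - 0.5 * std::cos(2.0 * kPi * i / (numTaps - 1));
        taps[i] = static_cast<float>(sinc * hann);
        sum += taps[i];
    }
    for (float &tap : taps) tap = static_cast<float>(tap / sum);
    return taps;
}

// Apply the FIR filter and keep only every `factor`-th output sample.
inline std::vector<float> decimate(const std::vector<float> &in,
                                   const std::vector<float> &taps, int factor) {
    std::vector<float> out;
    for (std::size_t n = taps.size() - 1; n < in.size(); n += factor) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < taps.size(); ++k) acc += taps[k] * in[n - k];
        out.push_back(acc);
    }
    return out;
}
```

    For 44.1 ksps, decimation by 32 and a 256-point FFT, zoomFftParams gives a decimated rate of about 1378 sps, a Nyquist frequency of about 689 Hz, and a bin width of about 5.4 Hz. In practice the filter cutoff should sit somewhat below that Nyquist frequency, and the tap count decides how much aliasing is left.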

    In a second step you could speed up the frame rate by using an overlap in your input samples [sliding window] and performing both FFTs at 44.1 ksps; then you can have the full frame rate AND full resolution. The processor load comparison would then be: FFT4096 OR [2 x FFT256 and lowpass filter] --> the winner in efficiency would be the ZoomFFT, I think.
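
    The effect of overlapping input frames on the update rate can be estimated as follows. A minimal sketch; framesPerSecond is a made-up helper and the overlap values are only examples:

```cpp
// Spectrum frames per second when consecutive FFT frames overlap.
// overlapFraction = 0.0 means no overlap; 0.75 means each frame reuses 75%
// of the previous frame's samples, so only a quarter of the frame is new.
constexpr double framesPerSecond(double sampleRateHz, int fftSize, double overlapFraction) {
    return sampleRateHz / (fftSize * (1.0 - overlapFraction));
}
```

    For the decimated stream (about 1378 sps, 256 points), no overlap gives about 5.4 frames per second and 75% overlap gives about 21.5, while the bin width stays at about 5.4 Hz because the frame length is unchanged.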

    see here for an explanation:

    https://github.com/df8oe/UHSDR/wiki/...ode-=-Zoom-FFT
