Is there a way to disable FFT when it's not in use to free up the CPU?



MrTom
03-21-2016, 06:47 AM
After I've created an FFT1024 and used it, is there a way to disable it while it's not in use to free up the 52% CPU it uses?
I haven't found any type of .disable() function for it.

defragster
03-21-2016, 09:44 AM
There is no published way that I've seen noted; once started, it runs.

I thought I'd want to disable it when I used that code, and I've seen others ask. There are times when the FFT will tell me something before a process starts; once that process has started, I need the CPU cycles for other things.

PaulStoffregen
03-21-2016, 02:15 PM
You could place a mixer before the FFT. Just set the mixer gain to zero.

The mixer and many other objects are programmed to not transmit data when they know their output is silent. All objects are programmed to detect no data arriving at their inputs. Most avoid using CPU time to process known-silent input. Only a few actually continue doing work with no data arriving (like delayExt, which fills the external memory with zeros), but those are the rare exceptions. Nearly all of them avoid some or all CPU usage when their inputs get no data.

The mixer also contains code to handle 1.0 gain efficiently. When it's just a single channel at 1.0 gain, the original data is just passed right to the output with virtually no CPU usage.

MrTom
03-22-2016, 03:44 AM
Originally posted by PaulStoffregen:
> You could place a mixer before the FFT. Just set the mixer gain to zero.

Hmm, that actually has the opposite effect. Setting the gain to zero on the mixer feeding the FFT drives the CPU usage up to 51% constantly. Before, the CPU usage just spiked to 52% occasionally, so the FFT seems to be idle most of the time anyway.

Oh well, I'll just leave well enough alone.

DerekR
06-29-2016, 11:51 PM
I'm in the process of cleaning up an asynchronous FFT block for the audio library - full complex fft (256 length) with two inputs, and with "fft shift" operation to reorder the output samples. It works as follows:
1) The synchronous audio-block operations merely save the past two records for the real and imaginary inputs. They do not compute the fft.
2) A separate asynchFFT.computeFFT() function (which can be called at any time) interleaves the real and imaginary data, applies the window function, and computes the FFT on demand, using the most recent saved data blocks.
3) A set of retrieval functions returns magnitude, magnitude^2, phase in q15.1 or float format.

My application is a spectral display for an SDR radio. I only need updates at around 20 Hz, so it makes no sense to tie up the CPU with unused FFTs every 2.9ms! It's working and I am just hammering it to ensure the data integrity...

I'll get back when I feel it's ready for a wider audience :-)

Derek
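The split described in this post (cheap per-block bookkeeping, expensive work only on demand) can be sketched on a desktop compiler. This is not Derek's code: the class name AsyncCapture and its snapshot() method are hypothetical, and the window and FFT steps are left out. It only shows how the last two 128-sample blocks of each input could be kept and then interleaved into the real/imaginary layout a 256-point complex FFT expects.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an asynchronous FFT front end (illustration only).
constexpr std::size_t kBlock = 128;          // samples per audio block
constexpr std::size_t kFftLen = 2 * kBlock;  // 256-point complex FFT

class AsyncCapture {
public:
    // Cheap per-block work: shift the history down and store the newest block.
    void update(const int16_t* re, const int16_t* im) {
        for (std::size_t i = 0; i < kBlock; ++i) {
            real_[i] = real_[i + kBlock];
            imag_[i] = imag_[i + kBlock];
            real_[i + kBlock] = re[i];
            imag_[i + kBlock] = im[i];
        }
    }

    // On-demand work: interleave re/im pairs, oldest sample first.
    // A window function and the FFT itself would follow here.
    std::array<int16_t, 2 * kFftLen> snapshot() const {
        std::array<int16_t, 2 * kFftLen> out{};
        for (std::size_t i = 0; i < kFftLen; ++i) {
            out[2 * i] = real_[i];
            out[2 * i + 1] = imag_[i];
        }
        return out;
    }

private:
    std::array<int16_t, kFftLen> real_{};
    std::array<int16_t, kFftLen> imag_{};
};
```

On real hardware update() would run in the audio interrupt while snapshot() runs from loop(), so the copy in snapshot() would probably need protecting, for example by bracketing it with AudioNoInterrupts() and AudioInterrupts().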

Xenoamor
06-30-2016, 09:45 AM
I imagine it uses interrupts. Can you just temporarily disable those interrupts?

DerekR
06-30-2016, 10:26 AM
The audio library system in general uses interrupts, which can be turned on and off with
AudioNoInterrupts();
...
AudioInterrupts();
but the FFT code does not contain its own interrupt system. I assume that while AudioNoInterrupts() is active, the whole Audio board is dysfunctional.

Xenoamor
06-30-2016, 01:23 PM
Meh, just start and stop the ADC from triggering the DMA with:

ADC0_SC2 &= ~(ADC_SC2_ADTRG | ADC_SC2_DMAEN);//Disable
ADC0_SC2 |= ADC_SC2_ADTRG | ADC_SC2_DMAEN; // Enable
You'll probably want to reset the FFT flags.
The proper way to do it would be to use a DMA with a periodic interrupt to collect all the data you need, then have it disable itself and trigger an interrupt on completion. You can then service that interrupt.

DerekR
06-30-2016, 07:24 PM
Thanks, but the data is streamed in 128 length blocks every 2.9ms (44.1kHz sampling rate) from the Audio board codec line_in inputs through I2S. That same data stream is used for all the DSP functions (and there are many) in my SDR application, and it cannot be touched while running. I do not use the ADCs, although I thought about it to get a higher sampling rate (and thus wider spectral display).

I did use your approach previously with an Arduino Due without using a codec.

Paul has done a fantastic job of providing a basic set of processing blocks (and their interconnections) for audio work without worrying about data acquisition. My AsynchFFT stuff is just an add-on to address a need I have to free up the CPU, and it is running as we speak without faltering yet.

MacroMachines
07-28-2016, 09:21 AM
Originally posted by DerekR:
> Paul has done a fantastic job of providing a basic set of processing blocks (and their interconnections) for audio work without worrying about data acquisition. My AsynchFFT stuff is just an add-on to address a need I have to free up the CPU, and it is running as we speak without faltering yet.

Would you be willing to provide your code and post it to the audio library? I am working on something that I want to be able to use the FFT to analyze on demand without taking up CPU when it is not needed. In general I would like to add a way to disable objects directly without needing to add a mixer beforehand. I am beginning on a large project with the audio library and will likely be posting a lot of new features and revisions very soon.

DerekR
08-02-2016, 11:26 PM
Originally posted by MacroMachines:
> Would you be willing to provide your code and post it to the audio library? I am working on something that I want to be able to use the FFT to analyze on demand without taking up CPU when it is not needed. In general I would like to add a way to disable objects directly without needing to add a mixer beforehand. I am beginning on a large project with the audio library and will likely be posting a lot of new features and revisions very soon.


I'll be happy to let you have a copy of what I have right now, but please understand:
1) I am writing an extensive package of general DSP functions to extend the Audio library. The current version of the Async FFT is designed for 256 length complex (quadrature) inputs, and returns the magnitude-squared output. I am rewriting it to use the arm_math radix-2 FFT functions, with real and complex inputs, together with varying radix-2 lengths. I have also written (in the past few days) FFT functions based on the old tricks of computing two independent real FFTs in a single complex FFT, and computing a single length 2N real FFT in a single length N complex FFT. The latter is considerably faster than the current FFT functions in the Teensy Audio library, and I want to include it in all my FFT functions (including a sped-up version of Paul's functions).

2) I have a new (and I think better) way of off-loading FFT (or any low-priority compute intensive) functions from the audio library. I have written a set of "AudioDataGrabber" functions that will return the previous n buffers for real or complex data. The data is kept in a ring buffer, and all the update function does is update the pointers to the head and tail. I currently have written and tested real and complex 128 and 256 length grabbers and am in the process of extending to 512 and 1024 versions. This allows you to do whatever you wish with the data in the loop() function - including FFTs if you wish :-). Very low overhead on the 2.9 msec cycle time of the audio block updates.

3) I have written a set of functions for FFT based filtering/processing/convolution operations with overlap-add output, including demodulation, Hilbert transformers, and general FIR filters.
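The "AudioDataGrabber" idea in item 2 can be illustrated with a small ring buffer of blocks. This is a desktop sketch under stated assumptions, not Derek's implementation: the name BlockRing and the push()/latest() methods are hypothetical, and it copies sample data for simplicity, whereas the post describes an update function that only moves head and tail pointers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a "data grabber"; names and sizes are illustrative.
template <std::size_t BlockLen, std::size_t NumBlocks>
class BlockRing {
public:
    // Called once per audio block: store the block and advance the head.
    void push(const int16_t* block) {
        for (std::size_t i = 0; i < BlockLen; ++i) data_[head_][i] = block[i];
        head_ = (head_ + 1) % NumBlocks;
        if (count_ < NumBlocks) ++count_;
    }

    // Called from the main loop: the most recent n blocks, oldest first.
    // Returns an empty vector if fewer than n blocks are available.
    std::vector<int16_t> latest(std::size_t n) const {
        std::vector<int16_t> out;
        if (n == 0 || n > count_) return out;
        out.reserve(n * BlockLen);
        std::size_t idx = (head_ + NumBlocks - n) % NumBlocks;
        for (std::size_t b = 0; b < n; ++b) {
            out.insert(out.end(), data_[idx].begin(), data_[idx].end());
            idx = (idx + 1) % NumBlocks;
        }
        return out;
    }

private:
    std::array<std::array<int16_t, BlockLen>, NumBlocks> data_{};
    std::size_t head_ = 0;   // next slot to overwrite
    std::size_t count_ = 0;  // number of valid blocks stored
};
```

With something like BlockRing<128, 8>, the audio update would call push() every block and loop() could call latest(2) for a 256-point FFT or latest(8) for a 1024-point FFT whenever it has time.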

Other functions that I am working on include adaptive-least-squares filters, decimation/interpolation, and others.

So the point is that all of these things are works-in-progress, and I need to get things organized and extended before making a general release. (I am doing this while recovering from a serious encounter with the big-"C", and still get very-very tired, and so am limited by my endurance)

PM me with your email, letting me know what you would like, and I'll package up a zip file and send you the appropriate stuff.

- Derek

DerekR
08-03-2016, 12:44 AM
@Mr. Tom, MacroMachines,
How about this for another possible solution - add another couple of public member functions to the existing audio library FFT functions:
bool enable() - sets a (public) enable flag
bool disable() - resets the enable flag, and maybe release all of the audio blocks (to enable a clean start when enabled again)(see below)
I see two possible scenarios:
1) When disabled, the update() function could simply release the input block and exit (in other words do absolutely nothing), OR
2) When disabled, the update() function could insert the current buffer and update the queue (releasing the oldest member), so that the queue would be current when enabled again. The call to the fft and subsequent processing would be disabled.
The advantage of the second is that the blocks would be primed and ready to go when enabled again, at the expense of a small amount of processing at each iteration.

The read() functions would be disabled while the fft is not enabled (can be debated:-) ).

I'll take a pass at it with the 1024 fft, seeing as this is the one with the most overhead. More later...

- Derek
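The two scenarios in this proposal can be compared with a small desktop model. Everything here is hypothetical: GatedAnalyzer is not the library's AudioAnalyzeFFT1024, the FFT is replaced by a counter, and the figures (8 queued blocks, 4 kept after each transform) are assumptions chosen to mimic a 1024-point FFT with 50% overlap on 128-sample blocks.

```cpp
#include <cstddef>

// Hypothetical model of an analysis object with an enable flag.
// Not the Teensy Audio library's AudioAnalyzeFFT1024.
class GatedAnalyzer {
public:
    enum class IdleMode { DropInput, KeepQueueCurrent };

    explicit GatedAnalyzer(IdleMode mode) : mode_(mode) {}

    void enable() { enabled_ = true; }
    void disable() {
        enabled_ = false;
        if (mode_ == IdleMode::DropInput) queued_ = 0;  // clean start later
    }
    bool isEnabled() const { return enabled_; }
    std::size_t transforms() const { return transforms_; }

    // Called once per audio block.
    void update() {
        // Scenario 1: release the input and do nothing else.
        if (!enabled_ && mode_ == IdleMode::DropInput) return;
        // Cheap bookkeeping: take in the new block (oldest drops out when full).
        if (queued_ < kBlocksPerFft) ++queued_;
        // Scenario 2: queue stays current, but the FFT is skipped.
        if (!enabled_) return;
        if (queued_ == kBlocksPerFft) {
            ++transforms_;                // stand-in for the expensive FFT
            queued_ = kBlocksPerFft / 2;  // 50% overlap: keep the newest half
        }
    }

private:
    static constexpr std::size_t kBlocksPerFft = 8;  // assumed: 8 x 128 samples
    IdleMode mode_;
    bool enabled_ = true;
    std::size_t queued_ = 0;
    std::size_t transforms_ = 0;
};
```

In this model an object in KeepQueueCurrent mode produces a result on the first update after enable(), while one in DropInput mode needs eight fresh blocks first, which is the trade-off described above.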

DerekR
08-03-2016, 10:04 AM
As proposed in my previous post, I have modified analyze_fft1024.h and analyze_fft1024.cpp to include the three new public functions:
void myFFT.enable(void)
void myFFT.disable(void)
bool myFFT.isEnabled(void)
and all seems to be working with the FFT example sketch, although I have yet to check the processor load enabled vs disabled.

What is the best way to package and distribute small mods like this? I no longer have a web site where I can hang a zip file, and I've never used GitHub (and am reluctant to start) :-)

defragster
08-03-2016, 10:18 AM
GitHub is a horrid mess in many ways - at least for me - but it is how things get done. If you did that the diffs are easily seen and read and might be considered, or easily shared.
There is a sketch with CPU use under Audio, did you try using those funcs? Of course some independent measure could be useful too. Would be interesting to be able to see it be disabled.

DerekR
08-04-2016, 09:27 PM
@defragster
FWIW - I added timing to the update function, and the results were entirely as expected:
enabled: 1400 usec every 4th buffer update (state = 7, when the FFT is computed), 1-3 usec on the other three (states = 4,5,6)
disabled: 1-3 usec every buffer update
As I see it, the problem is that the audio stream processing must be designed around the worst case cpu usage to ensure no data loss, which is why I am doing all analysis functions such as FFT externally in the loop() function, using my dataGrabber audio functions.
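The worst-case point can be put into numbers using the timings above. This is a small arithmetic sketch; the function names are made up for illustration, and the block period assumes 128 samples at a nominal 44.1 kHz.

```cpp
// Block period for 128 samples at 44.1 kHz, in microseconds (about 2902).
constexpr double kBlockPeriodUs = 128.0 / 44100.0 * 1e6;

// Peak load: the most expensive update() as a fraction of one block period.
constexpr double peakLoadPercent(double worstUpdateUs) {
    return 100.0 * worstUpdateUs / kBlockPeriodUs;
}

// Average load over one FFT cycle: one heavy update plus (cycle - 1) light ones.
constexpr double averageLoadPercent(double heavyUs, double lightUs, int cycle) {
    return 100.0 * (heavyUs + (cycle - 1) * lightUs) / (cycle * kBlockPeriodUs);
}
```

With the measured figures, peakLoadPercent(1400.0) comes to about 48% while averageLoadPercent(1400.0, 2.0, 4) is only about 12%. The average is modest, but every fourth block still needs nearly half of the block period, and that peak is what the rest of the audio processing has to fit around.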

Talking about processing times, I have developed a new "fasterFFT()" 256 length FFT that uses an old trick of computing a length N real FFT in a complex length N/2. (arm_math seems to have something similar, but I believe it uses newer FFT code that Paul has advised against for Teensy because it is a memory hog.) The time savings for length 256 are very significant, and I'm about to code up the 1024 version. I'll pop it into analyze_fft1024 and let you know the equivalent timing results as soon as I have them, either here or in a new thread...
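For readers who have not met the "length N real FFT in a complex length N/2" trick, here is a desktop sketch of the standard textbook form. It is not Derek's fasterFFT(): the function names are hypothetical, it uses double precision rather than fixed point, and a plain O(M^2) DFT stands in for the optimized complex FFT so the example stays short.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Plain O(M^2) DFT, standing in for an optimized complex FFT routine.
std::vector<cplx> dft(const std::vector<cplx>& in) {
    const std::size_t m = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(m);
    for (std::size_t k = 0; k < m; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t n = 0; n < m; ++n) {
            acc += in[n] * std::polar(1.0, -2.0 * pi * double(k * n) / double(m));
        }
        out[k] = acc;
    }
    return out;
}

// Bins 0..N/2 of the spectrum of a real signal of even length N,
// computed with a single complex transform of length N/2.
std::vector<cplx> realSpectrum(const std::vector<double>& x) {
    const std::size_t n = x.size();
    if (n < 2 || n % 2 != 0) return {};  // this sketch handles even lengths only
    const std::size_t half = n / 2;
    const double pi = std::acos(-1.0);

    // Pack even samples into the real part and odd samples into the imaginary part.
    std::vector<cplx> z(half);
    for (std::size_t m = 0; m < half; ++m) z[m] = cplx(x[2 * m], x[2 * m + 1]);

    const std::vector<cplx> Z = dft(z);

    std::vector<cplx> X(half + 1);
    for (std::size_t k = 0; k <= half; ++k) {
        const cplx a = Z[k % half];
        const cplx b = std::conj(Z[(half - k) % half]);
        const cplx even = 0.5 * (a + b);             // DFT of the even samples
        const cplx odd = cplx(0.0, -0.5) * (a - b);  // DFT of the odd samples
        X[k] = even + std::polar(1.0, -2.0 * pi * double(k) / double(n)) * odd;
    }
    return X;
}
```

The saving comes from running the expensive transform at half the length; the unpacking loop adds only a few operations per bin.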

Frank B
08-04-2016, 09:44 PM
not really the topic here, but perhaps a 64-length fft would be useful for bargraph displays ?

defragster
08-04-2016, 09:52 PM
DerekR - that sounds promising. I got a code change pulled in the other day using github on the web. I forked it to my online account and then opened the affected file and edited and pasted in my change.

It was this change to String for a memory issue (https://forum.pjrc.com/threads/35688-Something-wrong-with-Adding-a-String-to-itself!?p=111109&viewfull=1#post111109) and it resulted in this pull request (https://github.com/PaulStoffregen/cores/pull/165). When the edit was done I opened a pull request against Paul's tree, and he reviewed and incorporated it. It went much more smoothly than using GitHub for Windows.

The best thing about posting to GitHub is that your forked version is public and you can link and allow others to pre-test it so Paul has some confidence before he considers putting it in the code base.

DerekR
08-05-2016, 02:43 AM
defragster: thanks - I'll have to go away and get my ancient head around github - my specialty is DSP and algorithms, not software production/collaboration.

Now the good news. I just got my "faster" FFT 1024 going and inserted into the Example FFT 1024 sketch.
The update() execution time for state=7 dropped from
1410 usecs using AudioAnalyzeFFT1024 to
1030 usecs using my AudioAnalyzeFasterFFT1024
on the Teensy 3.2, with exactly the same output. And that's just a first pass without even trying to optimize the code using the DSP library.

I think it is probably worth pursuing... Paul, if you read this let me know what you think.

Can't wait to get my hands on a 3.6 to explore more general DSP stuff...
Derek

DerekR
08-05-2016, 03:29 AM
Frank B - we are straying a bit off topic, but yes, my goal is to create a "FFTpak" library with an extensive list of functions: radix-2 real and complex FFTs with lengths 64, 128, 256, 512, and 1024, along with IFFTs, and complex, mag, magsq, logmag (dB) outputs. I also intend to include FFT based filtering/convolution operations with overlap/add outputs to maintain continuity of the output stream.
I have prototypes working for all of these.

Derek
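The role of overlap/add in keeping the output stream continuous can be shown with a short desktop sketch. It is an illustration only, not FFTpak code: the class name OverlapAdd is hypothetical, and each block is convolved directly in the time domain where a real implementation would multiply spectra. The carry-over of the tail between blocks is the part that matters.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Streaming FIR filter using overlap-add (hypothetical illustration).
// Each block is convolved on its own, and the part of the result that spills
// past the end of the block is added into the start of the next block's output.
class OverlapAdd {
public:
    explicit OverlapAdd(std::vector<double> taps)
        : taps_(std::move(taps)),
          tail_(taps_.empty() ? 0 : taps_.size() - 1, 0.0) {}

    std::vector<double> process(const std::vector<double>& block) {
        // Full convolution of this block: length = block + taps - 1.
        std::vector<double> full(block.size() + tail_.size(), 0.0);
        for (std::size_t i = 0; i < block.size(); ++i)
            for (std::size_t j = 0; j < taps_.size(); ++j)
                full[i + j] += block[i] * taps_[j];

        // Add the tail carried over from earlier blocks.
        for (std::size_t i = 0; i < tail_.size(); ++i) full[i] += tail_[i];

        // The first block.size() samples go out; the rest is the new tail.
        const auto split = full.begin() + static_cast<std::ptrdiff_t>(block.size());
        std::vector<double> out(full.begin(), split);
        tail_.assign(split, full.end());
        return out;
    }

private:
    std::vector<double> taps_;  // FIR coefficients
    std::vector<double> tail_;  // spill-over waiting for the next block
};
```

Feeding a signal through process() block by block gives the same samples as convolving the whole signal at once, so there is no discontinuity at block boundaries.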

WMXZ
08-05-2016, 08:43 AM
Originally posted by DerekR:
> As proposed in my previous post, I have modified analyze_fft1024.h and analyze_fft1024.cpp to include the three new public functions:
> void myFFT.enable(void)
> void myFFT.disable(void)
> bool myFFT.isEnabled(void)
> and all seems to be working with the FFT example sketch, although I have yet to check the processor load enabled vs disabled.
>
> What is the best way to package and distribute small mods like this? I no longer have a web site where I can hang a zip file, and I've never used gitHub (and am reluctant to start) :-)

I have only just read this thread, but to solve the original problem (which, as I understand it, is skipping a lot of data and carrying out the FFT at a lower rate), it seems better to create a new node in the audio library instead of modifying the built-in FFT:

ignore_all_but_one (or a better name)
where all incoming packets are dropped, except the one you wanted to pass to subsequent processing (the FFT)
It would have a single parameter (decimation factor)
A factor of 17 would pass a single block about every 49.3 ms (roughly a 20 Hz update rate)

You could also add a second parameter defining the number of consecutive blocks to be passed to linked processing
say skip 16 blocks, pass 2 blocks for a 256 point FFT

This way you could also use the node diagram as an intuitive description of your algorithm.
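The gating logic of such a node is small enough to sketch on a desktop compiler. The names BlockGate, shouldForward() and burstPeriodMs() are hypothetical, the sketch assumes pass is at least 1, and a real node would receive and release audio blocks rather than return a flag.

```cpp
#include <cstddef>

// Hypothetical gate node: out of every (skip + pass) blocks, drop the first
// `skip` and forward the last `pass` to whatever is connected downstream.
class BlockGate {
public:
    BlockGate(std::size_t skip, std::size_t pass) : skip_(skip), pass_(pass) {}

    // Call once per incoming block; true means "forward this block".
    bool shouldForward() {
        const bool forward = position_ >= skip_;
        position_ = (position_ + 1) % (skip_ + pass_);
        return forward;
    }

private:
    std::size_t skip_;
    std::size_t pass_;  // assumed to be at least 1
    std::size_t position_ = 0;
};

// Time between bursts, in milliseconds, for a block length and sample rate.
constexpr double burstPeriodMs(std::size_t skip, std::size_t pass,
                               std::size_t blockLen, double sampleRateHz) {
    return 1000.0 * double((skip + pass) * blockLen) / sampleRateHz;
}
```

With 128-sample blocks at 44.1 kHz, BlockGate(16, 1) forwards one block in 17, which burstPeriodMs(16, 1, 128, 44100.0) puts at about 49.3 ms between forwarded blocks. BlockGate(16, 2) forwards two consecutive blocks per cycle, which keeps the samples contiguous for a 256-point FFT.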