1024 point FFT, 30% more efficient (experimental)

Status
Not open for further replies.
I had placed my FREQUENCY in the calculation rather than NN - sample count.

That and switching to AudioWindowBlackmanHarris1024 - I get this on PJRC/sinewave:
Freq:100.00 CalcFreq:99.93 bMax=33 bMaxId=2 bStart=0 bAscDur=2 bDecDur=2 bSecond= 0
Freq:200.00 CalcFreq:199.91 bMax=33 bMaxId=5 bStart=2 bAscDur=3 bDecDur=1 bSecond= 0
Freq:400.00 CalcFreq:400.12 bMax=34 bMaxId=9 bStart=7 bAscDur=2 bDecDur=2 bSecond= 0
Freq:800.00 CalcFreq:799.94 bMax=32 bMaxId=19 bStart=16 bAscDur=3 bDecDur=1 bSecond= 0
Freq:1600.00 CalcFreq:1600.07 bMax=34 bMaxId=37 bStart=35 bAscDur=2 bDecDur=1 bSecond= 0
Freq:3200.00 CalcFreq:3200.11 bMax=34 bMaxId=74 bStart=72 bAscDur=2 bDecDur=2 bSecond= 0

I'll save my sketch and output and then repeat with KPC_FFT code shortly.
 
[edit]: Good News it compiles fine - and . . .
I am getting the same Good results between the PJRC and KPC code: as far as I can tell by the stats at this point, and the apparent efficiency improvement from the measure I have is 5% [calls to loop() between myFFT.available() running the same code on each. Below are what I see as comparable, followed by a legend. View attachment FFT_STAT_PJRC.txt View attachment T_FFT_SINEWAVE_edit.ino View attachment FFT_STAT_KPC.txt

Sketch uses 96,752 bytes (36%) of program storage space. Maximum is 262,144 bytes.
Global variables use 13,568 bytes (20%) of dynamic memory, leaving 51,968 bytes for local variables. Maximum is 65,536 bytes.
... I:\Teensy16\hardware\teensy\avr\libraries\Audio\analyze_fft1024.cpp ...

//PJRC AudioWindowBlackmanHarris1024
48 ]FFT:<91_@0>8 26 34 20
Freq:4000.00 CalcFreq:3999.91 bMax=34 bMaxId=93 bStart=91 bAscDur=2 bDecDur=1 bSecond= 0
FFT1024=51,52 all=53.09,54.00 Memory: 4,8 Ready:14400 Not:55713162 # Samples:160 Sample Time:1857 Time per sample:11 % Eff=1.00

Sketch uses 99,440 bytes (37%) of program storage space. Maximum is 262,144 bytes.
Global variables use 15,624 bytes (23%) of dynamic memory, leaving 49,912 bytes for local variables. Maximum is 65,536 bytes.
... I:\T_Teensy16\hardware\teensy\avr\libraries\Audio\analyze_fft1024.cpp ...

//KPC_FFT1024 AudioWindowBlackmanHarris1024 called for 1.0528 Not Ready Loops (5804650/5513427) : 5.28% faster
// Quick ref: msg #10 PJRC 'Not' was 5742280 and dropped to 5513427, lost 4.15% with extra stats, remove many USB prints
// For self measurement of 100% Stat using PJRC code hits 3,860 'Not' per ready sample set [versus 4039 for KPC]
48 ]FFT:<91_@0>8 26 34 20
Freq:4000.00 CalcFreq:3999.92 bMax=34 bMaxId=93 bStart=91 bAscDur=2 bDecDur=1 bSecond= 0
KPC1024=8,11 all=10.12,12.49 Memory: 6,10 Ready:14400 Not:58219479 # Samples:160 Sample Time:1856 Time per sample:11 % Eff=1.05

Legend:
Compile CODE/RAM data - and showing unique source tree path
// General Comments
"#]FFT:..." output line starts with the "correlated Magnitude" of the samples. Only the 3rd stable sample is shown, the first transition and 2 similar samples are not displayed
The <bracket> notation shows the number of leading zero bins truncated:<91_@0>, then the bin data .read()
"Freq:..."shows 'set' sinewave Freq and the value calculated [This varies with max +/- 0.12 from sample to sample and run to run - similar on both]
Bin Max & Index, first non-zero bin, bins before peak and after, multiple peak detection[ only seen on 4000Hz to 100Hz transition]
Then KPC1024= or FFT1024=:
Stats summary from 160 samples from 40 freq steps 100Hz with 4 per step starts showing FFT==PJRC and KPC for new code and % CPU used as reported in FFT and OverAll and memory blocks used
Ready: is times a complete sample set was available, Not: are loop() calls waiting - this ratio is the basis for my efficiency measure
Time to take 160 samples 1.85 seconds at 11 millis() per sample
Efficiency with current PJRC code as 1.00 and 105% more idle calls to loop() with KPC code
Note: adding enhanced Stats to compare data set calculation dropped the value over 4% from first reported numbers.
 
Last edited:
I went to the AudioWindowBlackmanHarris256 and the provided math works out the same to pinpoint the frequency. The accuracy looks just as good on this sinewave input.

Problem - I'm missing something: Why are per sample times on the 256 FFT twice as long as the 1024 FFT? {this was the same with AudioWindowHanning256 as AudioWindowBlackmanHarris256}

1024: count myFFT.available():160 Sample Time:1857 Time per sample:11
256: count myFFT.available():160 Sample Time:3714 Time per sample:23

This shows I timed for 160 .available() FFT's of each type - and the per FFT time is double?

I expected the 256 version to take one quarter the time reading data compared to the 1024. There must be something in the code I'm not seeing Audio or Sinewave?
 
Last edited:
I just looked into the 256 point fft.
For the 1024 point fft, the output flag should be set every (1024/44117)/2 = 11 ms for the 1024 point fft. Divide by two, since the fft is half overlapping, ie. it gives a response every 512 samples. This corresponds to your test.
The 256 point fft should set the output flag every (256/44117) = 2.9 ms, but since the 512 point fft averages 8 times by default, you end up with 23 ms.
 
@kpc: Thanks that was the answer and info I've needed. That is unique to '256' and I failed to understand the value of 'naverage(8)' or this function.

myFFT.windowFunction(AudioWindowBlackmanHarris256);
myFFT.averageTogether(1);

48 ]FFT:<21_@0>4 19 34 27 9
Freq:4000.00 CalcFreq:4000.22 bMax=34 bMaxId=23 bStart=21 bAscDur=2 bDecDur=2 bSecond= 0
fft256=12,13 all=13.88,14.34 Memory: 1,2 Ready:1440 Not:1605988 # Samples:160 Sample Time:464 Time per sample:2 % Eff=0.29

with Avg(1) the Frequencies correct near and above 300Hz.

Freq:100.00 CalcFreq:187.52
Freq:200.00 CalcFreq:171.77
Freq:300.00 CalcFreq:295.33
Freq:400.00 CalcFreq:399.84
 
Last edited:
@defragster, I assume the last table you gave is for 256 point fft. In that case, the frequency resolution is 44117/256 = 172 Hz, The first bin stops at 172/2 = 86 Hz, so bot 100 and 200 Hz lie in bin 1, the (k-1) point might be corrupted since bin 0 contains the DC offset.
 
Last edited:
Thanks @kpc: I had already accounted for that. Yes that table was for the 256 point fft.

Code:
#ifdef FFT1024
    CalcFreq = (AUDIO_SAMPLE_RATE_EXACT / 1024) * (bMaxId + dd * 2.251);
#else
    CalcFreq = (AUDIO_SAMPLE_RATE_EXACT / 256) * (bMaxId + dd * 2.251);
#endif

I think I just figured out why - the bins are so small there is no value[-1] and the peak bin is bin ZERO.

Here is a better sample of the rest of the frequencies. I'll be looking to use this to see if I can get the data I need on my sounds. With 160 groups of 256 samples in .465 seconds sitting under 15% usage.

Freq:100.00 CalcFreq:165.54 bMax=61 bMaxId=0 bStart=0 bAscDur=0 bDecDur=2 bSecond= 0
Freq:200.00 CalcFreq:190.72 bMax=40 bMaxId=0 bStart=0 bAscDur=0 bDecDur=3 bSecond= 0
Freq:300.00 CalcFreq:305.15 bMax=34 bMaxId=2 bStart=0 bAscDur=2 bDecDur=1 bSecond= 0
Freq:400.00 CalcFreq:399.63 bMax=33 bMaxId=2 bStart=0 bAscDur=2 bDecDur=2 bSecond= 0
Freq:800.00 CalcFreq:799.90 bMax=33 bMaxId=5 bStart=2 bAscDur=3 bDecDur=1 bSecond= 0
Freq:1200.00 CalcFreq:1199.91 bMax=34 bMaxId=7 bStart=5 bAscDur=2 bDecDur=1 bSecond= 0
Freq:1600.00 CalcFreq:1600.19 bMax=33 bMaxId=9 bStart=7 bAscDur=2 bDecDur=2 bSecond= 0
Freq:2000.00 CalcFreq:1999.95 bMax=33 bMaxId=12 bStart=9 bAscDur=3 bDecDur=1 bSecond= 0
Freq:2400.00 CalcFreq:2399.92 bMax=34 bMaxId=14 bStart=12 bAscDur=2 bDecDur=1 bSecond= 0
Freq:2800.00 CalcFreq:2800.26 bMax=34 bMaxId=16 bStart=14 bAscDur=2 bDecDur=2 bSecond= 0
Freq:3200.00 CalcFreq:3200.11 bMax=32 bMaxId=19 bStart=16 bAscDur=3 bDecDur=1 bSecond= 0
Freq:3600.00 CalcFreq:3599.84 bMax=34 bMaxId=21 bStart=19 bAscDur=2 bDecDur=1 bSecond= 0
Freq:4000.00 CalcFreq:4000.22 bMax=34 bMaxId=23 bStart=21 bAscDur=2 bDecDur=2 bSecond= 0
fft256=12,13 all=13.83,14.53 Memory: 1,2 Ready:263200 Not:243891435 # Samples:160 Sample Time:465 Time per sample:2 % Eff=0.24

Here is another sample where 200Hz comes out well because the peak isn't bin zero:
51 ]FFT:38 32 15
Freq:100.00 CalcFreq:177.61 bMax=38 bMaxId=0 bStart=0 bAscDur=0 bDecDur=2 bSecond= 0
46 ]FFT:19 32 26
Freq:200.00 CalcFreq:207.01 bMax=32 bMaxId=1 bStart=0 bAscDur=1 bDecDur=1 bSecond= 0
49 ]FFT:13 28 34 19
Freq:300.00 CalcFreq:300.52 bMax=34 bMaxId=2 bStart=0 bAscDur=2 bDecDur=1 bSecond= 0
 
Last edited:
@kpc

Not sure if this has any value over the real tuning fork data but I started the sinewave at 850 and ran to 1,000 in 1 Hz steps using my instrumented code. If you wanted to edit and go for finer resolution you could - but from what I checked the calculation across this range was identical, also attached the code that did it.

View attachment KPC_850to1k.txt View attachment PJRC_850to1k.txt View attachment T_FFT_SINEWAVE_edit.ino

A quick edit at line 225 could adjust where it focuses:
Code:
      Freq  += 1;
      if ( 1000 < Freq ) {
        Freq = 850;


Note: I turned my counters into double and the code throughput on empty loop() passes was cut in half, went to unsigned long.
 
@kpc here is the data you asked for ( not as easy as might seem ). Will try try again if i missed the right frame.

-13279 -10395 -7408 -4188 -890 2408 5702 8861 12025 15086 17837 20374 22536 24186 25422 26421
27099 27106 26533 25443 23876 21833 19518 16796 13737 10445 6980 3552 42 -3560 -6915 -10172
-13186 -15729 -18102 -20094 -21725 -23182 -24050 -24423 -24467 -23930 -22957 -21757 -19995 -17818 -15354 -12749
-10048 -7148 -4086 -809 2608 5930 9151 12146 14970 17676 20185 22334 24052 25392 26284 26811
26775 26012 24865 23132 21147 18778 15891 12774 9435 5940 2631 -681 -4113 -7346 -10311 -13088
-15595 -17834 -19711 -21284 -22379 -23077 -23166 -23019 -22456 -21535 -20313 -18573 -16445 -14109 -11472 -8524
-5511 -2283 1007 4281 7649 10868 13968 17074 19842 22224 24311 25844 26966 27819 28215 28151
27506 26239 24470 22522 20154 17386 14317 10986 7479 3933 271 -3231 -6506 -9586 -12506 -15162
-17561 -19471 -20881 -21865 -22435 -22589 -22363 -21821 -20888 -19675 -17927 -15819 -13479 -10804 -7978 -4935
-1724 1714 5047 8414 11566 14607 17554 20021 22138 24017 25570 26696 27401 27528 27179 26348
25129 23502 21493 18876 15836 12532 9080 5668 2208 -1255 -4802 -8030 -10988 -13699 -15964 -18075
-19697 -20881 -21814 -22236 -22262 -22137 -21475 -20336 -18922 -17150 -15081 -12837 -10101 -7112 -4077 -881
2372 5715 9021 12266 15228 18021 20552 22804 24797 26391 27296 27823 27846 27395 26586 25217
23351 21097 18553 15740 12778 9451 6001 2487 -946 -4287 -7354 -10335 -13126 -15540 -17872 -19851
-21206 -22103 -22632 -22698 -22438 -21763 -20510 -19042 -17094 -14723 -12169 -9386 -6301 -3112 170 3528
6687 9799 12943 15861 18642 21209 23288 25231 26861 27795 28443 28489 27989 27131 25589 23462
21020 18339 15475 12425 9067 5393 1831 -1583 -4903 -7990 -10973 -13740 -16185 -18212 -19997 -21167
-22084 -22572 -22497 -22049 -21206 -20032 -18454 -16480 -14126 -11431 -8660 -5591 -2428 781 4150 7519
10783 14007 16920 19698 22130 24332 26083 27440 28455 28899 28785 28065 26748 25103 23123 20754
18015 14988 11720 8357 5051 1626 -1813 -5199 -8469 -11442 -14121 -16569 -18673 -20131 -21166 -21948
-22328 -22293 -21764 -20730 -19386 -17667 -15748 -13556 -11012 -8225 -5152 -1954 1366 4626 7827 11080
14239 17189 19868 22096 24085 25661 26916 27652 27889 27684 27104 25996 24397 22437 20001 17372
14478 11272 7797 4166 663 -2704 -6037 -9147 -12014 -14581 -16872 -18859 -20505 -21676 -22191 -22389
-22151 -21609 -20797 -19637 -17903 -15892 -13625 -10974 -8237 -5324 -2132 1107 4393 7741 11061 14141
17220 19957 22290 24206 25731 26907 27613 27628 27267 26542 25275 23592 21582 18980 16069 12940
9558 6161 2778 -833 -4360 -7615 -10765 -13514 -15886 -18166 -19926 -21320 -22411 -22971 -23160 -22884
-22139 -20980 -19501 -17721 -15676 -13261 -10492 -7436 -4211 -960 2231 5477 8793 11924 14747 17502
19953 22140 24259 25849 26766 27329 27291 26876 26127 24927 23125 20880 18341 15505 12505 9143
5478 1878 -1567 -4832 -8046 -11098 -13822 -16269 -18436 -20216 -21511 -22511 -22984 -23038 -22714 -21911
-20668 -19121 -17090 -14781 -12319 -9486 -6483 -3401 -333 2878 6100 9333 12544 15487 18287 20892
23077 24948 26213 27107 27630 27590 26964 25953 24447 22467 20102 17306 14163 11003 7800 4263
822 -2708 -6240 -9397 -12534 -15400 -17661 -19773 -21491 -22629 -23506 -23938 -23853 -23391 -22590 -21309
-19739 -17649 -15261 -12770 -9989 -6998 -3954 -691 2648 5906 9294 12398 15194 17756 20148 22172
23980 25377 26180 26615 26543 25974 25025 23552 21567 19333 16705 13691 10481 6967 3318 -189
-3635 -7071 -10269 -13162 -15802 -17985 -19988 -21704 -22798 -23638 -23977 -23806 -23316 -22440 -21120 -19535
-17621 -15238 -12752 -9846 -6710 -3551 -217 3296 6637 9964 13169 15982 18790 21306 23311 24995
26253 26862 27266 27067 26371 25296 23716 21750 19404 16587 13352 10035 6679 3270 -42 -3525
-7018 -10272 -13299 -15944 -18041 -19857 -21386 -22516 -23347 -23579 -23307 -22729 -21959 -20783 -19290 -17278
-14884 -12293 -9438 -6370 -3238 3 3384 6622 9812 12827 15659 18312 20630 22544 24119 25192
25773 26016 25672 24708 23250 21477 19294 16738 13870 10548 7276 3886 402 -2940 -6342 -9574
-12473 -15306 -18074 -20371 -22328 -23855 -24977 -25676 -26008 -25840 -25151 -24036 -22457 -20644 -18543 -16024
-13404 -10450 -7374 -4184 -921 2468 5768 9068 12256 15102 17746 20062 22003 23675 24918 25490
25705 25362 24543 23344 21570 19294 16589 13665 10472 7274 3822 140 -3328 -6743 -9861 -12760
-15514 -17989 -20031 -21856 -23276 -24443 -25210 -25348 -24876 -24111 -22985 -21477 -19722 -17465 -14921 -12326
-9445 -6377 -3211 130 3384 6612 9763 12750 15465 17953 20222 22134 23703 24718 25195 25225
24642 23708 22502 20730 18601 16032 12968 9543 6206 2703 -747 -4122 -7587 -10882 -13849 -16629
-19021 -20993 -22725 -23952 -24871 -25296 -25345 -24847 -24001 -22782 -21192 -19476 -17264 -14777 -12161 -9085
-5835 -2571 841 4196 7387 10530 13511 16208 18763 20930 22664 24078 24972 25483 25442 24838
23794 22346 20406 18053 15428 12394 9228 5966 2492 -1083 -4599 -7935 -11056 -14051 -16813 -19283
-21302 -22922 -24105 -24974 -25434 -25354 -24857 -24069 -22805 -21155 -19194 -16835 -14321 -11569 -8349 -5036
-1818 1393 4548 7645 10842 13821 16360 18765 20778 22213 23351 24119 24389 24275 23551 22437
20941 18873 16533 13864 10837 7608 4324 841 -2610 -6007 -9444 -12568 -15348 -17950 -20215 -22020
-23592 -24808 -25590 -25968 -25718 -25002 -24084 -22831 -21281 -19384 -17064 -14398 -11541 -8528 -5388 -2110
1223 4474 7625 10774 13743 16448 18889 20835 22329 23559 24300 24535 24375 23484 22090 20268
18028 15361 12532 9415 6187 2969 -405 -3917 -7392 -10772 -13866 -16490 -19089 -21350 -23191 -24717
-25799 -26504 -26906 -26749 -26024 -24967 -23510 -21659 -19619 -17093 -14237 -11387 -8277 -5140 -1940 1533
4862 7918 10941 13766 16358 18807 20823 22316 23358 23933 23965 23755 22984 21671 19963 17749
15134 12300 9133 5838 2366 -1154 -4616 -7942 -11187 -14339 -17097 -19790 -22043 -23694 -25141 -26056
-26547 -26714 -26441 -25659 -24630 -23240 -21373 -19346 -16943 -14186 -11341 -8297 -5063 -1763 1631 4925
8022 11164 14076 16753 19076 20940 22128 23057 23573 23566 23195 22256 20807 19044 16855 14212
11341 8301 5055 1727 -1770 -5301 -8698 -11919 -14992 -17668 -20075 -22260 -23883 -25154 -26110 -26435
-26358 -25862 -24909 -23730 -22279 -20322 -18210 -15839 -13158 -10388 -7379 -3952 -631 2693 5955 8985
12085 14918 17377 19529 21434 22761 23775 24352 24238 23808 22781 21280 19470 17304 14734 11755
8556 5131 1702 -1698 -5210 -8486 -11670 -14642 -17199 -19428 -21514 -23163 -24405 -25211 -25410 -25253
 
@cartere, I checked the data, and the spectrum is not a single tone, so the interpolation algorithm will fail. I cannot get the resolution from only 1024 points, but it appears like there are two tone very close together. Like I said earlier, the fft is not meant to give the precission you require.
I also performed a comparisson on the data for the PJRC fft, mine, both using integers and octave(=free Matlab) using doubles.
The first 24 fft points are:
Code:
PJRC:   767 942 221 168 232 133 91 229 139 252 188 172 203 170 175 242 265 371 411 554 826 1774 12343 1348 719 499 374
mine:   765 940 219 166 231 132 91 229 141 252 188 172 203 170 174 242 264 371 411 555 825 1773 12342 1349 719 499 373
octave: 771 938 218 167 233 136 95 232 144 255 193 175 206 174 179 246 268 374 415 558 829 1777 12340 1346 716 497 371
Assuming octave is the true response, the PJRC FFT gives an RMS error of 1.27 and a peak error of 5. The optimized FFT gives an RMS error of 1.29 and a peak error of 6. In percentages relative to the peak, the RMS errors are 0.01 %, or 100 ppm, so still pretty accurate.
So both PJRC and the optimized version are approximately equal, with only minor differences due to truncation effects. When simulation with multiple other arbitrary signals, the PJRC (=CMSIS) FFT always performs marginally better. The difference probably due to spectral leakage when using the FFT for two real signals simultaneously.
 
Last edited:
@kpc thanks for all your work on this. Would be happy to send you a tuning fork and mic :) The application is calibration of tuning forks used in speed radar by law enforcement, some of the forks are of substandard manufacture and their output decays very quickly resulting in the need to determine frequency using an other than counter based solution. With your tweaks to the code and CMSIS based library I can hit the sweet spot of fast updates and needed accuracy.
 
Never used it, but you might also want to have a look at the the FreqCount/FreqMeasure library.
 
@carter, Not related to the FFT, bu the same idea as freq measure, but using the audio input. Zero crossing time is determined through interpolation
You might want to add a band pass filter to the input (AudioFilterStateVariable), as there is no suppression of noise in the code.
Also the alpha should be tweaked.
Code:
#include <Audio.h>
#include <Wire.h>
#include <SPI.h>
#include <SD.h>

AudioInputI2S i2s1;
AudioRecordQueue queue1;
AudioSynthWaveformSine sine1;
AudioConnection pc1(sine1, queue1);

void setup()
{
  AudioMemory(12);
  Serial.begin(9600);  
  sine1.frequency(941.);
  sine1.amplitude(1.);
  queue1.begin();
}

int16_t lastsample = 1;
uint32_t prevstartidx = 0;
uint16_t idx = 0;
const float alpha = 1./(10+1); // lowpass filter 10 results
float filtfreq = 0;

void loop()
{
  if (queue1.available()) {
    int16_t *src = queue1.readBuffer();
    for (unsigned s = 0; s < AUDIO_BLOCK_SAMPLES; s++) {
      int16_t sample = src[s];
      if (sample > 0 && lastsample <= 0) {
        uint32_t startidx = (idx << 16) | (((lastsample << 16) / (lastsample - sample)) & 0xffff);
        float freq = (AUDIO_SAMPLE_RATE_EXACT * 65536.) / (startidx-prevstartidx);
        if (freq > 500. && freq < 1500.)
          filtfreq = alpha * freq + (1 - alpha) * filtfreq;
        Serial.println(filtfreq);
        prevstartidx = startidx;
      }
      lastsample = sample;
      idx++;
    }
    queue1.freeBuffer();
  }
}
 
Last edited:
Just read a remark of Paul in another thread. Seemed more appropriate to react here.
Pauls remark was that I used a radix-2 implementation.
This is only partly true. The majority of the work is still performed in the CMSIS fft function (256). I only replaced the final radix4 step by two radix-2 steps. Yes it is slighly less efficient then a radix-4, but this allowed me to spread the load evenly over all cycles. This in combination with the double use of a complex fft for two real ffts, gave the improvement.
@Paul when you come around to looking at this, don't hesitate to ask for more explanations/comments. But please don't hurry, your other work is more important. Whenever somebody complains about processing power, a simple link to this thread will suffice. Also, although my code is somewhat more efficient, your code is more readible. I don;t think the occasional user will comprehend my code.

FYI, I am also looking into the fft256. It looks like I can get it almost 45% more efficient, and also a better spread over cycles. This time using a different approach, where I focus somewhat more on integration, but the gain is still almost 40% without integration. Won't publish the code yet, need to do some more testing, maybe next week.

Edit: A correction on better spread. The current fft256 already has an even spread. My new fft256 code use some more pipelining, and thereby the processing in each cycle is different, but still managed to reasonably keep the load equal forall cycles.
 
Last edited:
Status
Not open for further replies.
Back
Top