Excellent results with Floating Point FFT/IFFT Processing and Teensy 3.6

Status
Not open for further replies.

Ray.E

Member
I'm not sure where to post this, but I'm very pleased with the results so far from testing FFT/IFFT processing using the 32-bit floating-point versions of the arm_math library functions on the Teensy 3.6 and its FPU.

Previously, passing audio through pairs of the q15-based arm_math FFT/IFFT functions gave very noisy results, due to the limitations of the 16-bit depth. Using arm_cfft_radix4_f32 as pretty much a drop-in replacement for arm_cfft_radix4_q15, the results are dramatically better.

For example, with a 1024-point FFT/IFFT, artefact tone levels dropped from around -40/-45 dB to around -78 dB. Base-level white noise went down from around -63 dB at, say, 5 kHz to occasional peaks of about -78 dB, and mostly sits at much lower levels than that.
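
To put those dB figures in linear terms, here is a small C++ sketch. The helper names are hypothetical (they are not part of arm_math or the Audio library), and it assumes the levels are amplitude decibels (20*log10) measured against the same reference, which the post does not state explicitly.

```cpp
#include <cmath>

// Hypothetical helper: how many dB lower the new level is than the old one.
// Both levels are assumed to be in dB relative to the same reference.
inline double improvementDb(double oldLevelDb, double newLevelDb) {
    return oldLevelDb - newLevelDb;
}

// Hypothetical helper: convert a dB difference to a linear amplitude ratio,
// assuming amplitude (20*log10) decibels.
inline double amplitudeRatio(double db) {
    return std::pow(10.0, db / 20.0);
}
```

With the figures above, improvementDb(-40.0, -78.0) is 38 dB and improvementDb(-45.0, -78.0) is 33 dB, so amplitudeRatio gives roughly 79 and 45 respectively. Under the stated assumption, the artefact tones are somewhere around 45 to 80 times smaller in amplitude with the f32 functions.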

I have attached some spectrographs of the FFT/IFFT reconstruction of a 1 kHz sine wave to show the differences, at both 1024 and 256 lengths. Note that so far there is no shaped windowing or overlapping done at all, so I'm hoping window processing will reduce some of the artefact peaks on these graphs.

Music played from iTunes through the FFT/IFFT pair at either 1024 or 256 lengths is quite acceptable for casual listening at least. There is certainly a slight increase in background hiss noticeable with more careful listening.

Now I'm ready to start with the timestretching, pitch shifting, sound tone freezing, sound blending across bins and … and …
I expect this will take a while to get together though. I am hoping it will be usable in a realtime performance context eventually.

For the graphs shown, the audio is played into Line-in on the Audio shield and captured via a record 'queue' Library object, an FFT/IFFT is performed on it, and the result is sent to Line-out via a play 'queue' Library object.

Measured Latency data -
1024 length FFT/IFFT passthrough: 29.57 ms (23.22 ms of that would be serialisation delay)

256 length FFT/IFFT passthrough: 12.15 ms (5.8 ms of that would be serialisation delay)

For reference, for a 128 length Audio block passthrough via Library 'queue' objects without any FFT/IFFT processing, the latency measures as 9.25 ms (2.9 ms of that would be serialisation delay)

A big thanks to everyone who helped get me to this point, particularly DerekR and Duff. I suspect the next bits will be harder :)

Let me know if you have any questions.


TEENSY_IFFT_Q15_256_Sine1k_Sgraph.jpg
TEENSY_IFFT_Q15_1024_Sine1k_Sgraph.jpg
TEENSY_IFFT_32F_256_Sine1k_Sgraph.jpg
TEENSY_IFFT_32F_1024_Sine1k_Sgraph.jpg
 
Thanks for this post. I, too, have found the Int32 and Float32 FFTs to be very nice. They're also quite fast on the Teensy 3.6:

https://openaudio.blogspot.com/2016/10/benchmarking-teensy-36-is-fast.html

This part of your message caught my eye:

For reference, for a 128 length Audio block passthrough via Library 'queue' objects without any FFT/IFFT processing, the latency measures as 9.25 ms (2.9 ms of that would be serialisation delay)

Can you say a little more about how you did this test? For my audio processing projects, latency will end up being quite important. If a straight pass-through of 128 points takes 9.25 ms, instead of 128/44100 x 2 = 5.8 ms as expected, that could be problematic.

How did you do your latency test?

Thanks,

Chip
 
Chip,
Just to be a little pedantic for clarity: I'm looking to measure the _additional_ latency of passing through the Teensy. A wire connecting two devices directly will have a 'latency' (all serialisation) of 128/44100 x 1 in this case. Putting a Teensy in the path adds additional latency: another 128/44100 of serialisation (2.9 ms) plus processing time.

That said, for my processing in the Teensy I'm using the Audio Library queue functions to get a receive queue buffer, receive the 128 samples into the buffer, get an output queue buffer, copy the input to the output (memcpy), and then send the buffer to the output queue. Not exactly direct, but it gives me visibility of the samples at least, and I don't have the knowledge to do native access to the Audio shield. The cost is the processing time for all this.

The Audio Library also offers a more direct input-to-output streaming connection with no sketch code involved. With a Teensy 3.2/Audio Shield this has an additional latency (including serialisation) of around 6.3 ms. Even this is generalised library code to some extent, so it may be possible to optimise it further.

Unfortunately I can't test this with Teensy 3.6 as the Audio library function for this direct streaming currently has a bug for the 3.6, at least at high clock rates.

In more general terms, I measure the additional latency by comparing the timing of a purely wired analogue audio loop with an analogue wired loop including the Teensy. This is done with Metric Halo SpectraFoo's Transfer Function tool, set to detect the _difference_ in timing between audio from the purely wired loop and audio from the loop including the Teensy. I've found it to be quite accurate. I can describe the setup in more detail if it is of interest.

Hope this helps.
 
Thanks for the added detail.

I like the fact that you're comparing to literally a purely wired loop. This directly addresses my concerns that you might have been using your soundcard as an oscilloscope, but NOT allowing for the possibility that your soundcard had its own latency. It sounds like your test method is excellent.

Thanks,

Chip
 
Unfortunately I can't test this with Teensy 3.6 as the Audio library function for this direct streaming currently has a bug for the 3.6, at least at high clock rates.

Can you please point me to more info about this bug? I've missed quite a lot of forum posts over the last few weeks. :(
 
@Paul:
I reported some problems with audio in the K66 beta thread.

Can't get microphone or line-in audio to play straight through.
The code in this message still doesn't work on a new T3.6
msg #936 https://forum.pjrc.com/threads/34808-K66-Beta-Test/page38

Then see #955 on the next page

These are specifically about clock speed
#997 and #999 on page 40

There were earlier messages from others about I2S problems in the K66 beta thread.

Pete
 
@Paul:
Can't get microphone or line-in audio to play straight through.
The code in this message still doesn't work on a new T3.6

I, too, have problems with any sort of straight-through signal chain...i2s_in -> i2s_out. I tried to get smart and insert a "dummy" filter algorithm, but the optimizer or whatever sees right through my ploy. Works on Teensy 3.2, doesn't on Teensy 3.6.

My code is below, and on my github:

https://github.com/chipaudette/Open...o/Teensy Time Domain Audio/PassThrough_LineIn

Chip


Code:
#include <Audio.h>
#include <Wire.h>
#include <SPI.h>
#include <SD.h>
#include <SerialFlash.h>

class AudioFilterEmpty : public AudioStream
{
  public:
    AudioFilterEmpty(const char *foo_txt) : AudioStream(1, inputQueueArray) {myName = foo_txt; }
    const char* myName;
    void update(void)
    {
      audio_block_t *block;
      block = receiveWritable();
      if (!block) {
        return;
      }
      transmit(block);
      release(block);
    }

  private:
    audio_block_t *inputQueueArray[1];
};

AudioInputI2S            i2s1;         
AudioOutputI2S           i2s2;        
AudioFilterEmpty         filter1("Filter1");
AudioFilterEmpty         filter2("Filter2");
AudioConnection          patchCord1(i2s1, 0, filter1, 0);
AudioConnection          patchCord2(i2s1, 1, filter2, 0);
AudioConnection          patchCord3(filter1, 0, i2s2, 1);
AudioConnection          patchCord4(filter2, 0, i2s2, 0);
AudioControlSGTL5000     sgtl5000_1;     

void setup() {
  AudioMemory(20);
  delay(500);

  // Enable the audio shield and set the output volume.
  sgtl5000_1.enable();
  sgtl5000_1.inputSelect(AUDIO_INPUT_LINEIN);
  sgtl5000_1.volume(0.45); //headphone volume
}

void loop() {
   delay(20);
}
 
Interesting. Is it speed-dependent? Does it work at 96 MHz on the 3.6?

Great question. I didn't think to try that.

It turns out that it works at 96 MHz and 120 MHz, but it doesn't work at 144, 168, or 180 MHz.

Does that help locate the problem?

Chip

Updated (simplified) code is on my GitHub and below:

Code:
#include <Audio.h>
#include <Wire.h>
#include <SPI.h>
#include <SD.h>
#include <SerialFlash.h>

AudioControlSGTL5000     sgtl5000_1;    
AudioInputI2S            i2s1;         
AudioOutputI2S           i2s2;

//simplest pass-through  (On Teensy 3.6: works at 96 MHz and 120MHz.  Not at 144, 168, or 180 MHz)
AudioConnection          patchCord1(i2s1, 0, i2s2, 0);
AudioConnection          patchCord2(i2s1, 1, i2s2, 1);

void setup() {
  Serial.begin(115200);
  delay(500);
  Serial.println("Pass-Through Line-In to Headphone...");
  
  AudioMemory(20);
  delay(250);

  // Enable the audio shield and set the output volume.
  sgtl5000_1.enable();
  sgtl5000_1.inputSelect(AUDIO_INPUT_LINEIN);
  sgtl5000_1.volume(0.45); //headphone volume
  //sgtl5000_1.lineInLevel(11, 11); //max is 15, default is 5
}

void loop() {
   delay(20);
}
 
Great question. I didn't think to try that.

It turns out that it works at 96 MHz and 120 MHz, but it doesn't work at 144, 168, or 180 MHz.

Does that help locate the problem?

I hope NOT!
These frequencies suggest that it has something to do with HSRUN. That would not be good. It could indicate a silicon bug...
But we have to review the code first...
Can you run a little test for me? I don't know whether it is related or not: on my 3.6 beta board and the pre-production board, I have massive problems with I2S output when using more than 192 MHz. Can you check this on your Teensy? Something with the timing is totally wrong and the output is very distorted.
Unfortunately, these two are the only 3.6 boards I have at the moment. I'm still waiting for the others.
 
I hope NOT!
These frequencies suggest that it has something to do with HSRUN. That would not be good. It could indicate a silicon bug...
But we have to review the code first...
Can you run a little test for me? I don't know whether it is related or not: on my 3.6 beta board and the pre-production board, I have massive problems with I2S output when using more than 192 MHz. Can you check this on your Teensy? Something with the timing is totally wrong and the output is very distorted.
Unfortunately, these two are the only 3.6 boards I have at the moment. I'm still waiting for the others.

I don't know what you mean by "more than 192 MHz". Is that the clock rate of the I2S bus? Regardless, how do I change it to do your test?

Chip
 
Please try 216 MHz "Overclock" / 240 MHz "Overclock" and "PlaySynthMusic" from the examples. Up to 192 MHz, everything is OK on my 3.6, but not above.
I'm sure that, for MY 3.6, it is a hardware problem.
 
It turns out that it works at 96 MHz and 120 MHz, but it doesn't work at 144, 168, or 180 MHz.

Thanks. I've got this on my high-priority bug list now. :)

At this moment I'm working on bringing SDIO support into the Arduino SD library. That's at the absolute top of my list right now!

Will try to get as much of this other stuff as I can before 1.31-beta2. But realistically, I really want to get beta2 out ASAP with the most urgent fixes. This and quite a few other things might end up waiting another week for beta3.

It's now on my short list, and we're *finally* recovering from Kickstarter and then a few weeks of being short-staffed due to much-needed vacations... so I will look at this soon.
 
Just to set expectations appropriately (low), this first version will read/write only 1 sector at a time using polling. Pretty much the same as the SPI code does, but with 4 bits parallel at 25 MHz.
 
OK, perhaps that's fast enough. We'll see :)

Paul: one more piece of information regarding I2S. I don't know whether this is useful or not, but I thought I should mention it: my problems with I2S at 216/240 MHz occur only with the SGTL5000, not with the PT8211.
I wonder why, and I have no real explanation for this, since the differences between the two are small.
 
Please try 216 MHz "Overclock" / 240 MHz "Overclock" and "PlaySynthMusic" from the examples. Up to 192 MHz, everything is OK on my 3.6, but not above.
I'm sure that, for MY 3.6, it is a hardware problem.

My audio pass-through test does not work at any of the overclock settings. 120 MHz is as fast as it'll work with the Teensy 3.6.

Arduino 1.6.11. Teensy Loader 1.30. USB set to "Serial". Teensy 3.6 from Kickstarter. Audio Shield Rev B.

Chip
 
Ooops, I just saw that you asked for "PlaySynthMusic". This is purely about sound generation, not about the Line-In. So, maybe it'll work at the higher speeds? We'll see. I repeated my tests:

* 120 MHz: plays OK
* 180 MHz: plays OK
* 192 MHz: plays OK
* 216 MHz: the song is recognizable but the pitches warble hilariously. Funny! And, it stopped playing unexpectedly part way through.
* 240 MHz: plays OK

So, my only trouble was with 216 MHz. Everything else was fine.

By the way, on my PC (Thinkpad T410, Core i5 M540 @ 2.53 GHz, 8 GB RAM, non-SSD hard drive, Win7 64-bit), recompiling for the Teensy 3.6 takes longer than the song takes to play. It's quite a bit slower than I'd like, and slower than is comfortable for rapid-fire iterative programming.

Hope this helps...

Chip
 
recompiling for the Teensy 3.6 takes longer than the song takes to play. It's quite a bit slower than I'd like, and slower than is comfortable for rapid-fire iterative programming.

Chip - what IDE are you using? It seems 1.6.12 paid some attention to recompiling and sped it up.

As noted in my results below, recompiling is much faster with 1.6.12:
[30 second full compile on desktop SSD with IDE 1.6.12 :: recompile 4-5 seconds]
 
PlaySynthMusic on Beta2 T_3.6 - all these speeds sound fine and seem to play through at the same speed.
Using the PJRC Audio shield, the song plays to the end in about 1:22 (timed where marked **):
240 - FINE - >> F_BUS 120M
216 - FINE - ** >> F_BUS 54M default and F_BUS 108M
192 - FINE - ** >> F_BUS 48M
180 - FINE - >> F_BUS 90M
120 - FINE - ** >> F_BUS 60M
[30 second full compile on desktop SSD with IDE 1.6.12 :: recompile 4-5 seconds]

NOTE: I have my HSRUN changes on this machine - but that only affects EEPROM writes and Up front Serial# read.

Nothing 'local' in libraries::
Multiple libraries were found for "SD.h"
Used: C:\arduino-1.6.12\hardware\teensy\avr\libraries\SD
Not used: C:\arduino-1.6.12\libraries\SD
Using library Audio at version 1.3 in folder: C:\arduino-1.6.12\hardware\teensy\avr\libraries\Audio
Using library SPI at version 1.0 in folder: C:\arduino-1.6.12\hardware\teensy\avr\libraries\SPI
Using library SD at version 1.0.8 in folder: C:\arduino-1.6.12\hardware\teensy\avr\libraries\SD
Using library SerialFlash at version 0.4 in folder: C:\arduino-1.6.12\hardware\teensy\avr\libraries\SerialFlash
Using library Wire at version 1.0 in folder: C:\arduino-1.6.12\hardware\teensy\avr\libraries\Wire

<Update> ::
Compiled for Beta T_3.5 and works at 120 MHz
The T_3.5 fails to run at 144 MHz and 168 MHz. Is work needed for those speeds to run, or should they [and higher] be pulled from BOARDS.txt?
 
Chip - what IDE are you using? It seems 1.6.12 paid some attention to recompiling and sped it up.

As noted in my results below, recompiling is much faster with 1.6.12:
[30 second full compile on desktop SSD with IDE 1.6.12 :: recompile 4-5 seconds]

I'm using 1.6.11. When I installed Teensyduino, it wouldn't let me install into 1.6.12, so I downgraded to 1.6.11. This is with the default downloadable Teensyduino installer for Windows. If you can run it on 1.6.12, and if it's faster, that's great! I'm looking forward to it!

Chip
 