A Fast Hartley Transform for the Teensy 3.1?

Status
Not open for further replies.

MartinR

Member
I am looking for good implementations of the Fast Hartley Transform for the Teensy 3.0 or 3.1. I have been looking at the Arduino FHT library, for example, but it uses embedded asm and so is not portable. Can anyone recommend something written ideally in ANSI C for portability, please? I would be using signed 16-bit integers for input and output (e.g. the output from an ADC). I could potentially use an FFT instead; however, the performance gains of the FHT make it more attractive, and I need a faster implementation than the FFT could perhaps provide.

Thanks.
 
Hi, where can I find this Arduino FHT library?
Perhaps it is possible to port it...

Maybe this one works for the Teensy:
http://www.waitingforfriday.com/ind...ansformation_Library_for_AVR_microcontrollers
Please note that, whilst the library is primarily tested on the ATmega328P and ATmega32U4, it is suitable for any GCC compatible environment and will function well on other boards such as the Raspberry Pi, the Arduino Due, etc.
 
The Arduino (openmusiclabs) version is at http://wiki.openmusiclabs.com/wiki/ArduinoFHT. The 'waiting for Friday' version seems to be at least partly a GCC-compliant port of the openmusiclabs version (e.g. the generateTables() function). It is certainly worth a detailed look. It would seem to need reworking for full 16-bit operation (e.g. use of int32_t etc.), but that shouldn't be a problem.
 
Okay, then the openmusiclabs version has too much assembler code.
But did you try the Teensy FFT? On Teensy, multiplication is as fast as addition, and the audio library implementation uses the fast Cortex-M4 DSP extensions.
 
I could potentially use an FFT instead, however, the performance gains of the FHT make this more attractive and I need a faster implementation than the FFT could perhaps provide.

Could you be a bit more specific about the actual performance you really do need?

Is this an audio application, or something much faster? Is it something like music visualization? Or is it some sort of sophisticated signal processing, where both the real and imaginary parts (magnitude and phase) matter for reconstructing the signal when you do the inverse transform?

Understanding what you're trying to actually accomplish can allow for much more helpful answers than a discussion narrowly focused on very specific technical points.

If you're looking to do music response or visualization, the audio library already has everything you need and some nice examples are available.
 
I haven't yet tried the Teensy-FFT because I'm looking for the greater speed possible from FHT code, however, maybe the audio-lib library will help. I'll have to look! Thanks.

Edit: just looked - is this audio library the one available within the Teensyduino add-on?
 
It's a full DSP application handling data in near real time from sensors. The need for performance is critical because of a. the near real time element and b. the need to process multiple streams per cycle. The ability to run the full inverse transform is therefore important. I am looking to run a few 10s of 128 or 256 element transforms per second. It is possible that the Cortex DSP extensions will be sufficient: I'll have to look into them. I'm new to coding for non-ATMEL processors so am learning new things! I will have a look at the links you have posted - thanks. One quick question: does including 'audio.h' in the Teensyduino environment bring in the Teensy version with the Cortex extensions automatically?
 
The audio library easily runs the 1024 point FFT 86 times per second, with plenty of CPU time for OctoWS2811 to render an image to 2000 pixels. For only 256 points, you ought to be able to get well over 100, maybe even around 200 per second.

I believe you might be underestimating Teensy 3.1's speed for the normal FFT from ARM's optimized code.
 
One quick question: does including 'audio.h' in the Teensyduino environment bring in the Teensy version with the Cortex extensions automatically?

Yes, the audio library is heavily optimized to use the Cortex-M4 DSP extensions and efficient DMA to move 16 bit, 44.1 kHz streams on/off chip.

I highly recommend you start with File > Examples > Audio > Analysis > FFT. By default, that example is designed to get input from the audio board. You can modify it to use the on-chip ADC if you like, by replacing the I2S input object with the one for the ADC. See the OctoWS2811 spectrum analyzer example for how to do that.

If you want to do things differently than how the audio library works, you may need to dig into the code. But at least to get started, I'd highly recommend putting aside the old AVR ways of thinking. Teensy 3.1 is dramatically faster, and very well optimized FFT code and the powerful yet easy-to-use audio library already exist. Do yourself a favor by at least trying the easy way first.
 
You might also want to check these links for optimized versions of the 256 and 1024 point FFTs.
Both versions are optimized for streaming performance. These should be as efficient as the Hartley transform for streaming data. The core of the optimization is in essence based on the Fourier/Hartley relation.
As said in previous posts, the Teensy can easily handle 44.1 kHz audio. The audio library uses half-overlapping FFTs. So for the audio library it calculates 86 FFTs of 1024 points per second, with only 12% CPU load at 72 MHz. For the 256 point FFT, it calculates 345 FFTs per second at roughly 8% CPU usage. (Load percentages are for the optimized versions.)
If you look at the raw processing of the CMSIS FFT functions (i.e. without copying data and applying the window), then for a 256 point FFT @ 72 MHz, it takes only 0.3 ms for a complex FFT, or inversely about 3200 complex FFTs per second. With the optimization, you can get about 5700 real FFTs per second. (Note that this is raw processing; the CPU is fully occupied calculating FFTs.)
Edit: reading back the posts, I see you only needed 10s of FFTs per second. So with the Teensy being capable of a couple of 1000s (without overclocking), there is no need to worry about efficiency or to use my optimized versions.
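The 86 and 345 per second figures follow directly from the sample rate and the half-overlap: a new frame starts every N/2 samples. A small sketch of that arithmetic in C (the function names are only illustrative and not part of any library):

```c
#include <stddef.h>

/* Number of transforms per second when a new frame starts every
 * hop samples. For half-overlapping frames of length N, hop = N / 2. */
double transforms_per_second(double sample_rate_hz, size_t hop)
{
    return sample_rate_hz / (double)hop;
}

/* Time available for each transform, in milliseconds, if the
 * processing has to keep up with the incoming stream. */
double budget_ms_per_transform(double sample_rate_hz, size_t hop)
{
    return 1000.0 * (double)hop / sample_rate_hz;
}
```

With a 44.1 kHz stream, `transforms_per_second(44100.0, 512)` gives about 86.1 for the 1024 point case, and `transforms_per_second(44100.0, 128)` gives about 344.5 for the 256 point case.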
 
The audio library easily runs the 1024 point FFT 86 times per second, with plenty of CPU time for OctoWS2811 to render an image to 2000 pixels. For only 256 points, you ought to be able to get well over 100, maybe even around 200 per second.

I believe you might be underestimating Teensy 3.1's speed for the normal FFT from ARM's optimized code.

Thanks Paul, I probably am! I've done a lot of work with FFTs in the past but I'm new to the ARM processor and the Teensy in general. 200 per second sounds useful and I have the option of passing two data streams at a time through the FFT in parallel (one in the real and the other in the imaginary) which will help. I had underestimated yesterday (it was late) how many FFTs I needed per second - 160 / s is the minimum.
 