Teensy 3.1 and FPU (Floating Point Unit)

Status
Not open for further replies.

fische007

Member
I have a project where I have to do a lot of calculations (with float-type values), and do them fast. That was the reason to switch from an 8-bit to a 32-bit MCU. Now my question: is the FPU in the Teensy 3.1 always active, or do I have to #define something in the program? I saw some lines in the source code with #ifdefine __FPU_used. Now I'm not sure whether I have to define something to use the FPU in calculations. If somebody has an answer, that would be great.
 
Pedvide,

if you look at the overview diagram in your link, the core block for the K20 MCUs shows an FPU and a DSP unit.
 
Yes, Freescale offers K20s with a floating-point unit, but it is optional. The Teensy 3.x uses the non-floating-point version of the chip, so you still have software emulation of floating point. However, the Teensy's clock is 4-8x faster than the AVR processors', and the ARM core can do 32-bit add/subtract/multiply in single-cycle instructions where the AVR needs multiple instructions, so your code will still be faster than on an Arduino. If you look at the table at the bottom, you will see it broken down: some parts have the FPU and some do not. I believe the chip that runs at 72 MHz is the one used in the Teensy.

If you need to do lots of FP calculations, perhaps you should consider stepping up to the systems designed to run Linux (Raspberry Pi, BeagleBone Black, pcDuino, etc.). These run at higher clock speeds, have more memory, and have hardware floating point, but they don't give you the real-time access to the pins that you get in an Arduino-type environment. And because those chips run at much faster clock speeds, they need more power than a typical 3.7V LiPo battery or 5V USB connection provides. If you need real-time access to control things like you get on an Arduino/Teensy, perhaps consider a two-chip solution (Raspberry Pi + Teensy, or a Yún).

Intel Galileo might be useful if you don't need fast access to the pins, and you have a lot of power.

Finally, I believe the Spark Core is now shipping (it uses Spark's own chip but runs a port of the Arduino libraries).

Unfortunately, if you need lots and lots of floating point (more than 8x faster than an Arduino), the Teensy is not a solution at this time. Out of curiosity, what are you doing that needs so much floating point? High-speed GPS is one thing that needs FP, as are things like handheld game controllers.

The compiler defines __FPU__ if the options indicate that the target chip has floating point support.
 
Last edited:
I posted the code earlier, but today I finished running a simple floating-point benchmark on all the Arduinoesque boards I have. I even dusted off the Due and downloaded the latest Arduino 1.5.x beta (the rest were done with 1.0.5rc2 and Teensyduino 1.19rc1/rc2).

As the elapsed time is in 10^-6 seconds and the clock speed is in 10^6 cycles/sec, I multiplied them together to see how far the boards diverge from being clock-rate limited. Notice how much faster Teensy 3.1 is compared to Teensy 3.0. I suspect the faster memory bandwidth is a factor there.

Code:
// 8-byte doubles
// ==============
// time elapsed = 18287 micros (Teensy 3.1, 96 MHz = 1.755)
// time elapsed = 19763 micros (Teensy 3.1, 72 MHz = 1.422)
// time elapsed = 23350 micros (Teensy 3.1, 48 MHz = 1.121)
// time elapsed = 18713 micros (Arduino Due, 84 MHz = 1.572)
// time elapsed = 29504 micros (Teensy 3.0, 96 MHz = 2.832)
// time elapsed = 31417 micros (Teensy 3.0, 48 MHz = 1.508)

// 4-byte doubles
// ==============
// time elapsed = 48480 micros (Teensy++ 2.0, 16 MHz = 0.776)
// time elapsed = 48484 micros (Teensy 2.0, 16 MHz = 0.776)
// time elapsed = 48500 micros (Arduino Uno, 16 MHz = 0.776)
// time elapsed = 50116 micros (Arduino Mega 2560, 16 MHz = 0.802)
 
Last edited:
Nantonos, it might be an interesting test to compare the ARM boards using the float type. This would give a fairer comparison to the AVR processors (which, as you mention, only do calculations in single precision). You would need to call the sinf function to prevent promotion of the float argument to double.

I recall that in the past the AVR math library was more optimized than the ARM math library, which may affect things since you are testing the sin function.
 
Last edited:
I'm interested in doing audio processing on embedded devices. So I did some speed tests with a bunch of Arduinos, a Teensy 3.2, and an NXP K66 board (which is a temporary stand-in for the Teensy 3.6). Writing audio-processing software using floats is much easier for mere mortals than using ints, so floating-point speed is of great interest to me. The Teensy 3.2 does not have an FPU, whereas the K66 / Teensy 3.6 does.

In my benchmarks using FIR filtering, the K66 / Teensy 3.6 absolutely crushes the Teensy 3.2 on floating point operations...

https://openaudio.blogspot.com/2016/09/benchmarking-fir-filtering.html

...the K66 / 3.6 is 25x faster than the 3.2. Wow.

Chip
 
The cool thing with the Teensy 3.x is that it has built-in DSP instructions for 16-bit fixed-point math, so if you stick to fixed point you can do quite a lot of CD-quality audio processing on the Teensies with little CPU load, with no FPU required.
 
The cool thing with the Teensy 3.x is that it has built-in DSP instructions for 16-bit fixed-point math, so if you stick to fixed point you can do quite a lot of CD-quality audio processing on the Teensies with little CPU load, with no FPU required.

You gotta be careful with equating 16-bit and CD-quality. Yes, a CD stores its data as 16-bit numbers. But when you do processing on 16-bit values (such as multiplying when doing filtering), you naturally end up with 32-bit data. If you're going to bring that back down to a 16-bit data type, you have to figure out which bits to discard. If you do this without a lot of thought, you'll quickly lose your "CD-quality audio". Or, even when you do it smartly, if you do it too many times, you'll lose your "CD-quality audio".

Digital Audio Workstations (i.e., digital audio recording on computers) recognized this problem long ago. I believe that VSTs and other audio-processing plugins for DAWs moved (as their standard) from 16-bit to 32-bit (or higher) data types many, many years ago. The standard might even have changed back before 2000.

So, yes, I agree that you can do wicked fast 16-bit processing on the Teensy 3.x...but you might not want to due to quantization and other artifacts. It totally depends upon what processing you want to do and upon your application.

Chip
 
I did more benchmarks of different boards. This benchmark uses the FFT instead of an FIR filter:

https://openaudio.blogspot.com/2016/09/benchmarking-fft-speed.html

For this testing, I tried FFTs coded in generic C as well as FFTs using the DSP extensions built into the Teensy. For the Teensy 3.2, these DSP extensions (the CMSIS library) accelerate Int16 operations so that they are about 3.7x as fast as the generic C FFT. For Int32 operations, the CMSIS FFT is about 3.3x as fast. And even though the Teensy 3.2 doesn't have an FPU, the CMSIS version of the FFT is still faster than my generic C FFT: for Float32, the CMSIS FFT is 1.9x faster. Pretty sweet.

For giggles, I also tested a K66 board, which should perform very closely to the new Teensy 3.6. It crushes everything on floating point calculations. Against the Teensy 3.2, the K66 Float32 FFT was 14x faster. Wow. What was even more surprising was that, for this FFT test, the K66 was actually faster on Float32 than on Int32.

Chip
 
Last edited: