Teensy4 performance in audio context

Status
Not open for further replies.

martin42

Hi there, I have already bought two of the new Teensy 4.0 boards and I am really on my toes waiting for the day they finally arrive.

I did my fair share of development on the Teensy 3.2, but the new board has obviously received a major performance upgrade since then. On the 3.2 I built a simple drum synth, e.g. two sine waves as FM, noise and a filter with envelopes, a pitch envelope and a simple compressor, with an OLED display on top.
I also tried running Mutable Instruments Braids on it and it worked just fine.

Will I be able to scale this to 8 voices? It ran quite well on the 3.2, and I'm thinking about some other things I might be able to throw at it if it can keep up.

So my question is... How does this board perform in terms of DSP? And how big is the performance tax of using floating point numbers?
Cheers guys
 

Obviously it depends on how much you use floating point, and whether your floating-point code is mostly adds/subtracts, or has a lot of divisions, calls to trig functions, etc.

For the record:
  • Teensy LC, 3.0, 3.1, and 3.2 had no hardware floating point support at all. All floating point was done by emulation functions;
  • Teensy 3.5 and 3.6 had hardware support for single-precision floating point, and reverted to software emulation for double precision. This meant it was an advantage to only use single precision, e.g. putting an explicit 'f' on floating-point constants, using the 'float' keyword, and using the float versions of the math functions (e.g. 'sinf' instead of 'sin'). If there were doubles in an expression, or the double math function was used, the compiler would convert the floats to doubles;
  • Teensy 4.0 has hardware support for both single precision and double precision. If you are going for the ultimate in speed, it may still be faster to use single precision as on the Teensy 3.5/3.6, but it may not be; you would want to measure it yourself to see how it works with your code.
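A minimal sketch of the difference, assuming a simple sine oscillator (the function names are hypothetical, not from any library). In the first function every constant carries an 'f' suffix and 'sinf' is called, so the math stays single precision; in the second, the unsuffixed constant is a double, which promotes the whole expression and calls the double 'sin'. On Teensy 3.5/3.6 the second version would fall back to software emulation; on Teensy 4.0 both run in hardware, so timing them on the board is the only way to know which is faster for your code.

```cpp
#include <math.h>
#include <stdint.h>

// All-float version: 'f' suffix on the constant and sinf(), so nothing
// is promoted to double (hypothetical helper, for illustration only).
float sineSampleFloat(uint32_t n, float freqHz, float sampleRate) {
  const float kTwoPi = 6.2831853f;
  float phase = kTwoPi * freqHz * (float)n / sampleRate;
  return sinf(phase);
}

// Mixed version: 6.2831853 without 'f' is a double, so the whole phase
// expression is evaluated in double and the double sin() is called.
float sineSampleDouble(uint32_t n, float freqHz, float sampleRate) {
  double phase = 6.2831853 * freqHz * n / sampleRate;
  return (float)sin(phase);
}
```

Both return essentially the same sample; only the precision (and on Teensy 3.5/3.6, the speed) of the intermediate math differs.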
 
The processing power is there to do a lot of polyphony. In my humble opinion the limiting factor is the 16 bit math that is used to mix everything together.

I have been working on a "big" synth that uses the Mutable Instruments oscillator code; at some point soonish I will hopefully be able to demo it.

It starts sounding pretty crunchy with 4 voices playing at once. In some cases (lots of effects and stuff) even just 1 voice sounds crunchy.

If I am able to continue working on the synth, I think I'd like to fork the Audio library and have it use float all the way through (until the codec obviously). That should hopefully really help clean the sound up.
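To illustrate why 16-bit mixing can get crunchy, here is a hedged sketch (hypothetical function names, not the Audio library's mixer code) contrasting a mixer that saturates to int16 after every add with one that accumulates in 32 bits and applies the gain once at the end.

```cpp
#include <stddef.h>
#include <stdint.h>

// Clamp a 32-bit value to the int16 range, as a 16-bit stage must.
static int16_t sat16(int32_t x) {
  if (x > 32767) return 32767;
  if (x < -32768) return -32768;
  return (int16_t)x;
}

// Naive mix: saturate back to int16 after every voice is added.
int16_t mixSaturating(const int16_t* voices, size_t count) {
  int16_t acc = 0;
  for (size_t i = 0; i < count; ++i) acc = sat16((int32_t)acc + voices[i]);
  return acc;
}

// Headroom mix: sum in 32 bits, apply the gain once, saturate once.
int16_t mixWithHeadroom(const int16_t* voices, size_t count, float gain) {
  int32_t acc = 0;
  for (size_t i = 0; i < count; ++i) acc += voices[i];
  return sat16((int32_t)(acc * gain));
}
```

With voices {20000, 20000, 20000, -20000}, the saturating version returns 12767 because the running sum hits the int16 ceiling and the clipped-off part is lost for good, while the headroom version with a gain of 0.25 returns 10000, exactly a quarter of the true sum of 40000.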
 

If the issue is 16 bits, using float will help a little, but you've only moved the goal post out a bit. Float has 23 bits of precision for the mantissa, plus the hidden bit, so you can represent integers in the range -16777215 to 16777215 without loss. And of course you can't do some things in float, such as binary AND, binary OR, shifting, etc. Maybe use int32 instead? Or maybe you need double (53 bits including the hidden bit) or int64? Note that double should only be used on the Teensy 4.0, while float works on the Teensy 3.5/3.6 as well as the Teensy 4.0. Int32 (and int64) will work on all platforms.
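A quick way to check these precision limits, sketched with hypothetical helper names: an integer is representable without loss if converting it to the floating type and back gives the same value. A float's significand is 24 bits including the hidden bit; a double's is 53.

```cpp
#include <stdint.h>

// True if n survives a round trip through float unchanged (24-bit significand).
bool exactInFloat(int32_t n) {
  return (int32_t)(float)n == n;
}

// Same check for double (53-bit significand).
bool exactInDouble(int64_t n) {
  return (int64_t)(double)n == n;
}
```

exactInFloat(16777216) is true but exactInFloat(16777217) is false, since 2^24 + 1 rounds to 2^24; the same thing happens for double just above 2^53.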

And of course, if you use a type other than int16, your code will be slower: the Audio library can no longer use the 16-bit SIMD instructions it relies on, and the data sizes double or more, which can have cache memory effects on the Teensy 4.0. But sometimes you are willing to trade raw speed for functionality.
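The data-size half of that argument is simple arithmetic. This sketch assumes the Audio library's default block length of 128 samples (AUDIO_BLOCK_SAMPLES); the names below are hypothetical.

```cpp
#include <stddef.h>
#include <stdint.h>

// Assumed block length: 128 samples, the usual AUDIO_BLOCK_SAMPLES default.
constexpr size_t kBlockSamples = 128;

// Bytes needed for one block of samples of the given element size.
constexpr size_t blockBytes(size_t bytesPerSample) {
  return kBlockSamples * bytesPerSample;
}

// Samples that fit in one 32-bit register or memory word.
constexpr size_t samplesPerWord(size_t bytesPerSample) {
  return 4 / bytesPerSample;
}

static_assert(blockBytes(sizeof(int16_t)) == 256, "int16 block: 256 bytes");
static_assert(blockBytes(sizeof(float)) == 512, "float block: twice as large");
```

A 32-bit word holds two int16 samples but only one float, which is also why the 16-bit SIMD instructions lose their advantage once the samples are floats.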
 
So my question is... How does this board perform in terms of DSP?

As a very rough guideline, the DSP performance is approximately 11 times what you had with Teensy 3.2. So as a very crude guess, if you were able to synthesize 2 voices on Teensy 3.2, you'll probably be able to do 22 on Teensy 4.0.
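For a feel of the raw budget, this sketch computes the CPU cycles available per audio sample, assuming Teensy 4.0 at 600 MHz, Teensy 3.2 at 96 MHz and a 44.1 kHz sample rate (the Audio library's actual rate differs slightly). It also encodes the crude voice-count scaling from the post; the helper names are hypothetical.

```cpp
#include <stdint.h>

// Whole CPU cycles available per audio sample at a given core clock.
uint32_t cyclesPerSample(uint32_t clockHz, uint32_t sampleRateHz) {
  return clockHz / sampleRateHz;
}

// Crude rule of thumb: scale the old voice count by a rough speed ratio.
uint32_t estimateVoices(uint32_t voicesOnOldBoard, double speedRatio) {
  return (uint32_t)(voicesOnOldBoard * speedRatio);
}
```

That gives 13605 cycles per sample versus 2176, a ratio of about 6.25 from clock speed alone, so the rest of the rough 11x figure would have to come from the Cortex-M7 doing more work per clock cycle.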


And how big is the performance tax of using floating point numbers?

For "normal" C / C++ code, you might expect somewhere between 10% and 40% less performance. While 32-bit float operations run at approximately the same speed as 32-bit integer operations, the Cortex-M7 has 2 integer execution units but only 1 FPU. Benchmarking shows about a 50% speedup relative to Cortex-M4 at the same clock speed, but not all of that 50% comes from the dual-issue pipeline sometimes executing 2 instructions in the same clock cycle. Some of it comes from branch prediction, which the M7 has but the M4 does not.
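Since the answer depends on the code, the practical move is to time both versions. A minimal sketch, assuming hypothetical function names: the same multiply-accumulate loop in int32 and in float, plus a timer that averages over many calls. std::chrono is used so it also runs on a desktop; on the Teensy itself you would time with micros() instead, and only the on-board numbers say anything about the Cortex-M7.

```cpp
#include <chrono>
#include <stddef.h>
#include <stdint.h>

// Integer multiply-accumulate over n samples.
int32_t macInt(const int32_t* a, const int32_t* b, size_t n) {
  int32_t acc = 0;
  for (size_t i = 0; i < n; ++i) acc += a[i] * b[i];
  return acc;
}

// The same loop in single-precision float.
float macFloat(const float* a, const float* b, size_t n) {
  float acc = 0.0f;
  for (size_t i = 0; i < n; ++i) acc += a[i] * b[i];
  return acc;
}

// Average nanoseconds per call of fn(r) over `reps` calls. The volatile
// sink keeps the compiler from discarding the results.
template <typename Fn>
double nsPerCall(Fn fn, int reps) {
  volatile double sink = 0;
  auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < reps; ++r) sink = sink + fn(r);
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(stop - start).count() / reps;
}
```

Passing the repetition index into the lambda and using it to change an input keeps the compiler from computing the result once and reusing it for every repetition.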

But some of the DSP code in the audio library is anything but normal. For highly optimized code that takes advantage of the DSP extension instructions, especially the SIMD multiply-accumulate, and uses tricks like packing two 16-bit samples into each 32-bit register (effectively doubling memory bandwidth), you could expect a huge performance hit from converting it to "normal" programming. Code using those sorts of intense optimizations does DSP on 16-bit samples much faster than ordinary programming can.
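The packing trick can be modeled in portable C++. On Cortex-M4/M7 a single SMLAD instruction (exposed in CMSIS as the __SMLAD intrinsic) multiplies both 16-bit halves of two registers and adds both products to an accumulator; the functions below only model that arithmetic on a host and are hypothetical, not the Audio library's own code.

```cpp
#include <stddef.h>
#include <stdint.h>

// Pack two int16 samples into one 32-bit word: lo in bits 0-15, hi in 16-31.
uint32_t pack2x16(int16_t lo, int16_t hi) {
  return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

// Model of what SMLAD computes: acc + lo(x)*lo(y) + hi(x)*hi(y),
// keeping only the low 32 bits of the result as the hardware does.
int32_t smladModel(uint32_t x, uint32_t y, int32_t acc) {
  int32_t xl = (int16_t)(x & 0xFFFFu), xh = (int16_t)(x >> 16);
  int32_t yl = (int16_t)(y & 0xFFFFu), yh = (int16_t)(y >> 16);
  int64_t sum = (int64_t)acc + (int64_t)xl * yl + (int64_t)xh * yh;
  return (int32_t)(uint32_t)sum;
}

// Dot product of two packed buffers: one SMLAD-style step per 32-bit word,
// i.e. two samples per load and two multiply-accumulates per step.
int32_t dotPacked(const uint32_t* a, const uint32_t* b, size_t words) {
  int32_t acc = 0;
  for (size_t i = 0; i < words; ++i) acc = smladModel(a[i], b[i], acc);
  return acc;
}
```

Each 32-bit load carries two samples and each step does two multiply-accumulates, which is where both the memory-bandwidth and the instruction-count savings come from.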




It starts sounding pretty crunchy with 4 voices playing at once. In some cases (lots of effects and stuff) even just 1 voice sounds crunchy.
...
If I am able to continue working on the synth, I think I'd like to fork the Audio library and have it use float all the way through (until the codec obviously). That should hopefully really help clean the sound up.

I believe Chip (the guy working on that hearing aid project) made a fork of the library using floats. Maybe that might help, or at least give you a head start?

Whether it actually helps is a good question. If you decide to publish your synth code and you can give a reproducible test case that demonstrates the "crunchy" sound, I'd be curious to take a look.


If the issue is 16-bits, using float will help a little, but you've just moved the goal post out just a little. Float has 23 bits of precision for the mantissa, plus the hidden bit.

Floats can really help if something is clipping, or if some part of the algorithm is using relatively few bits, like the FreeVerb code might be doing. The automatic scaling by the exponent can be really nice.
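A small sketch of both effects, assuming Q15 format (int16 representing -1.0 to just under +1.0) and hypothetical helper names. An overshoot to 1.5 is pinned at full scale once stored as Q15, while a float keeps it so a later gain stage can bring it back down. A very quiet value like 0.0001 rounds to the integer 3 in Q15, roughly 2 bits of resolution and an error of about 8%, while a float stores it almost exactly.

```cpp
#include <math.h>
#include <stdint.h>

// Convert a float sample (nominal range -1..+1) to Q15, rounding and
// saturating the way a 16-bit stage has to.
int16_t toQ15(float x) {
  float scaled = roundf(x * 32768.0f);
  if (scaled > 32767.0f) return 32767;
  if (scaled < -32768.0f) return -32768;
  return (int16_t)scaled;
}

// Convert a Q15 value back to float.
float fromQ15(int16_t q) {
  return (float)q / 32768.0f;
}
```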

But ultimately the DAC output is an integer with effectively 15 to, at best, 18 bits. DACs that really use more than 16 bits are rare. Sure, they all say 24 bits, but the noise floor makes those low 7 to 9 bits worthless. Like so much consumer and even pro audio stuff, it's all a bunch of gaming the numbers. Nearly all DAC datasheets quote A-weighted specs, which let them claim a somewhat higher number like 100 to 110 dB dynamic range (but rarely SINAD, the true measure of the DAC's perfection). By the time you take away the spectral weighting and divide by ~6 dB per bit to get the effective number of bits, the reality is that even the very best DACs are using barely more than 16 bits.
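The dB-to-bits conversion is the standard ideal-quantizer relation SINAD = 6.02 N + 1.76 dB for a full-scale sine input. A minimal sketch with hypothetical function names:

```cpp
// Effective number of bits from a measured SINAD in dB.
double enobFromSinadDb(double sinadDb) {
  return (sinadDb - 1.76) / 6.02;
}

// The inverse: the SINAD an ideal N-bit converter would reach.
double idealSinadDb(double bits) {
  return 6.02 * bits + 1.76;
}
```

By that relation a true 24-bit converter would need about 146 dB, and even a 110 dB figure, taken as unweighted SINAD, only works out to about 18 bits.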
 