How does the inner loop of AudioSynthWaveformSine work?

Status
Not open for further replies.
I've been experimenting with building my own AudioSynthXXX objects, working towards the eventual goal of doing some very basic physical modelling synthesis on a Teensy 4.1.

My first try used floating point maths and a sin function within the update() loop. This didn't work well: the sound was extremely distorted. I think I was going way over my budget in terms of processor operations per second and causing overruns.

So, in an attempt to learn how the inner loop of a synthesis object needs to work, I'm referring to the source code of the AudioSynthWaveformSine object on the Teensy GitHub. There are several aspects of this code that I don't understand, so I was wondering if someone could explain them to me?

Code:
void AudioSynthWaveformSine::update(void)
{
	audio_block_t *block;
	uint32_t i, ph, inc, index, scale;
	int32_t val1, val2;

	if (magnitude) {
		block = allocate();
		if (block) {
			ph = phase_accumulator;
			inc = phase_increment;
			for (i=0; i < AUDIO_BLOCK_SAMPLES; i++) {
				index = ph >> 24;
				val1 = AudioWaveformSine[index];
				val2 = AudioWaveformSine[index+1];
				scale = (ph >> 8) & 0xFFFF;
				val2 *= scale;
				val1 *= 0x10000 - scale;
#if defined(__ARM_ARCH_7EM__)
				block->data[i] = multiply_32x32_rshift32(val1 + val2, magnitude);
#elif defined(KINETISL)
				block->data[i] = (((val1 + val2) >> 16) * magnitude) >> 16;
#endif
				ph += inc;
			}
			phase_accumulator = ph;
			transmit(block);
			release(block);
			return;
		}
	}
	phase_accumulator += phase_increment * AUDIO_BLOCK_SAMPLES;
}
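For anyone wanting to experiment with this loop off the board, here is a minimal host-side C++ sketch of the same arithmetic. It is an illustration under assumptions, not the library code. The names (sineTable, kBlockSamples, phaseIncrement, mulHigh32, renderBlock) are hypothetical, the 257-entry table is generated at startup rather than taken from AudioWaveformSine, and it assumes magnitude uses 65536 to mean full amplitude. mulHigh32() is a portable stand-in for multiply_32x32_rshift32(). phaseIncrement() assumes one full cycle corresponds to 2^32 phase units, which is what letting ph wrap around with a 256-entry table implies.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-ins, not the library's real declarations.
constexpr int kTableSize = 256;      // ph >> 24 gives an 8-bit index
constexpr int kBlockSamples = 128;   // AUDIO_BLOCK_SAMPLES defaults to 128
constexpr double kTwoPi = 6.283185307179586;
int16_t sineTable[kTableSize + 1];   // 257 entries so index + 1 is always valid

void buildSineTable() {
    for (int i = 0; i <= kTableSize; i++)
        sineTable[i] = (int16_t)std::lround(32767.0 * std::sin(kTwoPi * i / kTableSize));
}

// One cycle = 2^32 phase units, so each sample advances freq/fs of 2^32.
uint32_t phaseIncrement(double freqHz, double sampleRate) {
    assert(freqHz >= 0.0 && freqHz < sampleRate / 2.0);   // stay below Nyquist
    return (uint32_t)(freqHz * (4294967296.0 / sampleRate));
}

// Portable equivalent of multiply_32x32_rshift32(a, b): the high 32 bits
// of the signed 64-bit product, i.e. (a * b) >> 32.
int32_t mulHigh32(int32_t a, int32_t b) {
    return (int32_t)(((int64_t)a * b) >> 32);
}

// Same arithmetic as the update() loop; returns the advanced phase.
uint32_t renderBlock(int16_t *out, uint32_t ph, uint32_t inc, int32_t magnitude) {
    for (int i = 0; i < kBlockSamples; i++) {
        uint32_t index = ph >> 24;             // bits 24..31: table entry 0..255
        uint32_t scale = (ph >> 8) & 0xFFFF;   // bits 8..23: fraction 0..65535 (/65536)
        int32_t val1 = sineTable[index];
        int32_t val2 = sineTable[index + 1];
        // val1*(1 - f) + val2*f with f = scale/65536, kept 65536 times larger
        // so no division is needed; the sum still fits in int32_t.
        int32_t mix = val1 * (int32_t)(0x10000 - scale) + val2 * (int32_t)scale;
        // (mix * magnitude) >> 32 == interpolated sample * magnitude / 65536
        out[i] = (int16_t)mulHigh32(mix, magnitude);
        ph += inc;   // unsigned wrap at 2^32 == one full cycle
    }
    return ph;
}
```

With a 44100 Hz sample rate, phaseIncrement(1000.0, 44100.0) advances about 1000/44100 of the 2^32 range per sample, so index walks through all 256 table entries roughly once every 44 samples.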

My questions are:
  1. This code uses a wavetable (AudioWaveformSine). Is this absolutely necessary to achieve acceptable performance on a Teensy 4.1?
  2. It's using linear interpolation between points in the wavetable, but I don't understand the bitwise maths. What is the purpose of the scale = (ph >> 8) & 0xFFFF; statement and the following lines and why can't we just use normal multiplication and division?
  3. Why is the multiply_32x32_rshift32(val1 + val2, magnitude); statement used instead of normal math operators and what is it doing?
  4. Is it possible to do floating point math inside an audio update() function without overruns?

Thanks!
 
Hi Jeremiah,

I'll give a few basic answers and maybe one of the experts can fill in the details.

  1. This code uses a wavetable (AudioWaveformSine). Is this absolutely necessary to achieve acceptable performance on a Teensy 4.1?
    A Teensy 4.1 should be able to compute sin() and use floats with plenty of spare capacity. Can you post the code you were trying that uses sin()?
  2. It's using linear interpolation between points in the wavetable, but I don't understand the bitwise maths. What is the purpose of the scale = (ph >> 8) & 0xFFFF; statement and the following lines and why can't we just use normal multiplication and division?
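    The library objects are highly optimised (hence the bitwise maths). You can change these to simple multiplies and divides for learning purposes or to try something different; on the T4.1 there will be plenty of headroom to cope with this. As for the bitwise part: ph >> 8 divides the 32-bit phase by 2 to the power of 8 (256), discarding its lowest 8 bits, and & 0xFFFF then keeps only the lowest 16 bits of that result. Together they extract bits 8 to 23 of ph. Since ph >> 24 (the top 8 bits) picks the table entry, those 16 bits are the fractional position between AudioWaveformSine[index] and AudioWaveformSine[index+1], from 0 to 65535. The next two lines weight the two entries by that fraction, so val1 + val2 is the interpolated sample multiplied by 65536.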
  3. Why is the multiply_32x32_rshift32(val1 + val2, magnitude); statement used instead of normal math operators and what is it doing?
    Again, this is just optimisation. It calls a function in dspinst.h that computes ((val1 + val2) * magnitude) >> 32, i.e. the 64-bit product divided by 2 to the power of 32, keeping only its top 32 bits, using a single DSP instruction (SMMUL) via inline assembly. You could replace it (and the earlier val1, val2 and scale statements) with a linear interpolation formula using normal maths.
  4. Is it possible to do floating point math inside an audio update() function without overruns?
    Yes, no problem at all. Just remember to scale the result and convert it back to int16_t when putting it into the audio_block_t.
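As a rough illustration of answers 1 and 4, here is a minimal floating point sketch that calls sin() for every sample and converts the result back to int16_t. FloatSine and kBlockSamples are hypothetical names, and this is plain C++ rather than an AudioStream subclass, so allocate(), transmit() and release() are left out. One detail worth checking in any float version: keep the phase wrapped into [0, 2*pi) rather than computing sin(2*pi*f*t) from an ever-growing t, because a 32-bit float only has a 24-bit mantissa and a large argument loses precision.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kBlockSamples = 128;        // assumed AUDIO_BLOCK_SAMPLES
constexpr float kTwoPi = 6.28318530718f;

// Hypothetical float oscillator, not a Teensy Audio Library class.
struct FloatSine {
    float phase = 0.0f;       // radians, kept in [0, 2*pi)
    float increment = 0.0f;   // radians per sample
    float amplitude = 1.0f;   // 0.0 .. 1.0

    void setFrequency(float hz, float sampleRate) {
        assert(hz >= 0.0f && hz < sampleRate / 2.0f);
        increment = kTwoPi * hz / sampleRate;
    }

    // Fill one block of int16_t samples, as an update() would.
    void render(int16_t *out) {
        for (int i = 0; i < kBlockSamples; i++) {
            float s = amplitude * std::sin(phase) * 32767.0f;
            // Clamp before converting back to int16_t.
            if (s > 32767.0f) s = 32767.0f;
            if (s < -32767.0f) s = -32767.0f;
            out[i] = (int16_t)std::lround(s);
            phase += increment;
            // Wrap so the phase never grows large and loses precision.
            if (phase >= kTwoPi) phase -= kTwoPi;
        }
    }
};
```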

Hope that helps a bit, Cheers, Paul
 
Keep in mind this code was originally written in 2014 for Teensy 3.0 and 3.1. When talking about how to "achieve acceptable performance", remember it is also meant for those older boards, which don't have an FPU.

On Teensy 4.x, 32-bit float arithmetic is nearly the same speed as integer arithmetic, so perhaps a floating point version could offer similar speed. Maybe. Cortex-M7 is complicated, with in-order dual issue for integer operations, but how often it really manages to execute 2 instructions per clock for this particular code is a question nobody (as far as I know) has really studied.

I can tell you I did spend a lot of time optimizing this code in 2014, but for a Cortex-M4 without an FPU.
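For comparison, a floating point version of the same table lookup might look like the sketch below, with the phase kept as a fraction of a cycle in [0, 1) and the interpolation written with ordinary multiplies instead of shifts and masks. The names are hypothetical and the table is generated at startup. Whether this actually runs as fast as the integer loop on Cortex-M7 is exactly the open question above; the sketch only shows the arithmetic.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kTableSize = 256;
constexpr float kTwoPi = 6.28318530718f;
float sineTableF[kTableSize + 1];   // hypothetical float table with one guard entry

void buildFloatTable() {
    for (int i = 0; i <= kTableSize; i++)
        sineTableF[i] = std::sin(kTwoPi * (float)i / (float)kTableSize);
}

// Linear interpolation with ordinary arithmetic; phase is a fraction of a
// cycle in [0, 1).
float tableSine(float phase) {
    assert(phase >= 0.0f && phase < 1.0f);
    float pos = phase * (float)kTableSize;   // e.g. 0.1 -> 25.6
    int index = (int)pos;                    // whole part: 25
    float frac = pos - (float)index;         // fractional part: about 0.6
    float a = sineTableF[index];
    float b = sineTableF[index + 1];
    return a + frac * (b - a);
}

// Fill n int16_t samples starting at 'phase'; returns the updated phase.
float renderFloatBlock(int16_t *out, int n, float phase, float inc, float amplitude) {
    for (int i = 0; i < n; i++) {
        out[i] = (int16_t)std::lround(amplitude * 32767.0f * tableSine(phase));
        phase += inc;
        if (phase >= 1.0f) phase -= 1.0f;   // wrap, assuming 0 <= inc < 1
    }
    return phase;
}
```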
 
Actually, there is no need for a lookup table for fast sine generation: there are several schemes for harmonic oscillators using only a couple of multiplies and adds. They are a little tricky though, and need careful attention to long-term amplitude stability, especially for low frequencies. Search terms include "magic circle oscillator", "biquad oscillator" and "waveguide oscillator".

Here's a good starting point if you're interested in these approaches: https://www.njohnson.co.uk/pdf/drdes/Chap7.pdf
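To give a flavour of the table-free approach, here is a minimal sketch of the "magic circle" oscillator, assuming float state and the usual coefficient e = 2*sin(pi*f/fs). MagicCircle and setFrequency() are hypothetical names. Each sample costs two multiplies and two adds, and updating y with the already-updated x is what keeps it stable. In exact arithmetic the quantity x*x - e*x*y + y*y stays constant, so it can be used to check for, or correct, the slow amplitude drift that rounding can cause over long runs. Note also that the peak of y is slightly above 1 (about 1/sqrt(1 - e*e/4)) and depends on e, so the amplitude shifts a little when the frequency changes.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical "magic circle" oscillator, not part of the Teensy Audio Library.
struct MagicCircle {
    float x = 1.0f;   // first state variable (cosine-like)
    float y = 0.0f;   // second state variable, used as the sine-like output
    float e = 0.0f;   // coefficient 2*sin(pi*f/fs)

    void setFrequency(float hz, float sampleRate) {
        assert(hz >= 0.0f && hz < sampleRate / 2.0f);
        e = 2.0f * std::sin(3.14159265f * hz / sampleRate);
    }

    // Two multiplies and two adds per sample, no table and no sin() call.
    float next() {
        x -= e * y;   // uses the old y
        y += e * x;   // uses the NEW x: this ordering keeps the amplitude bounded
        return y;
    }
};
```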
 