Teensy 4.0 As a Live Vocal Processor?

Status
Not open for further replies.

ElectricBearSFO

New member
Assuming that the Teensy 4.0 has audio processing capabilities that far surpass those of other Teensy versions, is anyone working on a vocal pitch shifter using non-granular formant methods? I'm looking for an ultra-compact pitch-correcting live harmonizing vocal effects processor for singing. Oh, and let's throw reverb into the mix as well.
 
Except for the reverb part, the rest is way beyond my skill set. Is anyone willing to take on this challenge?
 
Whilst the Teensy 4.0 is powerful, having tried something similar on an STM32H750, which is of similar performance, I think you will struggle. I've migrated to a Pi4 for high-powered audio processing.
 
Do you use the Pi4 for real time audio processing? Is this through some OS, or bare metal? I feel like the Teensy 4 is very well optimized for this (real time requires low latency). I wonder how the Teensy stacks up against the ADAU1701 or ADAU1452.
 

Hi Jay. Yes, the Pi is used for real time audio. Two cores run Linux. I try to make one do all general housekeeping whilst the other focuses on input control - MIDI, HID, etc. The other two cores are (supposedly) isolated from Linux by the isolcpus kernel parameter. I do see the occasional interrupt but nothing fatal. One core processes inputs whilst the other is reserved for output signal (re)construction.

The original system used six STM32H750s in a 3-by-2 SPI-connected mesh, but this had several problems, so I was migrating to the STM32MP157 when the Pi4 came out and I changed to using that.

Biggest advantage of the Pi is that it's 64-bit, so rather than constantly having to think about whether to use single or double precision integer, or floating point, in the maths, everything is 64-bit integer. Each core is about 6 to 10 times faster than an H750, which is similar to a Teensy 4.0.

However, the thing missing from the Pi is SPDIF support. I first used a WM8804 for this, but I didn't like the clocking arrangements, so I was considering putting one H750 back in the system when I saw the Teensy 4.0 release announcement. It hasn't arrived yet, but I've already got a PCB respin done with one of these added to the system, linked to the Pi by SPI to do all SPDIF and clock processing. It also has two more I2S outputs which are useful as secondary ports.

Biggest problem with the Pi is its shutdown time - it's excruciatingly slow even by Linux standards. It simply hasn't been optimised, as I assume nobody considers it important. But of course audio equipment is renowned for having the mains plug pulled, so I'm working on the best way to get round this - either a UPS or some other form of allowing fast shutdown without data loss.

Hope this helps whatever you are thinking about doing. I'd love to hear more details if you can.

Mike
 
Hi Mike,

You've certainly piqued my interest with the Pi4. I'm interested in all things audio. Live pro audio is my background, and I'm working on gadgets to create my own pro audio mixer. My goal is to create a series of DAC/ADC modules that can be plugged into a DSP (Teensy 4? Pi4? FreeDSP) and create just the right combination of inputs and outputs for any given real time processing task. I'm designing my modules to be DSP agnostic.

I'm currently working on a Teensy 4 Audio Breakout that should help me towards this goal on the Teensy. I'm also playing with the FreeDSP Aurora.

Jay
 

Yeah I've been interested in audio for about fifty years and was at Soundcraft for a time 20 years ago. Now I design stuff on contract but have been thinking about doing my own brand.

Being DSP agnostic is good - things move on constantly. The key thing is the definition of the backplane: make sure it can handle far more traffic than you think you need.

But I've never been keen on XMOS, I'm afraid. A bit like Wolfson before them - more mass market and less pro audio. Also, I prefer the microphone amplifiers right by the back of the XLRs, not at the end of a cable running from them where they can pick up digital noise, but maybe that's my own bias.
 

Well, it's nice to meet you. I'm far from being a professional engineer - I'm self taught, just trying hard to get what I want. I'm glad we connected. Maybe we can help each other out; we seem to have some similar projects. I'll post about my progress. I'm close to releasing the schematic and PCB images for the Audio Breakout. I might put one together first before I post. Not sure if I want to post it here, or on DiyAudio. Maybe both, I don't know yet! Trying to find the most helpful crew to bounce ideas off. :)
 
I was working on a pitch shifter for the T3.6 that implements the Ocean pitch shifting algorithm. It is a spin on the concept of the phase vocoder. The basic process is:

1) Use overlapping STFTs to get the initial frequency/phase bins.
2) Move each bin from its original index to a new index based on the pitch scaling factor. This will cause a sinusoid component to be created at the new desired frequency. Scale factors < 1.0 pitch down, > 1.0 pitch up. For example, for octave up (scale factor 2):
newBinIndex = round(oldBinIndex * 2.0f)
3) The shifted sinusoid will have incorrect instantaneous phase, so the phase is corrected based on the phase of the original bin, its frequency, and the scaling factor.
4) Perform the inverse STFT to get back to the time domain, apply a synthesis windowing (Hanning) function, then overlap-add to the output.

The main difference versus a traditional phase vocoder is that you are not achieving pitch shifting by resampling; you are simply scaling the complex sinusoid components directly. Pitch shifting is very difficult - this method has serious drawbacks, just like every other method.

Here's why I paused the project.
- The T3.6 is stuck using the old CMSIS library, which has a poor selection of FFT sizes, e.g. 512 and 2048, where 1024 is probably the best size for this application. My understanding is the T4 will use a much newer version of CMSIS that offers a better selection of FFT sizes.
- The T3.6 is more than fast enough to perform the FFTs in real time. The problem is the phase correction steps require calculating a lot of cos() and sin() values, which even using the CMSIS library is very slow. The T4 has a newer FPU with double precision that I'm hoping will allow far more acceleration of these computations.
- The T3.6 code I actually had working was using > 90% of the T3.6, and the algorithm still needed more tweaking (like IIR filtering to enhance tone, etc.).

The T4 should provide more than enough power to implement the Ocean algorithm with lots of processing power left to spare. I'm looking forward to completing the pitch shifter using the T4 once I get it working on my TGA Pro.

My development branch for my pitch shifter can be found in the BALibrary on a branch called 'feature/pitchshifter'. You can find the branch here. Take a look at the AudioEffectPitchShift class.
 
I was working on a pitch shifter for the T3.6 that implements the Ocean pitch shifting algorithm. It is a spin on the concept of the phase vocoder.

Fair enough, but the OP did ask about running something far more complicated, probably using the wavelet transform. The Teensy 4.0 should be more than powerful enough for what you are doing; hopefully you'll get yours soon and make progress - I hope you succeed.

BTW, you mention that having to calculate cos() and sin() values slows you down. These are normally pre-calculated in an array, as you need the same values time and time again if the Fourier transforms are all the same size. Or am I missing something about your algorithm? I haven't looked at the library you mentioned yet.
 
- The T3.6 is more than fast enough to perform the FFTs in real time. The problem is the phase correction steps require calculating a lot of cos() and sin() values which even using the CMSIS library is very slow. The T4 has a newer FPU with double precision that I'm hoping will allow far more acceleration of these computations.
The Teensy 3.5 and 3.6 only support hardware single precision 32-bit floating point (i.e. the float type). Double precision (i.e. the double type) is done completely via software emulation. This means on the 3.5/3.6 you should:
  • Use float everywhere;
  • On floating point constants, use the 'f' suffix. On the 3.5/3.6, Teensyduino actually uses a compiler option to say all constants are single precision (which causes other problems), but it is a good habit to use 'f' on any single precision floating point constant;
  • Use the single precision floating point libraries (i.e. cosf, not cos).

Now, the Teensy 4.0 does have both double precision 64-bit and single precision 32-bit floating point hardware, so if you are moving exclusively to the Teensy 4, you can use the double precision math functions.

However, in general, if you don't need more than 6 digits of accuracy, on most machines, it is faster to use single precision.

However, there are machines (such as the PowerPC I work on for my day job) where that is not the case and double precision is faster than single precision for most operations: internally, the machine keeps everything in the 64-bit double format in registers, and the single precision instruction does the double precision operation and then rounds the value down to single precision, but keeps it in double precision format.
 
BTW, you mention that having to calculate cos() and sin() values slows you down. These are normally pre-calculated in an array, as you need the same values time and time again if the Fourier transforms are all the same size. Or am I missing something about your algorithm? I haven't looked at the library you mentioned yet.

In this algorithm you are not using the same values over and over again, so they can't be precomputed as for wave synthesis. You are taking the input complex phasor and multiplying it by another phase-shifted phasor, where the argument of this complex sinusoid is a function of the current phase, the frequency, and the scaling factor.

Code:
void AudioEffectPitchShift::m_ocean(float *inputFreq, float *outputFreq, float frameIndex, float pitchScale)
{
    // zero the output buffer
    for (unsigned i=0; i<(2*SYNTHESIS_SIZE); i++) {
        outputFreq[i] = 0.0f;
    }

    float phaseAdjustFactor = -((2.0f*((float)(M_PI))*frameIndex)
            / (OVERLAP_FACTOR_F * FFT_OVERSAMPLE_FACTOR_F * SYNTHESIS_SIZE_F));

    for (unsigned k=1; k < SYNTHESIS_SIZE/2; k++) {

        float a = (float)k;
        // b = round(m * s * a)
        // where m is the FFT oversample factor, s is the pitch scaling
        // factor and a is the original bin number
        float b = std::roundf(FFT_OVERSAMPLE_FACTOR_F * pitchScale * a);
        unsigned b_int = (unsigned)(b);

        if (b_int < SYNTHESIS_SIZE/2) {

            // phaseAdjust = (b - m*a) * phaseAdjustFactor
            float phaseAdjust = (b - (FFT_OVERSAMPLE_FACTOR_F * a)) * phaseAdjustFactor;

            // compute each trig value once per bin rather than twice
            float cosAdjust = arm_cos_f32(phaseAdjust);
            float sinAdjust = arm_sin_f32(phaseAdjust);

            float a_real = inputFreq[2*k];
            float a_imag = inputFreq[2*k+1];

            // complex multiply: rotate the input bin by phaseAdjust
            outputFreq[2*b_int]   = (a_real * cosAdjust) - (a_imag * sinAdjust);
            outputFreq[2*b_int+1] = (a_real * sinAdjust) + (a_imag * cosAdjust);
        }
    }
}
 
In this algorithm you are not using the same values over and over again, so they can't be precomputed as for wave synthesis. You are taking the input complex phasor and multiplying it by another phase-shifted phasor, where the argument of this complex sinusoid is a function of the current phase, the frequency, and the scaling factor.

OK, it took me a while, but try this. It won't be absolutely correct but should give you a strategy.

For k=1, calculate phaseAdjust as you are doing, so you need starting cos() and sin() values.

Now if you rewrite the phaseAdjust formula as

phaseAdjust = ((FFT_OVERSAMPLE_FACTOR_F * pitchScale * a) - (FFT_OVERSAMPLE_FACTOR_F * a)) * phaseAdjustFactor;

this reduces to

phaseAdjust = (FFT_OVERSAMPLE_FACTOR_F * (pitchScale - 1) * a) * phaseAdjustFactor;

Hence you are actually calculating the cos() and sin() of a number that is stepping up
C, 2C, 3C, 4C, ... where C = FFT_OVERSAMPLE_FACTOR_F * (pitchScale - 1) * phaseAdjustFactor

There are fast calculations for all of these that don't involve finding cos() or sin() again.
For example
cos(2C) = cos^2(C) - sin^2(C)
sin(3C) = 3 * sin(C) - 4 * sin^3(C)
.....

There are derivable formulae for any multiplier, always based on just the single cos(C) and sin(C) value. What you probably need to do is program in how these formulae are derived rather than a formula for each value of a (or k).

Hope this helps - it will be much faster.
 

@MikeDB is correct - you can substantially speed up your (@Blackaddr) algorithm by using a trig-identity trick for a constant stepped sine and cosine. That being said, my work on the Ocean algorithm has shown it to be less than ideal for pitch shifting in that its output is modulated; the original paper appears to reference this but does not show how they removed the modulation! Also, its phase correction is a bit wonky and produces muddled waveforms in my tests, along with the modulation.

Anywho, here are the identities you (@Blackaddr) are looking for.

  • sin([n+1] * a) = sin(n*a) * cos(a) + cos(n*a) * sin(a)
  • cos([n+1] * a) = cos(n*a) * cos(a) - sin(n*a) * sin(a)

I have a working Audio object of an OK phase vocoder, optimized for a 1024-point floating point FFT with an overlap of 8, which I will be posting to GitHub in the near future. The only caveat is that you need ARM DSP version 1.6.0, because I lifted the ARM split fft/ifft trick for real valued waveforms, meaning you can compute a real valued 1024-point fft/ifft using a split complex valued 512-point fft/ifft, which is a significant speed up. This is what arm_rfft_fast_f32 does, but I calculated my own split tables so I don't have to import those huge tables. My vocoder is still very processor intensive, and a Teensy 3.5 overclocked to 168MHz will use 90% of its processing time.

My phase vocoder is based on the Stephan M. Bernsee algorithm but is highly optimized to squeeze out every cycle I could while still producing identical output.

A little rant on optimizing: I find more and more that the biggest speed-ups are not code level optimizations but math optimizations. Using programs like Mathematica or wxMaxima, you can simplify things down to the bare bones in ways I find the compiler often just can't. In making this vocoder I was able to reduce the algorithm using wxMaxima to not use any divides, and I precomputed all the coefficients. Using this plus code level optimizations, I was able to take an algorithm that originally took ~15 ms and make it complete in ~2.7 ms on the Teensy 3.5 at 168MHz!
 
A little rant on optimizing: I find more and more that the biggest speed-ups are not code level optimizations but math optimizations. Using programs like Mathematica or wxMaxima, you can simplify things down to the bare bones in ways I find the compiler often just can't. In making this vocoder I was able to reduce the algorithm using wxMaxima to not use any divides, and I precomputed all the coefficients. Using this plus code level optimizations, I was able to take an algorithm that originally took ~15 ms and make it complete in ~2.7 ms on the Teensy 3.5 at 168MHz!

I totally agree - getting the maths right is far more important than optimising the odd instruction out of the code.

And when I want autotune, I start with wavelets, not short FFTs, as selecting the correct algorithm in the first place is even more important.
 
I am about to publish a set of Audio Library compatible objects for the T3.6 that were created for live performance. Unfortunately I had to stop working on the project last year for health reasons, but the system has been used in prototype form for professional live performances.
As of now the system includes:
1) An STFT (short-time Fourier transform) channel encoder with a variable number of channels, adjustable sibilance, and internal and external carriers
2) A flexible audio compressor
3) A flexible audio limiter
4) A multi-tap delay line with feedback
5) Shelf filters

The prototype also includes a touchscreen/encoder based menu system for setting parameters and EEPROM storage/recall of settings for individual songs.

When I was forced to stop I was working on a phase encoder for pitch shifting. I'll put it up on GitHub in the next few days with some sample audio files...
While I haven't yet tried it, I see no reason why it shouldn't work on a T4.
 
Here is my first release of the vocoder for the Audio library -> https://github.com/duff2013/AudioEffectVocoder

There are two examples that pitch shift (± semitones) a short clip of a female singer, which should be self-explanatory on how to use. These examples use USB audio but can be adapted like any other Audio object to use the Audio Board, etc. Currently only one 'voice' can be used, but in the future I will add the ability to add more voices, meaning you can pitch shift a single input to multiple pitch shifted outputs.

I have not tested with the Teensy 4 yet, but it should work; the T3.6 should be at least 180MHz CPU speed.

Edit: You need to have the ARM CMSIS DSP version 1.6 :( Sorry, but I needed this version; DSP ver. 1.5 might work also, though I haven't tried it. If anyone wants an explainer on how to upgrade, I can post a how-to as well.
 
That being said, my work on the Ocean algorithm has shown it to be less than ideal for pitch shifting in that its output is modulated; the original paper appears to reference this but does not show how they removed the modulation! Also, its phase correction is a bit wonky and produces muddled waveforms in my tests, along with the modulation.

The paper published by the Ocean authors had a typo in the phase calculation. The correct phase adjustment and demodulation function can be obtained from their public Java implementation of the algorithm; in fact, there is a comment about this in their Java code too. Yes, it sounds terrible with the phase calculation as presented in the paper (wrong). That's why the demos on their website sound better.
 
The link to the Java source code implementation is here.

The rotation applied to the phasor in the paper is given as
Code:
-i * (b-ma) * p * 2*PI / (m * O * N)

The correct phase shift is
Code:
// Remove the N from the denominator
-i * (b-ma) * p * 2*PI / (m * O)
 
Thanks for the link, I'll check out the Java code to see the demodulation technique. Did you implement it in your Audio object yet?
 
Duff,

I have been reading over your pitch shifting phase vocoder code. I can't wait to give it a try! That said, I would like to do time stretching on short pieces of audio without pitch shifting, and from looking at your code it is not immediately clear how to implement this. Originally I thought I should just be able to change the time scaling in the ifft part, but after reading about the ARM Cortex DSP functions it seems like arm_cfft_f32 can't be used like this. Looking at your code, it does not jump out at me what to change... I think time stretching should be easier code-wise than pitch shifting. Anyhow, if you have thoughts on this I would appreciate it. Thanks!
 
I would like to do time stretching on short pieces of audio without pitch shifting, and from looking at your code it is not immediately clear how to implement this. Originally I thought I should just be able to change the time scaling in the ifft part, but after reading about the ARM Cortex DSP functions it seems like arm_cfft_f32 can't be used like this.
I'm not exactly sure at the moment how this would be done. Are you looking to slow down the tempo of a piece of music while keeping the same pitch, like the Amazing Slow Downer? In the future, it's probably best to start another thread for this.
 