Real-Time Vocal Formant Shifter

djketley

Member
Hi everyone,

I've been working on implementing a formant shifter based on the OpenAudio Formant shifter, but I've encountered some issues that I could use some help with. Here's a breakdown of my implementation and the challenges I'm facing:
#ifndef _AudioEffectFormantShiftFD_OA_F32_h
#define _AudioEffectFormantShiftFD_OA_F32_h
#include "AudioStream_F32.h"
#include <arm_math.h>
#include "FFT_Overlapped_OA_F32.h"
#include <memory>
// Set frequency range constants
const float minFreq = 100.0f;
const float maxFreq = 20000.0f;
// FIR filter setup
#define BLOCK_SIZE 128 // Should really grab this from audio_block_samples to ensure it's always correct
#define NUM_TAPS 256
#define CUTOFF_FREQ 100.0f // Hz
class AudioEffectFormantShiftFD_OA_F32 : public AudioStream_F32 {
public:
AudioEffectFormantShiftFD_OA_F32() : AudioStream_F32(1, inputQueueArray_f32) {
}

AudioEffectFormantShiftFD_OA_F32(const AudioSettings_F32 &settings) :
AudioStream_F32(1, inputQueueArray_f32), sample_rate_Hz(settings.sample_rate_Hz) { }

AudioEffectFormantShiftFD_OA_F32(const AudioSettings_F32 &settings, int _N_FFT) :
AudioStream_F32(1, inputQueueArray_f32) {
setup(settings, _N_FFT);
}
void setOversamplingFactor(int factor) { // Set the oversampling factor (currently unused)
oversampling_factor = factor; // Currently unused
}
int setup(const AudioSettings_F32 &settings, int _N_FFT) {
std::vector<float> firCoeffs;
calculateFIRCoefficients(firCoeffs, NUM_TAPS, CUTOFF_FREQ);
window_coeffs = std::make_unique<float[]>(BLOCK_SIZE);
computeHannWindow(window_coeffs.get(), BLOCK_SIZE);
sample_rate_Hz = settings.sample_rate_Hz;

int N_FFT = myFFT.setup(settings, _N_FFT);
if (N_FFT < 1) return N_FFT;

N_FFT = myIFFT.setup(settings, _N_FFT);
if (N_FFT < 1) return N_FFT;
//Set windowing function
bool useMyFFTWindow = true;
if(useMyFFTWindow){
myFFT.getFFTObject()->useHanningWindow();
if (myIFFT.getNBuffBlocks() > 3) {
myIFFT.getIFFTObject()->useHanningWindow();
}
}
// Print FFT parameters for debugging
printFFTParameters(settings, N_FFT);
// Allocate memory for frequency domain data
complex_2N_buffer = std::make_unique<float32_t[]>(2 * N_FFT);
enabled = 1;
return N_FFT;
}
float setScaleFactor(float scale_fac) {
shift_scale_fac = (scale_fac < 0.00001f) ? 0.00001f : scale_fac;
return shift_scale_fac;
}
float getScaleFactor() const {
return shift_scale_fac;
}
virtual void update();
private:
int enabled = 0;
int oversampling_factor = 1;
float sample_rate_Hz = 44117.0f;
float shift_scale_fac = 1.0f;
std::unique_ptr<float32_t[]> complex_2N_buffer;
audio_block_f32_t *inputQueueArray_f32[1];
FFT_Overlapped_OA_F32 myFFT;
IFFT_Overlapped_OA_F32 myIFFT;
float envelope_coeffs[NUM_TAPS] = { };
float32_t firStateF32[NUM_TAPS + BLOCK_SIZE - 1];
float32_t envelope_buffer[BLOCK_SIZE];
float32_t overlap_buffer[BLOCK_SIZE]; // Declaration of overlap buffer
void printFFTParameters(const AudioSettings_F32 &settings, int N_FFT) {
Serial.println("AudioEffectFormantShiftFD_OA_F32: FFT parameters...");
Serial.print(" : N_FFT = "); Serial.println(N_FFT);
Serial.print(" : audio_block_samples = "); Serial.println(settings.audio_block_samples);
Serial.print(" : FFT N_BUFF_BLOCKS = "); Serial.println(myFFT.getNBuffBlocks());
Serial.print(" : IFFT N_BUFF_BLOCKS = "); Serial.println(myIFFT.getNBuffBlocks());
Serial.print(" : FFT use window = "); Serial.println(myFFT.getFFTObject()->get_flagUseWindow());
Serial.print(" : IFFT use window = "); Serial.println(myIFFT.getIFFTObject()->get_flagUseWindow());
}
std::unique_ptr<float[]> window_coeffs;
void computeHannWindow(float *window_coeffs, int length) { // Compute a Hann window for the audio signal, applied before the audio is processed
for (int i = 0; i < length; i++) {
window_coeffs[i] = 0.5f * (1.0f - cosf(2.0f * M_PI * static_cast<float>(i) / static_cast<float>(length)));
Serial.println(window_coeffs[i]);
}
}
void applyWindow(audio_block_f32_t *audio_block) { // Apply the window function to the audio block
for (int i = 0; i < BLOCK_SIZE; i++) {
audio_block->data[i] *= window_coeffs[i];
}
}
void normalizeInputAudio(audio_block_f32_t *audio_block) {
const float noise_gate_threshold = 0.01f; // Adjust this value based on your needs
float max_val = -1.0f;
for (int i = 0; i < BLOCK_SIZE; i++) {
max_val = fmax(max_val, fabs(audio_block->data[i]));
}
if (max_val > noise_gate_threshold) { // Normalize the audio signal
float normalization_factor = 1.0f / max_val;
for (int i = 0; i < BLOCK_SIZE; i++) {
audio_block->data[i] *= normalization_factor;
}
Serial.print("Normalization factor: ");
Serial.println(normalization_factor);
}
}
void computeEnvelope(const audio_block_f32_t *audio_block) { // Compute the envelope of the audio signal using FIR filter
arm_fir_instance_f32 firInstance;
arm_fir_init_f32(&firInstance, NUM_TAPS, envelope_coeffs, firStateF32, BLOCK_SIZE);
arm_fir_f32(&firInstance, audio_block->data, envelope_buffer, BLOCK_SIZE);
}
void performFFT(audio_block_f32_t *audio_block) { // Perform an FFT on the audio signal and store in complex buffer
myFFT.execute(audio_block, complex_2N_buffer.get());
}
void performIFFT() { // Perform an IFFT on the complex buffer and rebuild the audio signal
audio_block_f32_t *out_audio_block = myIFFT.execute(complex_2N_buffer.get());
AudioStream_F32::transmit(out_audio_block);
}
void calculateIdealImpulseResponse(std::vector<float>& impulseResponse, int numTaps, float cutoffFreq) { // Calculate the ideal impulse response for the FIR filter
int midPoint = numTaps / 2;
for (int i = 0; i < numTaps; i++) {
if (i == midPoint) {
impulseResponse[i] = 2 * cutoffFreq;
} else {
impulseResponse[i] = sin(2 * M_PI * cutoffFreq * (i - midPoint)) / (M_PI * (i - midPoint));
}
}
}
void applyHammingWindow(std::vector<float>& impulseResponse, int numTaps) { // Apply a Hamming window to the impulse response
for (int i = 0; i < numTaps; i++) {
impulseResponse[i] *= 0.54 - 0.46 * cos(2 * M_PI * i / (numTaps - 1)); // Hamming Window
envelope_coeffs[i] = impulseResponse[i];
}
}
// Function to calculate FIR filter coefficients dynamically
void calculateFIRCoefficients(std::vector<float>& coeffs, int numTaps, float cutoffFreq) {
coeffs.resize(numTaps);

// Calculate the ideal impulse response
calculateIdealImpulseResponse(coeffs, numTaps, cutoffFreq);

// Apply the Hamming window
applyHammingWindow(coeffs, numTaps);
}

void shiftFormants() { // Shift the formants
int fftSize = myFFT.getNFFT();
int N_2 = fftSize / 2 + 1;
float orig_mag[N_2];
arm_cmplx_mag_f32(complex_2N_buffer.get(), orig_mag, N_2); // Get the magnitude of the complex buffer
for (int dest_ind = 0; dest_ind < N_2; dest_ind++) {
float source_ind_float = static_cast<float>(dest_ind) / shift_scale_fac;
int mirrored_source_ind = mirrorIndex(source_ind_float, N_2);
float new_mag = interpolateMagnitude(orig_mag, mirrored_source_ind, source_ind_float, N_2);
float scale = new_mag / orig_mag[dest_ind];
scaleComplexBuffer(dest_ind, scale);
}
myFFT.rebuildNegativeFrequencySpace(complex_2N_buffer.get());
}
int mirrorIndex(float source_ind_float, int N_2) const { // mirror the index
if (source_ind_float < 1.0f) {
return 1 - static_cast<int>(source_ind_float);
} else if (source_ind_float >= N_2 - 1) {
return N_2 - 2 - static_cast<int>(source_ind_float - (N_2 - 1));
} else {
return static_cast<int>(source_ind_float);
}
}
float interpolateMagnitude(const float *orig_mag, int mirrored_source_ind, float source_ind_float, int N_2) const { // cubic interpolation
float y0, y1, y2, y3;
if (mirrored_source_ind < 0 || mirrored_source_ind >= N_2 - 1) {
y0 = y1 = y2 = y3 = 0.0f;
} else {
y0 = orig_mag[mirrored_source_ind - 1];
y1 = orig_mag[mirrored_source_ind];
y2 = orig_mag[mirrored_source_ind + 1];
y3 = orig_mag[mirrored_source_ind + 2];
}
float interp_fac = source_ind_float - static_cast<float>(mirrored_source_ind);
interp_fac = fmax(0.0f, fmin(interp_fac, 1.0f)); // clamp to [0, 1]
float a0 = -0.5f * y0 + 1.5f * y1 - 1.5f * y2 + y3;
float a1 = y0 - 2.5f * y1 + 2.0f * y2 - 0.5f * y3;
float a2 = -0.5f * y0 + 0.5f * y2;
float a3 = y1;
return ((a0 * interp_fac + a1) * interp_fac + a2) * interp_fac + a3;
}

void scaleComplexBuffer(int dest_ind, float scale) { // Scale the complex buffer
float real_part = complex_2N_buffer[2 * dest_ind];
float imag_part = complex_2N_buffer[2 * dest_ind + 1];
complex_2N_buffer[2 * dest_ind] = real_part * scale;
complex_2N_buffer[2 * dest_ind + 1] = imag_part * scale;
}
};
// Improved update method with normalization and overlap-add
void AudioEffectFormantShiftFD_OA_F32::update() {
audio_block_f32_t *in_audio_block = AudioStream_F32::receiveReadOnly_f32();
if (!in_audio_block) return;
if (!enabled) { // if the effect is not enabled just pass the audio through directly
AudioStream_F32::transmit(in_audio_block);
AudioStream_F32::release(in_audio_block);
return;
}
bool has_denormalized_values = false;
const float denorm_threshold = 1e-20f;
for (int i = 0; i < BLOCK_SIZE; i++) {
if (fabs(in_audio_block->data[i]) < denorm_threshold && in_audio_block->data[i] != 0.0f) { // check for denormalized values
has_denormalized_values = true;
break;
}
}
if (!has_denormalized_values) {
//normalizeInputAudio(in_audio_block); -- Not ideal for vocals
}
//applyWindow(in_audio_block); // apply the window function <--- This causes a bitcrush effect when enabled.
computeEnvelope(in_audio_block); // compute the envelope of the audio using the fir filter
performFFT(in_audio_block); // perform the fft
AudioStream_F32::release(in_audio_block); // release the input block
shiftFormants(); // shift the formants
performIFFT(); // rebuild the audio signal
}
#endif


I've implemented the formant shifting logic using cubic interpolation and scaling of the complex buffer, and I've used an FIR filter to shape the envelope of the audio signal.

The issues I have:

Ring-Modulated Sound: My implementation produces an almost ring-modulated sound, especially noticeable on sustained notes.
Bit-Crushed Sound with Hann Window: Enabling the Hann window turns the sound into a bit-crushed effect.

I've tried increasing/decreasing block sizes, number of taps, etc., but that tends to make it worse. I'm sure it's something glaringly obvious, but any help would be greatly appreciated!

Thanks!

I've included a sample of the audio: Dry - Formant Shift without Hann Window - Formant shift with Hann Window enabled

 
Hi,

I'm the original author of the Open Audio formant shifter. I'm trying to figure out what you're doing, but (on my phone at least) the code you posted has lost all its indenting, so it's really hard to follow the code to see what you're trying to do.

Before I can help you debug the sound issues that you raised, I need to understand your basic algorithm. With the indenting gone, I'm getting lost in the code and can't follow the algorithm.

Can you doodle a block diagram of your algorithm, or can you simplify your algorithm down to some pseudocode to help explain it? This simplification doesn't need to be detailed enough to expose your bug or anything; it just needs to be barely enough so that I can see the core elements of your algorithm. That'll act as a map so that I can better step through your code and maybe help you get the sound that you want.

Chip
 
What I have been able to maybe understand is that your `computeEnvelope()` probably isn't doing what you want it to do.

If you want it to compute the envelope of the audio, this isn't doing it. To my eye, it looks like this is just a low pass filter. While a low pass filter can be used to compute the envelope, you cannot apply the filter to the audio signal itself; you must apply the filter to the absolute value of the audio signal.

To use the old fashioned language of radio signal processing, the absolute value operator "rectifies" the audio signal to give you a messy version of the envelope. The low pass filter tries to smooth out that messy envelope signal by attenuating the "ripple" on the envelope caused by signal components that are higher in frequency than you care about (or, at least, are willing to live without).

It's possible that, somewhere in your code, I missed the fact that you are rectifying your signal already. If so, sorry. But, if I'm right and you are not doing an `abs()` to each audio sample prior to that filter, give it a try! I bet it will make an important difference!

Chip
 
Hi Chip!

Thanks for your detailed response!

Not entirely sure why the formatting went all wonky! Your formant shifter was a great starting platform! I've posted the code to pastecode for better readability!

Formatted code

Here's a summary of what I'm doing currently:

Initialize: Set up parameters and allocate memory.

FFT and IFFT Plans: Create the necessary FFT and IFFT plans.

Process Blocks: Loop through and process each audio block.

FIR Filter: Compute the envelope using the FIR filter to maintain natural dynamics.

FFT: Transform the audio block to the frequency domain.

Formant Shifting: Adjust the formants using frequency shifting and cubic interpolation. Apply the envelope to maintain dynamics.

IFFT: Convert back to the time domain.

Output: Play back or transmit the processed audio.
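
In code terms, the steps above map onto the class like this (a condensed sketch of the update() path from the code I posted, nothing new):

Code:
// Condensed per-block flow of AudioEffectFormantShiftFD_OA_F32::update()
void update() {
    audio_block_f32_t* in = AudioStream_F32::receiveReadOnly_f32();
    if (!in) return;
    computeEnvelope(in);   // FIR low-pass over the block -> envelope_buffer
    performFFT(in);        // time domain -> complex_2N_buffer
    AudioStream_F32::release(in);
    shiftFormants();       // rescale bin magnitudes via cubic interpolation
    performIFFT();         // back to the time domain, then transmit
}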

Hope this helps a bit!

Cheers
 
Thanks for posting your code with better formatting. It definitely helps.

First, if you really are looking to extract the envelope, I'm going to restate my suggestion from above...which is to do an absolute-value operation prior to your FIR filter in order to extract the envelope. If you want the envelope, you have to rectify the signal somehow: either through an absolute-value operation, or squaring the signal, or doing a Hilbert transform, or something. I don't see that you're doing that, so you're not getting the envelope that you think you're getting.
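
In code, that's one extra step in front of the existing FIR call. A minimal sketch based on the computeEnvelope() you posted (arm_abs_f32() is the CMSIS-DSP absolute-value helper):

Code:
// Sketch: rectify first, then low-pass the rectified signal to get the envelope
void computeEnvelope(const audio_block_f32_t* audio_block) {
    float32_t rectified[BLOCK_SIZE];
    arm_abs_f32(audio_block->data, rectified, BLOCK_SIZE); // |x[n]| "rectifies" the audio
    arm_fir_instance_f32 firInstance;
    arm_fir_init_f32(&firInstance, NUM_TAPS, envelope_coeffs, firStateF32, BLOCK_SIZE);
    arm_fir_f32(&firInstance, rectified, envelope_buffer, BLOCK_SIZE); // smooth out the ripple
}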

Second, I'm not a huge fan of how you're choosing to normalize the volume of the signal. You are normalizing by the maximum value seen in each audio block, but the audio blocks are pretty short in time; they're only a few milliseconds long. So you are manipulating the loudness at a rate of many hundreds of hertz. That could sound like ring modulation. I'm not sure that volume normalization is really needed, so you could comment it out of your code and see if it sounds better.
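
To put rough numbers on that (using the 128-sample blocks and ~44.1 kHz sample rate from your code):

Code:
const float sample_rate_Hz = 44100.0f;
const int   block_samples  = 128;
const float block_dur_ms   = 1000.0f * block_samples / sample_rate_Hz; // ~2.9 ms per block
const float gain_update_Hz = sample_rate_Hz / block_samples;           // ~345 Hz
// Re-normalizing every block amplitude-modulates the signal at roughly 345 Hz,
// well within the audible range -- hence the ring-mod character.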

Third, back on your envelope estimation step, I don't actually understand why you're trying to get the envelope of the broadband signal. If you are trying to do formant shifting, why are you extracting the envelope of the broadband signal?

If you were in the frequency domain already, I could understand why you might like to explore some sort of per-frequency envelope or some sort of frequency smoothing. But that's not what you're doing. When you do your envelope operation, you're still in the time domain with your full-bandwidth signal. It is not obvious to me why you are doing that. What are you trying to do with that envelope?

Chip
 
Hi Chip

By computing the envelope I can better analyse and manipulate the signal in the frequency domain, as it provides information about the signal's amplitude characteristics over time. This is particularly useful in applications like formant shifting, where you might want to modify the spectral characteristics of the signal while preserving its natural amplitude variations.

I'm fairly certain I disabled the normalization step; let me double-check.

Ultimately, this has kind of grown over time as I've tried to improve the overall sound quality, so there may be bits of code that don't quite make sense, left over from things I've tried in order to improve the overall sound. I'm still confused as to why enabling a Hann window causes a major bit-crushy sound.

The goal here is to make the processed sound as much like the Rovee formant shifter as I can, as it's for a real-time formant shifter for a male singer I'm currently building.

Rovee:

This Shifter:

As you can see, it's still a bit off, even with your abs suggestion added.
 
Not entirely sure why the formatting went all wonky! Your formant shifter was a great starting platform! I've posted the code to pastecode for better readability!
You did not insert your code using the CODE tags button </>.
I have inserted your code below using that button as an example that it works.

Code:
#ifndef _AudioEffectFormantShiftFD_OA_F32_h
#define _AudioEffectFormantShiftFD_OA_F32_h
#include "AudioStream_F32.h"
#include <arm_math.h>
#include "FFT_Overlapped_OA_F32.h"
#include <memory>
// Set frequency range constants
const float minFreq = 100.0f;
const float maxFreq = 20000.0f;
// FIR filter setup
#define BLOCK_SIZE 128 // Should really grab this from audio_block_samples to ensure it's always correct
#define NUM_TAPS 256
#define CUTOFF_FREQ 100.0f // Hz
class AudioEffectFormantShiftFD_OA_F32 : public AudioStream_F32 {
public:
    AudioEffectFormantShiftFD_OA_F32() : AudioStream_F32(1, inputQueueArray_f32) {
    }

    AudioEffectFormantShiftFD_OA_F32(const AudioSettings_F32& settings) :
        AudioStream_F32(1, inputQueueArray_f32), sample_rate_Hz(settings.sample_rate_Hz) { }

    AudioEffectFormantShiftFD_OA_F32(const AudioSettings_F32& settings, int _N_FFT) :
        AudioStream_F32(1, inputQueueArray_f32) {
        setup(settings, _N_FFT);
    }
    void setOversamplingFactor(int factor) { // Set the oversampling factor (currently unused)
        oversampling_factor = factor; // Currently unused
    }
    int setup(const AudioSettings_F32& settings, int _N_FFT) {
        std::vector<float> firCoeffs;
        calculateFIRCoefficients(firCoeffs, NUM_TAPS, CUTOFF_FREQ);
        window_coeffs = std::make_unique<float[]>(BLOCK_SIZE);
        computeHannWindow(window_coeffs.get(), BLOCK_SIZE);
        sample_rate_Hz = settings.sample_rate_Hz;

        int N_FFT = myFFT.setup(settings, _N_FFT);
        if (N_FFT < 1) return N_FFT;

        N_FFT = myIFFT.setup(settings, _N_FFT);
        if (N_FFT < 1) return N_FFT;
        //Set windowing function
        bool useMyFFTWindow = true;
        if (useMyFFTWindow) {
            myFFT.getFFTObject()->useHanningWindow();
            if (myIFFT.getNBuffBlocks() > 3) {
                myIFFT.getIFFTObject()->useHanningWindow();
            }
        }
        // Print FFT parameters for debugging
        printFFTParameters(settings, N_FFT);
        // Allocate memory for frequency domain data
        complex_2N_buffer = std::make_unique<float32_t[]>(2 * N_FFT);
        enabled = 1;
        return N_FFT;
    }
    float setScaleFactor(float scale_fac) {
        shift_scale_fac = (scale_fac < 0.00001f) ? 0.00001f : scale_fac;
        return shift_scale_fac;
    }
    float getScaleFactor() const {
        return shift_scale_fac;
    }
    virtual void update();
private:
    int enabled = 0;
    int oversampling_factor = 1;
    float sample_rate_Hz = 44117.0f;
    float shift_scale_fac = 1.0f;
    std::unique_ptr<float32_t[]> complex_2N_buffer;
    audio_block_f32_t* inputQueueArray_f32[1];
    FFT_Overlapped_OA_F32 myFFT;
    IFFT_Overlapped_OA_F32 myIFFT;
    float envelope_coeffs[NUM_TAPS] = { };
    float32_t firStateF32[NUM_TAPS + BLOCK_SIZE - 1];
    float32_t envelope_buffer[BLOCK_SIZE];
    float32_t overlap_buffer[BLOCK_SIZE]; // Declaration of overlap buffer
    void printFFTParameters(const AudioSettings_F32& settings, int N_FFT) {
        Serial.println("AudioEffectFormantShiftFD_OA_F32: FFT parameters...");
        Serial.print(" : N_FFT = "); Serial.println(N_FFT);
        Serial.print(" : audio_block_samples = "); Serial.println(settings.audio_block_samples);
        Serial.print(" : FFT N_BUFF_BLOCKS = "); Serial.println(myFFT.getNBuffBlocks());
        Serial.print(" : IFFT N_BUFF_BLOCKS = "); Serial.println(myIFFT.getNBuffBlocks());
        Serial.print(" : FFT use window = "); Serial.println(myFFT.getFFTObject()->get_flagUseWindow());
        Serial.print(" : IFFT use window = "); Serial.println(myIFFT.getIFFTObject()->get_flagUseWindow());
    }
    std::unique_ptr<float[]> window_coeffs;
    void computeHannWindow(float* window_coeffs, int length) { // Compute a Hann window for the audio signal, applied before the audio is processed
        for (int i = 0; i < length; i++) {
            window_coeffs[i] = 0.5f * (1.0f - cosf(2.0f * M_PI * static_cast<float>(i) / static_cast<float>(length)));
            Serial.println(window_coeffs[i]);
        }
    }
    void applyWindow(audio_block_f32_t* audio_block) { // Apply the window function to the audio block
        for (int i = 0; i < BLOCK_SIZE; i++) {
            audio_block->data[i] *= window_coeffs[i];
        }
    }
    void normalizeInputAudio(audio_block_f32_t* audio_block) {
        const float noise_gate_threshold = 0.01f; // Adjust this value based on your needs
        float max_val = -1.0f;
        for (int i = 0; i < BLOCK_SIZE; i++) {
            max_val = fmax(max_val, fabs(audio_block->data[i]));
        }
        if (max_val > noise_gate_threshold) { // Normalize the audio signal
            float normalization_factor = 1.0f / max_val;
            for (int i = 0; i < BLOCK_SIZE; i++) {
                audio_block->data[i] *= normalization_factor;
            }
            Serial.print("Normalization factor: ");
            Serial.println(normalization_factor);
        }
    }
    void computeEnvelope(const audio_block_f32_t* audio_block) { // Compute the envelope of the audio signal using FIR filter
        arm_fir_instance_f32 firInstance;
        arm_fir_init_f32(&firInstance, NUM_TAPS, envelope_coeffs, firStateF32, BLOCK_SIZE);
        arm_fir_f32(&firInstance, audio_block->data, envelope_buffer, BLOCK_SIZE);
    }
    void performFFT(audio_block_f32_t* audio_block) { // Perform an FFT on the audio signal and store in complex buffer
        myFFT.execute(audio_block, complex_2N_buffer.get());
    }
    void performIFFT() { // Perform an IFFT on the complex buffer and rebuild the audio signal
        audio_block_f32_t* out_audio_block = myIFFT.execute(complex_2N_buffer.get());
        AudioStream_F32::transmit(out_audio_block);
    }
    void calculateIdealImpulseResponse(std::vector<float>& impulseResponse, int numTaps, float cutoffFreq) { // Calculate the ideal impulse response for the FIR filter
        int midPoint = numTaps / 2;
        for (int i = 0; i < numTaps; i++) {
            if (i == midPoint) {
                impulseResponse[i] = 2 * cutoffFreq;
            }
            else {
                impulseResponse[i] = sin(2 * M_PI * cutoffFreq * (i - midPoint)) / (M_PI * (i - midPoint));
            }
        }
    }
    void applyHammingWindow(std::vector<float>& impulseResponse, int numTaps) { // Apply a Hamming window to the impulse response
        for (int i = 0; i < numTaps; i++) {
            impulseResponse[i] *= 0.54 - 0.46 * cos(2 * M_PI * i / (numTaps - 1)); // Hamming Window
            envelope_coeffs[i] = impulseResponse[i];
        }
    }
    // Function to calculate FIR filter coefficients dynamically
    void calculateFIRCoefficients(std::vector<float>& coeffs, int numTaps, float cutoffFreq) {
        coeffs.resize(numTaps);

        // Calculate the ideal impulse response
        calculateIdealImpulseResponse(coeffs, numTaps, cutoffFreq);

        // Apply the Hamming window
        applyHammingWindow(coeffs, numTaps);
    }

    void shiftFormants() { // Shift the formants
        int fftSize = myFFT.getNFFT();
        int N_2 = fftSize / 2 + 1;
        float orig_mag[N_2];
        arm_cmplx_mag_f32(complex_2N_buffer.get(), orig_mag, N_2); // Get the magnitude of the complex buffer
        for (int dest_ind = 0; dest_ind < N_2; dest_ind++) {
            float source_ind_float = static_cast<float>(dest_ind) / shift_scale_fac;
            int mirrored_source_ind = mirrorIndex(source_ind_float, N_2);
            float new_mag = interpolateMagnitude(orig_mag, mirrored_source_ind, source_ind_float, N_2);
            float scale = new_mag / orig_mag[dest_ind];
            scaleComplexBuffer(dest_ind, scale);
        }
        myFFT.rebuildNegativeFrequencySpace(complex_2N_buffer.get());
    }
    int mirrorIndex(float source_ind_float, int N_2) const { // mirror the index
        if (source_ind_float < 1.0f) {
            return 1 - static_cast<int>(source_ind_float);
        }
        else if (source_ind_float >= N_2 - 1) {
            return N_2 - 2 - static_cast<int>(source_ind_float - (N_2 - 1));
        }
        else {
            return static_cast<int>(source_ind_float);
        }
    }
    float interpolateMagnitude(const float* orig_mag, int mirrored_source_ind, float source_ind_float, int N_2) const { // cubic interpolation
        float y0, y1, y2, y3;
        if (mirrored_source_ind < 0 || mirrored_source_ind >= N_2 - 1) {
            y0 = y1 = y2 = y3 = 0.0f;
        }
        else {
            y0 = orig_mag[mirrored_source_ind - 1];
            y1 = orig_mag[mirrored_source_ind];
            y2 = orig_mag[mirrored_source_ind + 1];
            y3 = orig_mag[mirrored_source_ind + 2];
        }
        float interp_fac = source_ind_float - static_cast<float>(mirrored_source_ind);
        interp_fac = fmax(0.0f, fmin(interp_fac, 1.0f)); // clamp to [0, 1]
        float a0 = -0.5f * y0 + 1.5f * y1 - 1.5f * y2 + y3;
        float a1 = y0 - 2.5f * y1 + 2.0f * y2 - 0.5f * y3;
        float a2 = -0.5f * y0 + 0.5f * y2;
        float a3 = y1;
        return ((a0 * interp_fac + a1) * interp_fac + a2) * interp_fac + a3;
    }

    void scaleComplexBuffer(int dest_ind, float scale) { // Scale the complex buffer
        float real_part = complex_2N_buffer[2 * dest_ind];
        float imag_part = complex_2N_buffer[2 * dest_ind + 1];
        complex_2N_buffer[2 * dest_ind] = real_part * scale;
        complex_2N_buffer[2 * dest_ind + 1] = imag_part * scale;
    }
};
// Improved update method with normalization and overlap-add
void AudioEffectFormantShiftFD_OA_F32::update() {
    audio_block_f32_t* in_audio_block = AudioStream_F32::receiveReadOnly_f32();
    if (!in_audio_block) return;
    if (!enabled) { // if the effect is not enabled just pass the audio through directly
        AudioStream_F32::transmit(in_audio_block);
        AudioStream_F32::release(in_audio_block);
        return;
    }
    bool has_denormalized_values = false;
    const float denorm_threshold = 1e-20f;
    for (int i = 0; i < BLOCK_SIZE; i++) {
        if (fabs(in_audio_block->data[i]) < denorm_threshold && in_audio_block->data[i] != 0.0f) { // check for denormalized values
            has_denormalized_values = true;
            break;
        }
    }
    if (!has_denormalized_values) {
        //normalizeInputAudio(in_audio_block); -- Not ideal for vocals
    }
    //applyWindow(in_audio_block); // apply the window function <--- This causes a bitcrush effect when enabled.
    computeEnvelope(in_audio_block); // compute the envelope of the audio using the fir filter
    performFFT(in_audio_block); // perform the fft
    AudioStream_F32::release(in_audio_block); // release the input block
    shiftFormants(); // shift the formants
    performIFFT(); // rebuild the audio signal
}
#endif
 
Your block size is 128. What is your FFT size? Presumably, it is 256?

Reading through your code, it isn't quite obvious to me whether you are still using the overlapping FFT approach employed in all of the Open Audio frequency domain processing blocks (including the formant shifter).

If you are not using the overlapping FFTs (i.e., if your FFT size is equal to the block size), it will sound bad. Be sure to use the overlapping functionality and use an FFT size that is 2x or 4x your block size.
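
For example, something along these lines (just a sketch; the sample rate and block size are the values discussed in this thread, and the constructor is the one from your posted class):

Code:
AudioSettings_F32 audio_settings(44100.0f, 128);               // sample rate, audio block size
AudioEffectFormantShiftFD_OA_F32 shifter(audio_settings, 256); // N_FFT = 2x block -> 2x overlap
// ...or pass 512 for a 4x-overlapped FFT with the same 128-sample blocks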

Chip
 
The FFT size is directly tied to the block size; 256 is what it currently runs at, yes.

I do apologise if my code's confusing! I was wondering if there's anything I could do to your version of the formant shifter to make it more "sensitive" to a male singing voice. Unfortunately, with the max block size being 128, I'm unsure if I can increase the FFT size any more, but I could be wrong.
 
If you want higher frequency resolution, you can try lowering the sample rate. Cutting the sample rate in half will have a similar effect on frequency resolution as doubling the FFT length.

Or, you can double the FFT length by switching from 2x overlap to 4x overlap.

Or, you can build on the Tympan Library rather than the Open Audio library. The Tympan Library is what I've done after Open Audio. The Tympan Library can have audio blocks longer than 128. Since the FFT size must be linked to the audio block size, allowing longer blocks means that you can have longer FFTs.

With all that said, it's not always a great idea to use longer FFTs. Yes, you do want enough frequency resolution to resolve the formants, but you don't want so much resolution that you start revealing the individual frequency components of the voice. You're doing a formant shifter, not a pitch shifter. I absolutely agree that it can be tough to get the right balance between those two regimes. Does the OpenAudio formant shifter have enough resolution for male voices? If so, then you know that the FFT size is acceptable and that the problem is somewhere else.
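
For concreteness, the bin spacing behind the options above is just the sample rate divided by the FFT length (numbers assume the 128-sample blocks discussed in this thread):

Code:
// bin spacing (Hz per FFT bin) = sample_rate / N_FFT
// 44100 Hz / 256 = ~172 Hz per bin (2x overlap of 128-sample blocks, the current setup)
// 44100 Hz / 512 = ~86 Hz per bin  (4x overlap, same sample rate)
// 22050 Hz / 256 = ~86 Hz per bin  (half the sample rate, same N_FFT)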

Good luck!

Chip
 