Triangular windowing

Gabe Walters · Oct 15, 2021

Hi,

I am attempting to do real-time audio compression on a Teensy 4.0 with audio shield by doing triangular windowing, FFT, peak picking, IFFT, then overlap samples by 50% and add. I have an attempt here, but the output sounds bad, even picking all 128 peaks (no compression). Does anyone have experience or insight they can share? Thanks,

Gabe

Code:

//the reason there are 3 of everything is: in order to overlap the output blocks, 2 are needed to start with. it takes 2 to make an overlap. this is done in setup().  
//after that, only 1 further block is needed at a time in order to overlap with the previous results. this is done in loop().   

#define ARM_MATH_CM7
#include <Audio.h>                                    //for audio functions
#include <Wire.h>                                     //for connecting objects together
#include <SD.h>                                       //for SD card
#include <SPI.h>                                    
#include <SerialFlash.h>  
#include <arm_math.h>                                 //for CMSIS_DSP FFT
#include <Bounce.h>                                   //for switches

//Declare constant variables
const int baudRate = 115200;                          
const uint32_t blockSize=AUDIO_BLOCK_SAMPLES;         //C:\Program Files (x86)\Arduino\hardware\teensy\avr\cores\teensy4\AudioStream.h has #define AUDIO_BLOCK_SAMPLES  128 
const uint16_t fftLen=blockSize;                      //length of FFT, 128 if blockSize

//Create pointers to be assigned to input and output buffers, create input/output arrays 
//Teensy audio library works on int16_t (equivalent in size with q15 data), the FFT is running on f32 data
int16_t * ptr_in1;                                    //used in setup(), pointer to first block of 128 samples captured by Q_1_rec.readBuffer
int16_t * ptr_in2;                                    //used in setup(), pointer to second block of 128 samples captured by Q_1_rec.readBuffer
int16_t * ptr_in3;                                    //used in loop(), pointer to each successive block of 128 samples after the first 2
int16_t * ptr_out;                                    //used in setup and loop, pointer to output used by Q_1_play.getBuffer to playback the audio
float32_t inputbuffer1[blockSize]={0};                //output of arm_q15_to_float, input to trianglebuffer, for first audio block
float32_t inputbuffer2[blockSize]={0};                //output of arm_q15_to_float, input to trianglebuffer, for second audio block
float32_t inputbuffer3[blockSize]={0};                //output of arm_q15_to_float, input to trianglebuffer, for successive audio blocks
float32_t trianglebuffer1[blockSize]={0};             //make a triangle out of inputbuffer1 and feed to FFT
float32_t trianglebuffer2[blockSize]={0};             //make a triangle out of inputbuffer2 and feed to FFT
float32_t trianglebuffer3[blockSize]={0};             //make a triangle out of inputbuffer3 and feed to FFT
float32_t FFToutputbuffer1[blockSize]={0};            //output of FFT, input to peak picking buffer
float32_t FFToutputbuffer2[blockSize]={0};            //output of FFT, input to peak picking buffer
float32_t FFToutputbuffer3[blockSize]={0};            //output of FFT, input to peak picking buffer
float32_t peakpickbuffer1[blockSize]={0};             //input to IFFT
float32_t peakpickbuffer2[blockSize]={0};             //input to IFFT
float32_t peakpickbuffer3[blockSize]={0};             //input to IFFT
float32_t IFFToutputbuffer1[blockSize]={0};           //output of IFFT, take first 64 samples (index 0-63) and store in rising1, take last 64 samples (index 64-127) and store in falling1
float32_t IFFToutputbuffer2[blockSize]={0};           //output of IFFT, take first 64 samples (index 0-63) and store in rising2, take last 64 samples (index 64-127) and store in falling2
float32_t IFFToutputbuffer3[blockSize]={0};           //output of IFFT, take first 64 samples (index 0-63) and store in rising3, take last 64 samples (index 64-127) and store in falling3
float32_t overlapbuffer1[blockSize]={0};              //not used :(
float32_t overlapbuffer2[blockSize]={0};              //not used 
float32_t overlapbuffer3[blockSize]={0};              //not used
float32_t rising1[blockSize]={0};                     //the first 64 samples of IFFT1. the first 64 samples of the first and only the first output block 
float32_t rising2[blockSize]={0};                     //the first 64 samples of IFFT2
float32_t rising3[blockSize]={0};                     //the first 64 samples of IFFT3
float32_t falling1[blockSize]={0};                    //the last 64 samples of IFFT1
float32_t falling2[blockSize]={0};                    //the last 64 samples of IFFT2
float32_t falling3[blockSize]={0};                    //the last 64 samples of IFFT3
float32_t overlapbuffer2_last[blockSize]={0};         //not used 
float32_t overlapbuffer_f2r1[blockSize]={0};          //not used
float32_t overlapbuffer_f1r2[blockSize]={0};          //falling1 plus rising2. the last 64 samples of the first output block, and the first 64 samples of the next block
float32_t outputbuffer[blockSize]={0};                //used in setup and loop. it gets converted from f32 to q15 for playback by teensy. first 64 are the last 64 from previous loop. last 64 are previous falling plus current rising.
float32_t falling_prev_prev[blockSize]={0};           //not used
float32_t falling_prev[blockSize]={0};                //equal to falling2 in the first pass, on successive passes, equal to previous falling edge before output, equal to current falling edge after output, which on the next loop will be previous falling edge
float32_t rising_prev[blockSize]={0};                 //not used
float32_t rising_current[blockSize]={0};              //not used.  it's what rising3 should be, maybe a better name than rising3 would be _current, or _loop.  would apply to all 3's
float32_t prev64_127[blockSize]={0};                  //in setup which does the first output block, it's overlapbuffer_f1r2. in loop, outputbuffer gets its value in the first 64 indexes. after output, it gets the last 64 indexes of outputbuffer.

//Instantiate audio library objects, make connections
AudioInputI2S i2s_in;                                 //Audio input on i2s bus from the teensy audioshield adapter board
AudioOutputI2S i2s_out;                               //Audio output on i2s bus to the audioshield
AudioRecordQueue Q_1_rec;                             //Object for storing audio blocks and accessing as arrays
AudioPlayQueue Q_1_play;                              //object to write array to for output
AudioConnection patchcord1(i2s_in,0,Q_1_rec,0);       //connect audio input to the object for storing audio blocks and accessing as arrays
AudioConnection patchcord2(Q_1_play,0,i2s_out,0);     //connect the array object for output to audio output channel1 (L or R)
AudioConnection patchcord3(Q_1_play,0,i2s_out,1);     //connect the array object for output to audio output channel2 (L or R)
AudioControlSGTL5000 sgtl5000;                        //call the IC on the audioshield

arm_rfft_fast_instance_f32 FFT;                       //instantiate FFT as required by CMSIS_DSP https://arm-software.github.io/CMSIS_5/DSP/html/group__RealFFT.html#gac5fceb172551e7c11eb4d0e17ef15aa3

//Define input to system. Uncomment the input you want to use
//const int input = AUDIO_INPUT_MIC;
const int input = AUDIO_INPUT_LINEIN;

// a bit of helper for printing stuff to serial monitor in a nicer format
constexpr size_t length_format_buffer {64};
char format_buffer[length_format_buffer];

// a bit of helper for printing stuff to serial monitor in a nicer format
void serial_print_int_width_4(unsigned int input){
  snprintf(format_buffer, length_format_buffer, "%03i", input);
  Serial.print(format_buffer);
}

// a bit of helper for printing stuff to serial monitor in a nicer format
void serial_print_float_width_16_prec_8(float input){ 
  dtostrf(input, 8, 4, format_buffer);
  Serial.print(format_buffer);
}

// put your setup code here, to run once
void setup() {                                         
  
  Serial.begin(baudRate);                              //for serial print if needed?
  AudioMemory(10);                                     // Audio connections require memory to work.  For more detailed information, see the MemoryAndCpuUsage example
  
  //enable audio shield, select input/mic gain, and initialize output level
  sgtl5000.enable();
  sgtl5000.inputSelect(input);
  sgtl5000.micGain(60);
  sgtl5000.volume(0.5);
  
  arm_rfft_fast_init_f32(&FFT,fftLen);                  //initialize FFT as required by CMSIS_DSP arm_rfft_fast_f32

  //Begin the record audio queue
  Q_1_rec.begin();                                      
  delay (8);                                            //allow time for 2 sample blocks to be acquired.  empirically 8 ms.
  if(Q_1_rec.available() >=2 ) {                        //execute if 2 blocks have been acquired
    Serial.println(Q_1_rec.available());                //print number of blocks in queue to serial monitor.  should be 2
    ptr_in1 = Q_1_rec.readBuffer();                     //copy addresses of 128 samples in queue block to ptr_in1
    Q_1_rec.freeBuffer();                               //free the queue of the previous block
    Serial.println(Q_1_rec.available());                //print number of blocks in queue to serial monitor.  should be 1
    ptr_in2 = Q_1_rec.readBuffer();                     //copy addresses of 128 samples in queue block to ptr_in2
    Q_1_rec.freeBuffer();                               //free the queue of the previous block

    ptr_out = Q_1_play.getBuffer();                     //get pointer to output buffer "empty" array

    //teensy audio works in q15 format. FFT works better in F32 format due to the dedicated FPU on the teensy. convert and store in inputbuffer
    arm_q15_to_float(&ptr_in1[0],&inputbuffer1[0],blockSize);               
    arm_q15_to_float(&ptr_in2[0],&inputbuffer2[0],blockSize);  

    //make triangles of input blocks 1 and 2
    for (int i=0; i<64; i++){ 
      trianglebuffer1[i] = inputbuffer1[i] *(i+1)/64;                       //index 0 = input * 1/64, index 63 = input * 64/64 
    }
    for (int i=0; i<64; i++){ 
      trianglebuffer1[128-(i+1)] = inputbuffer1[128-(i+1)] *(i+1)/64;       //index 127 = input * 1/64, index 64 = 64/64 
    }
    for (int i=0; i<64; i++){ 
      trianglebuffer2[i] = inputbuffer2[i] *(i+1)/64;                       
    }
    for (int i=0; i<64; i++){ 
      trianglebuffer2[128-(i+1)] = inputbuffer2[128-(i+1)] *(i+1)/64;       
    }

    //do the FFT on the triangles and output to FFToutputbuffer.  Third argument 0 is for forward FFT, vs 1 for inverse FFT.
    arm_rfft_fast_f32(&FFT,&trianglebuffer1[0],FFToutputbuffer1,0);         
    arm_rfft_fast_f32(&FFT,&trianglebuffer2[0],FFToutputbuffer2,0);

    //peak picking. set i<peaks, 128 for all peaks
    for (int i=0; i<128; i++){                                              
      peakpickbuffer1[i] = FFToutputbuffer1[i];
    }

    for (int i=0; i<128; i++){
      peakpickbuffer2[i] = FFToutputbuffer2[i];
    }

    //Do IFFT on peaks picked.  Third argument 1 is for inverse FFT, vs 0 for forward FFT.
    arm_rfft_fast_f32(&FFT,peakpickbuffer1,IFFToutputbuffer1,1);
    arm_rfft_fast_f32(&FFT,peakpickbuffer2,IFFToutputbuffer2,1);

    //take each edge of each triangle to help follow wtf is going on
    for (int i=0; i<64; i++){
      rising1[i] = IFFToutputbuffer1[i];                                    //triangle1 rising edge
    }

    for (int i=0; i<64; i++){
      falling1[i] = IFFToutputbuffer1[64+i];                                // triangle1 falling edge
    }

    for (int i=0; i<64; i++){
      rising2[i] = IFFToutputbuffer2[i];                                    //triangle2 rising edge
    }

    for (int i=0; i<64; i++){
      falling2[i] = IFFToutputbuffer2[64+i];                                //triangle2 falling edge
    }

    for (int i=0; i<64; i++){
      overlapbuffer_f1r2[i] = falling1[i] + rising2[i];                     //falling1 plus rising2
    }

    for (int i=0; i<64; i++){
      prev64_127[i] = overlapbuffer_f1r2[i];                                //prev64_127 will be used in loop(), since the first part of each output is the same as the last part of the previous output
    } 

    for (int i=0; i<64; i++){
      falling_prev[i] = falling2[i];                                        //falling_prev will be used in loop(), added to the current rising edge for the 64-127 samples of each output block
    }     

    for (int i=0; i<64; i++){
      outputbuffer[i] = rising1[i];                                         //first part of output block is triangle1 rising edge
    }  

    for (int i=0; i<64; i++){
      outputbuffer[64+i] = overlapbuffer_f1r2[i];                           //second part of output block is falling1 plus rising2
    }  
   
    arm_float_to_q15(&outputbuffer[0],&ptr_out[0],blockSize);               //convert from F32 back to Q15 for teensy audio output

    Q_1_play.playBuffer();                                                  //output first block to headphone jack

    Serial.println("Setup done");                                           //indicate success to serial monitor
  }
  else{
    Serial.println("Something went wrong");                                 //indicate otherwise
  }
}                                                                           //end void setup() {  

// put your main code here, to run repeatedly
void loop() {
 
  if(Q_1_rec.available()){                                            //check to see if there is a queue block available. 

    ptr_in3 = Q_1_rec.readBuffer();                                   //copy addresses of 128 samples in queue block to ptr_in3
    
    ptr_out = Q_1_play.getBuffer();                                   //get pointer to output buffer "empty" array

    arm_q15_to_float(&ptr_in3[0],&inputbuffer3[0],blockSize);         //convert Q15 to F32

    //make triangle of input block
    for (int i=0; i<64; i++){ 
      trianglebuffer3[i] = inputbuffer3[i] *(i+1)/64;                 //index 0 = input * 1/64, index 63 = input * 64/64 
    }
    for (int i=0; i<64; i++){ 
      trianglebuffer3[128-(i+1)] = inputbuffer3[128-(i+1)] *(i+1)/64; //index 127 = input * 1/64, index 64 = 64/64 
    }

    arm_rfft_fast_f32(&FFT,&trianglebuffer3[0],FFToutputbuffer3,0);   //FFT on current triangle

    //peak pick current FFT output spectra.  set i<peaks.  set i<128 for all peaks.
    for (int i=0; i<128; i++){
      peakpickbuffer3[i] = FFToutputbuffer3[i];                       
    }

    arm_rfft_fast_f32(&FFT,peakpickbuffer3,IFFToutputbuffer3,1);      //IFFT of peaks picked

    //separate triangle edges for ease of understanding
    for (int i=0; i<64; i++){
      rising3[i] = IFFToutputbuffer3[i];                              //current triangle rising edge
    }

    for (int i=0; i<64; i++){
      falling3[i] = IFFToutputbuffer3[64+i];                          //current triangle falling edge
    }

    for (int i=0; i<64; i++){
      outputbuffer[i] = prev64_127[i];                                //the first part of the output block, index's 0-63, are always the last part, index's 64-127, of the previous block (overlapping)
    }

    for (int i=0; i<64; i++){
      outputbuffer[64+i] = falling_prev[i] + rising3[i];              //the last part of the output block, index's 64-127, are always the previous falling edge, plus the current rising edge (overlapping)
    }

    for (int i=0; i<64; i++){
      prev64_127[i] = outputbuffer[64+i];                             //after output, update prev64_127 with the last part of the output block, to be used as the first part of the next output block in the next loop
    } 

    for (int i=0; i<64; i++){
      falling_prev[i] = falling3[i];                                  //after output, update falling_prev with current falling edge, to be added to next rising edge, for last part of next output block
    }     

    arm_float_to_q15(&outputbuffer[0],&ptr_out[0],blockSize);         //convert the output block to Q15 for teensy audio playback
    
    Q_1_rec.freeBuffer();                                             //free the record buffer for the next block on the next loop
                                    
    Q_1_play.playBuffer();                                            //output the buffer to be played
  }                                                                   //end if(Q_1_rec.available()){ 
}                                                                     //end void loop() {

Frank B · Oct 15, 2021

Code:

    Serial.println(Q_1_rec.available());                //print number of blocks in queue to serial monitor.  should be 2
    ptr_in1 = Q_1_rec.readBuffer();                     //copy addresses of 128 samples in queue block to ptr_in1
    Q_1_rec.freeBuffer();                               //free the queue of the previous block

Hm.So ptr_in11 points to a freed buffer?
It may be not the reason, but that's an issue. Free it when you don't need the data anymore.

MarkT · Oct 15, 2021

I've had a quick scan of the code, and I think you need to use two blocks per FFT, so that the overlap
is 50% of the FFT period - the current code seems to have no overlap.

BTW a Hann window (raised cosine) has less spectral leakage than a triangular window and should give
less artifacts.

Are you taking account of the time-domain transients? Just from the spectral peaks you won't be able to
detect these as they have a wide-band spectrum - the risk is they may clip.

Gabe Walters · Oct 16, 2021

Thanks for the replies you guys. Frank I see your point about the pointer pointing to a freed buffer. How silly of me! Mark I started to do two blocks, and got wrapped around the axle again.

I am starting to wonder just how real-time this can be. If frames are being overlapped, each one is essentially being shifted back to overlap with the previous one. While this is happening, the input data just keeps coming. So it seems like a delay will arise and get worse over time.

I have a non-real-time matlab program that just does simple compression on a 3 second .wav file. When I made a crude chart of how the matlab program works, and then tried to make a matching chart for the teensy, there seemed to be this important difference where the input data is streaming in but the compression just keeps getting farther behind the teensy due to the overlapping. Because the teensy is going to give blocks of 128 constantly no matter what, and it has to, if it is to be real time.

There are other minor differences such as matlab being 256 and teensy being 128, and starting indexes of 1 and 0 respectively, but that wouldn't seem to change things much.

Does anyone know of this being done before?

Thanks,

Gabe

MATLAB code:

Code:

clear all; close all; clc;

num_peaks = 129; % the number of peaks to pick 

info = audioinfo('cleanspeech.wav'); % get number of samples in input wav
Length = info.TotalSamples; % store number of samples in Length
s_hat = zeros(Length,1); % prepare output vector of zeros, the same length as input 
s_A_rising = zeros(128,1); % prepare the vector of zeros to capture the rising edge of every other triangular frame A
s_A_falling = zeros(128,1); % prepare the vector of zeros to capture the falling edge of every other triangular frame A
s_B_rising = zeros(128,1); % prepare the vector of zeros to capture the rising edge of every other triangular frame B
s_B_falling = zeros(128,1); % prepare the vector of zeros to capture the falling edge of every other triangular frame B
s_C = zeros(128,1); % prepare the vector of zeros to add the rising edge of the next to the falling edge of the prev and vice versa
%filename = 'output1.wav'; % name of the output file, if needed

rows = 2; % number of rows in the plot
columns = 3; % number of columns in the plot

%Find the peaks, zero out the rest, then put its ifft into s_hat
n = (2 * Length / 256) - 1; % determine number of iterations needed to process entire input 
%with traingular overlapping frames of 256 samples each. overall twice as many frames as
%non-overlapping rectangular minus 1

for i = 0:n-1 % number of iterations beginning at zero
    c = 128 * i; % setup for moving window of frames. 1st frame is 1-256, 2nd is 129-384, etc.
    a = 1 + c; % starting index of current frame
    b = 256 + c; % ending index of current frame
    bb = 128 + c; % ending index of rising/falling edge addition
    [s, fs] = audioread('cleanspeech.wav',[a b]); % read current frame of 256 samples. fs is intrinsic to .wav
    
    % make triangular frame by multipling sample n = 1 thru 128 by (n /
    % 128).  The second loop is the downward slope.

        %upward slope
        for m = 1:128
            s(m) = s(m) * m/128;
        end
        %downward slope
        for m = 1:128
            s(257-m) = s(257-m) * m/128;
        end    
 
    S = fft(s); % take Fourier Transform of current frame
    mags = zeros(num_peaks,1); % mags is a vector of zeros, of the number of picked peaks

    %Find the highest magnitudes and put them in to mags
    mags = maxk(S,num_peaks);
    mags = abs(mags);

    %Zero out all parts of S that with magnitudes less than mags
    for iii = 1:256 % 256 iterations, one for each element in the current frame 
        if abs(S(iii)) < mags(end) % if absolute value of S(iii) is less than the last, smallest, element in mags, replace with 0
            S(iii) = 0;
        end
    end
    s = ifft(S); %take inverse FFT of S and store in s
    
    IS_EVEN = ~mod(i,2); %setup for every other iteration to store and add the triangle edges
    
    if IS_EVEN == 1; %first and every other iteration use the rising edge of current triangle added to falling edge of prev
      s_A_rising = s(1:128,1); %rising edge of this triangle
      s_A_falling = s(129:256,1); %falling edge of this triangle
      s_C = s_A_rising + s_B_falling; %add this rising edge to previous falling edge and store in 128 element s_C
    else     %second and every other iteration add the falling edge of the last triangle to the rising edge of the current
      s_B_rising = s(1:128,1); %rising edge of this triangle
      s_B_falling = s(129:256,1); %falling edge of this triangle
      s_C = s_A_falling + s_B_rising; %add last rising edge to current falling edge
    end
    
    s_hat(a:bb,1) = s_C; %build output vector s_hat 128 elements at a time (edges added together) 
end

soundsc(s_hat, fs); % play audio on speakers

%Plots
    s_ = audioread('cleanspeech.wav'); % read entire wav into memory for plots and calculations only
    
    % plot the single channel audio input
    subplot(rows, columns, 1); 
    plot (s_);
    title ('1-channel audio');

    % display number of peaks picked as integer
    ax = subplot(rows, columns, 2); % display PP at subplot position
    text(0.45,0.5,num2str(num_peaks)); % center PP text in subplot and get string from num_peaks
    title ('Peaks picked'); % subplot title
    set(gca,'xtick',[]); % remove axes
    set(gca,'ytick',[]); % remove axes
    
    % plot FFT in dB
    subplot(rows, columns, 3); 
    eps = 1e-7;
    plot(db(S(1:256)+eps)); 
    title('Freq --- dB Plot');
    
    % plot the reconstructed audio file
    subplot(rows, columns, 4);  
    plot (s_hat);
    title ('Reconstructed audio');

    % plot the input/output difference 
    temp = zeros(size(s_hat));
        for iiii = 1:length(s_)
            temp(iiii,1) =  s_(iiii,1);
        end
    diff = s_hat - temp;
    subplot(rows, columns, 5);  
    plot (diff);
    ylim([-1 1]);
    title ('Audio difference');
    
    %Signal to Noise Ratio
    e = s_(:) - s_hat(:);  % error = s(n) - s_hat(n) = input - output
    E = sum(e.^2); % Epsilon (error energy) = sum of squared elements of e
    P = sum(s_.^2); % P (signal energy) = sum of squared elements of single_channel
    SNR_dB = 10*log(P/E); % calculate SNR in dB where SNR = P/E
    ax = subplot(rows, columns, 6); % display SNR at subplot position
    text(0.25,0.5,num2str(SNR_dB)); % center SNR text in subplot and get string from SNR_dB
    title ('SNR_{dB}'); % subplot title
    set(gca,'xtick',[]); % remove axes
    set(gca,'ytick',[]); % remove axes

MarkT · Oct 16, 2021

I am starting to wonder just how real-time this can be. If frames are being overlapped, each one is essentially being shifted back to overlap with the previous one. While this is happening, the input data just keeps coming. So it seems like a delay will arise and get worse over time.

No, the latency is a fixed number of blocks, this is just like a pipeline.

Triangular windowing

Gabe Walters

New member

Frank B

Senior Member

MarkT

Well-known member

Gabe Walters

New member

MarkT

Well-known member