Forum Rule: Always post complete source code & details to reproduce any issue!
Results 1 to 5 of 5

Thread: Triangular windowing

  1. #1

    Triangular windowing

    Hi,

    I am attempting to do real-time audio compression on a Teensy 4.0 with audio shield by doing triangular windowing, FFT, peak picking, IFFT, then overlap samples by 50% and add. I have an attempt here, but the output sounds bad, even picking all 128 peaks (no compression). Does anyone have experience or insight they can share? Thanks,

    Gabe

    Code:
    //the reason there are 3 of everything is: in order to overlap the output blocks, 2 are needed to start with. it takes 2 to make an overlap. this is done in setup().  
    //after that, only 1 further block is needed at a time in order to overlap with the previous results. this is done in loop().   
    
    #define ARM_MATH_CM7
    #include <Audio.h>                                    //for audio functions
    #include <Wire.h>                                     //for connecting objects together
    #include <SD.h>                                       //for SD card
    #include <SPI.h>                                    
    #include <SerialFlash.h>  
    #include <arm_math.h>                                 //for CMSIS_DSP FFT
    #include <Bounce.h>                                   //for switches
    
    //Declare constant variables
    const int baudRate = 115200;                          
    const uint32_t blockSize=AUDIO_BLOCK_SAMPLES;         //C:\Program Files (x86)\Arduino\hardware\teensy\avr\cores\teensy4\AudioStream.h has #define AUDIO_BLOCK_SAMPLES  128 
    const uint16_t fftLen=blockSize;                      //length of FFT, 128 if blockSize
    
    //Create pointers to be assigned to input and output buffers, create input/output arrays 
    //Teensy audio library works on int16_t (equivalent in size with q15 data), the FFT is running on f32 data
    int16_t * ptr_in1;                                    //used in setup(), pointer to first block of 128 samples captured by Q_1_rec.readBuffer
    int16_t * ptr_in2;                                    //used in setup(), pointer to second block of 128 samples captured by Q_1_rec.readBuffer
    int16_t * ptr_in3;                                    //used in loop(), pointer to each successive block of 128 samples after the first 2
    int16_t * ptr_out;                                    //used in setup and loop, pointer to output used by Q_1_play.getBuffer to playback the audio
    float32_t inputbuffer1[blockSize]={0};                //output of arm_q15_to_float, input to trianglebuffer, for first audio block
    float32_t inputbuffer2[blockSize]={0};                //output of arm_q15_to_float, input to trianglebuffer, for second audio block
    float32_t inputbuffer3[blockSize]={0};                //output of arm_q15_to_float, input to trianglebuffer, for successive audio blocks
    float32_t trianglebuffer1[blockSize]={0};             //make a triangle out of inputbuffer1 and feed to FFT
    float32_t trianglebuffer2[blockSize]={0};             //make a triangle out of inputbuffer2 and feed to FFT
    float32_t trianglebuffer3[blockSize]={0};             //make a triangle out of inputbuffer3 and feed to FFT
    float32_t FFToutputbuffer1[blockSize]={0};            //output of FFT, input to peak picking buffer
    float32_t FFToutputbuffer2[blockSize]={0};            //output of FFT, input to peak picking buffer
    float32_t FFToutputbuffer3[blockSize]={0};            //output of FFT, input to peak picking buffer
    float32_t peakpickbuffer1[blockSize]={0};             //input to IFFT
    float32_t peakpickbuffer2[blockSize]={0};             //input to IFFT
    float32_t peakpickbuffer3[blockSize]={0};             //input to IFFT
    float32_t IFFToutputbuffer1[blockSize]={0};           //output of IFFT, take first 64 samples (index 0-63) and store in rising1, take last 64 samples (index 64-127) and store in falling1
    float32_t IFFToutputbuffer2[blockSize]={0};           //output of IFFT, take first 64 samples (index 0-63) and store in rising2, take last 64 samples (index 64-127) and store in falling2
    float32_t IFFToutputbuffer3[blockSize]={0};           //output of IFFT, take first 64 samples (index 0-63) and store in rising3, take last 64 samples (index 64-127) and store in falling3
    float32_t overlapbuffer1[blockSize]={0};              //not used :(
    float32_t overlapbuffer2[blockSize]={0};              //not used 
    float32_t overlapbuffer3[blockSize]={0};              //not used
    float32_t rising1[blockSize]={0};                     //the first 64 samples of IFFT1. the first 64 samples of the first and only the first output block 
    float32_t rising2[blockSize]={0};                     //the first 64 samples of IFFT2
    float32_t rising3[blockSize]={0};                     //the first 64 samples of IFFT3
    float32_t falling1[blockSize]={0};                    //the last 64 samples of IFFT1
    float32_t falling2[blockSize]={0};                    //the last 64 samples of IFFT2
    float32_t falling3[blockSize]={0};                    //the last 64 samples of IFFT3
    float32_t overlapbuffer2_last[blockSize]={0};         //not used 
    float32_t overlapbuffer_f2r1[blockSize]={0};          //not used
    float32_t overlapbuffer_f1r2[blockSize]={0};          //falling1 plus rising2. the last 64 samples of the first output block, and the first 64 samples of the next block
    float32_t outputbuffer[blockSize]={0};                //used in setup and loop. it gets converted from f32 to q15 for playback by teensy. first 64 are the last 64 from previous loop. last 64 are previous falling plus current rising.
    float32_t falling_prev_prev[blockSize]={0};           //not used
    float32_t falling_prev[blockSize]={0};                //equal to falling2 in the first pass, on successive passes, equal to previous falling edge before output, equal to current falling edge after output, which on the next loop will be previous falling edge
    float32_t rising_prev[blockSize]={0};                 //not used
    float32_t rising_current[blockSize]={0};              //not used.  it's what rising3 should be, maybe a better name than rising3 would be _current, or _loop.  would apply to all 3's
    float32_t prev64_127[blockSize]={0};                  //in setup which does the first output block, it's overlapbuffer_f1r2. in loop, outputbuffer gets its value in the first 64 indexes. after output, it gets the last 64 indexes of outputbuffer.
    
    //Instantiate audio library objects, make connections
    AudioInputI2S i2s_in;                                 //Audio input on i2s bus from the teensy audioshield adapter board
    AudioOutputI2S i2s_out;                               //Audio output on i2s bus to the audioshield
    AudioRecordQueue Q_1_rec;                             //Object for storing audio blocks and accessing as arrays
    AudioPlayQueue Q_1_play;                              //object to write array to for output
    AudioConnection patchcord1(i2s_in,0,Q_1_rec,0);       //connect audio input to the object for storing audio blocks and accessing as arrays
    AudioConnection patchcord2(Q_1_play,0,i2s_out,0);     //connect the array object for output to audio output channel1 (L or R)
    AudioConnection patchcord3(Q_1_play,0,i2s_out,1);     //connect the array object for output to audio output channel2 (L or R)
    AudioControlSGTL5000 sgtl5000;                        //call the IC on the audioshield
    
    arm_rfft_fast_instance_f32 FFT;                       //instantiate FFT as required by CMSIS_DSP https://arm-software.github.io/CMSIS_5/DSP/html/group__RealFFT.html#gac5fceb172551e7c11eb4d0e17ef15aa3
    
    //Define input to system. Uncomment the input you want to use
    //const int input = AUDIO_INPUT_MIC;
    const int input = AUDIO_INPUT_LINEIN;
    
    // a bit of helper for printing stuff to serial monitor in a nicer format
    constexpr size_t length_format_buffer {64};
    char format_buffer[length_format_buffer];
    
    // a bit of helper for printing stuff to serial monitor in a nicer format
    void serial_print_int_width_4(unsigned int input){
      snprintf(format_buffer, length_format_buffer, "%03i", input);
      Serial.print(format_buffer);
    }
    
    // a bit of helper for printing stuff to serial monitor in a nicer format
    void serial_print_float_width_16_prec_8(float input){ 
      dtostrf(input, 8, 4, format_buffer);
      Serial.print(format_buffer);
    }
    
    // put your setup code here, to run once
    void setup() {                                         
      
      Serial.begin(baudRate);                              //for serial print if needed?
      AudioMemory(10);                                     // Audio connections require memory to work.  For more detailed information, see the MemoryAndCpuUsage example
      
      //enable audio shield, select input/mic gain, and initialize output level
      sgtl5000.enable();
      sgtl5000.inputSelect(input);
      sgtl5000.micGain(60);
      sgtl5000.volume(0.5);
      
      arm_rfft_fast_init_f32(&FFT,fftLen);                  //initialize FFT as required by CMSIS_DSP arm_rfft_fast_f32
    
      //Begin the record audio queue
      Q_1_rec.begin();                                      
      delay (8);                                            //allow time for 2 sample blocks to be acquired.  empirically 8 ms.
      if(Q_1_rec.available() >=2 ) {                        //execute if 2 blocks have been acquired
        Serial.println(Q_1_rec.available());                //print number of blocks in queue to serial monitor.  should be 2
        ptr_in1 = Q_1_rec.readBuffer();                     //copy addresses of 128 samples in queue block to ptr_in1
        Q_1_rec.freeBuffer();                               //free the queue of the previous block
        Serial.println(Q_1_rec.available());                //print number of blocks in queue to serial monitor.  should be 1
        ptr_in2 = Q_1_rec.readBuffer();                     //copy addresses of 128 samples in queue block to ptr_in2
        Q_1_rec.freeBuffer();                               //free the queue of the previous block
    
        ptr_out = Q_1_play.getBuffer();                     //get pointer to output buffer "empty" array
    
        //teensy audio works in q15 format. FFT works better in F32 format due to the dedicated FPU on the teensy. convert and store in inputbuffer
        arm_q15_to_float(&ptr_in1[0],&inputbuffer1[0],blockSize);               
        arm_q15_to_float(&ptr_in2[0],&inputbuffer2[0],blockSize);  
    
        //make triangles of input blocks 1 and 2
        for (int i=0; i<64; i++){ 
          trianglebuffer1[i] = inputbuffer1[i] *(i+1)/64;                       //index 0 = input * 1/64, index 63 = input * 64/64 
        }
        for (int i=0; i<64; i++){ 
          trianglebuffer1[128-(i+1)] = inputbuffer1[128-(i+1)] *(i+1)/64;       //index 127 = input * 1/64, index 64 = 64/64 
        }
        for (int i=0; i<64; i++){ 
          trianglebuffer2[i] = inputbuffer2[i] *(i+1)/64;                       
        }
        for (int i=0; i<64; i++){ 
          trianglebuffer2[128-(i+1)] = inputbuffer2[128-(i+1)] *(i+1)/64;       
        }
    
        //do the FFT on the triangles and output to FFToutputbuffer.  Third argument 0 is for forward FFT, vs 1 for inverse FFT.
        arm_rfft_fast_f32(&FFT,&trianglebuffer1[0],FFToutputbuffer1,0);         
        arm_rfft_fast_f32(&FFT,&trianglebuffer2[0],FFToutputbuffer2,0);
    
        //peak picking. set i<peaks, 128 for all peaks
        for (int i=0; i<128; i++){                                              
          peakpickbuffer1[i] = FFToutputbuffer1[i];
        }
    
        for (int i=0; i<128; i++){
          peakpickbuffer2[i] = FFToutputbuffer2[i];
        }
    
        //Do IFFT on peaks picked.  Third argument 1 is for inverse FFT, vs 0 for forward FFT.
        arm_rfft_fast_f32(&FFT,peakpickbuffer1,IFFToutputbuffer1,1);
        arm_rfft_fast_f32(&FFT,peakpickbuffer2,IFFToutputbuffer2,1);
    
        //take each edge of each triangle to help follow wtf is going on
        for (int i=0; i<64; i++){
          rising1[i] = IFFToutputbuffer1[i];                                    //triangle1 rising edge
        }
    
        for (int i=0; i<64; i++){
          falling1[i] = IFFToutputbuffer1[64+i];                                // triangle1 falling edge
        }
    
        for (int i=0; i<64; i++){
          rising2[i] = IFFToutputbuffer2[i];                                    //triangle2 rising edge
        }
    
        for (int i=0; i<64; i++){
          falling2[i] = IFFToutputbuffer2[64+i];                                //triangle2 falling edge
        }
    
        for (int i=0; i<64; i++){
          overlapbuffer_f1r2[i] = falling1[i] + rising2[i];                     //falling1 plus rising2
        }
    
        for (int i=0; i<64; i++){
          prev64_127[i] = overlapbuffer_f1r2[i];                                //prev64_127 will be used in loop(), since the first part of each output is the same as the last part of the previous output
        } 
    
        for (int i=0; i<64; i++){
          falling_prev[i] = falling2[i];                                        //falling_prev will be used in loop(), added to the current rising edge for the 64-127 samples of each output block
        }     
    
        for (int i=0; i<64; i++){
          outputbuffer[i] = rising1[i];                                         //first part of output block is triangle1 rising edge
        }  
    
        for (int i=0; i<64; i++){
          outputbuffer[64+i] = overlapbuffer_f1r2[i];                           //second part of output block is falling1 plus rising2
        }  
       
        arm_float_to_q15(&outputbuffer[0],&ptr_out[0],blockSize);               //convert from F32 back to Q15 for teensy audio output
    
        Q_1_play.playBuffer();                                                  //output first block to headphone jack
    
        Serial.println("Setup done");                                           //indicate success to serial monitor
      }
      else{
        Serial.println("Something went wrong");                                 //indicate otherwise
      }
    }                                                                           //end void setup() {  
    
    // put your main code here, to run repeatedly
    void loop() {
     
      if(Q_1_rec.available()){                                            //check to see if there is a queue block available. 
    
        ptr_in3 = Q_1_rec.readBuffer();                                   //copy addresses of 128 samples in queue block to ptr_in3
        
        ptr_out = Q_1_play.getBuffer();                                   //get pointer to output buffer "empty" array
    
        arm_q15_to_float(&ptr_in3[0],&inputbuffer3[0],blockSize);         //convert Q15 to F32
    
        //make triangle of input block
        for (int i=0; i<64; i++){ 
          trianglebuffer3[i] = inputbuffer3[i] *(i+1)/64;                 //index 0 = input * 1/64, index 63 = input * 64/64 
        }
        for (int i=0; i<64; i++){ 
          trianglebuffer3[128-(i+1)] = inputbuffer3[128-(i+1)] *(i+1)/64; //index 127 = input * 1/64, index 64 = 64/64 
        }
    
        arm_rfft_fast_f32(&FFT,&trianglebuffer3[0],FFToutputbuffer3,0);   //FFT on current triangle
    
        //peak pick current FFT output spectra.  set i<peaks.  set i<128 for all peaks.
        for (int i=0; i<128; i++){
          peakpickbuffer3[i] = FFToutputbuffer3[i];                       
        }
    
        arm_rfft_fast_f32(&FFT,peakpickbuffer3,IFFToutputbuffer3,1);      //IFFT of peaks picked
    
        //separate triangle edges for ease of understanding
        for (int i=0; i<64; i++){
          rising3[i] = IFFToutputbuffer3[i];                              //current triangle rising edge
        }
    
        for (int i=0; i<64; i++){
          falling3[i] = IFFToutputbuffer3[64+i];                          //current triangle falling edge
        }
    
        for (int i=0; i<64; i++){
          outputbuffer[i] = prev64_127[i];                                //the first part of the output block, index's 0-63, are always the last part, index's 64-127, of the previous block (overlapping)
        }
    
        for (int i=0; i<64; i++){
          outputbuffer[64+i] = falling_prev[i] + rising3[i];              //the last part of the output block, index's 64-127, are always the previous falling edge, plus the current rising edge (overlapping)
        }
    
        for (int i=0; i<64; i++){
          prev64_127[i] = outputbuffer[64+i];                             //after output, update prev64_127 with the last part of the output block, to be used as the first part of the next output block in the next loop
        } 
    
        for (int i=0; i<64; i++){
          falling_prev[i] = falling3[i];                                  //after output, update falling_prev with current falling edge, to be added to next rising edge, for last part of next output block
        }     
    
        arm_float_to_q15(&outputbuffer[0],&ptr_out[0],blockSize);         //convert the output block to Q15 for teensy audio playback
        
        Q_1_rec.freeBuffer();                                             //free the record buffer for the next block on the next loop
                                        
        Q_1_play.playBuffer();                                            //output the buffer to be played
      }                                                                   //end if(Q_1_rec.available()){ 
    }                                                                     //end void loop() {

  2. #2
    Senior Member
    Join Date
    Apr 2014
    Location
    Germany
    Posts
    9,437
    Code:
        Serial.println(Q_1_rec.available());                //print number of blocks in queue to serial monitor.  should be 2
        ptr_in1 = Q_1_rec.readBuffer();                     //copy addresses of 128 samples in queue block to ptr_in1
        Q_1_rec.freeBuffer();                               //free the queue of the previous block
    Hm.So ptr_in11 points to a freed buffer?
    It may be not the reason, but that's an issue. Free it when you don't need the data anymore.

  3. #3
    Senior Member
    Join Date
    Jul 2020
    Posts
    1,371
    I've had a quick scan of the code, and I think you need to use two blocks per FFT, so that the overlap
    is 50% of the FFT period - the current code seems to have no overlap.

    BTW a Hann window (raised cosine) has less spectral leakage than a triangular window and should give
    less artifacts.

    Are you taking account of the time-domain transients? Just from the spectral peaks you won't be able to
    detect these as they have a wide-band spectrum - the risk is they may clip.

  4. #4
    Thanks for the replies you guys. Frank I see your point about the pointer pointing to a freed buffer. How silly of me! Mark I started to do two blocks, and got wrapped around the axle again.

    I am starting to wonder just how real-time this can be. If frames are being overlapped, each one is essentially being shifted back to overlap with the previous one. While this is happening, the input data just keeps coming. So it seems like a delay will arise and get worse over time.

    I have a non-real-time matlab program that just does simple compression on a 3 second .wav file. When I made a crude chart of how the matlab program works, and then tried to make a matching chart for the teensy, there seemed to be this important difference where the input data is streaming in but the compression just keeps getting farther behind the teensy due to the overlapping. Because the teensy is going to give blocks of 128 constantly no matter what, and it has to, if it is to be real time.

    There are other minor differences such as matlab being 256 and teensy being 128, and starting indexes of 1 and 0 respectively, but that wouldn't seem to change things much.

    Does anyone know of this being done before?

    Thanks,

    Gabe

    MATLAB code:
    Code:
    clear all; close all; clc;
    
    num_peaks = 129; % the number of peaks to pick 
    
    info = audioinfo('cleanspeech.wav'); % get number of samples in input wav
    Length = info.TotalSamples; % store number of samples in Length
    s_hat = zeros(Length,1); % prepare output vector of zeros, the same length as input 
    s_A_rising = zeros(128,1); % prepare the vector of zeros to capture the rising edge of every other triangular frame A
    s_A_falling = zeros(128,1); % prepare the vector of zeros to capture the falling edge of every other triangular frame A
    s_B_rising = zeros(128,1); % prepare the vector of zeros to capture the rising edge of every other triangular frame B
    s_B_falling = zeros(128,1); % prepare the vector of zeros to capture the falling edge of every other triangular frame B
    s_C = zeros(128,1); % prepare the vector of zeros to add the rising edge of the next to the falling edge of the prev and vice versa
    %filename = 'output1.wav'; % name of the output file, if needed
    
    rows = 2; % number of rows in the plot
    columns = 3; % number of columns in the plot
    
    %Find the peaks, zero out the rest, then put its ifft into s_hat
    n = (2 * Length / 256) - 1; % determine number of iterations needed to process entire input 
    %with traingular overlapping frames of 256 samples each. overall twice as many frames as
    %non-overlapping rectangular minus 1
    
    for i = 0:n-1 % number of iterations beginning at zero
        c = 128 * i; % setup for moving window of frames. 1st frame is 1-256, 2nd is 129-384, etc.
        a = 1 + c; % starting index of current frame
        b = 256 + c; % ending index of current frame
        bb = 128 + c; % ending index of rising/falling edge addition
        [s, fs] = audioread('cleanspeech.wav',[a b]); % read current frame of 256 samples. fs is intrinsic to .wav
        
        % make triangular frame by multipling sample n = 1 thru 128 by (n /
        % 128).  The second loop is the downward slope.
    
            %upward slope
            for m = 1:128
                s(m) = s(m) * m/128;
            end
            %downward slope
            for m = 1:128
                s(257-m) = s(257-m) * m/128;
            end    
     
        S = fft(s); % take Fourier Transform of current frame
        mags = zeros(num_peaks,1); % mags is a vector of zeros, of the number of picked peaks
    
        %Find the highest magnitudes and put them in to mags
        mags = maxk(S,num_peaks);
        mags = abs(mags);
    
        %Zero out all parts of S that with magnitudes less than mags
        for iii = 1:256 % 256 iterations, one for each element in the current frame 
            if abs(S(iii)) < mags(end) % if absolute value of S(iii) is less than the last, smallest, element in mags, replace with 0
                S(iii) = 0;
            end
        end
        s = ifft(S); %take inverse FFT of S and store in s
        
        IS_EVEN = ~mod(i,2); %setup for every other iteration to store and add the triangle edges
        
        if IS_EVEN == 1; %first and every other iteration use the rising edge of current triangle added to falling edge of prev
          s_A_rising = s(1:128,1); %rising edge of this triangle
          s_A_falling = s(129:256,1); %falling edge of this triangle
          s_C = s_A_rising + s_B_falling; %add this rising edge to previous falling edge and store in 128 element s_C
        else     %second and every other iteration add the falling edge of the last triangle to the rising edge of the current
          s_B_rising = s(1:128,1); %rising edge of this triangle
          s_B_falling = s(129:256,1); %falling edge of this triangle
          s_C = s_A_falling + s_B_rising; %add last rising edge to current falling edge
        end
        
        s_hat(a:bb,1) = s_C; %build output vector s_hat 128 elements at a time (edges added together) 
    end
    
    soundsc(s_hat, fs); % play audio on speakers
    
    %Plots
        s_ = audioread('cleanspeech.wav'); % read entire wav into memory for plots and calculations only
        
        % plot the single channel audio input
        subplot(rows, columns, 1); 
        plot (s_);
        title ('1-channel audio');
    
        % display number of peaks picked as integer
        ax = subplot(rows, columns, 2); % display PP at subplot position
        text(0.45,0.5,num2str(num_peaks)); % center PP text in subplot and get string from num_peaks
        title ('Peaks picked'); % subplot title
        set(gca,'xtick',[]); % remove axes
        set(gca,'ytick',[]); % remove axes
        
        % plot FFT in dB
        subplot(rows, columns, 3); 
        eps = 1e-7;
        plot(db(S(1:256)+eps)); 
        title('Freq --- dB Plot');
        
        % plot the reconstructed audio file
        subplot(rows, columns, 4);  
        plot (s_hat);
        title ('Reconstructed audio');
    
        % plot the input/output difference 
        temp = zeros(size(s_hat));
            for iiii = 1:length(s_)
                temp(iiii,1) =  s_(iiii,1);
            end
        diff = s_hat - temp;
        subplot(rows, columns, 5);  
        plot (diff);
        ylim([-1 1]);
        title ('Audio difference');
        
        %Signal to Noise Ratio
        e = s_(:) - s_hat(:);  % error = s(n) - s_hat(n) = input - output
        E = sum(e.^2); % Epsilon (error energy) = sum of squared elements of e
        P = sum(s_.^2); % P (signal energy) = sum of squared elements of single_channel
        SNR_dB = 10*log(P/E); % calculate SNR in dB where SNR = P/E
        ax = subplot(rows, columns, 6); % display SNR at subplot position
        text(0.25,0.5,num2str(SNR_dB)); % center SNR text in subplot and get string from SNR_dB
        title ('SNR_{dB}'); % subplot title
        set(gca,'xtick',[]); % remove axes
        set(gca,'ytick',[]); % remove axes
    Click image for larger version. 

Name:	MATLAB.jpg 
Views:	16 
Size:	61.2 KB 
ID:	26196
    Click image for larger version. 

Name:	Teensy.jpg 
Views:	12 
Size:	36.0 KB 
ID:	26197

  5. #5
    Senior Member
    Join Date
    Jul 2020
    Posts
    1,371
    I am starting to wonder just how real-time this can be. If frames are being overlapped, each one is essentially being shifted back to overlap with the previous one. While this is happening, the input data just keeps coming. So it seems like a delay will arise and get worse over time.
    No, the latency is a fixed number of blocks, this is just like a pipeline.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •