
Thread: DSP block - Optimize FIR filters?

  1. #1
    Junior Member
    Join Date
    Aug 2017
    Posts
    19

    DSP block - Optimize FIR filters?

    Hello guys,

    I'm using this code on the Teensy 3.6 to do system identification in the audio range, based on adaptive filtering.

    It is working really well, but I would like to know if there is a way of optimizing the code so I can use bigger filters at the same sampling rate. I've been working with signal processing for a while, but I'm fairly new to embedded systems.

    I read that the ARM Cortex-M4 has some built-in optimized DSP functions. How can I use them to make the code run faster? How can I do the FIR filtering with those functions?

    Any advice and tips are welcome.

    Note: I'm not using the audio shield.




    /*
      Created in : 15/06/2017 11:26:03
      Author: Artur Zorzo

      System Identification with LMS and white noise
      Teensy 3.6
    */

    #include <Arduino.h>

    // Declaration
    long int a = 0;        // Stop criterion (iteration counter)
    const int taps = 50;   // FIR filter size
    float w[taps];         // FIR filter vector (global, so it starts at zero)
    unsigned int x[taps];  // Signal vector, newest sample at x[0]
    signed int e;          // Error variable
    unsigned int r;        // Random number
    unsigned int y = 0;    // Filter output
    unsigned int d;        // Input
    float g;               // Coefficient update term

    // CONFIGURATIONS
    void setup() {
      analogReadResolution(12);   // ADC resolution
      analogWriteResolution(12);  // DAC resolution
      Serial.begin(19200);        // Serial communication

      analogWrite(A22, 2048);
      delayMicroseconds(30);
    } // Setup end

    // Loop
    void loop() {
      r = random(4095);  // White noise generation
      x[0] = r;
      analogWrite(A22, r);

      y = 0;  // y initialization
      for (int i = 0; i <= (taps - 1); i++)  // Convolution
      {
        // y is an unsigned int, so each partial sum is truncated to an integer
        y = y + w[i] * x[i];
      }

      d = analogRead(A0);
      e = d - y;  // Error calculation

      for (int i = 0; i <= (taps - 1); i++)  // Filter refresh
      {
        g = (((float)x[i] * e) / (400000000.00));
        w[i] = w[i] + g;
      }

      for (int i = (taps - 1); i >= 1; i--)  // Signal vector refresh
      {
        x[i] = x[i - 1];
      }

      // Show filter coefficients
      if (a == 50000)
      {
        for (int i = 0; i <= (taps - 1); i++)
        {
          Serial.print(w[i]);
          Serial.print("\t");
        }

        while (1)  // Stop here once the coefficients have been printed
        {
        }
      }

      a = a + 1;
    } // End Loop

  2. #2
    Senior Member+ Theremingenieur's Avatar
    Join Date
    Feb 2014
    Location
    Colmar, France
    Posts
    2,384
    Even without the audio shield, you should use the audio library (using the ADC and DAC objects as input and output). This gives you great access to a large selection of DSP optimized audio generation, modification and analysis objects with a few mouse clicks.

  3. #3
    Junior Member
    Join Date
    Aug 2017
    Posts
    19
    Hi Theremingenieur,

    I tried to use them, but afterwards I couldn't find a way of accessing the individual samples and adjusting the sampling rate :/. I need to do this processing on each sample read by the ADC, with no buffers involved.

  4. #4
    Senior Member+ Theremingenieur's Avatar
    Join Date
    Feb 2014
    Location
    Colmar, France
    Posts
    2,384
    OK... If you need a variable sampling rate, these audio objects might not be the best solution. Studying their source code should still show you how to use the Cortex-M4 DSP extensions, though.

  5. #5
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    19,927
    The optimized FIR filter stuff is in arm_math.h.

    These are the files from the audio lib which use it. Maybe they can help you?

    https://github.com/PaulStoffregen/Au...r/filter_fir.h
    https://github.com/PaulStoffregen/Au...filter_fir.cpp

    The basic idea is you create an "arm_fir_instance_q15" and an array of "q15". Before doing any filtering, you call arm_fir_init_q15(). Then to actually compute the filter output, call arm_fir_fast_q15().

    The speed optimization comes from operating on blocks. If you process just 1 sample at a time, you'll never reap the benefit of the Cortex-M4 DSP extensions.
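    As a rough illustration of the "init once, then filter whole blocks" pattern described above, here is a portable C++ sketch of a Q15 block FIR. It is not the CMSIS code and does not call arm_math.h; `floatToQ15`, `BlockFirQ15` and `process` are hypothetical names invented for this example, and the plain loops only show the data flow (coefficients plus a state buffer that carries history between blocks) that the optimized routines build on.

```cpp
#include <cstddef>
#include <cstdint>
#include <cmath>
#include <vector>

// Hypothetical helper: convert a float in [-1, 1) to Q15, saturating at the ends.
int16_t floatToQ15(float v) {
    float scaled = std::round(v * 32768.0f);
    if (scaled > 32767.0f) scaled = 32767.0f;
    if (scaled < -32768.0f) scaled = -32768.0f;
    return static_cast<int16_t>(scaled);
}

// Hypothetical stand-in for the "instance" idea: coefficients plus a state
// buffer holding the last (numTaps - 1) inputs followed by the current block.
// Assumes at least one coefficient.
struct BlockFirQ15 {
    std::vector<int16_t> coeffs;  // coeffs[k] multiplies x[n - k]
    std::vector<int16_t> state;   // history, then the block being filtered

    BlockFirQ15(const std::vector<int16_t>& c, std::size_t blockSize)
        : coeffs(c), state(c.size() - 1 + blockSize, 0) {}

    // Filter blockSize samples; blockSize must not exceed the size given
    // to the constructor.
    void process(const int16_t* src, int16_t* dst, std::size_t blockSize) {
        const std::size_t hist = coeffs.size() - 1;

        // Place the new block right after the saved history.
        for (std::size_t i = 0; i < blockSize; ++i) {
            state[hist + i] = src[i];
        }

        for (std::size_t n = 0; n < blockSize; ++n) {
            int64_t acc = 0;  // wide accumulator, Q30 products
            for (std::size_t k = 0; k < coeffs.size(); ++k) {
                acc += static_cast<int32_t>(coeffs[k]) * state[hist + n - k];
            }
            acc >>= 15;  // back from Q30 to Q15
            if (acc > 32767) acc = 32767;
            if (acc < -32768) acc = -32768;
            dst[n] = static_cast<int16_t>(acc);
        }

        // Keep the newest `hist` samples as history for the next block.
        for (std::size_t i = 0; i < hist; ++i) {
            state[i] = state[blockSize + i];
        }
    }
};
```

    The state buffer is what lets consecutive blocks join seamlessly: the tail of one block becomes the history of the next, so the output is the same as filtering one long stream.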

    Likewise, by using analogRead() and consuming CPU time to handle each sample, you're never going to get anywhere close to the kind of performance the audio library achieves using efficient DMA to get the data in and out of the chip.

    If you absolutely must have "no buffers involved", the cost will be poor performance. All the good ways to optimize involve working with buffers.

  6. #6
    Junior Member
    Join Date
    Aug 2017
    Posts
    19
    Thank you Paul.

    That really cleared things up for me. I'm using this code to run an active noise control system, and the processing has to be done on each sample. The latency that buffering introduces significantly impairs the control's performance, since it is a phase-based control.
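    For the per-sample case described here, one modest saving that needs no block buffering is to replace the "shift every element" signal-vector refresh with a circular delay line. The sketch below is plain C++ written for illustration only: `LmsCircular`, `filter`, `adapt` and the step size `mu` are hypothetical names, it uses floats throughout, and it has not been timed on a Teensy, so any speed-up is an assumption to verify on the hardware.

```cpp
#include <cstddef>

// Hypothetical per-sample LMS identifier using a circular delay line.
template <std::size_t Taps>
struct LmsCircular {
    float w[Taps] = {};    // adaptive FIR coefficients
    float x[Taps] = {};    // circular delay line of past inputs
    std::size_t head = 0;  // index of the newest sample in x

    // Store one new input sample and return the filter output y[n].
    float filter(float sample) {
        // Step the write index back by one instead of shifting the array.
        head = (head == 0) ? Taps - 1 : head - 1;
        x[head] = sample;

        float acc = 0.0f;
        std::size_t idx = head;  // x[idx] is x[n - k] for the current k
        for (std::size_t k = 0; k < Taps; ++k) {
            acc += w[k] * x[idx];
            idx = (idx + 1 == Taps) ? 0 : idx + 1;
        }
        return acc;
    }

    // LMS coefficient update: w[k] += mu * error * x[n - k].
    void adapt(float error, float mu) {
        std::size_t idx = head;
        for (std::size_t k = 0; k < Taps; ++k) {
            w[k] += mu * error * x[idx];
            idx = (idx + 1 == Taps) ? 0 : idx + 1;
        }
    }
};
```

    In a loop like the one in the first post, each iteration would call `filter(r)`, read `d`, and then call `adapt(d - y, mu)`. Dividing by 400000000.00 in the original sketch corresponds to `mu = 2.5e-9f` when the raw 12-bit values are used unscaled.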
