Thread: Best way to process many audio samples (blocks)

  1. #1
    Member DIYLAB's Avatar
    Join Date
    Jun 2020
    Location
    Germany
    Posts
    62

    Best way to process many audio samples (blocks)

    Hello all,

    I have some modules for audio visualization, one of them is a goniometer.

    [Image: gonio_pixel.png]

    It gets its data from audio samples for the left and right channels. There are 16 blocks of 128 samples per channel, so 2x 2048 samples.

    This takes time, a lot of time :/

    Code:
    short samplesLeft[2048];
    short samplesRight[2048];
    In the loop, the blocks are read one after the other:

    Code:
    /// <summary>
    /// Get samples for left and right channel.
    /// </summary>
    void getSamples(byte blocks) {
        if (queue1.available() >= blocks && queue2.available() >= blocks) {
            for (byte i = 0; i < blocks; i++) {
                memcpy(&samplesLeft[128 * i], queue1.readBuffer(), 256);
                memcpy(&samplesRight[128 * i], queue2.readBuffer(), 256);
                queue1.freeBuffer();
                queue2.freeBuffer();
            }
        }
    }
    I am looking for the best way to get the samples without the program waiting so long.
    Would a circular buffer be the solution?
    So always push the blocks one after the other into the circular buffer?
    How do you do that?

    I am very grateful for any suggestions!

    Kind regards
    DIYLAB
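
    One possible answer to the circular-buffer question, as a minimal sketch only: keep 16 block slots per channel and always overwrite the oldest slot, so each new block costs a single 256-byte copy instead of re-reading all 2048 samples. `SampleRing`, `pushBlock`, and `at` are hypothetical names, not part of the Teensy Audio library. The code is plain C++ so it can be tried on a PC first.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a Teensy Audio library class): keeps the most
// recent 16 blocks (2048 samples) of one channel in a ring of block slots.
struct SampleRing {
    static constexpr std::size_t kBlockSamples = 128;  // samples per audio block
    static constexpr std::size_t kBlocks = 16;         // 16 * 128 = 2048 samples

    int16_t data[kBlocks * kBlockSamples] = {};
    std::size_t head = 0;  // slot that holds the oldest block (overwritten next)

    static constexpr std::size_t size() { return kBlocks * kBlockSamples; }

    // Overwrite the oldest block with a new one: one 256-byte copy per call.
    void pushBlock(const int16_t *block) {
        std::memcpy(&data[head * kBlockSamples], block,
                    kBlockSamples * sizeof(int16_t));
        head = (head + 1) % kBlocks;
    }

    // Read sample i in chronological order: 0 is the oldest, size() - 1 the newest.
    int16_t at(std::size_t i) const {
        return data[(head * kBlockSamples + i) % size()];
    }
};
```

    On the Teensy, assuming one ring per channel, each available block would be passed in with something like `ring.pushBlock(queue1.readBuffer())` followed by `queue1.freeBuffer()`. The drawing code then reads through `at()` rather than indexing the array directly, which is the price paid for never moving old samples.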

  2. #2
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    1,057
    You could shift the array by 128 samples, then read in 128 new samples. This yields a partially updated result every 128 samples.

  3. #3
    Member DIYLAB's Avatar
    Join Date
    Jun 2020
    Location
    Germany
    Posts
    62
    That's a good idea, thank you!
    So do not use a circular buffer, but push to the right and fill left?

    I tried it that way but either memmove is slow or I'm doing something wrong:

    Code:
       if (queue1.available() >= 1 && queue2.available() >= 1) {
            memmove(&samplesLeft[128], &samplesLeft[0], 3840);
            memmove(&samplesRight[128], &samplesRight[0], 3840);
    
            memcpy(&samplesLeft[0], queue1.readBuffer(), 256);
            memcpy(&samplesRight[0], queue2.readBuffer(), 256);
            queue1.freeBuffer();
            queue2.freeBuffer();
        }
    How would it have to be done correctly and especially quickly?

  4. #4
    Member DIYLAB's Avatar
    Join Date
    Jun 2020
    Location
    Germany
    Posts
    62
    Or like this?
    I don't know

    Code:
       if (queue1.available() && queue2.available()) {
            for (int i = 2047; i >= 128; i--) {  // stop at 128 so i - 128 never goes negative
                samplesLeft[i] = samplesLeft[i - 128];
                samplesRight[i] = samplesRight[i - 128];
            }
    
            memcpy(&samplesLeft[0], queue1.readBuffer(), 256);
            memcpy(&samplesRight[0], queue2.readBuffer(), 256);
    
            queue1.freeBuffer();
            queue2.freeBuffer();
        }
    Last edited by DIYLAB; 07-21-2021 at 12:43 PM.

  5. #5
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    24,788
    Originally posted by DIYLAB:
    > I tried it that way but either memmove is slow or I'm doing something wrong:

    How did you evaluate memmove() speed? And which Teensy are you using?

    Correctly measuring the timing can be tricky. First you need the source buffer filled with data the compiler cannot predict. Interrupts should be disabled, so you don't accidentally measure an interrupt's execution if it happens during the test. On M7, you also need a DSB instruction to cause the processor to complete any in-progress memory transfer before you complete the measurement. Whether the source data is in the M7's cache can also affect the speed. The best way to measure a short elapsed time is with the ARM_DWT_CYCCNT cycle counter, which requires more effort than simpler ways like elapsedMicros.

    For example:

    Code:
    #include <Entropy.h>
    
    short data1[2048];
    short data2[2048];
    
    void setup() {
      ARM_DEMCR |= ARM_DEMCR_TRCENA;
      ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA; // turn on cycle counter
      Entropy.Initialize();
      for (int i=0; i < 2048; i++) {
        data1[i] = Entropy.random(65536);    // fill data1 with unpredictable data
      }
      Serial.begin(9600);
      while (!Serial) ;                      // wait for serial monitor
      asm("DSB" ::: "memory");               // complete everything before begin test
    
      noInterrupts();
      uint32_t begin_cycle = ARM_DWT_CYCCNT;
      memmove(data2, data1, sizeof(data1));
      asm("DSB" ::: "memory");               // wait for all memory operations complete
      uint32_t end_cycle = ARM_DWT_CYCCNT;
      interrupts();
      
      Serial.print("memmove took ");         // report results
      Serial.print(end_cycle - begin_cycle);
      Serial.println(" clock cycles");
    }
    
    void loop() {
    }
    When I run this on Teensy 4.0, the result is 1135 cycles. At 600 MHz, that's about 1.9 microseconds, or roughly 8% of the 22.7 microsecond period of 1 audio sample at 44100 Hz.
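
    The conversion from a cycle count to elapsed time can be checked with a few lines of arithmetic. This is only a sketch; the function names are made up for illustration, and the 600 MHz clock and 44100 Hz sample rate are the values quoted in this thread.

```cpp
#include <cstdint>

// Elapsed time in microseconds for a cycle count at a given CPU clock.
constexpr double cyclesToMicros(uint32_t cycles, double cpuHz) {
    return cycles * 1e6 / cpuHz;
}

// Duration of one audio sample in microseconds.
constexpr double samplePeriodMicros(double sampleRateHz) {
    return 1e6 / sampleRateHz;
}

// Fraction of one sample period consumed by the measured operation.
constexpr double fractionOfSamplePeriod(uint32_t cycles, double cpuHz,
                                        double sampleRateHz) {
    return cyclesToMicros(cycles, cpuHz) / samplePeriodMicros(sampleRateHz);
}
```

    With these numbers, 1135 cycles comes to a little under 1.9 microseconds, which is somewhat less than a tenth of one sample period of about 22.7 microseconds.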

    However, that test allows the caches to help. For arrays in tightly coupled memory the cache isn't used.

    But if you wanted to test a worst case of non-cached memory and require nothing in the cache before the test and everything written back out to real memory (rather than just a DSB instruction), you'd measure like this:

    Code:
    #include <Entropy.h>
    
    DMAMEM short data1[2048];
    DMAMEM short data2[2048];
    
    void setup() {
      ARM_DEMCR |= ARM_DEMCR_TRCENA;
      ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA; // turn on cycle counter
      Entropy.Initialize();
    
      Serial.begin(9600);
      while (!Serial) ;                      // wait for serial monitor
      Serial.println("generating random test data...");
      for (int i = 0; i < 2048; i++) {
        data1[i] = Entropy.random(65536);    // fill data1 with unpredictable data
      }
      arm_dcache_flush_delete(data1, sizeof(data1));  // do not leave anything in cache
    
      noInterrupts();
      uint32_t begin_cycle = ARM_DWT_CYCCNT;
      memmove(data2, data1, sizeof(data1));
      arm_dcache_flush(data2, sizeof(data2)); // force all cached data write to memory
      uint32_t end_cycle = ARM_DWT_CYCCNT;
      interrupts();
    
      Serial.print("memmove took ");         // report results
      Serial.print(end_cycle - begin_cycle);
      Serial.println(" clock cycles");
    }
    
    void loop() {
    }
    This takes 5161 cycles. Whether that worst case is meaningful is a good question, as a normal application would allow the caches to help in most cases.

    The point is there isn't just 1 way to measure, and whichever way you choose, getting an accurate measurement isn't easy. Hopefully this code helps.
    Last edited by PaulStoffregen; 07-21-2021 at 02:04 PM.

  6. #6
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    1,057
    > push to the right and fill left?

    Shift to the left and add new samples on the right (end of array with higher index).
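
    As a rough sketch of that shift-left, append-right scheme (the helper name `shiftInBlock` and the constants are made up for illustration, and the sizes assume the 2048-sample window and 128-sample blocks used earlier in the thread):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kBlockSamples = 128;    // samples per audio block
constexpr std::size_t kWindowSamples = 2048;  // samples kept per channel

// Hypothetical helper: drop the oldest block at the start of the window,
// slide the rest toward index 0, and append the new block at the end.
void shiftInBlock(int16_t *window, const int16_t *block) {
    // memmove is required here because source and destination overlap.
    std::memmove(window, window + kBlockSamples,
                 (kWindowSamples - kBlockSamples) * sizeof(int16_t));
    // The newest 128 samples go into the highest indices.
    std::memcpy(window + (kWindowSamples - kBlockSamples), block,
                kBlockSamples * sizeof(int16_t));
}
```

    The window then stays in chronological order, oldest sample at index 0, so the drawing code can keep indexing the array directly. The cost is moving 3840 bytes per channel for every new block.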

  7. #7
    Member DIYLAB's Avatar
    Join Date
    Jun 2020
    Location
    Germany
    Posts
    62
    Hi Paul,

    I am using Teensy 4.0 with 600MHz clock and Teensyduino 1.54.
    Unfortunately, I guess I made a gross error in my thinking, and I take back everything I said about memmove ;o)

    Something slows down tremendously when I use memmove together with:

    Optimized ILI9341 screen driver library for Teensy 4/4.1, with vsync and differential updates and tgx - a tiny/teensy graphics library.

    Previously I used ILI9341_t3n from KurtE and wanted to give the above driver and GFX library a chance, as I find a lot there that should work well in my project. Unfortunately it seems to have problems in some places, including with memmove.
    What still ran very stably with ILI9341_t3n now crashes unpredictably after varying lengths of time, too bad. Also, your new CrashReport does not report anything; according to it, everything is ok.

    Anyway, either I'm just not smart enough to understand the new driver (there's not much to understand) or it's buggy.

    Thank you for your time and I will now take a closer look at moving the data in the buffer.

  8. #8
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    24,788
    Might be bugs, might be misunderstanding, might be something simple. Who knows?

    If you're stuck and want help, please put your effort into trimming the code down to a reasonably small program which shows the problem. If you look over the many prior threads on this forum, we're usually pretty good at helping and figuring out what's wrong when the problem is reproducible. A complete program anyone can copy into Arduino and upload to a Teensy with the right hardware is the critical first step.

    Kurt is on this forum almost every day, so if something isn't working with ILI9341_t3n, I'm pretty sure he'll see it and chime in. I'll probably take a look too. But there's usually little point to blind guessing. We really need to have a complete program to recreate the issue without guessing any of the required code.
