Best way to process many audio samples (blocks)

Status
Not open for further replies.

DIYLAB

Well-known member
Hello all,

I have some modules for audio visualization, one of them is a goniometer.

gonio_pixel.png

It gets its data from audio samples for the left and right channels. There are 16 blocks of 128 samples per channel, so 2x 2048 samples.

This takes time, a lot of time :/

Code:
short samplesLeft[2048];
short samplesRight[2048];

In the loop, the blocks are read one after the other:

Code:
/// <summary>
/// Get samples for left and right channel.
/// </summary>
void getSamples(byte blocks) {
    if (queue1.available() >= blocks && queue2.available() >= blocks) {
        for (byte i = 0; i < blocks; i++) {
            memcpy(&samplesLeft[128 * i], queue1.readBuffer(), 256);
            memcpy(&samplesRight[128 * i], queue2.readBuffer(), 256);
            queue1.freeBuffer();
            queue2.freeBuffer();
        }
    }
}

I am looking for the best way to get the samples without the program waiting so long.
Would a circular buffer be the solution?
So always push the blocks one after the other into the circular buffer?
How do you do that?

I am very grateful for any suggestions!

Kind regards
DIYLAB
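On the circular buffer question, here is a minimal sketch of one way it could look, in plain C++ so it can be tried off the board. The names `BlockRing`, `push` and `at` are made up for illustration and are not part of the Teensy Audio library; the block size of 128 samples and the 16-block window come from the code above. The idea is that each new block overwrites the oldest one, so nothing already stored has to be moved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kBlockSamples = 128;                 // samples per audio block
constexpr std::size_t kBlocks = 16;                        // blocks kept in the window
constexpr std::size_t kWindow = kBlockSamples * kBlocks;   // 2048 samples

// Hypothetical helper: a ring of 16 block slots for one channel.
struct BlockRing {
    int16_t data[kWindow] = {};
    std::size_t head = 0;   // slot that holds the oldest block and is overwritten next

    // Overwrite the oldest block with 128 new samples; existing data is not moved.
    void push(const int16_t* block) {
        assert(block != nullptr);
        std::memcpy(&data[head * kBlockSamples], block,
                    kBlockSamples * sizeof(int16_t));
        head = (head + 1) % kBlocks;
    }

    // Read sample i in time order: 0 is the oldest, kWindow - 1 the newest.
    int16_t at(std::size_t i) const {
        assert(i < kWindow);
        return data[(head * kBlockSamples + i) % kWindow];
    }
};
```

With one `BlockRing` per channel, `push()` would presumably be called with the pointer from `queue1.readBuffer()` before `queue1.freeBuffer()`. The trade-off is that readers must go through `at()` (or handle the wrap themselves) instead of indexing a flat array.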
 
You could shift the array by 128 samples, then read in 128 new samples. This yields a partially updated result every 128 samples.
 
That's a good idea, thank you!
So do not use a circular buffer, but push to the right and fill left?

I tried it that way but either memmove is slow or I'm doing something wrong:

Code:
   if (queue1.available() >= 1 && queue2.available() >= 1) {
        memmove(&samplesLeft[128], &samplesLeft[0], 3840);
        memmove(&samplesRight[128], &samplesRight[0], 3840);

        memcpy(&samplesLeft[0], queue1.readBuffer(), 256);
        memcpy(&samplesRight[0], queue2.readBuffer(), 256);
        queue1.freeBuffer();
        queue2.freeBuffer();
    }

How would it have to be done correctly and especially quickly?
 
Or like this?
I don't know :confused:

Code:
   if (queue1.available() && queue2.available()) {
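        // Stop at i = 128: counting down to 0 would read samplesLeft[i - 128]
        // at negative indices, outside the array.
        for (int i = 2047; i >= 128; i--) {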
            samplesLeft[i] = samplesLeft[i - 128];
            samplesRight[i] = samplesRight[i - 128];
        }

        memcpy(&samplesLeft[0], queue1.readBuffer(), 256);
        memcpy(&samplesRight[0], queue2.readBuffer(), 256);

        queue1.freeBuffer();
        queue2.freeBuffer();
    }
 
> I tried it that way but either memmove is slow or I'm doing something wrong:

How did you evaluate memmove() speed? And which Teensy are you using?

Correctly measuring the timing can be tricky. First you need the source buffer filled with data the compiler cannot predict. Interrupts should be disabled, so you don't accidentally measure an interrupt's execution if it happens during the test. On M7, you also need a DSB instruction to cause the processor to complete any in-progress memory transfer before you complete the measurement. Whether the source data is in the M7's cache can also affect the speed. The best way to measure short elapsed time is with the ARM_DWT_CYCCNT cycle counter, which requires more effort than simpler ways like elapsedMicros.

For example:

Code:
#include <Entropy.h>

short data1[2048];
short data2[2048];

void setup() {
  ARM_DEMCR |= ARM_DEMCR_TRCENA;
  ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA; // turn on cycle counter
  Entropy.Initialize();
  for (int i=0; i < 2048; i++) {
    data1[i] = Entropy.random(65536);    // fill data1 with unpredictable data
  }
  Serial.begin(9600);
  while (!Serial) ;                      // wait for serial monitor
  asm("DSB" ::: "memory");               // complete everything before begin test

  noInterrupts();
  uint32_t begin_cycle = ARM_DWT_CYCCNT;
  memmove(data2, data1, sizeof(data1));
  asm("DSB" ::: "memory");               // wait for all memory operations complete
  uint32_t end_cycle = ARM_DWT_CYCCNT;
  interrupts();
  
  Serial.print("memmove took ");         // report results
  Serial.print(end_cycle - begin_cycle);
  Serial.println(" clock cycles");
}

void loop() {
}

When I run this on Teensy 4.0, the result is 1135 cycles. At 600 MHz, that's about 1.9 microseconds, or roughly 8% of the time for 1 audio sample at 44100 Hz.
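To make the arithmetic behind those figures explicit, here is a small sketch in plain C++. The helper names `cyclesToMicros` and `fractionOfSamplePeriod` are made up for illustration; the inputs (1135 cycles, 600 MHz, 44100 Hz) are the ones quoted above. It works out to 1135 / 600 = about 1.89 microseconds, against a sample period of 1000000 / 44100 = about 22.7 microseconds.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a cycle count to microseconds
// for a CPU clock given in Hz.
inline double cyclesToMicros(uint32_t cycles, double cpuHz) {
    assert(cpuHz > 0.0);
    return static_cast<double>(cycles) * 1e6 / cpuHz;
}

// Hypothetical helper: which fraction of one sample period
// a duration in microseconds takes up at the given sample rate.
inline double fractionOfSamplePeriod(double micros, double sampleRateHz) {
    assert(sampleRateHz > 0.0);
    return micros * sampleRateHz / 1e6;
}
```

Calling `fractionOfSamplePeriod(cyclesToMicros(1135, 600e6), 44100.0)` gives a value a little above 0.08, so the copy of all 2048 samples costs well under one sample period.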

However, that test allows the caches to help. For arrays in tightly coupled memory the cache isn't used.

But if you want to test a worst case with non-cached memory, where nothing is in the cache before the test and everything is written back out to real memory afterward (rather than just using a DSB instruction), you'd measure like this:

Code:
#include <Entropy.h>

DMAMEM short data1[2048];
DMAMEM short data2[2048];

void setup() {
  ARM_DEMCR |= ARM_DEMCR_TRCENA;
  ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA; // turn on cycle counter
  Entropy.Initialize();

  Serial.begin(9600);
  while (!Serial) ;                      // wait for serial monitor
  Serial.println("generating random test data...");
  for (int i = 0; i < 2048; i++) {
    data1[i] = Entropy.random(65536);    // fill data1 with unpredictable data
  }
  arm_dcache_flush_delete(data1, sizeof(data1));  // do not leave anything in cache

  noInterrupts();
  uint32_t begin_cycle = ARM_DWT_CYCCNT;
  memmove(data2, data1, sizeof(data1));
  arm_dcache_flush(data2, sizeof(data2)); // force all cached data write to memory
  uint32_t end_cycle = ARM_DWT_CYCCNT;
  interrupts();

  Serial.print("memmove took ");         // report results
  Serial.print(end_cycle - begin_cycle);
  Serial.println(" clock cycles");
}

void loop() {
}

This takes 5161 cycles. Whether that worst case is meaningful is a good question, as a normal application would allow the caches to help in most cases.

The point is there isn't just one way to measure, and whichever way you choose, getting an accurate measurement isn't easy. Hopefully this code helps.
 
> push to the right and fill left?

Shift to the left and add new samples on the right (end of array with higher index).
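As a rough sketch of that direction, in plain C++ so it can be checked off the board: the function name `slideAndAppend` is made up for illustration, and the sizes (128-sample blocks, 2048-sample window) are taken from the code earlier in the thread. With this layout, index 0 is always the oldest sample and index 2047 the newest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kBlockSamples = 128;   // samples per audio block
constexpr std::size_t kWindow = 2048;        // samples kept per channel

// Hypothetical helper: drop the oldest block and append the newest one.
void slideAndAppend(int16_t* window, const int16_t* block) {
    assert(window != nullptr && block != nullptr);
    // Move samples [128, 2048) down to [0, 1920). The ranges overlap,
    // so memmove is required here rather than memcpy.
    std::memmove(window, window + kBlockSamples,
                 (kWindow - kBlockSamples) * sizeof(int16_t));
    // Copy the new block into the last 128 slots (highest indices).
    std::memcpy(window + (kWindow - kBlockSamples), block,
                kBlockSamples * sizeof(int16_t));
}
```

On the Teensy this would presumably be called once per available block, e.g. `slideAndAppend(samplesLeft, queue1.readBuffer())` followed by `queue1.freeBuffer()`, and likewise for the right channel.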
 
Hi Paul,

I am using Teensy 4.0 with 600MHz clock and Teensyduino 1.54.
Unfortunately, I guess I made a gross error in my thinking, and I take back everything I said about memmove ;o)

Something slows down tremendously when I use memmove together with:

Optimized ILI9341 screen driver library for Teensy 4/4.1, with vsync and differential updates and tgx - a tiny/teensy graphics library.

Previously I used ILI9341_t3n from KurtE and wanted to give the above driver and GFX library a chance, as I find a lot there that should work well in my project. Unfortunately it seems to get stuck in places, including with memmove.
What ran very stably with ILI9341_t3n now crashes unpredictably after varying amounts of time, which is too bad. Also, your new CrashReport does not report anything; according to it, everything is OK.

Anyway, either I'm just not smart enough to understand the new driver (there's not much to understand) or it's buggy.

Thank you for your time and I will now take a closer look at moving the data in the buffer.
 
Might be bugs, might be misunderstanding, might be something simple. Who knows?

If you're stuck and want help, please put your effort into trimming the code down to a reasonably small program which shows the problem. If you look over the many prior threads on this forum, we're usually pretty good at helping and figuring out what's wrong when the problem is reproducible. A complete program anyone can copy into Arduino and upload to a Teensy with the right hardware is the critical first step.

Kurt is on this forum almost every day, so if something isn't working with ILI9341_t3n, I'm pretty sure he'll see it and chime in. I'll probably take a look too. But there's usually little point to blind guessing. We really need to have a complete program to recreate the issue without guessing any of the required code.
 