My attempt at a multirate WAV file player

Status
Not open for further replies.

MarkT

Well-known member
Based on the interface of AudioPlaySdWav, but with support for both rate conversion and asynchronous buffering of
SDcard reads.

https://github.com/MarkTillotson/Audio/tree/multirate_sd_player

[ The goodness is in play_sd_univ_wav.{h,cpp}, plus a couple of lines added to Audio.h ]

Here's the performance I'm getting on a T4.0 @ 600MHz plus Audio adapter rev D:

Code:
Sample rate   bitdepth       stereo/mono          %age cpu time     %age cpu time 
                                                  in audio ints     reading SD
 8000             8               mono               1.38%             1.1%
11025            24               mono               1 -- 2%           4.5%
22050            16              stereo              ~3.5%            12.4%
32000            16              stereo              5 -- 9%          17.8%
44100            16               mono               0.49%            12.9%
44100            16              stereo              1.01%            22.6%
44100            24              stereo              1.41%            33%
48000            16              stereo              12.0%            26.5%
48000            24              stereo              12.3%            38.8%
96000            16              stereo              12.5%            52.7%
96000            24              stereo              -- fails to keep up --
The supported rates are 8k/11.025k/16k/22.05k/32k/44.1k/48k/96k, mono/stereo, 8/16/24 bit.
I've not tested all combinations(!)

Rate conversion is done using 16 bits throughout, samples and filter coefficients, and
filter sizes are modest due to the multi-step factorization of the conversion.

I've defined some circular buffer classes as helpers, one to buffer SDcard sector data,
one to buffer 16 bit samples between filter stages.
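
For readers unfamiliar with the idea, here is a minimal sketch of such a helper. It is not the class from the repository: the name RingBuffer, its methods and the power-of-two restriction are all assumptions made for illustration.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical fixed-capacity ring buffer.  N must be a power of two so
// that index wrap-around reduces to a bit mask.
template <typename T, size_t N>
class RingBuffer
{
  static_assert (N != 0 && (N & (N - 1)) == 0, "N must be a power of two") ;
public:
  size_t available () const { return head - tail ; }   // items ready to read
  size_t space () const { return N - available () ; }  // free slots

  bool push (const T & item)
  {
    if (space () == 0) return false ;      // full: caller retries later
    data[head++ & (N - 1)] = item ;
    return true ;
  }

  bool pop (T & item)
  {
    if (available () == 0) return false ;  // empty: underrun
    item = data[tail++ & (N - 1)] ;
    return true ;
  }

  // Peek at the oldest item; the buffer must not be empty.
  const T & front () const
  {
    assert (available () > 0) ;
    return data[tail & (N - 1)] ;
  }

private:
  T data[N] ;
  size_t head = 0 ;  // total items written (unsigned wrap is harmless)
  size_t tail = 0 ;  // total items read
} ;
```

The same template could hold bytes for sector data or int16_t for inter-stage samples. If one side runs in an interrupt, the indices would need extra care (volatile or atomic access) that this sketch leaves out.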

The user code has to call the readaheadTask() method regularly during playback to drive
the SD reads into the buffer. If the SDcard stalls and this buffer drains, the code inserts
504 zero bytes rather than 512 to fill the gap. 504 is a multiple of 24, and therefore of
2, 3, 4 and 6, so the padding does not disturb the alignment with sample boundaries, whatever
the bit depth, when the SDcard finally spits out data.
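
The arithmetic behind the choice of 504 can be checked with a few lines. This is only an illustrative sketch; the function names are made up here and do not come from the player code.

```cpp
#include <assert.h>
#include <stdint.h>

// Bytes in one sample frame (all channels) for a given format.
uint32_t frameBytes (uint32_t bits, uint32_t channels)
{
  assert (bits % 8 == 0 && channels > 0) ;
  return (bits / 8) * channels ;
}

// True if padBytes is a whole number of frames for every supported
// bit depth (8/16/24) in both mono and stereo.
bool safeForAllFormats (uint32_t padBytes)
{
  for (uint32_t bits = 8 ; bits <= 24 ; bits += 8)
    for (uint32_t channels = 1 ; channels <= 2 ; channels++)
      if (padBytes % frameBytes (bits, channels) != 0)
        return false ;
  return true ;
}
```

safeForAllFormats (504) is true, while safeForAllFormats (512) is false, because 512 is not divisible by the 3-byte and 6-byte frames of 24 bit audio.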

44100 sample rate is the easy case, no rate conversion.
11025 and 22050 use a single interpolation stage, and all the other rates use three
stages of rate conversion, the first being typically a modest up-sampling to create
headroom for easy filtering.

The resamplings are:
8000 -> 12000 -> 12600 -> 44100
11025 -> 44100
16000 -> 24000 -> 25200 -> 44100
22050 -> 44100
32000 -> 48000 -> 50400 -> 44100
48000 -> 72000 -> 100800 -> 44100
96000 -> 72000 -> 100800 -> 44100
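
The up/down factors implied by each step above can be recovered by dividing the two rates by their greatest common divisor. The sketch below does only that arithmetic; the Ratio struct and stageRatio() are hypothetical names, and the actual filter design in the repository is not shown here.

```cpp
#include <assert.h>
#include <stdint.h>

// Interpolation (up) and decimation (down) factor of one stage.
struct Ratio { uint32_t up ; uint32_t down ; } ;

// Euclid's algorithm.
uint32_t gcd32 (uint32_t a, uint32_t b)
{
  while (b != 0)
  {
    uint32_t t = a % b ;
    a = b ;
    b = t ;
  }
  return a ;
}

// Smallest integer up/down pair that converts fromHz to toHz.
Ratio stageRatio (uint32_t fromHz, uint32_t toHz)
{
  assert (fromHz != 0 && toHz != 0) ;
  uint32_t g = gcd32 (fromHz, toHz) ;
  Ratio r = { toHz / g, fromHz / g } ;
  return r ;
}
```

For the 8000 chain this gives 3/2, then 21/20, then 7/2, whereas a direct 8000 -> 44100 conversion would need 441/80. The 48000 chain gives 3/2, 7/5 and 7/16 instead of a single 147/160. Smaller factors per stage are consistent with the modest filter sizes mentioned earlier.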

This is the test code I've been using - obviously you'd need to find/make test files in various formats
to play with this (my test files are long, hundreds of MB in some cases).

I've seen some sporadic SDcard glitching. Sometimes a stream of many slow reads happens
(several dozen, all taking 50ms or so, which is far beyond what can be buffered without aux memory).
Sometimes there are just a couple of slow reads and the buffering handles it just fine.
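
To put rough numbers on this, the sketch below estimates how far the buffer drains during a run of slow reads. The function names are invented for illustration, and the figure of 36 reads is only a stand-in for "several dozen", not a measurement.

```cpp
#include <assert.h>
#include <stdint.h>

// Bytes per second consumed by playback of a PCM stream.
uint32_t byteRate (uint32_t sampleRateHz, uint32_t bits, uint32_t channels)
{
  return sampleRateHz * (bits / 8) * channels ;
}

// Net buffer drain in bytes over a run of slow sector reads, where each
// read takes readMs milliseconds and delivers one 512-byte sector.
int64_t drainDuringStall (uint32_t bytesPerSec, uint32_t slowReads, uint32_t readMs)
{
  assert (bytesPerSec > 0) ;
  int64_t consumed  = (int64_t) bytesPerSec * slowReads * readMs / 1000 ;
  int64_t delivered = (int64_t) slowReads * 512 ;
  return consumed - delivered ;
}
```

At 44.1k 16 bit stereo (176400 bytes/s), two 50ms reads drain about 16.6 KB, whereas 36 of them in a row would drain about 299 KB.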

The AudioStartUsingSPI and AudioStopUsingSPI calls (are these hacks?) are avoided as the SDcard is handled
asynchronously/in the background.

Comments are welcome of course, and I'll be continuing to try to improve this and do more testing
including checking the filter responses aren't anything unexpected (compared to the prototyped
versions in Python/Scipy)

edit: Oh, and my test sketch:
Code:
#include <Audio.h>
#include <SPI.h>

AudioPlaySdWavUniv       play;
AudioOutputI2S           out;
AudioConnection          con0(play, 0, out, 0);
AudioConnection          con1(play, 1, out, 1);
AudioControlSGTL5000     sgtl5000 ;

// Use these with the Teensy Audio Shield
#define SDCARD_CS_PIN    10
#define SDCARD_MOSI_PIN  7
#define SDCARD_SCK_PIN   14

// Use these with the Teensy 3.5 & 3.6 SD card
//#define SDCARD_CS_PIN    BUILTIN_SDCARD
//#define SDCARD_MOSI_PIN  11  // not actually used
//#define SDCARD_SCK_PIN   13  // not actually used

// Use these for the SD+Wiz820 or other adaptors
//#define SDCARD_CS_PIN    4
//#define SDCARD_MOSI_PIN  11
//#define SDCARD_SCK_PIN   13

void setup() 
{
  Serial.begin(115200);

  // Audio connections require memory to work.  For more
  // detailed information, see the MemoryAndCpuUsage example
  AudioMemory(18);

  // Comment these out if not using the audio adaptor board.
  // This may wait forever if the SDA & SCL pins lack
  // pullup resistors
  sgtl5000.enable ();
  sgtl5000.volume (0.4);
  sgtl5000.lineOutLevel (15);

  SPI.setMOSI (SDCARD_MOSI_PIN);
  SPI.setSCK (SDCARD_SCK_PIN);
  if (!(SD.begin (SDCARD_CS_PIN))) 
  {
    // stop here, but print a message repetitively
    while (true) 
    {
      Serial.println("Unable to access the SD card");
      delay(500);
    }
  }
}

void playFile(const char *filename)
{
  Serial.printf ("Playing file: %s\n", filename);

  play.play(filename);

  int len = play.lengthMillis();
  Serial.printf ("length %i ms\n", len);
  AudioProcessorUsageMaxReset();
  AudioMemoryUsageMaxReset();

  int min_lwm = 10000 ;
  uint32_t last_task = millis() ;
  uint32_t start_micros = micros() ;
  uint32_t readahead_micros = 0 ;
  bool done_a_pause = false ;
  while (play.isPlaying())
  {
    readahead_micros -= micros() ;
    play.readaheadTask();
    readahead_micros += micros() ;
    delay (5) ;
    uint32_t now_task = millis() ;
    if (now_task - last_task > 15)
      Serial.printf ("task delayed %i ms\n", now_task - last_task) ;
    last_task = now_task ;
   
    static unsigned long last_time = millis();
    if (millis() - last_time >= 1000) 
    {
      last_time += 1000;
      float perc_readahead = readahead_micros * 100.0 / (micros() - start_micros) ;
      Serial.printf ("SD time = %4.2f%%, Audiolib = %4.2f%% (max %4.2f%%), Blks = %i (max %i)", 
         perc_readahead, 
         AudioProcessorUsage(), AudioProcessorUsageMax(),
         AudioMemoryUsage(), AudioMemoryUsageMax()) ;
         
      AudioNoInterrupts() ;
      int lwm = play.getLWM();
      play.resetWM();
      AudioInterrupts() ;
      min_lwm = min (min_lwm, lwm) ;
      float pos = play.positionMillis();
      Serial.printf (" low-water %i(%i) at %4.2fs (%4.2f%%)\n", lwm, min_lwm, pos/1e3, 100.0*pos/len);

      // stop after 30s
      if (pos > 30e3)
      {
        play.stop();
        break ;
      }
      if (pos > 22e3 and !done_a_pause)
      {
        done_a_pause = true ;
        Serial.print ("pause...") ;
        play.togglePlayPause();
        for (int j = 0 ; j < 500 ; j++)
        {
          readahead_micros -= micros() ;
          play.readaheadTask();
          readahead_micros += micros() ;
          delay (5) ;
        }
        Serial.println ("unpause") ;
        play.togglePlayPause();
      }
    }
  }
  Serial.printf ("stopped playing %s\n", filename) ;
  Serial.println () ;
}


void loop() 
{
  // add your own testfiles to an SDcard on the audio adapter...
  playFile("SDTEST1.WAV"); delay(500) ; // filenames are always uppercase 8.3 format
  //playFile("AUDIO.WAV");delay(500);  // filenames are always uppercase 8.3 format
  //playFile ("LOWRES.WAV") ; delay (500) ;
  //playFile("ECH8X8.WAV"); delay (500) ;
  //playFile("ECH11X24.WAV"); delay (500) ;
  //playFile("ECH32X16.WAV"); delay (500) ;
  //playFile ("ECH44X24.WAV") ; delay (500) ;
  //playFile("ECH48X16.WAV"); delay (500) ;
  //playFile("ECH48X24.WAV"); delay (500) ;
  //playFile("ECH96X16.WAV"); delay (500) ;
  //playFile("ECH96X24.WAV"); delay (500) ;  // this is too much for T4 @ 600MHz: 24 bit stereo 96k
  //playFile ("ECHOES2.WAV") ;  delay (500) ;// 44.1k mono 16 bit
 
}
 
If you want to play 24bit/96kHz I suggest you convert the file into FLAC and play that from the card, since it needs to read less data.

Another solution could be to increase the size of the SD card buffer in the SD code.
 
But if the SD buffer is larger then surely some reads would take longer, risking exceeding the 2.9ms available
(one 128-sample audio block at 44.1kHz), thus necessitating asynchronous buffering anyway? Currently single-sector
reads on my SDcard take somewhat more than 1ms most of the time, but when the SDcard decides to do something
internally it can stretch to much longer (10 to 50ms).
I've arranged that this doesn't affect the rest of the audio system: if my buffer drains completely the play object
has to stutter, but any other audio object continues normally without delay.
 
I've been tracking down an issue with this code sometimes misbehaving, and I believe I've fixed it now.
I think the arm_dot_prod_q15() library call expects 4-byte alignment, and although misaligned data
often works, it sometimes fails (perhaps a page boundary issue or similar?). It's sensitive to the particular
files played, the order they are played in, and various other things.

So I've added some complexity to ensure 4-byte alignment in the calls.
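
The details of that change are in the repository rather than in this post. As a rough illustration only, one way to check and arrange 4-byte alignment for 16 bit sample data might look like the following; the helper names and the samples array are assumptions, not the player's code.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// True if p sits on a 4-byte boundary.
bool isAligned4 (const void * p)
{
  return (reinterpret_cast<uintptr_t> (p) & 3u) == 0 ;
}

// First index at or after 'start' whose element address is 4-byte
// aligned, given that 'base' itself is 4-byte aligned.  With 2-byte
// samples this just rounds the index up to an even number.
size_t alignedIndex (const int16_t * base, size_t start)
{
  assert (isAligned4 (base)) ;
  return (start + 1u) & ~static_cast<size_t> (1) ;
}

// Example buffer forced onto a 4-byte boundary.
alignas (4) int16_t samples[8] ;
```

Here samples and samples + 2 are aligned while samples + 1 is not, so a dot product starting at an odd index would first need its data shifted or copied to an even one.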

Anyway in the course of this I discovered a subtle bug with arm_dot_prod_q15():

Code:
#include <arm_math.h>

__attribute__((aligned(4))) int16_t vec1[] = {  -4, -2, +1, -1 } ;
__attribute__((aligned(4))) int16_t vec2[] = {  +1, -2, -1, -1 } ;

// Reference implementation: one multiply-accumulate per element, summed in 64 bits
void correct_dot_prod (q15_t * v1, q15_t * v2, uint16_t count, q63_t * res)
{
  q63_t sum = 0 ;
  while (count-- > 0u)
    sum += *v1++ * *v2++ ;
  *res = sum ;
}

void compare_dot_prods (uint32_t count)
{
  q63_t sum, sum2;
  correct_dot_prod (vec1, vec2, count, &sum) ;
  arm_dot_prod_q15 (vec1, vec2, count, &sum2) ;
  Serial.printf ("vec%i case correct=%3lli arm_dot_prod->%3lli, difference %lli\n", count, sum, sum2, abs(sum2-sum)) ;
}

void setup() 
{ 
  delay (500) ;
  Serial.begin (115200);
  for (int i = 1 ; i <= 3 ; i++)
    compare_dot_prods (i) ;
  Serial.println () ;
  delay (10000) ;
}

void loop () {}

gives

Code:
vec1 case correct= -4 arm_dot_prod-> -4, difference 0
vec2 case correct=  0 arm_dot_prod->  1, difference 1
vec3 case correct= -1 arm_dot_prod->  0, difference 1

The code for arm_dot_prod_q15 does signed 16 bit loads at the very end (after the unrolled loop)
and then uses the SMLALD instruction, which multiplies the top and bottom 16 bit halves of the registers
simultaneously and sums the products.

Alas if both 16 bit values are negative, both registers look like 0xffffxxxx and the top-half product is 1, not 0,
causing a single LSB error to accumulate. This can happen at most 3 times, so the error is limited
to 3 LSBs, and I wonder if this was deemed acceptable. However, note that the current code can yield zero when
the true result is definitely not zero and vice versa, which isn't very intuitive...
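
The effect can be reproduced off-target with a small software model. This is a sketch of the behaviour described above, not the CMSIS source: smlaldModel(), tailStep() and replayForumExample() are names invented here, and it assumes that for counts below four every element goes through the tail path.

```cpp
#include <assert.h>
#include <stdint.h>

// Software model of SMLALD: x and y each hold two packed signed 16 bit
// halves; matching halves are multiplied and both products added to acc.
int64_t smlaldModel (uint32_t x, uint32_t y, int64_t acc)
{
  int32_t lo = (int16_t) (x & 0xffffu) * (int16_t) (y & 0xffffu) ;
  int32_t hi = (int16_t) (x >> 16)     * (int16_t) (y >> 16) ;
  return acc + lo + hi ;
}

// One tail step: each operand arrives via a signed 16 bit load, so a
// negative value fills the top half of its register with 0xffff.
int64_t tailStep (int16_t a, int16_t b, int64_t acc)
{
  uint32_t ra = (uint32_t) (int32_t) a ;  // sign-extended register image
  uint32_t rb = (uint32_t) (int32_t) b ;
  return smlaldModel (ra, rb, acc) ;
}

// Replays the first three elements of vec1 and vec2 from the test above.
void replayForumExample ()
{
  int64_t acc = tailStep (-4, +1, 0) ;
  assert (acc == -4) ;   // exact: -4
  acc = tailStep (-2, -2, acc) ;
  assert (acc == 1) ;    // exact: 0, both operands negative adds 1
  acc = tailStep (+1, -1, acc) ;
  assert (acc == 0) ;    // exact: -1, the earlier error is carried along
}
```

The model agrees with the printed results: only the pair where both values are negative contributes the extra 1.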
 