How Does Output Object Timing Work?

grinch

Well-known member
Hi, I am working on a project for which I'd like to write a custom output object for the Teensy audio library (using Teensy 3.6). To this end I have been studying the existing output objects in the Audio Library GitHub repository. The code for these objects involves a lot of register-level bit manipulation that gets pretty cryptic, so I wanted to make a post here to help check / expand my understanding.

The essential thing I'm interested in understanding is how the output objects convert from audio buffer processing, which updates at an interval determined by [sample rate / buffersize], to individual sample output, which has to output individual sample values at the sample rate.
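To put numbers on the two time scales in that question, here is a minimal sketch of the arithmetic. It assumes the Audio Library defaults of 128-sample blocks and the 44117.64706 Hz sample rate quoted later in this thread; the constant and function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Assumed values: 128-sample blocks and the ~44117.647 Hz sample rate
// quoted later in this thread.
constexpr double kSampleRateHz = 44117.64706;
constexpr int kBlockSamples = 128;

// Time between individual output samples, in microseconds.
constexpr double samplePeriodUs() { return 1e6 / kSampleRateHz; }

// Time between update() calls (one call per block), in microseconds.
constexpr double blockPeriodUs() { return kBlockSamples * samplePeriodUs(); }
```

Under those assumptions, update() runs roughly every 2.9 ms, while the output hardware needs a new value roughly every 22.67 us. An output object has to bridge that gap.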

I see in the source code for AudioOutputAnalog that the object has its own interrupt service routine, which is attached to a DMA (direct memory access) channel object. When the ISR is called, it fills the DMA source buffer with the samples from AudioOutputAnalog's input block. The DMA channel then updates the DAC value sample by sample at a rate set by DMAMUX_SOURCE_PDB (not sure what this stands for). Does this seem mostly correct?

In my use case, I want to create an output object that writes to a series of shift register chips using the Teensy's SPI port. I am going to use this to output to shift registers based on values received via the Teensy's USB audio port or possibly from SD card audio files. To do this I need to call the SPI transfer function at the sample rate, or some predictable division thereof. I realize that I probably can't use DMA for this, so I'm wondering what the best method would be to ensure synchronicity and avoid issues with buffer overruns. I'm thinking I probably need to create some sort of ISR to execute the SPI transfer, but I wanted to get some advice to determine best practices before starting.

Thanks!
 
Each one works slightly differently, depending on the specific hardware it controls.

For AudioOutputAnalog, the timing is controlled by the PDB timer. You can learn more about this timer in the MK66FX1M0 reference manual.

https://www.pjrc.com/teensy/datasheets.html

The PDB timer is documented in chapter 44, starting on page 1103.
 
Do all the audio output objects rely on the DMA object interrupt? What would you recommend using as a timing source for an output that doesn't use DMA?
 
Why don't you want to use DMA? That's what I'd try first, using the same PDB timer method as the AudioOutputAnalog class.
 
They all use DMA.

Theoretically you could build an input or output using only interrupts without DMA. But if you generate 1 interrupt per audio sample, that's only 22 us between interrupts. You would need to do your work in less than 22 us. Suppose you manage to write an efficient interrupt that does its work in 4 us. That means your code could tolerate only 18 us of interrupt latency imposed by other libraries before your update misses the timing by more than 1 bit (but could catch up if the problem doesn't repeat), or a one-time latency of ~36 us, before you miss output timing. Even if you don't suffer those problems, using 4 of every 22 us is quite a lot of CPU overhead, and it gets worse if your code isn't highly efficient to do its work in just a few microseconds.

This is why we use DMA.
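The budget in that post can be written out as a small sketch. The struct and its field names are hypothetical; the 22 us and 4 us figures are the ones used above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper for reasoning about a per-sample interrupt.
struct IsrBudget {
  double periodUs;  // time between sample interrupts
  double workUs;    // time the ISR itself takes to run

  // Latency the ISR can absorb and still finish before the next sample is due.
  double slackUs() const { return periodUs - workUs; }

  // Fraction of all CPU time spent inside the ISR.
  double cpuLoad() const { return workUs / periodUs; }
};
```

With `IsrBudget{22.0, 4.0}`, the slack is 18 us and the ISR alone takes about 18% of the CPU, before any other work is done.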
 
Why don't you want to use DMA?

I don't think DMA will work for my specific application. While there may theoretically be a way to hook up DMA transfers to a SPI buffer, in this case I need to transfer 64 bits at a time, and not have the chip select pin changing state during the write. I need my output interrupt to call the following function:

void WriteFunction(void) {
  SPI.beginTransaction(spiSettings);
  digitalWrite(SS_PIN, LOW);
  SPI.transfer16(outreg[3]);
  SPI.transfer16(outreg[2]);
  SPI.transfer16(outreg[1]);
  SPI.transfer16(outreg[0]);
  // SS pin polarity is inverted from normal SPI operation, since it controls the shift register output latches
  digitalWrite(SS_PIN, HIGH);
  SPI.endTransaction();
}

It has to write 64 bits at a time and trigger the output latch for all the registers simultaneously. Seems like an interrupt function is the only thing that's going to work here unless y'all have any suggestions for getting around that.
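For reference, the 64 bits sent by WriteFunction() can also be viewed as one contiguous 8-byte frame, which is the form a buffer-based transfer would need. This is only a sketch: `packFrame` is a hypothetical helper, and it assumes MSB-first bit order within each 16-bit word.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Build the 8 bytes in the order they appear on the wire when
// outreg[3]..outreg[0] are sent as 16-bit words, MSB first (assumed).
std::array<uint8_t, 8> packFrame(const uint16_t outreg[4]) {
  std::array<uint8_t, 8> frame{};
  for (int i = 0; i < 4; ++i) {
    uint16_t word = outreg[3 - i];  // highest index goes out first
    frame[2 * i] = static_cast<uint8_t>(word >> 8);        // high byte
    frame[2 * i + 1] = static_cast<uint8_t>(word & 0xFF);  // low byte
  }
  return frame;
}
```

Sending the frame as a single buffer does not change the latch requirement: the latch pin still has to be toggled once, after all eight bytes are out.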

As far as CPU overhead, the program will be fairly simple, probably just two objects, a USB audio input and the SPI register output object. It's essentially using the Teensy to translate an audio stream from a computer to a SPI output.
 
So I actually got a dirty proof of concept version of this working using a circular ring buffer and the IntervalTimer object. I've got a custom output object written that copies incoming audio blocks to an array, which the interrupt function then reads from. I'm seeing proper output values, but the timing is pretty shaky when I look at it on a scope.

It seems like part of this might be related to the resolution of IntervalTimer, since its interval is set in microseconds. I set this to 22 for testing, but the actual sample period in microseconds is between 22 and 23 (1,000,000 / 44117.64706 = 22.6666666661).

What do you recommend I use for creating a sample accurate timer interrupt? Would it be possible to use PDB timer to trigger an interrupt function? If so, where is the timing for PDB set up in the audio library so I can take a look at that?
 
IntervalTimer takes a float as its input, so you're not limited to integer microseconds. Internally it converts the microseconds float into the closest timing the hardware can actually provide.
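A rough sketch of what that conversion means in practice. It assumes the timer counts bus-clock cycles at 60 MHz (a typical F_BUS for a Teensy 3.6 running at 180 MHz), and the rounding shown is illustrative rather than the library's exact code; `cyclesFor` and `actualRateHz` are made-up names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// ASSUMPTION: the timer counts bus-clock cycles at 60 MHz.
// Check F_BUS for your own clock settings.
constexpr double kBusHz = 60e6;

// Nearest whole number of timer cycles for a requested period
// (illustrative rounding; the library's exact rounding may differ).
uint32_t cyclesFor(double microseconds) {
  return static_cast<uint32_t>(std::lround(microseconds * kBusHz / 1e6));
}

// Interrupt rate actually produced by a given cycle count.
double actualRateHz(uint32_t cycles) { return kBusHz / cycles; }
```

Under that assumption, 22 us becomes 1320 cycles, or about 45454.5 Hz, roughly 3% faster than the audio rate. A period of 22.6666666661 us becomes 1360 cycles, or about 44117.647 Hz, which matches the sample rate quoted above.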
 
Thoughts:

* Post the code you have.

* Post a schematic

* I'll bet you can nail the interrupt timing exactly by working with a PIT directly rather than indirectly with IntervalTimer (see processor's datasheet).

* I think you can do this using DMA SPI in 64-byte bursts and let the interrupt handle your latch control and manipulation of the buffer pointer. That way, the interrupt only has to fire at 1/4 the sample rate. And it will require a lot less code to run in the ISR.

* Use buffer ping-pong so that update() can fill one while SPI is emptying the other.

* Since your latch pin is probably known at compile-time, use digitalWriteFast().

* Since the SPI bus is probably dedicated to the shift registers with no other devices present, you can call SPI.beginTransaction() once in your class's begin() method and just hold on to the bus.
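The ping-pong suggestion in the list above could look something like this. It is a minimal, hardware-independent sketch: the `PingPong` class and its methods are hypothetical, the block size of 128 samples is assumed, and on a real Teensy the state shared with an ISR would also need volatile access or a brief interrupt lock.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical ping-pong (double) buffer: the producer fills one half while
// the consumer drains the other, then the roles swap.
class PingPong {
 public:
  static constexpr int kSamples = 128;  // one audio block (assumed size)

  // Producer side (e.g. update()): copy a block into the half not being read.
  void fill(const int16_t *src) {
    std::memcpy(buf_[1 - readHalf_], src, sizeof(buf_[0]));
    pending_ = true;
  }

  // Consumer side: switch to the freshly filled half if one is waiting.
  // Returns nullptr when no new block has arrived (underrun).
  const int16_t *swap() {
    if (!pending_) return nullptr;
    readHalf_ = 1 - readHalf_;
    pending_ = false;
    return buf_[readHalf_];
  }

 private:
  int16_t buf_[2][kSamples] = {};
  int readHalf_ = 0;
  bool pending_ = false;
};
```

With only two halves there is no slack if the producer and consumer drift apart, which is one reason a ring of several blocks may be the safer choice.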
 
Agreed, SPI probably can work with DMA for this case. But the SPI hardware and DMA controller are both quite complex in this chip. The learning curve is steep to program the SPI hardware directly and have it automatically generate the CS signal. Using DMA would require programming one of the timers to trigger the DMA, which might require routing its trigger output through the crossbar switch to one of the crossbar's special DMA request circuits (as OctoWS2811 does). The more straightforward way of having the timer directly generate DMA requests often doesn't work because the DMA request isn't ack'd until the timer has certain registers read or written, which doesn't happen in a case like this where you want the timer to trigger a DMA channel that accesses another peripheral unrelated to the timer. Another workaround for that situation involves triggering 2 DMA channels, or more specifically triggering a DMA channel that makes the timer happy, which is configured to trigger another DMA channel as it completes each minor loop. Either way (I have done both, fwiw), this gets into some pretty low-level programming that involves an extremely steep learning curve. Especially with DMA, troubleshooting when things don't work is extremely tough.

While inefficient, the interrupt approach is probably much more achievable. All of those suggestions seem like a good approach, except direct PIT timer access should not be needed. Using a float input (with 23 bit mantissa) to IntervalTimer gives you access to the full resolution of the timer (at least at relatively short intervals like this case, where the interval will use far less than the full 32 bit range of the timer).
 
Interesting problem. Actually, from the description it doesn't seem like @grinch needs to synchronize the SPI transfers to the Audio Library sample rate. Only the latch input of the shift register needs to be synchronous. The SPI can clock the samples into the register at any (sufficiently fast) rate desired (asynchronous to Audio sample rate) and then the synchronized timer interrupt handles the latching. That makes it much simpler as you can then use the DMA SPI method already supported by the SPI library.

In rough skeletal form:
Code:
#include <Audio.h>
#include <SPI.h>
#include <EventResponder.h>

class AudioSpi: public AudioStream, public EventResponder {
  public:
    AudioSpi(void) : AudioStream(1, inputQueueArray) {
      begin();
    }
    void begin();
    virtual void triggerEvent(int status = 0, void *data = nullptr);
    virtual void update(void);
    static void timerISR();

  private:
    audio_block_t *inputQueueArray[1];
};

void AudioSpi::begin() {
  // Set up circular buffer and read / write pointers
  // Set up PIT or IntervalTimer to call timerISR() at 1/4 Audio Library sample rate
  // digitalWriteFast(SS_PIN, HIGH);
  // SPI.beginTransaction();
}

void AudioSpi::update() {
  // copy 128 samples from input to circular buffer
  // writePointer += 256;  --- 128 samples * 2 bytes / sample
}

void AudioSpi::timerISR() {
  //  digitalWriteFast(SS_PIN, HIGH);   ---- latch data previously shifted in. synchronous with Audio Library
  //  if (# of bytes in circular buffer >= 64) {
  //    digitalWriteFast(SS_PIN, LOW);  ---- do we need a delay before this write? what is timing requirement of latch pulse?
  //    SPI.transfer((uint8_t *) circular buffer + readPointer, nullptr, 64, *this);
  //  }
}

void AudioSpi::triggerEvent(int status, void *data) {
  // readPointer += 64;
}
 
If the shift register's latch input is LEVEL sensitive rather than EDGE sensitive, it might look more like this:
Code:
#include <Audio.h>
#include <SPI.h>
#include <EventResponder.h>

class AudioSpi: public AudioStream, public EventResponder {
  public:
    AudioSpi(void) : AudioStream(1, inputQueueArray) {
      begin();
    }
    void begin();
    virtual void triggerEvent(int status = 0, void *data = nullptr);
    virtual void update(void);
    static void timerISR();

  private:
    audio_block_t *inputQueueArray[1];
};

void AudioSpi::begin() {
  // Set up circular buffer and read / write pointers
  // Set up PIT or IntervalTimer to call timerISR() at 1/4 Audio Library sample rate
  // digitalWriteFast(SS_PIN, HIGH);
  // SPI.beginTransaction();
}

void AudioSpi::update() {
  // copy 128 samples from input to circular buffer
  // writePointer += 256;  --- 128 samples * 2 bytes / sample
}

void AudioSpi::timerISR() {
  //  if (# of bytes in circular buffer >= 64) {
  //    digitalWriteFast(SS_PIN, LOW);
  //    SPI.transfer((uint8_t *) circular buffer + readPointer, nullptr, 64, *this);
  //  }
}

void AudioSpi::triggerEvent(int status, void *data) {
  // digitalWriteFast(SS_PIN, HIGH);
  // readPointer += 64;
}
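Both skeletons leave the "# of bytes in circular buffer" check as a comment. One common way to do that bookkeeping is with free-running counters, sketched below. The 1024-byte capacity and the function names are assumptions for illustration, not part of the code above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical byte ring with a power-of-two capacity, so free-running
// read/write counters can be wrapped with a mask.
constexpr size_t kRingBytes = 1024;  // assumed: room for four 256-byte blocks

// Bytes written but not yet read. Unsigned subtraction stays correct even
// after the counters themselves wrap around.
constexpr size_t bytesAvailable(size_t writeCount, size_t readCount) {
  return writeCount - readCount;
}

// Offset into the storage array for a free-running counter.
constexpr size_t ringOffset(size_t count) { return count & (kRingBytes - 1); }
```

In this scheme update() would add 256 to the write counter per block and triggerEvent() would add 64 to the read counter per completed transfer, with `ringOffset()` applied only when indexing the storage.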
 
Actually working quite well now. The digitalWriteFast() function helped a ton and removed a bunch of wobble from the waveform. I'm also calling the latch write first thing in my output function, so that it happens at the same point in every interrupt. I then write the data via SPI, which gets transferred to the output on the next latch write. This only adds one sample of latency and makes the output timing very stable. Calling myTimer.begin(doOutput, 22.6666666661) seems to be creating a sample-accurate interrupt. My class for writing / reading the SPI buffer has a flag to catch read / write overruns, and it looks like this doesn't ever get triggered as long as I have my computer connected to the Teensy USB Audio device.
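The latch-first ordering described above can be illustrated with a toy model. This is not a hardware driver: `LatchedRegister` is a made-up struct that only mimics a shift register with a separate output latch.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a shift register with an output latch (not a hardware driver).
struct LatchedRegister {
  uint16_t shifted = 0;  // data clocked in but not yet visible
  uint16_t output = 0;   // value currently on the output pins

  // One timer tick, in the order described above: latch first, then shift.
  void tick(uint16_t next) {
    output = shifted;  // latch edge: happens at a fixed point in every tick
    shifted = next;    // SPI write: its duration can vary without moving the edge
  }
};
```

Calling `tick(10)`, `tick(20)`, `tick(30)` yields outputs 0, 10, 20: each value appears exactly one tick late, which is the one sample of latency mentioned in the post.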

Big thanks to everyone for their help. I'm still open to suggestions for how to further optimize this if y'all see anything, but I'm also pretty satisfied with how it's working thus far.

Here is my code:

Teensy Sketch:
Code:
#include <Audio.h>
#include <Wire.h>
#include <SPI.h>
#include <SD.h>
#include <SerialFlash.h>

AudioInputUSB            usb1;           
AudioOutputAnalog        dac;           //just to trigger audio library update 
AudioOutputCopyBuffer    copyBuffer;
AudioConnection          patchCord1(usb1, 0, dac, 0);
AudioConnection          patchCord2(usb1, 0, copyBuffer, 0);
AudioConnection          patchCord3(usb1, 1, copyBuffer, 1);

SPISettings spiSettings(25000000, MSBFIRST, SPI_MODE0); 
IntervalTimer myTimer;

#define SS_PIN 10

uint16_t out1, out2;

void doOutput(){
  __disable_irq();
  digitalWriteFast(SS_PIN, HIGH);
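  // out1/out2 are uint16_t but readFromBuffers() takes int16_t*, so cast explicitly
  copyBuffer.readFromBuffers((int16_t *)&out1, (int16_t *)&out2);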
  digitalWriteFast(SS_PIN, LOW);
  SPI.transfer16(out2); 
  SPI.transfer16(out1); 
  __enable_irq();
}

void setup() {                
  AudioMemory(12);
  Serial.begin(115200);
  pinMode(SS_PIN, OUTPUT);
  digitalWriteFast(SS_PIN, LOW);
  SPI.begin();
  SPI.beginTransaction(spiSettings);
  myTimer.priority(0);
  myTimer.begin(doOutput, 22.6666666661); 
  copyBuffer.begin();
}

void loop() {
  Serial.println(out1);
  Serial.println(out2);
  if(copyBuffer.checkOverrunAndClear()){
    Serial.println("Error! Overrun!");
  }
  Serial.println("");
  delay(1000);
}

output_copybuffer.h:
Code:
#ifndef output_copybuffer_h_
#define output_copybuffer_h_

#define COPY_BUFFER_COUNT 4

#include "Arduino.h"
#include "AudioStream.h"
#include "DMAChannel.h"

class AudioOutputCopyBuffer : public AudioStream
{
public:
	AudioOutputCopyBuffer(void) : AudioStream(2, inputQueueArray) { begin(); }
	virtual void update(void);
	void begin(void);
	void readFromBuffers(int16_t *p1, int16_t *p2);
	bool checkOverrunAndClear(){
		bool check = error;
		error = false;
		return check;
	};
private:
	int16_t leftBuffer[COPY_BUFFER_COUNT][AUDIO_BLOCK_SAMPLES];
	int16_t rightBuffer[COPY_BUFFER_COUNT][AUDIO_BLOCK_SAMPLES];
	int16_t write, read;
	int16_t index = 0;
	bool error = false;
	audio_block_t *inputQueueArray[2];
	static bool update_responsibility;
};

#endif

output_copybuffer.cpp:
Code:
#include <Arduino.h>
#include "output_copybuffer.h"
#include "utility/pdb.h"

bool AudioOutputCopyBuffer::update_responsibility = false;

void AudioOutputCopyBuffer::begin(void)
{
	write = 0;
	read = 0;
	error = false;
	index = 0;
}


void AudioOutputCopyBuffer::update(void)
{
	audio_block_t *b1;
	audio_block_t *b2;
	b1 = receiveReadOnly(0); // input 0
	b2 = receiveReadOnly(1); // input 1
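	if (!b1 || !b2) {
		// release whichever block did arrive, so audio memory is not leaked
		if (b1) release(b1);
		if (b2) release(b2);
		return;
	}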

	__disable_irq();
	memcpy(leftBuffer[write], b1->data, AUDIO_BLOCK_SAMPLES * 2);
	memcpy(rightBuffer[write], b2->data, AUDIO_BLOCK_SAMPLES * 2);
	write++;
	if(write >= COPY_BUFFER_COUNT){ write = 0;}
	__enable_irq();
	release(b1);
	release(b2);
}


void AudioOutputCopyBuffer::readFromBuffers(int16_t *p1, int16_t *p2)
{
	if(write != read){
		*p1 = leftBuffer[read][index];
		*p2 = rightBuffer[read][index];
		index++;
		if(index >= AUDIO_BLOCK_SAMPLES){
			index = 0;
			read++;
			if(read >= COPY_BUFFER_COUNT){
				read = 0;
			}
		}
	}else{
		error = true;
	}
}
 
Hey all, this is mostly working well, but I'm experiencing an issue with dropped audio frames. Would be incredibly helpful if y'all were able to advise. Made a post about it here: POST
 