minimizing AudioOutputUSB latency for syncing purposes

lutzray

Member
Bonjour, I'm building a GPS-based timecode generator with a T4. I already have a working solution with a SAMD21, outputting an analog audio signal that goes into the camera's MIC input. But now I want a USB audio class device doing the same thing, plugged into a cellphone.

With the SAMD21's onboard analog DAC, when post-processing audio and video files on a desktop, I can obtain syncing precision of around 0.08 ms. I understand the Teensy Audio library imposes a latency of around 2.9 ms... determined by the AUDIO_BLOCK_SIZE value, which is fixed at a minimum of 128 samples when using AudioOutputUSB.

I don't mind if the FSK word lands more or less at the right place (I already wrote a custom Audio synth object for FSK, my first C++ object!) but I would really like to position the syncing spike (identified here by the yellow vertical marker in the middle of the plot) with a precision of, say, 0.1 ms... Although most people don't perceive audio/video offsets shorter than 10 ms, one of my selling points is "sub-millisecond syncing" :)

spike.png


I know usb.c is a hell of a beast... I read this post about USB and DMA from Paul and it discouraged me...

But could it be possible to inject a 32767 value somewhere inside the DMA buffer just before it is read by the USB controller/USB PHY?

Without any help, I'm stuck: looking into schedule_transfer(), my head spins, with or without consulting p. 2468 of the i.MX RT1060 Reference Manual Rev. 1.

Thanks in advance!
 
That sounds like an interesting project.

Not sure if I understand the idea/question. The setup is:
- Teensy with GPS module provides timestamp signal over USB audio
- Smartphone (Android?) records video signal from integrated camera together with this audio signal from USB
- via software several video recordings can be synchronized precisely afterwards
Do I understand that right?

Isn't it necessary to understand what the smartphone is doing in the recording / receiving procedure there? Some buffering is necessary. And some sort of sample synchronization to the video has to be done somehow because there are no synchronized clocks connected.

If there were a fixed latency and just a running timestamp, wouldn't it be possible to simply shift the data accordingly?
 
Not sure if I understand the idea/question. The setup is:
- Teensy with GPS module provides timestamp signal over USB audio
- Smartphone (Android?) records video signal from integrated camera together with this audio signal from USB
- via software several video recordings can be synchronized precisely afterwards
Do I understand that right?
This is correct, but more frequently the two files being synced are an audio file and a video file... In the film industry this is called dual-system sound. While simultaneously recording a scene, you have one camera with its timecode generator and a separate audio recorder, also with its own TC generator. Each recording device 'sacrifices' an audio channel to record the timecode signal generated by the GPS modules. My workflow: before editing the video on the desktop, a custom program decodes the sync track present in both media files to determine at which UTC time each one started, and merges them accordingly, applying the necessary offset. Then, once imported into any video editing program, the sound is already in sync with the frames... no more use for the iconic "clap", aka the "slate"... I show the process in this video demonstration. As you wrote, an audio sync track is also useful for syncing multicam shoots, aligning them on the timeline.

Isn't it necessary to understand what the smartphone is doing in the recording / receiving procedure there? Some buffering is necessary. And some sort of sample synchronization to the video has to be done somehow because there are no synchronized clocks connected.
If there were a fixed latency and just a running timestamp, wouldn't it be possible to simply shift the data accordingly?

Yes, that's exactly what I plan to do, and I devised a setup for measuring that in-camera latency, described here: Spin Clap, a subframe audio-video sync measurement device.
 
Let me give some more details: the FSK word I output doesn't contain any syncing section, as opposed to the Manchester-encoded SMPTE LTC signal (see Phil Rees' description; note: eventually, I want to dislodge this standard and encourage camera manufacturers to implement GPS time-of-day based microsecond timestamping of their recordings). So the syncing portion of my signal is the narrow spike I produce when the GPS module sends a square pulse aligned with the start of each UTC second (± 10 ns).

With the Teensy, upon receiving the GPS hardware interrupt, I would like to punch a spike into the USB audio data stream as fast as possible, outside of my AudioSynthBFSK::update(). Is that feasible without rewriting usb.c?
 
So the syncing portion of my signal is the narrow spike I produce when the GPS module sends a square pulse aligned with the start of each UTC second (± 10 ns).
My first thought is that you can't get better than 22.7µs alignment, because of the basic 44.1kHz sample rate of the stock audio library. That would improve slightly to 10.4µs if you change the sample rate to 96kHz, which is non-trivial.

A second thought is that you could improve the alignment by making the "square pulse" encode more information, e.g. a +32767 sample to signal the rough time, immediately followed by a second sample whose amplitude gives a high-resolution offset.

upon receiving the GPS hardware interrupt, I would like to punch a spike into the USB audio data stream as fast as possible, outside of my AudioSynthBFSK::update(). Is it feasible without rewriting usb.c?
Possibly. Bear in mind that there's inherent and uncontrolled (by you, typically) latency at all stages of the Teensy / cellphone audio system, so simply punching a spike in at the USB level doesn't actually give you real-time information - it probably makes it worse.

What you do have is the Teensy's audio interrupt, which occurs every ~2.902ms. On every interrupt, a new block will be generated and fed to the USB, and eventually get to the cellphone, presumably to be recorded in a file.

Let's say you write your AudioSyncPunch object to do three things.

Firstly, when your GPS interrupts, you need to capture the time; your best bet here is to use one of the Teensy hardware timers in capture mode. I'm not familiar with this, so you may have to do a bit of searching, but I'm pretty sure there are some good libraries. IIRC the peripherals run at 150MHz, so your resolution and accuracy are going to be about 6.67ns at best.

Secondly, when AudioSyncPunch::update() runs, it captures the current value of your capture timer: note the value you get is not accurate to 6.67ns, because you have no idea how long it is since the update was triggered. But it's the best you're going to get; there may be a reasonably-consistent offset you determine by experiment.

Thirdly, you generate an output audio block. If a GPS interrupt has not been captured, then the block is silent (about 343 out of every 344 blocks). If it has, then you can compute which of the 128 samples should carry your spike from the captured and current timer values. Care is needed because the GPS interrupt might actually have occurred after the audio interrupt, but before your update() code was called.

To improve on the second point you could modify your copy of AudioStream::update_all() (in cores/AudioStream.h) so it not only triggers the audio software ISR, but also captures your timer value. This is still not 100% latency-free, because it's usually called from somewhere in a DMA interrupt. Better is to put it right at the start of the DMA interrupt itself (of whichever hardware you're using), and then you're only suffering latency from being blocked by higher-priority interrupts.

Alternatively, as you know the timer captures are going to happen at exact 1 second intervals, you could just use them to compute the actual Teensy clock rate and place your spikes accordingly. All Teensy clocks are generated using PLLs from a single crystal, so in theory should remain in lock-step; I don't know if that's true in practice, I've never needed to find out :)
 
Good idea! All I need to do is predict when the next PPS pulse will arrive and generate whatever audio data I want for that moment.
The prediction is based on the offset between the last pair of those events: AudioStream::update() and the PPS IRQ.
The offset can be measured using an RT1060 counter or with micros(), as explained in this thread.
Jitter/latency could be assessed on the scope by toggling a pair of pins.
 
Not sure if it is relevant ... seems related ... but not GPS sync'd. Frank_B wrote this LTC for the Audio library ... 6 years back ...
It is relevant to my project in the sense that I want to kill LTC. :)

The SMPTE standard doesn't (yet) support framerates higher than 30 fps (!); DF vs. NDF is a dumpster fire of confusion and an editor's nightmare; jam syncing is a pain in the ass; vendor implementations are bug-prone, see here in DaVinci Resolve. The list goes on and on...

I want framerate-agnostic time-of-day metadata tags for "microseconds after the second", for the start and end of each recording (audio and video).
 
It is relevant to my project
Awesome, I'll take that as a win :)
Good luck with the edits to refine the time syncing to high resolution.

If you like millis() for a seconds reference - tied to GPS PPS - then looking at micros() you'll have seen the use of the ARM cycle counter, which @defragster wrote, to get the time elapsed since the last millis SysTick.
 
Let's say you write your AudioSyncPunch object to do three things.

Firstly, when your GPS interrupts, you need to capture the time; your best bet here is to use one of the Teensy hardware timers in capture mode. I'm not familiar with this, so you may have to do a bit of searching, but I'm pretty sure there are some good libraries. IIRC the peripherals run at 150MHz, so your resolution and accuracy are going to be about 6.67ns at best.

Secondly, when AudioSyncPunch::update() runs, it captures the current value of your capture timer: note the value you get is not accurate to 6.67ns, because you have no idea how long it is since the update was triggered. But it's the best you're going to get; there may be a reasonably-consistent offset you determine by experiment.

Thirdly, you generate an output audio block. If a GPS interrupt has not been captured, then the block is silent (about 343 out of 344 blocks). If it has, then you can compute which of the 128 samples should have your spike in from the captured and current timer values. Care is needed because the GPS interrupt might actually have occurred after the audio interrupt, but before your update() code was called.

Alternatively, as you know the timer captures are going to happen at exact 1 second intervals, you could just use them to compute the actual Teensy clock rate and place your spikes accordingly. All Teensy clocks are generated using PLLs from a single crystal, so in theory should remain in lock-step; I don't know if that's true in practice, I've never needed to find out :)

Below is my implementation (it simply places full-scale 32767 spikes at each PPS into the otherwise silent audio stream). I haven't yet implemented h4yn0nnym0u5e's clock-drift correction suggestion: my T4 clock is off by only 48 ppm, i.e. about two samples too short each second, so not problematic. Next assignment: sniffing the USB data with my DSLogic to evaluate the latency between the hardware interrupt and the actual spike position.

It seems to work, but from time to time there's a miss (around 11 misses in this 11-minute recording):
Screen Shot 2025-02-02 at 8.22.50 PM.png


Is there such a thing as interrupts collision? What if the hardware 1 PPS arrives more or less at the same time as the update() call? Should I block interrupts somewhere in my code?

Code:
#include <Audio.h>
#define CYCLES_PER_SAMPLE 13605  // 600 MHz / 44100 Hz (CPU cycles per audio sample)

// Synth object with no inputs: outputs silence, except for a single
// full-scale sample at the predicted position of the next PPS edge.
class AudioSyncPunch : public AudioStream {
public:
  AudioSyncPunch()
    : AudioStream(0, NULL) {}
  void update(void);
  volatile int blocks_to_go_before_PPS{ 0 };
  volatile int samples_to_go_before_PPS{ 0 };
  volatile uint32_t audio_update_CLCK_COUNT;  // cycle count at start of update()
};

void AudioSyncPunch::update(void) {
  audio_block_t *block;
  blocks_to_go_before_PPS--;
  audio_update_CLCK_COUNT = ARM_DWT_CYCCNT;  // timestamp this update
  block = allocate();
  if (block) {
    memset(block->data, 0, sizeof(block->data));      // silent block
    if (blocks_to_go_before_PPS == 0) {               // PPS lands in this block
      block->data[samples_to_go_before_PPS] = 32767;  // max signed int16 spike
    }
    transmit(block);
    release(block);
  }
}

AudioSyncPunch punch1;
AudioOutputUSB usb1;
AudioConnection patchCord1(punch1, 0, usb1, 0);
AudioConnection patchCord2(punch1, 0, usb1, 1);

const byte PPSpin = 22;
volatile bool PPSarrived = false;  // set by the PPS ISR, consumed in loop()
volatile uint32_t cycles_count_on_PPS;

void PPS_isr(void) {
  PPSarrived = true;
  cycles_count_on_PPS = ARM_DWT_CYCCNT;  // timestamp the PPS edge
}

void setup() {
  AudioMemory(10);
  attachInterrupt(digitalPinToInterrupt(PPSpin), PPS_isr, RISING);
}

void loop() {
  if (PPSarrived) {
    // samples elapsed between the last audio update and the PPS edge
    int samples_after_update_call = (cycles_count_on_PPS - punch1.audio_update_CLCK_COUNT) / CYCLES_PER_SAMPLE;
    // schedule the spike one second (44100 samples) after that edge
    int whole_blocks = (samples_after_update_call + 44100) / AUDIO_BLOCK_SAMPLES;
    int remaining_samples = (samples_after_update_call + 44100) % AUDIO_BLOCK_SAMPLES;
    punch1.blocks_to_go_before_PPS = whole_blocks;
    punch1.samples_to_go_before_PPS = remaining_samples;
    PPSarrived = false;
  }
}
 

Yes, the Cortex M7 used by Teensy has what's termed "nested" interrupts, which means that a higher priority interrupt can interrupt a lower-priority one that's already in progress.

In this case, the pin interrupt is (I think) at level 128, whereas the audio interrupt is at level 208; under ARM's scheme, where a lower number means higher priority, the pin interrupt can pre-empt the audio one... You need to think very carefully about the possible sequences and overlaps to ensure PPS pulses aren't lost, the audio update doesn't use partial/stale information, and loop() doesn't overwrite information not yet consumed by the audio update.

To give one example, what happens if the audio update occurs between the following two lines of loop()? And what happens if these lines are executed when blocks_to_go_before_PPS hasn't counted to zero?
C++:
punch1.blocks_to_go_before_PPS = whole_blocks;
punch1.samples_to_go_before_PPS = remaining_samples;
 
Yes, the Cortex M7 used by Teensy has what's termed "nested" interrupts, which means that a higher priority interrupt can interrupt a lower-priority one that's already in progress.

In this case, the pin interrupt is (I think) at level 128, whereas the audio interrupt is level 208; by the weird system used by ARM, this means the pin interrupt is higher priority... You need to think very carefully about the possible sequences and overlaps to ensure PPS pulses aren't lost, and the audio update doesn't use partial / stale information, and loop() doesn't overwrite information not yet consumed by the audio update.

To give one example, what happens if the audio update occurs between the following two lines of loop()? And what happens if these lines are executed when blocks_to_go_before_PPS hasn't counted to zero?
C++:
punch1.blocks_to_go_before_PPS = whole_blocks;
punch1.samples_to_go_before_PPS = remaining_samples;

And if an interrupt is temporarily blocked (e.g. with AudioNoInterrupts() for the Audio Library), is it lost or simply postponed?
Maybe I could suspend hardware interrupts only (is this possible? I found this), since the 1 PPS signal stays high for a long time (more than 10 ms, IIRC).
 
For the most part a masked interrupt will not be lost, just delayed for (up to) however long it takes your code between the disable/enable calls to run. The accepted wisdom is thus to keep such code as short as possible, and never make calls to library functions while interrupts are masked (or indeed from inside an ISR).

The duration of your PPS signal is irrelevant, as you’ve rightly got it triggered by the rising edge. Once triggered it will remain pending until your ISR is serviced, which will not necessarily be immediately, if a higher priority interrupt is executing or interrupts are masked. If bad code results in it remaining pending for over a second, then of course you will have lost a PPS pulse…

To mask all interrupts you’d use __disable_irq(), and __enable_irq() to unmask them. AudioNoInterrupts() is the correct choice if all you care about is masking the audio update.
 
All I need to do is predict when the next PPS pulse will arrive and generate whatever audio data I want for that moment.
The prediction is based on the offset between the last pair of those events: AudioStream::update() and the PPS IRQ.
The offset can be measured using an RT1060 counter or with micros(), as explained in this thread.
Jitter/latency could be assessed on the scope by toggling a pair of pins.

Regarding the use of GPS for accurate timestamps, that is the essence of the T4.1 NTP server at the GitHub and forum links below. The period of the PPS signal is measured via timer input capture. The measurements are filtered via a PID loop to compute a "conversion" from timer count to absolute time. The code is organized as classes, with a getTime() method that returns the current time as a 64-bit integer with UNIX seconds in the high word and fractional seconds in the low word. In my testing, with a reliable GPS signal, I found the method good to better than 0.1 µs.


 