Teensy 4 SPDIF input + ASRC

Are the results what you expected?
I could only test the algorithm with my own hardware, and so far nobody else has posted test results, so there is not much empirical data. I assumed that
the results would be much better than those of your test with the analog signal.

Regarding your questions about the resampling algorithm: first, I want to mention that I wasn't involved in the development of the algorithm.
You are probably aware of that, but I mention it in case somebody skims the thread and gets that impression.

In the introduction of the document, the authors write:
"The algorithm effectively implements the “analog interpretation” of rate conversion, as discussed in [1], in which a certain lowpass-filter impulse response must be available as a continuous function. Continuity of the impulse response is simulated by linearly interpolating between samples of the impulse response stored in a table."
In section 3.3 they further describe the “analog interpretation”. If you stick to this interpretation, then there is no L and M factor. A new sample can be computed at an arbitrary time t
by shifting the sinc filter to t, computing the interpolation weights by evaluating the sinc function, and forming the linear combination of the input samples with these weights.
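
To make the "analog interpretation" concrete, here is a minimal sketch in Python (chosen for brevity, it is not the Teensy code). It evaluates a windowed sinc directly at t - n for every input sample n near t. The names sample_at and taper are made up for this example, a Hann taper stands in for the Kaiser window, and there is no extra band-limiting, so it only covers the case where the output rate is not lower than the input rate.

```python
import math

def sinc(x):
    """Normalized sinc, sin(pi*x)/(pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def taper(x, half_len):
    """Hann taper over |x| < half_len (assumption: stand-in for the Kaiser window)."""
    if abs(x) >= half_len:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * x / half_len))

def sample_at(signal, t, half_len=16):
    """Value of the band-limited signal at fractional index t (in input samples)."""
    n0 = math.floor(t)
    acc = 0.0
    for n in range(n0 - half_len + 1, n0 + half_len + 1):
        if 0 <= n < len(signal):
            x = t - n  # distance between the output time and input sample n
            acc += signal[n] * sinc(x) * taper(x, half_len)
    return acc

fs = 48000.0
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(256)]
t = 100.37
# Interpolated value next to the analytically known value of the sine at t.
print(sample_at(tone, t), math.sin(2.0 * math.pi * 1000.0 * t / fs))
```

Note that no L or M appears anywhere: t can be any real number inside the signal.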

If you want to interpret the algorithm in the "standard" way, I can tell you at least what L is: The table with the lowpass-filter contains a densely sampled version of the filter.
L is the factor by which the lowpass-filter is more densely sampled than the input signal.
The algorithm first interpolates by a factor L and new samples are computed at arbitrary positions by linear interpolation of the signal.
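
A small sketch of the table view, again in Python and only as an illustration: one wing of the filter is stored L times more densely than the input signal, and a coefficient at an arbitrary offset is obtained by linear interpolation between two neighbouring table entries. The function names, the Hann-windowed sinc and the values L=512 and half_len=8 are assumptions for the example, not values from resampler.h.

```python
import math

def filter_value(x, half_len):
    """Hann-windowed sinc at offset x (assumption: stand-in for the Kaiser-windowed sinc)."""
    if abs(x) >= half_len:
        return 0.0
    s = 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)
    return s * 0.5 * (1.0 + math.cos(math.pi * x / half_len))

def build_wing(half_len, L):
    """One wing of the filter, sampled L times per input sample."""
    return [filter_value(i / L, half_len) for i in range(half_len * L + 1)]

def coeff_at(table, x, L):
    """Filter value at offset x, linearly interpolated between table entries."""
    pos = abs(x) * L          # the filter is symmetric, so only |x| matters
    i = int(pos)
    if i >= len(table) - 1:   # at or beyond the end of the wing
        return 0.0
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])

L = 512
wing = build_wing(8, L)
# Interpolated table value next to the directly evaluated filter value.
print(coeff_at(wing, 0.3712, L), filter_value(0.3712, 8))
```

The denser the table (larger L), the smaller the error of the linear interpolation between entries.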


In the Teensy implementation a Kaiser-windowed sinc filter is used as the lowpass filter, and the limiting factor is the size of the table that stores the filter coefficients.
L depends on the length of the filter, so I can't give you a single number, but I can give you an example (called N below):

Currently the size of the table is 20*1024 + 1 samples (MAX_FILTER_SAMPLES in resampler.h / only one wing of the filter is stored).
The default resampling parameters are:
attenuation=100
minHalfFilterLength=20
maxHalfFilterLength=80

Let's say the input frequency is 48kHz.
The lowpass-filter is designed as described here:
https://tomroelandts.com/articles/how-to-create-a-configurable-filter-using-a-kaiser-window
Parameter b is chosen so that aliasing is kept out of the frequencies below 20kHz.

This results in a filter length of 77 (2*input.getHalfFilterLength() + 1), i.e. a half filter length of 38.
N = floor(20*1024/38) = 538
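
The arithmetic of this example, written out as a tiny Python check. The variable names are only illustrative; the numbers are the ones from the post (table of 20*1024 + 1 entries for one wing, half filter length 38).

```python
MAX_FILTER_SAMPLES = 20 * 1024 + 1        # table entries for one wing of the filter
half_filter_length = 38                   # from the 48kHz example above

filter_length = 2 * half_filter_length + 1
# Table entries available per input sample along one wing.
n = (MAX_FILTER_SAMPLES - 1) // half_filter_length

print(filter_length, n)
```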
 
Thanks. Here are the results for the analog signal. I think I recorded it at a lower volume than the SPDIF one I showed you in my last post.

linein.jpg
 
Thanks for sharing the result.

You're welcome. As you can see, the results seem better for the spdif input. However, if I understand correctly, the Teensy processes 16-bit data which will not show much information on the noise floor below 96dB. As the results on the audio input are nevertheless quite good, I have decided not to use the spdif optical i/o connectors for my AGC/Compressor box. The box is a little too small to accommodate the additional space they require compared to the regular audio jacks. However, I will keep in mind the possibility of using the spdif asrc input for future projects.
 
However, if I understand correctly, the Teensy processes 16-bit data which will not show much information on the noise floor below 96dB.
The quantization noise is about 98dB below max signal power for 16 bits (for a full-scale sinusoid, or about 101dB for a square wave).
Note that this is the total noise power, not the noise's PSD (power spectral density, which is per unit frequency). For a bandwidth
of 20kHz the PSD would be 43 dB lower than the total noise power. The units of PSD are W/Hz, though it is often proxied by
an amplitude measurement in V/√Hz (since the actual power depends on the impedance, which is often not known).

You have to be careful interpreting FFT plots comparing signal peaks to noise floors, since the size of the FFT and
the window function's noise-bandwidth both affect the scaling of noise PSD relative to signal peak amplitude. By using
larger and larger FFT sizes you can push the noise floor lower and lower without limit if you don't correct for the FFT
bin width in Hz.
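
A short numeric sketch of these relations in Python. It uses the usual textbook approximation of 6.02*bits + 1.76 dB for a full-scale sine with uniform quantization noise, and assumes white noise spread evenly over fft_size/2 bins; the function names and the enbw_bins parameter (the window's equivalent noise bandwidth in bins, 1.0 for a rectangular window) are made up for this example.

```python
import math

def quantization_snr_db(bits):
    """Ideal SNR of a full-scale sine over uniform quantization noise."""
    return 20.0 * math.log10(2.0 ** bits) + 10.0 * math.log10(1.5)

def psd_offset_db(bandwidth_hz):
    """How far the per-Hz noise density lies below the total noise power."""
    return 10.0 * math.log10(bandwidth_hz)

def fft_noise_floor_db(bits, fft_size, enbw_bins=1.0):
    """Approximate per-bin noise level relative to a full-scale sine peak."""
    bins_sharing_noise = (fft_size / 2.0) / enbw_bins
    return -(quantization_snr_db(bits) + 10.0 * math.log10(bins_sharing_noise))

print(quantization_snr_db(16))          # total noise power relative to the sine
print(psd_offset_db(20000.0))           # dB between total power and PSD for 20kHz
print(fft_noise_floor_db(16, 4096))     # apparent floor with a 4096-point FFT
print(fft_noise_floor_db(16, 65536))    # the same noise, 16 times more bins
```

The last two lines describe the same signal; only the FFT size changes, and the apparent floor drops by 10*log10(16) dB.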

There's a thorough treatment of this here:
https://holometer.fnal.gov/GH_FFT.pdf
 
OK. I will look at it. However I used the same method for analyzing the recorded samples from the audio and the Async spdif inputs.
 
@MarkT @alex6679

I haven't yet looked at https://holometer.fnal.gov/GH_FFT.pdf, but I also evaluated the audio wav files with a C tool. The tool does not do FFT analysis but rather calculates an energy ratio: the total energy divided by the energy remaining after passing the audio through a narrow notch filter tuned to the signal frequency (in this case, 10kHz). For spdif.wav, I obtain 87.7 dB. For linein.wav, I obtain 71.6dB.
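
For readers who want to try this kind of measurement, here is a minimal Python sketch of an energy-ratio measurement with a notch. It is not the C tool mentioned above: the biquad notch in the widely used audio-EQ-cookbook form, Q=30 and the 4000 skipped start-up samples are all assumptions, and the test signal is synthetic.

```python
import math

def notch_coefficients(f0, fs, q=30.0):
    """Biquad notch (audio-EQ-cookbook form), normalized so that a[0] = 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def biquad(samples, b, a):
    """Direct form I filtering of a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out

def energy_ratio_db(samples, f0, fs, skip=4000):
    """Total energy over the energy left after the notch, in dB."""
    b, a = notch_coefficients(f0, fs)
    residual = biquad(samples, b, a)
    # Skip the start so the notch's settling transient is not counted.
    total = sum(v * v for v in samples[skip:])
    rest = sum(v * v for v in residual[skip:])
    return 10.0 * math.log10(total / rest)

fs = 48000.0
# 10kHz tone plus a 3kHz component at 1/1000 of its amplitude.
test_signal = [math.sin(2.0 * math.pi * 10000.0 * n / fs)
               + 1e-3 * math.sin(2.0 * math.pi * 3000.0 * n / fs)
               for n in range(48000)]
ratio_db = energy_ratio_db(test_signal, 10000.0, fs)
print(ratio_db)
```

With an amplitude ratio of 1000 between the tone and everything else, the result should come out close to 60 dB.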
 
If I understood your explanations correctly, this is what I get:

AsyncSpdif
----------------
delay_line_length=68 (for half filter length of 34)
Ncoefs=20481
983 MACs/sample (600 MHz / 44100 Hz => 13605 MACs/sample at 100% processor usage, so about 983 MACs/sample at the actual processor usage of 7.23%)
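
The same estimate as a few lines of Python, in case someone wants to plug in other clock rates. It rests on the simplifying assumption made above that one MAC costs one CPU cycle; the variable names are only illustrative.

```python
cpu_hz = 600e6            # Teensy 4 clock
sample_rate_hz = 44100.0  # output sample rate
usage = 0.0723            # measured processor usage (7.23%)

# Cycles available per output sample at 100% usage (assumed 1 MAC per cycle).
cycles_per_sample = cpu_hz / sample_rate_hz
# Share of those cycles that the ASRC input actually uses.
used_per_sample = cycles_per_sample * usage

print(int(cycles_per_sample), int(used_per_sample))
```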

However, I imagine there's other stuff going on besides the filtering when the command, double pUsageIn=spdifIn.processorUsage() , is issued.

And here's what I got via the serial monitor:

ASRCinfo.jpg
 
That sounds very much like the title of a whitepaper from Analog Devices or similar, let me search:
Ah, not quite, I think this might be what I was remembering:
https://www.maximintegrated.com/en/design/technical-documents/tutorials/7/728.html

Yes. I found that SINAD is the inverse of THD+N (a good value is 90dB for SINAD and -90dB for THD+N). I always thought my tool was measuring THD+N, but it's really measuring SINAD.

https://www.ap.com/blog/thd-and-thdn-similar-but-not-the-same/
 
However, I imagine there's other stuff going on besides the filtering when the command, double pUsageIn=spdifIn.processorUsage() , is issued.
You are right, two things are happening:
1. The resampling of the signal.
2. Every 128 samples, i.e. at each call of the update function, the number of samples in the input buffer is monitored (the 'buffered time' that you see in your serial monitor output). Based on this buffered time, the step width at which the incoming signal is resampled is slightly adjusted to prevent buffer underflow or overflow.

With a filter length of 69, most of the time there are 2*68 MACs for a mono signal. If a new sample is computed at an integer position, it's 2*69.
For a stereo signal, we need to double that.
Then some other stuff needs to be done:
-Filling a small buffer inside the resampler class, so that the next chunk of samples can be resampled properly.
-The aforementioned linear interpolation at each sample.
...

Point 2 does not need many resources. It is only performed every 128 samples.

Unfortunately, I can't tell you exactly how all of that computation adds up to about 7% processor usage.
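
To illustrate the idea behind point 2, here is a toy simulation in Python. It is explicitly not the control law used in the library (that is not described in this thread): it assumes a plain proportional controller, and adjusted_step, simulate, the gain of 1e-4 and the target of 256 buffered samples are all invented for the example.

```python
def adjusted_step(nominal_step, buffered, target, gain=1e-4):
    """Consume input slightly faster when the buffer holds more than the target."""
    return nominal_step * (1.0 + gain * (buffered - target))

def simulate(true_ratio, updates=2000, block=128, target=256.0):
    """Run the regulation once per 128-sample block and return the final state."""
    buffered = target
    step = 1.0
    for _ in range(updates):
        # Step = input samples consumed per output sample.
        step = adjusted_step(1.0, buffered, target)
        # Per block, block*true_ratio samples arrive and block*step are consumed.
        buffered += block * (true_ratio - step)
    return step, buffered

# The source runs 0.1% faster than the nominal rate.
final_step, final_buffered = simulate(true_ratio=1.001)
print(final_step, final_buffered)
```

In this toy model the step settles at the true rate ratio and the buffer level settles at a constant offset from the target, so the buffer neither drains nor overflows.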
 
Currently the size of the table is 20*1024 + 1 samples (MAX_FILTER_SAMPLES in resampler.h / only one wing of the filter is stored).

Thank you for all your answers!

I wanted to know if there are really 20481 (16-bit, 32-bit?) coefficients?
I'm asking all these questions because I am very interested in the subject.

The ASRC that I simulated in C (floating point) has:

delay_line_length=89
Ncoefs=2700 (45*15*(3+1), with oversamplingRate=15 and polynomialOrder=3)
365 MACS/sample (delay_line_length*(polynomialOrder+1)).
THD+N=-130dB using 24-bit mono wav files with 1kHz and 18kHz sine waves.

I would like to try it on the Teensy (not to compete with the current one but to experiment and learn). Unfortunately, I lack experience and knowledge of the audio library structure and the hardware associated. Ideally, I would like to code it in a sketch to start with. However, it seems difficult to cope with fixed-length 128 sample queues and access to the current fractional delay value without diving into the audio library not knowing how to swim.
 
Hi,
yes, there are really 20481 coefficients (32bit float).

(not to compete with the current one but to experiment and learn)

Why not? If your algorithm is, e.g., not slower and doesn't need more memory, but at the same time has lower distortion, then I would replace my implementation with yours if you share your code, at least in my projects.

Unfortunately, I lack experience and knowledge of the audio library structure and the hardware associated
Luckily, it is not too difficult to understand the audio library, and you can draw inspiration from all the classes that already exist and work. When I ported the sample rate conversion to the Teensy, I first implemented a class that contains only the bare algorithm and has nothing to do with the audio library. In a second step I implemented an audio library class based on AudioInputSPDIF3, to which I added the resampling class.
In AudioInputSPDIF3 the two most important functions to understand are probably:
AudioInputSPDIF3::isr(): here the samples are received in chunks from spdif input.
AudioInputSPDIF3::update(): sends out the blocks of audio samples to whatever is connected to the AudioInputSPDIF3

access to the current fractional delay value

I guess you are referring to the estimate of how many samples are currently in the buffer, i.e. the value returned by 'getBufferedTime()' in my implementation?
I basically use the number of samples that are actually in the buffer + (time elapsed since the last isr call)*input frequency (+ low pass filtering) to get this value.
I spent by far the most time on that. In fact, this discussion motivated me to have another look at the problem. In the first implementation I used, for example, micros() to measure time; I have now replaced that with ARM_DWT_CYCCNT, which improved the accuracy of the buffered-time estimate.
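
Written out as a formula in Python, the estimate might look roughly like this. This is only my reading of the description above, not the library code: the function names are hypothetical, the time term is taken as the time elapsed since the last isr call measured in CPU cycles, and the low pass is a simple one-pole smoother with an arbitrary coefficient.

```python
def estimate_buffered_samples(samples_in_buffer, cycles_since_isr, cpu_hz, input_hz):
    """Samples counted in the buffer plus those that arrived since the last isr."""
    seconds_since_isr = cycles_since_isr / cpu_hz
    return samples_in_buffer + seconds_since_isr * input_hz

def low_pass(previous, new_value, alpha=0.05):
    """One-pole smoothing of successive estimates (alpha is an arbitrary choice)."""
    return previous + alpha * (new_value - previous)

# 256 samples counted, 600000 cycles (1 ms at 600 MHz) since the last isr, 48kHz input.
raw = estimate_buffered_samples(256, 600000, 600e6, 48000.0)
smoothed = low_pass(300.0, raw, alpha=0.25)
print(raw, smoothed)
```

A cycle counter gives a much finer time resolution than a microsecond timer, which directly reduces the granularity of the second term.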
 
Actually the current asrc that you have implemented has a very low distortion and I have no intention to try and replace it. I am only trying to validate my studies using real hardware on an asrc with polyphase sinc filter and dynamic coefficients. Up till now I was only able to simulate it in C and Matlab with wav files. I started another thread if you are interested or can make suggestions. Thanx!

Please see: https://forum.pjrc.com/threads/6888...olynomial-approximations-for-its-coefficients
 
As mentioned above, I spent some time in December evaluating the algorithms of the AsyncAudioInputSPDIF3. I was able to improve several points and have finally found some time to share my results here.

The buffer in which incoming samples are stored is constantly monitored, and the step width at which the incoming signal is resampled is continuously adjusted by small amounts in order to prevent buffer underflow and overflow. I improved the estimation of the number of buffered samples, so these adjustments are now smaller. The 'jitter' caused by an inaccurate buffer estimate is therefore reduced. The figures below show the FFT of a 48kHz J-test signal that I sent from one T4 to a T4.1 via spdif.
Overall, the frequency responses don't differ that much:
Old algorithm:
oldAlg.jpg
New algorithm:
newAlg.jpg
If we zoom in and look at the details close to the 12kHz bin, we see that the response is much smoother with the improved algorithm.
Old algorithm:
oldAlgZoom.jpg
New algorithm:
newAlgZoom.jpg
 
The other improvements concern the distortion and noise of the resampling algorithm.

First, I want to briefly describe how I measured THD+N:
I wasn't sure which notch filter to use in order to remove the fundamental. I therefore subtracted a sine wave in the time domain from the output signal of the resampler. The remaining
signal should then contain only noise and distortion.
The 'ground truth' signal that I subtracted is easy to compute, since we know what the output of the resampler should look like. The frequency is obviously known, the phase offset can be computed from the length of the sinc filter of the resampler, and the amplitude can be estimated from the output signal.
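
A minimal Python sketch of this kind of measurement. One deliberate difference from the description above: instead of deriving the phase from the filter length, this sketch estimates both amplitude and phase with a least-squares fit of a sine and a cosine at the known frequency. The function names and the synthetic test signal are made up for the example.

```python
import math

def fit_sine(samples, freq, fs):
    """Least-squares fit of a*sin + b*cos at a known frequency."""
    ss = sc = cc = xs = xc = 0.0
    for n, x in enumerate(samples):
        s = math.sin(2.0 * math.pi * freq * n / fs)
        c = math.cos(2.0 * math.pi * freq * n / fs)
        ss += s * s
        sc += s * c
        cc += c * c
        xs += x * s
        xc += x * c
    det = ss * cc - sc * sc          # 2x2 normal equations
    a = (xs * cc - xc * sc) / det
    b = (xc * ss - xs * sc) / det
    return a, b

def thd_n_db(samples, freq, fs):
    """Power of (signal - fitted fundamental) relative to the fundamental, in dB."""
    a, b = fit_sine(samples, freq, fs)
    fundamental = 0.0
    residual = 0.0
    for n, x in enumerate(samples):
        ref = (a * math.sin(2.0 * math.pi * freq * n / fs)
               + b * math.cos(2.0 * math.pi * freq * n / fs))
        fundamental += ref * ref
        residual += (x - ref) ** 2
    return 10.0 * math.log10(residual / fundamental)

fs = 48000.0
# 1kHz tone with a third harmonic at 1e-5 of its amplitude, 100 full periods.
distorted = [math.sin(2.0 * math.pi * 1000.0 * n / fs)
             + 1e-5 * math.sin(2.0 * math.pi * 3000.0 * n / fs)
             for n in range(4800)]
result_db = thd_n_db(distorted, 1000.0, fs)
print(result_db)
```

For this test signal the residual is just the third harmonic, so the result should be close to -100 dB.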

The improvements are:

1. The window that I use for the sinc filter is a Kaiser window, and computing the Kaiser window is quite expensive. I therefore do not compute the Kaiser window at each coefficient of the sinc function, but only at 1025 positions (for one wing of the symmetric window). The Kaiser window is then linearly interpolated at all the other positions. That seemed reasonable, since the curvature of the Kaiser window is much lower than the curvature of the sinc function. It turned out that 1025 exact evaluations were a bit too few, and I increased the number to 4097. Higher numbers don't seem to lower THD+N any further.
2. In the computation of the output samples, I optimized the order of the multiply-add operations. The sum now starts at the left and right ends of the impulse response of the sinc filter,
where the coefficients approach zero. Therefore, at first only small floating point numbers are added up. Close to the center of the impulse response there are both large and small coefficients,
but the new order still improves the accuracy of the result and lowers the distortion.
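
To get a feeling for point 1, the following Python sketch compares a directly evaluated Kaiser window with a linearly interpolated one for 1025 and 4097 grid points. It is not the library code: the function names are made up, the Bessel function is computed with a plain power series, and beta=10.06 is an assumed value (roughly what the common Kaiser design formula gives for 100 dB attenuation).

```python
import math

def bessel_i0(x):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    term = 1.0
    total = 1.0
    k = 1
    while term > 1e-17 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total

def kaiser(u, beta):
    """Kaiser window on one wing, u in [0, 1] (0 = center, 1 = end of the wing)."""
    return bessel_i0(beta * math.sqrt(max(0.0, 1.0 - u * u))) / bessel_i0(beta)

def kaiser_interpolated(grid, u):
    """Window value from a coarse grid by linear interpolation."""
    pos = u * (len(grid) - 1)
    i = min(int(pos), len(grid) - 2)
    frac = pos - i
    return grid[i] + frac * (grid[i + 1] - grid[i])

def max_interp_error(points, beta, probes=5000):
    """Largest deviation from the exact window over a set of probe positions."""
    grid = [kaiser(i / (points - 1), beta) for i in range(points)]
    worst = 0.0
    for j in range(probes):
        u = (j + 0.5) / probes
        worst = max(worst, abs(kaiser_interpolated(grid, u) - kaiser(u, beta)))
    return worst

beta = 10.06  # assumed value, see the note above
err_1025 = max_interp_error(1025, beta)
err_4097 = max_interp_error(4097, beta)
print(err_1025, err_4097)
```

Since the error of linear interpolation scales with the square of the grid spacing, four times as many grid points should reduce the worst-case error by roughly a factor of 16.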

I tested my changes with various input sample rates from around 44.1kHz to around 192kHz, mostly with odd numbers like 191990Hz, since the exact standard sample rates normally never occur in real-world scenarios. The output sample rates that I tested were 44.1kHz and 48kHz. The improvements in THD+N vary, and in some cases the distortion is lowered by about 20dB. Most of the time THD+N is now in the range of -130dB to -140dB. However, the larger the ratio of input to output sampling frequency, the higher the distortion. Downsampling a 1kHz sine wave from 96kHz to 44.1kHz results in a THD+N of about -120dB.

The figure below shows the improvement achieved by changing the order of the multiply-adds:

input sampling frequency: 95900Hz
output sampling frequency: 48000Hz
sinc filter length: 95

waveform: 1kHz sine wave
THD+N old order: -137 dB
THD+N new order: -144 dB

The orange spectrum shows the result with the old order. The optimized order was used for the green spectrum. The blue spectrum belongs to the 'ground truth' waveform that I used to remove the fundamental in the computation of THD+N.
1khz_95900_2_48k_47hfl_comparison.jpg
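
Why the order of the additions matters in single precision can be shown with a deliberately exaggerated Python example. The values are contrived (one large center tap, many tiny outer taps) and are not real filter coefficients; float32 arithmetic is emulated by rounding after every addition, and the function names are made up for the example.

```python
import struct

def f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def sum_f32(values):
    """Accumulate in the given order, rounding to float32 after each addition."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc

def outside_in(values):
    """Reorder so that the outermost values come first and the center comes last."""
    order = []
    lo, hi = 0, len(values) - 1
    while lo < hi:
        order.append(values[lo])
        order.append(values[hi])
        lo += 1
        hi -= 1
    if lo == hi:
        order.append(values[lo])
    return order

# Contrived products: tiny values on both sides of one large center value.
products = [1e-8] * 5000 + [1.0] + [1e-8] * 5000
exact = 1.0 + 10000 * 1e-8

err_left_to_right = abs(sum_f32(products) - exact)
err_outside_in = abs(sum_f32(outside_in(products)) - exact)
print(err_left_to_right, err_outside_in)
```

In the left-to-right sum, every tiny value added after the large center value is lost to rounding, whereas the outside-in order first collects all the small values and adds the large one last.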
 
Good job! It's more than worthy of a 24-bit audio stream
 