Synchronizing the two I2S ports on Teensy 4.0

Status
Not open for further replies.

PerM

I am working on an SDR project where I need to sample and then synchronously process two IQ IF signals using WM8731 audio codecs. I have the codecs and RF parts working fine, but I have not yet found a way to reliably synchronize the two I2S buses and codecs so that my SDR code gets fed samples taken at exactly the same time from the two codecs. There seems to be an uncertain skew between the sampling of the two codecs from power cycle to power cycle.

Is there a way of making sure that I2S1 and I2S2 of the Teensy 4 start operating at precisely the same clock cycle? I have read through the relevant parts of the processor data sheet, but not found any obvious solution. I tried the following, but it does not seem to resolve the issue:

Code:
  AudioOutputI2S::config_i2s(false);
  AudioOutputI2S2::config_i2s(false);
  // Disable the DMA channels
  AudioInputI2S::dma.disable();
  AudioInputI2S2::dma.disable();
  // Reset the I2S receivers
  uint32_t I2S_RCSR_SR = 1<<24;
  I2S1_RCSR = I2S_RCSR_FR | I2S_RCSR_SR; // page 2000, reset FIFO, software reset
  I2S2_RCSR = I2S_RCSR_FR | I2S_RCSR_SR; // page 2000, reset FIFO, software reset

And later:

Code:
  // Enable I2S receivers
  I2S1_RCSR = I2S_RCSR_RE | I2S_RCSR_BCE | I2S_RCSR_FRDE | I2S_RCSR_FR; // Enable, reset FIFO
  I2S2_RCSR = I2S_RCSR_RE | I2S_RCSR_BCE | I2S_RCSR_FRDE | I2S_RCSR_FR; // Enable, reset FIFO
  // Enable DMA
  AudioInputI2S::dma.enable();
  AudioInputI2S2::dma.enable();

I currently run the two I2S interfaces of the Teensy 4 as masters and the WM8731s as slaves. Here is the part of the schematic showing the codecs:

codecs.png

Would it perhaps be easier to synchronize things if, e.g., only I2S1 were master and its clocks were routed to both codecs, while I2S2 operated as a slave? I do not have much experience with I2S, so I might be missing something obvious.
 
I would not use two different I2S devices, but run the two ADCs in parallel on one I2S device (I2S1 has 4 data lines).
MCLK, BCLK and LRCLK go in parallel to both ADCs.
The data lines are kept separate (on T4.x IN1(=IN1_1), OUT1D(=IN1_2) go to ADC1 and ADC2, OUT1A goes to ADC1).
 
Thanks WMXZ!

You are probably right that that is a better approach. I will look into it and probably patch the board and modify the software.
 
WMXZ, that is a good idea. I have a similar application that I would like to try this on. Where is the reference that shows the full set of 4 input and output pins for the I2S1 on the teensy 4.x?
 
Neal, I think you will need to dig into the big "i.MX RT1060 Processor Reference Manual". Sections 11.7.353 - 11.7.356 tell you to which pins you can mux the SAI1 (I2S1) data input pins. E.g. RX_DATA1 can be muxed to either B1_00 or B0_10. If you look at the Teensy 4.0 schematic, you can see that B0_10 is called pin 6 on the Teensy 4.0, so this should be a usable pin for this function. (Double check since I am writing this in a bit of a hurry.)

I have not looked into which, if any, modifications one needs to do to the audio library to accommodate the additional I2S data inputs.
 
Where is the reference that shows the full set of 4 input and output pins for the I2S1 on the teensy 4.x?

This info can be found in 3 places.

1: The design tool right side documentation panel has the pins listed for each of the I2S objects. While this is several different objects, it is the most direct info about the actual usage of the software.

2: The pinout card shows the digital audio signals in yellow highlight.

card11a_rev3_web.png


However, 3 of the signals can function either as input or output and this card is limited on space, so those pins are described as outputs.

3: The reference manual is the original source for all this info. But it can be difficult to read and the way those 3 special pins work as either input or output is also just barely mentioned.
 
I see. So if I want to use SAI1_TX_DATA1 and SAI1_RX_DATA1 for a second synchronized data channel, I could use teensy 4.1 pin#32 (GPIO_B0_12) and pin# 6 (GPIO_B0_10) respectively, correct? And does
CORE_PIN32_CONFIG = 3;
CORE_PIN6_CONFIG = 3;
take care of setting the correct direction, in or out?
 
Wow. Somehow I totally missed the fact that the Audio Library already had an I2S quad object. That makes things easy. Thanks
 
Thanks everyone who helped!

I patched my board so that both codecs are clocked by I2S1 while the ADCDATA signal from the second codec is connected to pin 6 of the T4. Fortunately, this pin was not used for anything critical. By using the quad I2S input block (or rather a modified version of it that receives 32-bit data from the codecs and emits floats using Audiostream_F32), the random phase offset is gone and I get consistent results when I measure the phase between the two IF signals. By using two inputs on the same I2S block, I also saved a few pins, most importantly the only pin I really thought I needed among the SMD pads on the bottom of the T4. So I can get rid of these somewhat inconvenient pins in the next PCB version. But I also just discovered the Teensy MicroMod (thanks houtson), which seems like an even better fit. At least if/when Sparkfun starts saying it is RoHS compatible (which I would think it might already be).

The Teensy boards are really a joy to work with, in part because of the great libraries, but also thanks to this forum and the helpful people here.
 
I was a bit too hasty in my conclusion of having solved the phase offset problem between the I2S inputs.

It turns out that I often do not get the desired synchronization between the two I2S inputs. After some power-ups I do indeed get zero phase offset, but usually (80-90% of the time) I get an offset which seems to be 32 samples (~0.725 ms). Since the I2S FIFO is 32 words long, I somewhat suspect that it is involved, although this may be a red herring. I have tried various ways to reset the FIFO before DMA is enabled, but with no change in behavior. I have not tried the ordinary input_i2s_quad.cpp, but the code in my version of the begin() method is very similar, except for the setup of the DMA engine that in my case pumps 32-bit words from I2S instead of 16-bit words. I suspect the ordinary input_i2s_quad.cpp has the same issue, although this might not be noticeable in most audio projects.

Here is the code for my begin() method, which I think is where this needs to be fixed.

Code:
void AudioInputI2SQuad_I32_F32::begin(bool transferUsing32bit)
{
	dma.begin(true); // Allocate the DMA channel first

	AudioOutputI2S_F32::sample_rate_Hz = sample_rate_Hz; //these were given in the AudioSettings in the constructor
	AudioOutputI2S_F32::audio_block_samples = audio_block_samples;//these were given in the AudioSettings in the constructor
	// TODO: should we set & clear the I2S_RCSR_SR bit here?
	AudioOutputI2S_F32::config_i2s(transferUsing32bit);

	const int pinoffset = 0; // TODO: make this configurable...
	I2S1_RCR3 = I2S_RCR3_RCE_2CH << pinoffset;
	switch (pinoffset) {
	  case 0:
		CORE_PIN8_CONFIG = 3;
		CORE_PIN6_CONFIG = 3;
		IOMUXC_SAI1_RX_DATA0_SELECT_INPUT = 2; // GPIO_B1_00_ALT3, pg 873
		IOMUXC_SAI1_RX_DATA1_SELECT_INPUT = 1; // GPIO_B0_10_ALT3, pg 873
		break;
	  case 1:
		CORE_PIN6_CONFIG = 3;
		CORE_PIN9_CONFIG = 3;
		IOMUXC_SAI1_RX_DATA1_SELECT_INPUT = 1; // GPIO_B0_10_ALT3, pg 873
		IOMUXC_SAI1_RX_DATA2_SELECT_INPUT = 1; // GPIO_B0_11_ALT3, pg 874
		break;
	  case 2:
		CORE_PIN9_CONFIG = 3;
		CORE_PIN32_CONFIG = 3;
		IOMUXC_SAI1_RX_DATA2_SELECT_INPUT = 1; // GPIO_B0_11_ALT3, pg 874
		IOMUXC_SAI1_RX_DATA3_SELECT_INPUT = 1; // GPIO_B0_12_ALT3, pg 875
		break;
	}
	dma.TCD->SADDR = (void *)((uint32_t)&I2S1_RDR0 + 0 + pinoffset * 4); // 2 -> 0 PM
	dma.TCD->SOFF = 4; // Step 4 bytes for each DMA read
	dma.TCD->ATTR = DMA_TCD_ATTR_SSIZE(2) | DMA_TCD_ATTR_DSIZE(2); // SIZE(1) -> SIZE(2), 2 means 32 bit PM
	dma.TCD->NBYTES_MLOFFYES = DMA_TCD_NBYTES_SMLOE | // Minor loop enabled
	  DMA_TCD_NBYTES_MLOFFYES_MLOFF(-8) |             // Jump back 8 bytes after each minor loop
	  DMA_TCD_NBYTES_MLOFFYES_NBYTES(8);              // Transfer 8 bytes per minor loop (4->8 PM)
	dma.TCD->SLAST = -8;  // Jump back 8 bytes after each major loop
	dma.TCD->DADDR = i2s_rx_buffer;
	dma.TCD->DOFF = 4; // Destination offset, 2 -> 4, write 4 bytes at a time PM
	dma.TCD->CITER_ELINKNO = sizeof(i2s_rx_buffer) / 8; // Major loop count, 8 bytes per minor loop, PM
	dma.TCD->DLASTSGA = -sizeof(i2s_rx_buffer);
	dma.TCD->BITER_ELINKNO = sizeof(i2s_rx_buffer) / 8; // Must be the same as CITER_ELINKNO. PM
	dma.TCD->CSR = DMA_TCD_CSR_INTHALF | DMA_TCD_CSR_INTMAJOR;
	dma.triggerAtHardwareEvent(DMAMUX_SOURCE_SAI1_RX);

	I2S1_RCSR = 0;
	I2S1_RCR3 = I2S_RCR3_RCE_2CH << pinoffset;

	// Attempt to reset the FIFO in case it is causing the common 32-sample delay between channels.
	// Does not seem to help at all...
	I2S1_RCSR = I2S_RCSR_BCE; // Disable receiver while leaving bit clock on
	while(I2S1_RCSR & I2S_RCSR_RE)
	  ; // Wait for end of frame
	I2S1_RCSR = I2S_RCSR_FR; // Reset FIFO

	// Normal register write from input_i2s_quad.cpp
	// Enable receiver and bit clock, enable DMA requests, 
	I2S1_RCSR = I2S_RCSR_RE | I2S_RCSR_BCE | I2S_RCSR_FRDE;

	update_responsibility = update_setup();
	dma.enable();
	dma.attachInterrupt(isr);
}
 
Not sure I understand.
Offset relative to what? 32 samples is a lot.
How did you determine the offset?
 
Sorry if I was unclear and thanks for asking for clarification.

I have two WM8731 codecs. As you suggested, I rewired my board so that both are driven by the I2S1 clocks from the Teensy while the ADCDATA signals are received on different pins on the Teensy (pin 8 and pin 6) by my modified version of the input_i2s_quad block. In my application, I need to measure the phase difference between sinusoidal signals sampled by the two codecs, so it is very important that samples taken at the same time move in lock-step through my code. However, I usually do not get this desirable behavior. Most times, one of the sample streams is 32 samples ahead of the other.

The way I determined the amount of offset between the streams is a little technical, but since you asked:

The application is a kind of 2-channel software defined radio with two different antennas that receive essentially the same RF signal, but with some phase difference.
I have one local oscillator that I mix with both RF signals so that I get intermediate frequency signals in the audio range that can be sampled by the audio codecs.
I have set up a transmitter that sends out a simple unmodulated sinewave which is picked up by the two antennas and mixed down and sampled by the codecs.
My code then does FFTs of the two signals and estimates the frequency (same for both signals of course) and the phase difference between them.
If I then change the LO by say 100 Hz, the frequency seen by the codecs should of course also change by 100 Hz, but the phase difference should remain exactly the same.
Most of the time, however, I see the phase changing by about 26 degrees. 26 degrees of a 100 Hz period is 0.72 ms, which is 32 samples at 44.1 ksps.
So my conclusion is that after most power cycles, the system starts up in such a way that one of the two audio streams from the input_i2s_quad block is delayed by 32 samples compared to the other.
In a few "lucky" instances, the system instead starts up without this "offset" and the phase remains stable when I tune the LO around.

I hope that made my problem and observations more clear. Do not hesitate to ask again if I am still not making sense.

Per
 
If I then change the LO by say 100 Hz, the frequency seen by the codecs should of course also change by 100 Hz, but the phase difference should remain exactly the same.
Most of the time, however, I see the phase changing by about 26 degrees. 26 degrees of a 100 Hz period is 0.72 ms, which is 32 samples at 44.1 ksps.

I am not sure I understand that statement. Should you not relate the time delay to the frequency itself rather than to the change in frequency?

It is not clear whether you measure a phase shift or a time delay.
 
This might not be perfectly intuitive, but the math works out the way I tried to describe. So while I am indeed measuring the change in phase differences between the IF signals for two different LO frequencies, the result can be used to calculate the underlying undesired delay. I am pretty sure I have the math right, but if not I am certainly eager to be proven wrong.

Here is a more detailed description of my reasoning:

From the two mixers, we have two sinusoids of the same frequency, f, but with different phases. Let's call the phase difference "theta". Ignoring the unimportant amplitude and letting one of the sinusoids have zero phase, we have:

a(t) = sin(2*pi*f*t)
b(t) = sin(2*pi*f*t + theta)

This phase difference theta is the same as the original phase difference between the RF signals at the antennas. It is independent of the LO frequency, although the frequency of these IF signals, f, of course depends on the LO frequency. I can show this fact mathematically too if it would help. It is not hard.

So if the two IF signals were sampled and processed without any delay difference, my SDR code would calculate the phase difference between these signals and spit out the correct value theta. Since theta is independent of the LO frequency (and thus of f), the result would be theta regardless of my LO setting. This has happened a few times after power cycling, but usually it does not happen and the observed phase difference becomes dependent on the LO frequency (and on the RF frequency).

Let's see what happens if one signal is delayed by a certain time, dt. The signals now seen by the SDR code are:

ad1(t) = sin(2*pi*f*t) (no change here, the delay is applied to the other signal)
bd1(t) = sin(2*pi*f*(t - dt) + theta) = sin(2*pi*f*t + theta - 2*pi*f*dt)

The SDR code will now see a phase difference of:

d_theta_1 = theta - 2*pi*f*dt.

Now let's see what happens if the LO and therefore f is changed by some amount, df.

ad2(t) = sin(2*pi*(f+df)*t)
bd2(t) = sin(2*pi*(f+df)*(t - dt) + theta) = sin(2*pi*(f+df)*t + theta - 2*pi*(f+df)*dt)

The phase difference between ad2(t) and bd2(t) is now:

d_theta_2 = theta - 2*pi*(f+df)*dt = theta - 2*pi*f*dt - 2*pi*df*dt

Thus, when changing the LO by df, the phase difference observed by the SDR (which should stay the same) instead jumps by:

dph = d_theta_1 - d_theta_2 = 2*pi*df*dt.

This expression does not contain the absolute frequencies, just the change in frequency.

The above equation is easily solved for dt:

dt = dph/(2*pi*df)

I observed the phase changing (dph) by 26 degrees = 26*2*pi/360 radians when changing the LO by df = 100 Hz. Thus:

dt = 26*2*pi/360/(2*pi*100 Hz) ~ 0.722 ms

This amounts to 44100 Hz * 0.722e-3 s ~ 32 samples.
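
As a cross-check of the arithmetic above, here is a minimal C++ sketch of the relation dt = dph/(2*pi*df). The function names are made up for illustration and are not part of any library. Note that a phase jump is only known modulo 360 degrees, so the delay recovered this way is ambiguous by multiples of 1/df (10 ms for df = 100 Hz).

```cpp
#include <cassert>
#include <cmath>

// Delay (seconds) implied by a phase jump of dph_deg degrees that is observed
// when the LO frequency is changed by df_hz: dt = dph / (2*pi*df).
double delay_from_phase_jump(double dph_deg, double df_hz) {
    assert(df_hz != 0.0);
    const double pi = std::acos(-1.0);
    const double dph_rad = dph_deg * pi / 180.0;  // degrees -> radians
    return dph_rad / (2.0 * pi * df_hz);
}

// Convert a delay in seconds to a (fractional) number of samples.
double delay_in_samples(double dt_s, double sample_rate_hz) {
    return dt_s * sample_rate_hz;
}
```

Calling delay_from_phase_jump(26.0, 100.0) and passing the result to delay_in_samples(..., 44100.0) gives a bit under 32 samples, in line with the estimate above.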
 
As I have probably already mentioned, measuring the phase difference between the two RF signals is pretty much the most important function of this project, so it is not just something I added as a quick debugging test. The way I do it in the Teensy code is as follows:

Bandpass-filter the IF signal which is at approximately 12 kHz.
Mix it down to about 700 Hz.
Bandpass-filter it again.
Now there is essentially no signal above 1.5 kHz.
Decimate the 44.1 ksps data stream by 8, i.e. throw away 7 out of 8 samples. This gives a sample rate of 5.5125 ksps.
Collect a batch of 1024 samples (8 audio blocks) at this lower rate.
Apply a gaussian window.
Do an FFT.
Look for the peak in the FFT.
If there is a sufficiently significant peak, continue with the below steps, otherwise wait for a new batch of samples.
Interpolate the FFT bins around the peak to more precisely estimate the frequency of the signal.
Create synthetic sine and cosine waves of this frequency.
Remove the DC level of the windowed signal.
Correlate the DC-free signal with the sine and cosine.
The arctan of the ratio between the correlation factors gives the phase angle of the signal.

I do the above for the signals from both antennas and my estimate for the phase difference between the two signals is obviously just the difference between the respective phases. It works perfectly after some power-ups and I can then change the LO frequency by small amounts without affecting the phase estimates. However, after most power-ups, the estimates change if I nudge the LO or RF frequencies.
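
The last steps of the list above (remove DC, correlate with a synthetic sine and cosine, take the arctan) can be sketched roughly as follows. This is not the actual project code: the function names are hypothetical, the input is assumed to be already windowed, and the phase is defined relative to a cosine, i.e. x[n] ~ A*cos(2*pi*f*n/fs + phase).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Estimate the phase (radians) of a sinusoid of known frequency freq_hz in x
// by correlating the DC-free signal with a synthetic cosine and sine.
double estimate_phase(const std::vector<double>& x, double freq_hz, double fs_hz) {
    assert(!x.empty() && fs_hz > 0.0);
    const double pi = std::acos(-1.0);

    // Remove the DC level.
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(x.size());

    // Correlate with cosine (c) and sine (s) of the estimated frequency.
    double c = 0.0, s = 0.0;
    for (std::size_t n = 0; n < x.size(); ++n) {
        const double w = 2.0 * pi * freq_hz * static_cast<double>(n) / fs_hz;
        const double v = x[n] - mean;
        c += v * std::cos(w);
        s += v * std::sin(w);
    }
    // For x = A*cos(w + p): c ~ (A*N/2)*cos(p) and s ~ -(A*N/2)*sin(p).
    return std::atan2(-s, c);
}

// Wrap a phase difference into the range (-pi, pi].
double wrap_phase(double p) { return std::atan2(std::sin(p), std::cos(p)); }
```

The phase difference between the two antennas would then be wrap_phase(estimate_phase(b, f, fs) - estimate_phase(a, f, fs)). Using atan2 rather than a plain arctan of the ratio keeps the result in the correct quadrant.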

As described at length previously, the phase changes as a function of LO change are consistent with there being an unwanted delay of 32 samples at 44.1 ksps for one of the signal chains.

If the delay had instead been 128 samples, I would have guessed that the signal chains were off by one audio block, but this is not the case.
So the big question is where the mysterious and somewhat random 32-sample delay comes from.
Is it a coincidence that the delay is the same as the FIFO-length of the I2S/SAI block?
If not, how the heck do the FIFOs get out of sync in a way to cause this problem?
Otherwise, what else could cause such a delay?
 
That seems a little bit complex. I will try to understand it.
But if you have not done so yet, you could (I would):
- use a low-frequency signal, say around 100 Hz
- apply an FFT
- select the bin of that frequency
- divide the complex numbers (channel 2 / channel 1)
- take the natural logarithm
- take the imaginary part
- you should get the phase difference in radians

phi = imag(log(exp(1i*(om*t+phi))/exp(1i*om*t)))

This is equivalent to the arctan method, but is only meant to help understand the phase shift.
It requires that you can feed data into both ADCs directly.
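
A minimal C++ sketch of the recipe above, assuming the test tone falls exactly on FFT bin k. To keep the example self-contained, a single DFT bin is computed directly instead of a full FFT. The helper names dft_bin and phase_difference are invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Single DFT bin k of a real signal: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N).
std::complex<double> dft_bin(const std::vector<double>& x, std::size_t k) {
    assert(!x.empty());
    const double pi = std::acos(-1.0);
    const double N = static_cast<double>(x.size());
    std::complex<double> acc(0.0, 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        const double angle = -2.0 * pi * static_cast<double>(k) * static_cast<double>(n) / N;
        acc += x[n] * std::polar(1.0, angle);
    }
    return acc;
}

// Phase of channel 2 relative to channel 1 at bin k: imag(log(X2 / X1)).
double phase_difference(const std::vector<double>& ch1,
                        const std::vector<double>& ch2, std::size_t k) {
    const std::complex<double> x1 = dft_bin(ch1, k);
    const std::complex<double> x2 = dft_bin(ch2, k);
    assert(std::abs(x1) > 0.0);  // channel 1 must contain the test tone
    return std::imag(std::log(x2 / x1));
}
```

The imaginary part of the complex logarithm is the argument of the ratio, so std::arg(x2 / x1) would give the same value.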
 
The frequency shifting and filtering is done by code from the AudioSDR project, https://forum.pjrc.com/threads/5736...software-defined-radio)-processor-demodulator
It is actually even a bit more complicated than what I described since it implements a phasing receiver with a complex input (I and Q, delivered via the left and right audio channels) and a Hilbert filter to get rid of the unwanted side-band, but that is probably beyond the scope of this discussion.
The FFT part and phase estimation is code I have written myself. As I described, I went a step beyond looking at the phase of the strongest bin in the FFT, since the signal is usually not coherent with the FFT length (i.e. it does not have an integer number of periods in the batch of data the FFT operates on). This spreads the energy of the signal over a bunch of FFT bins around the peak, so just looking at the phase of the peak might not give a good estimate of the phase of the underlying signal. With a Gaussian window, there is a simple way of estimating the frequency from the FFT. Here is a comment from my code describing it:

Code:
      // Calculate the peak frequency using the peak estimation method from:
      // https://ccrma.stanford.edu/~jos/sasp/Quadratic_Interpolation_Spectral_Peaks.html
      // When using a Gaussian window, quadratic interpolation of the log of the power 
      // spectrum becomes an exact way of determining the frequency of the peak 
      // (assuming no noise).

Then I use this frequency estimate to create two 90-degree phase shifted sinusoids and correlate with the windowed input data to find the phase of the input signal. It is not too far from what you suggest, but a bit more involved and precise.
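
For readers unfamiliar with the peak estimation mentioned in the code comment above, here is a rough sketch of quadratic interpolation on the log power spectrum, using the standard parabolic formula p = 0.5*(a - c)/(a - 2*b + c) from the linked page. It is not the project's actual code, and the function names and the layout of the power array are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Refine the peak location around bin k by fitting a parabola through the
// log power of bins k-1, k and k+1. Returns a fractional bin index.
double interpolate_peak_bin(const std::vector<double>& power, std::size_t k) {
    assert(k >= 1 && k + 1 < power.size());
    assert(power[k - 1] > 0.0 && power[k] > 0.0 && power[k + 1] > 0.0);
    const double a = std::log(power[k - 1]);
    const double b = std::log(power[k]);
    const double c = std::log(power[k + 1]);
    const double denom = a - 2.0 * b + c;
    assert(denom != 0.0);  // a flat spectrum has no parabolic peak
    const double p = 0.5 * (a - c) / denom;  // offset from bin k, in bins
    return static_cast<double>(k) + p;
}

// Convert a (fractional) bin index to a frequency in Hz.
double bin_to_hz(double bin, double fs_hz, std::size_t fft_len) {
    return bin * fs_hz / static_cast<double>(fft_len);
}
```

With the numbers from this thread (1024-point FFT at 5.5125 ksps), one bin is about 5.4 Hz wide, so the fractional part of the bin index matters for the subsequent correlation step.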

Anyway, after describing the phase estimation method, I realized that a random delay in the frequency shifting in the SDR code could lead to the observed results, so I started looking into that code a bit more. I first found an uninitialized phase state variable, phase_SSB (and also phase_AM which I do not use) in the AudioSDR class. Then I found a stupid error in my own code, namely that I accidentally initialized the two SDR receivers for the two antennas to have different audio filters. After fixing these bugs, my estimate of the phase difference between the two RF signals no longer changes when I change the LO frequency!

This is really good news. The initialization of the I2S block and its FIFO is no longer a suspect.

The bad news is that while the phase estimate no longer changes when I nudge the LO, it still starts up at a different value for each power cycle. :(
So there is another bug lurking, but now I am pretty sure it is hiding in code that I should be able to get on top of. It no longer looks like a random delay, but rather like a random phase is somehow added, even though my inspection of the code has not revealed any further uncontrolled or uninitialized phase.

Many thanks for the discussion so far!
 
I might have solved it or at least found a workaround. I added a state variable to my input_i2s_quad class that controls whether or not the isr() routine should call update_all() and copy data from the DMA buffer.

The state is initialized so that update_all() is not called until the sketch calls the new member function start_streaming(). I am not quite sure why this works, but it seems to work and give consistent phase information after each of at least 30 or so power cycles. By induction it is proven that it will work 100% of the time... :rolleyes:
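
The gating idea can be illustrated with a small host-side mock. This is only a sketch: GatedInput is a hypothetical stand-in for the modified input_i2s_quad class, and the counter marks the place where the real isr() would copy data from the DMA buffer and call update_all().

```cpp
#include <atomic>

// Host-side stand-in for an I2S input object with a streaming gate.
class GatedInput {
public:
    // Called by the sketch once all audio objects are fully configured.
    void start_streaming() { streaming_.store(true); }

    // Stand-in for the DMA interrupt handler.
    void isr() {
        ++interrupts_;
        if (!streaming_.load()) {
            return;  // gate closed: drop the data, do not run the audio graph
        }
        ++updates_;  // placeholder for "copy DMA buffer and call update_all()"
    }

    int interrupts() const { return interrupts_; }
    int updates() const { return updates_; }

private:
    std::atomic<bool> streaming_{false};  // gate is closed until start_streaming()
    int interrupts_ = 0;
    int updates_ = 0;
};
```

In this mock, interrupts that arrive before start_streaming() are counted but never trigger an update, so no downstream block sees data before the sketch has finished its setup.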

Previously, I had tried to do something similar by calling AudioNoInterrupts() at the beginning of setup() and AudioInterrupts() at the end, but this only worked most of the time. This is probably because the audio objects are created before setup() is called, and the streaming that takes place in this brief period can somehow put the SDR into an undesired state that is not properly reset during the configuration of the SDR blocks.

Is there some more elegant and official way of preventing any audio streaming from happening until the sketch tells the AudioStream system that it is time to start?
 
Is there some more elegant and official way of preventing any audio streaming from happening until the sketch tells the AudioStream system that it is time to start?

What I do with my I2S system is to start I2S (set the enable bits) programmatically.
This means you should comment out the line in the audio object and make the call directly from the sketch.
The two lines of interest are:
Code:
	I2S2_RCSR = I2S_RCSR_RE | I2S_RCSR_BCE | I2S_RCSR_FRDE | I2S_RCSR_FR; // page 2099
	I2S2_TCSR |= I2S_TCSR_TE | I2S_TCSR_BCE; // page 2087
 
Interesting. Maybe it would be a good idea to include this as a standard feature in the I2S audio blocks, as it seems that at least the two of us, and presumably more people, sometimes have the need?

I guess one non-breaking way of doing it would be to add a constructor that takes an argument that determines whether the streaming should be enabled as soon as the blocks are created, or whether an explicit call to a new member function (e.g. start_streaming()) is required before the action begins.
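
A sketch of what such a non-breaking interface might look like. The class name and its members are purely hypothetical and do not exist in the Audio Library. The point is only that a defaulted constructor argument keeps existing sketches compiling and behaving as before.

```cpp
// Hypothetical interface sketch; not part of the Teensy Audio Library.
class I2SInputSketch {
public:
    // Defaulting to true preserves the current behavior for existing sketches.
    explicit I2SInputSketch(bool start_immediately = true)
        : streaming_(start_immediately) {}

    // Only needed when the object was constructed with start_immediately = false.
    void start_streaming() { streaming_ = true; }

    bool is_streaming() const { return streaming_; }

private:
    volatile bool streaming_;  // would be read from the ISR in a real object
};
```

An existing sketch that declares `I2SInputSketch in;` would stream right away, while a new sketch could declare `I2SInputSketch in(false);` and call in.start_streaming() at the end of setup().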
 