Issues when modifying input_i2s.cpp

Status
Not open for further replies.

PerM

Active member
I am trying to make a version of input_i2s.cpp for Teensy 4 that would allow me to capture all 24 bits from a WM8731 audio codec and shift the bits left by a selectable number of steps before using the resulting 16 MSBs as ordinary 16-bit Teensy audio. The application is an SDR radio and I sometimes receive very weak signals where the MSBs are not exercised, while it would be great to not chop off all of the eight LSBs that are not captured by the ordinary 16-bit Teensy audio routines. (I also use the same codec to drive head phones, but I do not intend to change from 16 bits on the DAC side.)

I have not gotten this to work despite a few days of troubleshooting, reading the existing source code as well as the IMXRT1060 datasheet sections for I2S and DMA. I did put a Serial.println() in the ISR and I can see that it gets called between 1900 and 2000 times and then it does not get called any more. The loop() function in the sketch does get called. At least if I power cycle the board after reprogramming it.

While writing this post, I discovered the https://github.com/WMXZ-EU/microSoundRecorder project where 32-bit audio is captured. It seems to do very similar changes to the DMA setup as I have done, so I think I am on the right track. Although that project is for Teensy 3.6, not 4.0, so some of the changes are not directly transferable.

This is what I have done so far:

In control_wm8731.cpp I changed from 16-bit to 32-bit mode:

Code:
write(WM8731_REG_INTERFACE, 0x4E); // I2S, 32 bit, MCLK master (was 0x42, 16-bit)

This does not seem to break anything as there are plenty of unused bits in the I2S frames and the 16 bits per sample that were sent before remains where they were (MSB aligned), while extra bits are sent in previously unused clock cycles. So this is compatible with the normal audio routines.

The big changes that break things are in my version of input_i2s.cpp (called input_i2s_32.cpp). I made a new class called AudioInputI2S_32 for this purpose.

  • I added the new class as a friend class in AudioOutputI2S so that I could call AudioOutputI2S::config_i2s();
  • New constant for the buffer size, which has to be twice as big as before:
    Code:
    const static int32_t I2S_RX_BUFFER_SIZE = 2*AUDIO_BLOCK_SAMPLES; 
    DMAMEM __attribute__((aligned(32))) static uint32_t i2s_rx_buffer[I2S_RX_BUFFER_SIZE];
  • Changed SADDR from I2S1_RDR0+2 to I2S1_RDR0+0 since we want all 32 bits. (Not so sure about this change.)
  • Change ATTR SSIZE from 1 (16 bits) to 2 (32 bits)
  • Increase NBYTES_MLNO from 2 to 4.
  • Increase DOFF from 2 to 4.
  • Set CITER and BITER_ELINKNO to buffer size divided by 4 instead of 2.
  • Add the bit shifting (right shift by 16 to initially get the same result as before) in the do/while loop in the isr.
  • Removed the slave support since I do not intend to use it.

I also made the same change to ATTR_SSIZE(2) in output_i2s.c. Not sure it is necessary

As mentioned, the symptoms is that the ISR gets called almost 2000 times and then it stops getting called. loop() gets called, but possibly much less frequently. After about 25 minutes of this, the ISR routine gets called about 80 times again and after that nothing more seems to happen.

Here is the code from my new file, input_i2s_32.cpp where most of the changes I described above are located:

Code:
#include <Arduino.h>
#include "input_i2s_32.h"
#include "output_i2s.h"

const static int32_t I2S_RX_BUFFER_SIZE = 2*AUDIO_BLOCK_SAMPLES; // PM, 2*32 bits per stereo sample
//const static int32_t I2S_RX_BUFFER_SIZE = AUDIO_BLOCK_SAMPLES; // PM, 2*16 bits per stereo sample

DMAMEM __attribute__((aligned(32))) static uint32_t i2s_rx_buffer[I2S_RX_BUFFER_SIZE]; // PM
audio_block_t * AudioInputI2S_32::block_left = NULL;
audio_block_t * AudioInputI2S_32::block_right = NULL;
uint16_t AudioInputI2S_32::block_offset = 0;
bool AudioInputI2S_32::update_responsibility = false;
DMAChannel AudioInputI2S_32::dma(false);


void AudioInputI2S_32::begin(void)
{
	dma.begin(true); // Allocate the DMA channel first

	//block_left_1st = NULL;
	//block_right_1st = NULL;

	// TODO: should we set & clear the I2S_RCSR_SR bit here?
	AudioOutputI2S::config_i2s();

#if defined(KINETISK)
	CORE_PIN13_CONFIG = PORT_PCR_MUX(4); // pin 13, PTC5, I2S0_RXD0
	dma.TCD->SADDR = (void *)((uint32_t)&I2S0_RDR0 + 2);
	dma.TCD->SOFF = 0;
	dma.TCD->ATTR = DMA_TCD_ATTR_SSIZE(1) | DMA_TCD_ATTR_DSIZE(1);
	dma.TCD->NBYTES_MLNO = 4; // PM
	dma.TCD->SLAST = 0;
	dma.TCD->DADDR = i2s_rx_buffer;
	dma.TCD->DOFF = 4; // PM
	dma.TCD->CITER_ELINKNO = sizeof(i2s_rx_buffer) / 4; // PM
	dma.TCD->DLASTSGA = -sizeof(i2s_rx_buffer);
	dma.TCD->BITER_ELINKNO = sizeof(i2s_rx_buffer) / 4; // PM
	dma.TCD->CSR = DMA_TCD_CSR_INTHALF | DMA_TCD_CSR_INTMAJOR;
	dma.triggerAtHardwareEvent(DMAMUX_SOURCE_I2S0_RX);

	I2S0_RCSR |= I2S_RCSR_RE | I2S_RCSR_BCE | I2S_RCSR_FRDE | I2S_RCSR_FR;
	I2S0_TCSR |= I2S_TCSR_TE | I2S_TCSR_BCE; // TX clock enable, because sync'd to TX

#elif defined(__IMXRT1062__)
	CORE_PIN8_CONFIG  = 3;  //1:RX_DATA0
	IOMUXC_SAI1_RX_DATA0_SELECT_INPUT = 2; // Use pin D8 on Teensy 4 as I2S ADCDATA

	dma.TCD->SADDR = (void *)((uint32_t)&I2S1_RDR0 + 0); // 2 -> 0 PM 
	dma.TCD->SOFF = 0; // 0: always read from the same address
	dma.TCD->ATTR = DMA_TCD_ATTR_SSIZE(2) | DMA_TCD_ATTR_DSIZE(1); // SSIZE(1) -> SSIZE(2) PM
	dma.TCD->NBYTES_MLNO = 4; // 2 -> 4, read 4 bytes at a time PM
	dma.TCD->SLAST = 0;
	dma.TCD->DADDR = i2s_rx_buffer;
	dma.TCD->DOFF = 4; // Destination offset, 2 -> 4, read 4 bytes at a time PM
	dma.TCD->CITER_ELINKNO = sizeof(i2s_rx_buffer) / 4; // 2 -> 4 PM
	dma.TCD->DLASTSGA = -sizeof(i2s_rx_buffer);
	dma.TCD->BITER_ELINKNO = sizeof(i2s_rx_buffer) / 4; // 2 -> 4 PM
	dma.TCD->CSR = DMA_TCD_CSR_INTHALF | DMA_TCD_CSR_INTMAJOR;
	dma.triggerAtHardwareEvent(DMAMUX_SOURCE_SAI1_RX);

	I2S1_RCSR = I2S_RCSR_RE | I2S_RCSR_BCE | I2S_RCSR_FRDE | I2S_RCSR_FR;
#endif
	update_responsibility = update_setup();
	dma.enable();
	dma.attachInterrupt(isr);
}

void AudioInputI2S_32::isr(void)
{
	uint32_t daddr, offset;
	const int32_t *src, *end; // int16 -> int32 PM
	int16_t *dest_left, *dest_right;
	audio_block_t *left, *right;
	static int32_t count = 0; // PM debug

#if defined(KINETISK) || defined(__IMXRT1062__)
	daddr = (uint32_t)(dma.TCD->DADDR);
	dma.clearInterrupt();
	Serial.print("isr"); // PM debug

	if (daddr < (uint32_t)i2s_rx_buffer + sizeof(i2s_rx_buffer) / 2) {
		// DMA is receiving to the first half of the buffer
		// need to remove data from the second half
		src = (int32_t *)&i2s_rx_buffer[I2S_RX_BUFFER_SIZE/2]; // int16 -> int32 PM
		end = (int32_t *)&i2s_rx_buffer[I2S_RX_BUFFER_SIZE]; // int16 -> int32 PM
		if (AudioInputI2S_32::update_responsibility) AudioStream::update_all();
	} else {
		// DMA is receiving to the second half of the buffer
		// need to remove data from the first half
		src = (int32_t *)&i2s_rx_buffer[0]; // 16 -> 32 PM
		end = (int32_t *)&i2s_rx_buffer[I2S_RX_BUFFER_SIZE/2]; // 16 -> 32 PM
	}
	left = AudioInputI2S_32::block_left;
	right = AudioInputI2S_32::block_right;
	if (left != NULL && right != NULL) {
		offset = AudioInputI2S_32::block_offset;
		if (offset <= AUDIO_BLOCK_SAMPLES/2) {
			dest_left = &(left->data[offset]);
			dest_right = &(right->data[offset]);
			AudioInputI2S_32::block_offset = offset + AUDIO_BLOCK_SAMPLES/2;
			arm_dcache_delete((void*)src, sizeof(i2s_rx_buffer) / 2);
			do {
			  // Copy and convert to 16-bit values
			  *dest_left++ = (*src)>>16; // PM
			  src++; // PM
			  *dest_right++ = (*src)>>16; // PM
			  src++; // PM
			} while (src < end);
		}
	}
	Serial.print(" exit ISR "); // PM debug
	Serial.println(count++);
#endif
}



void AudioInputI2S_32::update(void)
{
	audio_block_t *new_left=NULL, *new_right=NULL, *out_left=NULL, *out_right=NULL;

	// allocate 2 new blocks, but if one fails, allocate neither
	new_left = allocate();
	if (new_left != NULL) {
		new_right = allocate();
		if (new_right == NULL) {
			release(new_left);
			new_left = NULL;
		}
	}
	__disable_irq();
	if (block_offset >= AUDIO_BLOCK_SAMPLES) {
		// the DMA filled 2 blocks, so grab them and get the
		// 2 new blocks to the DMA, as quickly as possible
		out_left = block_left;
		block_left = new_left;
		out_right = block_right;
		block_right = new_right;
		block_offset = 0;
		__enable_irq();
		// then transmit the DMA's former blocks
		transmit(out_left, 0);
		release(out_left);
		transmit(out_right, 1);
		release(out_right);
		//Serial.print(".");
	} else if (new_left != NULL) {
		// the DMA didn't fill blocks, but we allocated blocks
		if (block_left == NULL) {
			// the DMA doesn't have any blocks to fill, so
			// give it the ones we just allocated
			block_left = new_left;
			block_right = new_right;
			block_offset = 0;
			__enable_irq();
		} else {
			// the DMA already has blocks, doesn't need these
			__enable_irq();
			release(new_left);
			release(new_right);
		}
	} else {
		// The DMA didn't fill blocks, and we could not allocate
		// memory... the system is likely starving for memory!
		// Sadly, there's nothing we can do.
		__enable_irq();
	}
}

It is probably very hard to pinpoint the error from this information, but general suggestions on what more to look into and how to continue troubleshooting would be much appreciated.
 
At the beginning of the update function, I think you should return if you can't allocate both blocks, otherwise you may try to work with NULL pointers later on in the function.
Code:
	if (new_left != NULL) {
		new_right = allocate();
		if (new_right == NULL) {
			release(new_left);
			new_left = NULL;
                        return;
		}
	}

[+edit] Hmmm. Maybe not. Looks like it handles that situation. How many blocks do you allocate with AllocMem in setup() ?

Pete
 
Hi Pete,

Thanks for your reply.

That part of the code is taken without modification from input_i2s.cpp. I tried to add a return as you suggest (even another one for the case that allocation of new_left failed), but it made no difference.

I allocate 50 audio blocks at the top of setup(). When my code is running with the original i2s routine, the peak usage is about half of that.

The main loop occasionally prints out the audio stats and it looks like nothing is being used with my modified library:

Audio processor usage max:0.50
Audio memory usage max:0
Audio processor usage:0.50
Audio memory usage:0

I noticed that if I call the begin() method of my i2s input instance, I get another thousand or so calls to the ISR routine before it stops. I do not think one is supposed to call begin() explicitly as it does not seem to be in any of the audio examples I have looked at.

Another observation is that when I add a Serial.println to the original ISR in input_i2s.cpp, I get no printouts before an early printout in my setup(), while my modified ISR does execute a number of times before the printout from setup(). Maybe I am failing to do something that should prevent the DMA from starting before everything else that needs to be set up really has been set up? But so far I have been unable to see any difference in this regard between the original code and mine.
 
Some more findings:

It seems like the DMA is for some reason running amok and not stopping at the end of the destination buffer. I added the following printout to the ISR to check the destination address of the DM

Code:
Serial.print("isr "); // PM debug
Serial.print(((uint8_t*)dma.TCD->DADDR) - (uint8_t*)&i2s_rx_buffer[0]);  // PM debug
Serial.print(" ");  // PM debug
Serial.println(count++); // PM debug

I.e. it prints out the number of bytes between the DMA destination address and the start of the buffer. If I add this to the original input_i2s.cpp file, it toggles between 0 and 256 (except for some transient case in the beginning where it is at 200), but with my code, it just ramps up and up and up. After some initial irregular behavior (like for the original routine), it settles to a pattern where two values are the same, then it jumps by 1024 etc. Like this:

isr 8192 6
isr 8192 7
isr 9216 8
isr 9216 9
isr 10240 10
isr 10240 11

So something is fishy in my DMA setup.

---

I think I found it. If I change not only from SSIZE(1) to SSIZE(2), but also DSIZE(1) to DSIZE(2), the DMA no longer runs out of control.

Code:
dma.TCD->ATTR = DMA_TCD_ATTR_SSIZE(2) | DMA_TCD_ATTR_DSIZE(2);

I had made the stupid assumption that destination in this case had something to do with i2s as a destination (output to headphones), but it is rather the destination of the DMA write (my buffer) which is of course also 32 bits per write. So now that part works! Feels like a major step forward.

The next issue immediately popped up: I still only get 16-bit values (lower 16 bits are 0) into the buffer... There should be 24 bits toggling. More debugging ahead.
 
The 16-bit values (instead of 24) was due to a mistake I made in the control_WM8731.cpp. I had changed to 32-bit mode in the master enable method, while I am using it in slave mode. The corrected code in the appropriate enable metod is:

Code:
write(WM8731_REG_INTERFACE, 0x0E); // I2S, 32 bit, MCLK slave

Now I can see that the SDR routines are getting proper data again. E.g. a waterfall plot works. However, my headphones are now completely silent. They are driven by the same WM8731. Maybe I need to make changes in output_i2s.cpp now that the codec is in 32-bit mode.
 
Hi PerM, just to give you a heads up, the SNR of the ADC for the WM8731 is spec'd at 90dB. To convert this to bits, divide by 6db. 90 / 6 = 15 bits. (NOTE: 6db = a factor of 2, ie. one-bit).

So the WM8731 will have 15 bits of signal and 1-bit of noise in 16-bit mode. The SGTL5000 is even worse at 85dB SNR, meaning 14-bits of signal and 2-bits of noise.

Normally this is more than adequate for lots of audio applications, as humans can't really distinguish things that are more than about 80 dB apart (they hear the louder signal, not the quieter noise).

If you put the WM8731 in 24-bit mode, you will get no more than 15 bits of signal, and 9 bits of noise. So 24-bits on these mid-grade codecs won't gain you anything. It's provided in the CODECs like WM8731 and SGTL5000 for audio workflow purposes. So, you setup the codec for 24-bit mode, right-shift out the lower 8 bits to a 16-bit value in a 24-bit processing flow to give yourself more headroom.

Here is a great article explaining how 24-bit audio is used in production.
https://www.provideocoalition.com/understanding-24-bit-vs-16-bit-audio-production-distribution/

The super-high quality, studio grade codecs tend to have a max SNR of about 120 dB I think, or about 20-bits of signal, 4-bits of noise. And then you only get that when you've got all your equipment running on expensive power signal conditioners, etc.

In summary, 24-bit mode in mid-grade CODECs is to support 24-bit processing workflows, not to provide valid 24-bit data from the ADC. Those extra bits are all noise unless you go to a higher spec codec. And then, you only achieve those higher SNRs with some seriously good PCB and analog power supply design.
 
Hi Blackaddr,

Thanks for your explanations. I mostly agree with your analysis and I do not expect any drastic improvements in weak signal SNR through my changes in the I2S functions. The one point that may make it worth shifting one or two bits when the signal is weak is that it pushes the quantization noise down a little. The theoretical maximum SNR of a perfect 16-bit ADC is (as you surely know) 96 dB. So if you have 90 dB of SNR using 24-bit data and then throw away the 8 LSBs, there will be a small contribution (about 1 dB) of additional noise.

I do not intend to always do the bit shift. Just when I am receiving weak signals and getting an extra dB of SNR then could actually be of value.

There are no plots in the WM8731 datasheet of the noise density over frequency. One could assume it is flat, but who knows? I use a low IF of 13 kHz in my SDR and am only interested in a small bandwidth around this frequency. If it turns out that there is a lucky coincidence and the noise density is lower than average in this band, the bit shifting would be even more helpful. But I admit this is unlikely.

Also, there seems to be no SNR information for the WM8731 when running it at 44.1 kSPS rather than 48. Many ADCs have better SNR when sampling slower. Not sure it applies to WM8731.

A confusing (for me at least) additional factor is the A-weighting used in the datasheet. I know what it is, but not how and if it affects the value of using more than 16 bits of the ADC data. I think it gives a better SNR banner spec than a flat weighting since the band above 8 kHz is attenuated so any noise there does not count as much. This would speak against any benefit from bit shifting.

In a future board revision, I will probably include a wider range of analog IF gain adjustability.

Per
 
Status
Not open for further replies.
Back
Top