Setting up custom I2S communication

Status
Not open for further replies.

i2sflew

Active member
Hi everyone,

I want to port an existing synthesizer project from an STM32 to the Teensy. The Audio library looks very powerful, but also quite restrictive.
I'd like to set up my own project with a custom I2S sample rate, DMA transfers, and maybe I2S interrupts. It doesn't seem like much, but I can't figure out a starting point. Which libraries would I need to include to set up the basic clock configuration for the Teensy and to get the register defines for I2S and DMA?

Cheers
 
The best starting point is indeed the audio library.
Understand the code and modify it to your requirements.
For I2S, you need to look into input_i2s and output_i2s.
 
Yeah, I thought I could work with output_i2s, but that also seems to be tightly integrated with the other audio library functions. Every interaction with the library requires audio blocks, and altering the sample rate is not supported. I also don't need most of the functionality the audio library provides, since I have already implemented that myself.
I was hoping to find some header files, and maybe helper functions, that define the DMA and I2S registers so that I could modify them myself, plus some setup code to get the Teensy into a working state (core clock setup, interrupt vector setup, etc.).
 
output_i2s is not at all integrated with the other audio library functions. It needs only the core functions, which you need anyhow for Teensy if you do not want to reinvent everything.
output_i2s is linked to AudioStream.h.
To help you change the sampling rate, we need to know which Teensy you want to use.
 
Okay, I must have missed something while skimming through the module's functions.
I am trying to use 48 kHz with a Teensy 4.0, since I am interfacing with another controller instead of an I2S audio module. My synthesizer functions spit out an int16_t that, on the STM32, I picked up with DMA and wrote to I2S. On the Teensy, however, I have no idea how to interface with the output_i2s module without using audio block types etc.
 
The DMA & I2S code you seek is part of the I2S output object in the audio library, in output_i2s.cpp.

The entire audio library is just a large collection of audio processing objects which pass blocks of 16 bit samples to each other. It's really only all integrated together by a common API for passing blocks in and out (to the other audio objects), and common update() functions which get run automatically by 1 of the input or output objects which has "update responsibility". The common block allocation and communication code is in AudioStream.cpp and AudioStream.h, about 500 lines of code which mostly just manages a pool of memory using linked lists.

If you want chunks of code to demonstrate how to use the I2S and DMA hardware, it's certainly in there. The entire output_i2s.cpp file is only about 500 lines of code. If that seems like too much to digest, you can see from the #ifdef checks about half of the code is for the Teensy 3.x hardware and the other half is for Teensy 4.x. Maybe start with just deleting the half you don't plan to use, so you have a smaller and easier to read file.
 
On the Teensy however I have no idea how to interface with the output_i2s module without using audio block types etc.

Here's a bit more explanation about how the code works. I want to emphasize this is intended to help you (or anyone else who later finds this thread) get started. It's not meant to be step-by-step instructions.

Near the top of output_i2s.cpp you'll see the begin() function. It calls config_i2s(), which turns on the audio clock and sets up the I2S hardware. You can find the config_i2s() near the end of that file. You'll see it's mostly just writing to the many hardware registers to turn on the I2S communication. Those registers are all documented in the reference manual for the chip you're using.

You can see that config_i2s() sets up the clocks and audio sample rate, and that clock config code is very different depending on which Teensy you use. The good news is Teensy 4.0 has highly configurable hardware, so it just takes the actual sample frequency. On the older boards things were much harder, requiring integer divisors from specific clocks.

Once the I2S hardware is turned on, you can see the begin() function sets up a DMA channel which will automatically copy data from the "i2s_tx_buffer" array to "I2S0_TDR0", which is the actual hardware register which causes data to go into the I2S FIFO and ultimately transmit on the data pin. The triggerAtHardwareEvent() function is what causes the DMA channel to actually service the I2S hardware requests for data. You'll also see "dma.attachInterrupt(isr);" which causes the isr() function to run when the DMA controller needs more data copied into the i2s_tx_buffer array.

The "dma" object is a hardware abstraction for the DMA controller, which is defined in DMAChannel.h. In this case, we're really only using DMAChannel to assign which of the DMA hardware channels get used (so we automatically share DMA hardware with other libraries and even other parts of this audio library), but the actual DMA config is done by directly accessing the DMA controller's TCD registers. Those TCD registers are documented in the reference manual.

Right below begin() is that isr() function. Hopefully you can see it's figuring out which half of the buffer the DMA is currently using. Then it just copies a block of audio data into the other half of the i2s_tx_buffer buffer. It's really that simple.

So if you don't want to use AudioStream.h buffer definitions, you would modify that isr() interrupt code to put your data into whichever half of the i2s_tx_buffer the DMA isn't using at that moment. That isr() function gets run by interrupts generated from the DMA controller. It generates those interrupts when it reaches the middle and the end of the buffer, because the DMA channel's CSR register is assigned to "DMA_TCD_CSR_INTHALF | DMA_TCD_CSR_INTMAJOR". (Unless you're using Teensy LC... its less capable DMA controller works differently.)
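To make the half-buffer idea concrete, here is a minimal sketch that runs on a PC rather than on the Teensy. It assumes a buffer of interleaved 16-bit left/right samples; the names `tx_buffer`, `half_to_fill()` and `fill_half()` are invented for illustration and are not part of the audio library. On real hardware the address passed in would come from `dma.TCD->SADDR`, and `next_sample` stands in for your own synth function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sizes for this sketch: 128 stereo frames per half buffer.
constexpr std::size_t kFramesPerHalf = 128;
constexpr std::size_t kWordsPerHalf = kFramesPerHalf * 2;  // left + right
static int16_t tx_buffer[2 * kWordsPerHalf];

// Given the address the DMA is currently reading from, return the start of
// the half that the CPU may safely overwrite (the half the DMA is NOT using).
int16_t* half_to_fill(std::uintptr_t dma_saddr) {
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(tx_buffer);
    assert(dma_saddr >= base && dma_saddr < base + sizeof(tx_buffer));
    const std::uintptr_t middle = base + sizeof(tx_buffer) / 2;
    return (dma_saddr < middle) ? &tx_buffer[kWordsPerHalf] : &tx_buffer[0];
}

// Fill one half with a mono sample stream, duplicated to both channels.
void fill_half(int16_t* dst, int16_t (*next_sample)()) {
    for (std::size_t i = 0; i < kFramesPerHalf; ++i) {
        const int16_t s = next_sample();
        dst[2 * i] = s;      // first channel of the frame
        dst[2 * i + 1] = s;  // second channel of the frame
    }
}
```

In an actual isr() you would clear the DMA interrupt, call something like `half_to_fill()` with the current source address, and then fill that half before the DMA wraps around to it.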

The other key piece of the isr() function is the call to AudioStream::update_all(), if the update_responsibility flag is set. This causes code in AudioStream.cpp to call ALL the update() functions of every audio instance you've created. One of those will be the update() function you can see in this file. If you read that code, you'll see all it does is save a pointer to the block of samples it gets from the rest of the library. That pointer is what the isr() function later uses to actually copy the samples to the i2s_tx_buffer.

How often isr() gets called depends on how often the I2S hardware requests data, which comes from the audio sample rate, and the size of i2s_tx_buffer. The size of i2s_tx_buffer in this code is set to be twice the size of audio blocks from AudioStream.h, so each interrupt will require exactly one of those blocks to fill half of i2s_tx_buffer.
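As a quick worked example of that relationship, the sketch below computes the interrupt rate and period from the sample rate and the number of samples consumed per interrupt. It assumes one interrupt per half buffer and 128 samples per half; the function names are made up for this illustration.

```cpp
#include <cassert>

// Interrupts per second when the DMA raises one interrupt per half buffer.
double isr_rate_hz(double sample_rate_hz, int samples_per_half) {
    assert(sample_rate_hz > 0.0 && samples_per_half > 0);
    return sample_rate_hz / samples_per_half;
}

// Time between two interrupts, in milliseconds.
double isr_period_ms(double sample_rate_hz, int samples_per_half) {
    return 1000.0 / isr_rate_hz(sample_rate_hz, samples_per_half);
}
```

With 128 samples per half buffer at 44.1 kHz this gives roughly 2.9 ms between interrupts. With only one sample per interrupt at 48 kHz, the period shrinks to about 0.0208 ms (about 21 microseconds).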

So, if you want to reuse the low-level I2S and DMA code, but discard all the infrastructure and work of the audio library, you'll probably delete that update function and the AudioStream::update_all() call. In their place, maybe you'll have the isr() function cause your audio code to run? Or maybe you'll have that code store data somehow and modify the isr() function to copy it from whatever buffers your synth code uses to the i2s_tx_buffer array.

How you go about all that is up to you. Whatever approach you take, I hope you'll consider sharing your code or at least anything you learn along the way which might help others who wish to use the code this way. While I can write a lengthy message like this with the basic concepts, it's the real little issues that come up along the way that people who try to follow this path in the future can really use to help. So I hope you'll consider sharing that experience on this thread.
 
Thank you very much. This kind of explanation is what I was hoping to find earlier. As mentioned a little higher up, I am using a Teensy 4.0.
So it seems that writing my calculated values to the I2S registers by DMA is not as hard as I was thinking. (Coming from C, I also haven't really worked with OOP.) Some more questions:
1. The defined CPU speeds all seem to be aimed at having exact frequencies for the audio modules, but I was looking forward to using the full 600 MHz for processing. Where is this set up, and where can the frequency be changed?
2. I guess getting a sampling rate of 48 kHz will not be as easy as setting AUDIO_SAMPLE_RATE_EXACT to 48000.0f, or is the calculation set up to select the best register values for that?
 
Audio clock and CPU clock are independent (different PLLs) on Teensy 4.0.

Look at config_i2s() and set_audioClock() for details.

Main CPU clock is configured by set_arm_clock() in clockspeed.c in the core lib.

Figure 14-2 on pages 1016-1017 in the reference manual gives an overall map of the clock system. We use the PLL4 path for the I2S (SAI) ports.
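For anyone wondering how a sample rate turns into PLL4 settings, here is a rough sketch of the arithmetic, written so it can be checked on a PC. It assumes a 24 MHz reference, a master clock of 256 times the sample rate, a fixed SAI pre-divider of 4, and a PLL kept at or above 27 x 24 MHz. The struct and function names are invented for this example, and the math is done in integers here purely to avoid floating-point rounding surprises.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the values that end up in the clock registers.
struct AudioPllSettings {
    int pre_div;   // SAI clock pre-divider (register field holds pre_div - 1)
    int post_div;  // SAI clock post-divider (register field holds post_div - 1)
    int pll_int;   // integer part of the PLL multiplier
    int pll_num;   // fractional numerator
    int pll_den;   // fractional denominator
};

AudioPllSettings compute_audio_pll(int fs) {
    assert(fs > 0);
    const int64_t ref_hz = 24000000;            // assumed 24 MHz reference
    const int64_t mclk_hz = int64_t{fs} * 256;  // assumed MCLK = 256 * fs
    AudioPllSettings s{};
    s.pre_div = 4;
    // Smallest post-divider that keeps the PLL at or above 27 * 24 MHz.
    s.post_div = static_cast<int>(1 + (ref_hz * 27) / (mclk_hz * s.pre_div));
    const int64_t pll_hz = mclk_hz * s.pre_div * s.post_div;
    s.pll_den = 10000;
    s.pll_int = static_cast<int>(pll_hz / ref_hz);
    s.pll_num = static_cast<int>((pll_hz % ref_hz) * s.pll_den / ref_hz);
    return s;
}

// Master clock that results from the settings, for cross-checking.
int64_t mclk_from(const AudioPllSettings& s) {
    const int64_t scaled = int64_t{s.pll_int} * s.pll_den + s.pll_num;
    return 24000000 * scaled / s.pll_den / (s.pre_div * s.post_div);
}
```

For 48 kHz this works out to a multiplier of 28.672 (PLL at 688.128 MHz), divided by 4 and then 14 to give a 12.288 MHz master clock.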
 
As another example, this time for I2S acquisition (I'm not doing I2S output), here is what I extracted from my actual application:

Code:
#define FSAMP 96000
#define NSAMP 128

// assume to have two audio channels with 3 TDM samples per channel 
#define NPORT_I2S 2
#define NCHAN_I2S 3
#define NBUF_I2S (NPORT_I2S*NCHAN_I2S*NSAMP)

// assume we want keep all I2S channels
#define NCHAN_ACQ 6
#define NBUF_ACQ (NCHAN_ACQ*NSAMP)

// assume we want to digitally shift data by 8 bits (e.g. to bring 24 MSB to LSB)
#define adc_shift 8

// define how I2S channels are arranged in acquisition buffer
#if NPORT_I2S == 1
  static int i2sIndex[NCHAN_ACQ]={0,1,2,3,4,5};
#elif NPORT_I2S ==2
  static int i2sIndex[NCHAN_ACQ]={0,2,4,1,3,5}; 
#endif

static void acq_isr(void);

DMAMEM __attribute__((aligned(32)))
static uint32_t i2s_rx_buffer[2*NBUF_I2S];
static uint32_t acq_rx_buffer[2*NBUF_ACQ];

#ifndef I2S_DMA_PRIO
  #define I2S_DMA_PRIO 5*16
#endif

#include "DMAChannel.h"
static DMAChannel dma;

#if defined(__IMXRT1062__)

    #define IMXRT_CACHE_ENABLED 2 // 0=disabled, 1=WT, 2= WB

    PROGMEM
    void set_audioClock(int nfact, int32_t nmult, uint32_t ndiv, bool force) // sets PLL4
    {
        if (!force && (CCM_ANALOG_PLL_AUDIO & CCM_ANALOG_PLL_AUDIO_ENABLE)) return;

        CCM_ANALOG_PLL_AUDIO = CCM_ANALOG_PLL_AUDIO_BYPASS | CCM_ANALOG_PLL_AUDIO_ENABLE
                    | CCM_ANALOG_PLL_AUDIO_POST_DIV_SELECT(2) // 0: 1/4; 1: 1/2; 2: 1/1
                    | CCM_ANALOG_PLL_AUDIO_DIV_SELECT(nfact);

        CCM_ANALOG_PLL_AUDIO_NUM   = nmult & CCM_ANALOG_PLL_AUDIO_NUM_MASK;
        CCM_ANALOG_PLL_AUDIO_DENOM = ndiv & CCM_ANALOG_PLL_AUDIO_DENOM_MASK;
        
        CCM_ANALOG_PLL_AUDIO &= ~CCM_ANALOG_PLL_AUDIO_POWERDOWN;//Switch on PLL
        while (!(CCM_ANALOG_PLL_AUDIO & CCM_ANALOG_PLL_AUDIO_LOCK)) {}; //Wait for pll-lock
        
        const int div_post_pll = 1; // other values: 2,4
        CCM_ANALOG_MISC2 &= ~(CCM_ANALOG_MISC2_DIV_MSB | CCM_ANALOG_MISC2_DIV_LSB);
        if(div_post_pll>1) CCM_ANALOG_MISC2 |= CCM_ANALOG_MISC2_DIV_LSB;
        if(div_post_pll>3) CCM_ANALOG_MISC2 |= CCM_ANALOG_MISC2_DIV_MSB;
        
        CCM_ANALOG_PLL_AUDIO &= ~CCM_ANALOG_PLL_AUDIO_BYPASS;   //Disable Bypass
        Serial.printf("PLL %f\r\n",24.0f*((float)nfact+(float)nmult/(float)ndiv));
    }

    void acq_init(int fsamp)
    {
        CCM_CCGR5 |= CCM_CCGR5_SAI1(CCM_CCGR_ON);

        // if either transmitter or receiver is enabled, do nothing
        if (I2S1_RCSR & I2S_RCSR_RE) return;
        //PLL:
        int fs = fsamp;
        int ovr = 2*(NCHAN_I2S*32);

        // PLL between 27*24 = 648MHz and 54*24=1296MHz
        int n0 = 26; // targeted PLL frequency (n0*24 MHz) n0>=27 && n0<54
        int n1, n2;
        do
        {   n0++;
            n1=0;
            do
            {   n1++; 
                n2 = 1 + (24'000'000 * n0) / (fs * ovr * n1);
            } while ((n2>64) && (n1<=8));
        } while ((n2>64 && n0<54));
        Serial.printf("fs=%d, n1=%d, n2=%d, %d (>=27 && <54) ", fs, n1,n2,n1*n2*(fs/1000)*ovr/24000);

        double C = ((double)fs * ovr * n1 * n2) / (24000000.0f);
        Serial.printf(" C=%f\r\n",C);
        int c0 = C;
        int c2 = 10'000;
        int c1 =  C * c2 - (c0 * c2);
        set_audioClock(c0, c1, c2, true);

        // clear SAI1_CLK register locations
        CCM_CSCMR1 = (CCM_CSCMR1 & ~(CCM_CSCMR1_SAI1_CLK_SEL_MASK))
            | CCM_CSCMR1_SAI1_CLK_SEL(2); // &0x03 // (0,1,2): PLL3PFD0, PLL5, PLL4

        CCM_CS1CDR = (CCM_CS1CDR & ~(CCM_CS1CDR_SAI1_CLK_PRED_MASK | CCM_CS1CDR_SAI1_CLK_PODF_MASK))
            | CCM_CS1CDR_SAI1_CLK_PRED((n1-1)) // &0x07  // <8
            | CCM_CS1CDR_SAI1_CLK_PODF((n2-1)); // &0x3f // <64

        IOMUXC_GPR_GPR1 = (IOMUXC_GPR_GPR1 & ~(IOMUXC_GPR_GPR1_SAI1_MCLK1_SEL_MASK))
                | (IOMUXC_GPR_GPR1_SAI1_MCLK_DIR | IOMUXC_GPR_GPR1_SAI1_MCLK1_SEL(0));  //Select MCLK


        CORE_PIN23_CONFIG = 3;  //1:MCLK // connected to GPIO0 not used
        CORE_PIN21_CONFIG = 3;  //1:RX_BCLK
        CORE_PIN20_CONFIG = 3;  //1:RX_SYNC
  
        I2S1_RMR = 0;
        I2S1_RCR1 = I2S_RCR1_RFW(4);
        I2S1_RCR2 = I2S_RCR2_SYNC(0) | I2S_TCR2_BCP | I2S_RCR2_MSEL(1)
            | I2S_RCR2_BCD | I2S_RCR2_DIV(0);

        I2S1_RCR4 = I2S_RCR4_FRSZ((NCHAN_I2S-1)) | I2S_RCR4_SYWD(0) | I2S_RCR4_MF
            | I2S_RCR4_FSE | I2S_RCR4_FSD;
        I2S1_RCR5 = I2S_RCR5_WNW(31) | I2S_RCR5_W0W(31) | I2S_RCR5_FBT(31);
  
  #if NPORT_I2S == 1
        I2S1_RCR3 = I2S_RCR3_RCE;
        CORE_PIN8_CONFIG = 3;   //RX_DATA0
        IOMUXC_SAI1_RX_DATA0_SELECT_INPUT = 2; // GPIO_B1_00_ALT3, pg 873
  
        dma.TCD->SADDR = &I2S1_RDR0;
        dma.TCD->SOFF = 0;
        dma.TCD->ATTR = DMA_TCD_ATTR_SSIZE(2) | DMA_TCD_ATTR_DSIZE(2);
        dma.TCD->NBYTES_MLOFFNO = 4;
        dma.TCD->SLAST = 0;
        dma.TCD->DADDR = i2s_rx_buffer;
        dma.TCD->DOFF = 4;
        dma.TCD->CITER_ELINKNO = NBUF_I2S;
        dma.TCD->DLASTSGA = -sizeof(i2s_rx_buffer);
        dma.TCD->BITER_ELINKNO = NBUF_I2S;
        dma.TCD->CSR = DMA_TCD_CSR_INTHALF | DMA_TCD_CSR_INTMAJOR;
  
  #elif NPORT_I2S == 2
        I2S1_RCR3 = I2S_RCR3_RCE_2CH;
        CORE_PIN8_CONFIG = 3;   //RX_DATA0
        CORE_PIN6_CONFIG = 3;   //RX_DATA1
        IOMUXC_SAI1_RX_DATA0_SELECT_INPUT = 2; // GPIO_B1_00_ALT3, pg 873
        IOMUXC_SAI1_RX_DATA1_SELECT_INPUT = 1; // GPIO_B0_10_ALT3, pg 873
  
        dma.TCD->SADDR = &I2S1_RDR0;
        dma.TCD->SOFF = 4;
        dma.TCD->ATTR = DMA_TCD_ATTR_SSIZE(2) | DMA_TCD_ATTR_DSIZE(2);
        dma.TCD->NBYTES_MLOFFYES = DMA_TCD_NBYTES_SMLOE |
                                    DMA_TCD_NBYTES_MLOFFYES_MLOFF(-8) |
                                    DMA_TCD_NBYTES_MLOFFYES_NBYTES(8);
        dma.TCD->SLAST = -8;
        dma.TCD->DADDR = i2s_rx_buffer;
        dma.TCD->DOFF = 4;
        dma.TCD->CITER_ELINKNO = NBUF_I2S;
        dma.TCD->DLASTSGA = -sizeof(i2s_rx_buffer);
        dma.TCD->BITER_ELINKNO = NBUF_I2S;
        dma.TCD->CSR = DMA_TCD_CSR_INTHALF | DMA_TCD_CSR_INTMAJOR;
  #endif
        dma.triggerAtHardwareEvent(DMAMUX_SOURCE_SAI1_RX);
        dma.enable();

        I2S1_RCSR = I2S_RCSR_RE | I2S_RCSR_BCE | I2S_RCSR_FRDE | I2S_RCSR_FR;
        dma.attachInterrupt(acq_isr, I2S_DMA_PRIO); 
    }

    uint32_t acq_count=0;
    void extract(void *out, void *inp);
    void acq_isr(void)
    {
        uint32_t daddr;
        uint32_t *src;

        dma.clearInterrupt();
        daddr = (uint32_t)(dma.TCD->DADDR);
        acq_count++;

        if (daddr < (uint32_t)i2s_rx_buffer + sizeof(i2s_rx_buffer) / 2) {
            // DMA is receiving to the first half of the buffer
            // need to remove data from the second half
            src = &i2s_rx_buffer[NBUF_I2S];
        } else {
            // DMA is receiving to the second half of the buffer
            // need to remove data from the first half
            src = &i2s_rx_buffer[0];
        }

        #if IMXRT_CACHE_ENABLED >=1
            arm_dcache_delete((void*)src, sizeof(i2s_rx_buffer) / 2);
        #endif

        extract((void *) acq_rx_buffer, (void *) src);
    }

#endif

    void extract(void *out, void *inp)
    {   int32_t *dout = (int32_t *) out;
        int32_t *din  = (int32_t *) inp;
        for(int ii=0; ii < NSAMP; ii++)
        {
          for(int jj=0; jj<NCHAN_ACQ; jj++)
          { int32_t *dst=&dout[ii*NCHAN_ACQ];
            int32_t *src=&din[ii*NPORT_I2S*NCHAN_I2S];
            dst[jj]=src[i2sIndex[jj]]>>adc_shift;
          }
        }
    }

void setup() {
  // put your setup code here, to run once:
  acq_init(FSAMP);
}

void loop() {
  // put your main code here, to run repeatedly:
  static uint32_t to=0;
  if(millis()-to>1000)
  {
    Serial.println(acq_count);
    acq_count=0;
    to=millis();
  }
}
Obviously, the program does nothing useful; it only prints the number of acquired buffers per second, which is 750 for a 96 kHz sampling frequency.
 
Thank you both for your support!
I tried to implement the parts I needed:
Code:
void initI2S(void)
{
  dma.begin(true); // Allocate the DMA channel first
  configI2S();


  CORE_PIN7_CONFIG  = 3;  //1:TX_DATA0
  dma.TCD->SADDR = i2s_tx_buffer;
  dma.TCD->SOFF = 2;
  dma.TCD->ATTR = DMA_TCD_ATTR_SSIZE(1) | DMA_TCD_ATTR_DSIZE(1);
  dma.TCD->NBYTES_MLNO = 2;
  dma.TCD->SLAST = -sizeof(i2s_tx_buffer);
  dma.TCD->DOFF = 0;
  dma.TCD->CITER_ELINKNO = sizeof(i2s_tx_buffer) / 2;
  dma.TCD->DLASTSGA = 0;
  dma.TCD->BITER_ELINKNO = sizeof(i2s_tx_buffer) / 2;
  dma.TCD->CSR = DMA_TCD_CSR_INTHALF | DMA_TCD_CSR_INTMAJOR;
  dma.TCD->DADDR = (void *)((uint32_t)&I2S1_TDR0 + 2);
  dma.triggerAtHardwareEvent(DMAMUX_SOURCE_SAI1_TX);
  dma.enable();

  I2S1_RCSR |= I2S_RCSR_RE | I2S_RCSR_BCE;
  I2S1_TCSR = I2S_TCSR_TE | I2S_TCSR_BCE | I2S_TCSR_FRDE;
  dma.attachInterrupt(isr);
}

void isr(void)
{
  uint32_t saddr;
  saddr = (uint32_t)(dma.TCD->SADDR);
  dma.clearInterrupt();
  if (saddr < (uint32_t)i2s_tx_buffer + sizeof(i2s_tx_buffer) / 2) {
    // DMA is transmitting the first half of the buffer
    // so we must fill the second half
    *buf_ptr = i2s_tx_buffer[1];
      } else {
    // DMA is transmitting the second half of the buffer
    // so we must fill the first half
    *buf_ptr = i2s_tx_buffer[0];
  }
  val += 10;
  if(val > 0)
  {
    *buf_ptr = 32767;
  }else
  {
    *buf_ptr = 0;
  }
}

void configI2S(void)
{
  CCM_CCGR5 |= CCM_CCGR5_SAI1(CCM_CCGR_ON);

  // if either transmitter or receiver is enabled, do nothing
  if (I2S1_TCSR & I2S_TCSR_TE) return;
  if (I2S1_RCSR & I2S_RCSR_RE) return;
//PLL:
  int fs = 48000;
  // PLL between 27*24 = 648MHz and 54*24=1296MHz
  int n1 = 4; //SAI prescaler 4 => (n1*n2) = multiple of 4
  int n2 = 1 + (24000000 * 27) / (fs * 256 * n1);

  double C = ((double)fs * 256 * n1 * n2) / 24000000;
  int c0 = C;
  int c2 = 10000;
  int c1 = C * c2 - (c0 * c2);
  set_audioClock(c0, c1, c2);

  // clear SAI1_CLK register locations
  CCM_CSCMR1 = (CCM_CSCMR1 & ~(CCM_CSCMR1_SAI1_CLK_SEL_MASK))
       | CCM_CSCMR1_SAI1_CLK_SEL(2); // &0x03 // (0,1,2): PLL3PFD0, PLL5, PLL4
  CCM_CS1CDR = (CCM_CS1CDR & ~(CCM_CS1CDR_SAI1_CLK_PRED_MASK | CCM_CS1CDR_SAI1_CLK_PODF_MASK))
       | CCM_CS1CDR_SAI1_CLK_PRED(n1-1) // &0x07
       | CCM_CS1CDR_SAI1_CLK_PODF(n2-1); // &0x3f

  // Select MCLK
  IOMUXC_GPR_GPR1 = (IOMUXC_GPR_GPR1
    & ~(IOMUXC_GPR_GPR1_SAI1_MCLK1_SEL_MASK))
    | (IOMUXC_GPR_GPR1_SAI1_MCLK_DIR | IOMUXC_GPR_GPR1_SAI1_MCLK1_SEL(0));

  CORE_PIN23_CONFIG = 3;  //1:MCLK
  CORE_PIN21_CONFIG = 3;  //1:RX_BCLK
  CORE_PIN20_CONFIG = 3;  //1:RX_SYNC

  int rsync = 0;
  int tsync = 1;

  I2S1_TMR = 0;
  //I2S1_TCSR = (1<<25); //Reset
  I2S1_TCR1 = I2S_TCR1_RFW(1);
  I2S1_TCR2 = I2S_TCR2_SYNC(tsync) | I2S_TCR2_BCP // sync=0; tx is async;
        | (I2S_TCR2_BCD | I2S_TCR2_DIV((1)) | I2S_TCR2_MSEL(1));
  I2S1_TCR3 = I2S_TCR3_TCE;
  I2S1_TCR4 = I2S_TCR4_FRSZ((2-1)) | I2S_TCR4_SYWD((32-1)) | I2S_TCR4_MF
        | I2S_TCR4_FSD | I2S_TCR4_FSE | I2S_TCR4_FSP;
  I2S1_TCR5 = I2S_TCR5_WNW((32-1)) | I2S_TCR5_W0W((32-1)) | I2S_TCR5_FBT((32-1));

  I2S1_RMR = 0;
  //I2S1_RCSR = (1<<25); //Reset
  I2S1_RCR1 = I2S_RCR1_RFW(1);
  I2S1_RCR2 = I2S_RCR2_SYNC(rsync) | I2S_RCR2_BCP  // sync=0; rx is async;
        | (I2S_RCR2_BCD | I2S_RCR2_DIV((1)) | I2S_RCR2_MSEL(1));
  I2S1_RCR3 = I2S_RCR3_RCE;
  I2S1_RCR4 = I2S_RCR4_FRSZ((2-1)) | I2S_RCR4_SYWD((32-1)) | I2S_RCR4_MF
        | I2S_RCR4_FSE | I2S_RCR4_FSP | I2S_RCR4_FSD;
  I2S1_RCR5 = I2S_RCR5_WNW((32-1)) | I2S_RCR5_W0W((32-1)) | I2S_RCR5_FBT((32-1));
}

The hardware seems to be configured. I can measure 48 kHz on LRCK. On BCK I measure something around 3.07 MHz, which is double what I configured on my STM32, since there I only transmit 16 bits per sample. I'm not sure how to change that.
Further, I tried to output a kind of square wave by incrementing the val variable until it overflows and setting a value whenever val is positive. Printing it on Serial, I can see that the variable is counting up properly, but connecting an I2S module with a speaker just gives me noise (not white noise). The same module works on the STM; I checked after trying it on the Teensy. Some configuration still seems wrong.
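For reference, the 3.07 MHz figure is consistent with the register values above: two 32-bit words per frame at 48 kHz. The sketch below shows the arithmetic. It assumes a 12.288 MHz master clock (256 x 48 kHz) and the SAI rule that the bit clock is the master clock divided by 2 x (DIV + 1). The function names are invented, and the 16-bit case is only a calculation; the word-width register fields would have to be changed to match, which is not verified here.

```cpp
#include <cassert>
#include <cstdint>

// Bit clock needed for a given frame layout.
uint32_t bclk_hz(uint32_t fs, uint32_t words_per_frame, uint32_t bits_per_word) {
    return fs * words_per_frame * bits_per_word;
}

// DIV field value for the SAI bit clock divider: BCLK = MCLK / (2 * (DIV + 1)).
uint32_t bclk_div_field(uint32_t mclk_hz, uint32_t bclk) {
    assert(bclk > 0 && mclk_hz % (2 * bclk) == 0);
    return mclk_hz / (2 * bclk) - 1;
}
```

With 32-bit words this gives 3.072 MHz and DIV = 1, matching the measurement. With 16-bit words it would be 1.536 MHz and DIV = 3.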
 
Maybe try running some of the audio library examples, just to confirm whether your hardware works with Teensy when using known-good code?
 
I just tried the simpleDrum example, which worked fine. Really loud and clear sounds!
I think it might have to do with the DMA transfers to the I2S DR. I put a digitalWrite into the isr and could read a 48kHz frequency on that pin, so the isr triggers correctly as well.
When I put I2S1_TDR0 = 0; anywhere into the isr I get high pitched ringing, almost inaudible to me because of the high frequency.
 
Better not to interfere with DMA that is running in parallel.

Yeah, this was just to confirm, that a write to the reg would result in any sound at all. That's why I think it has to do with the DMA transfers.
I am wondering if my i2s_tx_buffer setup is correct. I declared it as an int16 array, because that's my sample width. Since I am generating only one sample just in time with every interrupt I set it up as an array with 2 elements, to get the double buffering to work. Might this be what's wrong, or am I misunderstanding how the I2S module works here? When I set the size to something like 4 or 2048 I get different sounds, probably because there are more transfers happening.
 
I put a digitalWrite into the isr and could read a 48kHz frequency on that pin, so the isr triggers correctly as well.

Normally the isr function runs every 128 audio samples, which is about 2.9 ms at 44.1 kHz sample rate. The DMA channel is supposed to generate an interrupt when the transfer reaches the middle and the end of the buffer. Unless you've reduced the transmit buffer size to just 2 samples, which would make DMA pointless, the isr isn't supposed to run at the audio sample rate.

At least you have the known-good code for comparison. Maybe put a similar digitalWrite or digitalToggle into each, so you can see whether they are running differently.
 
Since I am generating only one sample just in time with every interrupt I set it up as an array with 2 elements

If you're interrupting for every audio sample, using DMA is pointless! It only adds a lot of overhead for no real benefit.

Here's a thread where I wrote a non-DMA example which just has the SAI trigger its own interrupt.

https://forum.pjrc.com/threads/62819

These 3 lines from msg #9 on that thread are the secret sauce to make the SAI peripheral generate its own interrupt for each sample...

Code:
  attachInterruptVector(IRQ_SAI1, isr);
  NVIC_ENABLE_IRQ(IRQ_SAI1);
  I2S1_TCSR |= 1<<8;  // start generating TX FIFO interrupts

If you're really going to interrupt every 21us, keep in mind everything else in your project needs to be carefully designed not to disable or delay the SAI interrupt for 20us. Don't forget the USB interrupt runs for several microseconds as USB transfers complete. You might need to configure interrupt priority levels, so your SAI interrupt can't be interrupted by USB and others.
 
If you're interrupting for every audio sample, using DMA is pointless! It only adds a lot of overhead for no real benefit.

I was only intending to use DMA to write the same values for my second channel, since I only really need mono. That way I have twice the time to generate another sample. Generating more samples is not really feasible, since I already ran into timing issues with the 180 MHz clock on the STM.

Just for me to understand correctly: Why does the audio library generate so many samples at once, if you can only transmit two channels with maybe 32 bit per channel?
 
I am very interested in this, and would like to dive a bit deeper if you don't mind, since I would like to optimize my synthesizer program as well.
I looked into the synthesizing function for the sine wave and am trying to understand the workings:
Code:
index = ph >> 24;
val1 = AudioWaveformSine[index];
val2 = AudioWaveformSine[index+1];
scale = (ph >> 8) & 0xFFFF;
val2 *= scale;
val1 *= 0x10000 - scale;
block->data[i] = multiply_32x32_rshift32(val1 + val2, magnitude);
ph += inc;

So you're doing DDS-style sine wave generation, taking only the 8 MSBs of the phase accumulator. Then I'm not really sure what's happening. It looks like you're taking the 16 middle bits and doing some sort of interpolation?
After this I am completely lost, as I have to admit I don't really get DSP functions.
Still, all these operations are done in a loop, for every sample over again. How is this quicker than just doing the same operations when they are needed?
 
Indeed, that code is using 8 bits for a table lookup and the next 16 bits for linear interpolation between the nearest 2 table entries. Then the sum is scaled to the desired amplitude.
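Here is a small standalone sketch of that lookup-plus-interpolation step, in case it helps to see it in isolation. It is not the library's code: the table `kSine` is generated at startup as a stand-in for AudioWaveformSine (assumed here to have 257 entries so that index+1 is always valid), the final magnitude scaling with multiply_32x32_rshift32 is left out, and the result is simply scaled back to 16 bits.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Stand-in sine table: 256 steps per cycle plus one extra entry,
// so that table[index + 1] is valid even when index is 255.
static std::array<int16_t, 257> make_sine_table() {
    std::array<int16_t, 257> t{};
    const double two_pi = 6.283185307179586;
    for (int i = 0; i <= 256; ++i) {
        t[i] = static_cast<int16_t>(std::lround(32767.0 * std::sin(two_pi * i / 256.0)));
    }
    return t;
}

static const std::array<int16_t, 257> kSine = make_sine_table();

// ph is a 32-bit phase accumulator covering one full cycle.
int16_t sine_interp(uint32_t ph) {
    const uint32_t index = ph >> 24;  // top 8 bits pick the table entry
    assert(index + 1 < kSine.size());
    const int32_t scale = static_cast<int32_t>((ph >> 8) & 0xFFFF);  // next 16 bits
    const int32_t v1 = kSine[index] * (0x10000 - scale);  // weight of lower entry
    const int32_t v2 = kSine[index + 1] * scale;          // weight of upper entry
    return static_cast<int16_t>((v1 + v2) / 0x10000);     // back to 16 bits
}
```

The two weights always add up to 0x10000, so the result is a straight line between neighbouring table entries. A phase exactly halfway between entries 0 and 1 returns half of entry 1.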

In terms of efficiency, the really large savings comes when 2, 4 or 8 samples are done per loop and audio data is moved 2 samples at a time. Maybe someday this code will get optimized to compute 2 per loop and store them both using a single packed 32 bit write.

But even doing 1 sample in a loop of 128 is much more efficient than 1 sample per interrupt. The compiler has many optimizations where initialization and loading of registers are pushed outside of the loop, so all that overhead is suffered only once for 128 samples. In this case, the phase accumulator and increment will be retained in registers, so the slow reads from memory are done only once, and the slow write to store the accumulator is only needed after all 128 samples are computed. Implicit variables, like the address of the lookup table and address of the data buffer are also held in registers, so you only suffer the overhead of setting those up once for all 128 samples.

When non-TCM memory is used, the caches help much more for the other 127 iterations. M7's branch prediction also comes into play after the first few times through the loop.
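A sketch of what "once per block instead of once per sample" looks like in code, using a plain sawtooth so it stays short. The `SawOsc` struct and `render_block()` are hypothetical names for this example only. The point is that the oscillator state is read into locals before the loop and written back once afterwards, which is the pattern the compiler can keep in registers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical oscillator state that lives in memory between blocks.
struct SawOsc {
    uint32_t phase = 0;
    uint32_t inc = 0;
};

// Render n samples of a sawtooth into out.
void render_block(SawOsc& osc, int16_t* out, std::size_t n) {
    assert(out != nullptr);
    uint32_t ph = osc.phase;       // state loaded once per block
    const uint32_t inc = osc.inc;  // increment loaded once per block
    for (std::size_t i = 0; i < n; ++i) {
        // Top 16 bits of the phase, shifted to the signed 16-bit range.
        out[i] = static_cast<int16_t>(static_cast<int32_t>(ph >> 16) - 32768);
        ph += inc;
    }
    osc.phase = ph;                // state stored once per block
}
```

Calling this once with n = 128 pays the load and store of the oscillator state a single time. Calling a per-sample version from an interrupt would pay it 128 times, plus the interrupt entry and exit each time.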
 
Thanks for the info, I didn't know any of that.
Maybe I should just switch over to the Teensy Audio library after all for my synthesizer, since my own code reaches its limit with 6 voices with 2 oscillators each and some amplitude and LFO calculations. With this library there is at least enough headroom for more voices and some effects.
 
To get a really quick estimate, maybe try running the chiptune example. I believe it's in Audio > Synthesis > PlaySynthMusic. It uses 16 oscillators and 16 envelope effects, and it prints CPU usage info while playing.
 