DMA multiplexing, Teensy 3.6

Status
Not open for further replies.

ozeng

Member
I want to sample a 4-channel analog-to-digital converter at 20 kHz over SPI (I'm sampling at 32-bit resolution, so I can't use the onboard ADC). There is a single chip select but the channels are muxed by GPIOs. I would like to sample the ADC using DMA; is there a way to synchronize the DMA timer to toggle the GPIO pins to switch every 32 SPI clocks? In addition, can I set the SPI word size to 32? If this is possible, how complicated is the implementation? We are considering reading the SPI manually instead. I currently have DMA working with 8-bit SPI words using https://github.com/crteensy/DmaSpi.
 
I want to sample a 4-channel analog-to-digital converter at 20 kHz over SPI (I'm sampling at 32-bit resolution, so I can't use the onboard ADC).
What is the converter?

There is a single chip select but the channels are muxed by GPIOs.
How? 4 pins? 2 pins?

I would like to sample the ADC using DMA; is there a way to synchronize the DMA timer to toggle the GPIO pins to switch every 32 SPI clocks?

Maybe. What's the exact timing? Do you have restrictions on pin usage? Are the FTM timers used? Can you use SPI0?

In addition, can I set the SPI word size to 32?
Not exactly (SPI frames on this chip are at most 16 bits). Why? Byte reordering is possible with the DMA controller.

What is the memory layout of the transferred data supposed to be?
 
ADC: LTC2500 http://cds.linear.com/docs/en/datasheet/250032f.pdf
mux: http://www.analog.com/media/en/technical-documentation/data-sheets/ADG608_609.pdf

I think the mux uses two GPIO pins to represent a two-bit integer.

I don't have many restrictions on pin usage; many GPIOs are free, and we have access to SPI0. We are sampling at 5 kHz, with 4 channels per sample and 32 bits per channel, so the GPIOs should switch at 20 kHz. I believe we're not using the FTM timers (I'm not sure what they are). I'm guessing it would be a good idea for the GPIO and DMA/SPI timing to come from a single source? Otherwise, over time, the GPIO switching and the SPI reads might drift out of sync.

The SPI word size is relevant because I think the chip select is cycled after every SPI word, but we want chip select held low for each whole 32-bit sample. If we are using GPIOs for the mux, however, we could also drive chip select from a GPIO.

Ideally, the four channels of a sample would lie contiguously in memory, in a buffer of samples stored one after the other. A circular buffer of samples would be nice to have, since we want continuous streaming of ADC reads.
 
I would use 2 FTM timers. FTM0, running at 5 kHz, is responsible for the GPIO switching. Two 50% PWM signals with a 90 degree phase offset generate a 2-bit Gray code (00 10 11 01) that you can use to switch the mux. The FTM timers let you pair channels, which gives full control over the start and end of the PWM signal: the first channel of the pair switches the output pin high, and the second switches it low (e.g. channel pair 1 runs from 0 - 50%, channel pair 2 runs from 25% - 75%). Look at the K66 manual, '45.5.8 Combine mode'.

Use a second FTM module (e.g. FTM1) to generate the 20 kHz ADC conversion start pulse. You can start both FTM modules at the same time using the global time base feature (manual, '45.5.28 Global time base'). If the second FTM module runs at an exact multiple of the first one's frequency, they will stay in sync.

You can have the falling edge of the ADC busy signal (conversion complete) trigger an interrupt or a DMA transfer; GPIO pins can be configured for either. However, there is only a single DMA trigger source per GPIO port. Another option is to rely on fixed timing: configure a second FTM1 channel in Output Compare mode and have it trigger either an interrupt or DMA.

If you are using an interrupt and SPI0 (which has 4-word RX and TX FIFOs), you can simply set up the SPI transfer for 16-bit frames and write two dummy doublewords to PUSHR (manual, '57.4.7 PUSH TX FIFO Register In Master Mode') with the right upper half: 'Continuous Peripheral Chip Select Enable' in the first, 'End Of Queue' in the second, and PCS in both. The ADC seems to be plenty fast, so at a 30 MHz SPI clock you could simply wait the roughly 1 us it takes to transfer 32 bits.

Alternatively, enable an interrupt on the 'End Of Queue' flag (manual, '57.4.6 DMA/Interrupt Request Select and Enable Register (SPIx_RSER)'). You can then pick up 2 words (2 * 16 bit) from POPR ('57.4.9 POP RX FIFO Register (SPIx_POPR)'). If you are using interrupts, just rely on the FIFO and don't use DMA. 40'000 interrupts/s (or 20'000) is not a lot for a Teensy 3.6...

You can also do everything with DMA. However, that is much harder to debug and to recover from errors.
 
Thanks so much! You're a real hero.

I'll look into the docs you sent. Fingers crossed that I can figure out how these modules work.
 
The DMA transfer isn't too bad, actually. The tricky part is that the SPI module doesn't want to run continuously without intervention, so the first DMA transfer in the chain (dma_start_spi) clears the SPI status flags by writing to SPI0_SR. Its completion then triggers the TX transfer.

This code transfers data from SPI0 into a ring buffer, triggered by a falling edge on pin 32 (connected to pin 31 for testing). SPI0 pins 11 and 12 are connected together as a loopback for testing.

This relies on the 4-word SPI FIFOs. It wouldn't work on SPI1/SPI2.

Code:
#include <DMAChannel.h>
#include <array>
#include <SPI.h>

const uint8_t spi_cs_pin = 15;   // pin 15 SPI0 chip select
const uint8_t trigger_pin = 32;  // PTB11

// CTAS(1) is configured for 16-bit SPI word transfer
std::array<uint32_t, 2> spi_tx_src;

const size_t buffer_size = 8;
std::array<volatile uint32_t, buffer_size> spi_rx_dest;

DMAChannel dma_start_spi;
DMAChannel dma_rx;
DMAChannel dma_tx;


uint32_t dummy;
uint32_t start_spi_sr = 0xFF0F0000;

auto& serial = Serial;
auto& spi = SPI;
SPISettings spi_settings(10, MSBFIRST, SPI_MODE0);

void setup() {
    serial.begin(115200);
    delay(2000);
    serial.println("Starting.");

    spi.begin();

    pinMode(spi_cs_pin, OUTPUT);
    digitalWriteFast(spi_cs_pin, HIGH);
    auto kinetis_spi_cs = spi.setCS(spi_cs_pin);
    spi_tx_src = { 
        SPI_PUSHR_PCS(kinetis_spi_cs) | SPI_PUSHR_CONT | SPI_PUSHR_CTAS(1) | 0x4242,
        SPI_PUSHR_PCS(kinetis_spi_cs) | SPI_PUSHR_EOQ | SPI_PUSHR_CTAS(1) | 0x4343u,
    };
    
    spi.beginTransaction(spi_settings);

    // for testing, pin 31 connected to 32, send t in serial monitor to initiate SPI transfer
    const uint8_t trigger_pin_out = 31;
    pinMode(trigger_pin_out, OUTPUT);
    digitalWriteFast(trigger_pin_out, LOW);

    pinMode(trigger_pin, INPUT_PULLUP);
    volatile uint32_t *pin_config = portConfigRegister(trigger_pin);
    *pin_config |= PORT_PCR_IRQC(0b0010); // DMA on falling edge

    dma_start_spi.sourceBuffer(&start_spi_sr, sizeof(start_spi_sr));
    dma_start_spi.destination(KINETISK_SPI0.SR);
    // triggered by pin 32, port B
    dma_start_spi.triggerAtHardwareEvent(DMAMUX_SOURCE_PORTB);
    
    dma_tx.TCD->SADDR = spi_tx_src.data();
    dma_tx.TCD->ATTR_SRC = 2; // 32-bit read from source
    dma_tx.TCD->SOFF = 4;
    // transfer both 32-bit entries in one minor loop
    dma_tx.TCD->NBYTES = 8;
    dma_tx.TCD->SLAST = -sizeof(spi_tx_src); // go back to beginning of buffer
    dma_tx.TCD->DADDR = &KINETISK_SPI0.PUSHR;
    dma_tx.TCD->DOFF = 0;
    dma_tx.TCD->ATTR_DST = 2; // 32-bit write to dest
    // one major loop iteration
    dma_tx.TCD->BITER = 1;
    dma_tx.TCD->CITER = 1;
    dma_tx.triggerAtCompletionOf(dma_start_spi);

    dma_rx.source((uint16_t&) KINETISK_SPI0.POPR);
    dma_rx.destinationBuffer((uint16_t*) spi_rx_dest.data(), sizeof(spi_rx_dest));
    dma_rx.triggerAtHardwareEvent(DMAMUX_SOURCE_SPI0_RX);

    SPI0_RSER = SPI_RSER_RFDF_RE | SPI_RSER_RFDF_DIRS; // DMA on receive FIFO drain flag
    SPI0_SR = 0xFF0F0000;

    dma_rx.enable();
    dma_tx.enable();
    dma_start_spi.enable();

    uint32_t dma_rx_pos = uint32_t(dma_rx.destinationAddress()); // last seen RX DMA write position
    while(true) {
        if(serial.available()) {
            char c = serial.read();
            if(c == 't') {
                digitalWriteFast(trigger_pin_out, HIGH);
                delay(1);
                digitalWriteFast(trigger_pin_out, LOW);
            }
        }
        if(uint32_t(dma_rx.destinationAddress()) != dma_rx_pos) {
            dma_rx_pos = (uint32_t) dma_rx.destinationAddress();
            if(dma_rx_pos % 4 == 0) { // only print finished transfer
                serial.printf("rx buf: %x   dma ptr: %x   delta: %u\n",
                    uint32_t(spi_rx_dest.data()), dma_rx_pos, 
                    dma_rx_pos - uint32_t(spi_rx_dest.data()));
                for(size_t i = 0; i < spi_rx_dest.size(); i++) serial.printf("%8x ", spi_rx_dest[i]);
                serial.println();
            }
        }
    }
}

void loop() {}
 
Wow, that's awesome! This stuff is legendary.

One question:
This code is interrupt-dependent, correct (at least, it failed when I did noInterrupts())? If we have a competing interrupt, could that cause us to drop a byte? E.g. if SPI0_POPR is full and the interrupt doesn't fire immediately, would we drop bits? An offset error would be pretty bad for us. Or, maybe if multiple interrupts are queued, it would try to read SPI0_POPR twice in a row, where the second read would hang because the register only carries 32 bits? Essentially, how careful do we have to be about the offsets and interrupts?
 
This code is interrupt-dependent, correct (at least, it failed when I did noInterrupts())?
No, it's not. What do you mean by "it failed"? The SPI transfer to the buffer will still occur, but various other things won't work (like USB serial I/O), so you can't press 't' to trigger a transfer. A push button wired to pin 32 will trigger the transfer with interrupts disabled (I have tried it).

The code doesn't use interrupts at all. The DMA chaining is there to avoid having to use them. The DMA controller runs independently; disabling interrupts doesn't affect it.
 
Each time @tni posts some code, I feel like a nobody, given his deep knowledge, understanding and coding skills. That guy is just brilliant!

Indeed - each posted piece is a lesson (or three). I coded in the dark ages when that wasn't available - or was 'C' only - unless ASM was called for. ... and then there is the understanding of the hardware detail...
 
So I cooked up something based on this code: https://forum.pjrc.com/threads/24992-phase-correct-PWM?styleid=2
I think it does what you described earlier, except FTM0 uses four channels. I didn't know how to get paired channels (e.g. FTM0_CHAN2 and FTM0_CHAN3) to be phase-offset. For me, when one is high, the other is always low. But FTM0_CHAN0 and FTM0_CHAN2 are phase-offset by setting FTMx_CnV.

A few things:
1. When I comment out FTM0_CONF |= FTM_CONF_GTBEOUT;, I don't get any pulses. I believe that is working as intended, as this lets FTM0 have the source counter?
2. The rising edge of the 20 kHz wave does not exactly match the edge of the 5 kHz wave. Why does this happen (also, am I doing it right)? Also, how would I configure the offsets? I think the 20 kHz wave should probably go high slightly after the 5 kHz wave updates.
3. The MOD values on FTM0 and FTM1 are 11999 and 2999, respectively. First, why is that, if 5 kHz divides evenly into 180 MHz, the Teensy clock? Second, will this cause the phase difference to change over time, since the MODs are not multiples of each other? My understanding is that the timers share a counter and each independently takes the modulo of it.
4. The first FTM wave is not at 5 kHz; it's slightly slower. Why does that happen?

Code:
void init_FTM0();

void setup()   {
  Serial.begin(115200);
  delay(2000);
  Serial.println("Starting");
  init_FTM0();
}

void loop()
{
  //noInterrupts();
}

void init_FTM0(){

 FTM0_SC = 0;
 FTM1_SC = 0;

 analogWriteFrequency(22, 5000); // FTM0 channel 0
 analogWriteFrequency(3, 20000); // FTM1 channel 0
 FTM0_POL = 0;                  // Positive Polarity
 FTM0_OUTMASK = 0xFF;           // Use mask to disable outputs
 FTM0_CNTIN = 0;                // Counter initial value
 FTM0_COMBINE = 0x00003333;     // COMBINE=1, COMP=1, DTEN=1, SYNCEN=1
 FTM0_MODE = 0x01;              // Enable FTM0
 FTM0_SYNC = 0x02;              // PWM sync @ max loading point enable
 uint32_t mod = FTM0_MOD;
 uint32_t mod2 = FTM1_MOD;
 Serial.printf("mod %d, mod2 %d\n", mod, mod2);
 Serial.printf("clock source %d, 2 %d\n", FTM0_SC, FTM1_SC);
 FTM0_C0V = mod/4;                  // Combine mode, pulse-width controlled by...
 FTM0_C1V = mod * 3/4;           //   odd channel.
 FTM0_C2V = 0;                  // Combine mode, pulse-width controlled by...
 FTM0_C3V = mod/2;           //   odd channel.
 FTM0_SYNC |= 0x80;             // set PWM value update
 FTM0_C0SC = 0x28;              // PWM output, edge aligned, positive signal
 FTM0_C1SC = 0x28;              // PWM output, edge aligned, positive signal
 FTM0_C2SC = 0x28;              // PWM output, edge aligned, positive signal
 FTM0_C3SC = 0x28;              // PWM output, edge aligned, positive signal
 
 FTM0_CONF = ((FTM0_CONF | FTM_CONF_GTBEEN) & ~(FTM_CONF_GTBEOUT));             // GTBEOUT 0 and GTBEEN 1
 FTM1_CONF = ((FTM1_CONF | FTM_CONF_GTBEEN) & ~(FTM_CONF_GTBEOUT));             // GTBEOUT 0 and GTBEEN 1
 //FTM1_CONF |= FTM_CONF_GTBEOUT;             // GTBEOUT 1
 FTM0_CONF |= FTM_CONF_GTBEOUT;             // GTBEOUT 1
 FTM0_CNT = 0;
 //FTM1_CNT = 0;

 CORE_PIN22_CONFIG = PORT_PCR_MUX(4) | PORT_PCR_DSE | PORT_PCR_SRE;    //config teensy output port pins
 CORE_PIN23_CONFIG = PORT_PCR_MUX(4) | PORT_PCR_DSE | PORT_PCR_SRE;   //config teensy output port pins
 CORE_PIN9_CONFIG = PORT_PCR_MUX(4) | PORT_PCR_DSE | PORT_PCR_SRE;    //config teensy output port pins
 CORE_PIN10_CONFIG = PORT_PCR_MUX(4) | PORT_PCR_DSE | PORT_PCR_SRE;   //config teensy output port pins


 FTM0_OUTMASK = 0x0;            // Turns on PWM output
 analogWrite(3, 128);
}

Logic analyzer output: https://imagebin.ca/v/3OBpp21fACvR
Zoomed in version: https://imagebin.ca/v/3OBpiTWAoU5x

Channel 0 in logic is pin 9, FTM0_CHAN2 (5khz)
Channel 1 in logic is pin 10, FTM0_CHAN3 (5khz)
Channel 2 in logic is pin 22, FTM0_CHAN0 (5khz)
Channel 3 in logic is pin 23, FTM0_CHAN1 (5khz)
Channel 5 is pin 3, FTM1_CHAN0 (20khz)
 
Also, I realize now that this setup depends on the ADC ready pin going low consistently. Is there a way to check whether the DMA missed a sample? Dropping a sample shifts all our axes.
 
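I didn't know how to get paired channels (e.g. FTM0_CHAN2 and FTM0_CHAN3) to be phase-offset. For me, when one is high, the other is always low.
That's how it works. You need two match values, one for the start and one for the stop, and each timer channel provides one. (With COMP=1 in FTM0_COMBINE, the odd channel outputs the complement of the even one, which is why the two pins of a pair are always opposite.)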

A few things:
1. When I comment out FTM0_CONF |= FTM_CONF_GTBEOUT;, I don't get any pulses. I believe that is working as intended, as this lets FTM0 have the source counter?
2. The rising edge of the 20 kHz wave does not exactly match the edge of the 5 kHz wave. Why does this happen (also, am I doing it right)? Also, how would I configure the offsets? I think the 20 kHz wave should probably go high slightly after the 5 kHz wave updates.
You didn't follow the procedure in the manual ("45.5.28.1 Enabling the global time base (GTB)") exactly. The timers must be disabled first, then you do the configuration, then you start them via 'FTM_CONF_GTBEOUT'.

3. The MOD values on FTM0 and FTM1 are 11999 and 2999, respectively. First, why is that, if 5 kHz divides evenly into 180 MHz, the Teensy clock?

The timers have MOD + 1 count states; they count from 0 to MOD. And they count F_BUS cycles (60 MHz), not CPU cycles.

Second, will this cause the phase difference to change over time, since the MODs are not multiples of each other?
They are multiples and they must be. You are not counting the states correctly. :)

My understanding is that the timers share a counter and each independently takes the modulo of it.
No. Each timer module (e.g. FTM0 or FTM1) has its own MOD and its own counter, and they count independently. If you started them properly, they count the same bus cycles and stay in sync. The GTB start is a one-time thing; you can't touch the counters afterwards.

4. The first FTM wave is not at 5 kHz; it's slightly slower. Why does that happen?
Is it? The MOD looks correct: 60'000'000 / 12'000 = 5000.

Also, I realize now that this setup depends on the ADC ready pin going low consistently. Is there a way to check whether the DMA missed a sample? Dropping a sample shifts all our axes.

As long as everything runs at nice even multiples (e.g. at 180 MHz the CPU clock is 3x the bus clock, so they stay in sync), you can just look at micros() (which is based on a CPU cycle counter) and calculate the expected sample number from there. If a trigger was missed, the actual count will be lower. Of course, keep in mind that things run in parallel and the numbers change while you are looking at them...
 
Update: it works?! Thanks again for the advice.

Logic: https://imagebin.ca/v/3OPzRUBDVfnq
Channel 0 is pin 9, which is the first 5khz pin to go high.
Channel 7 is pin 13, which goes high right after I kick off the FTM timers.

Now I'll add a small offset so that the rising edge of the 20 kHz wave falls squarely within one of the combinations (00, 01, 11, 10). My only question is why the 20 kHz wave starts three periods before the other channels. It looks like every channel is missing its first period.

Code:
void init_FTM0();

void setup()   {
  Serial.begin(115200);
  delay(2000);
  Serial.println("Starting");
  pinMode(13, OUTPUT);
  init_FTM0();
}

void loop()
{
  delay(5000);
  uint32_t a = FTM0_CNT;
  uint32_t b = FTM1_CNT;
  uint32_t c = FTM0_CNT;
  Serial.printf("FTM0 %d, FTM1 %d, FTM0 after %d\n", a, b, c);
}

void init_FTM0(){

 FTM0_SC = 0;
 FTM1_SC = 0;

 analogWriteFrequency(22, 5000); // FTM0 channel 0
 analogWriteFrequency(3, 20000); // FTM1 channel 0
 FTM0_POL = 0;                  // Positive Polarity
 FTM0_OUTMASK = 0xFF;           // Use mask to disable outputs
 FTM0_CNTIN = 0;                // Counter initial value
 FTM0_COMBINE = 0x00003333;     // COMBINE=1, COMP=1, DTEN=1, SYNCEN=1
 FTM0_MODE = 0x01;              // Enable FTM0
 FTM0_SYNC = 0x02;              // PWM sync @ max loading point enable
 uint32_t mod = FTM0_MOD;
 uint32_t mod2 = FTM1_MOD;
 Serial.printf("mod %d, mod2 %d\n", mod, mod2);
 Serial.printf("clock source %d, 2 %d\n", FTM0_SC, FTM1_SC);
 FTM0_C0V = mod/4;                  // Combine mode, pulse-width controlled by...
 FTM0_C1V = mod * 3/4;           //   odd channel.
 FTM0_C2V = 0;                  // Combine mode, pulse-width controlled by...
 FTM0_C3V = mod/2;           //   odd channel.
 FTM0_SYNC |= 0x80;             // set PWM value update
 FTM0_C0SC = 0x28;              // PWM output, edge aligned, positive signal
 FTM0_C1SC = 0x28;              // PWM output, edge aligned, positive signal
 FTM0_C2SC = 0x28;              // PWM output, edge aligned, positive signal
 FTM0_C3SC = 0x28;              // PWM output, edge aligned, positive signal
 FTM0_OUTMASK = 0x0;            // Turns on PWM output
 analogWrite(3, 128);           // FTM1 50% duty cycle
 
 FTM0_CONF = ((FTM0_CONF | FTM_CONF_GTBEEN) & ~(FTM_CONF_GTBEOUT));             // GTBEOUT 0 and GTBEEN 1
 FTM1_CONF = ((FTM1_CONF | FTM_CONF_GTBEEN) & ~(FTM_CONF_GTBEOUT));             // GTBEOUT 0 and GTBEEN 1
 //FTM1_CONF |= FTM_CONF_GTBEOUT;             // GTBEOUT 1

 CORE_PIN22_CONFIG = PORT_PCR_MUX(4) | PORT_PCR_DSE | PORT_PCR_SRE;    //config teensy output port pins
 CORE_PIN23_CONFIG = PORT_PCR_MUX(4) | PORT_PCR_DSE | PORT_PCR_SRE;   //config teensy output port pins
 CORE_PIN9_CONFIG = PORT_PCR_MUX(4) | PORT_PCR_DSE | PORT_PCR_SRE;    //config teensy output port pins
 CORE_PIN10_CONFIG = PORT_PCR_MUX(4) | PORT_PCR_DSE | PORT_PCR_SRE;   //config teensy output port pins
 
 FTM0_CNT = 0;
 FTM1_CNT = 0;

 FTM0_CONF |= FTM_CONF_GTBEOUT;             // GTBEOUT 1
 digitalWriteFast(13, HIGH);

}

Serial output: the FTM counters match!
Starting
mod 11999, mod2 2999
clock source 136, 2 136
FTM0 45, FTM1 47, FTM0 after 49
FTM0 1450, FTM1 1452, FTM0 after 1454
FTM0 3206, FTM1 208, FTM0 after 3210
FTM0 4871, FTM1 1873, FTM0 after 4875
FTM0 6628, FTM1 630, FTM0 after 6632
 