DMA Chaining using Teensy 4.1

andretolba

New member
Hello,
I'm trying to scan a 40 x 40 pressure sensor matrix using a Teensy 4.1.
I would like to drive the whole scan with chained DMA.
My circuit uses multiplexers to select the columns, which I read through a single ADC channel,
and shift registers to select the rows.

Here is my attempt, which I haven't tested. My question is: is this possible?


C++:
#include <ADC.h>
#include <DMAChannel.h>
#include <AnalogBufferDMA.h>

#define ROW_COUNT 40
#define COLUMN_COUNT 40
#define TOTAL_READINGS (ROW_COUNT * COLUMN_COUNT)
#define SCAN_FREQUENCY 160000  // 160kHz ADC sampling

// Hardware addresses for direct DMA access
#define GPIO1_DR_ADDR 0x401B8000            // GPIO1 Data Register
#define GPIO2_DR_ADDR 0x401BC000            // GPIO2 Data Register
#define PIT_LDVAL0_ADDR 0x40084100          // PIT Timer 0 Load Value
#define PIT_TCTRL0_ADDR 0x40084108          // PIT Timer 0 Control

// DMAMUX source definitions for hardware triggering
#define DMAMUX_SOURCE_PIT0 60    // PIT Timer 0 trigger
#define DMAMUX_SOURCE_ALWAYS 62  // Always on trigger

#define MUX_GPIO_PORT GPIO1_DR_ADDR    // Mux control pins
#define SHIFT_GPIO_PORT GPIO2_DR_ADDR  // Shift register pins

// Pre-calculated GPIO patterns for each row/column
DMAMEM static volatile uint32_t mux_patterns[ROW_COUNT] __attribute__((aligned(32)));
DMAMEM static volatile uint32_t shift_patterns[COLUMN_COUNT] __attribute__((aligned(32)));

// Data buffers - properly aligned for DMA
DMAMEM static volatile uint16_t adc_buffer_1[TOTAL_READINGS] __attribute__((aligned(32)));
DMAMEM static volatile uint16_t adc_buffer_2[TOTAL_READINGS] __attribute__((aligned(32)));
volatile uint8_t processed_buffer[TOTAL_READINGS] __attribute__((aligned(32)));

// Global state variables
volatile uint32_t g_completed_samples = 0;
volatile uint32_t g_current_row = 0;
volatile uint32_t g_current_col = 0;
volatile bool g_frame_ready = false;
volatile bool g_scanning = false;

class HardwareChainedScanner {
private:
  // DMA channels (DMAChannel allocates the actual channel numbers at begin())
  DMAChannel pit_trigger_dma;     // Paced by the PIT timer; its ISR advances the scan
  DMAChannel adc_trigger_dma;     // Intended to trigger ADC conversions (not configured below)
  DMAChannel mux_control_dma;     // Writes shift_patterns to the shift-register GPIO port
  DMAChannel row_advance_dma;     // Writes the shift-register clock pulse (HIGH, then LOW)

  // ADC setup
  ADC* adc;
  AnalogBufferDMA* abdma;
  
  // Static pointer to access instance from ISR
  static HardwareChainedScanner* instance;

  // Timing and control
  volatile uint32_t pit_reload_value;
  volatile uint32_t samples_collected = 0;

public:
  HardwareChainedScanner() : adc(nullptr), abdma(nullptr) {
    g_completed_samples = 0;
    g_frame_ready = false;
    g_scanning = false;
    instance = this;  // Set static instance pointer
  }

  ~HardwareChainedScanner() {
    if (adc) delete adc;
    if (abdma) delete abdma;
  }

  void setup() {
    Serial.println("Initializing Hardware-Chained DMA Scanner...");
    
    // Initialize ADC
    adc = new ADC();
    adc->adc0->setAveraging(1);
    adc->adc0->setResolution(12);
    adc->adc0->setConversionSpeed(ADC_CONVERSION_SPEED::VERY_HIGH_SPEED);
    adc->adc0->setSamplingSpeed(ADC_SAMPLING_SPEED::VERY_HIGH_SPEED);
    adc->adc0->enableInterrupts(ADC_0);

    // Initialize AnalogBufferDMA
    abdma = new AnalogBufferDMA(adc_buffer_1, TOTAL_READINGS, adc_buffer_2, TOTAL_READINGS);
    abdma->init(adc, ADC_0);

    // Setup hardware components
    setupGPIOPins();
    setupPITTimer();
    calculateControlPatterns();
    setupSynchronizedDMA();

    Serial.println("Hardware-triggered DMA scanner initialized successfully");
  }

private:
  void setupGPIOPins() {
    // Configure GPIO pins for mux control (assuming pins 4-10 on GPIO1)
    IOMUXC_SW_MUX_CTL_PAD_GPIO_EMC_06 = 5; // GPIO1_IO06 (pin 4)
    IOMUXC_SW_MUX_CTL_PAD_GPIO_EMC_07 = 5; // GPIO1_IO07 (pin 5)
    IOMUXC_SW_MUX_CTL_PAD_GPIO_EMC_08 = 5; // GPIO1_IO08 (pin 6)
    IOMUXC_SW_MUX_CTL_PAD_GPIO_EMC_09 = 5; // GPIO1_IO09 (pin 7)
    IOMUXC_SW_MUX_CTL_PAD_GPIO_EMC_10 = 5; // GPIO1_IO10 (pin 8)
    IOMUXC_SW_MUX_CTL_PAD_GPIO_EMC_11 = 5; // GPIO1_IO11 (pin 9)
    IOMUXC_SW_MUX_CTL_PAD_GPIO_EMC_12 = 5; // GPIO1_IO12 (pin 10)

    // Configure as outputs
    GPIO1_GDIR |= (0x7F << 4); // Set pins 4-10 as outputs

    // Configure shift register pins on GPIO2 (assuming pins 2-3)
    IOMUXC_SW_MUX_CTL_PAD_GPIO_EMC_04 = 5; // GPIO2_IO04 (pin 2 - data)
    IOMUXC_SW_MUX_CTL_PAD_GPIO_EMC_05 = 5; // GPIO2_IO05 (pin 3 - clock)
    GPIO2_GDIR |= (0x3 << 2); // Set pins 2-3 as outputs

    Serial.println("GPIO pins configured for mux and shift register control");
  }

  void setupPITTimer() {
    // Enable PIT clock
    CCM_CCGR1 |= CCM_CCGR1_PIT(CCM_CCGR_ON);

    // Calculate timer reload value for desired frequency
    pit_reload_value = (24000000 / SCAN_FREQUENCY) - 1;  // 24MHz IPG clock

    // Configure PIT Timer 0
    PIT_MCR = 0;                    // Enable PIT module
    PIT_LDVAL0 = pit_reload_value;  // Set reload value
    PIT_TCTRL0 = PIT_TCTRL_TEN | PIT_TCTRL_TIE;  // Enable timer and interrupts

    Serial.print("PIT Timer configured for ");
    Serial.print(SCAN_FREQUENCY);
    Serial.println(" Hz sampling rate");
  }

  void calculateControlPatterns() {
    // Calculate multiplexer patterns for each row
    for (int row = 0; row < ROW_COUNT; row++) {
      uint32_t pattern = GPIO1_DR & ~(0x7F << 4); // Preserve other pins

      // Determine which mux and channel (assuming 16-channel muxes)
      int target_mux = row / 16;  // 0, 1, or 2
      int channel = row % 16;

      // Set inhibit pins (pins 8, 9, 10) - disable all first
      pattern |= (0x7 << 8);  // All inhibits HIGH
      pattern &= ~(1 << (8 + target_mux));  // Enable target mux

      // Set channel address (pins 4, 5, 6, 7)
      pattern |= ((channel & 0x0F) << 4);

      mux_patterns[row] = pattern;
    }

    // Calculate shift register patterns
    for (int col = 0; col < COLUMN_COUNT; col++) {
      uint32_t pattern = GPIO2_DR & ~(0x3 << 2); // Preserve other pins
      
      // Data bit (pin 2) - HIGH for first position, LOW for shift
      if (col == 0) {
        pattern |= (1 << 2);  // Data HIGH for first column
      }
      // Clock will be pulsed separately
      
      shift_patterns[col] = pattern;
    }

    Serial.println("Control patterns calculated for all rows and columns");
  }

  void setupSynchronizedDMA() {
    // For this complex scanning pattern, we'll use a simpler approach
    // that works with AnalogBufferDMA and adds hardware timing control
    
    // === DMA Channel 0: PIT Timer triggers sample advancement ===
    pit_trigger_dma.begin(true);
    pit_trigger_dma.triggerAtHardwareEvent(DMAMUX_SOURCE_PIT0);
    
    // Create a trigger value to advance sampling
    static volatile uint32_t sample_trigger = 1;
    pit_trigger_dma.TCD->SADDR = (void*)&sample_trigger;
    pit_trigger_dma.TCD->SOFF = 0;  // No source increment
    pit_trigger_dma.TCD->ATTR = DMA_TCD_ATTR_SSIZE(2) | DMA_TCD_ATTR_DSIZE(2); // 32-bit
    pit_trigger_dma.TCD->NBYTES_MLNO = 4;
    pit_trigger_dma.TCD->SLAST = 0;
    
    // Destination: dummy register (we'll use interrupt to trigger actions)
    static volatile uint32_t dummy_dest;
    pit_trigger_dma.TCD->DADDR = (void*)&dummy_dest;
    pit_trigger_dma.TCD->DOFF = 0;
    pit_trigger_dma.TCD->CITER_ELINKNO = TOTAL_READINGS;  // Total samples needed
    pit_trigger_dma.TCD->DLASTSGA = 0;
    pit_trigger_dma.TCD->BITER_ELINKNO = TOTAL_READINGS;
    pit_trigger_dma.TCD->CSR = DMA_TCD_CSR_INTMAJOR;  // Interrupt on completion
    
    pit_trigger_dma.attachInterrupt(sampleTriggerISR);
    pit_trigger_dma.enable();

    // === DMA Channel 1: Column multiplexer control ===
    mux_control_dma.begin(true);
    
    // Set up for cycling through column patterns
    mux_control_dma.TCD->SADDR = (void*)shift_patterns;
    mux_control_dma.TCD->SOFF = 4;  // Move to next pattern
    mux_control_dma.TCD->ATTR = DMA_TCD_ATTR_SSIZE(2) | DMA_TCD_ATTR_DSIZE(2); // 32-bit
    mux_control_dma.TCD->NBYTES_MLNO = 4;
    mux_control_dma.TCD->SLAST = -COLUMN_COUNT * 4;  // Reset to start
    
    mux_control_dma.TCD->DADDR = (void*)SHIFT_GPIO_PORT;
    mux_control_dma.TCD->DOFF = 0;
    mux_control_dma.TCD->CITER_ELINKNO = COLUMN_COUNT;
    mux_control_dma.TCD->DLASTSGA = 0;
    mux_control_dma.TCD->BITER_ELINKNO = COLUMN_COUNT;
    mux_control_dma.TCD->CSR = DMA_TCD_CSR_INTMAJOR;

    mux_control_dma.attachInterrupt(columnCompleteISR);

    // === DMA Channel 2: Row advancement (shift register clock) ===
    row_advance_dma.begin(true);
    
    // Clock pulse patterns: HIGH then LOW
    static volatile uint32_t clock_patterns[2] = {
      (1 << 3),  // Clock HIGH
      0          // Clock LOW  
    };
    
    row_advance_dma.TCD->SADDR = (void*)clock_patterns;
    row_advance_dma.TCD->SOFF = 4;  // Next pattern
    row_advance_dma.TCD->ATTR = DMA_TCD_ATTR_SSIZE(2) | DMA_TCD_ATTR_DSIZE(2); // 32-bit
    row_advance_dma.TCD->NBYTES_MLNO = 4;
    row_advance_dma.TCD->SLAST = -8;  // Reset to start
    
    row_advance_dma.TCD->DADDR = (void*)SHIFT_GPIO_PORT;
    row_advance_dma.TCD->DOFF = 0;
    row_advance_dma.TCD->CITER_ELINKNO = 2;  // HIGH then LOW
    row_advance_dma.TCD->DLASTSGA = 0;
    row_advance_dma.TCD->BITER_ELINKNO = 2;
    row_advance_dma.TCD->CSR = DMA_TCD_CSR_INTMAJOR;
    
    row_advance_dma.attachInterrupt(rowAdvanceISR);

    Serial.println("Hardware-triggered DMA channels configured");
  }

  // ISR for PIT timer trigger - coordinates the scanning sequence
  static void sampleTriggerISR() {
    if (!instance || !instance->adc) return;
    
    g_completed_samples++;
    
    // Trigger ADC conversion for current position
    // The AnalogBufferDMA will handle the actual data collection
    instance->adc->adc0->startSingleRead(A0);
    
    g_current_col++;
    
    // Check if we need to advance to next column
    if (g_current_col < COLUMN_COUNT) {
      // Update column mux for next sample
      volatile uint32_t* shift_reg = (volatile uint32_t*)SHIFT_GPIO_PORT;
      *shift_reg = shift_patterns[g_current_col];
      
      // Clock the shift register
      *shift_reg |= (1 << 3);   // Clock HIGH
      asm volatile("nop; nop; nop; nop;"); // Small delay
      *shift_reg &= ~(1 << 3);  // Clock LOW
      
    } else {
      // End of row - advance to next row
      g_current_col = 0;
      g_current_row++;
      
      if (g_current_row >= ROW_COUNT) {
        // Frame complete
        g_current_row = 0;
        g_frame_ready = true;
        g_scanning = false;
        
        // Stop PIT timer
        PIT_TCTRL0 &= ~PIT_TCTRL_TEN;
      } else {
        // Set up next row mux
        volatile uint32_t* mux_reg = (volatile uint32_t*)MUX_GPIO_PORT;
        *mux_reg = mux_patterns[g_current_row];
        
        // Reset shift register to first column
        volatile uint32_t* shift_reg = (volatile uint32_t*)SHIFT_GPIO_PORT;
        *shift_reg = shift_patterns[0];
        *shift_reg |= (1 << 3);   // Clock HIGH
        asm volatile("nop; nop; nop; nop;");
        *shift_reg &= ~(1 << 3);  // Clock LOW
      }
    }
    
    // Toggle LED to show activity
    digitalWriteFast(LED_BUILTIN, !digitalReadFast(LED_BUILTIN));
  }

  static void columnCompleteISR() {
    // Column scanning complete - this could trigger row advancement
    // Currently handled in the main sample trigger ISR
  }

  static void rowAdvanceISR() {
    // Row advancement complete
    // Additional setup for new row could go here if needed
  }

public:
  void startScan() {
    if (g_scanning) {
      Serial.println("Scan already in progress");
      return;
    }
    
    // Reset state
    g_completed_samples = 0;
    g_current_row = 0;
    g_current_col = 0;
    g_frame_ready = false;
    g_scanning = true;
    samples_collected = 0;

    // Initialize first row and column
    setupInitialPosition();
    
    // Enable DMA channels (they're already configured, just enable)
    pit_trigger_dma.enable();
    
    // Start the scanning process by enabling PIT timer
    PIT_TCTRL0 |= PIT_TCTRL_TEN;
    
    Serial.println("Matrix scan started - hardware DMA timing active");
  }

  void stopScan() {
    // Stop PIT timer to halt the entire chain
    PIT_TCTRL0 &= ~PIT_TCTRL_TEN;
    
    // Disable DMA channels
    pit_trigger_dma.disable();
    
    g_scanning = false;
    Serial.println("Matrix scan stopped");
  }

  void setupInitialPosition() {
    // Set up first row mux (row 0)
    volatile uint32_t* mux_reg = (volatile uint32_t*)MUX_GPIO_PORT;
    *mux_reg = mux_patterns[0];
    
    // Initialize shift register with first column active
    volatile uint32_t* shift_reg = (volatile uint32_t*)SHIFT_GPIO_PORT;
    *shift_reg = shift_patterns[0];  // Data HIGH for first column
    
    // Clock in the initial data
    *shift_reg |= (1 << 3);   // Clock HIGH
    delayNanoseconds(100);
    *shift_reg &= ~(1 << 3);  // Clock LOW
    
    Serial.println("Initial position set: Row 0, Column 0");
  }

  bool isFrameComplete() {
    return g_frame_ready;
  }

  void acknowledgeFrame() {
    g_frame_ready = false;
  }

  const uint16_t* getFrameData() {
    // Return the buffer that was last filled by AnalogBufferDMA
    volatile uint16_t* buffer = abdma->bufferLastISRFilled();
    return (const uint16_t*)buffer;
  }

  void processFrame() {
    if (!g_frame_ready) return;
    
    // Get the completed data buffer
    volatile uint16_t* buffer = abdma->bufferLastISRFilled();
    
    if (buffer == nullptr) {
      Serial.println("Error: No buffer available");
      return;
    }
    
    // Convert 12-bit ADC data to 8-bit for processing
    for (int i = 0; i < TOTAL_READINGS; i++) {
      processed_buffer[i] = buffer[i] >> 4;  // Convert 12-bit to 8-bit
    }
    
    Serial.print("Frame processed: ");
    Serial.print(g_completed_samples);
    Serial.println(" samples collected");
    
    // Optional: Output some sample data for debugging
    if (Serial.availableForWrite() > 100) {
      Serial.print("Sample data (first 10): ");
      for (int i = 0; i < 10 && i < TOTAL_READINGS; i++) {
        Serial.print(processed_buffer[i]);
        Serial.print(" ");
      }
      Serial.println();
    }
  }

  uint32_t getSamplesCollected() {
    return g_completed_samples;
  }

  bool isScanning() {
    return g_scanning;
  }

  // Debug function to print current state
  void printStatus() {
    Serial.print("Scanning: ");
    Serial.print(g_scanning ? "YES" : "NO");
    Serial.print(", Row: ");
    Serial.print(g_current_row);
    Serial.print(", Col: ");
    Serial.print(g_current_col);
    Serial.print(", Samples: ");
    Serial.print(g_completed_samples);
    Serial.print(", Frame Ready: ");
    Serial.println(g_frame_ready ? "YES" : "NO");
  }
};

// Static member definition
HardwareChainedScanner* HardwareChainedScanner::instance = nullptr;

// Global scanner instance
HardwareChainedScanner g_scanner;

// Arduino setup function
void setup() {
  Serial.begin(115200);
  while (!Serial && millis() < 3000);
  
  Serial.println("40x40 matrix, 160kHz sampling rate");
  
  pinMode(LED_BUILTIN, OUTPUT);
  digitalWriteFast(LED_BUILTIN, LOW);
  
  // Initialize the scanner hardware
  g_scanner.setup();
  
  Serial.println("Setup complete. Starting initial scan...");
  delay(100);
  
  // Start the first scan
  g_scanner.startScan();
}

// Arduino main loop
void loop() {
  static uint32_t last_status_print = 0;
  static uint32_t scan_count = 0;
  
  // Check if frame is complete and process it
  if (g_scanner.isFrameComplete()) {
    g_scanner.processFrame();
    g_scanner.acknowledgeFrame();
    
    scan_count++;
    Serial.print("Completed frame #");
    Serial.println(scan_count);
    
    // Wait a bit before starting next scan
    delay(100);
    
    // Start next scan automatically
    g_scanner.startScan();
  }
  
  // Print status periodically (every 2 seconds)
  if (millis() - last_status_print > 2000) {
    g_scanner.printStatus();
    last_status_print = millis();
  }
  
  // Small delay to prevent overwhelming the system
  delay(10);
  
  // Handle serial commands for debugging
  if (Serial.available()) {
    char cmd = Serial.read();
    switch (cmd) {
      case 's':
        Serial.println("Starting manual scan...");
        g_scanner.startScan();
        break;
      case 'x':
        Serial.println("Stopping scan...");
        g_scanner.stopScan();
        break;
      case '?':
        g_scanner.printStatus();
        break;
    }
  }
}
 
I'm trying to scan a pressure sensor matrix of 40 * 40 elements using Teensy 4.1.
I would like to DMA Chaining the whole software.
My circuit consists of Multiplexers for selecting columns where I read one ADC channel from it.
And Shift registers to select rows. ...... but the question is it possible ?

I'm not getting a clear picture of your intended hardware from only these words. Are you really going to connect 1600 pressure sensors?!

Routing a massive number of analog signals through many mux chips to a single ADC is (usually) a recipe for terrible noise problems. Each analog mux adds source impedance and gives opportunity for charge injection from the digital control signals. You can do some things in the analog circuitry to mitigate some of these problems, like low-pass filtering the mux control signals so the mux isn't getting extremely fast edges into its pins. You can use ground planes or shielding, and maybe do things to prevent digital return currents from flowing through the analog ground path. But the more analog muxes you use, the harder it gets...

The built-in ADC on Teensy 4.1 is on the same silicon as a 600 MHz Cortex-M7 processor and a lot of other digital circuitry. Even on a good day it gives at best about 10 bits of performance. But if you have a lot of digital pin switching happening (e.g., controlling shift registers), much worse performance is likely.

Even if you do everything else perfectly, these 2 unpleasant analog issues are likely to add up to low quality results. To achieve good quality measurements, you probably want to use dedicated ADC chips with limited analog muxing, located physically close to each small group of sensors.

On the DMA question, I would answer with a definite maybe.

But just because it might be possible doesn't mean it's a good idea. DMA is complicated and extremely difficult to troubleshoot, so at the very least I would advise getting everything working first with ordinary code. Even interrupts add a lot of complexity, so if you're also building up complicated digital and analog hardware, please do yourself a huge favor by writing the simplest polling-based code. Even if it hogs the CPU and gives lower performance, the path to success is to keep the software side as simple as possible while you have unknown hardware. Get the hardware working well first. Avoid complexity on the software side, even if sub-optimal for performance, until you have the hardware proven working well.
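
As a rough illustration of what "simplest polling-based code" could look like, here is a minimal sketch. The `MatrixHardware` hooks (`selectRow`, `selectColumn`, `settle`, `readAdc`) are hypothetical names made up for this example, not part of any Teensy library. They stand in for whatever drives the shift registers, sets the mux address pins, waits for the analog signal to settle, and performs a blocking ADC read.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kRows = 40;
constexpr std::size_t kCols = 40;

// Hypothetical hardware hooks (assumptions for this sketch). On real
// hardware these would wrap the shift-register write, the mux address
// pins, a short settling delay, and a blocking ADC conversion.
struct MatrixHardware {
  void (*selectRow)(std::size_t row);
  void (*selectColumn)(std::size_t col);
  void (*settle)();
  std::uint16_t (*readAdc)();
};

// Scan the whole matrix by polling: no interrupts, no DMA.
// One row is selected, then every column on that row is read in turn.
void scanFramePolling(const MatrixHardware& hw,
                      std::uint16_t (&frame)[kRows][kCols]) {
  for (std::size_t row = 0; row < kRows; ++row) {
    hw.selectRow(row);
    for (std::size_t col = 0; col < kCols; ++col) {
      hw.selectColumn(col);
      hw.settle();                      // let the mux output settle before sampling
      frame[row][col] = hw.readAdc();   // blocks until the conversion is done
    }
  }
}
```

On a Teensy the hooks might be thin wrappers around `digitalWriteFast()`, `delayMicroseconds()` and `analogRead()`, but that mapping is an assumption about your wiring. Keeping the hardware access behind a few functions like this also makes it easier to swap in interrupt or DMA versions later and compare results against the polling baseline.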

DMA to shift registers would be tricky. You would probably want to use DMA to SPI and let it shift the bits out. DMA directly to GPIO to shift data gets quite complicated. You might look at the OctoWS2811 code for an example. Inside OctoWS2811 you'll see several small but critical details, like switching the pin from fast GPIO to slow-but-DMA-compatible GPIO. My general advice if using shift registers is to use SPI. Or avoid shift registers and use direct pin control of the muxes, which would then be controlled by writing to the GPIO registers.
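
To make the SPI suggestion a little more concrete, here is a small sketch of how a row selection could be packed into bytes for the SPI port to shift out. It assumes the 40 row lines are driven by five cascaded 8-bit shift registers (74HC595 style), that one output is high at a time, and that bytes are sent MSB first so the first byte ends up in the register farthest from the Teensy. None of that is known from the original post, and `rowSelectBytes` is a hypothetical helper name.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kRowCount = 40;
constexpr std::size_t kChainBytes = kRowCount / 8;  // 5 cascaded 8-bit registers (assumed)

// Build the byte sequence that drives exactly one shift-register output high.
// Assumption: bytes are sent in array order, MSB first, so bytes[0] lands in
// the last register of the chain and bytes[kChainBytes - 1] in the first.
std::array<std::uint8_t, kChainBytes> rowSelectBytes(std::size_t row) {
  std::array<std::uint8_t, kChainBytes> bytes{};  // start with all outputs low
  if (row < kRowCount) {
    const std::size_t registerIndex = row / 8;    // which register holds this row
    const std::size_t bit = row % 8;              // which output on that register
    bytes[kChainBytes - 1 - registerIndex] =
        static_cast<std::uint8_t>(1u << bit);
  }
  return bytes;  // an out-of-range row yields all zeros (nothing selected)
}
```

On the Teensy side the resulting bytes could be handed to the Arduino `SPI.transfer(buffer, count)` call, followed by a pulse on the registers' latch pin. The bit order and latch handling depend on the actual parts and wiring, so treat the layout above as a starting point to adapt.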

The DMA channels have a pretty amazing (or mind-numbing) number of powerful features. But those features often come with some caveats and interact with limitations in difficult-to-anticipate ways. You can link DMA channels, so completion of one channel triggers another channel to do something. Whether that is actually useful depends on the finer details of how both channels' minor and major loops are set up. Usually there is some way to accomplish quite complicated behavior, but getting it to work within the specific things the DMA hardware can actually do sometimes requires a lot of creativity. The really hard part is debugging, since the DMA controller works mostly invisibly in the background. When it doesn't do what you want, figuring out why it did (or even what it did) can take a lot of trial and error.

Teensy has a fast CPU. Use it the simplest no-interrupt polling way to get started. Get the hardware working well first. For the sake of your own sanity, resist the urge to dabble with interrupts and DMA until you have the hardware working well. Then as you dive into that complicated stuff, you'll also have a simple polling-based program for the sake of comparison.


Here is my attempt, that I didn't test,

Was this code generated by a LLM chat bot?
 