Teensy LC: Trying to Link DMA Channels

Status
Not open for further replies.

kbob

Member
I am trying to program a Teensy LC to link two DMA channels so that they copy data sequentially, clocked by one of the timer-counters. The eventual goal is to use the DMA to reprogram the timer-counter, but this test program just copies memory-to-memory.

The test program is below. I want it to copy the string "01234567890123456789" to the output buffer, one byte per timer overflow (2.7 msec). I want DMA channel 0 to copy the first 10 bytes, and channel 1 to copy the other 10.

What actually happens (see the log below) is that channel 0 copies the first 10 bytes, then channel 1 copies one byte and stops.

If I set the ERQ bit in DMA channel 1 at setup time, both channels run in parallel.

What am I doing wrong? Thanks.

Test program.
Code:
#include <DMAChannel.h>

#ifndef __MKL26Z64__
  #error "for Teensy LC only"
#endif

#define BUF_CNT 10
#define OUT_CNT (3 * BUF_CNT)

DMAChannel chan0, chan1;  // Reserve two DMA channels.
uint8_t buf[BUF_CNT];
uint8_t out[OUT_CNT];
uint32_t start_time, report_time, stop_time;

#define assert(c) ((c) || assertion_failed(#c, __LINE__))

void start_DMA()
{
  assert(chan0.channel == 0);
  assert(chan1.channel == 1);
  
  // Configure DMA channel 0
  DMAMUX0_CHCFG0 = DMAMUX_ENABLE | DMAMUX_SOURCE_FTM1_OV;
  DMA_SAR0 = buf;
  DMA_DAR0 = out;
  DMA_DSR_BCR0 = sizeof buf;
  DMA_DCR0 = 0 |
             DMA_DCR_ERQ |
             DMA_DCR_CS |
             DMA_DCR_SINC |
             DMA_DCR_SSIZE(0b01) |
             DMA_DCR_DINC |
             DMA_DCR_DSIZE(0b01) |
             DMA_DCR_D_REQ |
             DMA_DCR_LINKCC(0b11) |
             DMA_DCR_LCH1(1) |
             0;

  
  // Configure DMA channel 1
  DMAMUX0_CHCFG1 = DMAMUX_ENABLE | DMAMUX_SOURCE_FTM1_OV;
  DMA_SAR1 = buf;
  DMA_DAR1 = out + BUF_CNT;
  DMA_DSR_BCR1 = sizeof buf;
  DMA_DCR1 = 0 |
//             DMA_DCR_ERQ |
             DMA_DCR_CS |
             DMA_DCR_SINC |
             DMA_DCR_SSIZE(0b01) |
             DMA_DCR_DINC |
             DMA_DCR_DSIZE(0b01) |
             DMA_DCR_D_REQ |
             DMA_DCR_LINKCC(0b11) |
             DMA_DCR_LCH1(0) |
             0;

  dump_registers("Ready to start");
  
  FTM1_SC |= 0x100;   // FTM_SC_DMA bit not defined in kinetis.h

  dump_registers("Started");
}

void dump_registers(const char *label)
{
  #define PREG(r) (Serial.print(#r " "),  \
                   Serial.println((uint32_t)(r), HEX))

  Serial.println(label);

  PREG(DMAMUX0_CHCFG0);
  PREG(DMA_SAR0);
  PREG(DMA_DAR0);
  PREG(DMA_DSR_BCR0);
  PREG(DMA_DCR0);
  Serial.println();

  PREG(DMAMUX0_CHCFG1);
  PREG(DMA_SAR1);
  PREG(DMA_DAR1);
  PREG(DMA_DSR_BCR1);
  PREG(DMA_DCR1);
  Serial.println();
  
  PREG(FTM1_SC);
  Serial.println();
  
  #undef PREG
}

void report()
{
  for (int i = 0; i < OUT_CNT; i++) {
    uint16_t v = out[i];
    if ('0' <= v && v <= '9')
      Serial.print((char)v);
    else if (v == 0xFF)
      Serial.print('.');
    else
      Serial.print('?');
  }
  Serial.println();
}


void setup()
{
  Serial.begin(115200);
  
  for (int i = 0; i < BUF_CNT; i++)
    buf[i] = '0' + i;

  for (int i = 0; i < OUT_CNT; i++)
    out[i] = 0xFF;

  analogWrite(17, 127);   // ensure the timer-counter is running

  start_time = millis() + 1000;
  report_time = start_time;
  stop_time = start_time + 50;
}

void loop()
{
  uint32_t t0 = millis();
  if (t0 < start_time)
    return;
    
  start_DMA();
  while (millis() < stop_time) {
    if (millis() >= report_time) {
      report();
      report_time += 5;
    }
  }
  
  dump_registers("Stop");
  while (1)  // Sit.  Stay.
    continue;
}

bool assertion_failed(const char *c, int line)
{
  Serial.println();
  Serial.print("Assertion failed at line ");
  Serial.print(line);
  Serial.print(": ");
  Serial.println(c);
  while (1)
    continue;
  return false;
}

Serial output.
Code:
Ready to start
DMAMUX0_CHCFG0 B7
DMA_SAR0 1FFFFE64
DMA_DAR0 1FFFFE3C
DMA_DSR_BCR0 A
DMA_DCR0 605A00B4

DMAMUX0_CHCFG1 B7
DMA_SAR1 1FFFFE64
DMA_DAR1 1FFFFE46
DMA_DSR_BCR1 A
DMA_DCR1 205A00B0

FTM1_SC 89

Started
DMAMUX0_CHCFG0 B7
DMA_SAR0 1FFFFE65
DMA_DAR0 1FFFFE3D
DMA_DSR_BCR0 2000009
DMA_DCR0 605A00B4

DMAMUX0_CHCFG1 B7
DMA_SAR1 1FFFFE64
DMA_DAR1 1FFFFE46
DMA_DSR_BCR1 A
DMA_DCR1 205A00B0

FTM1_SC 109

0.............................
012...........................
01234.........................
01234567......................
01234567890...................
01234567890...................
01234567890...................
01234567890...................
01234567890...................
01234567890...................
01234567890...................
Stop
DMAMUX0_CHCFG0 B7
DMA_SAR0 1FFFFE6E
DMA_DAR0 1FFFFE46
DMA_DSR_BCR0 1000000
DMA_DCR0 205A00B4

DMAMUX0_CHCFG1 B7
DMA_SAR1 1FFFFE65
DMA_DAR1 1FFFFE47
DMA_DSR_BCR1 2000009
DMA_DCR1 205A00B0

FTM1_SC 189
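
For anyone reading the register dumps above, here is a rough host-side sketch that splits a DMA_DCRn value into the fields the test program sets. The bit positions are an assumption taken from the KL26 register layout (they do reproduce the logged values), and the names DcrFields, decode_dcr and check_logged_dcr_values are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Subset of DMA_DCRn fields. Bit positions are assumed from the KL26
// register layout; verify them against the reference manual.
struct DcrFields {
    bool erq;         // bit 30: peripheral request enable
    bool cs;          // bit 29: cycle steal
    bool sinc;        // bit 22: source increment
    unsigned ssize;   // bits 21-20: source size
    bool dinc;        // bit 19: destination increment
    unsigned dsize;   // bits 18-17: destination size
    bool d_req;       // bit 7: clear ERQ when the byte count reaches zero
    unsigned linkcc;  // bits 5-4: link channel control
    unsigned lch1;    // bits 3-2: link channel 1
};

DcrFields decode_dcr(uint32_t dcr) {
    DcrFields f;
    f.erq    = ((dcr >> 30) & 1u) != 0;
    f.cs     = ((dcr >> 29) & 1u) != 0;
    f.sinc   = ((dcr >> 22) & 1u) != 0;
    f.ssize  = (dcr >> 20) & 3u;
    f.dinc   = ((dcr >> 19) & 1u) != 0;
    f.dsize  = (dcr >> 17) & 3u;
    f.d_req  = ((dcr >> 7) & 1u) != 0;
    f.linkcc = (dcr >> 4) & 3u;
    f.lch1   = (dcr >> 2) & 3u;
    return f;
}

// Decode the three distinct DCR values that appear in the log.
void check_logged_dcr_values() {
    DcrFields ch0 = decode_dcr(0x605A00B4u);   // channel 0, "Ready to start"
    assert(ch0.erq && ch0.cs && ch0.d_req);
    assert(ch0.sinc && ch0.dinc && ch0.ssize == 1 && ch0.dsize == 1);
    assert(ch0.linkcc == 3 && ch0.lch1 == 1);

    assert(!decode_dcr(0x205A00B4u).erq);      // channel 0 at "Stop"

    DcrFields ch1 = decode_dcr(0x205A00B0u);   // channel 1, all three dumps
    assert(!ch1.erq && ch1.lch1 == 0);
}
```

Decoded this way, the log shows ERQ set on channel 0 before the start and cleared by the time of the "Stop" dump, while ERQ is never set on channel 1.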
 
Here's some already-written code from when I was previously testing DMAChannel.h. Maybe it will help?

Unfortunately, I just can't dig into your code right now (which is using the direct registers, not DMAChannel). Sorry.

Code:
#include "DMAChannel.h"

DMAChannel myDMA;
DMAChannel myDMA2;

unsigned char cbuf[6], c1;
unsigned char tbuf[6], t1;

void p(void) {
	unsigned int i;
	Serial.print("c1=");
	Serial.print(c1);
	Serial.print(", cbuf=");
	for (i=0; i<sizeof(cbuf); i++) {
		Serial.print(cbuf[i]);
		Serial.print(", ");
	}
	Serial.print("  t1=");
	Serial.print(t1);
	Serial.print(", tbuf=");
	for (i=0; i<sizeof(tbuf); i++) {
		Serial.print(tbuf[i]);
		Serial.print(", ");
	}

	Serial.println();

}

// the setup routine runs once when you press reset:
void setup() {
	while (!Serial) ;
	delay(100);
	Serial.println("DMA Test 2");

	// configure the myDMA channel to copy c1 to cbuf[6]
	myDMA.source(c1);
	myDMA.destinationBuffer(cbuf, sizeof(cbuf));
	myDMA.enable();

	// configure myDMA2 to be triggered by myDMA
	myDMA2.source(t1);
	myDMA2.destinationBuffer(tbuf, sizeof(tbuf));
	myDMA2.triggerAtTransfersOf(myDMA);  // triggers #1-5
	//myDMA2.triggerAtCompletionOf(myDMA); // trigger #6
	myDMA2.enable();

	c1 = 45;
	t1 = 36;
	p();
	myDMA.triggerManual();
	p();
	myDMA.triggerManual();
	p();
	c1 = 72;
	myDMA.triggerManual();
	p();
	t1 = 81;
	myDMA.triggerManual();
	p();
	c1 = 66;
	myDMA.triggerManual();
	p();
	myDMA.triggerManual();
	p();
}

void loop() {
}
 
I found a solution. Instead of ping-ponging between two DMA channels, I am using one DMA channel with a circular source buffer and a PIT timer programmed to interrupt twice per bufferful. I'll post source code later -- right now I need to go to work.

I re-read the DMA control chapter of the reference manual, and now I don't believe the ping-pong method is possible. (I'll be happy to be proven wrong, of course.) For my app, trading a timer for a DMA is good. I'm pretty sure I'll be using all 4 DMAs before I'm done. For a general purpose driver, the tradeoff is less clear.
 
DMA Ring Buffers

Here is the working test program. This one is a little different. An IntervalTimer's ISR continuously fills a ring buffer (src) with characters. The DMA engine, clocked by a timer-counter's overflow, copies one character at a time to another ring buffer (dest). Every 20 msec (~7 ticks), the background program prints a snapshot of the two ring buffers. The data pattern is 12 repetitions each of the letters 'a' through 'j', followed by an '_'. (That's 121 characters total.)

In the serial output, you can clearly see the data being loaded into the source buffer and copied over to the destination.

If anyone reading this thread hasn't guessed, my eventual goal is to drive a string of WS28xx LEDs via DMA and the timer-counter. This program demonstrates the DMA part.

Test program:
Code:
#include <DMAChannel.h>
#include <IntervalTimer.h>

#ifndef __MKL26Z64__
  #error "for Teensy LC only"
#endif

#define SRC_SIZE 32
#define DEST_SIZE 16
#define LED_SIZE 12
#define LED_COUNT 10

#define START_DELAY 1000 // msec
#define REPORT_INTERVAL 20
#define DURATION 320

DMAChannel dma;
IntervalTimer timer;
char src[SRC_SIZE] __attribute__((__aligned__(SRC_SIZE)));
char dest[DEST_SIZE] __attribute__((__aligned__(DEST_SIZE)));
volatile uint8_t LEDnum;
volatile uint8_t cursor;
volatile uint32_t i0r, i1r, bcrr;
uint32_t start_time, report_time, stop_time;

void write_LED(uint8_t pos)
{
  uint8_t led = LEDnum++;
  char c = led < LED_COUNT ? 'a' + led : '_';
  for (uint8_t i = 0; i < LED_SIZE; i++) {
    src[pos] = c;
    pos = (pos + 1) % SRC_SIZE;
  }
}

void fill_buffer(uint8_t i0, uint8_t i1)
{
  uint8_t avail = i1 >= i0 ? i1 - i0 : i1 + SRC_SIZE - i0;
  while (avail >= LED_SIZE) {
    write_LED(cursor);
    cursor = (cursor + LED_SIZE) % SRC_SIZE;
    avail -= LED_SIZE;
  }
}

void timer_ISR()
{
  volatile const void *sar = dma.CFG->SAR;

  uint8_t i0 = cursor;
  uint32_t i1 = (char *)sar - src;
  i0r = i0;
  i1r = i1;
  bcrr = dma.CFG->DSR_BCR;
  fill_buffer(i0, i1);
}

void start()
{
  LEDnum = 0;
  cursor = 0;
  fill_buffer(0, SRC_SIZE);
  
  // Disable timer DMA.
  FTM1_SC &= ~0x100;   // FTM_SC_DMA bit not defined in kinetis.h

  // Configure timer
  //
  // timer overflows every 64K bus clock ticks.
  // We want to wake up after buf_size/2 overflows.
  // Multiply by 1000000 for microseconds.
  uint32_t interval = 65536 / (F_BUS / 1000000) * SRC_SIZE / 2;
  timer.begin(timer_ISR, interval);

  // Configure DMA channel
  dma.triggerAtHardwareEvent(DMAMUX_SOURCE_FTM1_OV);
  dma.sourceCircular((unsigned char *)src, sizeof src);
  dma.destinationCircular((unsigned char *)dest, sizeof dest);
  dma.transferCount(LED_COUNT * LED_SIZE + 1);
  dma.enable();

  // Enable timer DMA.
  FTM1_SC |= FTM_SC_TOF;
  FTM1_SC |= 0x100;  // 0x100 == FTM_SC_DMA
}

void report()
{
  for (int i = 0; i < SRC_SIZE; i++)
    Serial.print(src[i]);
  Serial.print("   ");
  for (int i = 0; i < DEST_SIZE; i++)
    Serial.print(dest[i]);
  Serial.print(' ');
  Serial.print(i0r);
  Serial.print(' ');
  Serial.print(i1r);
  Serial.print(' ');
  Serial.print(bcrr, HEX);
  Serial.println();
}

void setup()
{
  Serial.begin(115200);
  while (!Serial)
    continue;

  for (int i = 0; i < SRC_SIZE; i++)
    src[i] = '.';
  for (int i = 0; i < DEST_SIZE; i++)
    dest[i] = '.';

  start_time = millis() + START_DELAY;
  report_time = start_time;
  stop_time = start_time + DURATION;
}

void loop()
{
  uint32_t t0 = millis();
  if (t0 < start_time)
    return;

  start();
  while (millis() < stop_time) {
    if (millis() >= report_time) {
      report();
      report_time += REPORT_INTERVAL;
    }
  }

  timer.end();
  dma.clearComplete();

  while (1)  // Sit.  Stay.
    continue;
}
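
The refill logic in fill_buffer() and write_LED() can be tried off-target. The following is a minimal host-side model, not the sketch itself: the Ring struct and check_against_log are hypothetical names, and the DMA read index is passed in by hand instead of being read from the SAR register.

```cpp
#include <cassert>
#include <string>

constexpr unsigned SRC_SIZE = 32, LED_SIZE = 12, LED_COUNT = 10;

// Host-side model of the source ring buffer and its refill routine.
struct Ring {
    std::string src = std::string(SRC_SIZE, '.');
    unsigned cursor = 0;    // next write position
    unsigned led_num = 0;   // next LED to emit

    // Free space from write index i0 up to read index i1, with wraparound.
    // i0 == i1 counts as "no space"; passing i1 = SRC_SIZE means "all free".
    static unsigned avail(unsigned i0, unsigned i1) {
        return i1 >= i0 ? i1 - i0 : i1 + SRC_SIZE - i0;
    }

    // Write whole LEDs (LED_SIZE bytes each) while enough space remains.
    void fill(unsigned i0, unsigned i1) {
        for (unsigned n = avail(i0, i1); n >= LED_SIZE; n -= LED_SIZE) {
            char c = led_num < LED_COUNT ? char('a' + led_num) : '_';
            ++led_num;
            for (unsigned i = 0; i < LED_SIZE; ++i)
                src[(cursor + i) % SRC_SIZE] = c;
            cursor = (cursor + LED_SIZE) % SRC_SIZE;
        }
    }
};

// Replay the initial fill, then a refill with the DMA read index at 21.
void check_against_log() {
    Ring r;
    r.fill(0, SRC_SIZE);
    assert(r.src == std::string(12, 'a') + std::string(12, 'b') + std::string(8, '.'));
    assert(r.cursor == 24);

    r.fill(r.cursor, 21);
    assert(r.src == std::string(4, 'c') + std::string(12, 'd') +
                    std::string(8, 'b') + std::string(8, 'c'));
    assert(r.cursor == 16);
}
```

Only whole LEDs are written, so up to LED_SIZE - 1 free bytes can be left unused after each refill.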

Serial output. Each row is a snapshot, printed every 20 msec. The first column is the DMA source ring buffer; it is refilled 12 letters at a time as space becomes available. The second column is the DMA destination ring buffer; it gains one letter every 2.7 msec. The third and fourth columns are i0r and i1r, the write cursor and the DMA read index as sampled in timer_ISR(). The last column is the DMA DSR_BCR register as of the most recent timer ISR: 0x2000000 is BUSY, 0x1000000 is DONE, and the low-order bits are the number of bytes left to copy.
Code:
aaaaaaaaaaaabbbbbbbbbbbb........   ................ 0 0 0
aaaaaaaaaaaabbbbbbbbbbbb........   aaaaaaaaaa...... 0 0 0
aaaaaaaaaaaabbbbbbbbbbbb........   bbbaaaaaaaaabbbb 0 0 0
ccccddddddddddddbbbbbbbbcccccccc   bbbbbbbbcccccbbb 24 21 2000064
ccccddddddddddddbbbbbbbbcccccccc   ccccdddbcccccccc 24 21 2000064
ffffffffddddddddeeeeeeeeeeeeffff   ecccdddddddddddd 16 11 200004E
ffffffffddddddddeeeeeeeeeeeeffff   eeeeeeeeeeeddddd 16 11 200004E
ffffffffgggggggggggghhhhhhhhhhhh   ffffeeeeeeeeffff 8 0 2000039
ffffffffgggggggggggghhhhhhhhhhhh   ffffffffggggggff 8 0 2000039
iiiiiiiiiiiigggggggghhhhhhhhhhhh   gggghhhhgggggggg 0 21 2000024
iiiiiiiiiiiigggggggghhhhhhhhhhhh   iigghhhhhhhhhhhh 0 21 2000024
____iiiiiiiijjjjjjjjjjjj________   iiiiiiiiiiihhhhh 12 11 200000E
____iiiiiiiijjjjjjjjjjjj________   jjjjjiiiiiiijjjj 12 11 200000E
____iiiiiiiijjjjjjjjjjjj________   jjjjjjjj_iiijjjj 12 11 200000E
________________jjjjjjjj________   jjjjjjjj_iiijjjj 4 25 1000000
________________jjjjjjjj________   jjjjjjjj_iiijjjj 4 25 1000000
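
The last column of the log can be decoded with a few masks. This is a small host-side sketch: the BUSY and DONE values are the ones given in this post, the 24-bit width of the byte count is an assumption to check against the reference manual, and the names DmaStatus and decode_dsr_bcr are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t DSR_BUSY = 0x2000000u;   // value from the post
constexpr uint32_t DSR_DONE = 0x1000000u;   // value from the post
constexpr uint32_t BCR_MASK = 0x00FFFFFFu;  // assumed: byte count is the low 24 bits

struct DmaStatus {
    bool busy;
    bool done;
    uint32_t bytes_left;
};

DmaStatus decode_dsr_bcr(uint32_t v) {
    DmaStatus s;
    s.busy = (v & DSR_BUSY) != 0;
    s.done = (v & DSR_DONE) != 0;
    s.bytes_left = v & BCR_MASK;
    return s;
}

// Decode two values copied from the log.
void check_logged_status_values() {
    DmaStatus mid = decode_dsr_bcr(0x2000064u);
    assert(mid.busy && !mid.done && mid.bytes_left == 100);

    DmaStatus end = decode_dsr_bcr(0x1000000u);
    assert(!end.busy && end.done && end.bytes_left == 0);
}
```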
 
Your last sketch may help me; I've been trying to figure out how to use FTM1/DMA to trigger the DAC ...

A question: you just OR bits into FTM1_SC, so what settings are you assuming for FTM1?

On my LC, when I run your sketch and print the initial value of FTM1_SC, it is 0x89, and FTM1_MOD is 0xBFFF.
If I measure the time between a change in dest[i] and a change in dest[i+1], I get 2048 us, which is consistent with these settings.
If I set FTM1_SC with

FTM1_SC = 0; delay(1);
FTM1_SC = FTM_SC_CLKS(1) | FTM_SC_PS(0) | FTM_SC_TOF ;

then the update time is 1024 us.
 
Hi, manitou.

The Teensyduino runtime has already programmed the timer-counters for PWM output. You're right that the FTM1_MOD register is 0xBFFF; I forgot that detail and got the speed wrong in my comments above. I did not change the timer-counter settings because I did not want to preclude using PWM pins. (That program doesn't use PWM, but still...)

As you observed, the preprogrammed SC value of 0x89 includes a prescaler divide by 2. You changed the prescaler to divide-by-1, so of course it runs twice as fast. (-:
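
The arithmetic behind those two measurements can be sketched as follows. The 48 MHz counter clock is an assumption, chosen because it reproduces the 2048 us and 1024 us figures above; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// The prescaler select field is the low three bits of the SC register.
unsigned prescale_bits(uint32_t sc) {
    return sc & 0x7u;
}

// Overflow period in microseconds for a counter that runs from 0 to mod
// with a divide-by-2^ps prescaler. clock_hz is an assumed input.
uint32_t overflow_period_us(uint32_t mod, unsigned ps, uint32_t clock_hz) {
    uint64_t ticks = (uint64_t(mod) + 1) << ps;
    return uint32_t(ticks * 1000000ULL / clock_hz);
}

// Reproduce the two measured update times from the posts above.
void check_measured_periods() {
    const uint32_t assumed_clock_hz = 48000000u;
    // Runtime default: SC = 0x89 (prescaler divide-by-2), MOD = 0xBFFF.
    assert(overflow_period_us(0xBFFFu, prescale_bits(0x89u), assumed_clock_hz) == 2048);
    // After selecting prescaler divide-by-1.
    assert(overflow_period_us(0xBFFFu, 0, assumed_clock_hz) == 1024);
}
```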

Meanwhile, I've got a driver now that drives WS2811 LEDs on pin 17 by DMA. FTM1 channel 1 drives pin 17 in PWM mode. FTM1 channel 1 also clocks a DMA channel. That DMA channel loads a new pulse width into the FTM1_C1V from a circular buffer. And an IntervalTimer's ISR reloads the circular buffer, as demonstrated above.

I can't take the credit -- using DMA to reload the PWM register was Paul's idea. I just filled in the details. (-:
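
To illustrate the idea of feeding pulse widths to the compare register, here is a hedged host-side sketch that expands pixel bytes into one compare value per bit, most significant bit first. It is not the UniWS code: encode_bits is a hypothetical helper, and t0h and t1h are placeholder tick counts for the high time of a 0 bit and a 1 bit, which would have to be derived from the LED datasheet and the timer clock.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Expand each byte into eight PWM compare values, MSB first.
// t0h / t1h: hypothetical high-time tick counts for a 0 bit / 1 bit.
std::vector<uint16_t> encode_bits(const std::vector<uint8_t>& bytes,
                                  uint16_t t0h, uint16_t t1h) {
    std::vector<uint16_t> out;
    out.reserve(bytes.size() * 8);
    for (uint8_t b : bytes)
        for (int bit = 7; bit >= 0; --bit)
            out.push_back(((b >> bit) & 1) ? t1h : t0h);
    return out;
}

// 0xA5 is binary 1010 0101.
void check_encoding() {
    const std::vector<uint16_t> expected = {20, 10, 20, 10, 10, 20, 10, 20};
    assert(encode_bits({0xA5}, 10, 20) == expected);
}
```

A buffer built this way holds one entry per bit, so a DMA transfer per PWM period consumes one entry each time.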
 
Excellent. Will you be posting the code for your driver?
 
UniWS LED driver

Here it is.

View attachment UniWS.ino

The caveats:

  1. This is just a first draft. Not production quality.
  2. There is an annoying glitch in the LEDs every few seconds. I don't know what it is yet.
  3. The API is somewhere between the Adafruit_Neopixel and OctoWS2811. Not quite compatible with either.
  4. Only tested on Teensy LC with my 16 NeoPixels (which don't run well at 800 kHz -- maybe they're 400 kHz devices.)
  5. The library is hardcoded to pin 17, but aside from the 5V issue, it could be made to run on any PWM pin. It takes over the whole timer-counter, though.
  6. I still don't fully initialize the TPM/FTM.
  7. I don't actually know how to package a library for Arduino. (Is there any documentation?)

If you give it a try, let me know how it works for you.
 
I just noticed that the UniWS sketch initializes Serial. If you run it with no serial monitor, it hangs in setup().
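
One common way around that kind of hang is to bound the wait. Below is a minimal sketch with the clock and the readiness check injected, so it can be tried off-target; wait_until is a hypothetical helper, not part of UniWS. On a Teensy the predicate would presumably test Serial and the clock would be millis().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Poll ready() until it returns true or timeout_ms has elapsed on now_ms().
// Unsigned subtraction keeps the comparison valid across counter wraparound.
bool wait_until(const std::function<bool()>& ready,
                const std::function<uint32_t()>& now_ms,
                uint32_t timeout_ms) {
    const uint32_t start = now_ms();
    while (!ready()) {
        if (uint32_t(now_ms() - start) >= timeout_ms)
            return false;   // gave up waiting
    }
    return true;
}

// Exercise both outcomes with a fake clock that advances 1 ms per call
// and starts just below the 32-bit wraparound point.
void check_wait_until() {
    uint32_t t = 0xFFFFFFF0u;
    auto fake_clock = [&t]() -> uint32_t { return ++t; };

    assert(!wait_until([] { return false; }, fake_clock, 100));
    assert(wait_until([] { return true; }, fake_clock, 100));
}
```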
 
One really easy but important thing needs to be done. To share this with the world, you need to put an open source license on it.

The simplest and most permissive one is the MIT License. If you want to let people use your code any way, even in commercial projects, this is the best one. Just copy that text into a big comment at the top of the code, with your name and 2015 on the first line.

To turn this into a proper Arduino library usually involves a pair of .cpp and .h files, an examples folder, and maybe other stuff like keywords.txt and the new library.properties file.

But really, the only essential thing is the license comment, which establishes that you're the original author, and other people have permission to modify & share your work.
 
Thanks. As soon as I get a couple of independent "It works for me" confirmations, I'll package it up and stick it on GitHub. EDIT: if it works for you, do you see glitches?

I found this resource, too: "Writing a Library for Arduino".

Next up: I need a SPI slave driver for Teensy LC...
 
In your UniWS.ino sketch you comment

Code:
// Bug in DMAChannel: transferCount shifts right to convert words to bytes.
// It should shift left.

Have you brought this to Paul's attention (LC code in DMAChannel.h)?

In your first sketch, you also note that

FTM1_SC &= ~0x100; // FTM_SC_DMA bit not defined in kinetis.h

I also noticed that the LC's TPMx_SC has a DMA bit that the Teensy 3's FTMx_SC lacks, and there is no corresponding symbol in kinetis.h.
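
The arithmetic behind the transferCount comment quoted above can be sketched as below. This is not the DMAChannel.h source; the function names are made up, and size_log2 (0 for 8-bit, 1 for 16-bit, 2 for 32-bit transfers) is an illustrative parameter.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a number of transfers: multiply by the transfer size,
// which is a left shift when the size is a power of two.
uint32_t byte_count(uint32_t transfers, unsigned size_log2) {
    return transfers << size_log2;
}

// The mistake described in the comment: shifting right divides instead.
uint32_t byte_count_shifted_right(uint32_t transfers, unsigned size_log2) {
    return transfers >> size_log2;
}

void check_byte_counts() {
    assert(byte_count(121, 0) == 121);              // 8-bit transfers
    assert(byte_count(10, 1) == 20);                // 16-bit transfers
    assert(byte_count(10, 2) == 40);                // 32-bit transfers
    assert(byte_count_shifted_right(10, 1) == 5);   // too small by a factor of 4
}
```

For 8-bit transfers the two versions agree, so the difference only shows up with wider transfer sizes.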
 
... and then I fell into a pit for a month. I just now climbed out and sent Paul two pull requests. Thanks for reminding me.

https://github.com/PaulStoffregen/cores/pull/74
https://github.com/PaulStoffregen/cores/pull/75
 