Teensy 4.1 UART DMA interfering with delay() functionality

las564

New member
Hello, I am building a data acquisition system for 5x FUTEK LSB206 load cells with UART output using a Teensy 4.1. The load cells use QIA128 controller chips to interface with a host system via UART, which is more thoroughly documented here. The load cells offer a "Stream" mode, which, once enabled, causes the device to send 4-byte packets containing ADC readings to the host over UART. Each packet is structured as:
Code:
<Highest Significance Byte> <Middle Significance Byte> <Lowest Significance Byte> <Checksum>
Where the checksum is calculated:
Code:
 Checksum = (HSB * 1) + (MSB * 2) + (LSB * 3)
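For reference, the packet layout and checksum formula above can be sketched in plain C++ as follows. This is only an illustration: the function names are hypothetical, and it assumes the sum is truncated to 8 bits so that it fits the single checksum byte, which the description above does not state explicitly.

```cpp
#include <cstdint>

// Hypothetical helper: checksum over the three data bytes, per the formula
// above. ASSUMPTION: the sum is truncated to 8 bits to fit the checksum byte.
constexpr uint8_t futek_checksum(uint8_t hsb, uint8_t msb, uint8_t lsb) {
  return static_cast<uint8_t>((hsb * 1u + msb * 2u + lsb * 3u) & 0xFFu);
}

// Combine the three data bytes into one 24-bit ADC reading (HSB first).
constexpr uint32_t futek_reading(uint8_t hsb, uint8_t msb, uint8_t lsb) {
  return (static_cast<uint32_t>(hsb) << 16) |
         (static_cast<uint32_t>(msb) << 8) |
         static_cast<uint32_t>(lsb);
}

// True when the fourth byte of a packet matches the computed checksum.
constexpr bool futek_packet_ok(uint8_t hsb, uint8_t msb, uint8_t lsb,
                               uint8_t checksum) {
  return futek_checksum(hsb, msb, lsb) == checksum;
}
```

Under that 8-bit assumption, the bytes 0x7f 0x08 0xdd would carry the checksum 0x26.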

For my initial prototyping I am just using a single load cell.
To the Teensy 4.1, I have attached the load cell's Vin to 3.3V, Gnd to Gnd, and Rx and Tx to Tx1 and Rx1.

I am using Teensyduino and the Arduino IDE. I have written a basic class to send/read commands to/from the load cell using Serial1, which I have used successfully to collect ADC data from the single load cell in stream mode.

Given that I will eventually be using 5x of these load cells, I want to use DMA so that packets received by the Teensy are automatically read into a ping-pong buffer. An interrupt at the end of each half of the buffer would then save that half to a file on an SD card.
The full source code is attached to this post, but below is the main script.
Code:
#include <SD.h>
#include <DMAChannel.h>
#include "FutekLoadCell.h"
#include <imxrt.h> // imx manual: https://www.pjrc.com/teensy/IMXRT1060RM_rev3_annotations.pdf
#define DMA_BUFFER_SIZE 1024
#define DMA_BUFFER_WATER (DMA_BUFFER_SIZE / 2)
volatile unsigned char dma_rx_buffer[DMA_BUFFER_SIZE];
float force_reading;
FutekLoadCell loadcell(&Serial1, 320000);
File dump_fp;
DMAChannel dma;
uint32_t start, isr_calls;
void setup()
{
  Serial.begin(115200);
  while(!Serial);
  // SD card setup
  if(!SD.begin(BUILTIN_SDCARD)) {
    Serial.println("SD Card not present or setup failed");
    return;
  }
  SD.remove("dma_dump");
  dump_fp = SD.open("dma_dump", FILE_WRITE);
  loadcell.begin();
  // Sample Rate Change
  // loadcell.writeCmd(SPSPR_4SPS);
  // loadcell.writeCmd(SPSPR_20SPS);
  loadcell.writeCmd(SPSPR_100SPS);
  // loadcell.writeCmd(SPSPR_850SPS);
  delay(500); // Sample Rate Change takes up to 500ms
  char header[5];
  loadcell.readPayload(header, 5); // parse 5 byte confirmation packet from load cell
  LPUART6_BAUD |= LPUART_BAUD_RDMAE; // set Receiver Full DMA Enable in Serial Baud register (manual pg 2921)
  while(loadcell.available())
    loadcell.readByte();
  // Enable Stream Mode
  loadcell.setStreamState(1);
  delay(500);
  loadcell.readPayload(header, 5); // parse 5 byte confirmation packet from load cell

  // Configure DMA for UART
  dma.source(LPUART6_DATA); // Use Serial1 as the source
  dma.destinationBuffer(dma_rx_buffer, (unsigned int) DMA_BUFFER_SIZE); // Set the destination buffer
  dma.transferSize(4); // Transfer 1 byte at a time
  dma.transferCount(1); // Set number of bytes to transfer
  dma.TCD->CSR &= !DMA_TCD_CSR_ESG; // Disable Scatter Gather
  dma.TCD->DLASTSGA = 0; // Destination Address Offset at the end of each Major Loop
  dma.triggerAtHardwareEvent(DMAMUX_SOURCE_LPUART6_RX); // Trigger on UART RX
  dma.attachInterrupt(isr);
  dma.interruptAtCompletion();
  dma.enable();
 
  start = millis();
  Serial.printf("start time: %dms\n",start);
 
  delay(20000);
  dump_fp.close();
  LPUART6_BAUD &= !LPUART_BAUD_RDMAE;
  dma.disable();
  Serial.println("DONE");
  Serial.printf("elapsed time: %dms\n",millis()-start);
  Serial.printf("isr calls: %d\n\n\n",isr_calls);
}
void loop()
{ 
}
void isr()
{
  uint32_t daddr;
  const char *src;
  ++isr_calls;
  daddr = (uint32_t)(dma.TCD->DADDR);
  dma.clearInterrupt();
  if((daddr == (uint32_t)dma_rx_buffer + DMA_BUFFER_WATER)) {
    // "PING"
    // DMA is receiving to the second half of the buffer
    // need to remove data from the first half
    Serial.println("copying from first half");
    src = (const char *)dma_rx_buffer;
  }
  else if(daddr == (uint32_t)dma_rx_buffer + DMA_BUFFER_SIZE - 1) {
    // "PONG"
    // DMA is receiving to the first half of the buffer
    // need to remove data from the second half
    Serial.println("copying from second half");
    src = (const char *)dma_rx_buffer + DMA_BUFFER_WATER;
    dma.TCD->DADDR = dma_rx_buffer; // reset dma destination to start of buffer
  }
  else {
    return;
  }
  arm_dcache_delete((void *)src, DMA_BUFFER_WATER);
  dump_fp.write(src, DMA_BUFFER_WATER);
}

As you can see, I have configured the TCD of the DMA channel to trigger on serial receive (DMAMUX_SOURCE_LPUART6_RX) and to read 4 bytes (1 sample) on each minor loop, with 1 minor loop per major loop. I've attached an ISR to run upon major loop completion, which checks whether TCD_DADDR is at the watermark or at the end of the RX buffer. If so, it writes the first or second half of the buffer, respectively, to a file on the SD card. If TCD_DADDR is at the end of the buffer, the ISR additionally resets TCD_DADDR to the beginning of the buffer.

What I am expecting is an output like this:

Code:
start time: 1766ms
copying from first half
copying from second half
copying from first half
copying from second half
....
....
copying from first half
copying from second half
DONE
elapsed time: 201766ms

However, what I am seeing instead is:
Code:
start time: 1766ms
sizeof int: 4
copying from first half
DONE
elapsed time: 8ms
isr calls: 287

My immediate reaction is strong confusion as to how delay() seems to be exiting before 20000ms elapse, especially since it seems like nothing is crashing, and the program exits normally. Additionally, the data that is saved to the SD card is staggered, and the checksums are missing:
Code:
7f08 0000 7f08 0000 dd00 0000 7f08 0000
dc00 0000 7f08 0000 dc00 0000 7f08 0000
dd00 0000 7f08 0000 dd00 0000 7f08 0000
de00 0000 7f08 0000 dd00 0000 7f08 0000
dd00 0000 7f08 0000 de00 0000 7f08 0000
de00 0000 7f08 0000 de00 0000 7f08 0000
.

0x7f08dd, 0x7f08dc, 0x7f08de are all readings that I would expect from the load cell, but I do not understand why the checksum is missing, or why they are striped with 0x0000 so regularly.

Interestingly, when I set the number of bytes read per minor loop to 1 or 3 instead of 4, the serial output looks as expected and the program runs for the full 20 seconds. However, the bytes written to the SD card are even harder to interpret:
Code:
7f7f de7f dc7f dc7f dc7f dc7f dc7f dd7f
dd7f dd7f dd7f dd7f dc7f dd7f dd7f dd7f 
dd7f dc7f dd7f dd7f dd7f de7f dd7f dd7f
de7f dd7f dc7f dc7f dc7f dc7f db7f dc7f
dd7f dc7f db7f dc7f dd7f dc7f dc7f dc7f
dd7f dd7f dd7f dd7f dd7f dd7f dc7f dc7f
It seems like the packets are being clipped somehow.

If you have any thoughts or theories about what might be happening here, I would love to hear them. Any help is enormously appreciated!

Thanks a million,
Lev
 

Attachments

  • teensy_4_1-Futek.png
  • exo_teensy_dma.zip
Code:
  dma.TCD->CSR &= !DMA_TCD_CSR_ESG; // Disable Scatter Gather
 ...
LPUART6_BAUD &= !LPUART_BAUD_RDMAE;
These lines immediately jump out at me as suspicious. The ! operator performs a logical NOT that returns 0 or 1, so for any nonzero mask these statements AND the register with 0 and clear every bit in it. I suspect what you intended to use here is the ~ bitwise NOT operator.
I don't have the IMXRT datasheet available at the moment to check what else is stored in the DMA TCD CSR field, but I would advise correcting this mistake first to rule out any unintended behaviour.
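To make the difference concrete, here is a minimal host-side sketch (not Teensy code). FLAG is a hypothetical single-bit mask standing in for a flag such as DMA_TCD_CSR_ESG or LPUART_BAUD_RDMAE, and its bit position is only illustrative.

```cpp
#include <cstdint>

// Hypothetical single-bit mask; the bit position is illustrative only.
constexpr uint32_t FLAG = 1u << 4;

// Logical NOT: !FLAG evaluates to 0 because FLAG is nonzero, so the AND
// wipes every bit in the register value.
constexpr uint32_t clear_with_logical_not(uint32_t reg) {
  return reg & static_cast<uint32_t>(!FLAG);
}

// Bitwise NOT: ~FLAG has every bit set except the flag, so only that one
// bit is cleared and the rest of the register value is preserved.
constexpr uint32_t clear_with_bitwise_not(uint32_t reg) {
  return reg & ~FLAG;
}
```

With a register value of 0xFFFFFFFF, the first form leaves 0 and the second leaves 0xFFFFFFEF.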
 
Ah, what a silly mistake, thank you for catching that. Unfortunately the behavior is still the same, so there must be something else going on, but I would have needed to fix this regardless, so thanks again.
 
Try switching the parameters for transferSize() and transferCount(). Instead of transferring 4 bytes at a time for 1 loop, you want to transfer 1 byte at a time (like the comment in the code says!) for 4 loops.

(Really you would want to transfer 4 bytes at a time for X counts to fill half of the dma_rx_buffer, but you would need to modify the UART low watermark for that so it wouldn't trigger the DMA until at least 4 bytes have been received.)
 
Thank you for your advice. I have been playing around with my code and have found that the timing bug is caused by the else if statement in the ISR.
Changing the statement from
Code:
else if(daddr == (uint32_t)dma_rx_buffer + DMA_BUFFER_SIZE - 1) {
to
Code:
else if(daddr == (uint32_t)dma_rx_buffer + DMA_BUFFER_SIZE) {
causes the program to run to completion as expected each and every time.
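The reason this comparison matters can be sketched on the host with a small model of how the destination address advances. This is an illustration only: it assumes DADDR always points one past the last byte written, so a completely filled buffer leaves it at base + size rather than base + size - 1. The names below are hypothetical.

```cpp
#include <cstdint>

constexpr uint32_t kBufferSize = 1024;           // mirrors DMA_BUFFER_SIZE
constexpr uint32_t kHalfSize = kBufferSize / 2;  // mirrors DMA_BUFFER_WATER

// Modelled destination address after `loops` completed major loops, each of
// which writes `bytes_per_loop` bytes starting at `base` (no wrap-around).
constexpr uint32_t daddr_after(uint32_t base, uint32_t loops,
                               uint32_t bytes_per_loop) {
  return base + loops * bytes_per_loop;
}

// Which half is ready to flush: 0 = first half, 1 = second half, -1 = neither.
constexpr int half_ready(uint32_t base, uint32_t daddr) {
  return daddr == base + kHalfSize     ? 0
         : daddr == base + kBufferSize ? 1
                                       : -1;
}
```

In this model, with 4 bytes per major loop, the first half is ready after 128 loops and the second after 256, while an address of base + 1023 is never reached and so matches neither case.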

However, I still cannot configure the DMA to read each 4-byte packet correctly. I have tried varying the number of bytes per minor loop via
Code:
dma.transferSize(DMA_BYTES_PER_MINOR_LOOP);
the number of minor loops in a major loop via
Code:
dma.transferCount(DMA_MINOR_LOOPS_PER_MAJOR);
and the UART RX watermark (between 2, the default, and 3) via
Code:
LPUART6_WATER |= LPUART_WATER_RXWATER(UART_RX_DMA_WATERMARK);
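One caveat with the |= form above: it can only set bits, so it works when going from 2 to 3 but would not lower the watermark. A host-side sketch of the difference follows. It assumes RXWATER is a 2-bit field at bits 17:16 of the WATER register, which should be checked against the reference manual, and the helper names are hypothetical.

```cpp
#include <cstdint>

// ASSUMPTION: RXWATER is a 2-bit field at bits 17:16 of LPUART WATER.
constexpr uint32_t kRxWaterShift = 16;
constexpr uint32_t kRxWaterMask = 0x3u << kRxWaterShift;

// OR-only update, as in the line above: old field bits are never cleared.
constexpr uint32_t or_in_rxwater(uint32_t water, uint32_t level) {
  return water | ((level << kRxWaterShift) & kRxWaterMask);
}

// Read-modify-write update: clear the field first, then write the new level.
constexpr uint32_t set_rxwater(uint32_t water, uint32_t level) {
  return (water & ~kRxWaterMask) | ((level << kRxWaterShift) & kRxWaterMask);
}
```

Starting from a watermark of 2, asking for 1 with the OR-only form yields 3, whereas the read-modify-write form yields 1.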

I have attached dumps from these tests to this message. As you can see, the closest to 'working' that I am able to get is with either [minor loop = 2B; major = 2 minors; RX_water = 2B] or [minor loop = 2B; major = 1 minor; RX_water = 2B]. Both of these settings allow the Teensy to record 3 bytes from each 4-byte payload, with only the checksum byte missing.

Setting it to [minor loop = 4B; major = 1 minor; RX_water = 2B] yields similar results, except with 2 empty bytes between each useful 2-byte chunk (which makes sense given that it is reading 4 bytes after 2 bytes have arrived).

After seeing this, I was worried that there was some issue with the load cell and that the Teensy wasn't receiving the checksum byte. However, after looking at the load cell stream packets on an oscilloscope, it seems that the 4-byte packets are arriving at the Teensy intact (images of a few packets captured on the scope are also attached).

I am a bit confused as to how the LPUART WATER[RXWATER] field works. The i.MX RT manual says:
Code:
Receive Watermark
When the number of datawords in the receive FIFO/buffer is greater than the value in this register field,
an interrupt or a DMA request is generated. For proper operation, the value in RXWATER must be set to
be less than the receive FIFO/buffer size as indicated by FIFO[RXFIFOSIZE] and FIFO[RXFE].

From this explanation, when RXWATER is set to 2 (the default), there need to be at least 3 bytes in the RX FIFO for a DMA request to be generated. This means that if my DMA config reads 4 bytes in each minor loop, there should be at least 3 non-empty bytes in the output buffer, correct? But this is not what I am seeing. As I alluded to earlier, with the settings [minor loop = 4B; major = 1 minor; RX_water = 2B] I see a DMA output like the following:
Code:
de00 0000 7f08 0000 dc00 0000 7f08 0000
dc00 0000 7f08 0000 db00 0000 7f08 0000
In this example (water2_minior4_major1.log in the attached zip), it appears that in each minor loop 2 of the 4 bytes read are empty, whereas I would expect at least 3 of them to hold data. I am assuming that the documentation is correct, but it doesn't align with what I am seeing. Am I misunderstanding how RXWATER works?

Thank you again for taking the time to read over my post, all of your help means the world!
 

Attachments

  • teensy4-1_dma_serial_dumps.zip
  • rx_packet_scope_imgs.zip
The issue here is that you are using the DMAChannel.h transferSize() function, which doesn't do what you want it to.
(RT1060 pgs 101, 104) From what I understand, a minor loop is STILL A LOOP, in that it will transfer multiple times until NBYTES have been read.
The reason it wasn't working before was that you had TCD_ATTR[SSIZE] = TCD_ATTR[DSIZE] = 2 (which is what the transferSize() function sets).
(RT1060 pgs 155-156) Having these values as 010b will read out a 32-bit / 4-byte area.
If I'm understanding this right, a DMA request triggers as soon as the FIFO fills to the watermark, so we miss the last byte because we grab the entire 4-byte block as it was at that point.
However, setting TCD_ATTR[SSIZE] to 000b will read an 8-bit / 1-byte area.
Due to the looping nature of the minor loop (RT1060 pg 104):
1 "iteration" of the minor loop will transfer SSIZE bits from the source to the destination until DSIZE bits have been transferred. If NBYTES != the number of bytes transferred, it will do another iteration, as the minor loop is not complete.
- Note that these iterations are internal to the minor loop and will not decrement CITER.
The following line is in the transferSize() function of DMAChannel.h in Teensyduino:
Code:
TCD->ATTR = (TCD->ATTR & 0xF8F8) | 0x0202;
- From pages 155-156, we know that this keeps TCD_ATTR[SMOD] and TCD_ATTR[DMOD] the same, while setting the SSIZE and DSIZE values.
- For your purposes, change this line to the following:
Code:
TCD->ATTR = (TCD->ATTR & 0xF8F8) | 0x0002;

Note that this can also be done by writing dma.TCD->ATTR_DST and dma.TCD->ATTR_SRC to set each side of TCD_ATTR independently. If you did this, you could continue to use transferSize(), but you would need to manually include your xMOD value if you were using it.
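As a rough host-side illustration of the masking described above, the sketch below packs SSIZE and DSIZE into a 16-bit ATTR value. The field positions (SSIZE in bits 10:8, DSIZE in bits 2:0) are inferred from the 0xF8F8 mask and the 0x0202 constant quoted above, and the helper name is hypothetical.

```cpp
#include <cstdint>

// Size codes for TCD_ATTR[SSIZE] and TCD_ATTR[DSIZE] as used in this thread.
constexpr uint16_t kSize8Bit = 0;   // 000b
constexpr uint16_t kSize32Bit = 2;  // 010b

// Replace SSIZE (bits 10:8) and DSIZE (bits 2:0) while keeping SMOD and DMOD,
// mirroring the 0xF8F8 mask used by transferSize().
constexpr uint16_t with_sizes(uint16_t attr, uint16_t ssize, uint16_t dsize) {
  return static_cast<uint16_t>((attr & 0xF8F8u) | ((ssize & 0x7u) << 8) |
                               (dsize & 0x7u));
}
```

Here with_sizes(attr, kSize32Bit, kSize32Bit) reproduces the original 0x0202 setting, and with_sizes(attr, kSize8Bit, kSize32Bit) reproduces the suggested 0x0002.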

You will also need to modify the offset values. Typically, TCD_xOFF = 2^(TCD_ATTR[xSIZE]) bytes. In other words, xOFF is the address offset applied after each access, so with DSIZE = 010b you get TCD_DOFF = 4, which here equals NBYTES. However, since you are reading from a serial data register at a fixed address, you will want to set TCD_SOFF = 0.
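The relationship between the size codes, the access width, and the number of accesses inside one minor loop can be sketched as below. This is a host-side illustration with hypothetical helper names, and it only covers the size codes 0 to 2 discussed here.

```cpp
#include <cstdint>

// Bytes moved by one access for a given xSIZE code (0 -> 1, 1 -> 2, 2 -> 4).
constexpr uint32_t bytes_per_access(uint32_t size_code) {
  return 1u << size_code;
}

// Accesses needed on one side (source or destination) to move `nbytes`
// within a single minor loop.
constexpr uint32_t accesses_per_minor_loop(uint32_t nbytes,
                                           uint32_t size_code) {
  return nbytes / bytes_per_access(size_code);
}
```

With NBYTES = 4, SSIZE = 000b and DSIZE = 010b, this gives four 1-byte reads from the UART data register and one 4-byte write to the buffer per minor loop.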

Now, if you use Watermark = 3, NBYTES = 4, CITER = BITER = 1, you will get the entire 4-byte block. I have attached code with these modifications.

Not sure if this is helpful, but you could also get the entire 4-byte block by disabling the FIFO entirely, per this thread.
If you do this, you will probably also need to use NBYTES = 4, CITER = BITER = 1. Pretty sure using other settings would read a lot of zero bytes between the actual data.
 

Attachments

  • exo_sensor_handling.ino