ILI9341_t3n DMA problem when dynamically constructed on a Teensy 4.1

jorj

Member
Kurt, you probably know this answer too...

Here's a test program (wired up on a Teensy 4.1) that dumps a 320x240 16-bit image onto an ILI9341 using DMA. It works great as long as the ILI9341_t3n object is statically constructed. If I construct it dynamically with new, no image is displayed and the DMA frame counter printed over serial doesn't increment.

In some versions of this test, I've seen one of the three DMA update slices happen before it hangs but it never finishes a full frame when TESTNEW is 1. Always works a treat with TESTNEW set to 0.

Code:
#include <Arduino.h>
#include <ILI9341_t3n.h>
#include "image.h"

#define PIN_RST 8
#define PIN_DC 9
#define PIN_CS 0
#define PIN_MOSI 26
#define PIN_MISO 1
#define PIN_SCK 27

#define TESTNEW 0

DMAMEM uint8_t dmaBuffer[240*320*2] __attribute__((aligned(32)));

#if TESTNEW
ILI9341_t3n *tft = NULL;
#else
ILI9341_t3n tft(PIN_CS, PIN_DC, PIN_RST, PIN_MOSI, PIN_SCK, PIN_MISO);
#endif

void setup()
{
  Serial.begin(230400);
  delay(200);
  if (CrashReport) {
    Serial.print(CrashReport);
    delay(5000);
  }
  
#if TESTNEW
  tft = new ILI9341_t3n(PIN_CS, PIN_DC, PIN_RST, PIN_MOSI, PIN_SCK, PIN_MISO);
  tft->begin(50000000u);
  tft->setRotation(3);
  tft->setFrameBuffer((uint16_t *)dmaBuffer);
  tft->useFrameBuffer(true);
  tft->fillScreen(ILI9341_BLACK);
#else
  tft.begin(50000000u);
  tft.setRotation(3);
  tft.setFrameBuffer((uint16_t *)dmaBuffer);
  tft.useFrameBuffer(true);
  tft.fillScreen(ILI9341_BLACK);
#endif
  
  const uint8_t *p = testimg;
  for (uint16_t y=0; y<240; y++) {
    for (uint16_t x=0; x<320; x++) {
      uint8_t v = pgm_read_byte(p++);
      dmaBuffer[(y*320+x)*2+1] = v;
      v = pgm_read_byte(p++);
      dmaBuffer[(y*320+x)*2] = v;
    }
  }
}

void loop()
{
#if TESTNEW
  tft->updateScreenAsync(true);
#else
  tft.updateScreenAsync(true);
#endif
  delay(1000);
#if TESTNEW
  Serial.println(tft->frameCount());
#else
  Serial.println(tft.frameCount());
#endif
}

The image (and a copy of that source) are in the attached zipfile.

I'm guessing there may be something with the way the DMA callback is allocated in RAM one way or the other that the 1062 doesn't like, but I haven't found anything in the docs to confirm or deny that yet...

View attachment testili9341.zip
 
Does not surprise me, I had the same issue with the ST7735/89 code with the uncanny eyes examples from Adafruit...

The issue then (and probably now) is that the dma structures:
Code:
  DMASetting _dmasettings[3];
  DMAChannel _dmatx;
are part of the display object. I chain a set of settings to each other to handle the DMA, and they include, for example, replace on completion... And the system plain did not like that.
Probably the real memory was out of sync with the cache memory. I tried different things, and the way I got it to work on the ST driver was to have the library allocate a set of these as static objects, always allocating enough of them to handle up to maybe 3 displays..

Can do the same here, which will probably waste a little memory... Wish at times there was something like: memoryAllocate(sizeof(DMASetting), MEMORY_LOW); (or uncached or ..)
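For anyone following along, the static-pool workaround described above can be sketched roughly like this. It is only an illustration: DmaData, Display, claim() and s_pool are hypothetical names, and the placeholder array stands in for the real DMASetting/DMAChannel members. The point is that the DMA state lives in static storage placed by the linker, so it ends up in the same place whether the display object itself is static or created with new.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the per-bus DMA state
// (in the real library: DMASetting _dmasettings[3]; DMAChannel _dmatx;).
struct DmaData {
  uint32_t settings[3][8];  // placeholder for three chained DMA setting blocks
  bool inUse = false;
};

// Hypothetical display class: it holds only a pointer to its DMA state.
class Display {
public:
  static constexpr size_t kMaxBuses = 3;  // one slot per SPI bus

  explicit Display(uint8_t bus) : _dma(claim(bus)) {}

  bool hasDma() const { return _dma != nullptr; }
  const DmaData *dma() const { return _dma; }

private:
  // Hand out the statically allocated slot for this bus, or nullptr if the
  // bus index is out of range.
  static DmaData *claim(uint8_t bus) {
    if (bus >= kMaxBuses) return nullptr;
    s_pool[bus].inUse = true;
    return &s_pool[bus];
  }

  static DmaData s_pool[kMaxBuses];  // static storage, never on the heap
  DmaData *_dma;
};

// The pool is always allocated, which is the "waste a little memory" cost.
DmaData Display::s_pool[Display::kMaxBuses];
```

With this layout, new Display(1) and a global Display(1) both end up pointing at the same statically placed slot.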
 
Thanks, that gives me somewhere to look. I've got a version of the RA8875 driver that's doing DMA (8-bit only) and suffering from the same issue. I'll play with flushing the cache first to see if that helps at all.
 
I have a version with the dma stuff as mentioned up in the dma_new_fix branch,

BUT at times the display wants to get screwed up... Need to see where it might be happening...
Not necessarily with the DMA stuff... Could be something is not initialized and by default is 0....

But if you want to give it a try, it is up in that new branch
 
Seems to work casually, I'll work it out a little more later.

I'm not convinced that the problem is (at least solely) caching - playing with this a little over lunch today, I found (as I'm sure you've seen) that no amount of ARM cache flushing helped when it was malloc'd in RAM2. I also played with allocating in EXTRAM, which also fails. Unless there's a cache on EXTRAM that I'm missing (entirely possible).

Since you want this to be in uncached RAM1, though - any reason you're not decorating it as FASTRUN?

Code:
FASTRUN ILI9341DMA_Data ILI9341_t3n::_dma_data[3];   // one structure for each SPI buss...
 
EXTMEM on PSRAM, when available, uses the same CACHE feature as RAM2/DMAMEM.

FASTRUN is to push code to RAM1 ITCM - all dynamic variables by default are allocated in RAM1 DTCM.
 
EXTMEM on PSRAM, when available, uses the same CACHE feature as RAM2/DMAMEM.

Thanks, didn't know it was doing that. That definitely explains the behavior.

FASTRUN is to push code to RAM1 ITCM - all dynamic variables by default are allocated in RAM1 DTCM.

Okay, so I'm not aware of a decoration for a segment of RAM1 DTCM, and in my mind declaring that it has to be in RAM1 makes sense. The comment before that code says "This way we make sure it is hopefully in uncached memory" and it seems to me that "hopefully" is based on it not being declared as destined for a particular region.

Is there a way to declare something for the RAM1 DTCM segment? I mean, I think this is basically what it's doing by default:

Code:
__attribute__ ((section(".data"))) ILI9341DMA_Data ILI9341_t3n::_dma_data[3];   // one structure for each SPI buss...

... which happens to be uncached on this architecture, but ".data" doesn't semantically address whether or not it's in a region of memory that's cached. Is there a better way to codify that this must be uncached in order to function correctly? Maybe that's the question I'm trying to ask. In lieu of Kurt's dream of specifying where to malloc from, does it make more sense to decorate this as FASTRUN so it at least has some connotation of being explicitly in RAM1?
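One way to at least make the "must be in RAM1 DTCM" requirement checkable at runtime is to classify the address. The sketch below is an assumption-laden illustration, not anything from the library: MemRegion, classifyAddress() and isInDTCM() are made-up names, and the address ranges are the ones I understand the Teensy 4.1 linker script to use (ITCM at 0x00000000, DTCM at 0x20000000, RAM2 at 0x20200000, FLASH at 0x60000000, EXTMEM at 0x70000000).

```cpp
#include <cstdint>

// Hypothetical region names; ranges assumed from the Teensy 4.1 memory map.
enum class MemRegion { ITCM, DTCM, RAM2, Flash, ExtMem, Other };

inline MemRegion classifyAddress(uintptr_t a) {
  if (a < 0x00080000u) return MemRegion::ITCM;                      // RAM1 code (FASTRUN)
  if (a >= 0x20000000u && a < 0x20080000u) return MemRegion::DTCM;  // RAM1 data, uncached
  if (a >= 0x20200000u && a < 0x20280000u) return MemRegion::RAM2;  // DMAMEM, cached
  if (a >= 0x60000000u && a < 0x70000000u) return MemRegion::Flash; // PROGMEM
  if (a >= 0x70000000u && a < 0x80000000u) return MemRegion::ExtMem; // EXTMEM PSRAM, cached
  return MemRegion::Other;
}

// True when the pointer falls inside the assumed RAM1 DTCM range.
inline bool isInDTCM(const void *p) {
  return classifyAddress(reinterpret_cast<uintptr_t>(p)) == MemRegion::DTCM;
}
```

In a sketch, something like Serial.println(isInDTCM(tft)) after the new would show where the object actually landed; my understanding is that new on the Teensy 4.x draws from RAM2, which would be consistent with the failure seen here.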
 
... just a refresher as understood here:

Look at the picture here and note what is shown for RAM2 and PSRAM. Nothing placed there is ever initialized data, and nothing is placed there at compile time unless it is explicitly assigned with DMAMEM or EXTMEM.

RAM1 has no cache since it runs at processor speed, and it is the default for any allocated data that isn't somehow CONST and kept in FLASH as PROGMEM.

DMA writes directly to physical RAM as it is pointed to. In so doing CACHE is bypassed. So nothing in the cache should be used after DMA operation as it won't reflect the results of the DMA operation.
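For data buffers that do live in cached memory (RAM2 or EXTMEM), the usual handling looks roughly like the sketch below. The helper names are hypothetical; arm_dcache_flush() and arm_dcache_delete() are the Teensy 4.x core calls, and they are compiled in only when building for the IMXRT1062, so the helpers are no-ops elsewhere. The 32-byte figure is the Cortex-M7 data cache line size, which is why the test program's buffer is declared aligned(32). Note that, per the reports earlier in this thread, flushing alone did not fix the heap-allocated DMA settings, so treat this as the general buffer pattern and not a fix for that problem.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__IMXRT1062__)
#include <Arduino.h>  // Teensy 4.x core: arm_dcache_flush(), arm_dcache_delete()
#endif

constexpr size_t kCacheLine = 32;  // Cortex-M7 data cache line size in bytes

// True if the buffer starts on a cache-line boundary and spans whole lines,
// so cache maintenance on it cannot touch neighbouring variables.
inline bool coversWholeCacheLines(const void *buf, size_t len) {
  return (reinterpret_cast<uintptr_t>(buf) % kCacheLine == 0) &&
         (len % kCacheLine == 0);
}

// The CPU filled the buffer and DMA is about to read it:
// push dirty cache lines out to physical RAM first.
inline void beforeDmaReadsBuffer(void *buf, size_t len) {
#if defined(__IMXRT1062__)
  arm_dcache_flush(buf, len);
#else
  (void)buf; (void)len;  // nothing to do off-target
#endif
}

// DMA wrote into the buffer and the CPU is about to read it:
// discard cached copies so the CPU sees what DMA wrote.
inline void afterDmaWritesBuffer(void *buf, size_t len) {
#if defined(__IMXRT1062__)
  arm_dcache_delete(buf, len);
#else
  (void)buf; (void)len;  // nothing to do off-target
#endif
}
```

In the test program above, the frame buffer case would be beforeDmaReadsBuffer(dmaBuffer, sizeof(dmaBuffer)) after the image copy loop, though as far as I know the library already takes care of that for its own frame buffer.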
 