KurtE
Senior Member+
Sorry, I know that I have brought some of this up a few times during the beta thread, but I am still running into issues that I am trying to figure out how best to solve.
Probably the most recent and most detailed write-up was at: https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test?p=211230&viewfull=1#post211230
Put simply: If you are trying to use DMA to either send data to a device or receive data from a device, and the memory is from the upper memory area, there are some interesting things to think about and work out. You will get memory from these locations if you either declare your variables as DMAMEM or allocate them with malloc or new.
The issue is that DMA operations work directly with the actual physical memory, whereas ordinary reads and writes of variables in this range go through a cache. A value you have just written may or may not have made it to physical memory yet, and likewise any updates made by DMA to physical memory may not be seen by code that is reading through the cache...
There are system functions available to help out in some cases, which for example I put into the SPI library for non-blocking transfers.
For example, if you are doing a transfer from memory to a device, I call arm_dcache_flush(buffer, count);
which tells the system to flush the cache for that address range and size back to physical memory.
Likewise, if you are doing an SPI transfer from a device back into memory, we call arm_dcache_delete(retbuf, count), which tells the system to delete its cached copy so that later reads will retrieve the data from physical memory...
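To make the flush versus delete distinction concrete, here is a small toy model that runs on a desktop compiler. It is only a sketch: ToyCachedBuffer and its methods are made up for illustration and are not the Teensy API. flush() plays the role of arm_dcache_flush and discard() the role of arm_dcache_delete, for one tiny buffer that is cached as a whole.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy model of a single write-back cached buffer (hypothetical, NOT the Teensy API).
// "physical" is what a DMA engine would see, "cached" is what the CPU sees.
struct ToyCachedBuffer {
    static const size_t N = 8;
    uint8_t physical[N];
    uint8_t cached[N];
    bool valid = false;  // cache currently holds a copy of the buffer
    bool dirty = false;  // cached copy has changes not yet written back

    ToyCachedBuffer() {
        memset(physical, 0, N);
        memset(cached, 0, N);
    }

    // CPU accesses always go through the cache.
    void cpuWrite(size_t i, uint8_t v) { fill(); cached[i] = v; dirty = true; }
    uint8_t cpuRead(size_t i) { fill(); return cached[i]; }

    // DMA accesses bypass the cache and touch physical memory only.
    void dmaWrite(size_t i, uint8_t v) { physical[i] = v; }
    uint8_t dmaRead(size_t i) const { return physical[i]; }

    // Stand-in for arm_dcache_flush: write dirty cached data back to memory.
    void flush() {
        if (valid && dirty) { memcpy(physical, cached, N); dirty = false; }
    }

    // Stand-in for arm_dcache_delete: throw the cached copy away,
    // so the next CPU read reloads from physical memory.
    void discard() { valid = false; dirty = false; }

private:
    // Load the cache from physical memory on first use after a discard.
    void fill() {
        if (!valid) { memcpy(cached, physical, N); valid = true; }
    }
};
```

In this model a CPU write is invisible to dmaRead() until flush() is called, and a dmaWrite() is invisible to cpuRead() until discard() is called, which is the same pair of problems described above.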
With some of our display driver code, this can work for single updates: before we tell the system to update the display from the frame buffer, we flush the memory. But it breaks down if we try to do continuous updates. I could go into details, but... To work around this I instead have two smaller buffers, as part of the display object, into which I copy the data from the frame buffer and then output from there... This works OK when the display object is a static object, as it was in all of my test cases... UNTIL:
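As a rough sketch of the bookkeeping behind the two-buffer scheme: the helper names below are hypothetical and the driver may compute these values differently, but the numbers agree with the debug log further down, which reports "buf size: 512 sub frames:32" for the 128x128 display and a source adjustment of 1024 bytes.

```cpp
#include <cstdint>

// Hypothetical helpers; only the arithmetic is taken from the post and its log.

// Number of DMA chunks needed to send one full frame.
// Assumes the buffer size divides the frame evenly.
constexpr uint32_t subFramesPerFrame(uint32_t width, uint32_t height,
                                     uint32_t bufferPixels) {
    return (width * height) / bufferPixels;
}

// Which of the two small buffers is filled for a given sub-frame count:
// odd counts use buffer 1, even counts use buffer 2 (the `count & 1` test).
constexpr int bufferForSubFrame(uint32_t subFrameCount) {
    return (subFrameCount & 1) ? 1 : 2;
}

// Bytes copied per chunk: each pixel is a 16-bit value.
constexpr uint32_t bytesPerChunk(uint32_t bufferPixels) {
    return bufferPixels * static_cast<uint32_t>(sizeof(uint16_t));
}
```

While DMA is sending one buffer, the ISR can refill the other from the frame buffer, so the frame buffer itself never has to be handed to the DMA engine.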
With Uncanny Eyes, I am trying a version where I have one display on SPI and another on SPI1, and am trying to have the two displays update at the same time using DMA, and it is NOT working!
Then I found that this program creates the displays with new. That works, except for my DMA support!
To verify this, I made a version of one of my st77xx test programs that allows me to use either a static tft object or one created with new, and sure enough the one created with new is failing...
View attachment st7735_t3_simpletest_FB-190814a.zip
If you build it as is, it will use a static object... If you uncomment the first line #define...
it will use new...
Hitting <CR> - will go through the different orientations of the screen
Hitting a<CR> - will update the screen using async (one time)
Hitting c<CR> - will do a continuous update for several frames.
The async ones will fail.
And I am now trying to fully understand this and figure out how/if I can resolve it. Some of the issues are (I think):
a) My smaller buffers are in high memory, so it is that cache issue again. That part is not hard to work around:
Code:
// Alternate between the two small DMA buffers: odd sub-frames use buffer 1, even ones buffer 2.
if (_dma_sub_frame_count & 1) {
  memcpy(_dma_buffer1, &_pfbtft[_dma_pixel_index], _dma_buffer_size*2);
  // Buffers at or above 0x20200000 are cached, so push the copy out to physical memory.
  if ((uint32_t)_dma_buffer1 >= 0x20200000u) arm_dcache_flush(_dma_buffer1, _dma_buffer_size*2);
} else {
  memcpy(_dma_buffer2, &_pfbtft[_dma_pixel_index], _dma_buffer_size*2);
  if ((uint32_t)_dma_buffer2 >= 0x20200000u) arm_dcache_flush(_dma_buffer2, _dma_buffer_size*2);
}
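The address test in that snippet could be pulled into one small helper so the threshold only lives in one place. This is just a sketch: needsCacheMaintenance and kCachedRamStart are made-up names, and the 0x20200000 boundary is simply the value used in the check above.

```cpp
#include <cstdint>

// Boundary taken from the check in the post: addresses at or above this
// are treated as cached RAM. The names here are hypothetical.
constexpr uint32_t kCachedRamStart = 0x20200000u;

// True when a buffer at this address needs a cache flush before the DMA
// engine reads it (or a cache delete before the CPU reads DMA results).
constexpr bool needsCacheMaintenance(uint32_t addr) {
    return addr >= kCachedRamStart;
}
```

On the Teensy it would be used along the lines of `if (needsCacheMaintenance((uint32_t)_dma_buffer2)) arm_dcache_flush(_dma_buffer2, _dma_buffer_size*2);`. With the source addresses printed in the two logs, 0x20001b34 (static object) does not need it and 0x2020013c (object created with new) does.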
Note: the DMA ISR does get called a couple of times before things stop working (it needs to get called several more times to finish a frame).
Right now I have some simple debug printing turned on. It prints one . each time the ISR is entered, a ! when it is two before the last one, and a $ for the last one of the frame...
Here is some debug output from the working case:
Code:
init CS:10 DC:9 MOSI:11 SCLK:13 RST:20
13:01:04.464 -> Row Start:3 Col Start: 2
13:01:04.464 -> Set Rotation: 0 width: 128 height: 128
13:01:04.504 -> Hit any key to continue
13:01:04.504 -> Set Rotation: 1 width: 128 height: 128
13:01:04.504 -> Hit any key to continue
13:01:09.397 -> DMA Init buf size: 512 sub frames:32 spi num: 0
13:01:09.397 -> 20001b20 400e9020:SA:20001b34 SO:2 AT:101 NB:2 SL:-1024 DA:403a0064 DO: 0 CI:200 DL:20001b00 CS:12 BI:200
13:01:09.397 -> 20001aa0 20001ac0:SA:20001b34 SO:2 AT:101 NB:2 SL:-1024 DA:403a0064 DO: 0 CI:200 DL:20001b00 CS:12 BI:200
13:01:09.397 -> 20001ae0 20001b00:SA:20001f34 SO:2 AT:101 NB:2 SL:-1024 DA:403a0064 DO: 0 CI:200 DL:20001ac0 CS:12 BI:200
13:01:09.397 -> After Async Update
13:01:09.397 -> ..............................!
13:01:09.437 -> ..*
13:01:09.437 -> $
13:01:09.437 -> Async completed 14
13:01:10.477 -> Set Rotation: 2 width: 128 height: 128
13:01:10.477 -> Hit any key to continue
13:01:10.837 -> Set Rotation: 3 width: 128 height: 128
13:01:10.877 -> Hit any key to continue
13:01:11.717 -> Start Continuous update test
13:01:11.717 -> 20001b20 400e9020:SA:20001b34 SO:2 AT:101 NB:2 SL:-1024 DA:403a0064 DO: 0 CI:200 DL:20001b00 CS:12 BI:200
13:01:11.717 -> 20001aa0 20001ac0:SA:20001b34 SO:2 AT:101 NB:2 SL:-1024 DA:403a0064 DO: 0 CI:200 DL:20001b00 CS:12 BI:200
13:01:11.717 -> 20001ae0 20001b00:SA:20001f34 SO:2 AT:101 NB:2 SL:-1024 DA:403a0064 DO: 0 CI:200 DL:20001ac0 CS:12 BI:200
13:01:11.717 -> After updateScreenAsync
13:01:11.717 -> ................................*
13:01:11.756 -> ................................*
13:01:11.756 -> ................................*
13:01:11.756 -> ................................*
13:01:11.797 -> ................................*
13:01:11.797 -> ................................*
13:01:11.797 -> ................................*
13:01:11.797 -> ................................*
13:01:11.876 -> ................................*
13:01:11.876 -> ................................*
13:01:11.876 -> ................................*
13:01:11.876 -> ................................*
13:01:11.917 -> ................................*
13:01:11.917 -> ................................*
13:01:11.917 -> ................................*
13:01:11.957 -> ................................*
13:01:11.957 -> ................................*
13:01:11.957 -> ................................*
13:01:11.997 -> ................................*
13:01:11.997 -> ................................*
13:01:12.036 -> ................................*
13:01:12.036 -> ................................*
13:01:12.036 -> ................................*
13:01:12.077 -> ................................*
13:01:12.077 -> ................................*
13:01:12.077 -> ................................*
13:01:12.077 -> ................................*
13:01:12.117 -> ................................*
13:01:12.117 -> ................................*
13:01:12.156 -> ................................*
13:01:12.156 -> ................................*
13:01:12.206 -> ................................*
13:01:12.206 -> ................................*
13:01:12.206 -> ................................*
13:01:12.206 -> ................................*
13:01:12.247 -> Finished all frames
13:01:12.247 -> After call to endUpdateAsync
13:01:12.247 -> ..............................!
13:01:12.247 -> ..*
13:01:12.247 -> $
13:01:12.247 -> Test completed
Now if I run it with an ST7735 object created with new...
Code:
13:32:46.209 -> init CS:10 DC:9 MOSI:11 SCLK:13 RST:20
13:32:46.209 -> Row Start:3 Col Start: 2
13:32:46.209 -> Set Rotation: 0 width: 128 height: 128
13:32:46.209 -> Hit any key to continue
13:32:58.775 -> Set Rotation: 1 width: 128 height: 128
13:32:58.815 -> Hit any key to continue
13:32:59.575 -> Set Rotation: 2 width: 128 height: 128
13:32:59.655 -> Hit any key to continue
13:33:01.656 -> DMA Init buf size: 512 sub frames:32 spi num: 0
13:33:01.656 -> 20200128 400e9020:SA:2020013c SO:2 AT:101 NB:2 SL:-1024 DA:403a0064 DO: 0 CI:200 DL:20200108 CS:f812 BI:200
13:33:01.656 -> 202000a8 202000c8:SA:2020013c SO:2 AT:101 NB:2 SL:-1024 DA:403a0064 DO: 0 CI:200 DL:20200108 CS:f812 BI:200
13:33:01.656 -> 202000e8 20200108:SA:2020053c SO:2 AT:101 NB:2 SL:-1024 DA:403a0064 DO: 0 CI:200 DL:202000c8 CS:12 BI:200
13:33:01.656 -> After Async Update
13:33:01.656 -> .
b) Suspected issue - The DMASetting and DMAChannel structures are part of this display object:
Code:
#elif defined(__IMXRT1052__) || defined(__IMXRT1062__) // Teensy 4.x
// try work around DMA memory cached. So have a couple of buffers we copy frame buffer into
// as to move it out of the memory that is cached...
DMASetting _dmasettings[2];
DMAChannel _dmatx;
volatile uint32_t _dma_pixel_index = 0;
volatile uint16_t _dma_sub_frame_count = 0; // Can return a frame count...
uint16_t _dma_buffer_size; // the actual size we are using <= DMA_BUFFER_SIZE;
uint16_t _dma_cnt_sub_frames_per_frame;
static const uint16_t DMA_BUFFER_SIZE = 512;
uint16_t _dma_buffer1[DMA_BUFFER_SIZE] __attribute__ ((aligned(4)));
uint16_t _dma_buffer2[DMA_BUFFER_SIZE] __attribute__ ((aligned(4)));
uint32_t _spi_fcr_save; // save away previous FCR register value
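One thing the two logs allow checking is where these structures end up once the object lives on the heap. The sketch below rests on assumptions that should be verified against the reference manual: that the data cache works in 32-byte lines, and that the eDMA expects scatter/gather descriptors (the DL addresses in the log) to be 32-byte aligned. The helper names are hypothetical.

```cpp
#include <cstdint>

// Assumed granularity: 32 bytes for both the Cortex-M7 data cache line
// and the eDMA scatter/gather descriptor alignment.
constexpr uint32_t kAlign = 32;

// True when an address sits on a 32-byte boundary.
constexpr bool isAligned32(uint32_t addr) {
    return (addr % kAlign) == 0;
}

// Number of 32-byte cache lines touched by the range [addr, addr + size).
// Intended for size > 0.
constexpr uint32_t cacheLinesTouched(uint32_t addr, uint32_t size) {
    return ((addr + size + kAlign - 1) / kAlign) - (addr / kAlign);
}
```

Fed with the DL values from the logs, the static case (0x20001ac0, 0x20001b00) is aligned, while the new case (0x202000c8, 0x20200108) is not, and a 32-byte descriptor at 0x202000c8 straddles two cache lines. So alignment of the heap-allocated object may be worth ruling out alongside the cache question.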
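So I am guessing that some of these settings are getting out of sync between the actual memory and the cache, but I am not sure which way. Should I flush or delete the cache at these points? Or ???
Now back to pulling a few more hairs (and I don't have many left)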