honey_the_codewitch
I'm on a Teensy 4.1 (MXRT1062)
I've been inspecting the code for *_t3 graphics libraries like SSD1351_t3, and ST7789_t3.
I noticed there is a complicated anti-caching scheme in which multiple buffers are juggled and memcpy'd. According to (I think?) Paul's comments, this copying is there to avoid caching problems.
I've adapted the code to do partial screen updates like LVGL usually expects, and therefore I can fit every transfer within 32KB with no trouble.
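To put rough numbers on "fits within 32KB", here is a minimal sketch of the arithmetic. It assumes RGB565 (2 bytes per pixel) and treats 32KB as the per-transfer ceiling; the constant and function names are made up for illustration and are not from the driver.

```cpp
#include <cassert>
#include <cstddef>

// Assumed values, not taken from the driver: RGB565 (2 bytes per pixel)
// and a 32KB ceiling per DMA transfer.
constexpr std::size_t kBytesPerPixel = 2;
constexpr std::size_t kMaxTransferBytes = 32 * 1024;

// Bytes needed to send a w x h rectangle of pixels.
constexpr std::size_t rect_bytes(std::size_t w, std::size_t h) {
    return w * h * kBytesPerPixel;
}

// Largest number of full rows of width w that fit in one transfer.
inline std::size_t max_rows_per_transfer(std::size_t w) {
    assert(w > 0);
    return kMaxTransferBytes / (w * kBytesPerPixel);
}
```

Under those assumptions, a 240-pixel-wide update can carry at most 68 full rows per transfer, so the partial-update rectangles just need to stay at or below that.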
Here's the thing. I was running into issues using Paul's scheme because I couldn't get the ISR handler chaining to cut off the transfer at the appropriate number of bytes in all cases.
So I eliminated the anti-caching scheme altogether just to see if it would still work, and stressed it with a fire demo I made to blast it at the SPI rate limit (70MHz for the ST7789 I have). Now the interrupt handler just fires the completion code: no continuation necessary, no memcpys, nothing like that.
There's plenty of code, so I'll link to the testbed I checked in so you folks can peruse it. Here's the relevant source file. You'll see it's largely the same as Paul's code except the DMA bits have been dramatically simplified.
https://github.com/codewitch-honey-...pi_driver_t4/src/source/lcd_spi_driver_t4.cpp
One major difference here is that when I use the above code I am not using malloc to allocate the transfer buffers. They are static arrays. But everything seems to work. I'm not getting stale image data or anything.
Questions:
1. Do I not need the anti-caching scheme because those transfer buffers are maybe being created in the fast 128KB region instead of being allocated on the general heap? Is that what's likely happening above? If not, then why does it work?
2. Is it safe? Is this something intermittent that might not be showing up for me, but likely will depending on the circumstances? If so, is there a way I can encourage the problem to show itself?
3. Am I right that there's in essence a 32KB limit on the transfer size? If I implement chaining could that potentially introduce caching issues?
4. I've heard there's a way to disable caching on major NXP chips for memory set aside for DMA transfers. I'm not sure about the 1062 specifically, but it seems to me that would be a lot more efficient than doing memcpys between multiple buffers, and a lot less sketchy too. Is there a reason for the memcpy scheme that I'm missing?
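To make question 3 concrete, this is roughly the bookkeeping I imagine chaining would need: split one large buffer into consecutive pieces no bigger than some per-transfer limit. It is only a host-side sketch. `Chunk` and `split_transfer` are hypothetical names, the limit is passed in rather than assumed, and nothing here touches the actual DMA registers or says anything about caching.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One piece of a larger transfer: where it starts and how many bytes it covers.
struct Chunk {
    std::size_t offset;
    std::size_t length;
};

// Split `total` bytes into consecutive chunks of at most `max_chunk` bytes.
// The last chunk carries whatever remainder is left over.
std::vector<Chunk> split_transfer(std::size_t total, std::size_t max_chunk) {
    assert(max_chunk > 0);
    std::vector<Chunk> chunks;
    for (std::size_t off = 0; off < total; off += max_chunk) {
        const std::size_t remaining = total - off;
        chunks.push_back({off, remaining < max_chunk ? remaining : max_chunk});
    }
    return chunks;
}
```

For example, a full 240x320 frame at 2 bytes per pixel is 153600 bytes, which with a 32768-byte limit comes out as four full chunks plus a 22528-byte tail. Each completion interrupt would then start the next chunk until the list is exhausted.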
I've included the ILI9341_t3 code under ./lib so you can see Paul's code for the chaining and anti-caching scheme I'm talking about.
I'd just like some details because scanning TRMs gives me a headache in my eye and I don't usually find what I'm looking for anyway. Some people have a gift for that sort of sifting. I'm comically bad at it.