T4.0 Memory - trying to make sense of the different regions

Actually, you may want to break this out as its own thread, as there are probably only a few people who have played with this type of loader script and the like

Hi KurtE, thanks for the advice. I will raise the question in Tech Support and Assistance.
Thanks, Andy
 
Hello - I realize I'm coming in a bit late here, but I'm confused about the issue of whether the Teensy 4.0 OCRAM/RAM2 is uncached and therefore DMA safe. @KurtE, my understanding is that you are maintaining that RAM2 is cached WBWA (write-back, write-allocate); however according to the currently available Teensy 4.0 documentation, RAM2 is uncached and actually optimized for DMA access?

 
the issue of whether the Teensy 4.0 OCRAM/RAM2 is uncached and therefore DMA safe

RAM2 is cached. For DMA, you need to use the cache flush / delete functions.
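
For reference, here is a minimal sketch of that pattern on the transmit side, assuming the standard Teensy 4 core headers (DMAMEM and arm_dcache_flush() come from the core; the buffer name and size are just illustrative):

Code:
#include <Arduino.h>

// DMAMEM places the buffer in RAM2 (OCRAM), which is cached.
// Align to 32 bytes (the cache line size) so cache maintenance
// on this buffer never touches neighboring variables.
DMAMEM uint8_t txBuffer[1024] __attribute__((aligned(32)));

void prepareDmaTransmit() {
  // The CPU fills the buffer; these writes may still sit in the data cache.
  for (size_t i = 0; i < sizeof(txBuffer); i++) {
    txBuffer[i] = (uint8_t)i;
  }
  // Flush the cached data out to RAM2 so the DMA controller,
  // which does not see the CPU cache, reads the up-to-date contents.
  arm_dcache_flush(txBuffer, sizeof(txBuffer));
  // ...safe to start a DMA transfer that reads txBuffer from here on...
}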


however according to the currently available Teensy 4.0 documentation, RAM2 is uncached and actually optimized for DMA access?

That's incorrect. Can you point to the specific place you saw this wrong info? It really should get updated.
 
His link points to this part of the PJRC T4.0 webpage:

RAM: 1024K of memory is available for variables and data. Half of this memory (RAM1) is accessed as tightly coupled memory for maximum performance. The other half (RAM2) is optimized for access by DMA. Normally large arrays & data buffers are placed in RAM2, to save the ultra-fast RAM1 for normal variables.

Further down are these items. I suppose it could be more explicit that RAM1 is TCM and is not cached, and RAM2 is not TCM and is cached.

Tightly Coupled Memory: Tightly Coupled Memory is a special feature which allows Cortex-M7 fast single cycle access to memory using a pair of 64 bit wide buses. The ITCM bus provides a 64 bit path to fetch instructions. The DTCM bus is actually a pair of 32 bit paths, allowing M7 to perform up to 2 separate memory accesses in the same cycle. These extremely high speed buses are separate from M7's main AXI bus, which accesses other memory and peripherals.

Cache: Two 32K caches, one for instructions and one for data, are used to speed up repetitive access to non-TCM memory.
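
To make the RAM1/RAM2 split concrete, here is a small sketch assuming the standard Teensy 4 core: a plain global lands in RAM1 (DTCM, not cached), while DMAMEM moves it into RAM2 (OCRAM, cached). The variable names are just illustrative.

Code:
#include <Arduino.h>

// Ordinary globals go to RAM1 (DTCM): single-cycle access, not cached.
uint32_t fastCounters[256];

// DMAMEM globals go to RAM2 (OCRAM): cached, and the usual home for large buffers.
DMAMEM uint8_t sampleBuffer[16384];

void setup() {
  Serial.begin(115200);
  // RAM1 (DTCM) starts at 0x20000000 and RAM2 (OCRAM) at 0x20200000,
  // so the printed addresses show which region each variable landed in.
  Serial.printf("fastCounters at %08lX\n", (unsigned long)(uintptr_t)fastCounters);
  Serial.printf("sampleBuffer at %08lX\n", (unsigned long)(uintptr_t)sampleBuffer);
}

void loop() {
}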
 
Ah, ok, now I see the problem. I need to expand the description because "optimized for access by DMA" leaves a lot of opportunity for misunderstanding.

RAM2 is good for DMA because most of the CPU's high bandwidth usage is to RAM1. RAM2 is also good for DMA because it *is* cached. Use of CPU cache further reduces the amount of access to the RAM2 memory, which leaves the bus (usually) more open for DMA access. When your DMA uses the bus, because it's completely separate bus bandwidth from RAM1, your DMA (typically) has less impact on the speed of whatever your program is doing at that moment. The words "optimized for access" are meant to say the DMA controller (typically) has much more access to RAM2 without bus arbitration.

DMA access to RAM1 is possible. But the DMA controller is on the same bus as RAM2. So DMA access to RAM1 consumes bandwidth of both the AXI and TCM buses!

Here the word "optimized" is not meant to say you write fewer lines of code. Using cached RAM2 requires more code because you have to call the cache flush / delete functions. The word "optimized" means when you do this extra work, your DMA transfer (typically) gets more arbitration-free bus access to the memory with the least impact on your program's (ordinary) use of memory.
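
As a sketch of that extra work on the receive side (again assuming the standard Teensy 4 core functions; names and sizes are illustrative), once a DMA channel has written into a RAM2 buffer, the stale cached copy has to be discarded before the CPU reads the data:

Code:
#include <Arduino.h>

// Receive buffer in RAM2 (OCRAM), filled by a DMA channel.
// 32-byte size and alignment match the cache line size, so the
// invalidate below cannot discard writes to neighboring variables.
DMAMEM uint8_t rxBuffer[1024] __attribute__((aligned(32)));

void handleDmaReceiveComplete() {
  // Discard (invalidate) any cached copy of rxBuffer so the CPU
  // re-reads the fresh data the DMA controller wrote into RAM2.
  arm_dcache_delete(rxBuffer, sizeof(rxBuffer));

  // Now the CPU can safely process the received data.
  uint32_t sum = 0;
  for (size_t i = 0; i < sizeof(rxBuffer); i++) {
    sum += rxBuffer[i];
  }
  Serial.printf("checksum: %lu\n", (unsigned long)sum);
}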
 
Great notes, Paul. The bus design explains why RAM2 is DMAMEM, even though the cache isn't bypassed / 'gets in the way' and needs to be accounted for.

Good note, as it is a recurring element of many DMA posts.
Second/third this - and appreciate everyone jumping in here with these details.
 