KurtE
While trying to debug some issues where a program would cause USB to stop working in some cases but not in others, I decided to try to understand more about the memory organization of the new Teensy 4.0.
I put a lot of this up on the Teensy 4 First beta thread, maybe starting with the post: https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test?p=213113&viewfull=1#post213113
Also during this time frame, it became clear that we probably need to enhance some of the data reporting at the end of builds. For example, I built a sample application containing the binary data for three full-size bitmaps to display on an ILI9341 display, and by default the build tries to put all of that data in the DTCM segment. The linker said I had used under 50% of memory, but in actuality all of it was trying to fit into the DTCM segment and failed to load.
It was suggested that others might find some of this useful, especially if they did not have to dig through something like 170 pages of a forum thread. Hopefully over time we can clean up these descriptions and maybe transfer some of this to a more appropriate spot like a Wiki or... Until then, those with moderator privileges, feel free to clean up some of this... Including probably removing some of this intro:
During the T4 beta, I kept seeing and reading about different terms like FlexRam, ITCM, DTCM, OCRAM, ... Sometimes I hate 3 or 4 letter acronyms! What do all of these mean, and what impact does each of them have on me?
For example the product page says:
1024K RAM (512K is tightly coupled)
2048K Flash (64K reserved for recovery & EEPROM emulation)
So again what does this mean? I know I can search the web and the like and I have...
First: 1024KB of RAM with 512KB tightly coupled: The memory of the T4 is divided into two main pieces, both of which are 512KB in size.
FlexRam:
The first part, which the documents call the FlexRam, has 16 banks of 32KB of memory each. Each of these banks can be configured as one of the types ITCM, DTCM, OCRAM, or not used. In our setup we only use the types ITCM (Instruction Tightly Coupled Memory) and DTCM (Data Tightly Coupled Memory). I will describe each of these in more detail below. Basically, the build process takes some or all of your code and allocates enough 32KB banks to hold it, and the startup code copies that code into those banks. The remaining banks of the FlexRam are marked as DTCM. So for example, if you have more than 32KB and less than 64KB of code, it will take 2 banks for ITCM and leave you with 14 banks for data.
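As a rough sketch of the bank arithmetic described above (the constant and function names here are mine for illustration, not taken from the Teensy core or the size tool, and it assumes the code size never exceeds the full 512KB):

```cpp
#include <cstdint>

// FlexRam geometry as described above: 16 banks of 32KB each.
constexpr uint32_t kBankBytes = 32 * 1024;
constexpr uint32_t kBankCount = 16;

// Number of 32KB banks needed to hold codeBytes of ITCM code, rounded up.
// Hypothetical helper; assumes codeBytes <= 512KB.
constexpr uint32_t itcmBanksFor(uint32_t codeBytes) {
    return (codeBytes + kBankBytes - 1) / kBankBytes;
}

// Banks left over for DTCM once the ITCM banks have been taken.
constexpr uint32_t dtcmBanksFor(uint32_t codeBytes) {
    return kBankCount - itcmBanksFor(codeBytes);
}

// The example from the text: between 32KB and 64KB of code.
static_assert(itcmBanksFor(40 * 1024) == 2, "40KB of code needs 2 ITCM banks");
static_assert(dtcmBanksFor(40 * 1024) == 14, "which leaves 14 banks for DTCM");
```

Because of the rounding, going even one byte over a 32KB boundary costs a whole extra bank of DTCM.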
OCRAM: On Chip RAM
This is the second 512KB of memory.
1984KB Flash memory - This is where your program is stored (maybe) - As I mentioned above about ITCM, a lot of your code may be moved into ITCM.
Now to describe some of these sections in more detail. During the early part of the T4 beta, I believe it was @FrankB who first developed a tool that you could hook into the build process (via an updated platform.txt) that gives us more information than the standard build does. I know others have also done parts of this, like @mjs513, @defragster, ... Over the last couple of days I decided to try to update it from the T4B1 (IMXRT 1052) to the current T4 (IMXRT 1062). It is still a work in progress: I want to clean up some of the output and add some error checking. For example, if you run out of room in DTCM, the tool should report this and return an error status.
I am not sure yet whether I should post the code again, put it on GitHub, or whether someone already has a GitHub project... But here is example output for a really simple sketch which can blink any pin...
Code:
cmd /c "D:\\arduino-1.8.9\\hardware\\teensy\\..\\tools\\arm\\bin\\arm-none-eabi-gcc-nm -n C:\\Users\\kurte\\AppData\\Local\\Temp\\arduino_build_683341\\Blink_any_pin.ino.elf | C:\\Users\\kurte\\source\\repos\\imxrt-size\\Debug\\imxrt-size.exe"
flexRam Config : aaaaaaab
ITCM : 22816 B (69.63% of 32 KB)
DTCM : 12992 B ( 2.64% of 480 KB)
Stack Size : 478528
OCRAM: 0 B ( 0.00% of 512 KB)
Flash: 32672 B ( 1.61% of 1984 KB)
"D:\\arduino-1.8.9\\hardware\\teensy/../tools/arm/bin/arm-none-eabi-size" -A "C:\\Users\\kurte\\AppData\\Local\\Temp\\arduino_build_683341/Blink_any_pin.ino.elf"
Sketch uses 32672 bytes (1%) of program storage space. Maximum is 2031616 bytes.
Global variables use 35808 bytes (3%) of dynamic memory, leaving 1012768 bytes for local variables. Maximum is 1048576 bytes.
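The figures in this report appear to hang together arithmetically: the stack size matches the DTCM capacity minus the DTCM used, and the "Global variables" number from the standard Arduino summary matches ITCM plus DTCM. This is my reading of the numbers above, not something confirmed from the tool's source. A quick compile-time check, with names made up for illustration:

```cpp
#include <cstdint>

// Values copied from the report above.
constexpr uint32_t kItcmUsed     = 22816;       // ITCM bytes
constexpr uint32_t kDtcmUsed     = 12992;       // DTCM bytes
constexpr uint32_t kDtcmCapacity = 480 * 1024;  // 15 banks of 32KB

// Assumption: whatever DTCM is not used by variables is reported as stack.
constexpr uint32_t stackBytes(uint32_t capacity, uint32_t used) {
    return capacity - used;
}

static_assert(stackBytes(kDtcmCapacity, kDtcmUsed) == 478528,
              "matches the reported Stack Size");
static_assert(kItcmUsed + kDtcmUsed == 35808,
              "matches the reported 'Global variables use' figure");
```

If that reading is right, the standard summary lumps code in ITCM together with data, which is part of why it can look misleading.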
Now suppose I add one simple line like: Serial.println("Defaults to pin 13");
These numbers change:
Code:
flexRam Config : aaaaaaab
ITCM : 22880 B (69.82% of 32 KB)
DTCM : 12992 B ( 2.64% of 480 KB)
Stack Size : 478528
OCRAM: 0 B ( 0.00% of 512 KB)
Flash: 32752 B ( 1.61% of 1984 KB)
So again what does this all imply? What are the differences between these sections and how do I control where things are placed?
Note: A lot of this is very different than it was on T3.x
As mentioned earlier ITCM+DTCM=512KB, which is controlled by the flexRam config register...
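For what it's worth, the "flexRam Config" value in the tool output can be read as two bits per bank. The sketch below assumes the encoding 00 = not used, 01 = OCRAM, 10 = DTCM, 11 = ITCM, with bank 0 in the lowest two bits. That is my understanding of the i.MX RT1060 reference manual, so please double check it there. The struct and function names are made up for illustration:

```cpp
#include <cstdint>

// How many of the 16 FlexRam banks are configured as each type.
struct BankCounts {
    int unused;
    int ocram;
    int dtcm;
    int itcm;
};

// Decode a 32-bit bank configuration word, two bits per bank.
// Assumed encoding: 0 = not used, 1 = OCRAM, 2 = DTCM, 3 = ITCM.
BankCounts decodeFlexRamConfig(uint32_t cfg) {
    BankCounts counts{0, 0, 0, 0};
    for (int bank = 0; bank < 16; ++bank) {
        switch ((cfg >> (2 * bank)) & 0x3u) {
            case 0: ++counts.unused; break;
            case 1: ++counts.ocram;  break;
            case 2: ++counts.dtcm;   break;
            case 3: ++counts.itcm;   break;
        }
    }
    return counts;
}
```

With that encoding, the value aaaaaaab decodes to 1 ITCM bank and 15 DTCM banks, which lines up with the "32 KB" and "480 KB" capacities in the report.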
ARM Memory Ranges:
If you look at the address of a variable or function pointer, you will see that the address gives you a hint about what type of memory it is in. That is:
Code:
Chapter 2: Shows the Arm Platform Memory map: Things like:
0-0x7FFFF - ITCM (512KB)
0x20000000 - 0x2007FFFF - DTCM (512KB)
0x20200000 - 0x2027FFFF - OCRAM2 (512KB)
0x60000000 - 0x6FFFFFFF - FLEXSPI ...
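A minimal sketch of turning those ranges into a lookup, in case it helps when printing addresses while debugging. The enum and function names are hypothetical, and the ranges are simply the ones listed above (the full address windows, not how much of each is actually configured):

```cpp
#include <cstdint>

enum class MemRegion { ITCM, DTCM, OCRAM2, FlexSPI, Other };

// Map a 32-bit address onto the ranges from the memory map above.
MemRegion classifyAddress(uint32_t addr) {
    if (addr <= 0x0007FFFFu) return MemRegion::ITCM;
    if (addr >= 0x20000000u && addr <= 0x2007FFFFu) return MemRegion::DTCM;
    if (addr >= 0x20200000u && addr <= 0x2027FFFFu) return MemRegion::OCRAM2;
    if (addr >= 0x60000000u && addr <= 0x6FFFFFFFu) return MemRegion::FlexSPI;
    return MemRegion::Other;
}

// Printable name for a region, for use with Serial.println() or printf().
const char *regionName(MemRegion region) {
    switch (region) {
        case MemRegion::ITCM:    return "ITCM";
        case MemRegion::DTCM:    return "DTCM";
        case MemRegion::OCRAM2:  return "OCRAM2";
        case MemRegion::FlexSPI: return "FLEXSPI";
        default:                 return "other";
    }
}
```

On the Teensy you could call something like regionName(classifyAddress((uint32_t)&myVariable)) to see where the linker put a given variable or function.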
### Code ###: What goes into ITCM versus what goes into Flash?
I believe the simple answer is that, by default, all code will try to be placed into the ITCM section. As you can see, just adding the Serial.println increased this section's size.
I believe the way to leave the code in the flash memory is by using the keyword: FLASHMEM
Yes, the T4 is different from the T3.x and TLC in that FLASHMEM means something again. So for example if I change my sketch setup function to be defined like:
Code:
void FLASHMEM setup() {
  // Blink any pin. Note: I put pin 13 as input to see if we can
  // jumper to it to see if we can find the pin...
  while (!Serial && millis() < 5000);  // wait up to 5 seconds for USB Serial
  DBGSerial.begin(115200);             // DBGSerial is defined elsewhere in the sketch
  delay(250);
  DBGSerial.println("Find Pin by blinking");
  DBGSerial.println("Enter pin number to blink");
  DBGSerial.println("Defaults to pin 13");
  pinMode(13, OUTPUT);
}
The size of ITCM shrank:
Code:
flexRam Config : aaaaaaab
ITCM : 22752 B (69.43% of 32 KB)
DTCM : 12992 B ( 2.64% of 480 KB)
Stack Size : 478528
OCRAM: 0 B ( 0.00% of 512 KB)
Flash: 32768 B ( 1.61% of 1984 KB)
Not by much, but then again there is not much code there.
### DATA ### What goes into DTCM, versus OCRAM, versus stays in Flash?
DTCM - By default I believe just about everything goes here. This includes all of your global variables, both initialized and uninitialized.
Unlike the T3.x, variables such as arrays that are defined as const, will not stay in Flash, but instead will be copied at startup time into DTCM. So some programs that for example work on T3.6 may run into issues of running out of RAM.
FLASH - as I mentioned under DTCM, const data is by default not left in Flash, but instead moved into DTCM. You can tell the system to leave some specific const structures in flash, by using the PROGMEM keyword. Like:
Code:
const unsigned short teensy40_front[76800] PROGMEM={...};
Also, I am not sure, but the example earlier might imply that strings you pass to things like Serial.println may also be left in Flash?
OCRAM - Again the other 512KB of memory...
So far I have found only two ways to use this memory. You can define a variable with the attribute: DMAMEM
Or you use malloc/new to allocate the memory.
So far I have not found any way to have a program put any initialized structures up in this region of memory.
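One possible workaround, offered only as a sketch: keep the initial values in Flash with PROGMEM and copy them into the DMAMEM variable yourself at startup. The names kPaletteInit, palette and initPalette are hypothetical, and this assumes PROGMEM data can be read directly with memcpy on the T4. The fallback #defines are only there so the snippet also compiles off the Teensy:

```cpp
#include <cstdint>
#include <cstring>

#if defined(ARDUINO)
#include <Arduino.h>  // provides PROGMEM and DMAMEM on Teensy
#else
// Off-target fallbacks (assumption) so this compiles on a PC too.
#define PROGMEM
#define DMAMEM
#endif

// Initial values stay in Flash.
static const uint16_t kPaletteInit[4] PROGMEM = {0x0000, 0xF800, 0x07E0, 0x001F};

// Working copy lives in OCRAM; it cannot carry an initializer of its own.
DMAMEM uint16_t palette[4];

// Call once, for example from setup(), before anything uses palette.
void initPalette() {
    memcpy(palette, kPaletteInit, sizeof(palette));
}
```

This costs the Flash space for the initial values plus the copy time at startup, but it keeps the data out of DTCM.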
The memory in the OCRAM section is defined as being cached WBWA (write-back, write-allocate), which can really screw up DMA. That is, DMA operations talk to the underlying memory, whereas normal instructions go through the cache, and the two may or may not match.
Bad enough using it for DMA buffers, not sure how to get it to work with things like DMASettings. At least I did not get them to work at all, especially when it involves replaceOnCompletion semantics.
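For plain DMA buffers, the usual approach is explicit cache maintenance around each transfer. Below is a minimal sketch using the arm_dcache_flush() and arm_dcache_delete() helpers from the Teensy 4 core. The buffer names, sizes and the two wrapper functions are made up for illustration, and the stand-ins in the #else branch exist only so the snippet compiles off-target. It does not address the DMASettings problem mentioned above:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__IMXRT1062__)
#include <Arduino.h>  // DMAMEM, arm_dcache_flush(), arm_dcache_delete()
#else
// Host-side stand-ins (hypothetical): they only count calls.
#define DMAMEM
static int flushCalls = 0;
static int deleteCalls = 0;
static void arm_dcache_flush(void *, uint32_t) { ++flushCalls; }
static void arm_dcache_delete(void *, uint32_t) { ++deleteCalls; }
#endif

// 32-byte alignment and a size that is a multiple of 32 keep each buffer
// on its own cache lines, so maintenance cannot touch neighboring data.
DMAMEM static uint8_t txBuf[64] __attribute__((aligned(32)));
DMAMEM static uint8_t rxBuf[64] __attribute__((aligned(32)));

// CPU -> DMA: fill the buffer, then push it out of the cache to real memory.
void prepareTx(const uint8_t *src, size_t len) {
    if (len > sizeof(txBuf)) len = sizeof(txBuf);
    memcpy(txBuf, src, len);
    arm_dcache_flush(txBuf, sizeof(txBuf));
    // ...start the DMA transfer from txBuf here...
}

// DMA -> CPU: once the transfer has completed, discard stale cache lines
// so the reads below come from what the DMA actually wrote.
void finishRx(uint8_t *dst, size_t len) {
    if (len > sizeof(rxBuf)) len = sizeof(rxBuf);
    arm_dcache_delete(rxBuf, sizeof(rxBuf));
    memcpy(dst, rxBuf, len);
}
```

The rule of thumb is: flush before the DMA reads memory the CPU wrote, and delete (invalidate) before the CPU reads memory the DMA wrote.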
Will add more soon, but tired of typing!
Again those with moderator access, feel free to correct, add, ...
Also let me know if there are any additional things I should add/remove/modify.
Thanks
Kurt
Update: Paul has added more details about the T4 memory layout on the T4 Product page:
I've added a "Memory Layout" section to the Teensy 4.0 product page.
Update: PROGMEM for code is now FLASHMEM, as having PROGMEM for both code and data causes link issues.