The best I've found is this:
https://www.pjrc.com/store/teensy40.html
These symbol section decorators work much like PROGMEM does on old-school AVR Arduinos, so tutorials on PROGMEM may show you how to actually use them in code. You just have to replace PROGMEM with the Teensy-specific section you're interested in.
One bit of information that might help is this:
Every global/static variable, and every non-inline function, has "storage" (takes up bytes in memory,) and has a "name" (it's a "symbol".)
The job of a linker is to lay the storage for all symbols into appropriate memory, and bind all the names to the address of the symbol represented by the name.
In a plain-jane computer system (Linux, MacOS, Windows, etc,) there are usually three, maybe four, separate "sections" into which data/code can be laid out:
TEXT -- this is traditionally the name for "code that's executed"
DATA -- this is where your globals/statics with initialized values live -- anything that doesn't have the value 0/null when the program starts
BSS -- this is where globals/statics that have the value 0/null end up
(possibly a CONSTDATA section, where un-writable constants like string literals may live -- these can also live in TEXT, or maybe DATA, depending on specifics of the computer OS / runtime model.)
The loader (that actually loads and runs the program file that the linker generated) will copy the bytes from the TEXT section into the "where code starts" address in memory, and copy the bytes from the DATA section into the "where global variables start" address in memory, and then reserve enough space in memory for the BSS section.
BSS is an optimization, in that the loader doesn't need to copy any data into that section, it can just nuke it all to zero and call it good. Thus, BSS also doesn't take any storage in the program file itself.
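To make the section names concrete, here is a minimal sketch of which kind of symbol typically ends up where on a desktop toolchain. The variable and function names are made up for illustration, and the exact placement of constants depends on your compiler and OS, as noted above.

```cpp
#include <cstring>

// Initialized, writable global: its bytes are stored in the program file (DATA).
int initializedCounter = 42;

// Zero-valued global: only its size is recorded, and the loader zeroes it (BSS).
int zeroedBuffer[256];

// Constant data: usually a read-only section, or TEXT/DATA on some systems.
const char kGreeting[] = "hello";

// Non-inline function: its machine code lives in TEXT.
int sumZeroedBuffer() {
    int total = 0;
    for (int value : zeroedBuffer) {
        total += value;
    }
    return total;
}
```

Assuming a GNU toolchain, running the `size` tool on the resulting binary should show text, data and bss totals, and growing zeroedBuffer should grow bss without growing the file much.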
Dynamic link libraries, and memory mapped files, of course throw some wrenches into these works, which you can go on a Wikipedia journey to discover if you want to have a good time, but that's not important for this discussion.
Then, the program starts running.
The model for the Teensy 4 isn't that different, except it has a few more sections. There's no "file on disk," but there is "flash memory." And the "flash memory" is directly addressable by the CPU. The "copy bytes into memory" has to be done to initialize RAM that can be written, and it has to be done to initialize RAM that it's fast to run code from, but it doesn't have to be done for constant data and code that can live in flash memory.
So, looking at the Teensy 4 reference page above:
The DATA segment gets copied into RAM1 (DTCM.) -- these are regular initialized globals
The BSS segment goes into RAM1 (DTCM.) -- these are regular zero globals
FASTRUN code gets copied into RAM1 (ITCM.) -- this is the default for functions/code
Because of memory controller reasons (look up Harvard Architecture,) the code and data can't share a 32 kB block, so the space reserved for FASTRUN code gets rounded up to the next multiple of 32 kB.
FLASHMEM code, and PROGMEM variables, stay in flash. (PROGMEM is the same as for Arduino, except you can read it directly without the special functions that the Arduino needs.)
The flash memory also needs to store copies of FASTRUN and DATA, but those copies are not referenced once the program has actually started running.
(A common optimization is to compress the text and data sections, so they take up less flash memory -- I don't know if Teensy has made arrangements to do this or not, as it requires some special modifications to the tool chain.)
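The "copy from flash, then zero BSS" step can be sketched roughly like this. It is a host-side illustration, not the actual Teensy startup code: the ImageLayout struct and initializeRam() are hypothetical names, and on real hardware the addresses and sizes come from the linker script. FASTRUN code would be copied into ITCM the same way the DATA bytes are copied here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical description of one program image (illustration only).
struct ImageLayout {
    const std::uint8_t *dataLoadAddress;  // initial DATA values as stored in flash
    std::uint8_t *dataRunAddress;         // where DATA lives in RAM while running
    std::size_t dataSize;
    std::uint8_t *bssStart;               // zero-valued globals in RAM
    std::size_t bssSize;
};

// Copy initialized globals out of flash, then clear the BSS region.
void initializeRam(const ImageLayout &layout) {
    std::memcpy(layout.dataRunAddress, layout.dataLoadAddress, layout.dataSize);
    std::memset(layout.bssStart, 0, layout.bssSize);
}
```

Only after something like this has run do your globals hold the values you wrote in the source, which is why the flash copies are never touched again once the program is running.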
RAM1 also contains the runtime stack, which starts at the top and grows downwards. Because you will have interrupts and function local variables, and maybe even use recursion (horror!) you need to make sure that you DON'T FILL UP RAM1! If it says you're 99% full, you have significant risk that your stack will overwrite the globals that go into BSS.
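As a rough worked example of that budget, here is a small sketch of the arithmetic. It assumes RAM1 is 512 kB (the figure given on the Teensy 4.0 page) and that ITCM is reserved in 32 kB blocks. The function names are made up, and the input sizes are whatever your build output reports.

```cpp
#include <cstdint>

constexpr std::uint32_t kRam1Bytes = 512 * 1024;      // assumed RAM1 size (Teensy 4.0)
constexpr std::uint32_t kItcmBlockBytes = 32 * 1024;  // ITCM is reserved in 32 kB blocks

// Round the FASTRUN code size up to the next multiple of 32 kB.
constexpr std::uint32_t itcmReservedBytes(std::uint32_t fastrunCodeBytes) {
    return ((fastrunCodeBytes + kItcmBlockBytes - 1) / kItcmBlockBytes) * kItcmBlockBytes;
}

// Whatever RAM1 is not taken by code blocks and globals (DATA + BSS) is left for the stack.
constexpr std::uint32_t stackBytesAvailable(std::uint32_t fastrunCodeBytes,
                                            std::uint32_t globalsBytes) {
    const std::uint32_t reserved = itcmReservedBytes(fastrunCodeBytes) + globalsBytes;
    return reserved >= kRam1Bytes ? 0 : kRam1Bytes - reserved;
}
```

For instance, 40000 bytes of FASTRUN code reserves two blocks (65536 bytes), so with 20000 bytes of globals the stack gets 524288 - 65536 - 20000 = 438752 bytes.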
RAM2 contains variables marked DMAMEM. This is similar to BSS data in that nothing gets copied from flash, but any variable marked DMAMEM will NOT be initialized, so don't rely on its contents at startup. RAM2 also contains the heap that you get access to when you malloc()/new variables. Personally, I'm not a fan of using malloc/new in embedded applications, but you could do things like declare global variables that are pointers or references, initialized to the output of a call of malloc() or new. Running DMA from RAM2 probably has special implications related to the cache controller -- specifically, since RAM2 is accessed through the data cache while RAM1 is tightly coupled, I would expect DMA buffers in RAM2 to need explicit cache flushing/invalidation, and performance for tight accesses to that memory may be different. Some enterprising soul could benchmark this and report back!
Additionally, FLASHMEM code will not be writable, whereas FASTRUN code (unmarked functions) will be modifiable, perhaps by a stray pointer. Thus, you will have more robust code if you place it all into FLASHMEM. Note that self-modifying code will not "just work" in RAM1 -- you need to also arrange to flush or evict the instruction cache for that memory area, if you generate code at runtime. If you don't care about self-modifying code, no problem!
Finally, if you need DMAMEM for large mutable tables, but you want to initialize them, you can always declare the initialization data in PROGMEM and then memcpy() into the buffer in your setup() function.
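A minimal sketch of that pattern follows. The table names and values are made up for illustration. The #else branch only exists so the same file also compiles on a desktop, where the section macros mean nothing.

```cpp
#include <string.h>

#ifdef ARDUINO
#include <Arduino.h>
#else
// Host build: make the section macros no-ops so this compiles anywhere.
#define PROGMEM
#define DMAMEM
#endif

// Initial values stay in flash (hypothetical table).
PROGMEM const int kInitialTable[8] = { 10, 20, 30, 40, 50, 60, 70, 80 };

// Mutable working copy in RAM2; no initializer, because DMAMEM is not initialized.
DMAMEM int workingTable[8];

void setup() {
    // Fill the RAM2 buffer from the flash copy once, at startup.
    memcpy(workingTable, kInitialTable, sizeof(workingTable));
}

void loop() {
    workingTable[0] += 1;  // the RAM2 copy is writable; the flash copy is not
}
```

The cost is that the table takes space in both flash and RAM2, which is the same trade the regular DATA section makes for you automatically in RAM1.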
So, with all that, you should have the information you need to figure out where to place each piece of code. FLASHMEM code can run with good performance if it contains small loops, because it will be cached, but the initial access to that code (the first iteration of the loop) will run slower. Large, bulky, straight-line code will run slower from FLASHMEM, because it gets little benefit from the cache. If it's not performance critical code, that may not matter.
Here are some very simple examples:
Code:
// RAM1, modifiable, initialized
char myWritableString[] = "This is a writable string (RAM1)";
// RAM1, modifiable, initialized
int mySmallBuffer[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
// RAM1, modifiable, BSS
char someZeroBuffer[20];
// flash, not modifiable, initialized
PROGMEM const char myConstantString[] = "This is a constant string (flash)";
// RAM2, modifiable, not initialized
DMAMEM int myBigBuffer[10240]; // don't rely on the contents at startup; clear it in setup() if you need zeros
// RAM1, "modifiable," initialized
void fastrunFunction(int from) {
  for (int i = from; i != 100; ++i) {
    mySmallBuffer[i & 7] += 1;
    if (i & 1) {
      fastrunFunction(i+1);
    }
  }
}
// flash, not modifiable, initialized
FLASHMEM void flashmemFunction(int from) {
  for (int i = from; i != 100; ++i) {
    myBigBuffer[i] += 1;
    if (i & 1) {
      flashmemFunction(i+1);
    }
  }
}
void setup() {
  pinMode(13, OUTPUT);
}
void loop() {
  digitalWrite(13, HIGH);
  digitalWrite(13, LOW);
  fastrunFunction(90);
  flashmemFunction(90);
}
@PaulStoffregen I'm not possessive about this explanation; if you want to use it in documentation or copy/modify sections of it, please go ahead!