Moderator Edit: linker script is fine, see this message for info.
I think (actually, I am sure) the Teensy 4.1 linker script is NOT correct:
When I extend my project and shuffle code and data around (e.g. FASTRUN, DMAMEM, regular DTCM RAM...),
the code starts crashing, even though it was working before. It was just placed in a different memory, with no change to any line of code.
And it can crash immediately on startup of my code - very tough to recover from this situation!
Reason:
The Linker Script does not seem to be correct.
It allows code and data to be placed outside the available memories.
Details:
The linker script "imxrt1062_t41.ld" has this definition:
Code:
MEMORY
{
    ITCM (rwx):  ORIGIN = 0x00000000, LENGTH = 512K
    DTCM (rwx):  ORIGIN = 0x20000000, LENGTH = 512K
    RAM (rwx):   ORIGIN = 0x20200000, LENGTH = 512K
    FLASH (rwx): ORIGIN = 0x60000000, LENGTH = 7936K
    ERAM (rwx):  ORIGIN = 0x70000000, LENGTH = 16384K
}
So, it means: you actually have 512K ITCM and 2*512K for data (DTCM + RAM) = 1.5 MB of internal RAM.
As I understand the NXP RT1062 datasheet:
it has a total of 1 MB (2x 512K) of internal RAM - not 1.5 MB!
RAM1 can be configured as "split" memory for ITCM and DTCM:
so the 512 KB can be split into ITCM and DTCM, both together 512K in total - not 1 MB!
This results in this issue:
I can generate up to 512K code (for ITCM) plus 512K data (for DTCM).
The Linker will not complain, because it was "told" to have 512K each.
BUT NOT TRUE!
The MCU has just 512K in total for ITCM PLUS DTCM, not 1M. So code or data or both end up located outside a valid memory region of the MCU.
This must crash (with a Bus Error or Hard Fault) - and it does.
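To make the arithmetic concrete, here is a minimal sketch of the check I would expect somewhere in the build. It assumes RAM1 (FlexRAM) is 512K in total and is handed out in 32K banks; the function names and the example sizes are hypothetical, not taken from the Teensy core.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: FlexRAM (RAM1) is 512K in total, split in 32K banks. */
#define FLEXRAM_TOTAL_BYTES (512u * 1024u)
#define FLEXRAM_BANK_BYTES  (32u * 1024u)

static_assert(FLEXRAM_TOTAL_BYTES % FLEXRAM_BANK_BYTES == 0,
              "total size must be a whole number of banks");

/* Number of 32K banks needed to hold `bytes`, rounded up. */
static uint32_t banks_needed(uint32_t bytes)
{
    return (bytes + FLEXRAM_BANK_BYTES - 1u) / FLEXRAM_BANK_BYTES;
}

/* Returns 1 if ITCM and DTCM contents fit into the shared FlexRAM
 * together, 0 if they overflow it. */
static int fits_in_flexram(uint32_t itcm_bytes, uint32_t dtcm_bytes)
{
    uint32_t total_banks = FLEXRAM_TOTAL_BYTES / FLEXRAM_BANK_BYTES; /* 16 */
    return (banks_needed(itcm_bytes) + banks_needed(dtcm_bytes)) <= total_banks;
}
```

With 200K of code and 250K of data, fits_in_flexram() returns 1 (7 + 8 = 15 banks). With 300K each it returns 0 (10 + 10 = 20 banks), although both sizes are below the 512K the linker script allows per region.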
Why 2*512K for ITCM and DTCM?
When I look at the code for the MPU configuration, which is not really correct (but is not the root cause), it also configures
512K for ITCM, 512K for DTCM, 512K for DMAMEM (RAM2).
Maybe the linker script config was set to match the MPU config, even though the MPU config is not really correct
(it should have 256K ITCM and 256K DTCM regions instead).
MPU Config and DMAMEM
I thought "DMAMEM" means: this memory is intended for DMA operations, buffers used by DMAs etc. It should be "coherent":
no need for cache maintenance, a DMA can use and update this memory without "coherency issues".
But the code I saw for the MPU config tells me:
this region (RAM2) is configured as WBWA (write-back, write-allocate). I had assumed it would be "not cached" or WT (write-through).
This WBWA tells me:
you have to use cache maintenance operations, like Clean and Invalidate, before and after a DMA.
I have no clue whether my SPI in DMA mode does it. It seems to work, so I guess there is cache maintenance.
If you implement your own DMA using DMAMEM, make sure to use Clean and Invalidate around every DMA transfer.
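One detail that matters for Clean and Invalidate: they work on whole cache lines (32 bytes on the Cortex-M7), so a DMA buffer that shares a line with other variables can corrupt them when the line is invalidated. The sketch below only computes the line-aligned range that a maintenance call would touch; the names are hypothetical. On Teensy 4.x I believe the core offers arm_dcache_flush() and arm_dcache_delete() for the actual operations, but they are not called here, so the sketch also builds on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_BYTES 32u /* Cortex-M7 data cache line size */

static_assert((DCACHE_LINE_BYTES & (DCACHE_LINE_BYTES - 1u)) == 0,
              "line size must be a power of two");

typedef struct {
    uintptr_t start; /* buffer start, rounded down to a line boundary */
    size_t    size;  /* covered bytes, a multiple of the line size */
} cache_range_t;

/* Range of whole cache lines covering [addr, addr + len). */
static cache_range_t cache_line_range(uintptr_t addr, size_t len)
{
    uintptr_t mask  = (uintptr_t)DCACHE_LINE_BYTES - 1u;
    uintptr_t start = addr & ~mask;
    uintptr_t end   = (addr + len + mask) & ~mask;
    cache_range_t r = { start, (size_t)(end - start) };
    return r;
}

/* Returns 1 if the buffer starts and ends on line boundaries, so that
 * invalidating it cannot discard unrelated data in the same line. */
static int is_line_aligned(uintptr_t addr, size_t len)
{
    uintptr_t mask = (uintptr_t)DCACHE_LINE_BYTES - 1u;
    return ((addr & mask) == 0) && ((len & mask) == 0);
}
```

For a 40-byte buffer at 0x20200010 the covered range is 64 bytes starting at 0x20200000, so the buffer shares cache lines with its neighbours. Aligning DMA buffers to 32 bytes and padding their length to a multiple of 32 avoids that.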
Why is const data on DTCM?
It is a bit annoying that all read-only, constant data, e.g. const data structures, const strings ..., are also placed in DTCM RAM
(the memory for high speed data access).
It blows up my DTCM (default) memory for all data, because all const data goes there too.
I thought const goes into a read-only memory.
OK: I found that this "feature" is documented (but it is still unclear why it is this way, and a bit annoying).
I do not see a reason for it. In the end it reduces the memory available for my read-write data.
And the ITCM memory region might still have enough free space to keep my const data there.
BTW: you can also use ITCM (intended for code) as data memory, via FASTRUN. It works!
Data can be located in ITCM, even writable data.
BTW2: It does NOT work to move *.rodata to ITCM (I assume it cannot be, or is not, initialized during startup).
It works if you move *.rodata to FLASHMEM (or PROGMEM, which is the same).
Modification of linker script "imxrt1062_t41.ld":
Code:
.text.progmem : {
    *(.progmem*)
    *(SORT_BY_ALIGNMENT(SORT_BY_NAME(.rodata*))) /* this works */
    . = ALIGN(4);
} > FLASH
.text.itcm : {
    . = . + 32; /* MPU to trap NULL pointer deref */
    *(.fastrun)
    *(.text*)
    /* *(SORT_BY_ALIGNMENT(SORT_BY_NAME(.rodata*))) - does not work! */
    . = ALIGN(16);
} > ITCM AT> FLASH
.data : {
    *(.endpoint_queue)
    /* *(SORT_BY_ALIGNMENT(SORT_BY_NAME(.rodata*))) - don't have const on data memory */
    *(SORT_BY_ALIGNMENT(SORT_BY_NAME(.data*)))
    KEEP(*(.vectorsram))
} > DTCM AT> FLASH
Conclusion
It is very strange to see this discrepancy between the linker script and the physical features of the MCU (1 MB total internal RAM available, not 1.5 MB).
If this has not been noticed and fixed yet, it tells me:
nobody has ever created a large project (with more than 512K code and 512K data), or nobody has tested such a large project.
Due to this "incorrect" linker script you do not get any warning that your code or data size is too large.
Instead, you only notice when the code crashes at runtime, in the worst case randomly, depending on which code or data "outside of available memories"
is accessed.
Maybe people have been caught by this issue, seeing their project crash when they extend it or reorganize the memory use/locations.
I think the linker script should be fixed to this definition (note the 256K for ITCM and DTCM):
Code:
MEMORY
{
    ITCM (rwx):  ORIGIN = 0x00000000, LENGTH = 256K
    DTCM (rwx):  ORIGIN = 0x20000000, LENGTH = 256K
    RAM (rwx):   ORIGIN = 0x20200000, LENGTH = 512K
    FLASH (rwx): ORIGIN = 0x60000000, LENGTH = 7936K
    ERAM (rwx):  ORIGIN = 0x70000000, LENGTH = 16384K
}
The MPU config could be fixed as well, but it should be fine to leave it as it is
(as long as no code is generated that could access memory outside the available regions - as is possible right now!)