Using PSRAM on Teensy 4.1

mig-29

I am using a Teensy 4.1 with its RAM extended by 16 MB of PSRAM.

I was able to place global variables in the added RAM using EXTMEM.

Can you tell me how to place functions in the external RAM as well?
Would this be possible using macros?

Best regards,
MIG-29
 
There are 2 ways you might do this. At least 2 that I can imagine. Maybe other people will have more ideas?

1: You could create (somehow....) generic ARM Thumb executable "position independent" code, copy the binary data to an array allocated in EXTRAM, and then call it using a function pointer initialized to the address of that array (plus 1 for the Thumb mode bit). This might be useful for tiny pieces of code, but probably not so practical for general programming. It also requires figuring out horrible C function pointer syntax. For an example, check out do_flash_cmd in eeprom.c for Teensy 3.x. Scroll down that file to find the do_flash_cmd source code in comments, which makes some sense of the do_flash_cmd array initialization.
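A host-side sketch of the function-pointer part of option 1, assuming a hypothetical routine with an `int(int)` signature. It only builds the pointer (buffer address with the Thumb bit set) and never calls it. On the Teensy, actually jumping into copied code would also need the MPU change below and cache maintenance after the copy, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical signature of the position-independent routine. */
typedef int (*ram_fn_t)(int);

/* Cortex-M cores execute only Thumb code, so an address reached through a
 * function pointer must have bit 0 set. Given the start of the buffer that
 * holds the copied machine code, return the value for the pointer. */
static uintptr_t thumb_entry(uintptr_t code_buffer_addr)
{
    assert((code_buffer_addr & 1u) == 0);  /* Thumb code is halfword aligned */
    return code_buffer_addr | 1u;
}

/* What the call site would build on the Teensy. The pointer is only
 * created here, never called, so this is safe to run on a desktop. */
static ram_fn_t make_ram_fn(uintptr_t code_buffer_addr)
{
    return (ram_fn_t)thumb_entry(code_buffer_addr);
}
```

On the target the call would then read `make_ram_fn((uintptr_t)buffer)(42)`, where `buffer` is a hypothetical EXTMEM array holding the copied code.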

2: You could edit imxrt1062_t41.ld to add a section for code to be compiled to run from the external RAM. It would probably look similar to the ".text.itcm" section, which creates code that runs from the ITCM RAM: "> ITCM AT> FLASH" would become "> ERAM AT> FLASH". The "AT>" part for flash is important, because it tells the linker to compile the code for external RAM but actually put the data into flash memory. Then you would edit startup.c. Look for "// Initialize memory". That's where you would want to copy the binary data the linker put into flash to the external RAM chips. But you can't do it there. You'll need to do it later, after the call to configure_external_ram(), because the external RAM doesn't just automatically work like the on-chip RAM; its hardware has to be set up first. Maybe also #define a name in one of the header files, so you don't need to write a lengthy __attribute__((section(".mysectionname"))) in front of every function you want the linker to compile for external RAM. With this approach the compiler knows where in memory the code will run, so calling the functions from other code should be seamless, but you do have to figure out the tough linker script stuff.
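A minimal sketch of the #define idea, assuming the linker script gains an output section that collects `.text.extram` input sections (the name used later in this thread). `EXTRAM_CODE` is a hypothetical macro name, modeled on how the core's FLASHMEM and FASTRUN macros attach a section attribute. With GCC on an ELF host such as Linux it also compiles and runs, because the default linker script folds `.text.*` sections into `.text`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical convenience macro: put a function into the ".text.extram"
 * input section, which a modified linker script would link to run from
 * ERAM while storing it in flash (the "> ERAM AT> FLASH" idea). */
#define EXTRAM_CODE __attribute__((section(".text.extram")))

/* The macro goes in front of the definition, like FLASHMEM or FASTRUN. */
EXTRAM_CODE int is_odd(int a)
{
    return a % 2 != 0;
}

/* Ordinary code calls it like any other function; on the target the
 * linker resolves its ERAM address, so no function-pointer tricks. */
int count_odd(const int *values, size_t n)
{
    assert(values != NULL || n == 0);
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        count += is_odd(values[i]);
    }
    return count;
}
```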

Either way, you'll also need to edit this in configure_cache():

Code:
        SCB_MPU_RBAR = 0x70000000 | REGION(i++); // FlexSPI2
        SCB_MPU_RASR = MEM_CACHE_WBWA | READWRITE | NOEXEC | SIZE_16M;

Delete NOEXEC. By default we configure the MPU to disallow code execution from memory meant to hold data, as a sort of proactive security measure.
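To see what deleting NOEXEC changes, here is a host-side sketch that rebuilds the region attribute word from the ARMv7-M MPU_RASR field layout (XN at bit 28, AP at 26:24, TEX at 21:19, C at 17, B at 16, SIZE at 5:1, ENABLE at 0). The `RASR_*` macros are local re-creations for illustration; the real NOEXEC, READWRITE, MEM_CACHE_WBWA and SIZE_16M definitions live in the core's startup.c and may be spelled differently.

```c
#include <assert.h>
#include <stdint.h>

/* Local re-creation of the ARMv7-M MPU_RASR fields (illustrative). */
#define RASR_XN        (1u << 28)             /* execute never */
#define RASR_AP(n)     ((uint32_t)(n) << 24)  /* access permissions */
#define RASR_TEX(n)    ((uint32_t)(n) << 19)
#define RASR_C         (1u << 17)
#define RASR_B         (1u << 16)
#define RASR_SIZE(n)   ((uint32_t)(n) << 1)   /* region = 2^(n+1) bytes */
#define RASR_ENABLE    (1u << 0)

#define NOEXEC         RASR_XN
#define READWRITE      RASR_AP(3)
#define MEM_CACHE_WBWA (RASR_TEX(1) | RASR_C | RASR_B)
#define SIZE_16M       (RASR_SIZE(23) | RASR_ENABLE)

/* Region size in bytes encoded in a RASR value. */
static uint32_t rasr_region_bytes(uint32_t rasr)
{
    uint32_t size_field = (rasr >> 1) & 0x1Fu;
    assert(size_field >= 4 && size_field <= 30);  /* keeps the shift defined */
    return 1u << (size_field + 1);
}

/* Nonzero if the region allows instruction fetches (XN clear). */
static int rasr_allows_exec(uint32_t rasr)
{
    return (rasr & RASR_XN) == 0;
}

/* FlexSPI2 (PSRAM) attributes: default vs. with NOEXEC deleted. */
static const uint32_t flexspi2_default = MEM_CACHE_WBWA | READWRITE | NOEXEC | SIZE_16M;
static const uint32_t flexspi2_exec    = MEM_CACHE_WBWA | READWRITE | SIZE_16M;
```

The only difference between the two words is bit 28; caching, access permissions and the 16 MB size stay the same.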
 
The question is: Why?
The code is in the Flash anyway. You have to copy it from the flash to the PSRAM before it can execute.
But there is no point executing it from there... it is not faster than flash. And it uses the same cache; they share it. There is no extra cache for executing from PSRAM.
 
One of my colleagues is working on this code.
As far as I understand, when he tries to compile and upload, the memory is exceeded: usage is reported as 136%.

I only added the extra PSRAM (16 MB) in an effort to have sufficient memory. However, it seems that he managed to use the external PSRAM only for variables, and the functions are what cause the memory overload of 136%.
 
Perhaps the size tool just reports a wrong size (does he use TD 1.54? There were bugs before).
And if not, it's easier to execute the code from flash directly. I once made a *.ld which executes all code from flash, with ITCM=0. It is not slower if the "hot" code is < 32 kB, since it then fits in the 32 kB instruction cache.

Maybe your colleague should ask here... it's difficult with people in between.
 
It is one of my colleagues that it is working on this code.
...
and the functions are the ones that are creating the issues with the memory overload of 136%

Maybe your colleague should post here with specific details, like at the very least a screenshot or exact copy of the actual error, or better the actual code causing the problem.

Perhaps they really have written a huge amount of code. But using over 1 of the 8 megabytes of space is pretty rare. Odds are strong the problem is something else. Asking a specific tech question based on a (likely) misunderstanding from a 2nd hand source is not a good way to solve a problem!

But if I were to guess blind, the most common problems involve needing PROGMEM on large const data arrays or FLASHMEM on large functions.
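A sketch of that guess in practice, assuming a Teensy 4.x build where the core defines PROGMEM and FLASHMEM; the empty `#ifndef` fallbacks exist only so the same file also compiles on a desktop. In the linker script posted further down, `*(.rodata*)` lands in DTCM, so a large const array eats RAM1 unless it is marked PROGMEM, and ordinary functions go to ITCM unless marked FLASHMEM. The table and function names are hypothetical. Because the Teensy 4 flash is memory-mapped, the table is read with plain array indexing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On Teensy 4.x the core provides these; empty fallbacks for a host build. */
#ifndef PROGMEM
#define PROGMEM
#endif
#ifndef FLASHMEM
#define FLASHMEM
#endif

/* Large constant table: PROGMEM keeps it in flash instead of RAM1.
 * Hypothetical contents: only the first entries are set, the rest are 0. */
static const uint16_t lookup_table[256] PROGMEM = {
    0, 1, 4, 9, 16, 25, 36, 49
};

/* Large, rarely called function: FLASHMEM keeps its code out of ITCM. */
FLASHMEM uint32_t sum_table(const uint16_t *table, size_t count)
{
    assert(table != NULL);
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += table[i];
    }
    return sum;
}
```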
 
Hi,

This is what I asked him, and he will reply, most likely tomorrow when he is back in the office.

Best regards!
 
Hi,

I am relaying a message from my colleague. He will join the discussion thread as soon as possible.
The attachments he indicated are attached to this message; I had to change the extension of "imxrt1062_t41.ld" in order for the forum to allow attaching this file.

I have modified the two indicated files ("imxrt1062_t41.ld" and "startup.c") as you suggested; both modified files are attached. Each modification listed below references the line number in the file where it was applied.

The modifications made in "imxrt1062_t41.ld" are:
- a new section called ".text.extram" is added (Line 42 - Line 47)
NOTE: the section is similar to ".text.itcm", but uses "> ERAM AT> FLASH"
- new symbols added: _sExttext, _eExttext, _sExttextload (Line 85 - Line 87)
NOTE: similar to _stext, _etext and _stextload, but their addresses are taken from ".text.extram"
- modified symbol: _flashimagelen (Line 103)
NOTE: SIZEOF(.text.extram) is now included when computing _flashimagelen

The modifications made in "startup.c" are:
- add _sExttextload, _sExttext and _eExttext (as extern unsigned long), similar to _stextload, _stext and _etext (Line 13 - Line 15)
- initialize external memory after the call to configure_external_ram():
a) memory_copy(&_sExttext, &_sExttextload, &_eExttext); (Line 138)
b) memory_copy(&_sdata, &_sdataload, &_edata); (Line 139)
- remove the NOEXEC flag for external memory in the cache configuration
NOTE: SCB_MPU_RASR = MEM_CACHE_WBWA | READWRITE | SIZE_16M; (Line 272)
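For reference, the copy helper used in a) and b) takes (destination start, source, destination end) and copies 32-bit words. The version below is a host-side sketch of that behavior, not a verbatim copy of the core's startup.c. On the target, `_sExttext`, `_sExttextload` and `_eExttext` come from the linker, and the copy into PSRAM may only run after configure_external_ram(); here plain arrays stand in for both regions.

```c
#include <assert.h>
#include <stdint.h>

/* Copy 32-bit words from src to dest until dest reaches dest_end.
 * Call shape: memory_copy(run address, load address in flash, run end). */
static void memory_copy(uint32_t *dest, const uint32_t *src, uint32_t *dest_end)
{
    assert(dest <= dest_end);
    if (dest == src) return;  /* already in place, nothing to copy */
    while (dest < dest_end) {
        *dest++ = *src++;
    }
}
```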

Could you please review the modifications and let me know your feedback?
Additionally, could you please confirm whether the following code for the function "int testFcn(int a)" tells the linker to place it in external RAM?

Code:
int testFcn(int a) __attribute__((section(".text.extram")));

int testFcn(int a)
{
	int ErrCode;

	if (a % 2 == 0)
	{
		ErrCode = 0;
	}
	else
	{
		ErrCode = 1;
	}

	return ErrCode;
}



Thank you!



Attachments: startup.c, imxrt1062_t41.ld.txt
 
You can just print the address of testFcn to check where it is. It should be somewhere in the 0x7..... area.
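A sketch of that check, using the address ranges from the MEMORY block of the linker script posted later in this thread. Printed function addresses are odd because the Thumb bit is set, so the 0x7d reported below means code at 0x7c, inside ITCM. `region_of` and `in_region` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Memory map from the Teensy 4.1 linker script (origins and lengths). */
struct region { const char *name; uint32_t origin; uint32_t length; };

static const struct region teensy41_map[] = {
    { "ITCM",  0x00000000u,   512u * 1024u },
    { "DTCM",  0x20000000u,   512u * 1024u },
    { "RAM",   0x20200000u,   512u * 1024u },
    { "FLASH", 0x60000000u,  7936u * 1024u },
    { "ERAM",  0x70000000u, 16384u * 1024u },
};

/* Name of the region containing addr, or "unknown". The Thumb bit of a
 * function address is cleared first. Unsigned wrap-around makes
 * addr - origin huge when addr is below origin, so one compare suffices. */
static const char *region_of(uint32_t addr)
{
    addr &= ~1u;
    for (size_t i = 0; i < sizeof teensy41_map / sizeof teensy41_map[0]; i++) {
        const struct region *r = &teensy41_map[i];
        if (addr - r->origin < r->length) {
            return r->name;
        }
    }
    return "unknown";
}

static int in_region(uint32_t addr, const char *name)
{
    return strcmp(region_of(addr), name) == 0;
}
```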
 
Hi,

My name is Andrei. I have followed your suggestions, but unfortunately I am not able to upload the compiled code to the board. I am using a Windows 10 machine for testing. When I use the original *.ld and startup.c files, the COM port is immediately recognized in Device Manager and the Arduino IDE recognizes the port.
After I replace those files (*.ld and startup.c) with the ones uploaded above, the COM port is not available. I have tried Program and Reboot on the board, and rebooting my PC, without success. Thus, I cannot print the address of the function.
If I revert the changes (and re-upload with the original *.ld and startup.c files), the COM port is recognized and I can use the default configuration. As expected, with the default configuration the function address (in one test) was 0x7d (ITCM).
Do you think that the changes I have made in the configuration files should affect the COM port, or does this behavior need to be investigated?


Thank you!

The main code of the sketch that I am using is below:

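Code:
// The prototype carries the section attribute. Its name must match the
// function defined below; otherwise the attribute lands on a different,
// never-defined function and testFunction stays in its default location.
int testFunction(int a) __attribute__((section(".text.extram")));

void setup() {
	Serial.begin(9600);
}

void loop() {
	char stringArray[100];

	// %p expects a data pointer, so cast the function pointer explicitly.
	snprintf(stringArray, sizeof(stringArray), "function address= %p\n", (void *)testFunction);
	Serial.println(stringArray);
	delay(1500);
}

int testFunction(int a) {
	int ErrCode;

	if (a % 2 == 0)
		ErrCode = 1;
	else
		ErrCode = 0;

	return ErrCode;
}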
 
First, why do you want to do that? It might be that there is a misunderstanding somewhere.
There is no advantage to running a program from PSRAM. On the contrary, it is slower than the normal RAM, but exactly as fast as flash. So there is no gain; flash is not worse.
 
The code that needs to be integrated requires almost 1.8 MB of RAM. It has already been tested for a different CPU architecture on an emulator that supports an RTOS. When configuring a task that integrates the code, it requires at least 1.8 MB of RAM (for this task alone). The RTOS is not a requirement; it was used for fast testing (especially for validation). Due to physical size constraints, Teensy 4.1 is the best solution for the testing campaign (PIL, HIL, etc.). Another constraint is that the original code to be executed on Teensy (the code to be integrated in "loop") cannot undergo major modifications. This is a strict requirement: for example, I can move local variables out of a function and define them as globals, but I am not allowed to make any other kind of modification to the code; my only interaction with the code is supposed to be at the upper level, in "loop".
The first change was to move all local variables (of functions) out and define them as global variables in PSRAM. Unfortunately this is not enough: I also need to move the functions, or at least some of them, out of ITCM. Thank you very much for your suggestions. I will try the flash approach and update you on the testing progress.
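A sketch of the "locals become PSRAM globals" change, assuming the core's EXTMEM macro (with an empty fallback so it also builds on a desktop). The function, buffer and size are hypothetical. Two general caveats: a function whose locals became globals is no longer reentrant or safe to call recursively, and the sketch writes every word it later reads instead of assuming the PSRAM buffer starts out zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EXTMEM
#define EXTMEM  /* empty on a host build; the Teensy core defines it */
#endif

#define WORK_WORDS 4096u  /* hypothetical size of the former local array */

/* Before: "uint32_t work[WORK_WORDS];" was a local inside process_block(),
 * living on the stack in RAM1. After: a file-scope buffer in PSRAM. */
static EXTMEM uint32_t process_block_work[WORK_WORDS];

uint32_t process_block(const uint32_t *input, size_t n)
{
    assert(n <= WORK_WORDS);
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        process_block_work[i] = input[i] * 2u;  /* written before it is read */
        acc += process_block_work[i];
    }
    return acc;
}
```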
 
Do you think that the changes that I have made in the configuration files should affect the COM

Yes. Almost any error in the linker script or startup code will almost certainly manifest as a crash very early, before USB enumeration can even begin. You won't get a COM port, and the PC probably won't even detect that *anything* is connected to the USB port. Errors like "device failed to start" with "code 10" or "code 43" usually mean Teensy crashed much later, after Windows knew a new USB device appeared, but before it could read enough USB descriptors to identify the device.

Changing the linker script is pretty advanced and difficult programming. Probably best to not mess with it.


The first update done was to exclude all local variables (of functions) and define them as global variables in PSRAM. Unfortunately this is not enough. I need to move also the functions or at least part of them outside of ITCM.

Yup. Find all large functions and add FLASHMEM, so they don't consume ITCM.
 
Hi Frank Boesing! Could you please share the "experimental_imxrt1062_t41.ld" file again via another link? The one listed is no longer valid.
 
I found that file on my old PC. Here is the code:

Code:
MEMORY
{
	ITCM (rwx):  ORIGIN = 0x00000000, LENGTH = 512K
	DTCM (rwx):  ORIGIN = 0x20000000, LENGTH = 512K
	RAM (rwx):   ORIGIN = 0x20200000, LENGTH = 512K
	FLASH (rwx): ORIGIN = 0x60000000, LENGTH = 7936K
	ERAM (rwx):  ORIGIN = 0x70000000, LENGTH = 16384K
}

ENTRY(ImageVectorTable)

SECTIONS
{
	.text.headers : {
		KEEP(*(.flashconfig))
		FILL(0xFF)
		. = ORIGIN(FLASH) + 0x1000;
		KEEP(*(.ivt))
		KEEP(*(.bootdata))
		. = ALIGN(1024);
	} > FLASH

	.text.code : {
		KEEP(*(.startup))
        *(.text*)
		*(.flashmem*)
        *(.progmem*)
		. = ALIGN(4);
		KEEP(*(.init))
		__preinit_array_start = .;
		KEEP (*(.preinit_array))
		__preinit_array_end = .;
		__init_array_start = .;
		KEEP (*(.init_array))
		__init_array_end = .;
		. = ALIGN(4);
	} > FLASH

	.text.itcm ALIGN(32,0) : {
        . = 0x0000;
		*(.fastrun)
		. = ALIGN(16);
	} AT > FLASH
    

	.ARM.exidx : {
		__exidx_start = .;
		*(.ARM.exidx* .gnu.linkonce.armexidx.*)
		__exidx_end = .;
	} > FLASH

	.text.itcm.padding (NOLOAD) : {
		. = ALIGN(32768);
	}  > ITCM
    
	.data : {
		*(.rodata*)
		*(.data*)
        . = ALIGN(16);
	} > DTCM  AT> FLASH

	.bss ALIGN(4) : {
		*(.bss*)
		*(COMMON)
		. = ALIGN(32);
		. = . + 32; /* MPU to trap stack overflow */
	} > DTCM

	.bss.dma (NOLOAD) : {
		*(.hab_log)
		*(.dmabuffers)
		. = ALIGN(32);
	} > RAM

	.bss.extram (NOLOAD) : {
		*(.externalram)
		. = ALIGN(32);
	} > ERAM

	.text.csf : {
		FILL(0xFF)
		. = ALIGN(1024);
		KEEP(*(.csf))
        __text_csf_end = .;
	} > FLASH

	_stext = ADDR(.text.itcm);
	_etext = ADDR(.text.itcm) + SIZEOF(.text.itcm);
    _size_of_itcm  = SIZEOF(.text.itcm);
	_stextload = LOADADDR(.text.itcm);

	_sdata = ADDR(.data);
	_edata = ADDR(.data) + SIZEOF(.data) + SIZEOF(.ARM.exidx);
	_sdataload = LOADADDR(.data);

	_sbss = ADDR(.bss);
	_ebss = ADDR(.bss) + SIZEOF(.bss);

	_heap_start = ADDR(.bss.dma) + SIZEOF(.bss.dma);
	_heap_end = ORIGIN(RAM) + LENGTH(RAM);

	_extram_start = ADDR(.bss.extram);
	_extram_end = ADDR(.bss.extram) + SIZEOF(.bss.extram);

	_itcm_block_count = (_size_of_itcm + 0x7FFF) >> 15;
	_flexram_bank_config = 0xAAAAAAAA | ((1 << (_itcm_block_count * 2)) - 1);
	_estack = ORIGIN(DTCM) + ((16 - _itcm_block_count) << 15);

	_flashimagelen = __text_csf_end - ORIGIN(FLASH);
	_teensy_model_identifier = 0x25;

	.debug_info     0 : { *(.debug_info) }
	.debug_abbrev   0 : { *(.debug_abbrev) }
	.debug_line     0 : { *(.debug_line) }
	.debug_frame    0 : { *(.debug_frame) }
	.debug_str      0 : { *(.debug_str) }
	.debug_loc      0 : { *(.debug_loc) }

}
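Three symbols near the end of that script decide how the 512 KB of FlexRAM is split. Below they are re-expressed in C as a sketch; the function names are hypothetical, and the formulas are copied from the script. FlexRAM is configured in 32 KB banks with two bits each: 0b10 selects DTCM and 0b11 selects ITCM. So 0xAAAAAAAA makes every bank DTCM, and the OR turns the lowest banks into ITCM. Because this script puts only `.fastrun` (FASTRUN) code in `.text.itcm`, a sketch without FASTRUN functions gets zero ITCM banks, which is the "ITCM=0" setup mentioned earlier.

```c
#include <assert.h>
#include <stdint.h>

#define FLEXRAM_BANKS 16u          /* 16 banks x 32 KB = 512 KB */
#define DTCM_ORIGIN   0x20000000u

/* _itcm_block_count = (_size_of_itcm + 0x7FFF) >> 15;  (round up to banks) */
static uint32_t itcm_block_count(uint32_t size_of_itcm)
{
    return (size_of_itcm + 0x7FFFu) >> 15;
}

/* _flexram_bank_config = 0xAAAAAAAA | ((1 << (blocks * 2)) - 1); */
static uint32_t flexram_bank_config(uint32_t blocks)
{
    assert(blocks <= FLEXRAM_BANKS);
    /* 64-bit shift so that blocks == 16 does not overflow. */
    uint32_t itcm_bits = (uint32_t)((1ull << (blocks * 2u)) - 1u);
    return 0xAAAAAAAAu | itcm_bits;
}

/* _estack = ORIGIN(DTCM) + ((16 - blocks) << 15);  top of the DTCM banks */
static uint32_t estack(uint32_t blocks)
{
    assert(blocks <= FLEXRAM_BANKS);
    return DTCM_ORIGIN + ((FLEXRAM_BANKS - blocks) << 15);
}
```

With zero ITCM banks all of RAM1 is DTCM and the stack starts at 0x20080000, consistent with the "code:0, padding:0" line in the report below. As a second example, 40000 bytes of FASTRUN code would round up to 2 banks, giving a configuration of 0xAAAAAAAF and a stack top of 0x20070000.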

Please find below an example with the compilation message (I use Arduino 1.8.13 with a Teensy 4.1 equipped with 16 MB of PSRAM):

Memory Usage on Teensy 4.1:
FLASH: code:174608, data:38928, headers:8348 free for files:7904580
RAM1: variables:438992, code:0, padding:0 free for local variables:85296
RAM2: variables:12416 free for malloc/new:511872
EXTRAM: variables:366336
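The numbers in that report add up exactly to the region sizes in the linker script's MEMORY block: FLASH is 7936 KB, and RAM1 (ITCM plus DTCM) and RAM2 are 512 KB each. Here is a small C check of those sums, using the values printed above; the struct and function names are hypothetical. A usage figure above 100%, like the 136% mentioned earlier, means the used part no longer fits in its region.

```c
#include <assert.h>
#include <stdint.h>

/* Fields mirror the "Memory Usage on Teensy 4.1" report lines. */
struct flash_usage { uint32_t code, data, headers, free_for_files; };
struct ram1_usage  { uint32_t variables, code, padding, free_for_locals; };
struct ram2_usage  { uint32_t variables, free_for_malloc; };

static uint32_t flash_total(struct flash_usage f)
{
    return f.code + f.data + f.headers + f.free_for_files;
}

static uint32_t ram1_total(struct ram1_usage r)
{
    return r.variables + r.code + r.padding + r.free_for_locals;
}

static uint32_t ram2_total(struct ram2_usage r)
{
    return r.variables + r.free_for_malloc;
}

/* Percentage of RAM1 in use, rounded down. */
static uint32_t ram1_percent_used(struct ram1_usage r)
{
    uint32_t used = r.variables + r.code + r.padding;
    return (uint32_t)((uint64_t)used * 100u / ram1_total(r));
}
```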
 