Teensy 4.0: assign large chunk of data in RAM

Status
Not open for further replies.

yokonav

Member
Hi,

The Teensy 4.0 has two memory blocks of 512 KB each. I need to allocate a single uint8_t array of 550 KB. The first memory block already holds 300 KB of data and has 212 KB free, so I have about 724 KB free in total, but split across two blocks. Is there any way to reconfigure the memory? I have looked into the RT1062 FlexRAM, but it seems only the first block can be rearranged. Any help or hints will be appreciated.

Thanks,
Naveen
 
The two blocks of RAM are separate areas accessed in two different ways, not one block with two partitions, so they cannot be rearranged into a larger buffer. Your other option would be to get a Teensy 4.1 and add the extra PSRAM chip on the bottom. If you buy them from PJRC, that is 8MB of extra RAM. It is slower than the onboard RAM, but for most applications the impact is small.
 
Thanks for your suggestion! I will look into the Teensy 4.1. For the time being, I am trying to keep the data within 512 KB (in the 2nd block), but I ran into another issue.

Here is sample code to reproduce the issue:

Code:
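#define SIZE (504 * 1024)  // parenthesized so the macro expands safely inside expressions

uint8_t buf[SIZE] DMAMEM;  // DMAMEM places the array in RAM2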

void setup() {
  for (int i = 0; i < SIZE; i++) {
    buf[i] = 1;
  }
}

void loop() {

}


I am getting the following error on compilation (Teensyduino):

arm-none-eabi/bin/ld: /var/folders/ck/sketch_jun01d.ino.elf section `.bss.dma' will not fit in region `RAM'
arm-none-eabi/bin/ld: region `RAM' overflowed by 4192 bytes
collect2: error: ld returned 1 exit status
Error compiling for board Teensy 4.0.

My assumption is that the 2nd RAM block is 524288 bytes (512 * 1024). The program above only asks for 516096 bytes (504 * 1024).
When I change SIZE to 499 * 1024 it works, but 500 * 1024 or above does not.
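
The numbers in the linker error can be turned into an estimate of how much RAM2 is actually free. The following is a minimal sketch of that arithmetic, not anything taken from the Teensy core: the helper names (kRam2Bytes, availableBytes, otherUsage) are made up for illustration, and it assumes the overflow figure reported by ld is exact.

```cpp
#include <cstdint>

// Size of the RAM region (RAM2) as assumed in the post above: 512 KB.
constexpr uint32_t kRam2Bytes = 512u * 1024u;

// Bytes a DMAMEM array could occupy without overflowing, derived from
// the requested size and the overflow that ld reported.
constexpr uint32_t availableBytes(uint32_t requested, uint32_t overflow) {
  return requested - overflow;
}

// Bytes of RAM2 taken by everything else in the build
// (core buffers, alignment padding and so on).
constexpr uint32_t otherUsage(uint32_t requested, uint32_t overflow) {
  return kRam2Bytes - availableBytes(requested, overflow);
}

// Figures from the error above: 504 * 1024 requested, 4192 bytes overflow.
static_assert(availableBytes(504u * 1024u, 4192u) == 511904u, "free bytes");
static_assert(otherUsage(504u * 1024u, 4192u) == 12384u, "bytes used elsewhere");

// 499 * 1024 fits under that limit and 500 * 1024 does not,
// which matches the behaviour described above.
static_assert(499u * 1024u <= 511904u, "499 KB fits");
static_assert(500u * 1024u > 511904u, "500 KB does not fit");
```

So roughly 12 KB of RAM2 appears to be claimed by something other than buf in this build.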
 
The Teensy core uses some DMAMEM for its own buffers. I don’t know the exact amount, but that is why not all of it is free.
 
You would have to edit the core files to move them out of DMAMEM, but you can’t just get rid of them without disabling features. I know they are used for USB buffers, and probably HardwareSerial as well, so if you need those features the buffers have to reside somewhere, whether in RAM1 or RAM2.
 
I can happily move the buffers to RAM1. Could you please point me to any documentation or example, and to the files where I have to make changes? I appreciate your help. I have done FlexRAM configuration using MCUXpresso for the RT1010, and I guess it makes its changes in the linker files. This is my first Teensy board, so I have many things to learn.
 
The changes would be spread across multiple files. If you load all the core files into your editor of choice, you can search for DMAMEM and delete it from every variable that uses it. Any calls to malloc or new also allocate from RAM2, so you would have to look out for those, but I don’t believe the core files make any.
 
After looking into the core files, I found that removing DMAMEM makes almost all of RAM2 available. In most places the DMAMEM attribute is left out when the CPU speed is below 30 MHz. I did a quick test by changing the Teensy 4.0 CPU speed to 24 MHz in the Teensyduino IDE, and the code works up to a SIZE of 512*1024 - 96. Thanks @vjmuzik for pointing me in the right direction!
 
Using RAM1 and no DMA at 24 MHz was a last-minute change that Paul noted as TD 1.52 shipped. I was offline today, but it seems you found the right place.

When the T_4.0 started out it wasn't using DMAMEM, so reverting that just goes back to the code as it was initially tested, before the move to DMA. The non-DMA code is now back in use for 24 MHz operation, where USB glitches would appear with the slower clocking of the RAM2/DMAMEM area.
 
Although the simple example above works, the library I am using seems to call new/malloc, which takes up some memory in RAM2.
Is there any way to move the heap from RAM2 to RAM1?
 
Currently malloc and new allocate from the heap, which is defined to be in RAM2. With the T_4.1 easily gaining 8MB of RAM through the PSRAM, the need for a way to allocate from that area became apparent. A way to heap-allocate from RAM1 may come along with that change, though it may not apply to default usage in libraries without changing them.

The .ld files define where the heap is. For the T_4.0: ...\hardware\teensy\avr\cores\teensy4\imxrt1062.ld
Proper local edits there might allow moving the heap to RAM1, though the stack also lives there, in the DTCM area.
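
One way to see where a heap allocation ends up is to compare the returned pointer against the memory regions. The sketch below is only an illustration: classifyAddress and the Region enum are made-up names, and the region addresses are assumptions based on the MEMORY entries in imxrt1062.ld (DTCM at 0x20000000 as quoted later in this thread, RAM2 at 0x20200000, ITCM at 0x00000000, each listed as 512K). The real ITCM/DTCM split depends on the FlexRAM bank configuration, so the 512K limits are upper bounds.

```cpp
#include <cstdint>

// Hypothetical names for the regions discussed in this thread.
enum class Region { ITCM, DTCM, RAM2, Other };

// Assumed region bounds (see the note above), each 512 KB long.
constexpr uintptr_t kRegionLength = 512u * 1024u;
constexpr uintptr_t kItcmOrigin = 0x00000000u;
constexpr uintptr_t kDtcmOrigin = 0x20000000u;
constexpr uintptr_t kRam2Origin = 0x20200000u;

// Returns the region that contains the given address.
Region classifyAddress(uintptr_t addr) {
  if (addr >= kRam2Origin && addr < kRam2Origin + kRegionLength) return Region::RAM2;
  if (addr >= kDtcmOrigin && addr < kDtcmOrigin + kRegionLength) return Region::DTCM;
  if (addr < kItcmOrigin + kRegionLength) return Region::ITCM;
  return Region::Other;
}
```

In a sketch, something like classifyAddress(reinterpret_cast<uintptr_t>(malloc(16))) could be printed in setup() before and after editing the .ld file, to check whether the heap really moved.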
 
The imxrt1062.ld file seems cryptic to me. I tried to change

Code:
 _heap_start = ADDR(.bss.dma) + SIZEOF(.bss.dma);
 _heap_end = ORIGIN(RAM) + LENGTH(RAM);

to

Code:
 _heap_start = ADDR(.bss) + SIZEOF(.bss);
_heap_end = ORIGIN(DTCM) + LENGTH(DTCM);

Compilation is OK, but the upload fails. I guess the heap overwrote the stack. I am kind of lost now.
 
... that was generic advice, as I've not looked at the details.

But putting the heap at the origin of DTCM will have the heap and the DTCM RAM starting in the same place:
DTCM (rwx): ORIGIN = 0x20000000, LENGTH = 512K

The stack starts here and grows down:
_estack = ORIGIN(DTCM) + ((16 - _itcm_block_count) << 15);

So perhaps put the heap higher up in DTCM, to grow up:
_???? = ORIGIN(DTCM) + ( _itcm_block_count << 15);

The name to use in place of '_????' would have to be worked out to appease the linker.
 
The i.MX RT1060 docs mention:
On-chip RAM (1MB):
* 512 KB FlexRAM shared between ITCM/DTCM and OCRAM
* Dedicated 512 KB OCRAM

So I think RAM1 (FlexRAM) can be partitioned into three parts: ITCM (128K), DTCM (128K), and OCRAM (256K), with the heap then assigned to the OCRAM partition. Now I have to work out how to do that in the linker file.
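
For reference, the 128K/128K/256K split can be expressed as a FlexRAM bank configuration word. This is a rough sketch under several assumptions: FlexRAM is treated as 16 banks of 32 KB, each described by a 2-bit code (0 = unused, 1 = OCRAM, 2 = DTCM, 3 = ITCM), which is consistent with the 0xAAAAAAAA | ... expression in imxrt1062.ld. packBankConfig is a made-up helper, and the bank order (ITCM first, then DTCM, then OCRAM) is an arbitrary choice for illustration. Producing the word is only one part of the job; the startup code and the linker regions would have to agree with it.

```cpp
#include <cstdint>

// Assumed 2-bit bank codes (see the note above).
enum BankType : uint32_t { kUnused = 0, kOcram = 1, kDtcm = 2, kItcm = 3 };

// Packs per-bank codes into one 32-bit word, filling banks from bank 0
// upward as ITCM, then DTCM, then OCRAM. Banks left over stay unused.
// Returns 0 if more than 16 banks are requested.
uint32_t packBankConfig(unsigned itcmBanks, unsigned dtcmBanks, unsigned ocramBanks) {
  if (itcmBanks + dtcmBanks + ocramBanks > 16u) return 0;
  uint32_t config = 0;
  unsigned bank = 0;
  for (unsigned i = 0; i < itcmBanks; ++i, ++bank) config |= uint32_t(kItcm) << (bank * 2u);
  for (unsigned i = 0; i < dtcmBanks; ++i, ++bank) config |= uint32_t(kDtcm) << (bank * 2u);
  for (unsigned i = 0; i < ocramBanks; ++i, ++bank) config |= uint32_t(kOcram) << (bank * 2u);
  return config;
}
```

With 4 ITCM banks, 4 DTCM banks and 8 OCRAM banks (128K/128K/256K) this gives 0x5555AAFF under the assumed ordering.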
 
It seems the linker picks the addresses, so hopefully reading the manual is not needed.

It seems to need a rewrite/reorder of this:
Code:
	_heap_start = ADDR(.bss.dma) + SIZEOF(.bss.dma);
	_heap_end = ORIGIN(RAM) + LENGTH(RAM);

	_itcm_block_count = (SIZEOF(.text.itcm) + 0x7FFF) >> 15;
	_flexram_bank_config = 0xAAAAAAAA | ((1 << (_itcm_block_count * 2)) - 1);
	_estack = ORIGIN(DTCM) + ((16 - _itcm_block_count) << 15);

Perhaps as follows, with reference to the prior post. This allows 32KB for the stack; that 15 could be smaller if a smaller heap is good enough and the stack needs more protection.
Does this work for _heap_start and _heap_end?
Code:
	_itcm_block_count = (SIZEOF(.text.itcm) + 0x7FFF) >> 15;
	_flexram_bank_config = 0xAAAAAAAA | ((1 << (_itcm_block_count * 2)) - 1);
	_estack = ORIGIN(DTCM) + ((16 - _itcm_block_count) << 15);
	_heap_start = ORIGIN(DTCM) + ( _itcm_block_count << 15);
	_heap_end = ORIGIN(DTCM) + ((15 - _itcm_block_count) << 15);
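
To see what those expressions evaluate to, the same arithmetic can be worked through off-target. The sketch below simply mirrors the linker expressions in C++. FlexRamLayout and computeLayout are made-up names, the 40000-byte .text.itcm size in the usage note is a hypothetical figure, and the function assumes at most 15 ITCM blocks. It does not check whether _heap_start lands above the end of .bss; that would have to be verified separately.

```cpp
#include <cstdint>

// Values of the linker symbols discussed above, for one .text.itcm size.
struct FlexRamLayout {
  uint32_t itcmBlockCount;     // _itcm_block_count
  uint32_t flexramBankConfig;  // _flexram_bank_config
  uint32_t estack;             // _estack
  uint32_t heapStart;          // _heap_start
  uint32_t heapEnd;            // _heap_end
};

// ORIGIN(DTCM) as quoted earlier in the thread.
constexpr uint32_t kDtcmOrigin = 0x20000000u;

// Mirrors the expressions from the proposed imxrt1062.ld edit.
FlexRamLayout computeLayout(uint32_t textItcmSize) {
  FlexRamLayout l{};
  // Round the ITCM code size up to whole 32 KB blocks.
  l.itcmBlockCount = (textItcmSize + 0x7FFFu) >> 15;
  // 64-bit shift so the expression stays well defined for larger counts.
  const uint64_t lowBits = (uint64_t{1} << (l.itcmBlockCount * 2u)) - 1u;
  l.flexramBankConfig = 0xAAAAAAAAu | static_cast<uint32_t>(lowBits);
  l.estack = kDtcmOrigin + ((16u - l.itcmBlockCount) << 15);
  l.heapStart = kDtcmOrigin + (l.itcmBlockCount << 15);
  l.heapEnd = kDtcmOrigin + ((15u - l.itcmBlockCount) << 15);
  return l;
}
```

For a hypothetical 40000 bytes of .text.itcm, computeLayout(40000) gives 2 ITCM blocks, _flexram_bank_config 0xAAAAAAAF, _estack 0x20070000, _heap_start 0x20010000 and _heap_end 0x20068000, leaving 0x8000 bytes (32 KB) between the end of the heap and the top of the stack.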
 
Thanks @defragster! It seems to be working.

Awesome - it seemed like it should, which is always a scary trap when just reading the context and not any manual. If that is wholly right, it should ensure the heap starts above the static DTCM allocations while reserving only 32KB for the stack. That doesn't stop the stack from growing down further, but it will keep the heap from growing into the last 32KB. That may be overkill, or not, depending on how the sketch uses the two, but it is a common concern where the two contend for a typically shared memory space.

Just now thinking the T_3.6.ld file is possibly similar ... except for half the names.

With a T_4.1, the same approach would seem to allow the whole heap to live in the 8MB of PSRAM (or 16MB with twin chips) and in the same way keep all of RAM2 free (except for DMA users), though it would be slower. The only issue might be a race in startup.c if the heap were touched in any way before the QSPI detect and enable for that extended RAM.
 
I have a requirement to assign a single uint8_t array of size 550KB.

Does this array need to be mutable, or is it static initialized data?

If it doesn't have to be mutable, can you declare it as PROGMEM and keep it in flash?
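
As an illustration of that option, here is a minimal sketch assuming the data is constant. lookupTable and sumTable are made-up names and the four-byte table is a stand-in for the real data. It also assumes that on the Teensy 4.x the flash is memory-mapped, so a PROGMEM array can be read through an ordinary pointer; please check that against the core before relying on it.

```cpp
#include <cstddef>
#include <cstdint>

#ifndef PROGMEM
#define PROGMEM  // fallback so this also compiles off-target; the Teensy core defines it
#endif

// Small stand-in for the real 550 KB table (contents are hypothetical).
// const plus PROGMEM asks for the data to stay in flash instead of RAM.
const uint8_t lookupTable[] PROGMEM = {1, 2, 3, 4};

// Reads the table through a normal pointer and returns the byte sum.
uint32_t sumTable(const uint8_t *data, size_t len) {
  uint32_t sum = 0;
  for (size_t i = 0; i < len; ++i) {
    sum += data[i];
  }
  return sum;
}
```

In a sketch, sumTable(lookupTable, sizeof(lookupTable)) could be called from setup() to confirm the data is readable. This only helps if nothing needs to write to the array at run time.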
 