
Thread: Teensy 4.0: assign large chunk of data in RAM

  1. #1
    Junior Member
    Join Date
    Jun 2020
    Posts
    10

    Teensy 4.0: assign large chunk of data in RAM

    Hi,

    The Teensy 4.0 has two memory blocks of 512 KB each. I have a requirement to allocate a single uint8_t array of 550 KB. The first memory block already holds 300 KB of data, leaving 212 KB free. So I have around 724 KB free in total, but split across two blocks. Is there any way to reconfigure the memory? I have looked into the RT1062 FlexRAM, but it seems only the first block can be rearranged. Any help or hints will be appreciated.

    Thanks,
    Naveen
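
    The free-space figures in the question can be checked with a few compile-time constants. This is only a sketch of the arithmetic: the sizes come from the post, the constant names are made up for illustration, and it assumes an array needs one contiguous address range (the two blocks are not adjacent in the address map).

```cpp
#include <cstdint>

// Sizes in KB, taken from the post above. All names here are illustrative.
constexpr uint32_t kBlockKB    = 512;  // each of the two RAM blocks
constexpr uint32_t kRam1UsedKB = 300;  // already occupied in the first block
constexpr uint32_t kWantedKB   = 550;  // size of the array being asked for

constexpr uint32_t kRam1FreeKB  = kBlockKB - kRam1UsedKB;  // free in block 1
constexpr uint32_t kTotalFreeKB = kRam1FreeKB + kBlockKB;  // free in both blocks

// A C++ array must live in one contiguous range, so the largest single
// free block is what matters, not the total.
constexpr uint32_t kLargestFreeKB =
    (kRam1FreeKB > kBlockKB) ? kRam1FreeKB : kBlockKB;
constexpr bool kFitsInOneBlock = kWantedKB <= kLargestFreeKB;

static_assert(kRam1FreeKB == 212, "212 KB free in the first block");
static_assert(kTotalFreeKB == 724, "724 KB free in total");
static_assert(!kFitsInOneBlock, "550 KB does not fit in either block alone");
```

    So the total is large enough, but no single block is, which is why the question comes down to whether the memory can be rearranged.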

  2. #2
    Senior Member vjmuzik's Avatar
    Join Date
    Apr 2017
    Posts
    685
    The two blocks of RAM are separate areas accessed in different ways, not one block with two partitions, so they cannot be rearranged into a larger buffer. Your other option would be to get a Teensy 4.1 and add the extra PSRAM chip on the bottom. If you buy them from PJRC it’s 8MB of extra RAM. It is slower than the onboard RAM, but for most applications it doesn’t have much impact.

  3. #3
    Junior Member
    Join Date
    Jun 2020
    Posts
    10
    Thanks for your suggestion! I will look into the Teensy 4.1. For the time being, I am trying to keep the data within 512 KB (in the 2nd block), but I have run into another issue.

    Here is sample code to reproduce the issue:

    Code:
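    #define SIZE (504 * 1024)  // parenthesized so the macro expands safely inside expressions
    
    uint8_t buf[SIZE] DMAMEM;  // DMAMEM places the array in RAM2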
    
    void setup() {
      for (int i = 0; i < SIZE; i++) {
        buf[i] = 1;
      }
    }
    
    void loop() {
    
    }

    I am getting the following error on compilation (Teensyduino):

    arm-none-eabi/bin/ld: /var/folders/ck/sketch_jun01d.ino.elf section `.bss.dma' will not fit in region `RAM'
    arm-none-eabi/bin/ld: region `RAM' overflowed by 4192 bytes
    collect2: error: ld returned 1 exit status
    Error compiling for board Teensy 4.0.

    My assumption is that the 2nd RAM block is 524288 bytes (512 * 1024). The program above asks for only 516096 bytes (504 * 1024).
    When I change SIZE to 499 * 1024 it works, but 500 * 1024 or above does not.
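
    The numbers in the linker error are enough to estimate how much of RAM2 is taken by something other than buf[]. This is a rough sketch using only the figures quoted above; the constant names are illustrative, and section alignment could shift the real value slightly.

```cpp
#include <cstdint>

constexpr uint32_t kRam2Bytes     = 512u * 1024u;  // 524288, size of region RAM
constexpr uint32_t kRequested     = 504u * 1024u;  // 516096, buf[] in the sketch
constexpr uint32_t kOverflowBytes = 4192u;         // from the ld error message

// Everything placed in the region = region size + overflow. Subtracting
// buf[] leaves what the rest of the program is putting there.
constexpr uint32_t kOtherUse = kRam2Bytes + kOverflowBytes - kRequested;

// Largest whole-KB array that still fits next to that usage.
constexpr uint32_t kMaxWholeKB = (kRam2Bytes - kOtherUse) / 1024u;

static_assert(kOtherUse == 12384u, "about 12 KB of RAM2 is used elsewhere");
static_assert(kMaxWholeKB == 499u, "matches: 499*1024 links, 500*1024 does not");
```

    The estimate of 12384 bytes agrees with the observation that 499 * 1024 links and 500 * 1024 does not.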

  4. #4
    Senior Member vjmuzik's Avatar
    Join Date
    Apr 2017
    Posts
    685
    The Teensy core uses some DMAMEM for buffers. I don’t know the exact amount, but that would be why it’s not all free.

  5. #5
    Junior Member
    Join Date
    Jun 2020
    Posts
    10
    I see! Is there any way to disable those buffers?

  6. #6
    Senior Member vjmuzik's Avatar
    Join Date
    Apr 2017
    Posts
    685
    You would have to edit the core files to change them from DMAMEM, but you can’t just get rid of them without disabling features either. I know they are used for USB buffers, probably HardwareSerial as well, so if you need those features the buffers have to reside somewhere whether in RAM1 or RAM2.

  7. #7
    Junior Member
    Join Date
    Jun 2020
    Posts
    10
    I can happily move the buffers to RAM1. Could you please point me to any documentation/example and the location of the files where I have to make changes? I appreciate your help. I have done FlexRAM configuration using MCUXpresso for the RT1010; I guess it makes changes to the linker files. This is my first Teensy board so I need to learn many things.

  8. #8
    Senior Member vjmuzik's Avatar
    Join Date
    Apr 2017
    Posts
    685
    It would be littered across multiple files. If you load all the core files into your editor of choice, you just have to search for DMAMEM and then delete it from all the variables that use it. Any calls to malloc or new also go to RAM2, so you would have to look out for those, but I don’t believe the core files make any calls to them.
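
    To see where a given variable or allocation actually ends up, its address can be compared against the memory regions. A minimal sketch, assuming the region bounds below: DTCM at 0x20000000 with 512K is quoted from the linker script later in this thread, while the ITCM and RAM2 ranges are assumptions that should be checked against the MEMORY block of imxrt1062.ld. The names Region and regionOf are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: classify an address by Teensy 4.0 memory region.
enum class Region { ITCM, DTCM, RAM2, Other };

constexpr Region regionOf(uint32_t addr) {
  if (addr < 0x00080000u) return Region::ITCM;                        // assumed: 512K at 0x00000000
  if (addr >= 0x20000000u && addr < 0x20080000u) return Region::DTCM; // RAM1 data side
  if (addr >= 0x20200000u && addr < 0x20280000u) return Region::RAM2; // assumed: 512K at 0x20200000
  return Region::Other;                                               // flash, peripherals, ...
}
```

    On the board, something like regionOf((uint32_t)malloc(16)) would be expected to report RAM2 with the stock linker script, and a buffer whose DMAMEM attribute has been deleted should show up as DTCM.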

  9. #9
    Junior Member
    Join Date
    Jun 2020
    Posts
    10
    Thanks, I will try to find and change. I will post here if I succeed.

  10. #10
    Junior Member
    Join Date
    Jun 2020
    Posts
    10
    After looking into the core files I found that removing DMAMEM makes almost all of RAM2 available. In most places DMAMEM is left undefined when the CPU speed is below 30 MHz. I did a quick test by changing the Teensy 4.0 CPU speed to 24 MHz in the Teensyduino IDE, and the code works up to a SIZE of 512*1024 - 96. Thanks @vjmuzik for pointing me in the right direction!

  11. #11
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    12,657
    That 24 MHz setting to use RAM1 with no DMA was a last-minute change Paul noted as TD 1.52 shipped. I was offline today ... but it seems you found the right place.

    When the T_4.0 started out it wasn't using DMAMEM, so reverting that just goes back to the code as initially tested before the move to DMA. The non-DMA path is now back in use for 24 MHz operation, where USB glitches would appear with the slower clocking of the RAM2/DMAMEM area.

  12. #12
    Junior Member
    Join Date
    Jun 2020
    Posts
    10
    The simple example above works, but the library I am using seems to call new/malloc, which takes up some memory in RAM2.
    Is there any way to move the heap from RAM2 to RAM1?
    Last edited by yokonav; 06-01-2020 at 04:43 PM.

  13. #13
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    12,657
    Quote Originally Posted by yokonav View Post
    The simple example above works, but the library I am using seems to call new/malloc, which takes up some memory in RAM2.
    Is there any way to move the heap from RAM2 to RAM1?
    Currently malloc and new allocate from the heap, which is defined to be in RAM2. With the T_4.1 easily gaining 8MB of added PSRAM, the need for a way to get RAM from that area became apparent. A way to 'heap' alloc from RAM1 may come along with that change, though it may not apply to default usage in libraries unless they are changed.

    The .ld files define where the heap is. For the T_4.0 :: ...\hardware\teensy\avr\cores\teensy4\imxrt1062.ld
    Proper local edits there might allow moving it to RAM1, though the stack lives there too, in the DTCM area.

  14. #14
    Junior Member
    Join Date
    Jun 2020
    Posts
    10
    The imxrt1062.ld file seems cryptic to me. I tried to change

    Code:
     _heap_start = ADDR(.bss.dma) + SIZEOF(.bss.dma);
     _heap_end = ORIGIN(RAM) + LENGTH(RAM);
    to

    Code:
     _heap_start = ADDR(.bss) + SIZEOF(.bss);
     _heap_end = ORIGIN(DTCM) + LENGTH(DTCM);
    Compilation is OK but the upload fails. I guess the heap overwrote the stack. I am kinda lost now.

  15. #15
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    12,657
    Quote Originally Posted by yokonav View Post
    The imxrt1062.ld file seems cryptic to me. I tried to change

    Code:
     _heap_start = ADDR(.bss.dma) + SIZEOF(.bss.dma);
     _heap_end = ORIGIN(RAM) + LENGTH(RAM);
    to

    Code:
     _heap_start = ADDR(.bss) + SIZEOF(.bss);
    _heap_end = ORIGIN(DTCM) + LENGTH(DTCM);
    Compilation is OK but the upload fails. I guess the heap overwrote the stack. I am kinda lost now.
    ... that was generic advice as I've not looked at the details.

    But putting HEAP at RAM( DTCM ) will have HEAP and RAM starting in the same place:
    DTCM (rwx): ORIGIN = 0x20000000, LENGTH = 512K

    The stack start is here to grow down:
    _estack = ORIGIN(DTCM) + ((16 - _itcm_block_count) << 15);

    So perhaps put the HEAP above the start of DTCM, to grow up:
    _???? = ORIGIN(DTCM) + ( _itcm_block_count << 15);

    You would have to figure out what name to create for '_????' to appease the linker.

  16. #16
    Junior Member
    Join Date
    Jun 2020
    Posts
    10
    In the i.MX RT1060 docs, they mention
    On-chip RAM (1MB):
    * 512 KB FlexRAM shared between ITCM/DTCM and OCRAM
    * Dedicated 512 KB OCRAM

    So I think RAM1 (FlexRAM) can be partitioned into 3 parts - ITCM (128K), DTCM (128K), and OCRAM (256K) - with the heap then assigned to the OCRAM partition. Now I have to work out how that can be done in the linker file.
    Last edited by yokonav; 06-02-2020 at 06:41 AM.
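
    For reference, a three-way split like that would be expressed as a FlexRAM bank configuration word: 16 banks of 32 KB, two bits per bank. The sketch below only builds that word. It assumes the encoding 01 = OCRAM, 10 = DTCM, 11 = ITCM; the DTCM and ITCM codes match the 0xAAAAAAAA pattern in the Teensy linker script, while the OCRAM code and the bank ordering are assumptions to verify in the RT1060 reference manual. The function name is hypothetical, and the startup code and linker script would need matching changes that are not shown.

```cpp
#include <cstdint>

// Assumed 2-bit codes per 32 KB FlexRAM bank.
constexpr uint32_t kBankOcram = 1u;
constexpr uint32_t kBankDtcm  = 2u;
constexpr uint32_t kBankItcm  = 3u;

// Build a bank configuration word with ITCM in the lowest banks, then DTCM,
// then OCRAM. The three counts must add up to at most 16.
constexpr uint32_t bankConfig(unsigned itcm, unsigned dtcm, unsigned ocram) {
  uint32_t cfg = 0;
  unsigned bank = 0;
  for (unsigned i = 0; i < itcm;  ++i, ++bank) cfg |= kBankItcm  << (2u * bank);
  for (unsigned i = 0; i < dtcm;  ++i, ++bank) cfg |= kBankDtcm  << (2u * bank);
  for (unsigned i = 0; i < ocram; ++i, ++bank) cfg |= kBankOcram << (2u * bank);
  return cfg;
}

// Two ITCM banks and fourteen DTCM banks reproduce the linker script's
// 0xAAAAAAAA | ((1 << (2 * 2)) - 1) value.
static_assert(bankConfig(2, 14, 0) == 0xAAAAAAAFu, "stock-style layout");
// The 128K / 128K / 256K split proposed in the post: 4 + 4 + 8 banks.
static_assert(bankConfig(4, 4, 8) == 0x5555AAFFu, "ITCM/DTCM/OCRAM split");
```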

  17. #17
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    12,657
    It seems the linker picks the addresses, so hopefully reading the manual is not needed.

    It seems to need a rewrite/reorder of this:
    Code:
    	_heap_start = ADDR(.bss.dma) + SIZEOF(.bss.dma);
    	_heap_end = ORIGIN(RAM) + LENGTH(RAM);
    
    	_itcm_block_count = (SIZEOF(.text.itcm) + 0x7FFF) >> 15;
    	_flexram_bank_config = 0xAAAAAAAA | ((1 << (_itcm_block_count * 2)) - 1);
    	_estack = ORIGIN(DTCM) + ((16 - _itcm_block_count) << 15);
    Perhaps as follows, with reference to the prior post. This allows 32KB for the stack; that 15 could be smaller if a smaller heap is good enough, to protect more of the stack as needed.
    Does this work for _heap_start and _heap_end?
    Code:
    	_itcm_block_count = (SIZEOF(.text.itcm) + 0x7FFF) >> 15;
    	_flexram_bank_config = 0xAAAAAAAA | ((1 << (_itcm_block_count * 2)) - 1);
    	_estack = ORIGIN(DTCM) + ((16 - _itcm_block_count) << 15);
    	_heap_start = ORIGIN(DTCM) + ( _itcm_block_count << 15);
    	_heap_end = ORIGIN(DTCM) + ((15 - _itcm_block_count) << 15);
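
    The linker expressions above can be evaluated by hand for a given code size to see where the stack and heap would land. A minimal sketch that mirrors them in C++: ORIGIN(DTCM) = 0x20000000 is quoted earlier in the thread, the 40000-byte .text.itcm size and the static data sizes are made-up inputs, and the names Layout, layoutFor and staticDataClearsHeap are hypothetical. The last helper assumes .data and .bss are placed starting at ORIGIN(DTCM), which should be confirmed in the linker script.

```cpp
#include <cstdint>

constexpr uint32_t kDtcmOrigin = 0x20000000u;  // ORIGIN(DTCM)

struct Layout {
  uint32_t itcmBlocks;  // _itcm_block_count (32 KB blocks)
  uint32_t flexramCfg;  // _flexram_bank_config
  uint32_t estack;      // _estack, stack grows down from here
  uint32_t heapStart;   // proposed _heap_start
  uint32_t heapEnd;     // proposed _heap_end
};

// Same arithmetic as the linker script; valid for itcmBlocks < 16.
constexpr Layout layoutFor(uint32_t textItcmBytes) {
  const uint32_t blocks = (textItcmBytes + 0x7FFFu) >> 15;
  return Layout{
      blocks,
      0xAAAAAAAAu | ((1u << (blocks * 2u)) - 1u),
      kDtcmOrigin + ((16u - blocks) << 15),
      kDtcmOrigin + (blocks << 15),
      kDtcmOrigin + ((15u - blocks) << 15),
  };
}

// True when static data of the given size, assumed to start at ORIGIN(DTCM),
// ends at or below the proposed heap start.
constexpr bool staticDataClearsHeap(const Layout& l, uint32_t staticBytes) {
  return kDtcmOrigin + staticBytes <= l.heapStart;
}
```

    For a hypothetical 40000 bytes of ITCM code this gives 2 blocks, _estack = 0x20070000, _heap_start = 0x20010000 and _heap_end = 0x20068000, so 32 KB stays reserved below the stack top. Note that _heap_start as written depends only on the ITCM block count, not on the size of .data and .bss, so under the assumption above it may be worth checking that the static data (300 KB in the original question) really ends below it.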

  18. #18
    Junior Member
    Join Date
    Jun 2020
    Posts
    10
    Thanks @defragster! It seems to be working.

  19. #19
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    12,657
    Quote Originally Posted by yokonav View Post
    Thanks @defragster! It seems to be working.
    Awesome - seemed like it should ... which is always a scary trap when just reading the context and not any manual. If that is wholly right ... That should assure the Heap starts over static DTCM allocs, but only reserve 32KB for Stack. Though that doesn't limit the stack from growing down, it will keep heap from growing into the last 32KB ... which may be overkill ... or not depending on the sketch use of the two ... but that is a common concern where those two contend for typically shared memory space

    Just now thinking the T_3.6.ld file is possibly similar ... except for half the names.

    With a T_4.1 the same approach would seem to allow the whole heap to live in the 8MB of PSRAM ( or 16 with twin chips ) and in the same way keep all of RAM2 free ( except for DMA users ), though it would be slower. The only issue might be a race in startup.c if the heap were touched in any way before the QSPI detect and enable for that extended RAM.

  20. #20
    Senior Member
    Join Date
    Dec 2014
    Posts
    310
    I have a requirement to assign a single uint8_t array of size 550KB.
    Does this array need to be mutable, or is it static initialized data?

    If it doesn't have to be mutable, can you declare it as PROGMEM and keep it in flash?
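
    A minimal sketch of that suggestion, assuming the data is known at compile time. The table contents and the names kTable and tableSum are placeholders, and the #ifndef fallback is only there so the snippet also builds off-target; in a Teensyduino sketch PROGMEM comes from the core headers.

```cpp
#include <cstddef>
#include <cstdint>

#ifndef PROGMEM
#define PROGMEM  // fallback for building outside Teensyduino
#endif

// Placeholder read-only table. Declared const and PROGMEM, it is expected
// to stay in flash on the Teensy 4.x instead of being copied into RAM.
PROGMEM const uint8_t kTable[] = {1, 2, 3, 5, 8, 13};

// Flash is memory-mapped on this chip, so the table is read with ordinary
// array indexing.
uint32_t tableSum() {
  uint32_t sum = 0;
  for (size_t i = 0; i < sizeof(kTable); ++i) {
    sum += kTable[i];
  }
  return sum;
}
```

    This only helps if the 550 KB array never needs to be written at run time.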
