
Thread: I can't seem to use all of the Teensy 4.0's memory?

  1. #1

    I can't seem to use all of the Teensy 4.0's memory?

    I'm working on a project that requires a very large amount of RAM. I've tried allocating 640 KB in two different ways.

    I first tried to just declare a global array like so:

    Code:
    uint8_t RAM[655360];
    That won't link.

    So I attempted to see if I could malloc it:

    Code:
    uint8_t *RAM;
    
    void setup() {
        Serial.begin(9600);
        while (!Serial) { }  // wait for the USB serial connection
        RAM = (uint8_t *)malloc(655360UL);
        if (RAM == NULL) {
            Serial.println("Failed to allocate 640 KB!");
        } else {
            Serial.println("Successfully allocated 640 KB!");
        }
    }
    It failed.

    What am I missing here as far as getting access to all of the memory?

  2. #2
    Senior Member
    Join Date
    Oct 2012
    Location
    Portland OR
    Posts
    653
    "1024K RAM (512K is tightly coupled)" from https://www.pjrc.com/store/teensy40.html
    Given apparently two different blocks of RAM which are connected to the CPU in different ways, are they also accessed with different code syntax?
    Or even if not, it's still possible to imagine there might be complications allocating a single object across the 512K boundary. However, this is purely speculation on my part.

  3. #3
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,334
    Total RAM is ~1024 KB.

    But it exists in TWO parts/segments of 512 KB each. I haven't seen a clear statement of the rules that I could share, and I haven't learned enough to explain them myself.

    A rough guess: you would need to put part of the 640 KB in each segment, whichever method you use. Note also that code not marked PROGMEM is loaded into RAM to run faster.

    Here is the CODE based doc [ ...\hardware\teensy\avr\cores\teensy4\imxrt1062.ld ] for memory - it shows the segmentation - but it doesn't clarify where 640KB could be found:
    Code:
    MEMORY
    {
    	ITCM (rwx):  ORIGIN = 0x00000000, LENGTH = 512K
    	DTCM (rwx):  ORIGIN = 0x20000000, LENGTH = 512K
    	RAM (rwx):   ORIGIN = 0x20200000, LENGTH = 512K
    	FLASH (rwx): ORIGIN = 0x60000000, LENGTH = 1984K
    }

  4. #4
    Quote Originally Posted by JBeale View Post
    "1024K RAM (512K is tightly coupled)" from https://www.pjrc.com/store/teensy40.html
    Given apparently two different blocks of RAM which are connected to the CPU in different ways, are they also accessed with different code syntax?
    Or even if not, it's still possible to imagine there might be complications allocating a single object across the 512K boundary. However, this is purely speculation on my part.
    I considered that too after making this post, and tried even breaking it into ten 64 KB arrays and that wouldn't link, either. So you're probably right about there being some other syntax to use the TCM. Although the free memory report after compilation doesn't seem to differentiate the two blocks of 512 KB, saying "Maximum is 1048576 bytes."

  5. #5
    Quote Originally Posted by defragster View Post
    Total RAM is ~1024 KB.

    But it exists in TWO parts/segments of 512 KB each. I haven't seen a clear statement of the rules that I could share, and I haven't learned enough to explain them myself.

    A rough guess: you would need to put part of the 640 KB in each segment, whichever method you use. Note also that code not marked PROGMEM is loaded into RAM to run faster.

    Here is the CODE based doc [ ...\hardware\teensy\avr\cores\teensy4\imxrt1062.ld ] for memory - it shows the segmentation - but it doesn't clarify where 640KB could be found:
    Code:
    MEMORY
    {
    	ITCM (rwx):  ORIGIN = 0x00000000, LENGTH = 512K
    	DTCM (rwx):  ORIGIN = 0x20000000, LENGTH = 512K
    	RAM (rwx):   ORIGIN = 0x20200000, LENGTH = 512K
    	FLASH (rwx): ORIGIN = 0x60000000, LENGTH = 1984K
    }
    Ah yes, the linker error I got was complaining that it was out of space in the DTCM region. So, I'll just need to figure out how to put some of it in the other 512 KB.

  6. #6
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    5,296
    I am not an expert on some of this, but, there are some interesting differences between different types of memory. Some of this I put up in the thread:
    https://forum.pjrc.com/threads/57225...and-malloc-new

    I am not sure if this will help you, but if you look at simple program:
    Code:
    uint8_t lower_buffer[32];
    uint8_t upper_buffer[32] DMAMEM;
    void setup() {
      uint8_t stack_buffer[32];
      uint8_t *heap_buffer = (uint8_t *)malloc(32);  // explicit cast: malloc returns void* in C++
    
      Serial.begin(115200);
      while (!Serial && millis() < 4000) ;  // wait up to 4 s for the serial monitor
      delay(500);
      Serial.printf("Lower_Buffer: %x\n", (uint32_t)lower_buffer);
      Serial.printf("upper_buffer: %x\n", (uint32_t)upper_buffer);
      Serial.printf("stack buffer: %x\n", (uint32_t)stack_buffer);
      Serial.printf("Heap Buffer: %x\n", (uint32_t)heap_buffer);
      pinMode(13, OUTPUT);
    }
    
    void loop() {
      digitalWrite(13, !digitalRead(13));
      delay(500);
    }
    I had the following output:
    Code:
    06:52:48.079 -> Lower_Buffer: 2000107c
    06:52:48.079 -> upper_buffer: 20200000
    06:52:48.079 -> stack buffer: 20077fd0
    06:52:48.079 -> Heap Buffer: 20200028
    What is interesting is that addresses below 20200000 are in the lower memory. This appears to include everything you define as a normal global variable, and the stack sits at the top of it.

    Then anything you define as DMAMEM will go into the RAM section that defragster mentioned. And above this is the HEAP, where malloc and new allocate memory from.
    Note: This upper memory has memory caching which can make some stuff interesting... (More on the other thread).
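
    A rough way to check printed addresses like the ones above against the linker script quoted earlier in the thread. This is only a sketch: Region, kMap and regionOf are made-up names, and the origins and lengths are simply copied from that imxrt1062.ld excerpt, so they are only as accurate as that file. It assumes a C++14 (or newer) compiler.

```cpp
#include <cstdint>

// Regions as listed in the MEMORY block of the quoted imxrt1062.ld.
enum class Region { ITCM, DTCM, RAM, FLASH, Unknown };

struct Range {
  uint32_t origin;
  uint32_t length;
  Region region;
};

// Hypothetical table: values copied from the linker script excerpt.
constexpr Range kMap[] = {
  {0x00000000u, 512u * 1024u, Region::ITCM},
  {0x20000000u, 512u * 1024u, Region::DTCM},
  {0x20200000u, 512u * 1024u, Region::RAM},
  {0x60000000u, 1984u * 1024u, Region::FLASH},
};

// Return the region that contains addr, or Region::Unknown.
constexpr Region regionOf(uint32_t addr) {
  for (unsigned i = 0; i < sizeof(kMap) / sizeof(kMap[0]); ++i) {
    // Unsigned subtraction wraps around when addr < origin,
    // so a single comparison covers both bounds.
    if (addr - kMap[i].origin < kMap[i].length) return kMap[i].region;
  }
  return Region::Unknown;
}

// Two of the addresses from the output above, checked at compile time.
static_assert(regionOf(0x2000107cu) == Region::DTCM, "global lands in DTCM");
static_assert(regionOf(0x20200000u) == Region::RAM, "DMAMEM lands in RAM");
```

    With this table the global and the stack buffer fall in DTCM, while the DMAMEM buffer and the malloc'd buffer fall in the upper RAM region, which matches the description above.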

    Now, as I mentioned in the other thread, in the lower memory there is probably a lot of unused memory between all of the global variables and the stack. And so far I don't know of any way to allocate this memory...

    Again sorry I know that is not a complete answer, but...

  7. #7
    KurtE, this is exactly what I needed to know, thanks! The DMAMEM keyword allowed me to split this up into two arrays, and it compiled just fine.

  8. #8
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,334
    Quote Originally Posted by Mike Chambers View Post
    KurtE, this is exactly what I needed to know, thanks! The DMAMEM keyword allowed me to split this up into two arrays, and it compiled just fine.
    Nice, can you post a short sketch showing the alloc and usage?

    @KurtE - I skimmed that informative post - but after seeing the usage of .printf() versus .print() I got distracted with what I was doing ... I was doing .print( double foo ) for the delayCycle() - and saw it working - but with limited "#.2" resolution, but at least it didn't break the build as it seemed it used to.

  9. #9
    Quote Originally Posted by defragster View Post
    Nice, can you post a short sketch showing the alloc and usage?

    @KurtE - I skimmed that informative post - but after seeing the usage of .printf() versus .print() I got distracted with what I was doing ... I was doing .print( double foo ) for the delayCycle() - and saw it working - but with limited "#.2" resolution, but at least it didn't break the build as it seemed it used to.
    I basically just copied what KurtE was doing in his post. I put 512 KB in the DMAMEM area, and the other 128 KB in the regular global memory.

    Code:
    uint8_t RAM0[524288] DMAMEM;
    uint8_t RAM1[131072];
    And then just used both arrays normally, nothing special needs to be done to access the DMAMEM array. The compiler takes care of it behind the scenes.
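
    For anyone wanting to treat the two arrays as one flat 640 KB block, something like the following might work. It is a sketch, not the code from my project: ramByte, readRam and writeRam are hypothetical helper names, and the empty DMAMEM fallback is only there so the snippet also compiles off-target (on a Teensy the core already defines DMAMEM).

```cpp
#include <cassert>
#include <cstdint>

#ifndef DMAMEM
#define DMAMEM  // host build only: the Teensy core provides the real definition
#endif

constexpr uint32_t RAM0_SIZE = 524288;  // 512 KB, placed in the DMAMEM area
constexpr uint32_t RAM1_SIZE = 131072;  // 128 KB, regular global memory
constexpr uint32_t RAM_TOTAL = RAM0_SIZE + RAM1_SIZE;  // 640 KB

uint8_t RAM0[RAM0_SIZE] DMAMEM;
uint8_t RAM1[RAM1_SIZE];

// Hypothetical helper: map a flat address (0 .. 640 KB - 1) onto the
// array that actually holds that byte.
inline uint8_t &ramByte(uint32_t addr) {
  assert(addr < RAM_TOTAL);
  return (addr < RAM0_SIZE) ? RAM0[addr] : RAM1[addr - RAM0_SIZE];
}

inline uint8_t readRam(uint32_t addr) { return ramByte(addr); }

inline void writeRam(uint32_t addr, uint8_t value) { ramByte(addr) = value; }
```

    The branch on every access costs a little, but it keeps the rest of the code unaware that the memory is split across two regions.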

    Here it is in action: https://i.imgur.com/i77jW2s.jpg

    The screen text is a bit hard to make out, but you should be able to see why I needed 640 KB.

    The Teensy 4.0 is absolutely perfect for this project. I had to use a bunch of SPI RAM when doing it on the 3.6, which slowed it way down.

    Thanks for releasing this board, Paul!

  10. #10
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,334
    Quote Originally Posted by Mike Chambers View Post
    I basically just copied what KurtE was doing in his post. I put 512 KB in the DMAMEM area, and the other 128 KB in the regular global memory.

    ...
    Here it is in action: https://i.imgur.com/i77jW2s.jpg

    The screen text is a bit hard to make out, but you should be able to see why I needed 640 KB.

    The Teensy 4.0 is absolutely perfect for this project. I had to use a bunch of SPI RAM when doing it on the 3.6, which slowed it way down.

    Thanks for releasing this board, Paul!
    Yeah, the address space is flat 32-bit, so pointers simply hold linker-assigned 32-bit addresses. The segmentation is handled at build time as directed, not with 8088-style segment registers.

    Not sure using ALL DMAMEM won't cause trouble with USB (?) or anything else that uses space there - though seems that should fail in the build ... so good that worked.

    … looked at image after posting - cool a CGA 640 KB 8088 PC - ready to run ROM BASIC

  11. #11
    Quote Originally Posted by defragster View Post
    Yeah, the address space is flat 32-bit, so pointers simply hold linker-assigned 32-bit addresses. The segmentation is handled at build time as directed, not with 8088-style segment registers.

    Not sure using ALL DMAMEM won't cause trouble with USB (?) or anything else that uses space there - though seems that should fail in the build ... so good that worked.
    That's true. Unless it wants to allocate some memory there during runtime. Maybe Paul or someone else could chime in on whether we should be leaving a bit of free space in that segment.

    … looked at image after posting - cool a CGA 640 KB 8088 PC - ready to run ROM BASIC
    Or DOS and old PC games, which is really the point of this. It has code to connect with a PS/2 keyboard, I just haven't put the connector on the breadboard yet.

  12. #12
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,458
    Quote Originally Posted by Mike Chambers View Post
    Maybe Paul or someone else could chime in on whether we should be leaving a bit of free space in that segment.
    Future USB will use some DMAMEM, but probably not more than 32K.
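
    One way to leave that headroom, sketched under assumptions: the 32K figure is only the rough upper bound mentioned above, the 512K region size is taken from the linker script quoted earlier in the thread, and all constant names here are made up for illustration. Whether the larger second array still fits in the lower memory depends on what else the sketch places there.

```cpp
#include <cstdint>

#ifndef DMAMEM
#define DMAMEM  // host build only: the Teensy core provides the real definition
#endif

constexpr uint32_t KB = 1024;
constexpr uint32_t DMAMEM_TOTAL = 512 * KB;  // RAM region in the quoted linker script
constexpr uint32_t USB_RESERVE = 32 * KB;    // assumed headroom for future USB buffers
constexpr uint32_t EMU_RAM = 640 * KB;       // total memory the project needs

// Put as much as possible in DMAMEM while keeping the reserve free,
// and move the remainder to regular global memory.
constexpr uint32_t RAM0_SIZE = DMAMEM_TOTAL - USB_RESERVE;  // 480 KB
constexpr uint32_t RAM1_SIZE = EMU_RAM - RAM0_SIZE;         // 160 KB

static_assert(RAM0_SIZE + RAM1_SIZE == EMU_RAM, "split must cover all 640 KB");
static_assert(RAM0_SIZE + USB_RESERVE <= DMAMEM_TOTAL, "reserve must stay free");

uint8_t RAM0[RAM0_SIZE] DMAMEM;
uint8_t RAM1[RAM1_SIZE];
```

    Note that the heap also lives in the same upper region as DMAMEM, so anything allocated with malloc or new competes for whatever is left there.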

    Right now I'm working on the Arduino Serial Monitor side. Really want to solve the Arduino IDE crashing before I work on making the Teensy 4.0 even faster!


    It has code to connect with a PS/2 keyboard, I just haven't put the connector on the breadboard yet.
    Remember Teensy 4.0 is not 5V tolerant, so include a level shifter or other protection so 5V from those ancient keyboards doesn't drive a pin on Teensy 4.0.

  13. #13
    Quote Originally Posted by PaulStoffregen View Post
    Future USB will use some DMAMEM, but probably not more than 32K.

    Right now I'm working on the Arduino Serial Monitor side. Really want to solve the Arduino IDE crashing before I work on making the Teensy 4.0 even faster!
    There's certainly no rush needed for making it faster. This thing is like lightning.


    Remember Teensy 4.0 is not 5V tolerant, so include a level shifter or other protection so 5V from those ancient keyboards doesn't drive a pin on Teensy 4.0.
    Now that you say that, I realize I forgot to put a level shifter for the keyboard on my T3.6 version of this. I guess I was so used to working with AVRs that it didn't even cross my mind. It still works, but that certainly couldn't have been good for it.

    What I should do is just catch up to the late 90's and make it work with USB keyboards. I'm kind of a retrocomputing nerd so I had old keyboards lying around and they were simple to interface with.

  14. #14
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,458
    Quote Originally Posted by Mike Chambers View Post
    What I should do is just catch up to the late 90's and make it work with USB keyboards.
    That's relatively easy, since USBHost_t36 works on Teensy 4.0. You just need to connect your USB keyboard to those 2 bottom side pads, and of course connect the power wires to GND and +5V.

    Most USB keyboards consume very little power and don't have large power supply decoupling caps, so you might even be able to hot-plug the keyboard without brown-out of Teensy's power. Maybe. But of course an external power supply or current limiting chip is the right way to support hot plugging.


    There's certainly no rush needed for making it faster. This thing is like lightning.
    The USB serial support is going to get a *lot* faster....

    But first I need to speed up the Arduino side, since we can already crash the Arduino IDE on every platform except 64 bit Linux (and even there, things slow to a crawl under the Java load).
