I can't seem to use all of the Teensy 4.0's memory?

Status
Not open for further replies.

Mike Chambers

Well-known member
I'm working on a project that requires a very large amount of RAM. I've tried allocating 640 KB in two different ways.

I first tried to just declare a global array like so:

Code:
uint8_t RAM[655360];

That won't link.

So I attempted to see if I could malloc it:

Code:
uint8_t *RAM;

void setup() {
    Serial.begin(9600);
    while (!Serial) { }  // wait for the USB serial monitor to connect
    RAM = (uint8_t *)malloc(655360UL);
    if (RAM == NULL) {
        Serial.println("Failed to allocate 640 KB!");
    } else {
        Serial.println("Successfully allocated 640 KB!");
    }
}

It failed.
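One way to see how far off the request is would be to probe the heap for the largest block it will hand out in one piece. The sketch below is only an illustration: `largestSingleBlock` is a hypothetical helper, not part of the Teensy core, and the step size is an arbitrary choice.

```cpp
#include <stddef.h>
#include <stdlib.h>

// Hypothetical helper (not part of the Teensy core): returns the size of the
// largest single block malloc() will currently hand out, probing downward
// from maxBytes in steps of stepBytes. Returns 0 if nothing can be allocated.
size_t largestSingleBlock(size_t maxBytes, size_t stepBytes) {
    if (stepBytes == 0) return 0;  // a zero step would never terminate
    for (size_t size = maxBytes; size >= stepBytes; size -= stepBytes) {
        void *p = malloc(size);
        if (p != NULL) {
            // Touch the block through a volatile pointer so the optimizer
            // cannot drop the malloc/free pair and pretend it succeeded.
            *static_cast<volatile unsigned char *>(p) = 0;
            free(p);               // only probing, so give the block back
            return size;
        }
    }
    return 0;
}
```

In a sketch this could be called from setup(), for example `Serial.println(largestSingleBlock(655360, 1024));`. The result depends on what else has already used the heap.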

What am I missing here as far as getting access to all of the memory?
 
"1024K RAM (512K is tightly coupled)" from https://www.pjrc.com/store/teensy40.html
Given apparently two different blocks of RAM which are connected to the CPU in different ways, are they also accessed with different code syntax?
Or even if not, it's still possible to imagine there might be complications allocating a single object across the 512K boundary. However, this is purely speculation on my part.
 
Total RAM is ~1024 KB.

But it exists in two segments of 512 KB each. I haven't seen a clear statement of the rules that I could share, and I haven't learned enough to explain them myself.

A rough guess is that you would need to split the allocation, getting part of it with each of your two methods. Also note that code not marked PROGMEM is loaded into RAM so it runs faster.

Here is the code-based documentation for the memory layout [ ...\hardware\teensy\avr\cores\teensy4\imxrt1062.ld ]. It shows the segmentation, but it doesn't make clear where 640 KB could be found:
Code:
MEMORY
{
	ITCM (rwx):  ORIGIN = 0x00000000, LENGTH = 512K
	DTCM (rwx):  ORIGIN = 0x20000000, LENGTH = 512K
	RAM (rwx):   ORIGIN = 0x20200000, LENGTH = 512K
	FLASH (rwx): ORIGIN = 0x60000000, LENGTH = 1984K
}
 
"1024K RAM (512K is tightly coupled)" from https://www.pjrc.com/store/teensy40.html
Given apparently two different blocks of RAM which are connected to the CPU in different ways, are they also accessed with different code syntax?
Or even if not, it's still possible to imagine there might be complications allocating a single object across the 512K boundary. However, this is purely speculation on my part.

I considered that too after making this post, and even tried breaking it into ten 64 KB arrays, but that wouldn't link either. So you're probably right about there being some other syntax to use the TCM. The free memory report after compilation doesn't seem to differentiate the two blocks of 512 KB, though. It just says "Maximum is 1048576 bytes."
 
Ah yes, the linker error I got was complaining that it was out of space in the DTCM region. So, I'll just need to figure out how to put some of it in the other 512 KB.
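The arithmetic behind that linker error can be written down as compile-time checks. This is just a sketch using the region lengths from the linker script quoted above. The constant names are made up for illustration.

```cpp
#include <stdint.h>

// Sizes in bytes. The 512K figure is the LENGTH of both the DTCM and the RAM
// regions in the quoted linker script. The names are illustrative only.
constexpr uint32_t kKiB         = 1024;
constexpr uint32_t kRegionBytes = 512 * kKiB;   // one region: 524288
constexpr uint32_t kWantedBytes = 640 * kKiB;   // the array: 655360

static_assert(kWantedBytes == 655360, "640 KB in bytes");
static_assert(2 * kRegionBytes == 1048576, "matches the reported maximum");
static_assert(kWantedBytes > kRegionBytes, "too big for any single region");

// Whatever does not fit in one region has to live in the other one.
constexpr uint32_t kOverflowBytes = kWantedBytes - kRegionBytes;
static_assert(kOverflowBytes == 131072, "128 KB must go elsewhere");
```

So even with a completely empty region, at least 128 KB of the 640 KB has to be placed in the other block.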
 
I am not an expert on some of this, but there are some interesting differences between the different types of memory. I put some of this up in the thread:
https://forum.pjrc.com/threads/57225-T4-DMA-and-Memory-DMAMEM-and-malloc-new

I am not sure if this will help you, but if you look at a simple program like this:
Code:
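uint8_t lower_buffer[32];          // ordinary global: lands in the lower memory
uint8_t upper_buffer[32] DMAMEM;   // DMAMEM: lands in the upper RAM region
void setup() {
  uint8_t stack_buffer[32];        // local: on the stack, at the top of the lower memory
  uint8_t *heap_buffer = (uint8_t *)malloc(32);  // heap; the cast is required in C++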

  while (!Serial && millis() < 4000) ;
  Serial.begin(115200);
  delay(500);
  Serial.printf("Lower_Buffer: %x\n", (uint32_t)lower_buffer);
  Serial.printf("upper_buffer: %x\n", (uint32_t)upper_buffer);
  Serial.printf("stack buffer: %x\n", (uint32_t)stack_buffer);
  Serial.printf("Heap Buffer: %x\n", (uint32_t)heap_buffer);
  pinMode(13, OUTPUT);
}

void loop() {
  digitalWrite(13, !digitalRead(13));
  delay(500);
}
I had the following output:
Code:
06:52:48.079 -> Lower_Buffer: 2000107c
06:52:48.079 -> upper_buffer: 20200000
06:52:48.079 -> stack buffer: 20077fd0
06:52:48.079 -> Heap Buffer: 20200028

What is interesting is that addresses below 0x20200000 are in the lower memory, which appears to include everything you define as a normal global variable, with the stack at the top of it.

Anything you define as DMAMEM goes into the RAM section that defragster mentioned. Above that is the heap, where malloc and new allocate memory from.
Note: This upper memory is cached, which can make some things interesting... (More on that in the other thread.)

Now, as I mentioned in the other thread, there is probably a lot of unused lower memory between the global variables and the stack. So far I don't know of any way to allocate it...

Again, sorry, I know that is not a complete answer, but...
 
KurtE, this is exactly what I needed to know, thanks! The DMAMEM keyword allowed me to split this up into two arrays, and it compiled just fine. :)
 
Nice, can you post a short sketch showing the alloc and usage?

@KurtE - I skimmed that informative post, but after seeing the use of .printf() versus .print() I got distracted by what I was working on. I was doing .print( double foo ) for the delayCycle() and saw it working, though with limited "#.2" resolution. At least it didn't break the build as it seemed it used to.
 
I basically just copied what KurtE was doing in his post. I put 512 KB in the DMAMEM area, and the other 128 KB in the regular global memory.

Code:
uint8_t RAM0[524288] DMAMEM;
uint8_t RAM1[131072];

And then I just used both arrays normally. Nothing special needs to be done to access the DMAMEM array. The compiler and linker take care of it behind the scenes.
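For anyone who wants the two arrays to behave like one flat 640 KB block, a thin accessor along these lines would do it. This is a sketch and not the code from my project: `ramRead` and `ramWrite` are made-up names, the 0xFF returned for out-of-range reads is an assumption, and the empty DMAMEM fallback is only there so the snippet also builds off-target.

```cpp
#include <stdint.h>

#ifndef DMAMEM
#define DMAMEM  // off-target build: no special section (assumption for host testing)
#endif

static const uint32_t kUpperBytes = 524288;  // 512 KB placed via DMAMEM
static const uint32_t kLowerBytes = 131072;  // 128 KB in regular global memory

uint8_t RAM0[kUpperBytes] DMAMEM;
uint8_t RAM1[kLowerBytes];

// Hypothetical helper: read one byte from the flat 0..655359 address range.
inline uint8_t ramRead(uint32_t addr) {
    if (addr < kUpperBytes) return RAM0[addr];
    if (addr - kUpperBytes < kLowerBytes) return RAM1[addr - kUpperBytes];
    return 0xFF;  // outside the 640 KB: assumed "nothing there" value
}

// Hypothetical helper: write one byte; writes outside the range are ignored.
inline void ramWrite(uint32_t addr, uint8_t value) {
    if (addr < kUpperBytes) {
        RAM0[addr] = value;
    } else if (addr - kUpperBytes < kLowerBytes) {
        RAM1[addr - kUpperBytes] = value;
    }
}
```

The range check costs a compare per access. Indexing RAM0 and RAM1 directly, as described above, avoids that when the caller already knows which half it wants.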

Here it is in action: https://i.imgur.com/i77jW2s.jpg

The screen text is a bit hard to make out, but you should be able to see why I needed 640 KB. ;)

The Teensy 4.0 is absolutely perfect for this project. I had to use a bunch of SPI RAM when doing it on the 3.6, which slowed it way down.

Thanks for releasing this board, Paul!
 
Yeah, the address space is flat 32-bit, so everything is just linker-assigned addresses in 32-bit pointers. The segmentation is handled in the build as directed, not with 8088-like segment registers.

I'm not sure that using ALL of DMAMEM won't cause trouble with USB (?) or anything else that uses space there, though it seems that should fail in the build... so good that it worked.

… looked at image after posting - cool a CGA 640 KB 8088 PC - ready to run ROM BASIC
 
Yeah, the address space is flat 32-bit, so everything is just linker-assigned addresses in 32-bit pointers. The segmentation is handled in the build as directed, not with 8088-like segment registers.

I'm not sure that using ALL of DMAMEM won't cause trouble with USB (?) or anything else that uses space there, though it seems that should fail in the build... so good that it worked.

That's true. Unless it wants to allocate some memory there during runtime. Maybe Paul or someone else could chime in on whether we should be leaving a bit of free space in that segment.

… looked at image after posting - cool a CGA 640 KB 8088 PC - ready to run ROM BASIC

Or DOS and old PC games, which is really the point of this. It has code to connect with a PS/2 keyboard, I just haven't put the connector on the breadboard yet.
 
Maybe Paul or someone else could chime in on whether we should be leaving a bit of free space in that segment.

Future USB will use some DMAMEM, but probably not more than 32K.
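As a rough illustration of leaving that headroom, the split could be computed from a reserve instead of being hard-coded. This is only a sketch: `SplitPlan` and `planSplit` are hypothetical names, the 32K reserve is the estimate above, and whether the larger remainder still fits in the lower memory depends on the rest of the sketch.

```cpp
#include <stdint.h>

// How many bytes go in the DMAMEM region and how many in regular globals.
struct SplitPlan {
    uint32_t upperBytes;
    uint32_t lowerBytes;
};

// Hypothetical helper: fill the upper region up to (capacity - reserve),
// and put whatever is left into the lower region.
SplitPlan planSplit(uint32_t totalBytes, uint32_t upperCapacity, uint32_t reserveBytes) {
    uint32_t usable = (reserveBytes < upperCapacity) ? upperCapacity - reserveBytes : 0;
    uint32_t upper  = (totalBytes < usable) ? totalBytes : usable;
    return SplitPlan{upper, totalBytes - upper};
}
```

For 640 KB with a 32K reserve in a 512K region, this gives 491520 bytes (480 KB) in DMAMEM and 163840 bytes (160 KB) in regular global memory.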

Right now I'm working on the Arduino Serial Monitor side. Really want to solve the Arduino IDE crashing before I work on making the Teensy 4.0 even faster!


It has code to connect with a PS/2 keyboard, I just haven't put the connector on the breadboard yet.

Remember Teensy 4.0 is not 5V tolerant, so include a level shifter or other protection so 5V from those ancient keyboards doesn't drive a pin on Teensy 4.0.
 
Future USB will use some DMAMEM, but probably not more than 32K.

Right now I'm working on the Arduino Serial Monitor side. Really want to solve the Arduino IDE crashing before I work on making the Teensy 4.0 even faster!

There's certainly no rush needed for making it faster. This thing is like lightning.


Remember Teensy 4.0 is not 5V tolerant, so include a level shifter or other protection so 5V from those ancient keyboards doesn't drive a pin on Teensy 4.0.

Now that you say that, I realize I forgot to put a level shifter for the keyboard on my T3.6 version of this. I guess I was so used to working with AVRs that it didn't even cross my mind. It still works, but that certainly couldn't have been good for it.

What I should do is just catch up to the late 90's and make it work with USB keyboards. I'm kind of a retrocomputing nerd, so I had old keyboards lying around, and they were simple to interface with.
 
What I should do is just catch up to the late 90's and make it work with USB keyboards.

That's relatively easy, since USBHost_t36 works on Teensy 4.0. You just need to connect your USB keyboard to those 2 bottom side pads, and of course connect the power wires to GND and +5V.

Most USB keyboards consume very little power and don't have large power supply decoupling caps, so you might even be able to hot-plug the keyboard without brown-out of Teensy's power. Maybe. But of course an external power supply or current limiting chip is the right way to support hot plugging.


There's certainly no rush needed for making it faster. This thing is like lightning.

The USB serial support is going to get a *lot* faster....

But first I need to speed up the Arduino side, since we can already crash the Arduino IDE on every platform except 64 bit Linux (and even there, things slow to a crawl under the Java load).
 