Putting objects instantiated with 'new' into EXTMEM, instead of RAM2?

doctea

I have tons of free EXTMEM but am running out of RAM2 due to the huge number of objects I'm creating. (hundreds of controls in a menu system created at boot time)

Is there a way to mark some 'new' operations so that they will be allocated out of EXTMEM instead?

(Any other tricks or tips on how to reduce the memory usage of non-static objects, or at least on figuring out what's chewing up the most RAM, would also be much appreciated.)

Thanks
 
My guess is you could probably try to hack up the file cores/teensy4/new.cpp

Maybe something like:
Code:
void * operator new(size_t size)
{
    // Try the normal RAM2 heap first, then fall back to EXTMEM.
    void * new_obj = malloc(size);
    if (new_obj) return new_obj;
    return extmem_malloc(size);
}

Note: delete would also need to be updated (assuming you delete things)
Code:
void operator delete(void * ptr)
{
    if ((uint32_t) ptr >= 0x70000000) {
        extmem_free(ptr);
    } else {
        free(ptr);
    }
}

Note: I have not tried building this, let alone checked whether it works... but it might.
 
Hi KurtE, many thanks for this suggestion and swift reply! Huge help.

I've had a play around with this and managed to get it to work. I had to make a couple of tweaks to get it to compile, and also had to implement a way to reserve some RAM2, otherwise I ran into crashes.

Also had to ensure that both operator delete(void*) and operator delete(void*, size_t) had the extmem_free change, otherwise I'd get crashes (presumably some libs I use call both forms, since I don't think I use them anywhere in my own code).

Once the reserve level is hit, I'm really surprised by how much slower it is new()ing things into EXTMEM than into RAM2. Like, it takes 5+ seconds to set up in EXTMEM what only takes less than a second to do usually - to the point where I'll need to add some visual indication during startup that the app is still starting and hasn't frozen.

Are there any techniques I can use to mitigate this? I believe I've seen mention before of alternate memory allocation libraries that could be faster at this ('nanolib', maybe?). Would such a thing be likely to help here? (And if so, is there a dummy's guide to using such a thing?)

Thanks again for your help -- now I can cram even more ridiculous functionality into my project!

Replacement cores/teensy4/new.cpp code -- for arduinoteensy 1.157.220801 -- follows:-

Code:
/* Teensyduino Core Library
 * http://www.pjrc.com/teensy/
 * Copyright (c) 2018 PJRC.COM, LLC.
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * 1. The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
 * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
 * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */
#include <stdlib.h>
#include "wiring.h"
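#define RESERVE_RAM2 65536  // bytes of RAM2 heap to keep free for everything else
extern unsigned long _heap_start;
extern unsigned long _heap_end;
extern char *__brkval;  // current top of the heap in RAM2
void * operator new(size_t size)
{
    // Use RAM2 only while the untouched heap space stays above the reserve.
    size_t ram2_free = (size_t)((char *)&_heap_end - __brkval);
    if (ram2_free > RESERVE_RAM2 + size) {
        void * new_obj = malloc(size);
        if (new_obj) return new_obj;
    }
    // Otherwise (or if malloc failed) allocate from EXTMEM.
    return extmem_malloc(size);
}
void * operator new[](size_t size)
{
    // Arrays are still always allocated from RAM2 in this version.
    return malloc(size);
}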
void operator delete(void * ptr)
{
    if ((uint32_t) ptr >= 0x70000000) {
        extmem_free(ptr);
    } else {
        free(ptr);
    }
}
void operator delete[](void * ptr)
{
    if ((uint32_t) ptr >= 0x70000000) {
        extmem_free(ptr);
    } else {
        free(ptr);
    }
}
void operator delete(void * ptr, size_t size)
{
    if ((uint32_t) ptr >= 0x70000000) {
        extmem_free(ptr);
    } else {
        free(ptr);
    }
}
void operator delete[](void * ptr, size_t size)
{
    if ((uint32_t) ptr >= 0x70000000) {
        extmem_free(ptr);
    } else {
        free(ptr);
    }
}
 
If you haven't already, you can increase the PSRAM clock speed. It defaults to 88 MHz, but 133 MHz is the speed commonly recommended in this forum and is usually compatible with the "stock" PSRAM chips. I've gone higher than that on PSRAM chips rated for 133 MHz by default (180 MHz on some and 198 MHz on others, reliably).

One other issue is the performance of smalloc (the code base used for extmem_malloc, etc.) itself, which leaves a lot to be desired; it's considerably slower than other malloc routines. I initially took a TLSF implementation, modified it, and used that for EXTMEM allocations, and it was significantly faster. It did have an unresolved, infrequent bug though, so I eventually just moved my heap pointer from DMAMEM to PSRAM, allowing the standard newlib malloc to do its stuff, as that is both fast and battle-tested. The performance was acceptable with overclocked PSRAM and fine for my general needs; anything requiring faster RAM I pre-defined in DMAMEM or DTCM.

I also don't think you need to modify the new, delete, etc directly in the core files - I just redefined them in my main code base, keeping the cores as-is and saving headaches down the line with upgrades, etc.
 
There's a couple of ways to do this, without resorting to overriding the global new and delete functions.

There is placement new; this is a standard C++ feature that lets you explicitly specify the memory location to be used for the new object. So you'd allocate a suitably sized chunk of EXTMEM first, then call placement new to initialize the object. If you delete it, you'd have to manually call the destructor and free the memory separately.
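A minimal sketch of the placement new approach, untested on hardware. The Control struct and the two helper functions are made-up names for illustration, and the malloc-based stand-ins exist only so the snippet also builds on a desktop compiler; on a Teensy 4.1 the core's own extmem_malloc/extmem_free are used instead.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>          // placement new

#if defined(ARDUINO_TEENSY41)
#include <Arduino.h>    // the Teensy core declares extmem_malloc / extmem_free
#else
// Host stand-ins (assumption, not Teensy code) so the sketch builds off-target.
static void *extmem_malloc(std::size_t n) { return std::malloc(n); }
static void extmem_free(void *p) { std::free(p); }
#endif

// Hypothetical menu control, used only for illustration.
struct Control {
    int value;
    explicit Control(int v) : value(v) {}
};

// Allocate raw EXTMEM, then construct the object inside it with placement new.
Control *makeControlInExtmem(int v)
{
    void *mem = extmem_malloc(sizeof(Control));
    if (!mem) return nullptr;
    return new (mem) Control(v);
}

// There is no "placement delete" to call: run the destructor by hand,
// then release the block with the matching free function.
void destroyControlInExtmem(Control *c)
{
    if (!c) return;
    c->~Control();
    extmem_free(c);
}
```

Objects created this way must never be passed to a plain delete; they always go back through destroyControlInExtmem.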

For STL objects like vector and string, you can specify a custom allocator when creating them to make use of extmem instead of the default allocator that uses malloc/free.
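A rough sketch of what such an allocator could look like, untested on hardware. ExtmemAllocator and ExtmemIntVector are hypothetical names, and the stand-ins are only there so it compiles off-target. It calls abort() on failure instead of throwing, on the assumption that exceptions are disabled in the build.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

#if defined(ARDUINO_TEENSY41)
#include <Arduino.h>    // the Teensy core declares extmem_malloc / extmem_free
#else
// Host stand-ins (assumption, not Teensy code) so the sketch builds off-target.
static void *extmem_malloc(std::size_t n) { return std::malloc(n); }
static void extmem_free(void *p) { std::free(p); }
#endif

// Minimal C++11-style allocator (hypothetical name) that draws from EXTMEM.
template <typename T>
struct ExtmemAllocator {
    using value_type = T;

    ExtmemAllocator() = default;
    template <typename U>
    ExtmemAllocator(const ExtmemAllocator<U> &) {}

    T *allocate(std::size_t n)
    {
        void *p = extmem_malloc(n * sizeof(T));
        if (!p) std::abort();   // allocate() must not return null
        return static_cast<T *>(p);
    }
    void deallocate(T *p, std::size_t) { extmem_free(p); }
};

// The allocator is stateless, so any two instances are interchangeable.
template <typename T, typename U>
bool operator==(const ExtmemAllocator<T> &, const ExtmemAllocator<U> &) { return true; }
template <typename T, typename U>
bool operator!=(const ExtmemAllocator<T> &, const ExtmemAllocator<U> &) { return false; }

// Example: a vector whose element storage lives in EXTMEM.
using ExtmemIntVector = std::vector<int, ExtmemAllocator<int>>;
```

Only the container's element storage moves to EXTMEM; the vector object itself still lives wherever it was declared.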
 
You've come to my rescue again @beermat -- thanks for this!

Can confirm that it seems to work fine if I keep the replacement new/delete in my own workspace, very handy to avoid modifying the core packages.

Clocking the PSRAM to 132 MHz hasn't made a noticeable difference to the start-up time, but when I get the chance I'll experiment some more with this and do some benchmarks to see what's happening.

@jmarsh, thanks for this also. The "total override" approach works for now, and works without modifying any other third-party libraries too, but I can see how it will be useful to choose more carefully what goes where in future, so thanks for telling me about placement new!

Big thanks to everyone here for all this help 🙏
 
I probably should have mentioned, it's also possible to override operator new on a class basis if you wanted to pick and choose which classes utilize extmem. Of course you should also override delete in those cases so the memory for them gets released appropriately.
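Something along these lines, as an untested sketch. MenuItem is a made-up class, and the stand-ins are only there so it builds off-target. Every other type keeps using the global new/delete.

```cpp
#include <cstddef>
#include <cstdlib>

#if defined(ARDUINO_TEENSY41)
#include <Arduino.h>    // the Teensy core declares extmem_malloc / extmem_free
#else
// Host stand-ins (assumption, not Teensy code) so the sketch builds off-target.
static void *extmem_malloc(std::size_t n) { return std::malloc(n); }
static void extmem_free(void *p) { std::free(p); }
#endif

// Hypothetical class whose instances should live in EXTMEM.
struct MenuItem {
    int id;
    explicit MenuItem(int i) : id(i) {}

    // Class-specific allocation: used for "new MenuItem(...)" and for
    // derived classes that do not declare their own operator new.
    static void *operator new(std::size_t size)
    {
        void *p = extmem_malloc(size);
        if (!p) std::abort();   // a throwing-form operator new must not return null
        return p;
    }

    // Matching release, so "delete item" returns the block to EXTMEM.
    static void operator delete(void *ptr) { extmem_free(ptr); }
};
```

With this in place the calling code does not change: `MenuItem *m = new MenuItem(7);` and `delete m;` go through the class versions automatically.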
 
Taking 5+ seconds to set up in EXTMEM what normally takes less than a second in RAM2 doesn't seem right. Is there anything else going on when setting up your objects in EXTMEM?

I would do some profiling using ARM_DWT_CYCCNT and count how many clock cycles it's taking.
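A small helper for that kind of measurement could look like the sketch below. cyclesTaken is a hypothetical name, and the plain counter variable is a host stand-in so the snippet builds off-target. I believe the Teensy 4 startup code already enables the cycle counter, but treat that as an assumption to check.

```cpp
#include <cstdint>

#if defined(ARDUINO_TEENSY41)
#include <Arduino.h>    // provides ARM_DWT_CYCCNT
#else
// Host stand-in (assumption): a plain counter so the sketch builds off-target.
static volatile uint32_t ARM_DWT_CYCCNT = 0;
#endif

// Run fn() and return how many CPU cycles elapsed while it ran.
// Unsigned subtraction stays correct across a single counter wrap.
template <typename Fn>
uint32_t cyclesTaken(Fn fn)
{
    uint32_t start = ARM_DWT_CYCCNT;
    fn();
    return ARM_DWT_CYCCNT - start;
}
```

Wrapping the RAM2 and EXTMEM versions of the same setup code in cyclesTaken and printing both numbers shows where the time goes. The counter is 32 bits wide, so at 600 MHz it wraps roughly every 7 seconds; longer sections need to be measured in pieces.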
 
Oh, interesting, thanks for pointing out that operator new can be overridden per class. That'll come in very useful if/when I need to get more selective!
 
Hm, yeah, there's potentially quite a lot going on when the objects are set up -- it's instantiating tons of objects that represent UI controls (or options) and putting them into linked lists.

I've got things working well and quickly enough for now but when I get the urge I'll try and dig into it some more and do some more rigorous benchmarking to try and figure out exactly where the issue is. (Maybe worth noting though that giving even an extra 32K of RAM2 over to be used for this instantiation makes a noticeable difference.)

Thanks again for everyone's contributions here!
 
Ahh yep, linked lists are horrible when it comes to caching. You would be doing a lot of out-of-order memory reads and writes on the PSRAM, which would be killing the efficiency.
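One possible mitigation, not something anyone in this thread has tried, is to carve the nodes out of a single contiguous EXTMEM block so that neighbouring list entries sit next to each other in memory. It also means one extmem_malloc call instead of hundreds. Arena below is a hypothetical helper, and the sketch assumes the block returned by extmem_malloc is itself suitably aligned.

```cpp
#include <cstddef>
#include <cstdlib>

#if defined(ARDUINO_TEENSY41)
#include <Arduino.h>    // the Teensy core declares extmem_malloc / extmem_free
#else
// Host stand-ins (assumption, not Teensy code) so the sketch builds off-target.
static void *extmem_malloc(std::size_t n) { return std::malloc(n); }
static void extmem_free(void *p) { std::free(p); }
#endif

// Bump ("arena") allocator over one contiguous block (hypothetical helper).
class Arena {
public:
    explicit Arena(std::size_t bytes)
        : base_(static_cast<char *>(extmem_malloc(bytes))),
          size_(base_ ? bytes : 0),
          used_(0) {}
    ~Arena() { if (base_) extmem_free(base_); }
    Arena(const Arena &) = delete;
    Arena &operator=(const Arena &) = delete;

    // Hand out the next slice, aligned to a power-of-two boundary measured
    // from the start of the block. Returns nullptr when the block is full.
    void *allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t))
    {
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > size_ || bytes > size_ - start) return nullptr;
        used_ = start + bytes;
        return base_ + start;
    }

private:
    char *base_;        // start of the block
    std::size_t size_;  // total bytes available
    std::size_t used_;  // bytes handed out so far
};
```

Individual slices are never freed; the whole block is released when the Arena is destroyed, which suits objects that are created at boot and live for the whole run. Objects would be constructed in the slices with placement new.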
 