Call to arms | Teensy + SDRAM = true

Great that the SDRAM experiment is showing results :)

NO IDEA WHAT :alien: the Christmas Miracle :ninja: or a great delusion? Added the p#57 code init to PSRAM/SDRAM(EVKB) sketch:
> built with restore unedited imxrt1062_mm.ld : no association for the SDRAM region ... just sdramconfig.address = 0x80000000;
Is this with the extra MPU configuration lines added, to make the SEMC area cacheable? I can understand why direct (uncached) writes are faster than reads - the CPU simply issues a write on the bus and doesn't need to wait for a response - but things are different if caching is enabled, since a 32-bit write means a whole cacheline must be fetched first to perform a partial update.
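The cacheline point can be put in numbers. The Cortex-M7 L1 data cache uses 32-byte lines. Assuming a store that misses in a write-back/write-allocate region actually allocates a line (the M7 can also stream writes without allocating, so treat this as the worst case), the whole aligned line containing the store is read from SDRAM first. The helpers below are hypothetical and purely arithmetic:

```cpp
#include <cassert>
#include <cstdint>

// Cortex-M7 L1 data cache line size, in bytes.
const uint32_t kCacheLineBytes = 32;

// Start address of the cache line that contains `addr`.
uint32_t cache_line_base(uint32_t addr) {
    return addr & ~(kCacheLineBytes - 1);
}

// Bytes read from memory to service a store of `store_bytes` at `addr`
// that misses in a write-allocate cache: every line the store touches
// is filled before the partial update is merged in.
uint32_t line_fill_bytes(uint32_t addr, uint32_t store_bytes) {
    uint32_t first = cache_line_base(addr);
    uint32_t last = cache_line_base(addr + store_bytes - 1);
    return (last - first) + kCacheLineBytes;
}
```

So a single aligned 4-byte store costs a 32-byte line fill, and one that straddles a line boundary (for example at 0x8000001E) costs 64 bytes.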
 
@defragster - thanks for testing, looking good. Will experiment more when I get the board.

Agree, but at least we now know that SDRAM works with the current board. That will make it easier to debug whether an issue is with the code added to make it a board or not. :)

Think I am done for the night

EDIT: looks faster than on the EVKB :)

INDEED, SDRAM (all 32 MEGABYTES) Seems properly and usably WIRED! { Nice PCB Copy Paste @Dogbone06 }

I wondered how the EVKB speed compared; as noted, it defaults to DEBUG -NO_OPT builds, which doesn't help it.

I know I'm done - as noted, sources are a mess here, or at least I am after cobbling that together and keeping EVKB vs. T_4.1 vs. T_MM_DIY straight.
> Hopefully GitHub has some usable record; the DIY.LD has the wrong address, and using that with the new BOARD was a total build failure that lost the MICROMOD setup.

Concept and DevBoard ver 4.0 seem perfect @Dogbone06 - just need to tackle that CAP
CODE SUPPORT: Awesome work @PaulStoffregen and @mjs513 - while I revised the full validation test sketch and gathered Ref info and got the ball rolling here :)

Test sketch is typical @defragster :sick: - #ifdefs and 3+ sketches combined in one that also runs under the MCUXpresso IDE for EVKB - so good luck reading it ... the build can ... and it works :)
 
> Is this with the extra MPU configuration lines added, to make the SEMC area cacheable? I can understand why direct (uncached) writes are faster than reads - the CPU simply issues a write on the bus and doesn't need to wait for a response - but things are different if caching is enabled, since a 32-bit write means a whole cacheline must be fetched first to perform a partial update.
As far as can be known (see note: sources are a mess), the startup.c in use does have the provided MPU cache lines in place and used in the build.
Wasn't sure where to add them; it seems they went here, with a comment but no #ifdef:
Code:
FLASHMEM void configure_cache(void)
{
    //printf("MPU_TYPE = %08lX\n", SCB_MPU_TYPE);
    //printf("CCR = %08lX\n", SCB_CCR);

    // TODO: check if caches already active - skip?

    SCB_MPU_CTRL = 0; // turn off MPU

    uint32_t i = 0;
    SCB_MPU_RBAR = 0x00000000 | REGION(i++); //https://developer.arm.com/docs/146793866/10/why-does-the-cortex-m7-initiate-axim-read-accesses-to-memory-addresses-that-do-not-fall-under-a-defined-mpu-region
    SCB_MPU_RASR = SCB_MPU_RASR_TEX(0) | NOACCESS | NOEXEC | SIZE_4G;

    SCB_MPU_RBAR = 0x00000000 | REGION(i++); // ITCM
    SCB_MPU_RASR = MEM_NOCACHE | READONLY | SIZE_512K;

    // TODO: trap regions should be created last, because the hardware gives
    //  priority to the higher number ones.
    SCB_MPU_RBAR = 0x00000000 | REGION(i++); // trap NULL pointer deref
    SCB_MPU_RASR = DEV_NOCACHE | NOACCESS | SIZE_32B;

    SCB_MPU_RBAR = 0x00200000 | REGION(i++); // Boot ROM
    SCB_MPU_RASR = MEM_CACHE_WT | READONLY | SIZE_128K;

    SCB_MPU_RBAR = 0x20000000 | REGION(i++); // DTCM
    SCB_MPU_RASR = MEM_NOCACHE | READWRITE | NOEXEC | SIZE_512K;

    SCB_MPU_RBAR = ((uint32_t)&_ebss) | REGION(i++); // trap stack overflow
    SCB_MPU_RASR = SCB_MPU_RASR_TEX(0) | NOACCESS | NOEXEC | SIZE_32B;

    SCB_MPU_RBAR = 0x20200000 | REGION(i++); // RAM (AXI bus)
    SCB_MPU_RASR = MEM_CACHE_WBWA | READWRITE | NOEXEC | SIZE_1M;

    SCB_MPU_RBAR = 0x40000000 | REGION(i++); // Peripherals
    SCB_MPU_RASR = DEV_NOCACHE | READWRITE | NOEXEC | SIZE_64M;

    SCB_MPU_RBAR = 0x60000000 | REGION(i++); // QSPI Flash
    SCB_MPU_RASR = MEM_CACHE_WBWA | READONLY | SIZE_16M;

    SCB_MPU_RBAR = 0x70000000 | REGION(i++); // FlexSPI2
    SCB_MPU_RASR = MEM_CACHE_WBWA | READWRITE | NOEXEC | SIZE_16M;

    // TEENSY_SDRAM
    SCB_MPU_RBAR = 0x80000000 | REGION(i++); // SDRAM
    SCB_MPU_RASR = MEM_CACHE_WBWA | READWRITE | NOEXEC | SIZE_32M;

    // ... remainder of configure_cache() unchanged
}
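A quick sanity check on the new SDRAM entry: on ARMv7-M (Cortex-M7) the RASR SIZE field encodes a region of 2^(SIZE+1) bytes, and the base written to RBAR must be aligned to the region size. The hypothetical helpers below assume a 32 MB region at 0x80000000; the core's SIZE_32M macro presumably encodes the same SIZE value, but that is an assumption to verify against startup.c.

```cpp
#include <cassert>
#include <cstdint>

// ARMv7-M MPU: a region covers 2^(SIZE+1) bytes, with SIZE in 4..31
// (32 bytes .. 4 GB). Returns -1 if `bytes` is not a legal region size.
int mpu_size_field(uint64_t bytes) {
    if (bytes < 32 || bytes > (1ull << 32) || (bytes & (bytes - 1)) != 0)
        return -1;
    int log2 = 0;
    while ((1ull << log2) < bytes) ++log2;
    return log2 - 1;
}

// The region base must be a multiple of the region size.
bool mpu_base_aligned(uint32_t base, uint64_t bytes) {
    return (base % bytes) == 0;
}
```

For 32 MB (2^25 bytes) this gives SIZE = 24, and 0x80000000 is suitably aligned; a 24 MB chip, by contrast, could not be covered by a single region.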
 
The sketch version does not use the DQS pin as Paul mentioned - if you remove the cap you will have to change a couple of lines around line 111:

Code:
// turn on SEMC hardware, same settings as NXP's SDK
SEMC_MCR |= SEMC_MCR_MDIS | SEMC_MCR_CTO(0xFF) | SEMC_MCR_BTO(0x1F);
// uncomment to enable SEMC_MCR_DQSMD (EMC_39) but make sure you comment out the above line first
//SEMC_MCR |= SEMC_MCR_MDIS | SEMC_MCR_CTO(0xFF) | SEMC_MCR_BTO(0x1F) | SEMC_MCR_DQSMD;
@mjs513 - if you put those lines in place with #ifdef NOCAP in the GitHub examples (main/examples), I will risk editing the board here to remove the CAP to confirm that works.
Then the 'psram' (teensyEVKB_xRAM_memtest) test sketch can get RE-re-Hacked to sit in loop(), do a single WRITE and then ReRead some 'x' times, to confirm long-term integrity once the first single pass has fully worked.

Just added another work/feature item on github: src/smalloc_pool.extmem32MB.txt
> when SDRAM is present, it can be used like PSRAM for smalloc()
> that involves having a way to know SDRAM is present and its size, and adding #ifdef to a few functions, as is done for: #ifdef ARDUINO_TEENSY41

To make a new BOARD, the notes in ARDUINO_TEENSY_MICROMOD.txt would need resolution. Is this DevBoard V4.0 exactly how @Dogbone06 will need this board, without pin changes? Any pin changes would need to be reworked, and any new TeensyDuino release would need them re-merged if this is not made a DIY option with PJRC-supported SDRAM.
 
Good to hear the SDRAM is working.

When you run the test and measure speed, remember to note clock speed you're using for the SDRAM. If the divider is set at 5, you're running the SDRAM at 133 MHz. To test at 166 MHz, just change the divider to 4. To try 198 MHz, you'll need to also change the mux bits to select a different PLL source, as I mentioned in msg #62. But if you do the simplest possible thing with the same source and only change the divider to 3, you'll get 221 MHz.
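The divider arithmetic behind these numbers: the quoted results (5 gives 133 MHz, 4 gives 166 MHz, 3 gives 221 MHz) are consistent with a SEMC clock source of roughly 664 MHz. The sketch below assumes that ~664 MHz source; the exact PLL/PFD frequency is an assumption here, and the names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Assumed SEMC clock source in MHz, inferred from the quoted results.
const double kSemcSourceMHz = 664.0;
// Maximum clock the SDRAM chip is rated for, per the discussion.
const double kSdramRatedMHz = 166.0;

// SDRAM clock for a given divider setting.
double sdram_clock_mhz(int divider) {
    return kSemcSourceMHz / divider;
}

// True when the divider would run the chip above its rated clock
// (small tolerance so 664/4 = 166 counts as "at spec").
bool is_overclocked(int divider) {
    return sdram_clock_mhz(divider) > kSdramRatedMHz + 0.5;
}
```

Note that 198 MHz is not reachable from ~664 MHz with an integer divider (664/198 is about 3.35), which is why a different PLL source has to be selected through the mux for that speed.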

248 MB/sec is quite good for 133 MHz, since the raw speed of 16 bits on every clock (no CAS latency, precharge, refresh, other overhead) would be 266 MB/sec.
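The 266 MB/sec ceiling is bus width times clock: 16 bits is 2 bytes per clock, at 133 MHz. A small worked version of that comparison (helper names are illustrative):

```cpp
#include <cassert>
#include <cmath>

// Theoretical peak: bytes per clock times clock in MHz gives MB/sec,
// ignoring CAS latency, precharge, refresh and other overhead.
double peak_mb_per_sec(int bus_bits, double clock_mhz) {
    return (bus_bits / 8.0) * clock_mhz;
}

// Measured throughput as a fraction of the theoretical peak.
double efficiency(double measured_mb_per_sec, double peak) {
    return measured_mb_per_sec / peak;
}
```

With these numbers, 248 MB/sec works out to about 93% of the 266 MB/sec raw ceiling.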

The higher you try to run, the more important tuning that C29 capacitor might become.

The SDRAM chip is rated for 166 MHz, so attempting 198 or 221 MHz would be considered overclocking and may or may not actually work, or might work only at lower temperatures.
 
There has been some discussion about wanting to add this board to the official Core/TD. Would you entertain this suggestion when it's all said and done?

Undecided at this point.

My current thinking is most of this should go into a library or user sketch, perhaps using one of the startup hooks if it's needed before C++ constructors.

The MPU setup probably isn't practical to do that way. Configuring the region for SDRAM by default seems harmless.
 
> My current thinking is most of this should go into a library or user sketch, perhaps using one of the startup hooks if it's needed before C++ constructors.
>
> The MPU setup probably isn't practical to do that way. Configuring the region for SDRAM by default seems harmless.
Thanks Paul
Working on the library version as we speak - but think going for coffee first!!!
 
Here is your library with an example. For some reason it can't find the .h file if I put it under src? Must have forgotten something. Any suggestions?

Anyway, library.properties may need to be updated.
 

Attachment: SDRAM_t4.zip (8.8 KB)
> Any suggestions?

In SDRAM_t4.h, I'd recommend declaring all the functions as static, and add an empty constexpr constructor. Calling class member functions from the startup hooks is technically not correct, but like our Serial, HardwareSerial, SPI, Wire classes this ought to be enough for it to work anyway. Would be nice if the SDRAM could be initialized from the middle or late startup hook, so a C++ class needing large memory could access it from its constructor.
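A minimal sketch of that pattern, using a hypothetical class name and a stand-in init() that touches no hardware: with all member functions static and a constexpr constructor, the global instance is constant-initialized, so calling into it from an early startup hook does not depend on C++ constructors having run yet.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a library class such as SDRAM_t4.
class SdramDemo {
public:
    constexpr SdramDemo() {}   // constexpr: no runtime constructor needed

    // Static, so it can be called via the class or the instance,
    // even before global constructors have run.
    static bool init() {
        // A real driver would configure pins, clocks and the SEMC here.
        initialized_ = true;
        return true;
    }
    static bool ready() { return initialized_; }

private:
    static bool initialized_;  // constant-initialized, no constructor involved
};

bool SdramDemo::initialized_ = false;

// Global instance, constant-initialized thanks to the constexpr constructor.
SdramDemo sdramDemo;

// Hypothetical early hook: in a real core this would run before main().
void demo_startup_hook() {
    SdramDemo::init();
}
```

Only the pattern is shown here; the real library's init() would of course do the SEMC configuration instead of setting a flag.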


> wondering about smalloc support?

Not sure about this. For now, plan on separately named library functions, such as sdram_malloc() rather than extmem_malloc(). In theory, anyone could build custom hardware with both the PSRAM chip and a SDRAM chip, though I have a hard time imagining why. But for now, let's plan to keep these separate and maybe discuss again in the distant future after SDRAM support is mature whether combining them makes sense.
 
@PaulStoffregen - pulling extmem_malloc over to sdram_malloc seems a doable plan. Indeed, two PSRAMs for 16 MB cost about the same as one faster 32MB SDRAM ... 'given' that option.

Just now: CAP REMOVED on @Dogbone06 DevBoard V4.0 - and, as expected, it fails unless set to #define NOCAP 1 (current lib on https://github.com/Defragster/EVKB_1060/tree/main/SDRAM_t4)

With NOCAP and clockDiv=5 (133 MHz) the test time is the same as before, 40.62 secs {for 32MB and all 57 test Pattern calls}
With NOCAP and clockDiv=4 (166 MHz) the test time is 37.29 secs! {for 32MB and all 57 test Pattern calls}

Versus T_4.1 PSRAM completing the same test over only 16MB in 72.07 seconds { SDRAM is 3.86 times faster at 166 MHz }

note: a significant portion of the test time is the calculation of the pseudo-random data stream for 45 of the 57 test values
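The 3.86x figure normalizes for the two memories covering different sizes: it compares seconds per MB for the same set of test patterns rather than raw run time. A small worked version (helper names are illustrative):

```cpp
#include <cassert>
#include <cmath>

// Seconds spent per MB for the whole test run.
double sec_per_mb(double run_seconds, double megabytes) {
    return run_seconds / megabytes;
}

// How many times faster memory B is than memory A for the same test.
double speedup(double a_seconds, double a_mb, double b_seconds, double b_mb) {
    return sec_per_mb(a_seconds, a_mb) / sec_per_mb(b_seconds, b_mb);
}
```

With 72.07 s over 16 MB (PSRAM) against 37.29 s over 32 MB (SDRAM at 166 MHz), this gives about 3.865.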

Reworking and cleaning up the PJRC PSRAM code (less hacking: external SDRAM_t4 library and no EVKB support) so that with "#if 1" it runs on SDRAM
 
PSRAM/SDRAM Test sketch updated with more cute stats: https://github.com/Defragster/EVKB_1060/tree/main/SDRAM_t4/examples/SDRAMdiy_Test

T_4.1 16 MB PSRAM:
EXTMEM Memory Test, 16 Mbyte
EXTMEM Memory begin, 70000000
EXTMEM Memory end, 71000000
...
test Pattern 00000000 Write 650325 us Read/Test 589825 us
test RndSeed 1100533073 Write 654325 us Read/Test 617310 us
test ran for 72.08 seconds
Fixed Pattern Write ran for 8.46 and Read/Test 7.67 secs
Fixed Pattern Test 25.80 MB per sec
PsuedoRnd Patt Write ran for 28.79 and Read/Test 27.16 secs
PsuedoRnd Patt Test 25.16 MB per sec
All memory tests passed :)

32MB SDRAM on 'T_MM' DevBoard (CAP removed, at 166 MHz):
EXTMEM Memory Test, 32 Mbyte
EXTMEM Memory begin, 80000000
EXTMEM Memory end, 82000000
... // note SDRAM writes faster than reads and RndSeed adds PsuedoRnd compute time
test Pattern 00000000 Write 102720 us Read/Test 295892 us
test RndSeed 1100533073 Write 237702 us Read/Test 506766 us
test ran for 37.94 seconds
Fixed Pattern Write ran for 1.34 and Read/Test 3.85 secs
Fixed Pattern Test 160.55 MB per sec **
Fixed Pattern Test WRITES 311.50 MB per sec
Fixed Pattern Test & READ 108.17 MB per sec

PsuedoRnd Patt Write ran for 10.46 and Read/Test 22.30 secs
PsuedoRnd Patt Test 85.97 MB per sec **
All memory tests passed :)

** These MB/sec figures combine both the write and the read times of the test
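Working backwards from the logs, the combined (**) figures appear to be total data moved (each pattern writes the whole range once and reads it back once) divided by total write + read time. The logged numbers are reproduced if the fixed-pattern count is taken as 13 and the pseudo-random count as 44; those counts are inferred from the arithmetic, not from the sketch source, and the helper names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Combined MB/sec: each pattern writes the full range once and reads it
// back once, so 2 * patterns * size_mb megabytes cross the bus.
double combined_mb_per_sec(int patterns, double size_mb,
                           double write_s, double read_s) {
    return 2.0 * patterns * size_mb / (write_s + read_s);
}

// One-direction throughput (write-only or read-only) from a single time.
double one_way_mb_per_sec(int patterns, double size_mb, double seconds) {
    return patterns * size_mb / seconds;
}
```

Small differences against the logs (for example 108.05 vs 108.17 MB/sec for reads) come from the times being printed rounded to two decimals.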

Here is the NXP doc on SDRAM write versus read: https://www.nxp.com/docs/en/application-note/AN12437.pdf
 
> In SDRAM_t4.h, I'd recommend declaring all the functions as static, and add an empty constexpr constructor.
Got it and done; will push it in a few minutes, but here is what is in it:

Code:
class SDRAM_t4 {
public:
    constexpr SDRAM_t4() {}
    static bool init();

private:
    static unsigned int ns_to_clocks(float ns, float freq);
    static void configure_sdram_pins();
    static bool SendIPCommand(uint32_t address, uint16_t command, uint32_t write, uint32_t *read);
    static bool IPCommandComplete();

    // set NOCAP to 1 if cap C29 is removed
    // (static constexpr so the static member functions can read it)
    static constexpr uint8_t NOCAP = 0;
};

@defragster - note the change where I put the NOCAP setting.
 
You have a buyer right here if that board ever becomes available for purchase.
They will not be sold. They are only for development. If you want to buy an SDRAM board, talk to Paul. If there’s enough interest then maybe he’ll make one. I will never create products that compete with Teensy, sorry. :giggle:
 
Folks - getting confused answering dev stuff in 2 places. Going to put my comments here :)

SDRAM_t4 has been updated with recommended changes in the library. Also deleted src directory - no idea why it does not find SDRAM_t4 in the src directory - ran into this issue before.
 
SDRAM at 166MHz with NOCAP is meeting NXP Spec Speed on DevBoard/T_MM!
AND COMPLETING 57 full pass I/O cycles with NO ERRORS!

And yes, it fails at 221 MHz OC: 10 patterns worked and two did not, and ALL Pseudo Random testing FAILED.
@defragster that's great news. Now I know what to do when I get the board.
 
> Folks - getting confused answering dev stuff in 2 places. Going to put my comments here :)
>
> SDRAM_t4 has been updated with recommended changes in the library. Also deleted src directory - no idea why it does not find SDRAM_t4 in the src directory - ran into this issue before.
Even worse chatting with @Dogbone06 on added back channels :)

NOCAP change NOTED, Thx. Since it only runs at the SDRAM spec speed of 166MHz with NOCAP, the default value is debatable?
> Maybe do the same for clockdiv, so the edit is in the 'smaller' .h and not the 'big' .cpp?: const unsigned int clockdiv = 4;
- those two are related without the cap. It came off cleanly, but that thing is TINY; putting it back on is doubtful.
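One way to keep the two related settings together in the header is to derive the divider from the NOCAP setting at compile time. This is a hypothetical sketch (the names are not the library's), based on the results reported here: 166 MHz (divider 4) only passed with C29 removed, so the cap-fitted build stays at divider 5 (133 MHz).

```cpp
#include <cassert>

// Hypothetical header-level settings; edit here, not in the .cpp.
// Set to true if capacitor C29 has been removed (DQS pin used instead).
const bool kNoCap = true;

// Divider derived from the NOCAP setting: 4 -> ~166 MHz, 5 -> ~133 MHz.
const unsigned kClockDiv = kNoCap ? 4u : 5u;

// Guard against accidentally overclocking past the chip's 166 MHz rating
// (divider 3 gave ~221 MHz and failed testing).
static_assert(kClockDiv >= 4, "clockdiv below 4 overclocks the SDRAM");
```

A single switch then cannot leave the divider at a speed the board was never validated for.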

And yes, it fails at 221 MHz OC: 10 patterns worked (the weaker same-value-repeated test), two did not, and ALL Pseudo Random testing FAILED.
The modified PJRC PSRAM test does the same single run of tests in setup(). Then, rather than doing nothing in loop() as originally uploaded, when there are no errors it repeats all the tests, and after each single WRITE pass it sits and Reads/Compares those values readRepeat = 5 times.
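A host-side sketch of that write-once / read-many loop. It uses a plain buffer and a simple xorshift32 generator in place of the sketch's own pseudo-random sequence; the function names, the generator and the buffer are assumptions for illustration (on the board the buffer would be the SDRAM range).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// xorshift32: a tiny deterministic PRNG (not the generator the sketch uses).
static uint32_t next_value(uint32_t &state) {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

// Write the pseudo-random stream once, then regenerate it and compare
// against memory `read_repeat` times. Returns the number of mismatches.
size_t write_once_read_many(volatile uint32_t *mem, size_t words,
                            uint32_t seed, int read_repeat) {
    uint32_t state = seed;
    for (size_t i = 0; i < words; ++i) mem[i] = next_value(state);

    size_t errors = 0;
    for (int pass = 0; pass < read_repeat; ++pass) {
        state = seed;  // restart the same stream for every read pass
        for (size_t i = 0; i < words; ++i) {
            uint32_t expected = next_value(state);
            if (mem[i] != expected) ++errors;
        }
    }
    return errors;
}
```

Regenerating the stream from the seed avoids keeping a second 32 MB copy of the expected data, at the cost of recomputing it on every read pass.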

@mjs513 does it seem possible to add the EXTMEM for SDRAM as Paul p#88 notes into the same SDRAM_t4 library?

Still running:
test ran for 37.94 seconds
Fixed Pattern Write ran for 1.34 and Read/Test 3.85 secs
Fixed Pattern Test 160.58 MB per sec
Fixed Pattern Test WRITES 311.52 MB per sec
Fixed Pattern Test & READ 108.17 MB per sec
PsuedoRnd Patt Write ran for 10.46 and Read/Test 22.30 secs
PsuedoRnd Patt Test 85.97 MB per sec
All memory tests passed :-)
 
> As far as could be known (see note sources a mess) the startup.c in use does have the provided MPU cache lines in place and used in building
> Wasn't sure where to add them seems they went here - with comment but no #ifdef
It looks to me like that code is effectively unused - your main() calls NXP's BOARD_ConfigMPU() which wipes the existing MPU configuration and redoes it from scratch.
 
> @mjs513 does it seem possible to add the EXTMEM for SDRAM as Paul p#88 notes into the same SDRAM_t4 library?
Just from briefly looking at the code, the simple answer is yes. Looks doable. Not sure how it would fit into the existing library until I start digging - it's been a few years since I last looked at all that stuff, to be honest.

Edit: Mentioned this on the other comm channel.

Just as an edit: if you follow the T4.1 pattern and add it to the MicroMod .ld, you could add this to the mm.ld (it would have to be modified, of course):
ERAM (rwx): ORIGIN = 0x70000000, LENGTH = 16384K

and this would be added as well
.bss.extram (NOLOAD) : {
*(SORT_BY_ALIGNMENT(SORT_BY_NAME(.externalram)))
. = ALIGN(32);
} > ERAM

EDIT2: took a peek at extmem.c and it doesn't look bad. Just need to decide whether we are going to use the existing T41 or TMM ld or create a new one.
 
The extmem.c equivalent, sdram.c, would be something like this:
Code:
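// External SDRAM memory allocation functions.  Attempt to use external SDRAM,
// but automatically fall back to internal RAM if SDRAM can't be used.

#include <stdlib.h>
#include <stdint.h>
#include "smalloc.h"
#include "wiring.h"

#if defined(ARDUINO_TEENSY41)  // TODO: the SDRAM DevBoard is MicroMod-based, so this board test probably needs changing
// SDRAM (SEMC) address range checked here is 0x80000000 to 0x8FFFFFFF
#define HAS_SDRAM
#define IS_SDRAM(addr) (((uint32_t)(addr) >> 28) == 8)
// pool must be defined elsewhere and set up with sm_set_pool() after the SDRAM is initialized
extern struct smalloc_pool sdram_smalloc_pool;
#endif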


void *sdram_malloc(size_t size)
{
#ifdef HAS_SDRAM
    void *ptr = sm_malloc_pool(&sdram_smalloc_pool, size);
    if (ptr) return ptr;
#endif
    return malloc(size);
}

void sdram_free(void *ptr)
{
#ifdef HAS_SDRAM
    if (IS_SDRAM(ptr)) {
        sm_free_pool(&sdram_smalloc_pool, ptr);
        return;
    }
#endif
    free(ptr);
}

void *sdram_calloc(size_t nmemb, size_t size)
{
#ifdef HAS_SDRAM
    // Note: It is assumed that the pool was created with do_zero set to true
    void *ptr = sm_malloc_pool(&sdram_smalloc_pool, nmemb*size);
    if (ptr) return ptr;
#endif
    return calloc(nmemb, size);
}

void *sdram_realloc(void *ptr, size_t size)
{
#ifdef HAS_SDRAM
    if (IS_SDRAM(ptr)) {
        return sm_realloc_pool(&sdram_smalloc_pool, ptr, size);
    }
#endif
    return realloc(ptr, size);
}
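A host-side check of the address test that sdram_free() and sdram_realloc() rely on: SDRAM sits at 0x80000000, so the top nibble of its addresses is 8 (PSRAM's is 7). The exact-range variant below, assuming the 32 MB chip on the DevBoard, is a hypothetical alternative that also rejects addresses past the end of the chip.

```cpp
#include <cassert>
#include <cstdint>

// Same test as IS_SDRAM(): top nibble 8 means 0x80000000..0x8FFFFFFF.
bool is_sdram_nibble(uint32_t addr) {
    return (addr >> 28) == 8;
}

// Stricter alternative: only the 32 MB actually populated on the board.
const uint32_t kSdramBase = 0x80000000u;
const uint32_t kSdramSize = 32u * 1024u * 1024u;

bool is_sdram_exact(uint32_t addr) {
    // Unsigned wrap-around makes addresses below the base fail the test too.
    return addr - kSdramBase < kSdramSize;
}
```

The nibble test is cheaper and matches the extmem.c style; the exact-range test only matters if something else could ever be mapped in the rest of that 256 MB window.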
 
> It looks to me like that code is effectively unused - your main() calls NXP's BOARD_ConfigMPU() which wipes the existing MPU configuration and redoes it from scratch.

That code is from the PJRC core ... not seeing it repeated in the current driver? The NXP code is not in use.
 