Call to arms | Teensy + SDRAM = true

The extram.c equivalent for sdram.c would be something like this:

Figures you'd get that looked at while I made and ate lunch :)
> So it seems fitting to create\Include the support for that when the SDRAM_t4 driver is in use ... (not #ifdef T_4.1)

Got current SDRAM_t4 lib and default 'with cap' at 133 MHz runs:
test ran for 41.20 seconds
Fixed Pattern Write ran for 1.67 and Read/Test 4.59 secs
Fixed Pattern Test 132.80 MB per sec
Fixed Pattern Test WRITES 248.71 MB per sec
Fixed Pattern Test & READ 90.58 MB per sec

PsuedoRnd Patt Write ran for 10.47 and Read/Test 24.47 secs
PsuedoRnd Patt Test 80.60 MB per sec
All memory tests passed :)
And edited for No Cap and 166 MHz, with a quick fix to the library (on github):
test ran for 37.94 seconds
Fixed Pattern Write ran for 1.34 and Read/Test 3.85 secs
Fixed Pattern Test 160.58 MB per sec
Fixed Pattern Test WRITES 311.51 MB per sec
Fixed Pattern Test & READ 108.17 MB per sec

PsuedoRnd Patt Write ran for 10.46 and Read/Test 22.30 secs
PsuedoRnd Patt Test 85.97 MB per sec
All memory tests passed :)
The bump from 133 to 166 MHz makes a huge difference in the fixed pattern (normal write) results, versus the pseudo-random results, where the gain is lost in the overhead of the pseudo-random value calculations!
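As a rough check on the two runs above, here is a minimal sketch that computes how each rate scaled. The helper names are made up for illustration; the numbers are copied from the test output in this post.

```c
#include <stdio.h>

/* Ratio of a new measurement to an old one (hypothetical helper). */
static double speedup(double old_value, double new_value) {
    return new_value / old_value;
}

/* Print how each rate scaled when the SDRAM clock went from 133 to 166 MHz.
   All figures are copied from the two test outputs above. */
static void print_scaling(void) {
    printf("clock        x%.3f\n", speedup(133.0, 166.0));
    printf("fixed write  x%.3f\n", speedup(248.71, 311.51));
    printf("fixed read   x%.3f\n", speedup(90.58, 108.17));
    printf("pseudo-rand  x%.3f\n", speedup(80.60, 85.97));
}
```

Fixed pattern writes scale by about 1.25, roughly the same as the clock, while the pseudo-random test only gains about 1.07, which fits the idea that the value generation dominates there.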
 
Looks like the SEMC_SDRAMCR1 config didn't get the -1 adjustment mentioned in msg #53.

Should probably edit this:

Code:
   SEMC_SDRAMCR1 =
        SEMC_SDRAMCR1_ACT2PRE(ns_to_clocks(42, freq)) | // tRAS: ACTIVE to PRECHARGE
        SEMC_SDRAMCR1_CKEOFF(ns_to_clocks(42, freq)) |  // self refresh
        SEMC_SDRAMCR1_WRC(ns_to_clocks(12, freq)) |     // tWR: WRITE recovery
        SEMC_SDRAMCR1_RFRC(ns_to_clocks(67, freq)) |    // tRFC or tXSR: REFRESH recovery
        SEMC_SDRAMCR1_ACT2RW(ns_to_clocks(18, freq)) |  // tRCD: ACTIVE to READ/WRITE
        SEMC_SDRAMCR1_PRE2ACT(ns_to_clocks(18, freq));  // tRP: PRECHARGE to ACTIVE/REFRESH

to this:

Code:
   SEMC_SDRAMCR1 =
        SEMC_SDRAMCR1_ACT2PRE(ns_to_clocks(42, freq)-1) | // tRAS: ACTIVE to PRECHARGE
        SEMC_SDRAMCR1_CKEOFF(ns_to_clocks(42, freq)-1) |  // self refresh
        SEMC_SDRAMCR1_WRC(ns_to_clocks(12, freq)-1) |     // tWR: WRITE recovery
        SEMC_SDRAMCR1_RFRC(ns_to_clocks(67, freq)-1) |    // tRFC or tXSR: REFRESH recovery
        SEMC_SDRAMCR1_ACT2RW(ns_to_clocks(18, freq)-1) |  // tRCD: ACTIVE to READ/WRITE
        SEMC_SDRAMCR1_PRE2ACT(ns_to_clocks(18, freq)-1);  // tRP: PRECHARGE to ACTIVE/REFRESH

Would be interesting to see if this has any noticeable impact on the memtest speed.
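To see what the -1 does to the numbers, here is a small sketch. ns_to_clocks_sketch() is a hypothetical stand-in that rounds up to whole clocks; the library's real ns_to_clocks() may be implemented differently. It also assumes the register field encodes the delay as N+1 cycles, which is what the -1 adjustment suggests, and the guard against zero is an addition not present in the snippet above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the library's ns_to_clocks(): round a duration
   in nanoseconds up to whole clock cycles. The real helper may differ. */
static uint32_t ns_to_clocks_sketch(uint32_t ns, uint32_t freq_hz) {
    uint64_t product = (uint64_t)ns * freq_hz;  /* ns * Hz needs 64 bits */
    return (uint32_t)((product + 999999999ull) / 1000000000ull);  /* ceiling divide */
}

/* Value to write into a timing field, ASSUMING the field means
   "cycles minus one". The zero guard avoids wrapping to 0xFFFFFFFF. */
static uint32_t timing_field(uint32_t ns, uint32_t freq_hz) {
    uint32_t clocks = ns_to_clocks_sketch(ns, freq_hz);
    return clocks > 0 ? clocks - 1 : 0;
}
```

With a nominal 166 MHz clock, tRAS of 42 ns rounds up to 7 clocks, so the field value would be 6 instead of 7; tRFC of 67 ns gives 12 clocks, so 11 instead of 12. Each access would then wait one cycle less per timing parameter.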
 
Also if you want to try 198 MHz overclock, change this

Code:
    const unsigned int clockdiv = 5;

    CCM_CBCDR = (CCM_CBCDR & ~(CCM_CBCDR_SEMC_PODF(7))) |
        CCM_CBCDR_SEMC_CLK_SEL | CCM_CBCDR_SEMC_ALT_CLK_SEL |
        CCM_CBCDR_SEMC_PODF(clockdiv-1);

to this:

Code:
    const unsigned int clockdiv = 2; // PLL2_PFD2 / 2 = 396 / 2 = 198 MHz

    CCM_CBCDR = (CCM_CBCDR & ~(CCM_CBCDR_SEMC_PODF(7) | CCM_CBCDR_SEMC_ALT_CLK_SEL)) |
        CCM_CBCDR_SEMC_CLK_SEL | CCM_CBCDR_SEMC_PODF(clockdiv-1);

If it doesn't work, maybe try soldering a 5pF or 10pF capacitor at C29...
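The divider arithmetic behind that change can be sketched as below. The 396 MHz figure is from the comment in the snippet above. The 664.62 MHz rate for the alternate clock source is an assumption (it is what I recall for PLL3_PFD1 on Teensy 4), chosen because it lines up with the 133 and 166 MHz settings; the function and macro names are made up for illustration.

```c
#include <stdint.h>

#define PLL2_PFD2_KHZ        396000u  /* from the comment in the snippet above */
#define ALT_SOURCE_KHZ_GUESS 664620u  /* ASSUMPTION: alternate source rate in kHz */

/* SEMC clock in kHz for a given source clock and divider. */
static uint32_t semc_clock_khz(uint32_t source_khz, uint32_t clockdiv) {
    return source_khz / clockdiv;
}

/* Value for the 3-bit SEMC_PODF field: the code above writes clockdiv-1. */
static uint32_t semc_podf_field(uint32_t clockdiv) {
    return (clockdiv - 1u) & 7u;
}
```

Under those assumptions, the alternate source with clockdiv 5 gives about 132.9 MHz and with clockdiv 4 about 166.2 MHz, while PLL2_PFD2 with clockdiv 2 gives 198 MHz and a PODF field value of 1.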
 
built with warnings:
T:\T_Drive\tCode\libraries\SDRAM_t4\SDRAM_t4.cpp: In static member function 'static bool SDRAM_t4::init()':
T:\T_Drive\tCode\libraries\SDRAM_t4\SDRAM_t4.cpp:282:53: warning: suggest parentheses around '-' in operand of '&' [-Wparentheses]
282 | SEMC_SDRAMCR1_ACT2PRE(ns_to_clocks(42, freq)-1) | // tRAS: ACTIVE to PRECHARGE
| ~~~~~~~~~~~~~~~~~~~~~~^~
T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\cores\teensy4/imxrt.h:8542:53: note: in definition of macro 'SEMC_SDRAMCR1_ACT2PRE'
8542 | #define SEMC_SDRAMCR1_ACT2PRE(n) ((uint32_t)(n & 0x0F)<<20)
| ^
T:\T_Drive\tCode\libraries\SDRAM_t4\SDRAM_t4.cpp:283:52: warning: suggest parentheses around '-' in operand of '&' [-Wparentheses]
283 | SEMC_SDRAMCR1_CKEOFF(ns_to_clocks(42, freq)-1) | // self refresh
| ~~~~~~~~~~~~~~~~~~~~~~^~
T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\cores\teensy4/imxrt.h:8543:53: note: in definition of macro 'SEMC_SDRAMCR1_CKEOFF'
8543 | #define SEMC_SDRAMCR1_CKEOFF(n) ((uint32_t)(n & 0x0F)<<16)
| ^
T:\T_Drive\tCode\libraries\SDRAM_t4\SDRAM_t4.cpp:284:49: warning: suggest parentheses around '-' in operand of '&' [-Wparentheses]
284 | SEMC_SDRAMCR1_WRC(ns_to_clocks(12, freq)-1) | // tWR: WRITE recovery
| ~~~~~~~~~~~~~~~~~~~~~~^~
T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\cores\teensy4/imxrt.h:8544:53: note: in definition of macro 'SEMC_SDRAMCR1_WRC'
8544 | #define SEMC_SDRAMCR1_WRC(n) ((uint32_t)(n & 0x07)<<13)
| ^
T:\T_Drive\tCode\libraries\SDRAM_t4\SDRAM_t4.cpp:285:50: warning: suggest parentheses around '-' in operand of '&' [-Wparentheses]
285 | SEMC_SDRAMCR1_RFRC(ns_to_clocks(67, freq)-1) | // tRFC or tXSR: REFRESH recovery
| ~~~~~~~~~~~~~~~~~~~~~~^~
T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\cores\teensy4/imxrt.h:8545:53: note: in definition of macro 'SEMC_SDRAMCR1_RFRC'
8545 | #define SEMC_SDRAMCR1_RFRC(n) ((uint32_t)(n & 0x1F)<<8)
| ^
T:\T_Drive\tCode\libraries\SDRAM_t4\SDRAM_t4.cpp:286:52: warning: suggest parentheses around '-' in operand of '&' [-Wparentheses]
286 | SEMC_SDRAMCR1_ACT2RW(ns_to_clocks(18, freq)-1) | // tRCD: ACTIVE to READ/WRITE
| ~~~~~~~~~~~~~~~~~~~~~~^~
T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\cores\teensy4/imxrt.h:8546:53: note: in definition of macro 'SEMC_SDRAMCR1_ACT2RW'
8546 | #define SEMC_SDRAMCR1_ACT2RW(n) ((uint32_t)(n & 0x0F)<<4)
| ^
T:\T_Drive\tCode\libraries\SDRAM_t4\SDRAM_t4.cpp:287:53: warning: suggest parentheses around '-' in operand of '&' [-Wparentheses]
287 | SEMC_SDRAMCR1_PRE2ACT(ns_to_clocks(18, freq)-1); // tRP: PRECHARGE to ACTIVE/REFRESH
| ~~~~~~~~~~~~~~~~~~~~~~^~
T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\cores\teensy4/imxrt.h:8547:53: note: in definition of macro 'SEMC_SDRAMCR1_PRE2ACT'
8547 | #define SEMC_SDRAMCR1_PRE2ACT(n) ((uint32_t)(n & 0x0F)<<0)
|
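The warnings come from the macro argument n not being parenthesized in imxrt.h. A minimal sketch of why that matters: the two functions spell out how the compiler groups the unparenthesized form, and FIELD_ACT2PRE_SAFE is a hypothetical parenthesized variant, not the actual core macro.

```c
#include <stdint.h>

/* Hypothetical variant of the quoted macro with the argument parenthesized. */
#define FIELD_ACT2PRE_SAFE(n) ((uint32_t)((n) & 0x0F) << 20)

/* Grouping of the unparenthesized form (n & 0x0F) when n is "clocks - 1":
   '-' binds tighter than '&', so the result is still what was intended. */
static uint32_t unparenthesized_minus(uint32_t clocks) {
    return (uint32_t)((clocks - 1) & 0x0F) << 20;
}

/* Grouping of the same form when n is "a | b": '&' binds tighter than '|',
   so only b is masked and a leaks through unmasked. */
static uint32_t unparenthesized_or(uint32_t a, uint32_t b) {
    return (uint32_t)(a | (b & 0x0F)) << 20;
}
```

So for the -1 edit the generated value is correct and the warning is only noise, but an argument such as a | b would silently produce a different value than the parenthesized macro.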
very minor increase:

Fixed Pattern Write ran for 1.33 and Read/Test 3.84 secs
Fixed Pattern Test 160.99 MB per sec
Fixed Pattern Test WRITES 312.55 MB per sec
Fixed Pattern Test & READ 108.41 MB per sec


from above:
Fixed Pattern Write ran for 1.34 and Read/Test 3.85 secs
Fixed Pattern Test 160.58 MB per sec
Fixed Pattern Test WRITES 311.51 MB per sec
Fixed Pattern Test & READ 108.17 MB per sec
 
The extram.c equivalent for sdram.c would be something like this:
Code:
#define IS_SDRAM(addr) (((uint32_t)ptr >> 28) == 7)
I know this is copied from core but if you're going to duplicate it, please fix this macro - the argument is addr but ptr is tested instead.
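A sketch of a corrected version, for illustration only: it tests the macro's own argument and parenthesizes it. The region value 7 is copied from the quoted macro and not verified here, and uintptr_t is swapped in for uint32_t so the sketch also builds on a 64-bit host. block_is_in_sdram() is a made-up caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Corrected form: uses 'addr' (the actual argument), parenthesized.
   The region value 7 is copied from the quoted macro. */
#define IS_SDRAM(addr) ((((uintptr_t)(addr)) >> 28) == 7)

/* The parameter is deliberately not named 'ptr': the quoted macro would not
   compile here, because it refers to 'ptr' regardless of its argument. */
static bool block_is_in_sdram(const void *block) {
    return IS_SDRAM(block);
}
```

The original only works by accident when the caller happens to have a variable called ptr in scope, which is easy to miss until the macro is reused somewhere else.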
 
The test we were discussing at the time was this one: https://github.com/Defragster/EVKB_...EVKB_xRAM_memtest/teensyEVKB_xRAM_memtest.ino
(linked from #74)
which overrides main(), which then reconfigures the MPU.
Still not following - what line #? If the number on the left edge of the GitHub web view is clicked, it puts that line in the URL. (And it's really cool how a selection does line numbers.)

That code is now retired, having served its purpose of dual execution on the EVKB and DevBoard, so it is #ifdef hell: NXP calls main() and then main2() calls setup(). setup() has the test, and #ifdef excludes the Teensy-specific stuff. Looks like anything before line #406 is NXP IDE specific. Paul's p#57 Teensy SDRAM init starts at line 656, and that code, as currently used, has been moved into the SDRAM_T4 library. The code in between is the PJRC PSRAM test w/mods.
 
main() called BOARD_ConfigMPU(), which wiped the existing MPU config and rewrote it.
That is why I was asking you if you were sure the cache was enabled, because the NXP MPU configuration has a bunch of extra ifdef stuff in it that can configure the SDRAM in different ways.
Teensy's MPU configuration was irrelevant because it was either being overwritten (ResetHandler2() branches to main() at the end after the MPU setup) or never being invoked at all.
 
Thanks for the pointer. That code for sure is 'as provided' in the EVK 1060 example and only for the EVKB board where it calls main() as it is under: #ifndef TeensySketch. Given the 'baremetal' nature of the NXP project that is probably needed as there may not be anything useful done on entry to main().

And in the NXP code I'm pretty sure there was NO cache enabled as I put a syntax error under one of the: #if defined(CACHE_MAINTAIN) && CACHE_MAINTAIN. I started there to have a code path to follow just to see the settings needed to get SDRAM working ... Then Paul just provided the critical code needed to init on PJRC's cores for Teensy. {Thanks again @Paul - you and this Teensy environment rock!}

AFAIK having the cache work takes another 'driver' include that wasn't built into that project. That IDE has a bit of a learning curve I never got through. To get GPT timer for manitou's micros tick I ended up digging into a project with it and copying the needed 'gpt' .c .h 'driver' to have the #include not fail and then the code worked.
 
It would also be interesting to see some 8-bit benchmarks rather than 32-bit, since it's 16-bit RAM... the cache would play a big part here in speeding up writes but also eLCDIF supports 8-bit (paletted) modes (8-bit pixels specifying indexes into a 32-bit palette).
 
With current full 32 MB pass the Cache is wasted. You are correct, it would be good to do a pass with 8 and 16 bit values. The NXP Example claimed to do that. Perhaps even a version without the Cache_Flush to see if it cycles any faster. Maybe a 16KB block test with cache allowed to see if read speed really goes up from 4X PSRAM.

And need to contrive a test with interrupts and UART or i2c activity in some fashion to see if doing 'other work' is tolerated as it should be.
 
@PaulStoffregen
In post #112 @jmarsh recommended changing the macro
Code:
#define IS_SDRAM(addr) (((uint32_t)ptr >> 28) == 7)
to what I am assuming would be
Code:
#define IS_SDRAM(addr) (((uint32_t)addr >> 28) == 7)

Would this have to get changed in the core for extmem as well?
 
I have kludged up the following core files:
  • pgmspace.h
  • smalloc.h
  • startup.c
  • wiring.c
  • imxrt1062_mm.ld
that includes a kludge for sdram where you can comment out
Code:
#define extSDRAM
to exclude using sdram, at least for the .h/.c files. Probably a better way to do the kludge for testing purposes.

I also added an extSDRAM.c file that is the equivalent of extmem.c.

The SDRAM_t4 library was also modified to include the init operation when the SDRAM instance is created. Could just create a SDRAM.c from the SDRAM_t4.cpp file - would make it easier.

The SDRAM_t4.zip contains only the SDRAM_t4.h file change.

Could test that by just removing the init from setup in the sample sketch.

Cheers
 

Attachments

  • sdram_core.zip (14.4 KB)
  • SDRAM_t4.zip (554 bytes)
Just on ... stopped by PM's - now I see what is up here ... fun!

Github is: https://github.com/Defragster/EVKB_1060/tree/main
Seeing SDRAM_t4 got updated.
Made placeholder folder to pull 'coresBeta' for Paul's updates
Made placeholder folder for mjs513 work for 'mallocSDRAM'

I started to add an MB transfer count to the test but didn't, so calculating manually: after 17 hours error free, the updated PSRAM/SDRAM test on the DevBoard had done 410 passes in the loop (32MB, 1 write and 5 read/test): 4,487,040 MB
Still running now with 432 passes complete.
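The manual calculation can be reproduced with the sketch below. The 57 tests per pass is an assumption taken from the "57 combined iteration" figure in the later HalfCopy post; with it, the product matches the total quoted above. The function name is made up for illustration.

```c
#include <stdint.h>

/* Total megabytes moved: passes * tests per pass * MB per sweep *
   sweeps per test (1 write + 5 read/test = 6 sweeps over 32 MB). */
static uint64_t total_mb(uint32_t passes, uint32_t tests_per_pass,
                         uint32_t mb_per_sweep, uint32_t sweeps_per_test) {
    return (uint64_t)passes * tests_per_pass * mb_per_sweep * sweeps_per_test;
}
```

total_mb(410, 57, 32, 6) gives 4,487,040 MB, and the same formula at 432 passes gives 4,727,808 MB.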
 
Pulled @PaulStoffregen cores updates and no build warnings!
> The Rd and Wr speeds at 166 MHz are stable and micro improved
> above overnight test got to 489 passes before new upload from 410 in prior post - no issue.
Did not look at @mjs513 SDRAMextmem yet.

Did steal PJRC's memory_copy() and called it memory_copy32() in sketch - it does _asm xfer!
Then using the provided "ifdef 0" C source I made a 16 and 8 bit version (with @jmarsh reminder)
Rather than make a NEW test the old PJRC pseudo-random check_lfsr_pattern() was modified:
> instead of FILL write 32MB and Read/Test all 32MB
> Fill 16MB Half with pseudo-random
>> 'HalfCopy' that 16MB to the upper 16MB { either with 32 bit registers or 16 or 8 bit pointers }
>> Read/Test Upper 16MB to be the expected 32 bit values as written to the lower 16MB

That cuts the time for the 57 combined test iterations down to 21.52 seconds, because the pseudo-random generation is only used half as much. The memory_copyXX() HalfCopy does an extra 16MB read and 16MB write, so the same amount of data is transferred, and without the overhead of the extra pseudo-random calculations it gives over double the observed net MB/sec for those tests. The 'Fixed Pattern' test was not modified from the full 32MB write and read/test, as copy errors could be hidden by the unchanging nature of the 'fixed' value.
test RndSeed HalfCopy 1427530695 Write 118848 us Read/Test 252850 us
test RndSeed HalfCopy 1100533073 Write 118846 us Read/Test 252850 us
test ran for 21.52 seconds
Fixed Pattern Write ran for 1.33 and Read/Test 3.84 secs
Fixed Pattern Test 160.99 MB per sec
Fixed Pattern Test WRITES 312.57 MB per sec
Fixed Pattern Test & READ 108.41 MB per sec

PsuedoRnd Patt Write ran for 5.23 and Read/Test 11.13 secs
PsuedoRnd Patt Test 172.18 MB per sec
All memory tests passed :)
Loop doTest() count=18
versus prior runs just ended:
test RndSeed 1427530695 Write 237691 us Read/Test 505700 us
test RndSeed 1100533073 Write 237691 us Read/Test 505700 us
test ran for 37.88 seconds
Fixed Pattern Write ran for 1.33 and Read/Test 3.84 secs
Fixed Pattern Test 160.99 MB per sec
Fixed Pattern Test WRITES 312.57 MB per sec
Fixed Pattern Test & READ 108.41 MB per sec

PsuedoRnd Patt Write ran for 10.46 and Read/Test 22.25 secs
PsuedoRnd Patt Test 86.09 MB per sec
All memory tests passed :)
Loop doTest() count=489

And for confirmation here are the 16 and 8 bit copy codes:
void memory_copy16(uint32_t *d32, const uint32_t *s32, uint32_t *de32) {
  uint16_t *dest = (uint16_t *)d32;
  uint16_t *src = (uint16_t *)s32;
  uint16_t *dest_end = (uint16_t *)de32;
  if (dest == src) return;
  do {
    *dest++ = *src++;
  } while (dest < dest_end);
}

void memory_copy8(uint32_t *d32, const uint32_t *s32, uint32_t *de32) {
  uint8_t *dest = (uint8_t *)d32;
  uint8_t *src = (uint8_t *)s32;
  uint8_t *dest_end = (uint8_t *)de32;
  if (dest == src) return;
  do {
    *dest++ = *src++;
  } while (dest < dest_end);
}
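For anyone wanting to try the HalfCopy idea off-target, here is a minimal host-side sketch of the fill, copy, verify sequence described above. It is not the actual sketch code: lfsr_next() is an illustrative generator and may not match the one in check_lfsr_pattern(), only the 8-bit copy is shown, and the cache handling needed on the real hardware is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative LFSR step; an ASSUMPTION, not necessarily the real generator. */
static uint32_t lfsr_next(uint32_t reg) {
    return (reg & 1u) ? ((reg >> 1) ^ 0x7A5BC2E3u) : (reg >> 1);
}

/* Fill the lower half with an LFSR sequence, copy it byte by byte to the
   upper half, then check the upper half against a regenerated sequence.
   'words' is the total buffer size in 32-bit words.
   Returns true when every word in the upper half matches. */
static bool half_copy_test(uint32_t *mem, size_t words, uint32_t seed) {
    size_t half = words / 2;

    uint32_t reg = seed;                      /* 1. fill lower half */
    for (size_t i = 0; i < half; i++) {
        mem[i] = reg;
        reg = lfsr_next(reg);
    }

    const uint8_t *src = (const uint8_t *)mem;  /* 2. 8-bit HalfCopy */
    uint8_t *dst = (uint8_t *)(mem + half);
    for (size_t i = 0; i < half * sizeof(uint32_t); i++) {
        dst[i] = src[i];
    }

    reg = seed;                               /* 3. verify upper half */
    for (size_t i = 0; i < half; i++) {
        if (mem[half + i] != reg) return false;
        reg = lfsr_next(reg);
    }
    return true;
}
```

The generator runs once per word for the fill and once for the verify, but only over half the memory, which is where the time saving comes from.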
 
160MB per second, that's not bad!
Do we expect even faster speeds with optimizations or is this it? How fast is the internal 1062 RAM as comparison?
 
DTCM has a dedicated 64 bit bus clocked at 600 MHz, so it can be expected to run a lot faster. Crafting code which can leverage its full speed is quite tricky.
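For a rough sense of scale, here is a sketch of the theoretical peak for each bus. It assumes one transfer per clock with no refresh, turnaround or wait states, and that 1 MB means 10^6 bytes. The 16-bit SDRAM width comes from the earlier post in this thread; the function name is made up.

```c
#include <stdint.h>

/* Theoretical peak in MB/s (1 MB = 10^6 bytes), ASSUMING one full-width
   transfer on every clock cycle and no overhead of any kind. */
static uint32_t peak_mb_per_sec(uint32_t bus_bits, uint32_t clock_mhz) {
    return (bus_bits / 8u) * clock_mhz;
}
```

That gives 4800 MB/s for a 64-bit bus at 600 MHz, against 332 MB/s for 16-bit SDRAM at 166 MHz. The measured fixed pattern write rate of about 312 MB/s is already close to that SDRAM ceiling, so the remaining headroom would mostly be on the read/test side.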
 