Teensy 4.1 Speed up the clock for PSRAM (and LittleFS_QSPI...)

KurtE

Senior Member+
Often, when I am playing around with a Teensy 4.1 that has PSRAM and I want to use it, I find that the default 88 MHz speed is too slow.

I am playing around right now with a T4.1 with an NT35510 display on it that is set up using a 16 data pin parallel interface, using 24-bit color. As such,
I allocated a frame buffer in PSRAM, 800*480*4 = 1536000 bytes, which needless to say won't fit in DTCM or DMAMEM ;)
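To make the sizing concrete, here is a small host-side sketch of the arithmetic. The helper names are made up for illustration, and the 8 MB figure assumes a single 8 MB PSRAM chip is fitted; the 512 KB figures are the two on-chip RAM regions of the Teensy 4.1.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a frame buffer storing each pixel in `bytes_per_pixel` bytes.
constexpr std::size_t frame_buffer_bytes(std::size_t width, std::size_t height,
                                         std::size_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Teensy 4.1 on-chip RAM regions and an assumed single 8 MB PSRAM chip.
constexpr std::size_t kRam1Bytes  = 512u * 1024u;        // RAM1, where DTCM lives
constexpr std::size_t kRam2Bytes  = 512u * 1024u;        // RAM2, used by DMAMEM
constexpr std::size_t kPsramBytes = 8u * 1024u * 1024u;  // assumption: one 8 MB chip

// True when a buffer of `needed` bytes fits in a region of `available` bytes.
bool fits(std::size_t needed, std::size_t available) {
    assert(available > 0);
    return needed <= available;
}
```

With 4 bytes per pixel the 800x480 buffer is larger than RAM1 and RAM2 put together, so EXTMEM is the only place it can go.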

When I try to do an update screen to the display with a FlexIO speed of about 30 MHz,
the display does not update properly. Note: it also does not update properly using 16-bit colors, but it does if the PSRAM is configured for 120 MHz...
Note: the 24-bit color mode also has issues at 120 MHz PSRAM speed.
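One rough way to look at these numbers is to compare peak transfer rates. This is only a back-of-the-envelope sketch: it assumes the PSRAM moves 4 bits per clock with no command, address or wait-cycle overhead, and that the parallel bus takes one 16-bit word per FlexIO clock. Whether this is really what limits the display update is not established here, and the function names are made up.

```cpp
#include <cassert>
#include <cstdint>

// Peak payload rate of a quad (4 data line) SPI bus in bytes per second,
// ignoring command, address and wait-cycle overhead.
constexpr std::uint64_t qspi_peak_bytes_per_s(std::uint64_t clock_hz) {
    return clock_hz * 4 / 8;
}

// Rate at which a parallel bus consumes data, assuming one full-width word per clock.
constexpr std::uint64_t parallel_bytes_per_s(std::uint64_t clock_hz, unsigned bus_bits) {
    return clock_hz * bus_bits / 8;
}

// True when the source can supply data at least as fast as the sink consumes it.
bool source_keeps_up(std::uint64_t source_rate, std::uint64_t sink_rate) {
    assert(sink_rate > 0);
    return source_rate >= sink_rate;
}
```

Under those assumptions an 88 MHz PSRAM clock peaks at 44 MB/s, while a 16-bit bus at 30 MHz wants 60 MB/s.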

Originally, we started off at 88 MHz and there was a TODO to speed it up; as you can see in the blame listing, that was 4 years ago:
1724431538230.png

My assumption is that if we have not changed the default in 4 years, we probably won't.

In the past (and in my current sources) I have edited this to look like:
Code:
    // turn on clock  (TODO: increase clock speed later, slow & cautious for first release)
    //CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK))
    //    | CCM_CBCMR_FLEXSPI2_PODF(5) | CCM_CBCMR_FLEXSPI2_CLK_SEL(3); // 528 / 6 = 88 MHz
    CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK))
        | CCM_CBCMR_FLEXSPI2_PODF(5) | CCM_CBCMR_FLEXSPI2_CLK_SEL(1); // 720 / 6 = 120 MHz
    CCM_CCGR7 |= CCM_CCGR7_FLEXSPI2(CCM_CCGR_ON);

This allows me to try things out, but it is a global change, and maybe not all PSRAMs are created equal.
Likewise, I am assuming that this same setting is used for QSPI flash chips as well.
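For reference, the frequency that a given CLK_SEL / PODF pair produces can be worked out on the desktop. This is a minimal sketch: the source-clock table uses the values quoted in this thread, and the function name is made up.

```cpp
#include <cassert>

// Source clocks in MHz indexed by the 2-bit FLEXSPI2_CLK_SEL field
// (values as quoted in this thread).
constexpr double kFlexspi2SourceMhz[4] = {396.0, 720.0, 664.62, 528.0};

// FLEXSPI2 clock in MHz for a CLK_SEL / PODF pair.
// PODF is zero-based, so the actual divider is PODF + 1.
double flexspi2_mhz(unsigned clk_sel, unsigned podf) {
    assert(clk_sel < 4 && podf < 8);
    return kFlexspi2SourceMhz[clk_sel] / static_cast<double>(podf + 1);
}
```

So CLK_SEL(3) with PODF(5) is 528 / 6 = 88 MHz, and CLK_SEL(1) with PODF(5) is 720 / 6 = 120 MHz.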

So, I was wondering if we should add an API to the core that allows us to change it on a per-sketch basis.

I am currently experimenting in one of my sketches with a function that looks like:

C++:
void update_psram_speed(int speed_mhz) {
    // Source clocks (MHz) selectable through FLEXSPI2_CLK_SEL values 0..3.
    static const int flexspi2_clock_speeds[] = { 396, 720, 665, 528 };

    if (speed_mhz <= 0) return; // nothing sensible to do, and avoids a divide by zero

    // See what the closest setting might be.
    // Start from the stock 528 / 6 = 88 MHz in case no candidate qualifies.
    uint8_t clk_save = 3, divider_save = 6;
    int min_delta = speed_mhz;
    for (uint8_t clk = 0; clk < 4; clk++) {
        int divider = (flexspi2_clock_speeds[clk] + (speed_mhz / 2)) / speed_mhz;
        if ((divider < 1) || (divider >= 8)) continue; // outside the accepted divider range
        int delta = abs(speed_mhz - flexspi2_clock_speeds[clk] / divider);
        if (delta < min_delta) {
            min_delta = delta;
            clk_save = clk;
            divider_save = divider;
        }
    }

    // first turn off FLEXSPI2
    CCM_CCGR7 &= ~CCM_CCGR7_FLEXSPI2(CCM_CCGR_ON);

    divider_save--; // the PODF field is 0 biased.
    Serial.printf("Update FLEXSPI2 speed: %d clk:%u div:%u Actual:%d\n", speed_mhz, clk_save, divider_save,
        flexspi2_clock_speeds[clk_save] / (divider_save + 1));

    // Set the clock settings.
    CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK))
                | CCM_CBCMR_FLEXSPI2_PODF(divider_save) | CCM_CBCMR_FLEXSPI2_CLK_SEL(clk_save);

    // Turn FlexSPI2 clock back on
    CCM_CCGR7 |= CCM_CCGR7_FLEXSPI2(CCM_CCGR_ON);
}
Probably overkill, but wondering if something like this should be added to cores...
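To see which clock and divider the search settles on without touching hardware, the selection logic can be pulled out and run on a desktop. This is only a sketch of that idea: the struct and function names are made up, the register writes are left out, and the fallback to the stock 88 MHz setting when nothing qualifies is an assumption added for the sketch.

```cpp
#include <cassert>
#include <cstdlib>

struct ClockChoice {
    int clk_sel;     // index into the source-clock table
    int divider;     // 1-based divider; the register field would hold divider - 1
    int actual_mhz;  // resulting clock, using integer division
};

// Pick the source clock and divider closest to the requested speed.
ClockChoice pick_flexspi2_clock(int speed_mhz) {
    assert(speed_mhz > 0);
    static const int source_mhz[4] = {396, 720, 665, 528};
    ClockChoice best = {3, 6, 88};  // assumed fallback: the stock 528 / 6 = 88 MHz
    int min_delta = speed_mhz;
    for (int clk = 0; clk < 4; clk++) {
        int divider = (source_mhz[clk] + speed_mhz / 2) / speed_mhz;  // rounded
        if (divider < 1 || divider >= 8) continue;  // same "divider < 8" limit as the sketch
        int actual = source_mhz[clk] / divider;
        int delta = std::abs(speed_mhz - actual);
        if (delta < min_delta) {
            min_delta = delta;
            best = {clk, divider, actual};
        }
    }
    return best;
}
```

Under this model a request for 132 resolves to 396 / 3, since the first exact match wins, while 88 and 120 resolve to 528 / 6 and 720 / 6.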

Kurt
 
As a datapoint, the APS6404L is the main PSRAM currently available on the market and the one I install on Fully Loaded versions of Teensy. This is also what Paul is shipping, per his earlier comments.

For around the last 500 units, I have been testing every Teensy 4.1 fitted with these PSRAM chips using the standard PSRAM memory test running at the higher 132 MHz speed, using the code below that someone had posted, and I have seen zero failures.

Code:
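//**************************
// Reset QSPI clock from 88 MHz to 132 MHz.
      // Note: assuming CCM_CCGR_OFF is 0, OR-ing it in does not clear the gate bits,
      // so the clock stays on here. To actually gate it off first, use
      // CCM_CCGR7 &= ~CCM_CCGR7_FLEXSPI2(CCM_CCGR_ON);
      CCM_CCGR7 |= CCM_CCGR7_FLEXSPI2(CCM_CCGR_OFF);
      CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK))
          | CCM_CBCMR_FLEXSPI2_PODF(4) | CCM_CBCMR_FLEXSPI2_CLK_SEL(2); // 664.62 / 5 = ~132.9 MHz
      CCM_CCGR7 |= CCM_CCGR7_FLEXSPI2(CCM_CCGR_ON);
//**************************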

I leave that speed setting in place for the NOR and NAND flash chip tests that are included in the same program, and there have been no failures reported there either, but I don't know if the flash is actually making use of that higher speed. I have been curious whether that is the case, as the smaller NOR chips are rated for 133 MHz but the larger NAND chips are only rated for 104 MHz.

Since the Teensy is all about performance, it seems like a waste not to make it as easy as possible to maximize the PSRAM speed when it is needed, IMHO. It would also be nice to clarify whether that speed is used and/or safe if the NAND flash chip is also installed.
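If the flash does run from the same clock (an assumption; that is exactly the open question above), the in-spec ceiling would be the lowest rating among the installed chips. A minimal sketch of that check, with made-up function names and the ratings quoted in this post:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Lowest rated clock among the chips sharing the bus.
int lowest_rated_mhz(const std::vector<int>& rated_mhz) {
    assert(!rated_mhz.empty());
    return *std::min_element(rated_mhz.begin(), rated_mhz.end());
}

// True when the chosen clock does not exceed any installed chip's rating.
bool within_ratings(int clock_mhz, const std::vector<int>& rated_mhz) {
    return clock_mhz <= lowest_rated_mhz(rated_mhz);
}
```

With only a 133 MHz NOR chip, 132 MHz is within its rating; add a 104 MHz NAND chip and it is not, even if it happens to pass a basic test.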
 
@KurtE
I can't seem to find the post where Frank B and I were testing different PSRAM clock speeds, but I did find a bunch of posts from 2020 where I kept mentioning that we were testing at 132 MHz without any issue, so I kept recommending that we bump the clock up. So what you propose is probably the way we should go with it.

I think the only thing I would add is that if it's set over 132, it should default to 132 MHz just in case.
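That cap could be as simple as the following sketch. The function and constant names are made up, and where the clamp would sit inside the real clock routine is left open.

```cpp
#include <algorithm>
#include <cassert>

constexpr int kMaxPsramMhz = 132;  // ceiling suggested in this post

// Limit a requested PSRAM clock to the suggested maximum.
int clamp_psram_mhz(int requested_mhz) {
    assert(requested_mhz > 0);
    return std::min(requested_mhz, kMaxPsramMhz);
}
```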
 
@KurtE
Sent you a PR to your variants_override branch to add a new function to clockspeed.c, set_psram_clock(int speed_mhz). I just did a rename to make it in line with set_arm_clock.
MEMTEST EXAMPLE:

C++:
extern "C" uint8_t external_psram_size;
extern "C" void set_psram_clock(int speed_mhz);

bool memory_ok = false;
uint32_t *memory_begin, *memory_end;

bool check_fixed_pattern(uint32_t pattern);
bool check_lfsr_pattern(uint32_t seed);

void setup()
{
    while (!Serial) ; // wait
    pinMode(13, OUTPUT);
    uint8_t size = external_psram_size;
    Serial.printf("EXTMEM Memory Test, %d Mbyte\n", size);
    if (size == 0) return;

    set_psram_clock(132);

    const float clocks[4] = {396.0f, 720.0f, 664.62f, 528.0f};
    const float frequency = clocks[(CCM_CBCMR >> 8) & 3] / (float)(((CCM_CBCMR >> 29) & 7) + 1);
    Serial.printf(" CCM_CBCMR=%08X (%.1f MHz)\n", CCM_CBCMR, frequency);
    memory_begin = (uint32_t *)(0x70000000);
    memory_end = (uint32_t *)(0x70000000 + size * 1048576);
    elapsedMillis msec = 0;
    if (!check_fixed_pattern(0x5A698421)) return;
    if (!check_lfsr_pattern(2976674124ul)) return;
    if (!check_lfsr_pattern(1438200953ul)) return;
    if (!check_lfsr_pattern(3413783263ul)) return;
    if (!check_lfsr_pattern(1900517911ul)) return;
    if (!check_lfsr_pattern(1227909400ul)) return;
    if (!check_lfsr_pattern(276562754ul)) return;
    if (!check_lfsr_pattern(146878114ul)) return;
    if (!check_lfsr_pattern(615545407ul)) return;
    if (!check_lfsr_pattern(110497896ul)) return;
    if (!check_lfsr_pattern(74539250ul)) return;
    if (!check_lfsr_pattern(4197336575ul)) return;
    if (!check_lfsr_pattern(2280382233ul)) return;
    if (!check_lfsr_pattern(542894183ul)) return;
    if (!check_lfsr_pattern(3978544245ul)) return;
    if (!check_lfsr_pattern(2315909796ul)) return;
    if (!check_lfsr_pattern(3736286001ul)) return;
    if (!check_lfsr_pattern(2876690683ul)) return;
    if (!check_lfsr_pattern(215559886ul)) return;
    if (!check_lfsr_pattern(539179291ul)) return;
    if (!check_lfsr_pattern(537678650ul)) return;
    if (!check_lfsr_pattern(4001405270ul)) return;
    if (!check_lfsr_pattern(2169216599ul)) return;
    if (!check_lfsr_pattern(4036891097ul)) return;
    if (!check_lfsr_pattern(1535452389ul)) return;
    if (!check_lfsr_pattern(2959727213ul)) return;
    if (!check_lfsr_pattern(4219363395ul)) return;
    if (!check_lfsr_pattern(1036929753ul)) return;
    if (!check_lfsr_pattern(2125248865ul)) return;
    if (!check_lfsr_pattern(3177905864ul)) return;
    if (!check_lfsr_pattern(2399307098ul)) return;
    if (!check_lfsr_pattern(3847634607ul)) return;
    if (!check_lfsr_pattern(27467969ul)) return;
    if (!check_lfsr_pattern(520563506ul)) return;
    if (!check_lfsr_pattern(381313790ul)) return;
    if (!check_lfsr_pattern(4174769276ul)) return;
    if (!check_lfsr_pattern(3932189449ul)) return;
    if (!check_lfsr_pattern(4079717394ul)) return;
    if (!check_lfsr_pattern(868357076ul)) return;
    if (!check_lfsr_pattern(2474062993ul)) return;
    if (!check_lfsr_pattern(1502682190ul)) return;
    if (!check_lfsr_pattern(2471230478ul)) return;
    if (!check_lfsr_pattern(85016565ul)) return;
    if (!check_lfsr_pattern(1427530695ul)) return;
    if (!check_lfsr_pattern(1100533073ul)) return;
    if (!check_fixed_pattern(0x55555555)) return;
    if (!check_fixed_pattern(0x33333333)) return;
    if (!check_fixed_pattern(0x0F0F0F0F)) return;
    if (!check_fixed_pattern(0x00FF00FF)) return;
    if (!check_fixed_pattern(0x0000FFFF)) return;
    if (!check_fixed_pattern(0xAAAAAAAA)) return;
    if (!check_fixed_pattern(0xCCCCCCCC)) return;
    if (!check_fixed_pattern(0xF0F0F0F0)) return;
    if (!check_fixed_pattern(0xFF00FF00)) return;
    if (!check_fixed_pattern(0xFFFF0000)) return;
    if (!check_fixed_pattern(0xFFFFFFFF)) return;
    if (!check_fixed_pattern(0x00000000)) return;
    Serial.printf(" test ran for %.2f seconds\n", (float)msec / 1000.0f);
    Serial.println("All memory tests passed :-)");
    memory_ok = true;
}

bool fail_message(volatile uint32_t *location, uint32_t actual, uint32_t expected)
{
    Serial.printf(" Error at %08X, read %08X but expected %08X\n",
        (uint32_t)location, actual, expected);
    return false;
}

// fill the entire RAM with a fixed pattern, then check it
bool check_fixed_pattern(uint32_t pattern)
{
    volatile uint32_t *p;
    Serial.printf("testing with fixed pattern %08X\n", pattern);
    for (p = memory_begin; p < memory_end; p++) {
        *p = pattern;
    }
    arm_dcache_flush_delete((void *)memory_begin,
        (uint32_t)memory_end - (uint32_t)memory_begin);
    for (p = memory_begin; p < memory_end; p++) {
        uint32_t actual = *p;
        if (actual != pattern) return fail_message(p, actual, pattern);
    }
    return true;
}

// fill the entire RAM with a pseudo-random sequence, then check it
bool check_lfsr_pattern(uint32_t seed)
{
    volatile uint32_t *p;
    uint32_t reg;

    Serial.printf("testing with pseudo-random sequence, seed=%u\n", seed);
    reg = seed;
    for (p = memory_begin; p < memory_end; p++) {
        *p = reg;
        for (int i=0; i < 3; i++) {
            // https://en.wikipedia.org/wiki/Xorshift
            reg ^= reg << 13;
            reg ^= reg >> 17;
            reg ^= reg << 5;
        }
    }
    arm_dcache_flush_delete((void *)memory_begin,
        (uint32_t)memory_end - (uint32_t)memory_begin);
    reg = seed;
    for (p = memory_begin; p < memory_end; p++) {
        uint32_t actual = *p;
        if (actual != reg) return fail_message(p, actual, reg);
        //Serial.printf(" reg=%08X\n", reg);
        for (int i=0; i < 3; i++) {
            reg ^= reg << 13;
            reg ^= reg >> 17;
            reg ^= reg << 5;
        }
    }
    return true;
}

void loop()
{
    digitalWrite(13, HIGH);
    delay(100);
    if (!memory_ok) digitalWrite(13, LOW); // rapid blink if any test fails
    delay(100);
}
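The pseudo-random part of the test works because the same seed always regenerates the same sequence, so nothing but the seed has to be remembered between the write pass and the read pass. Here is a desktop-only sketch of that idea, using a std::vector in place of EXTMEM; the helper names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Advance the generator by three xorshift32 rounds, as the test sketch does.
std::uint32_t next_pattern(std::uint32_t reg) {
    for (int i = 0; i < 3; i++) {
        reg ^= reg << 13;
        reg ^= reg >> 17;
        reg ^= reg << 5;
    }
    return reg;
}

// Write pass: fill the buffer with the sequence generated from `seed`.
void fill_pattern(std::vector<std::uint32_t>& mem, std::uint32_t seed) {
    std::uint32_t reg = seed;
    for (auto& word : mem) {
        word = reg;
        reg = next_pattern(reg);
    }
}

// Read pass: regenerate the sequence and return the index of the first
// mismatch, or mem.size() when every word matches.
std::size_t first_mismatch(const std::vector<std::uint32_t>& mem, std::uint32_t seed) {
    assert(seed != 0);  // xorshift stays at zero forever when seeded with zero
    std::uint32_t reg = seed;
    for (std::size_t i = 0; i < mem.size(); i++) {
        if (mem[i] != reg) return i;
        reg = next_pattern(reg);
    }
    return mem.size();
}
```

Flipping a single bit anywhere in the buffer makes the read pass report that word's index, which is what fail_message prints as an address on the real hardware.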
 
I have one that can't handle over 120MHz. It is sold from a vendor bundled with the T4.1.
 
During the beta testing we went through the full range of clock frequencies; 132 was just the max of the chip PJRC uses for PSRAM.

Using the approach that @KurtE suggested, set_psram_clock will allow you to select the max frequency for any chip that you decide to use.
 
Potentially we could also take it one step further and add it as a menu for the T4.1.
Maybe add something like this to boards.txt:
Code:
menu.psram=PSRam Speed
...
teensy41.menu.psram.120=120 MHz
teensy41.menu.psram.88=88 MHz
teensy41.menu.psram.132=132 MHz
teensy41.menu.psram.88.build.psramspeed=88
teensy41.menu.psram.120.build.psramspeed=120
teensy41.menu.psram.132.build.psramspeed=132

We would probably need to update platform.txt as well, maybe adding something like:
Code:
## Compile c files
recipe.c.o.pattern="{compiler.path}{build.toolchain}{build.command.gcc}" -c {build.flags.optimize} {build.flags.common} {build.flags.dep} {build.flags.c} {build.flags.cpu} {build.flags.defs} -DARDUINO={runtime.ide.version} -DARDUINO_{build.board} -DF_CPU={build.fcpu} -DF_PSRAM={build.psramspeed} -D{build.usbtype} -DLAYOUT_{build.keylayout} {includes} "{source_file}" -o "{object_file}"

And then change startup.c to look for this define and, if it is present, call the API with it; otherwise call it with the default value...
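A minimal sketch of that fallback, runnable off-target. The stub stands in for the real clock routine and is hypothetical, as is the function that would be called from startup; F_PSRAM is the define proposed above.

```cpp
#include <cassert>

// F_PSRAM would come from -DF_PSRAM=... on the compiler command line.
#ifndef F_PSRAM
#define F_PSRAM 88  // stock default when the menu define is absent
#endif

static int g_last_requested_mhz = 0;

// Hypothetical stand-in for the real clock routine so the sketch runs on a desktop.
void set_psram_clock_stub(int speed_mhz) {
    assert(speed_mhz > 0);
    g_last_requested_mhz = speed_mhz;
}

// What the startup code would do: pass along the menu value, or the default.
void configure_psram_clock_at_startup() {
    set_psram_clock_stub(F_PSRAM);
}
```

Defaulting inside the source like this would also cover boards that never get the menu, since the define is simply absent for them.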

I have not tried it. I am not sure if we would need to do anything special to define this for non-T4.1 boards...
And obviously if we were to do something like that, we could add other speeds as well.

We also might give it a different name, as it impacts both PSRAM and QSPI flash LittleFS drives...
 
A few people reported being unable to also use a flash chip when the clock speed is higher. That's the main reason I've been reluctant to increase the default.
 
I have the Lyontek LY68L6400 8 MB PSRAM installed in the TGX4 guitar pedal, with reverb and delay extensively using the external RAM.
I just tried the 132 MHz setting. Everything works fine so far, but the setting has to be done before configuring the AudioMemory. Otherwise I got a boot loop.
 
The Lyontek LY68L6400S is the one part I tested that did not like 132 MHz. As far as I know they are no longer available, but there are Teensys running around with them on the board. At the time only the standard 88 MHz test was being run.

The original ESP PSRAM64H seems to work fine at 132 MHz, at least for the few that I have gone back and tested.

Unfortunately all the chips have the same chip ID it looks like, so can't easily determine which mfr of PSRAM is onboard via software.

Pio and my posts crossed paths. I only had one Lyontek part to test so perhaps there is some variability with those parts.
 
I have been testing the following Flash chips at 132MHz.
W25Q128JV, W25N01GVZEIG, W25N02KVZEIR

The test is pretty basic. With the NAND chips, LittleFS automatically formats them. A directory is then created, a couple of small text files are written to the directory, one is read back and printed to the screen for verification, and then the chip is erased. It is based on test software posted here by someone when LittleFS was being developed to work with the new larger flash chips.

The test works fine for verifying the chips are installed correctly and basically functional, but not sure if it is really adequate to catch subtle timing issues at the higher speed.
 
The one I have is an ESP PSRAM64H.
So it seems like the APS6404L is the only chip that has had a lot of testing at the higher speed without any reported failures (so far). However looking at the datasheet it has this note.

Performance: Clock rate up to
- 133MHz for 32 Bytes Wrapped Burst operation at VDD=3.0V+/-10%
- 109MHz for 32 Bytes Wrapped Burst operation at VDD=3.3V+/-10%
- 84MHz for Linear Burst operation commands (that can cross page boundary)

It seems to like a lower VDD for higher speed. Teensy tends to be right at around 3.3V at the chip, at least on the couple that I just measured with no external load on the 3.3V, which is right on the border between the 133MHz and 109MHz max clock speeds. They also recommend a 1uF bypass cap and the Teensy has a 0.22uF, though that is probably not a major factor.
 
Can totally understand. However, the flip side is also true. I have had the PSRAM fail to work properly when used with a few different displays, like the NT35510, where I try to use it as a frame buffer. The current setup is 16-bit parallel to the display, 24-bit color, 800x480 pixels. If I try to do a simple update screen (read this as a writeRect from the PSRAM to the display) at 88 MHz, nothing shows up. At 120 maybe some/most does; at 122 it appears to be happy. This display is touchy with FlexIO: if I run it at 30 MHz it runs fine, at 20 it fails. ...

So maybe we should make it easier for developers to choose the speed here that works for them. And it looks like this may be on a case-by-case basis, i.e., which flash was installed on each board, or which version of PSRAM.

So far the API that was mentioned earlier appears to work fine. As an extension we could have menu items, like I mentioned.

An alternative would be, as @jmarsh mentioned in an earlier post, to use an overridable weak define for the speed.

Like change startup.c from something like:
Code:
    // turn on clock  (TODO: increase clock speed later, slow & cautious for first release)
    //CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK))
    //    | CCM_CBCMR_FLEXSPI2_PODF(5) | CCM_CBCMR_FLEXSPI2_CLK_SEL(3); // 88 MHz
    CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK))
        | CCM_CBCMR_FLEXSPI2_PODF(5) | CCM_CBCMR_FLEXSPI2_CLK_SEL(1); // 120?
    CCM_CCGR7 |= CCM_CCGR7_FLEXSPI2(CCM_CCGR_ON);

to something like:
Code:
const int FLISPI2_SPEED_MHZ __attribute__((weak)) = 88;
    set_psram_speed(FLISPI2_SPEED_MHZ);

And potentially if a sketch wants to override it, they could define it:
Code:
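extern "C" {
    // 'extern' is needed here: in C++ a namespace-scope const otherwise has
    // internal linkage and would not replace the weak definition in the core.
    extern const int FLISPI2_SPEED_MHZ = 132;
}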
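A variation on the same idea is a weak function instead of a weak constant, which sidesteps the C++ linkage question for const objects. This is only a sketch: the function names are made up, `__attribute__((weak))` is a GCC/Clang extension, and a single file can only show the default path, since the override has to live in a separate source file.

```cpp
#include <cassert>

// Default speed provided by the core (hypothetical name). Because it is weak,
// a sketch can supply its own non-weak flexspi2_speed_mhz() in another source
// file and the linker will use that one instead.
__attribute__((weak)) int flexspi2_speed_mhz() {
    return 88;
}

// What the startup code would do: ask for the speed and then apply it.
int startup_psram_mhz() {
    int mhz = flexspi2_speed_mhz();
    assert(mhz > 0);
    return mhz;  // the real code would pass this on to the clock routine
}
```

A sketch wanting 132 MHz would then just define its own `int flexspi2_speed_mhz() { return 132; }` with matching linkage.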
 
Sorry for the delay in responding. Rather out of it. Wife and I are both pretty sick with some bug. So checking forum only periodically now.

I agree with @KurtE. If the default remains at 88 MHz, there needs to be some way for the user to change the PSRAM clock depending on need and on whether the chip will support it. This can be done with the set_psram_clock method or with @KurtE's define method, without the need for the user to edit the core. This avoids the user having to remember to reapply the edit whenever the core is updated. I have been burnt by this a few times.

This will also address the number of posts I have seen where users want to increase the clock of the PSRAM. I have responded to a number of those in the past.
 