TravisSmith
Member
Hello all-
I have a Teensy 4.1 project that would greatly benefit from being able to read a single byte of a >1 MB array in on the order of 200 ns. Unfortunately, I can't squeeze enough space out of RAM, so I'm looking to read directly from the on-board flash (preferred, since there's plenty) or from external PSRAM or flash if needed. I also need full random access, so caching doesn't help in this case.
Currently, using the following code, I can read a (cache miss) byte in about 400 ns from on-board flash and >500 ns from PSRAM. I haven't bothered playing with the external clock frequency, since it would have to be at least twice as fast to get there.
Code:
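volatile uint8_t dummy;
//volatile uint8_t* RAM_Image = (volatile uint8_t*)0x60000000; //on-board flash
volatile uint8_t* RAM_Image = (volatile uint8_t*)0x70000000; //ext PSRAM
uint32_t StartCycCnt = ARM_DWT_CYCCNT; //CPU cycle counter before the read
dummy = RAM_Image[Cnt]; //Cnt is the byte offset into the array
StartCycCnt = ARM_DWT_CYCCNT - StartCycCnt; //elapsed CPU cycles for the read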
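For anyone wanting to check the numbers, this is roughly how a cycle count turns into the times quoted above. It is only a sketch: the helper names `elapsedCycles` and `cyclesToNs` are made up for illustration, and the 600 MHz figure assumes the Teensy 4.1 is running at its default core clock.

```cpp
#include <cstdint>

// Elapsed CPU cycles between two samples of a free-running 32-bit counter
// such as ARM_DWT_CYCCNT. Unsigned subtraction stays correct across a
// single counter wrap.
constexpr uint32_t elapsedCycles(uint32_t start, uint32_t end) {
    return static_cast<uint32_t>(end - start);
}

// Convert a cycle count to nanoseconds for a given core clock in Hz.
// Assumption: cpuHz is 600000000 for a Teensy 4.1 at its default speed.
constexpr double cyclesToNs(uint32_t cycles, uint32_t cpuHz) {
    return static_cast<double>(cycles) * 1e9 / static_cast<double>(cpuHz);
}
```

Under that assumption, 240 cycles works out to 400 ns, and the 200 ns target is a budget of about 120 cycles.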
If I change the code to provide a scope trigger:
Code:
digitalWriteFast(TriggerPin, HIGH);
dummy = RAM_Image[Cnt];
digitalWriteFast(TriggerPin, LOW);
I see the following on the trigger and clock (ext PSRAM in this case)

Zooming in, I count about 40 clocks before the read returns the byte I need. However, I believe a QSPI read should only need about 14 clocks for a single byte (2 instruction, 6 address, 4 turn-around, 2 payload), so it appears to be loading at least the full 32-bit value, or more, before returning.
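To make that clock budget explicit, here is a small sketch of the arithmetic. The phase counts are the ones I listed above and may not match the actual command sequence the controller uses; the function names and the `sckHz` parameter are hypothetical, just for illustration.

```cpp
#include <cstdint>

// Clock budget for one quad-SPI read, using the phase counts from the post:
// 2 instruction + 6 address + 4 turn-around clocks, then 2 clocks per byte.
constexpr uint32_t kOverheadClocks = 2 + 6 + 4;
constexpr uint32_t kClocksPerByte  = 2;  // 8 bits over 4 data lines

// Total clocks needed to read payloadBytes in one transaction.
constexpr uint32_t clocksForRead(uint32_t payloadBytes) {
    return kOverheadClocks + kClocksPerByte * payloadBytes;
}

// Payload bytes implied by an observed clock count (0 if below the overhead).
constexpr uint32_t payloadBytesFor(uint32_t observedClocks) {
    return observedClocks > kOverheadClocks
               ? (observedClocks - kOverheadClocks) / kClocksPerByte
               : 0;
}

// Transfer time in nanoseconds at a given serial clock frequency in Hz.
// sckHz is an input to be measured on the scope, not a known value here.
constexpr double transferNs(uint32_t clocks, uint32_t sckHz) {
    return static_cast<double>(clocks) * 1e9 / static_cast<double>(sckHz);
}
```

With these assumptions, `clocksForRead(1)` is 14 and `clocksForRead(4)` is 20, while `payloadBytesFor(40)` comes out at 14 bytes. So if my phase counts are right, the roughly 40 clocks I see would correspond to more than a 32-bit word being fetched, though my count from the scope is only approximate.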
So my question is: is there a way to instruct it to read a single byte and return quickly? Or is there some other method to fetch a single byte more quickly?
I thought about going to the manual SPI method for the external memory, but I'm not sure about the overhead, or whether that is even possible for the on-board flash. I would prefer to use the integrated capabilities, but I'll take whatever ideas you may have.
Thank you much!