Currently, reads from the onboard flash on Teensy 4.x are slower than they need to be.
The flash chip supports a "continuous read" mode where successive reads can skip sending the command byte (which is 0xEB for reads). Since this byte is sent in SPI mode using one data line, it costs 8 FlexSPI clocks (60ns / 36 CPU cycles @ 600MHz) on every read. FlexSPI supports this mode by using JUMP_ON_CS in the read LUT entry, so that later reads start the sequence after the command instruction. The only tricky part is disabling continuous read mode before sending a different command to the flash (e.g. erasing/writing), but that's pretty easy to handle.
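To illustrate the idea, here is a rough sketch of what a read LUT sequence with JUMP_ON_CS could look like. It is not taken from the PR. The opcode values and the instruction encoding (opcode in bits 15:10, pad count in bits 9:8, operand in bits 7:0) follow the FlexSPI LUT layout as I understand it; the helper names, the mode byte value (0x20) and the dummy cycle count (4) are assumptions that would need checking against the flash datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* FlexSPI LUT opcodes used here (only the ones this sketch needs). */
enum {
    LUT_STOP      = 0x00,
    LUT_CMD_SDR   = 0x01,  /* send a command byte */
    LUT_RADDR_SDR = 0x02,  /* send the row address */
    LUT_MODE8_SDR = 0x07,  /* send 8 mode bits */
    LUT_READ_SDR  = 0x09,  /* read data */
    LUT_DUMMY_SDR = 0x0C,  /* dummy clocks */
    LUT_JMP_ON_CS = 0x1F   /* stop; next sequence starts at instruction <operand> */
};

/* Pad-count field values: 0 = one data line, 2 = four data lines. */
enum { PADS_1 = 0, PADS_4 = 2 };

/* ASSUMPTION: mode byte that keeps the flash in continuous read mode. */
#define CONT_READ_MODE_BYTE 0x20u
/* ASSUMPTION: dummy clocks between the mode byte and the data. */
#define CONT_READ_DUMMY_CLOCKS 4u

/* Hypothetical helper: encode one 16-bit LUT instruction. */
static uint16_t lut_instr(unsigned opcode, unsigned pads, unsigned operand)
{
    assert(opcode <= 0x3F && pads <= 3 && operand <= 0xFF);
    return (uint16_t)((opcode << 10) | (pads << 8) | operand);
}

/* Hypothetical helper: pack two instructions into one 32-bit LUT word. */
static uint32_t lut_word(uint16_t first, uint16_t second)
{
    return (uint32_t)first | ((uint32_t)second << 16);
}

/* Fill one 4-word LUT sequence for a quad I/O read with continuous read. */
static void build_continuous_read_seq(uint32_t seq[4])
{
    /* Instruction 0: 0xEB command on one line (only needed the first time).
       Instruction 1: 24-bit address on four lines. */
    seq[0] = lut_word(lut_instr(LUT_CMD_SDR, PADS_1, 0xEB),
                      lut_instr(LUT_RADDR_SDR, PADS_4, 24));
    /* Instruction 2: mode byte asking the flash to stay in continuous read.
       Instruction 3: dummy clocks. */
    seq[1] = lut_word(lut_instr(LUT_MODE8_SDR, PADS_4, CONT_READ_MODE_BYTE),
                      lut_instr(LUT_DUMMY_SDR, PADS_4, CONT_READ_DUMMY_CLOCKS));
    /* Instruction 4: read data on four lines.
       Instruction 5: JMP_ON_CS with operand 1, so the next read starts at
       instruction 1 (the address) and skips the command byte. */
    seq[2] = lut_word(lut_instr(LUT_READ_SDR, PADS_4, 4),
                      lut_instr(LUT_JMP_ON_CS, PADS_1, 1));
    seq[3] = 0; /* two STOP instructions */
}
```

The resulting four words would go into the LUT slot used for AHB reads. Before issuing any other command (erase, program, status read), the flash has to be taken out of continuous read mode and the sequence has to start from instruction 0 again; that part is not shown here.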
PR to implement it is here: https://github.com/PaulStoffregen/cores/pull/785