I have a low-latency application that requires much more memory than is available on the Teensy 4.0, so I want to use a parallel memory. I need to know the turnaround time, measured in CPU cycles, for changing pin modes a whole port at a time.

The Teensy 4.0 is short on pins. The address space of the memory needs to be large enough that the address lines alone would use up all the pins. I think I can get around this by using the pins as a two-way bus: first to supply the address, then to read back the word.
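As a rough illustration of the two-way bus idea, here is a host-side model of one read. It is only a sketch under assumptions: `SharedBus`, `ExternalMemory`, and the external address latch are hypothetical stand-ins, not Teensy or i.MX RT registers, and the memory contents are faked with a formula. Its only purpose is to show that every read costs two direction switches, which is the turnaround I am asking about.

```cpp
#include <cstdint>

// Host-side model only: none of these names are real Teensy registers.
enum class Direction { Output, Input };

// Hypothetical external parts: an address latch in front of a 1M x 16 memory.
struct ExternalMemory {
    std::uint32_t latched_address = 0;

    // Fake contents: each word holds the low 16 bits of (address * 3).
    constexpr std::uint16_t data_out() const {
        return static_cast<std::uint16_t>(latched_address * 3u);
    }
};

struct SharedBus {
    Direction direction = Direction::Input;
    int turnarounds = 0;  // number of direction switches so far

    constexpr void set_direction(Direction d) {
        if (d != direction) {
            direction = d;
            ++turnarounds;
        }
    }
};

// One read: drive the address, latch it, turn the bus around, sample the data.
constexpr std::uint16_t read_word(SharedBus& bus, ExternalMemory& mem,
                                  std::uint32_t address20) {
    bus.set_direction(Direction::Output);        // MCU drives the shared pins
    mem.latched_address = address20 & 0xFFFFFu;  // latch captures 20 address bits
    bus.set_direction(Direction::Input);         // release pins before memory drives
    return mem.data_out();                       // memory drives, MCU samples
}

// Convenience wrapper: a single read on a fresh bus.
constexpr std::uint16_t demo_read(std::uint32_t address20) {
    SharedBus bus;
    ExternalMemory mem;
    return read_word(bus, mem, address20);
}

// Direction switches needed for n back-to-back reads.
constexpr int turnarounds_for_reads(int n) {
    SharedBus bus;
    ExternalMemory mem;
    for (int i = 0; i < n; ++i) {
        read_word(bus, mem, static_cast<std::uint32_t>(i));
    }
    return bus.turnarounds;
}

static_assert(turnarounds_for_reads(10) == 20, "two switches per read");
```

On real hardware each `set_direction` call would be a write to the port's direction register, and the cost of that write plus any settling time is the number I need.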

I need to retrieve a set of audio clips from a memory and overlay them to produce an output. I'm targeting 16 unique audio clips, each up to 65535 samples long. This means a 20-bit address space: 4 bits to select the clip plus 16 bits for the offset within it.
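To make the 20-bit figure concrete, here is a minimal sketch of one possible address layout. The layout itself (clip index in the top 4 bits, offset in the low 16) and the name `sample_address` are assumptions for illustration, not a requirement of any particular memory chip.

```cpp
#include <cstdint>

constexpr std::uint32_t kClipCount  = 16;  // number of stored sounds, from the post
constexpr std::uint32_t kClipBits   = 4;   // log2(16)
constexpr std::uint32_t kOffsetBits = 16;  // covers offsets 0..65535

// Hypothetical layout: clip index in the top 4 bits, sample offset below it.
constexpr std::uint32_t sample_address(std::uint32_t clip, std::uint32_t offset) {
    return ((clip & (kClipCount - 1u)) << kOffsetBits) | (offset & 0xFFFFu);
}

static_assert(kClipBits + kOffsetBits == 20, "4 + 16 address bits");
static_assert(sample_address(15, 65535) == 0xFFFFFu,
              "the highest address still fits in 20 bits");
```

With this layout, stepping through one sound is just incrementing the low 16 bits while the top 4 bits stay fixed.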

Assuming 44.1 kHz and 16-bit samples, a Teensy 4.0 will have roughly 10,000 cycles to compute each output sample. I'm looking to have 10+ audio clips overlaid, which means I will have fewer than 1000 cycles per clip to get 16 bits from the memory and do the arithmetic. I think this likely rules out SPI and requires parallel input. But it also leaves me having to switch pin modes rapidly, first to drive the 20-bit address and then to read the data back in parallel.
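The budget can be worked out as below. This is a back-of-envelope sketch that assumes the Teensy 4.0 runs at its default 600 MHz clock; the function names are made up for the example, and the results ignore interrupt and audio-output overhead.

```cpp
#include <cstdint>

constexpr std::uint32_t kCpuHz        = 600'000'000;  // assumption: default clock
constexpr std::uint32_t kSampleRateHz = 44'100;       // from the post

// CPU cycles available between two consecutive output samples.
constexpr std::uint32_t cycles_per_output_sample() {
    return kCpuHz / kSampleRateHz;
}

// Cycles left for each overlaid sound: one memory fetch plus the mixing math.
constexpr std::uint32_t cycles_per_voice(std::uint32_t voices) {
    return cycles_per_output_sample() / voices;
}

static_assert(cycles_per_output_sample() == 13605, "600 MHz / 44.1 kHz");
static_assert(cycles_per_voice(10) == 1360, "10 overlaid sounds");
static_assert(cycles_per_voice(16) == 850, "all 16 sounds at once");
```

So the per-sound budget sits somewhere between about 850 and 1360 cycles, and the bus turnaround has to fit inside that along with the read itself.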