Sorry, there are multiple things involved here and I am not sure what you're asking or what your needs are:
When I mention 16 MHz, this is the SPI speed, not the CPU speed. That is, by default the processor will still be running at 600 MHz.
What the SPI speed impacts is things like how long it takes to update the screen. For example, with the ILI9341 display, to output a full
screen of pixels, the CPU will send about 320*240*2 bytes to the display (plus a few others for setup).
Each of these bytes toggles the SCLK pin 8 times, and it is this clock speed we are talking about. How fast you can drive these
pins reliably depends on many things, like the max speed that the chip says it can handle.
View attachment 35632
From that it looks like, by spec, the SPI clock should not be > 10 MHz... But usually we can go higher... How high? Not sure.
Also, it depends on the connection. For example, I have my own boards, where I mount the Teensy and have a connector
to plug the display into. Typically, I can drive this a lot faster than a display hooked up through a breadboard with jumper wires...
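To put rough numbers on the full screen transfer described above, here is a small back-of-the-envelope sketch. The function names are made up for this illustration, and it only counts pixel data at 8 SCLK cycles per byte, so real updates will be somewhat slower because of command bytes and gaps between transfers.

```cpp
#include <cstdint>

// Bytes needed for one full frame at 16 bits (2 bytes) per pixel.
constexpr uint32_t frameBytes(uint32_t width, uint32_t height) {
  return width * height * 2;
}

// Best-case time in microseconds to clock those bytes out over SPI:
// 8 SCLK cycles per byte, ignoring command bytes and inter-transfer gaps.
constexpr double frameTimeUs(uint32_t bytes, uint32_t spiClockHz) {
  return (bytes * 8.0) / spiClockHz * 1e6;
}
```

For a 320x240 screen, frameBytes(320, 240) is 153600 bytes, and frameTimeUs(153600, 16000000) comes out to roughly 76800 microseconds, so about 13 full screen updates per second at best with a 16 MHz SPI clock.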
So the display update will take however long it takes to send that number of bytes. How much overhead that puts on the system depends. If you simply call things like tft.fillScreen(RED); by default the code will stay in this call for as long as it takes to send the data. However, with some of our libraries, we have the concept of a frame buffer. When it is enabled, the code simply writes the pixels into a memory buffer, and we can then choose when to actually update the display. This update can be done using DMA.
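The difference between the two modes can be shown with a toy model that just counts how many bytes would go out on the bus. This is not the code of any real display library: the ToyDisplay class and its method names are hypothetical, and the "bus" is only a counter.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a display driver; not a real library API.
class ToyDisplay {
public:
  ToyDisplay(int w, int h) : fb_(static_cast<size_t>(w) * h, 0) {}

  void useFrameBuffer(bool on) { buffered_ = on; }

  // Direct mode: every pixel goes out on the bus inside this call.
  // Buffered mode: pixels are only written to RAM, nothing is sent yet.
  void fillScreen(uint16_t color) {
    for (auto &px : fb_) px = color;
    if (!buffered_) sendAll();
  }

  // Push the whole buffer to the display at a time the sketch chooses.
  void updateScreen() { sendAll(); }

  size_t bytesSent() const { return bytesSent_; }

private:
  void sendAll() { bytesSent_ += fb_.size() * 2; }  // 2 bytes per pixel

  std::vector<uint16_t> fb_;
  bool buffered_ = false;
  size_t bytesSent_ = 0;
};
```

With the buffer enabled, several drawing calls in a row cost only memory writes, and the bus traffic happens once, when updateScreen() is called.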
However, if your concern about the speed is how fast you can update the screen: Again... Depends!
a) If you are only updating small portions of the screen at a time, the code can be implemented to update just those areas, in which case you
only send the bytes associated with those regions of the screen.
b) You use a different interface between the processor and the display... For example 8 or 16 bit parallel. With this, instead of needing 16 clocks to output a pixel, it only takes 2 clocks (8 bit) or 1 clock (16 bit) of data. They are not typically 8 or 16 times faster as there is still overhead... But they are a lot faster.
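Both points can be put into numbers with a quick sketch. The helper names are invented for this example, and it ignores command bytes, address window setup and any per-transfer overhead, so treat the results as best-case figures.

```cpp
#include <cstdint>

// Bytes to send for a rectangular region at 2 bytes per pixel
// (command and address-window bytes are ignored here).
constexpr uint32_t regionBytes(uint32_t w, uint32_t h) {
  return w * h * 2;
}

// Bus cycles needed to move one 16-bit pixel over a bus that carries
// busWidthBits per cycle (1 = SPI, 8 or 16 = parallel).
constexpr uint32_t cyclesPerPixel(uint32_t busWidthBits) {
  return (16 + busWidthBits - 1) / busWidthBits;
}
```

For example, updating a 100x20 pixel area needs regionBytes(100, 20) = 4000 bytes instead of the 153600 bytes of a full 320x240 screen, and cyclesPerPixel gives 16 for SPI, 2 for 8 bit parallel and 1 for 16 bit parallel.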
But the trade-off is that a lot more IO pins are required:
That is, for SPI it typically takes: SCLK, MISO, MOSI, CS, DC (plus power and GND, and maybe reset).
For a parallel interface, like an ILI9486 on a Teensy 4.1 with 8 bit parallel, we are typically using the following pins:
Code:
#define DISPLAY_WR 36
#define DISPLAY_RD 37
#define DISPLAY_D0 19
#define DISPLAY_D1 18
#define DISPLAY_D2 14
#define DISPLAY_D3 15
#define DISPLAY_D4 40
#define DISPLAY_D5 41
#define DISPLAY_D6 17
#define DISPLAY_D7 16
#define DISPLAY_D8 22
#define DISPLAY_D9 23
#define DISPLAY_D10 20
#define DISPLAY_D11 21
#define DISPLAY_D12 38
#define DISPLAY_D13 39
#define DISPLAY_D14 26
#define DISPLAY_D15 27
The bottom ones (D8-D15) are only needed for 16 bit. With our current code, most of these pins are semi-fixed, in that we are using a
subsystem called FlexIO and we need all of D0-D7 to be contiguous FlexIO pins...
In addition to these pins, you still need the CS and DC pins and optionally Reset.
On Teensy 4.1 these are on FlexIO3, which does not support DMA, so any asynchronous type updates we do are done using interrupts.
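The "contiguous FlexIO pins" requirement for D0-D7 can be expressed as a simple check. This is only a sketch: the PinMap type and isContiguous helper are made up for illustration, and the FlexIO3 pin indexes in the table are an assumption based on the order of the pin list above, so verify them against the Teensy 4.1 pin documentation before relying on them.

```cpp
#include <cstddef>
#include <cstdint>

// One data line: the Teensy pin number and its FlexIO pin index.
struct PinMap {
  uint8_t teensyPin;
  uint8_t flexioPin;
};

// ASSUMPTION: FlexIO3 indexes 0..7 for D0..D7, inferred from the order of
// the DISPLAY_D0..DISPLAY_D7 defines; check the Teensy 4.1 pin table.
const PinMap kDataPins[] = {
  {19, 0}, {18, 1}, {14, 2}, {15, 3},
  {40, 4}, {41, 5}, {17, 6}, {16, 7},
};

// True if each pin's FlexIO index is exactly one more than the previous one.
bool isContiguous(const PinMap *pins, size_t count) {
  for (size_t i = 1; i < count; ++i) {
    if (pins[i].flexioPin != pins[i - 1].flexioPin + 1) return false;
  }
  return true;
}
```

Note that the Teensy pin numbers themselves jump around (19, 18, 14, 15, 40, ...); what has to be consecutive is the FlexIO index behind each pin, which is why the data pins cannot be moved around freely.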
Sorry for the shotgun answer, as I am not sure what you are asking.