With the Teensy 4.x, unlike the Teensy 3.x processors, I'm not aware of any of the display driver libraries having special optimizations when you use the special CS pins. On the Teensy 3.x processors, you needed both the CS and the D/C pins to be in the special set. The Teensy 4.0 does not have 2 CS pins on any of its 3 SPI ports. In theory, the Teensy 4.1 could do the optimization. On the other hand, the Teensy 4.0/4.1 drivers now have DMA support.
Minor background comments:
CS pins: Most libraries do not do anything special with the CS pin. That is, they typically use something like digitalWrite() to set and clear its state. However, some libraries, like some of our display libraries, use the hardware to control the CS (and/or DC) pins.
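To make the plain-GPIO approach concrete, here is a minimal host-side sketch. digitalWrite() is mocked so it compiles on a PC, and spiTransfer() is a hypothetical stand-in for SPI.transfer(); a real Teensy library would also wrap the transfer in SPI.beginTransaction()/SPI.endTransaction(). The only point is that CS is an ordinary pin toggled in software, so any digital pin works.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side mock so the pattern can be compiled and checked on a PC.
// On a Teensy, digitalWrite() comes from the core; spiTransfer() is a
// hypothetical stand-in for SPI.transfer().
constexpr uint8_t LOW = 0, HIGH = 1;

struct MockBus {
  uint8_t pin[64] = {};               // simulated pin levels
  std::vector<uint8_t> bytes;         // bytes shifted out
  std::vector<uint8_t> csDuringByte;  // CS level seen while each byte went out
  uint8_t csPin = 0;
};
MockBus bus;

void digitalWrite(uint8_t p, uint8_t level) {
  assert(p < 64);
  bus.pin[p] = level;
}

uint8_t spiTransfer(uint8_t b) {
  bus.bytes.push_back(b);
  bus.csDuringByte.push_back(bus.pin[bus.csPin]);
  return 0;  // a real transfer returns the byte clocked in
}

// The usual library pattern: CS is a plain GPIO driven in software around
// each transaction, so it does not need to be a hardware CS pin.
void writeRegister(uint8_t csPin, uint8_t reg, uint8_t value) {
  digitalWrite(csPin, LOW);   // select the device
  spiTransfer(reg);
  spiTransfer(value);
  digitalWrite(csPin, HIGH);  // deselect
}
```

Because the CS changes happen in code between transfers, they are only as fast as the surrounding software, which is exactly what the hardware-controlled CS/DC approach avoids.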
On T3.x - in most of the display libraries, I tried to relax the requirement that both CS and DC had to be hardware CS pins. There is a significant performance gain from having the DC pin on one, and some gain from having the CS pin on one.
Why: on the T3.x processors you can encode the state of up to 4 hardware CS pins into each item you push onto the hardware output queue. Without this, each time the DC pin changes state you have to wait until the hardware has completely shifted out the last bits of the data being output before you change the state of the IO pin, and only then can you queue up the next data. This leaves gaps between groups of output bytes.
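A rough model of what the T3.x queue buys you: each 32-bit entry carries the data plus a mask of which hardware CS lines to assert, so DC can change in step with the data. The layout below (data in bits 0-15, a PCS mask starting at bit 16, CONT in bit 31) is modeled loosely on the Kinetis SPI PUSHR register and should be treated as an assumption; check it against the reference manual. pushrEntry(), encodeQueue(), pcsCs and pcsDc are hypothetical names, and it assumes the lines idle high so that asserting DC drives it low (command).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed PUSHR-like layout: CONT keeps CS asserted after this entry.
constexpr uint32_t PUSHR_CONT = 1u << 31;

constexpr uint32_t pushrEntry(uint16_t data, uint8_t pcsMask, bool cont) {
  return data | (uint32_t(pcsMask & 0x1F) << 16) | (cont ? PUSHR_CONT : 0u);
}

struct Item { bool isCommand; uint8_t value; };

// Every entry carries its own CS/DC levels, so the hardware switches DC
// between bytes without the code waiting for the FIFO to drain.
// pcsCs / pcsDc: the PCS bit for the pin wired as CS and the one wired as DC.
std::vector<uint32_t> encodeQueue(const std::vector<Item>& items,
                                  uint8_t pcsCs, uint8_t pcsDc) {
  assert((pcsCs & pcsDc) == 0);  // CS and DC must be different PCS lines
  std::vector<uint32_t> q;
  for (const Item& it : items) {
    // Command: assert CS and DC (DC low). Data: assert CS only (DC high).
    uint8_t mask = it.isCommand ? uint8_t(pcsCs | pcsDc) : pcsCs;
    q.push_back(pushrEntry(it.value, mask, true));
  }
  if (!q.empty()) q.back() &= ~PUSHR_CONT;  // release CS after the last entry
  return q;
}
```

For example, encodeQueue({{true, 0x2C}, {false, 0xF8}, {false, 0x00}}, 0x01, 0x02) produces three entries that can all be pushed back to back; the DC change between the command and the data is encoded in the entries themselves.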
So if your code does something like tft.drawPixel(x, y, color),
it outputs something like: <2A> xs xs xe xe <2B> ys ys ye ye <2C> CC CC
Note: on some displays, some of the <2A> ... and <2B> ... parts might be optimized out if the X (or Y) start and end are the same as in the previous call.
But in the full case above, the state of DC changes 6 times.
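To see where the 6 DC changes come from, this small sketch builds the drawPixel() byte stream and counts how often DC has to flip, assuming DC was left high (data) by the previous operation. drawPixelStream() and dcChanges() are illustrative helpers, not part of any display library; the skip flags model a driver that omits an unchanged <2A> or <2B> block.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Item { bool isCommand; uint8_t value; };

// ILI9341-style drawPixel(x, y, color) stream:
// <2A> xs xs xe xe <2B> ys ys ye ye <2C> CC CC
std::vector<Item> drawPixelStream(uint16_t x, uint16_t y, uint16_t color,
                                  bool skipX = false, bool skipY = false) {
  std::vector<Item> s;
  auto cmd = [&](uint8_t c) { s.push_back({true, c}); };
  auto data16 = [&](uint16_t v) {
    s.push_back({false, uint8_t(v >> 8)});
    s.push_back({false, uint8_t(v & 0xFF)});
  };
  if (!skipX) { cmd(0x2A); data16(x); data16(x); }  // column start = end = x
  if (!skipY) { cmd(0x2B); data16(y); data16(y); }  // row start = end = y
  cmd(0x2C); data16(color);                         // memory write
  return s;
}

// Count DC level changes (DC low = command, high = data). Without hardware
// DC, each change is a point where the code must wait for the SPI FIFO to
// empty before toggling the pin.
int dcChanges(const std::vector<Item>& s) {
  bool dcHigh = true;  // assumed: previous operation left DC in data mode
  int changes = 0;
  for (const Item& it : s) {
    bool wantHigh = !it.isCommand;
    if (wantHigh != dcHigh) { ++changes; dcHigh = wantHigh; }
  }
  return changes;
}
```

With both window blocks present the 13-byte stream needs 6 DC changes; skipping the <2A> block drops that to 4.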
On T4.x - the chip select pins work differently. As I mentioned in the T3.x section, on T3.x you can encode the states of up to 4 CS pins as a mask on each item in the output queue. With T4.x, what you can encode onto the SPI output queue is which one of the 4 CS pins should be asserted during the output. So, for example, you cannot tell it to have both the DC and CS pins asserted. ...
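A sketch of the difference between a mask and an index. expressibleT3(), expressibleT4() and t4TcrPcs() are hypothetical helpers; the T3.x PCS field is modeled as a 5-bit mask, and placing the T4.x 2-bit PCS index at bits 24-25 of the LPSPI transmit command register is an assumption to verify against the i.MX RT reference manual.

```cpp
#include <cassert>
#include <cstdint>

// T3.x model: the queue entry carries a PCS mask, so any combination of
// hardware CS lines (e.g. CS + DC) can be asserted for the same byte.
bool expressibleT3(uint8_t wanted) { return (wanted & ~0x1F) == 0; }

// T4.x model: the command carries a 2-bit PCS index, so exactly one of
// PCS0..PCS3 can be asserted for the following data.
bool expressibleT4(uint8_t wanted) {
  return wanted != 0 && (wanted & ~0x0F) == 0 && (wanted & (wanted - 1)) == 0;
}

// Convert a single-line request into the assumed TCR PCS field value.
uint32_t t4TcrPcs(uint8_t wanted) {
  assert(expressibleT4(wanted));
  uint32_t index = 0;
  while (index < 4 && !(wanted & (1u << index))) ++index;  // find the set bit
  return index << 24;
}
```

A request like 0x03 (CS and DC together) is fine in the T3.x model but has no T4.x encoding, which is why the DC trick from T3.x does not carry over directly.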
In some of the libraries there is some performance gain if you use a hardware CS pin for the DC pin.
Good luck