SumoToy can answer this a lot better than I can, as I have not used the TFT7735. I have, however, done a reasonable amount of hacking on the ILI9341 display, and while testing some things for the new beta Teensy board I played with his library a bit, so I know it uses the same approach that Paul used for the ILI9341 display.
You might try looking through the postings on the ILI9341 display if you wish to get complete details, but I will take a shot at it:
There are a couple of main techniques for keeping the display going as fast as possible, but it all boils down to keeping data flowing out of the SPI queue with no delays between bytes and minimal delays between commands. Paul did a great job on this.
If you do not have a FIFO, your code typically spins waiting for the SPI output register to be empty whenever it wants to send something to the display, and then puts a byte out on the queue. This adds some delay. On many processors, including the T3.2 on SPI1, you at least have some form of double buffering: it may be described as a queue size of 1, or as two registers, the output register and the shift register. With these you can very often minimize this part of the delay.
However, with the T3.2 on SPI (SPI0), the queue is more than a simple queue for the output byte (or word): each entry can also encode how the chip select pins associated with the SPI device should change. With the ILI9341 display there are two signals besides Clock/MISO/MOSI: CS (Chip Select) and DC (Data/Command).
When you are talking to the display, you need to assert the CS pin to tell the SPI device that the data is for it. With the ILI9341 you also need to say which byte is the start of a new command. In most display code you will see something like:
<wait until SPI is empty>, <assert CS>, <assert DC>, <output command byte>, <wait until SPI is empty>, <deassert DC>, <output data bytes...>, <wait until SPI is empty>, <deassert CS>.
This introduces delays between bytes, because we have to wait for something to complete before we can change the other two signals with some form of digitalWrite command. With these drivers, the appropriate changes to the CS/DC pins are encoded as part of each entry going out to the SPI, which is where the big win is.
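To make that a bit more concrete, here is a rough host-side sketch of how a command plus its data bytes could be packed into queue entries. It is not the library's actual code. The bit positions follow the Kinetis SPI PUSHR register layout as I understand it from Teensy's kinetis.h, and the names encodeTransaction, kCsMask and kDcMask, as well as the CS-on-PCS0 / DC-on-PCS1 wiring, are assumptions made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Field positions of the Kinetis SPI PUSHR register, as I understand them
// from Teensy's kinetis.h (assumption: verify against the reference manual).
constexpr uint32_t PUSHR_CONT = 0x80000000u;  // keep chip selects asserted after this frame
constexpr uint32_t PUSHR_EOQ  = 0x08000000u;  // marks the last entry of the queue
constexpr uint32_t pushrCtas(uint32_t n)   { return (n & 7u) << 28; }       // transfer attribute set
constexpr uint32_t pushrPcs(uint32_t mask) { return (mask & 0x1Fu) << 16; } // chip selects to assert

// Hypothetical wiring for this sketch: CS on PCS0, DC on PCS1.
constexpr uint32_t kCsMask = 0x01u;
constexpr uint32_t kDcMask = 0x02u;

// Build the 32-bit queue entries for one command followed by its data bytes.
// Each entry carries the byte AND the CS/DC state, so no digitalWrite (and no
// wait for the SPI to drain) is needed between the command and the data.
std::vector<uint32_t> encodeTransaction(uint8_t cmd, const std::vector<uint8_t>& data) {
    std::vector<uint32_t> words;

    // Command byte: CS and DC both asserted. If no data follows, end the queue here.
    const uint32_t cmdFlags = data.empty() ? PUSHR_EOQ : PUSHR_CONT;
    words.push_back(cmd | pushrPcs(kCsMask | kDcMask) | pushrCtas(0) | cmdFlags);

    // Data bytes: only CS asserted. The last one ends the queue, which releases CS.
    for (std::size_t i = 0; i < data.size(); ++i) {
        const bool last = (i + 1 == data.size());
        words.push_back(data[i] | pushrPcs(kCsMask) | pushrCtas(0) |
                        (last ? PUSHR_EOQ : PUSHR_CONT));
    }

    assert(words.size() == data.size() + 1);
    return words;
}
```

On the real hardware each of these words would be written to the SPI0 PUSHR register while the FIFO has room. The point of the sketch is only that the CS/DC state travels with every byte instead of being toggled separately.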
But to do this, you need to make sure CS and DC are on pins that can be controlled by SPI0. These are shown in several places, including the page for the ILI9341 display that PJRC sells:
http://www.pjrc.com/store/display_ili9341.html
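As a rough sketch of what "pins that can be controlled by SPI0" means in practice, the check below uses the pin-to-chip-select mapping as I remember it from the Teensy 3.x SPI library. Treat the table as an assumption and confirm it against the page above. The function names are made up for the example. I believe the SPI library offers a similar check as SPI.pinIsChipSelect, which would be the thing to use on real hardware.

```cpp
#include <cassert>
#include <cstdint>

// Returns the SPI0 chip-select mask for a Teensy 3.x pin, or 0 if the pin
// cannot be driven by the SPI hardware. The mapping is from memory
// (assumption): check it against the PJRC documentation before relying on it.
inline uint8_t pcsMaskForPin(uint8_t pin) {
    switch (pin) {
        case 10: case 2:  return 0x01;  // PCS0
        case 9:  case 6:  return 0x02;  // PCS1
        case 20: case 23: return 0x04;  // PCS2
        case 21: case 22: return 0x08;  // PCS3
        case 15:          return 0x10;  // PCS4
        default:          return 0x00;  // not a hardware chip-select pin
    }
}

// CS and DC can be used together only if both are hardware chip-select pins
// and they map to DIFFERENT chip-select signals (e.g. 10 and 2 share one).
inline bool pinsUsableTogether(uint8_t csPin, uint8_t dcPin) {
    const uint8_t cs = pcsMaskForPin(csPin);
    const uint8_t dc = pcsMaskForPin(dcPin);
    return cs != 0 && dc != 0 && (cs & dc) == 0;
}
```

With this table, CS on 10 and DC on 9 would pass, while CS on 10 and DC on 2 would not, because both pins map to the same chip-select signal.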
Hope that helps.