So what will I lose if I don't have DMA, because I don't quite understand how that thing works.
To simplify a bit, the DMA engine is a separate unit in the microcontroller, that transfers 8-bit bytes or 16- or 32-bit words, while the processor itself is working on something else.
ILI9341, ILI9488, ST7789, and other similar displays that support a parallel interface all use a D/C (Data/Command, AKA RS for Register Select) pin to distinguish commands from parameters/pixel data. Because this pin is needed for longer sequences, and setting up a DMA transfer takes long enough that it is not worth it for short sequences, the optimal data width is 9, 17, or 33 bits: the 8, 16, or 32 data bits plus D/C.
Long story short, DMA via parallel interfaces to these displays isn't worth it, unless you are willing to use double the RAM for the data sent, and have the 9/17/33 bits available in the same GPIO port (or FlexIO; I haven't used FlexIO yet myself).
If you DO NOT use DMA, the processor must send the data in a loop of some sort. You can still use DMA to receive data from UART or SPI into a buffer. This is mostly a design issue: with parallel interfaces to these displays, you'll want your "main loop" to be around sending data to the TFT display, and fit everything else in/around that.
Fortunately, when you do not use DMA, the processor can do funky stuff to the data when it sends it.
In particular, you can use just 320×480=153,600 bytes for your framebuffer, and 2×256=512 bytes for a 256-entry palette of 16-bit colors, so that your framebuffer can display any 256 unique colors picked from the 65,536 possible ones. When sending pixel data, Teensy will look at the byte in the framebuffer, and instead of sending it directly, use the palette to look up the corresponding color to send to the display. (You can even split the display into multiple regions that have their own palettes.)
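Here's a minimal sketch of that palette lookup, assuming you expand one row at a time into a small line buffer before sending it. The name expand_row and the line buffer are my own for illustration; you could just as well send each color the moment you've looked it up.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand n 8-bit palette indices from src into 16-bit colors in dst.
   palette holds 256 entries of 16-bit color (512 bytes). */
static void expand_row(const uint8_t *src, uint16_t *dst, size_t n,
                       const uint16_t palette[256])
{
    for (size_t i = 0; i < n; i++) {
        /* The framebuffer byte is an index, not a color. */
        dst[i] = palette[src[i]];
    }
}
```

For a 320-pixel row the line buffer is only 640 bytes, so the total stays far below a full 16-bit framebuffer.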
Then, if you always refresh the entire display, you can do "animation" by modifying the palette only. This is how color cycling is done for fractals, for example: only the palette data changes, the framebuffer itself is unchanged. (Of course, because the palette is on the Teensy and not on the display, you do need to update the entire display for this to work. But Teensy 4.0 has ample power; you can even calculate these fractals directly on it.)
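As a rough sketch of color cycling, assuming you rotate a contiguous range of palette entries by one step per frame (cycle_palette is a name I made up for this example):

```c
#include <stdint.h>

/* Rotate palette entries first..last (inclusive) by one step:
   each color moves to the next higher index, and the color at
   'last' wraps around to 'first'. The framebuffer is not touched. */
static void cycle_palette(uint16_t palette[256], uint8_t first, uint8_t last)
{
    if (first >= last) {
        return; /* nothing to rotate */
    }
    uint16_t wrapped = palette[last];
    for (unsigned i = last; i > first; i--) {
        palette[i] = palette[i - 1];
    }
    palette[first] = wrapped;
}
```

Call it once per frame, then resend the whole framebuffer through the palette.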
In summary, you don't lose much, if you cannot use DMA with parallel interface to these displays. I just wanted you to know the tradeoff.
Also how to do port manipulation correctly
First, you look at the relevant manual.
For example, on Teensy 4, the GPIO stuff is described on page 1009 forwards. (There is also a note somewhere there that says that each 8-bit byte or 16-bit half of the 32-bit register can be accessed separately. So, an 8-bit access can set/clear/modify bits 0-7, 8-15, 16-23, or 24-31; a 16-bit access can set/clear/modify bits 0-15 or 16-31; and a 32-bit access can set/clear/modify all 32 bits at once.)
Let's ignore direction (input or output) and interrupt capabilities, and concentrate on setting several output pins at the same time. (You use the Teensyduino interfaces to set pin directions et cetera; we're just basically looking at how to do digitalWriteFast() to several pins in parallel.)
On page 1020, we see four interesting registers mentioned: GPIO data register (DR), GPIO data register SET (DR_SET), GPIO data register CLEAR (DR_CLEAR), and GPIO data register TOGGLE (DR_TOGGLE). If we read the following pages, we find that
- DR contains the port output pin states. Pins that are reserved or not outputs, should be zero. You can read and write this register.
- DR_SET is a write-only register. When you write to this register, the bits set in the value cause the corresponding bits in DR to be set.
For example, writing 2^0 + 2^5 = 33 to this register sets bits 0 and 5 of DR.
- DR_CLEAR is a write-only register. When you write to this register, the bits set in the value cause the corresponding bits in DR to be cleared.
For example, writing 2^0 + 2^3 = 9 to this register clears bits 0 and 3 of DR.
- DR_TOGGLE is a write-only register. When you write to this register, the bits set in the value cause the corresponding bits in DR to change state.
For example, writing 2^1 + 2^2 = 6 to this register toggles bits 1 and 2 of DR.
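If it helps, here is a tiny host-side model of those four registers. This is not Teensy code; gpio_model and the write_* names are made up, and it only mimics the behavior described in the list above.

```c
#include <stdint.h>

/* Model of one GPIO bank: only the data register holds state. */
typedef struct {
    uint32_t dr;
} gpio_model;

/* Writing DR_SET sets the DR bits that are 1 in v. */
static void write_dr_set(gpio_model *g, uint32_t v)    { g->dr |= v; }

/* Writing DR_CLEAR clears the DR bits that are 1 in v. */
static void write_dr_clear(gpio_model *g, uint32_t v)  { g->dr &= ~v; }

/* Writing DR_TOGGLE flips the DR bits that are 1 in v. */
static void write_dr_toggle(gpio_model *g, uint32_t v) { g->dr ^= v; }
```

Bits that are zero in the written value are left alone in every case, which is why no read-modify-write is needed.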
We also know that
- GPIOn base address is 0x401B8000+(n-1)*0x4000, where n is between 1 and 4, inclusive;
- GPIO5 base address is 0x400C0000; and
- GPIOn base address is 0x42000000+(n-6)*0x4000, where n is between 6 and 9, inclusive;
and that DR is at the base address, DR_SET at base address plus 0x84, DR_CLEAR at base address plus 0x88, and DR_TOGGLE at base address plus 0x8C.
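Just to make the arithmetic concrete, here is a small sketch that computes those addresses from the bank number. The function and macro names are hypothetical, made up for this example; in a real sketch you would use the register names your core already defines rather than computing addresses yourself.

```c
#include <stdint.h>

/* Register offsets from the GPIO bank base address. */
#define EX_GPIO_DR_OFFSET         0x00u
#define EX_GPIO_DR_SET_OFFSET     0x84u
#define EX_GPIO_DR_CLEAR_OFFSET   0x88u
#define EX_GPIO_DR_TOGGLE_OFFSET  0x8Cu

/* Base address of GPIO bank n (1 to 9), or 0 if n is out of range. */
static uint32_t ex_gpio_base(unsigned n)
{
    if (n >= 1 && n <= 4) {
        return 0x401B8000u + (n - 1) * 0x4000u;
    }
    if (n == 5) {
        return 0x400C0000u;
    }
    if (n >= 6 && n <= 9) {
        return 0x42000000u + (n - 6) * 0x4000u;
    }
    return 0;
}
```

For example, the GPIO6 toggle register ends up at ex_gpio_base(6) + EX_GPIO_DR_TOGGLE_OFFSET.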
We can also find these constants in the Teensyduino Core, in hardware/teensy/avr/cores/teensy4/imxrt.h, as GPIOn_DR, GPIOn_DR_SET, GPIOn_DR_CLEAR, and GPIOn_DR_TOGGLE.
We can either use the schematics to find out the names of the pins we wish to access and the manual to find their corresponding GPIO bank numbers and bits, or we can do it the easy way, and look at hardware/teensy/avr/cores/teensy4/core_pins.h, especially the CORE_PINn_BIT, CORE_PINn_PORTREG, CORE_PINn_PORTSET, and CORE_PINn_PORTCLEAR macro definitions. These are all for Teensy 4, so if using another Teensy model, use the correct core.
Let's say we want to toggle pins 20, 21, and 22, at the same time if possible. We find that the bits are 26, 27, and 24, respectively, and that for Fast I/O, we can use GPIO6 for all of them.
So, to toggle them, we can write
GPIO6_DR_TOGGLE = (1<<26) | (1<<27) | (1<<24);
Or, better yet,
// Note: pins 20, 21, and 22 are assumed to be in GPIO bank 6!
GPIO6_DR_TOGGLE = CORE_PIN20_BITMASK | CORE_PIN21_BITMASK | CORE_PIN22_BITMASK;
(In fact, we can also do a byte access to the most significant byte of the GPIO6 DR_TOGGLE register, writing 2^(24-24) + 2^(26-24) + 2^(27-24) = 13 to it, but better leave that sort of optimization for the compiler to worry about.)
The GPIO module in the Teensy will toggle each pin in parallel. That is, if they were high-low-low, they simultaneously become low-high-high.
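To check the numbers, here is a host-side sketch of that toggle using plain integers instead of the real register. The EX_ names are made up for this example; the bit numbers are the ones found above for pins 20, 21, and 22.

```c
#include <stdint.h>

/* Bit positions within the GPIO bank for pins 20, 21, and 22. */
#define EX_PIN20_BIT 26
#define EX_PIN21_BIT 27
#define EX_PIN22_BIT 24

#define EX_TOGGLE_MASK ((1u << EX_PIN20_BIT) | \
                        (1u << EX_PIN21_BIT) | \
                        (1u << EX_PIN22_BIT))

/* What a write of 'mask' to DR_TOGGLE does to the data register value. */
static uint32_t ex_toggle(uint32_t dr, uint32_t mask)
{
    return dr ^ mask;
}

/* The value a byte access to the most significant byte would write. */
static uint8_t ex_top_byte(uint32_t mask)
{
    return (uint8_t)(mask >> 24);
}
```

Starting with pin 20 high and pins 21 and 22 low, one toggle leaves pin 20 low and pins 21 and 22 high, all in the same write.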
In the same GPIO bank, a single write to the DR_SET register can set any set of pins, and a single write to the DR_CLEAR register can clear any set of pins, using the same logic as above.
If you look at the implementation of digitalWriteFast() in hardware/teensy/avr/cores/teensy4/core_pins.h, you'll see that as long as you use literal constants or compiler macros (i.e., use #define MY_FOO_PIN 20 instead of const int my_foo_pin = 20;), it optimizes to the thing we did by hand above: it is pretty darned fast. So, unless you are doing something special, you are better off using digitalWriteFast() with macros (not consts!).
One such special thing could be using pins 14-19, 22, and 23 for the eight parallel data pins, as they are all in GPIO bank 6:
Pin  Bit  TFT
 19   16  D0
 18   17  D1
 14   18  D2
 15   19  D3
 17   22  D4
 16   23  D5
 22   24  D6
 23   25  D7
Do note that two additional pins are affected (pin and bit, as above),
 20   26
 21   27
which should be either inputs, or used for digital audio or UART 5; not GPIO outputs. If they are GPIO outputs, our write will change them also.
You see, to set these pins (in the above order, 19,18,14,15,17,16,22,23 from D0 to D7) corresponding to byte d, we can do
*((volatile uint16_t *)(&GPIO6_DR) + 1) = ((d & 0xF0) << 2) + (d & 0x0F);
where the cast obtains a 16-bit pointer to GPIO6_DR, and we add +1 since we want the high 16 bits of this 32-bit register (Teensy 4.0 is little-endian, least significant byte first); the volatile tells the compiler it cannot cache the access to the register, and the initial asterisk means we dereference the pointer, i.e. access that location. The value we assign is the byte split into two four-bit halves, with the lower half shifted up by 16 bits and the upper half by 18 bits. As we skip the first 16 bits (so as to not affect any GPIO6 bits 0-15, if they happen to be outputs), it means we only need to shift the upper four bits of the byte two places up. We do not need to cast d to any other type, as C integer promotion rules mean that when we do the binary AND operation (&), the arguments are promoted to int anyway. Since we only keep the necessary bits, everything else will be zero.
In other words, if pins 20 or 21 are GPIO outputs also, they will be set to low/0 by the above code.
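Since this kind of bit shuffling is easy to get wrong, here is a host-side sketch that checks the shift-and-mask expression against the pin table, bit by bit, for all 256 byte values. The function names are my own; nothing here touches the real register.

```c
#include <stdint.h>

/* Fast form: the value written to the upper 16 bits of the register. */
static uint16_t pack_fast(uint8_t d)
{
    return (uint16_t)(((d & 0xF0) << 2) + (d & 0x0F));
}

/* GPIO bit for each TFT data bit D0..D7, copied from the table. */
static const uint8_t data_bit_to_gpio_bit[8] = {
    16, 17, 18, 19, 22, 23, 24, 25
};

/* Reference form: place each data bit at its GPIO bit, then keep
   only the upper 16 bits, like the 16-bit access does. */
static uint16_t pack_reference(uint8_t d)
{
    uint32_t dr = 0;
    for (int i = 0; i < 8; i++) {
        if (d & (1u << i)) {
            dr |= (uint32_t)1 << data_bit_to_gpio_bit[i];
        }
    }
    return (uint16_t)(dr >> 16);
}

/* Returns 1 if both forms agree for every possible byte, else 0. */
static int packing_agrees(void)
{
    for (unsigned d = 0; d < 256; d++) {
        if (pack_fast((uint8_t)d) != pack_reference((uint8_t)d)) {
            return 0;
        }
    }
    return 1;
}
```

If you ever move the data pins, update the table first and rerun the check before touching the fast expression.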
But, like I wrote earlier, the speed at which Teensy 4.0 can set those eight pins using digitalWriteFast() is probably as fast as the displays can handle anyway, so doing it the complicated -- and very hard to maintain!! -- way like this, is just uncalled for.
At minimum, I'd put a clear comment explaining what it does, and why; and write but comment out the equivalent digitalWriteFast() commands. I know me being me, I'd bumble something at some point anyway, and I'd want to verify that it isn't that which is causing problems, so switching to the known good way of setting the pins is the first step in debugging.
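Something along these lines is what I mean; treat it as a sketch, not tested on hardware. The TFT_Dn_PIN macros, tft_set_data_pins(), and the TFT_USE_PORT_WRITE switch are names I made up. I am also assuming ARDUINO is defined when building with Teensyduino; the stub exists only so the same file compiles on a PC.

```c
#include <stdint.h>

#ifndef ARDUINO
/* Host-side stand-in, so this compiles off-target. On a Teensy,
   the real digitalWriteFast() from the core is used instead. */
static uint8_t host_pin_state[64];
static void digitalWriteFast(uint8_t pin, uint8_t value)
{
    host_pin_state[pin] = value ? 1 : 0;
}
#endif

/* TFT data pins, as macros so digitalWriteFast() can optimize. */
#define TFT_D0_PIN 19
#define TFT_D1_PIN 18
#define TFT_D2_PIN 14
#define TFT_D3_PIN 15
#define TFT_D4_PIN 17
#define TFT_D5_PIN 16
#define TFT_D6_PIN 22
#define TFT_D7_PIN 23

/* Put byte d on the eight TFT data pins. */
static void tft_set_data_pins(uint8_t d)
{
#if defined(ARDUINO) && defined(TFT_USE_PORT_WRITE)
    /* Fast way: one 16-bit write to the upper half of GPIO6_DR.
       D0-D3 are GPIO6 bits 16-19, D4-D7 are bits 22-25. This also
       writes 0 to pins 20 and 21 if they are GPIO outputs. */
    *((volatile uint16_t *)(&GPIO6_DR) + 1) = ((d & 0xF0) << 2) + (d & 0x0F);
#else
    /* Known good way: use this when debugging. */
    digitalWriteFast(TFT_D0_PIN, (d >> 0) & 1);
    digitalWriteFast(TFT_D1_PIN, (d >> 1) & 1);
    digitalWriteFast(TFT_D2_PIN, (d >> 2) & 1);
    digitalWriteFast(TFT_D3_PIN, (d >> 3) & 1);
    digitalWriteFast(TFT_D4_PIN, (d >> 4) & 1);
    digitalWriteFast(TFT_D5_PIN, (d >> 5) & 1);
    digitalWriteFast(TFT_D6_PIN, (d >> 6) & 1);
    digitalWriteFast(TFT_D7_PIN, (d >> 7) & 1);
#endif
}
```

I used a compile-time switch instead of commented-out code so that both versions keep compiling; leaving TFT_USE_PORT_WRITE undefined gets you back to the known good way.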