Any way to remap GPIOs?

Status
Not open for further replies.

Rena

Active member
I'm using Teensy 3.2 to drive an ILI9481 TFT LCD on a breakout board. The protocol for transferring image data (after initialization) is pretty simple: you send X, Y, width, height, and pixel data over the 8 data lines, using the WR line as a clock. (WR has to be pulled low and then high again each time a byte is written.)

To improve performance, I've connected the TFT board's 8 data lines to the 8 GPIOs that correspond to PORTC bits 0-7. So instead of 8 separate digitalWrites, I can just write the byte to GPIOC_PDOR. However, it still takes two more digitalWrites to toggle the WR line each time. I also can't see any way to use DMA for this, given the WR requirement.

One solution I realized might be to connect WR to C8 and do two 16-bit writes to GPIOC_PDOR, first with bit 8 clear and then set, to set the data and WR lines at the same time.
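
A minimal sketch of that idea, written against a plain pointer so it can be tried off-target. The names `write_strobed` and `WR_BIT` are made up for illustration. It assumes WR is wired to C8 and that it is acceptable to write bits 9-15 as zero; on the Teensy the pointer would stand for the low 16 bits of GPIOC_PDOR.

```c
#include <stdint.h>

#define WR_BIT (1u << 8)  /* assumption: WR is wired to port C bit 8 */

/* Put one byte on D0-D7 and strobe WR using two 16-bit port writes.
 * `port` stands in for the low 16 bits of the port data output register. */
static inline void write_strobed(volatile uint16_t *port, uint8_t data)
{
    *port = data;                       /* data on D0-D7, WR low */
    *port = (uint16_t)(data | WR_BIT);  /* same data, WR high again */
}
```

Note that both writes overwrite every bit of the 16-bit half of the port, which is why the LED on pin 13 would flicker with this approach.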

The problem with this approach is that C8 is pin 28, on the bottom of the Teensy board, difficult to reach. Also one of the C pins is 13, so the LED flashes while transferring data, and I don't know what C12 through C15 are connected to. (Is it safe to touch those?)

Is there any way to change which bits of the port registers map to which physical pins, so that I can drive all of the TFT control+data lines from one register and not worry about the LED and the extra bits? Or some other clever way to push the data out a bit faster?
 
The ports are tied to the physical hardware, and supporting the Arduino pinout while minimising resource conflicts means the port bits are not neatly arranged. An ugly hack would be to pulse the read process with external hardware, but I'd question how fast the display can actually read the byte. It might be worth tying one of the data pins to GND, putting WR on the 8-bit port in its place, and running a test write to a subsection of the display to confirm it can actually keep up. A Teensy pin toggle is fast as long as you use a pin that's known at compile time, so I suspect you would not gain much in terms of writing pixels.

That said, it would be a good trick if you could DMA a whole write cycle (4 bytes + 8 WR changes) while the CPU prepped the next pixel.
 
Unfortunately all 8 data bits are needed to send commands during initialization. Maybe I could physically switch the wires around afterward...

I was looking at using DMA, but I'm not sure how I'd achieve the "set D0-D7, WR low, WR high" for each byte, besides connecting WR as a "9th bit", but that would then quadruple the memory requirements (each byte has to be doubled, plus a second byte to hold the WR state). I was hoping the DMA controller had some kind of signal I could use. It might be possible to set up some kind of clever multi-channel trick in which one channel transfers a byte, then another transfers a zero and a one to the WR line. I'll have to look into that.
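
To make the memory cost of the "9th bit" approach concrete, here is a rough sketch of the buffer expansion it would need. The function name `expand_for_dma` and the `WR_BIT` constant are hypothetical, and it assumes WR sits on bit 8 of the same port as the data lines. No actual DMA setup is shown.

```c
#include <stddef.h>
#include <stdint.h>

#define WR_BIT (1u << 8)  /* assumption: WR is the "9th bit" of the port */

/* Expand n data bytes into 2*n 16-bit port words: each byte appears once
 * with WR low and once with WR high. dst must have room for 2*n words.
 * Returns the size of the expanded buffer in bytes (4*n). */
static size_t expand_for_dma(const uint8_t *src, size_t n, uint16_t *dst)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        dst[w++] = src[i];                       /* WR low  */
        dst[w++] = (uint16_t)(src[i] | WR_BIT);  /* WR high */
    }
    return w * sizeof dst[0];
}
```

For one 480-pixel line at 16 bits per pixel, the 960 source bytes would grow to 3840 bytes in the expanded buffer.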

I don't know how much faster DMA is vs the CPU, but the display seems to have no trouble keeping up so far, even when writing a constant value in a loop. I'll do some more tests but I suspect I'm never going to outspeed it.

Right now I'm mainly concerned about the upper bits of GPIOC. Are they just unused, or are they mapped to some internal peripherals that won't be happy with me spamming bits at them?
 
Have you tried simply using digitalWriteFast for the WR pin? If you give it a fixed I/O pin number, the code reduces down to one memory write.
 
I'm actually using a library I made from scratch. It does also optimize digitalWrite to a single memory write.
 
and I don't know what C12 through C15 are connected to. (Is it safe to touch those?)

They aren't implemented in this chip. But even if they were, you can safely write to the GPIO register without actually affecting pins which are in their default disabled state, or are configured to be controlled by a peripheral instead of GPIO. The pins are only actually controlled by GPIO when their mux register is configured to ALT1.

Details for the registers are in chapter 11, and the huge table showing all the settings for all pins is in chapter 10.
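
As a small illustration of the ALT1 point, the check below decodes the MUX field (bits 10-8) of a pin control register value. The helper name `pcr_selects_gpio` is made up for this sketch, and it works on a plain integer so it can be tried without the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* MUX field of a PORTx_PCRn register value: bits 10..8 */
#define PCR_MUX_SHIFT 8
#define PCR_MUX_MASK  (0x7u << PCR_MUX_SHIFT)

/* True when the pin control register value selects ALT1 (GPIO). */
static bool pcr_selects_gpio(uint32_t pcr)
{
    return ((pcr & PCR_MUX_MASK) >> PCR_MUX_SHIFT) == 1u;
}
```

On the Teensy itself, passing the current value of a register such as PORTC_PCR5 should tell you whether that pin is currently driven by GPIO.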
 
Are you trying to achieve more than 600 fps on the LCD? I have had issues when setting the IO and toggling WR too fast: it starts to corrupt the data that is sent, and I had to add some 'nop' instructions to fix the timings. You may need to check the timing requirements for the display.

For my 16-bit display I do this:

Code:
// Bit positions of the control lines on GPIO port C
#define __PINRD 15
#define __PINRS 16
#define __PINWR 17
#define __PINCS 18
#define __PINRST 19

// Four NOPs of delay so the strobe is not too short for the display
#define CHILL asm volatile("nop\n nop\n nop\n nop\n");

// PCOR clears and PSOR sets only the bits written as 1
#define RD_LOW GPIOC_PCOR = (1u << __PINRD);
#define RD_HIGH GPIOC_PSOR = (1u << __PINRD);

#define RS_LOW GPIOC_PCOR = (1u << __PINRS);
#define RS_HIGH GPIOC_PSOR = (1u << __PINRS);

#define WR_LOW GPIOC_PCOR = (1u << __PINWR);
#define WR_HIGH GPIOC_PSOR = (1u << __PINWR);

#define CS_LOW GPIOC_PCOR = (1u << __PINCS);
#define CS_HIGH GPIOC_PSOR = (1u << __PINCS);

#define RST_LOW GPIOC_PCOR = (1u << __PINRST);
#define RST_HIGH GPIOC_PSOR = (1u << __PINRST);

// The macros above already end in ';', so these expand to complete statements
#define PULSE_WR WR_LOW CHILL WR_HIGH CHILL
#define PULSE_RD RD_LOW CHILL RD_HIGH CHILL

// Write all data lines with a single store to port D
#define setIO(a) (GPIOD_PDOR = (a))

I then just call "PULSE_WR" to toggle the WR line. I use the "setIO" macro as shown so that I don't call any functions to set the data lines.

The "CHILL" macro is to add some delay between the WR low and high.
 
I did some benchmarking and I'm not getting anywhere near that. At 72 MHz, using the display in landscape mode (480x320 resolution, 16 bits per pixel):

  • Clearing to a solid colour, where both bytes are the same (no need to change the GPIO, just pulse WR 320*480*2 times) takes 34163 us, giving ~29 FPS.
  • Clearing to a solid colour where the bytes differ (GPIO value needs to be set each time) takes 38442 us, giving ~26 FPS.
  • Uploading one line from RAM takes 196 us, giving ~16 FPS for all 320 lines (assuming we had that much RAM).
  • Uploading one line from flash takes 200 us, giving ~15 FPS for all 320 lines.
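
For anyone who wants to check the arithmetic, the figures above follow from two small formulas. The helper names here are just for illustration.

```c
/* Frames per second for a frame that takes frame_us microseconds. */
static double fps_from_us(double frame_us)
{
    return 1e6 / frame_us;
}

/* Average CPU cycles spent per byte written to the display. */
static double cycles_per_byte(double frame_us, double bytes, double cpu_hz)
{
    return frame_us * 1e-6 * cpu_hz / bytes;
}
```

With 320*480*2 = 307200 bytes per frame at 72 MHz, the two full-screen clears work out to roughly 8 and 9 CPU cycles per byte, so setting the GPIO value costs about one extra cycle per byte. A full frame sent line by line would take 320 times the per-line figure.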

Maybe I don't need it to be any faster (or I could increase the CPU speed) but I'm really curious how you're reaching 600FPS?
 
OK, I am not getting 600 fps, but I am above 100 fps for all screen draws. I am using a different display in 16-bit mode, so I set the GPIO (I am using the whole of PORT D for the data lines) once per pixel. I am also using a Teensy 3.6 running at 180 MHz, so my results will be different from yours.

I have other optimizations going on, such as Canvas drawing, before sending to the display.
 
Ahh, that explains a lot.

I did manage to improve performance quite a bit though with a few tweaks. I found that using GPIOx_PTOR instead of the bit-band version is much faster, and unrolling loops also sped things up considerably.
 