
Thread: ILI9341 Parallel without using direct GPIO manipulation

  1. #1

    ILI9341 Parallel without using direct GPIO manipulation


    I've spent a while modifying the Adafruit_TFTLCD library to get the ILI9341 breakout board working with the T3.1.

    Currently, it runs very slowly (at about the same speed as the Due in this demo).

    This is most likely because of the way I write each byte of data. Instead of using
    #define write8inline(wr) { GPIOD_PDOR = (wr); }
    from this thread, this is what I've defined in pin_magic.h
    #define write8inline(wr) { \
        digitalWriteFast(0, (wr & (1<<0))); \
        digitalWriteFast(1, (wr & (1<<1))); \
        digitalWriteFast(2, (wr & (1<<2))); \
        digitalWriteFast(3, (wr & (1<<3))); \
        digitalWriteFast(4, (wr & (1<<4))); \
        digitalWriteFast(5, (wr & (1<<5))); \
        digitalWriteFast(6, (wr & (1<<6))); \
        digitalWriteFast(7, (wr & (1<<7))); \
        WR_STROBE; }
    Is there any better way to optimise this? My understanding of digitalWriteFast is that it's as fast as it'll get.

    Unfortunately, I have no option to use different pins.

    Last edited by BrianC; 12-27-2014 at 01:34 AM.

  2. #2
    PaulStoffregen (Senior Member, joined Nov 2012)
    Maybe use the UTFT library? It's already ported and optimized for Teensy 3.1.

  3. #3
    Perhaps I wasn't clear enough on my issue.
    From what I can see, the UTFT library uses the B and D ports of the Teensy. I can't connect the LCD to any of these ports; instead, I have connected the data pins to D0 through D7. Is there any hope of getting it any faster?

  4. #4
    PaulStoffregen (Senior Member, joined Nov 2012)
    Not really. If you want it to run fast, you'll need to use connections similar to UTFT.

    There's an old thread on this forum, back when UTFT was ported by Dawnmist, about which pins can be used. There were a few different ways, but all were pretty similar and involved wiring the 8 or 16 bits to specific pins that correspond to the Kinetis native 8 bit ports.

  5. #5
    ok then. Thanks.

  6. #6
    Unfortunately, I've made a pcb that doesn't have the right connections. The 8 pins are all split between 4 ports so I'm assuming it probably wouldn't be any faster than the method I'm using currently.
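
    One way to group the writes when the data pins are scattered over several ports is a 256-entry lookup table holding, for each byte value, the bits to set and the bits to clear on every port. The following is only a sketch under stated assumptions: the pin-to-port mapping in `kDataPins` is hypothetical (it is not taken from this board), all names are made up for illustration, and the ports are plain variables so the code runs on a desktop. On the Kinetis the two masks would go to each port's set and clear registers (for example `GPIOD_PSOR` and `GPIOD_PCOR`). Whether this beats eight `digitalWriteFast` calls would have to be measured.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kNumPorts = 4;

// Where one data bit lives: which port, and which bit within that port.
struct PinLoc { uint8_t port; uint8_t bit; };

// HYPOTHETICAL wiring for data bits 0..7, spread over 4 ports.
// Replace with the real port/bit of each pin on your own board.
constexpr PinLoc kDataPins[8] = {
    {0, 16}, {0, 17}, {1, 0}, {2, 12}, {2, 13}, {1, 7}, {3, 4}, {3, 2}};

// Bits to drive high and bits to drive low on each port for one byte value.
struct PortMasks {
    uint32_t set[kNumPorts];
    uint32_t clear[kNumPorts];
};

// Build the masks for a single byte by scattering its bits to the ports.
PortMasks makeMasks(uint8_t value) {
    PortMasks m = {};
    for (int i = 0; i < 8; ++i) {
        const PinLoc loc = kDataPins[i];
        assert(loc.port < kNumPorts && loc.bit < 32);
        const uint32_t mask = uint32_t{1} << loc.bit;
        if (value & (1u << i)) {
            m.set[loc.port] |= mask;
        } else {
            m.clear[loc.port] |= mask;
        }
    }
    return m;
}

// Precompute all 256 entries once (about 8 KB with 4 ports).
std::array<PortMasks, 256> buildTable() {
    std::array<PortMasks, 256> table;
    for (int v = 0; v < 256; ++v) {
        table[v] = makeMasks(static_cast<uint8_t>(v));
    }
    return table;
}

// Apply one byte to the simulated ports without touching non-data bits.
// On real hardware this loop would become one store to the set register
// and one store to the clear register of each port.
void write8(uint32_t ports[kNumPorts], const PortMasks &m) {
    for (int p = 0; p < kNumPorts; ++p) {
        ports[p] = (ports[p] | m.set[p]) & ~m.clear[p];
    }
}

// Reassemble the data byte from the simulated ports (for checking only).
uint8_t readBack(const uint32_t ports[kNumPorts]) {
    uint8_t value = 0;
    for (int i = 0; i < 8; ++i) {
        if (ports[kDataPins[i].port] & (uint32_t{1} << kDataPins[i].bit)) {
            value |= static_cast<uint8_t>(1u << i);
        }
    }
    return value;
}
```

    The table trades about 8 KB of memory for branch-free writes; pins that share a port are updated together, so the fewer ports involved, the bigger the likely gain.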

  7. #7
    Senior Member (joined Feb 2013)
    Is it completely impossible to re-make your boards? I've got an ILI9341 screen (not one of the Adafruit ones, so my init code is probably at least somewhat different) running in parallel on GPIOD and it's fast as hell. Currently, on my custom Teensy 3.1 board overclocked to 144 MHz, it completes one pass of the entire Adafruit GFX test sketch in 1.286 seconds. I'm not sure exactly how slow your screen is running, or how fast the UTFT optimized library runs the same test (Paul, maybe you have some metrics posted somewhere?), but I guess you'd have to do a cost/benefit analysis of whether the added speed is worth making new PCBs. I will say this: my screen/menu system was previously set up to run on a Teensy++ 2.0, in parallel, and I drew a 'down' state for any touched button. Now, on the 3.1, I'm probably going to remove all that code eventually, because it's so fast that you never actually SEE that button state before it gets completely redrawn. It's fast enough that I've implemented some kinda snazzy real-time draggable sliders (albeit heavily smoothed to avoid jitter).

    The only possible speed improvements I might suggest would be to overclock to 144 MHz (168 MHz causes crashing for me, but 144 MHz seems to be very stable), and POSSIBLY a slight gain could be made by hardcoding the shift values (i.e., just 1 instead of 1<<0, 2 instead of 1<<1, then 4, 8, etc.). I'm not sure HOW much of a speed boost that last one might get you, but it should get rid of at least one operation per pin, I'd think? So you'd save 8 operations overall, which isn't bad.
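
    On the shift values: expressions such as `1<<3` are integer constant expressions, so the compiler evaluates them at build time, and hard-coding 8 is unlikely to save any instructions. The sketch below illustrates this; the `static_assert` lines only compile because the shifts are known at compile time. The function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// These only compile because the shifts are evaluated at compile time.
static_assert((1 << 0) == 1, "1<<0 is the constant 1");
static_assert((1 << 1) == 2, "1<<1 is the constant 2");
static_assert((1 << 3) == 8, "1<<3 is the constant 8");
static_assert((1 << 7) == 128, "1<<7 is the constant 128");

// Two spellings of the same test for data bit 3.
inline bool bit3Shifted(uint8_t wr) { return (wr & (1 << 3)) != 0; }
inline bool bit3Literal(uint8_t wr) { return (wr & 8) != 0; }

// Compare both spellings over all 256 byte values.
inline bool spellingsAgree() {
    for (int v = 0; v < 256; ++v) {
        const uint8_t wr = static_cast<uint8_t>(v);
        if (bit3Shifted(wr) != bit3Literal(wr)) {
            return false;
        }
    }
    return true;
}

// Runtime sanity check that the two spellings are interchangeable.
inline void selfCheck() { assert(spellingsAgree()); }
```

    Keeping the `1<<n` form costs nothing at run time and makes the bit number visible in the source.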

    Edit - looks like using Paul's SPI mode is (surprisingly, for me!) about twice as fast as using the parallel 8-bit connection (Something along the lines of 634000 microseconds to my 1286000 microseconds). Perhaps, since I'm already planning to do a revision on my current prototype PCBs, I'll swap to SPI. But maybe instead, I'll dig more into the ILI9341_t3 code and see if I can't take some of your other optimizations and apply them to parallel mode?
    Last edited by MuShoo; 12-27-2014 at 10:48 AM.

  8. #8
    PaulStoffregen (Senior Member, joined Nov 2012)
    Yeah, it can be painful to redo a PCB layout. But unless you've already fabricated and soldered hundreds or thousands of boards, in the end the cost to rev the PCB is pretty low.

  9. #9
    The PCB is expensive and quite large (the LCD is just a small part of it). The boards haven't arrived yet, and I'm not expecting to make another revision until this one is fully tested. Furthermore, it's impossible for me to put all 8 pins on one port (though there are suggestions to split them across 2?) because of the pins I've already used. Speed isn't that important for my application (if necessary, I could just disable the LCD). It's only for debugging and convenience.

    The sketch runs in a few seconds (I'd say about 1/4 of the speed of Paul's optimised library). This is after modifying the adafruit libraries (removing delays, changing to digitalWriteFast() & digitalReadFast() etc.).

    MuShoo, if you're using the Adafruit library, you may find that removing all the delays in Adafruit_TFTLCD::writeRegister24() and Adafruit_TFTLCD::writeRegister32() will significantly increase speed. Also, pin_magic.h defines a bunch of macros. For some reason, these are simply wrapped in functions in the Adafruit_TFTLCD library. I decided to remove all of these functions and make everything inline. What's the point of declaring a macro and then putting it inside a function with a similar name that does the identical thing?
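
    The macro-inside-a-function pattern described above can be sketched roughly as follows. All names here are hypothetical stand-ins, not the library's actual identifiers, and `fakePort` replaces a real GPIO register so the code runs on a desktop. The wrapper only costs a real function call when the compiler cannot inline it, for example when it is defined in a separate .cpp file and link-time optimization is off.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a GPIO output register (hypothetical, for illustration).
static volatile uint32_t fakePort = 0;

// Hypothetical macro in the style of those in pin_magic.h.
#define WRITE8_MACRO(d) { fakePort = (d); }

// The pattern in question: an ordinary function whose whole body is the
// macro. Defined in another .cpp file, each use may become a real call.
void write8Wrapped(uint8_t d) { WRITE8_MACRO(d); }

// Alternative: an inline function visible at the call site, which the
// compiler is free to expand in place like the macro itself.
static inline void write8Inline(uint8_t d) { WRITE8_MACRO(d); }

// Both forms leave the same value on the (simulated) port.
inline void demo() {
    write8Wrapped(0x12);
    assert(fakePort == 0x12u);
    write8Inline(0x34);
    assert(fakePort == 0x34u);
}
```

    The behaviour is identical either way; the difference is only whether a call and return sit between the caller and the port write.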

    I'm constantly amazed at how fast the TFT is with the optimised SPI library. That was my original plan; however, there were already many devices using SPI (something like 4 devices, the slowest clocked at 2 MHz), and I thought parallel would probably be faster.
