ILI9341_t3 support for Teensy LC?

Status
Not open for further replies.

KurtE

Senior Member+
ILI9341_t3 support for Teensy LC?

I decided to split this off from the Arduino 1.6.0 thread....

Hi Paul,

I have also been playing with the faster ILI9341 driver and trying to enable the FIFO and, as I mentioned, some of the status register values were not clear. I can detect when there is room in the queue, and when the output queue is empty or nearly empty, but it was unclear whether "empty" refers only to the FIFO or whether it also means the last bits of the last byte have been shifted out....

defragster - The issue is the timing of changes to the CS and DC (Data/Command) IO pins relative to the data lines (CLK, MISO, MOSI). For example, suppose you wish to set a pixel; the code for this turns into:
Code:
void ILI9341_t3::drawPixel(int16_t x, int16_t y, uint16_t color) {
	writecommand_cont(ILI9341_CASET); // Column addr set
	writedata16_cont(x);   // XSTART
	writedata16_cont(x);   // XEND
	writecommand_cont(ILI9341_PASET); // Row addr set
	writedata16_cont(y);   // YSTART
	writedata16_cont(y);   // YEND
	writecommand_cont(ILI9341_RAMWR);
	writedata16_last(color);
}
I won't go into the full details of the above, except: each time you go from command to data or from data to command, the DC line changes. Likewise, _cont says to keep holding the CS line, and _last says we will release CS after the write. The 16 means a write of 16 bits versus 8 bits. On the 3.1, each of these writes translates to one entry in the FIFO queue, which encodes both the CS and the DC changes as part of the queue entry. The only waiting happens after we add an item to the queue, when we wait until the queue is not full... With this, the hardware does all of the work of changing the DC and CS pins at the right time.

Now with the LC, we have to wait until the output queue is completely empty (last bit output) and then change the DC and/or CS pins through digitalWrite() or, hopefully, a faster equivalent.
I will assume the queue is empty when I output the ILI9341_CASET command, as the previous command probably drained it. So we assert CS, set DC for a command and push the command on the queue. Next we go to write the X values; DC changes, so we have to wait until the queue is empty, then switch DC to data and push the 16 bits of X twice (start and end). Then we need to wait until those bytes are output, switch DC back to command, push PASET, wait, switch DC to data, push 4 bytes for Y...

So in the above: we probably have to wait for the queue to empty at least 6 times, where we change DC and/or CS. Each time we do this, there will be a gap of time when SPI pins are not outputting something.
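The count of six can be checked with a small host-side model. This is only a sketch and not code from the driver: PinTracker and drainWaitsForDrawPixel are hypothetical names, the bus is assumed idle when CS is asserted, and a wait is counted whenever DC or CS has to change while bytes may still be shifting out.

```cpp
#include <cstdint>

// Host-side model (hypothetical, not driver code): counts how often the SPI
// output must fully drain before a GPIO-driven DC or CS line may change.
struct PinTracker {
    bool dcIsData = true;        // true = data, false = command
    bool bytesInFlight = false;  // something may still be shifting out
    int drainWaits = 0;

    void waitForDrainIfBusy() {
        if (bytesInFlight) { ++drainWaits; bytesInFlight = false; }
    }
    void setDc(bool data) {
        if (dcIsData != data) { waitForDrainIfBusy(); dcIsData = data; }
    }
    void select() {}                       // assumption: bus idle at this point
    void release() { waitForDrainIfBusy(); }
    void command(uint8_t)  { setDc(false); bytesInFlight = true; }
    void data16(uint16_t)  { setDc(true);  bytesInFlight = true; }
};

// Same command/data order as the drawPixel listing above.
int drainWaitsForDrawPixel(uint16_t x, uint16_t y, uint16_t color) {
    PinTracker t;
    t.select();
    t.command(0x2A); t.data16(x); t.data16(x);   // CASET, XSTART, XEND
    t.command(0x2B); t.data16(y); t.data16(y);   // PASET, YSTART, YEND
    t.command(0x2C); t.data16(color);            // RAMWR, pixel color
    t.release();                                 // CS released after last bit
    return t.drainWaits;
}
```

With these assumptions the model gives five DC changes plus the final CS release, which matches the "at least 6 times" estimate.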

However, I still think there can be some pretty big wins, for example in things like fill rect. The code for this is more or less identical to draw pixel, except that we output two different X's and Y's instead of the same ones twice, and then after the ILI9341_RAMWR we call writedata16_cont(color) (w*h - 1) times followed by writedata16_last(color). So it would have the same number of IO pin state changes. Things like fill screen, fill rect, horizontal line, vertical line, write rect, text with a background color specified, ... should all be pretty fast. However, things that draw random lines, text with transparency, ... will not gain much speed over the standard ILI9341 driver.
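To put rough numbers on why fills should benefit, here is a small arithmetic sketch. It assumes the byte counts implied by the drawPixel listing (three command bytes plus eight address bytes of setup, then two bytes per pixel) and six waits per operation; the helper names are made up for illustration.

```cpp
#include <cstdint>

// Assumed setup traffic: CASET + 4 bytes, PASET + 4 bytes, RAMWR = 11 bytes.
constexpr uint32_t kSetupBytes = 11;
// Assumed number of times the bus must drain per operation (DC/CS changes).
constexpr uint32_t kDrainWaits = 6;

// Total bytes sent for a w x h fill (drawPixel is the 1 x 1 case).
constexpr uint32_t bytesForFill(uint32_t w, uint32_t h) {
    return kSetupBytes + 2u * w * h;
}

// Bytes streamed per forced gap (integer division); larger is better.
constexpr uint32_t bytesPerDrainWait(uint32_t w, uint32_t h) {
    return bytesForFill(w, h) / kDrainWaits;
}
```

A single pixel moves 13 bytes for its six gaps, while a full 240x320 screen fill moves 153611 bytes for the same six gaps, so the gaps stop mattering for large fills.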

Hope that makes sense?

Paul: in the ST7735 driver I have this done like:
Code:
void Adafruit_ST7735::writecommand(uint8_t c)
{
  *rsport &= ~rspinmask;
  *csport &= ~cspinmask;
  //Serial.print("C ");
  spiwrite(c);
  *csport |= cspinmask;
}
Which I personally don't like, as *csport &= ~cspinmask is a two-step process: read the current state of N pins (8), change the state of one of them, and write it back. If something like an interrupt changes the state of one of the 8 pins after I did the read but before the write, this code would stomp on it... I am pretty sure I can change this to keep two registers (set/clear) and make it atomic?
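The hazard and the set/clear alternative can be shown with a host-side simulation. This is a sketch under assumptions: SimPort is a made-up stand-in for a GPIO port, and the "interrupt" is simply a write placed between the read and the write-back. On the real part the set and clear registers would come from the core (I believe portSetRegister(pin) and portClearRegister(pin), but check the headers).

```cpp
#include <cstdint>

// Hypothetical model of one 8-bit GPIO port with separate set/clear registers.
struct SimPort {
    uint8_t out = 0;  // output latch
    void writeSet(uint8_t mask)   { out = static_cast<uint8_t>(out | mask); }
    void writeClear(uint8_t mask) { out = static_cast<uint8_t>(out & ~mask); }
};

// Read-modify-write with an "interrupt" landing between the read and the write.
uint8_t rmwWithInterrupt(uint8_t start, uint8_t csMask, uint8_t isrMask) {
    SimPort port;
    port.out = start;
    uint8_t snapshot = port.out;                            // main code reads the port
    port.writeSet(isrMask);                                 // interrupt sets another pin
    port.out = static_cast<uint8_t>(snapshot & ~csMask);    // stale value written back
    return port.out;
}

// Same interleaving, but the main code only writes a mask to the clear register.
uint8_t setClearWithInterrupt(uint8_t start, uint8_t csMask, uint8_t isrMask) {
    SimPort port;
    port.out = start;
    port.writeSet(isrMask);    // interrupt sets another pin
    port.writeClear(csMask);   // single store, no stale copy involved
    return port.out;
}
```

In the first function the pin set by the interrupt is lost; in the second it survives, because no copy of the port is ever held across the interrupt.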
 
Paul (and others),

One thing I have been wondering about with this driver as well as some others, is the usage of the SPI.transfer function. This function is lock step to do a transfer of one byte, wait for it to go out, get the response (MISO), and return it. So there is a delay for each byte...

With our display drivers, about 99% of the time, we are only doing output and don't care. I could be wrong, but I believe the SPI might be double buffered. That is when you write a byte into the SPI_DL register, it will write the byte to the output shift register and during this shift you can output another byte to the SPI_DL register, which will start as soon as the previous one completed...

So wish there was a clean way to use this: something like SPI.write(n), where the write function is sort of asynchronous. This probably would not produce as fast of an output as DMA or FIFO, but would probably still give a pretty good win...

Does this make sense?

Kurt
 
Thanks Kurt for the extra detail. I did some high level command I/O as code above on a tiny 1306 OLED driver - but didn't track the control lines for transfer. I'll be following.

SPI:: <3.1 hdw>------<LC (times TWO)>-----<3.1 soft>

It seems 3.1 Hardware SPI is one end and 3.1 Software SPI the other and the LC SPI ports are somewhere in the middle lacking the control line buffering but having a more useful fifo?
 
Thanks Paul,

But this is specific to KINETISK (not L). Note: there is an #ifdef at the start for some of the defines, but the whole file should probably be under the #ifdef.
For example:
Code:
inline void begin(uint8_t pin, uint32_t speed, uint32_t mode=SPI_MODE0) __attribute__((always_inline)) {
    		uint32_t p, ctar = speed;
		SIM_SCGC6 |= SIM_SCGC6_SPI0;

		KINETISK_SPI0.MCR = SPI_MCR_MSTR | SPI_MCR_MDIS | SPI_MCR_HALT | SPI_MCR_PCSIS(0x1F);
would not compile for the LC, as KINETISK_SPI0 would not be defined, let alone that the registers are different.

But all of this is specific to FIFO. What I am wondering about is getting better output by double buffering. In particular, I am wondering about making use of the SPTEF bit of SPI0_S (or SPI1_S) to keep the queue going as quickly as possible. From the PDF:
For an idle SPI, data written to DH:DL is transferred to the shifter almost immediately so that
SPTEF is set within two bus cycles, allowing a second set of data to be queued into the transmit buffer.
After completion of the transfer of the data in the shift register, the queued data from the transmit buffer
automatically moves to the shifter, and SPTEF is set to indicate that room exists for new data in the
transmit buffer. If no new data is waiting in the transmit buffer, SPTEF simply remains set and no data
moves from the buffer to the shifter.
With this, I would hope to keep the SPI running at reasonably full speed until we have to wait for it to finish in order to change CS/DC...
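A toy timing model may help show what the double buffering could buy. Everything here is an assumption for illustration: time is counted in SPI bit clocks, cpuTicksPerByte is a made-up per-byte software overhead, and the model only captures "one byte may wait in the transmit buffer while the shifter is busy", as the quoted text describes.

```cpp
#include <algorithm>
#include <cstdint>

constexpr uint32_t kBitsPerByte = 8;

// Lock-step (SPI.transfer style): prepare a byte, send it, wait for it to finish.
uint32_t lockStepTicks(uint32_t nBytes, uint32_t cpuTicksPerByte) {
    return nBytes * (cpuTicksPerByte + kBitsPerByte);
}

// Double-buffered: the CPU may queue one byte while the shifter is still busy.
uint32_t doubleBufferedTicks(uint32_t nBytes, uint32_t cpuTicksPerByte) {
    uint32_t cpuTime = 0;      // when the CPU finished its last write
    uint32_t bufferFree = 0;   // when the transmit buffer is empty (SPTEF set)
    uint32_t shifterFree = 0;  // when the shifter finishes its current byte
    for (uint32_t i = 0; i < nBytes; ++i) {
        uint32_t ready = cpuTime + cpuTicksPerByte;          // next byte prepared
        uint32_t writeTime = std::max(ready, bufferFree);    // spin until SPTEF
        uint32_t start = std::max(writeTime, shifterFree);   // buffer -> shifter
        bufferFree = start;                 // buffer empties as the byte moves on
        shifterFree = start + kBitsPerByte;
        cpuTime = writeTime;
    }
    return shifterFree;  // time at which the last bit has gone out
}
```

With an overhead of 4 ticks per byte, 100 bytes take 1200 ticks lock-step and 804 ticks double-buffered in this model. Real gains depend on the actual loop overhead and clock ratio, which the model does not know.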

I am currently hacking up a version of the library. First I will try to get it to work, and then try something like this. I am currently calling it ILI9341_TLC.
 
I have a version that compiles and runs. Still using SPI.transfer to do the work. Also don't have read functions yet.
Current times show:
Code:
Benchmark                Time (microseconds)
Screen fill              1255474
Text                     77467
Lines                    343117
Horiz/Vert Lines         104304
Rectangles (outline)     67377
Rectangles (filled)      2608050
Circles (filled)         443575
Circles (outline)        411272
Triangles (outline)      83748
Triangles (filled)       906829
Rounded rects (outline)  171138
Rounded rects (filled)   2863843
Done!

Now to start some more hacking

Quick update: changed compile option to 48mhz optimized and now:
Code:
Benchmark                Time (microseconds)
Screen fill              1043821
Text                     49261
Lines                    289459
Horiz/Vert Lines         84423
Rectangles (outline)     53741
Rectangles (filled)      2167990
Circles (filled)         309077
Circles (outline)        220932
Triangles (outline)      70596
Triangles (filled)       717201
Rounded rects (outline)  105228
Rounded rects (filled)   2353461
Done!
 
I did some hacking, trying to use the double buffering. Still not sure the best way to detect when the last IO bits are output, such that it is safe to change CS/DC, but I took a shot at it.
Here are the current timings.
Code:
Benchmark                Time (microseconds)
Screen fill              630488
Text                     48079
Lines                    288025
Horiz/Vert Lines         52418
Rectangles (outline)     34210
Rectangles (filled)      1312110
Circles (filled)         239618
Circles (outline)        239211
Triangles (outline)      66552
Triangles (filled)       470463
Rounded rects (outline)  95852
Rounded rects (filled)   1444815
Done!
Still does no input stuff. If anyone wishes to try it out. I have included a zip file. The Graphic test program is included in examples
 

Attachments

  • ILI9341_TLC.zip
    15.3 KB · Views: 214
The concepts could work on the T3.1, but not the code, as the registers are different. Also, the 3.1 has better support.

Next up is to add another constructor where you pass in the IO pins for SPI, to allow alternate setups including SPI1, where you can enable the FIFO. Again the concept will be the same, but detecting room on the queue and queue empty would change...
 
KurtE - 3.1 would indeed be better, but that SPI is claimed for RF24; I might use an I2C OLED. Having a working SPI2 on the 3.1 would be nice for debugging, thus that part of my interest. Until then I'll do an LC w/ILI9341 and debug with serial (or RF24 if the LC has SPI2 support?)

Paul - Nice choice for $8 PJRC Display! And seeing only good in the 1.6 IDE and 1.21 (b8)- I picked a good time to plug in my first Teensy!

When others get the LC the PJRC link above shows good install. For now this is a T3.1 (until LC arrives) Pulled updated Beta-8 ILI9341 files - cooled solder on PCB socket board while I compiled/uploaded - plugged wires in per drawing - powered and it came up running fine on USB first time! An hour+ and all okay.
ILI9341 Test!
Benchmark                Time (microseconds)
Screen fill              280100
Text                     15813
Lines                    73146
Horiz/Vert Lines         22942
Rectangles (outline)     14600
Rectangles (filled)      581644
Circles (filled)         85494
Circles (outline)        64601
Triangles (outline)      17686
Triangles (filled)       192092
Rounded rects (outline)  30763
Rounded rects (filled)   634334
Done!
* I didn't find 100ohm part for LED so I paralleled a 330&~160 and ended up with 120ohms and screen is bright from 5v VIN to LED - otherwise I used the drawing.
 
Note: I decided to put up (probably a short term) github repository for this: https://github.com/KurtE/ILI9341_tlc
Not sure what the longer term is. Maybe merge back into ILI9341_t3? Maybe on its own, maybe just use the standard Adafruit_ILI9341 driver? ...

defragster: As I mentioned, next up is to add a new constructor to allow alternate pins. I may also do that for the ILI9341_t3 driver to support moving SCK from 13 to 14...
In the case of the _tlc, I will add SPI1 support. First get it working with it, then maybe try the FIFO...
But regarding your comment, please note: I am about 95% sure the Teensy 3.1 does not have a second SPI port on it. So again this will only work with the TLC.

Note: some of this may take me awhile, due to some other issues, but...
 
Paul: side comments on SPI stuff. At times I wish that there were SPI objects that I could use to minimize code differences to support multiple busses.

Both at two levels: In code that supports both busses, it would be nice if everywhere I did not have to do things like:
Code:
if (fSPI1)
  SPI1.transfer(c);
else
  SPI.transfer(c);
instead somewhere be able to do something like:
Code:
MySPI = &SPI  or &SPI1
and then do:
  MySPI->transfer(c);
But they are two different classes. Maybe it is not worth it as these would all have to be virtual functions... But might be nice.
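One way to get the MySPI->transfer(c) style without a shared base class or virtual functions is a small wrapper holding an object pointer and a function pointer. The sketch below is host-only: FakeSpi0 and FakeSpi1 are made-up stand-ins for two unrelated SPI classes, and SpiBusRef is a hypothetical name, not something in the SPI library.

```cpp
#include <cstdint>

// Stand-ins for two unrelated SPI classes (hypothetical, host-only).
struct FakeSpi0 { uint8_t last = 0; uint8_t transfer(uint8_t b) { last = b; return 0xA0; } };
struct FakeSpi1 { uint8_t last = 0; uint8_t transfer(uint8_t b) { last = b; return 0xA1; } };

// Non-owning handle that forwards transfer() to whichever bus it was made from.
class SpiBusRef {
public:
    template <typename Bus>
    static SpiBusRef make(Bus &bus) {
        return SpiBusRef(&bus, &SpiBusRef::thunk<Bus>);
    }
    uint8_t transfer(uint8_t b) const { return transfer_(obj_, b); }

private:
    SpiBusRef(void *obj, uint8_t (*fn)(void *, uint8_t)) : obj_(obj), transfer_(fn) {}

    // One small forwarding function is generated per bus type.
    template <typename Bus>
    static uint8_t thunk(void *obj, uint8_t b) {
        return static_cast<Bus *>(obj)->transfer(b);
    }

    void *obj_;
    uint8_t (*transfer_)(void *, uint8_t);
};
```

Usage would be along the lines of SpiBusRef bus = SpiBusRef::make(SPI1); followed by bus.transfer(c);. It still costs an indirect call per byte, so it tidies the code rather than speeding it up.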

Likewise it would be nice if there was a class or structure defined for Hardware registers, such that instead of again:
Code:
if (fSPI1)
  sr = SPI1_S;
else
  sr = SPI0_S;
You could just address off of a pointer to start of the particular SPI registers. I should double check as there may be something like this already?
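The register-pointer idea might look like the sketch below. FakeSpiRegs is a reduced stand-in with only the two fields used here, not the real register layout, and the 0x20 value for SPTEF is my assumption; on hardware both would come from the core headers.

```cpp
#include <cstdint>

// Reduced stand-in for one SPI register block (hypothetical; real layout differs).
struct FakeSpiRegs {
    volatile uint8_t S;   // status register
    volatile uint8_t DL;  // data register, low byte
};

constexpr uint8_t kSptef = 0x20;  // assumed bit value of SPI_S_SPTEF

// Pick the register block once, e.g. in the constructor.
inline FakeSpiRegs *selectSpi(bool useSpi1, FakeSpiRegs *spi0, FakeSpiRegs *spi1) {
    return useSpi1 ? spi1 : spi0;
}

// After that, the code addresses everything off the pointer, with no fSPI1 test.
inline bool tryWrite(FakeSpiRegs *spi, uint8_t b) {
    if (!(spi->S & kSptef)) return false;  // transmit buffer still full
    spi->DL = b;
    return true;
}
```

The bus selection happens once, and every later status read or data write goes through the stored pointer.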
 
KurtE - Thanks for the T3.1 clarification, indeed only one SPI is declared on PJRC materials, I was wishfully misdirected by a prototype board I got. This board maker says "software SPI (“bit banging”) implementation if preferred" and wired pins to an adapter board header, that left me with a wrong impression about expected support. With that in mind I now see the Teensy 3.1 card (compared to LC card) was not showing DOUT0 and DOUT1 - but perhaps alternate pins for DOUT (and other SPI pins) in grayed text.

I pre-ordered two LC's on general principle and now I'll have a specific use I'll be looking forward to test and get working. BTW - the PJRC_ILI9341 running perfectly now 10 hours on - time to go transition my @onehorse 9DOF to use it.
 
Today I added the ability to specify the spi pins. It will fail, if not valid hardware spi pins. I tested with moving clk to 14. I updated my github with it. For the fun of it, made same changes for ILI9341_t3 driver, and did pull request...
 
Just pushed a new version up to github (TLC), that now supports SPI1...

Does not use FIFO yet. Still experimenting.

Probably next up is to allow reads to work...

Question will be: leave this separate or merge into ILI9341_T3
 
GetCursor() in ILI9341_T3: // KurtE dropping this here for your review/inclusion - LC speedup in usage bigger than the 3.1

If I had a Get cursor I could 'InvalidateRectangle' and then freshen it. Trivial library change with RAM only access to read stored private x and y. Very effective in use!

Converted @onehorse 9DOF sample code from Nokia5110 (2fps) to the ILI9341_T3 (20fps) and the screen takes all the output but flashes 20 times a second for updates.

Clearing the whole screen and updating 20fps is ugly and limits 9DOF updates to .65 kHz
Doing 6 rectangle overwrites before text output doing 20fps is smooth and 9DOF updates are at .96 kHz. [2fps is 1.82 kHz, 100fps is 0.36 kHz]

Code:
ILI9341_T3.h
	void getCursor(int16_t *x, int16_t *y);
Code:
ILI9341_T3.cpp
void ILI9341_t3::getCursor(int16_t *x, int16_t *y) {
  *x = cursor_x;
  *y = cursor_y;
}
Here is a Hardcode example that worked overnight along with 5 other rectangles
Code:
    int16_t cx, cy;
    tft.getCursor(&cx, &cy); tft.fillRect(cx, cy, 50, 24, ILI9341_BLUE);
    tft.println((int)(1000 * ax));
    tft.println((int)(1000 * ay));
    tft.println((int)(1000 * az));
    tft.println(" mg ");

* having the GetCursor() return 'uint8_t textsize' would allow any spot of the code to do the math to invalidate the right region with a single call. In the Hardcode example above I assumed the textsize in my rectangle math.
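The rectangle math could be wrapped in a helper like the one below. It is a sketch that assumes a 6x8 pixel character cell at text size 1, which is what the fillRect arithmetic in this thread uses; TextRect and textRect are made-up names, not part of the library.

```cpp
#include <cstdint>

// Region of the screen covered by a block of text.
struct TextRect {
    int16_t x, y, w, h;
};

// Assumed character cell at text size 1: 6 pixels wide, 8 pixels high.
constexpr int16_t kCellW = 6;
constexpr int16_t kCellH = 8;

// Rectangle to clear before printing `cols` x `rows` characters at the cursor.
constexpr TextRect textRect(int16_t cx, int16_t cy,
                            uint8_t cols, uint8_t rows, uint8_t textSize) {
    return TextRect{cx, cy,
                    static_cast<int16_t>(cols * kCellW * textSize),
                    static_cast<int16_t>(rows * kCellH * textSize)};
}
```

With the cursor read back through getCursor, a call such as textRect(cx, cy, 5, 3, 1) gives the 30 by 24 pixel area to pass to fillRect before printing three lines of five characters.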
 
Uploaded a version with getCursor (did not try it, but looked oK).

This version also has a first cut at readPixel, readRect and readRegister8. They appear to be really slow. I need to work on the double buffering and maybe FIFO for this.

Kurt
 
Cool - does that put it in Beta_10?
It is as simple as it looks - as long as valid pointers come in. Since the driver calls make library's tracking critical - they already do the math for it. Could do it with twin calls like width() and height(). Little overhead to return(textsize); though the app can track that more easily than (x,y), though driver return would eliminate the global needed for it.
 
Note: I only pushed up a version of my ILI9341_TLC that has the enhancement. My guess is this library will not be in Beta 10, as it is still WIP and I am not sure whether it should live on its own or be folded into ILI9341_T3 under an #ifdef...

I could also add it to my _t3 version and issue a Pull request for it. Not sure if it will get pulled in for the build or not. Already have pull request in that allows alternate SPI pins to be used.
 
I pushed up a Pull Request for ILI9341_t3 with this change. There are now two separate PRs on this library (trying to do the Arduino/MRAA methods), so Paul can pick and choose which changes to accept or not...
 
Thanks, Paul can hopefully include it with all the HubBub 1.6.1*LC*1.21 - I'm using it like this with my GLOBAL GTSz - BLUE shade for effect:
Code:
    tft.setTextSize(GTSz = 1);
    tft.getCursor(&cx, &cy); tft.fillRect(cx, cy, 5 * GTSz * 6, 3 * GTSz * 8, ILI9341_BLUE);
    { ... print 3 lines of 4 digits plus sign } 
    tft.setTextSize(GTSz = 2);
    tft.getCursor(&cx, &cy); tft.fillRect(cx, cy, 5 * GTSz * 6, 3 * GTSz * 8, ILI9341_BLUE);
    { ... print 3 lines of 4 digits plus sign }
 
I am still playing around with this. The version of the library I have up on github can do some reading of pixels/rectangles and the like, but it is not doing a good job trying to maximize the SPI bus usage.
The code looks like:
Code:
void ILI9341_TLC::readRect(int16_t x, int16_t y, int16_t w, int16_t h, uint16_t *pcolors) 
{
    // First quick and dirty version. 
	uint8_t r,g,b;
    uint16_t c = w * h;
	spiBegin();

	setAddr(x, y, x+w-1, y+h-1);
	writecommand_cont(ILI9341_RAMRD); // read from RAM
	r = transferdata8_cont(0);	// Read a DUMMY byte of GRAM

    while (c--) {
        r = transferdata8_cont(0);		// Read a RED byte of GRAM
        g = transferdata8_cont(0);		// Read a GREEN byte of GRAM
        if (c)
            b = transferdata8_cont(0);		// Read a BLUE byte of GRAM
        else
            b = transferdata8_last(0);		// Read a BLUE byte of GRAM
        *pcolors++ = color565(r,g,b);
    }

	spiEnd();
}
The transfer functions are like the write functions, in that they output the character to SPI, wait until a character is ready to be read, and read it... Internally they call:
Code:
    uint8_t transferSPIByte(uint8_t val) {
        uint32_t sr;
        do {
            sr = _pKSPI->S;
    		if ((_pKSPI->S & SPI_S_SPRF)) 
                uint32_t tmp __attribute__((unused)) = _pKSPI->DL;
		} while (!(sr & SPI_S_SPTEF)) ; // room for byte to output.
		_pKSPI->DL = val;

    	while (!(_pKSPI->S & SPI_S_SPRF)) ; // wait until a received byte is available to read.
        fByteOutput = 0;     // we are not waiting for any output to change...
        return _pKSPI->DL;   // get the byte... 
    }
So the above code always has a delay between bytes, as we wait until the byte has been returned, process it, and then submit the next one.

What I am trying to do is to keep the transmit buffer full.

The variation I am currently playing with:
Code:
void ILI9341_TLC::readRect(int16_t x, int16_t y, int16_t w, int16_t h, uint16_t *pcolors) 
{
    // First quick and dirty version. 
	uint8_t ab[4];      // buffer to read in colors for one pixel.
    int8_t ib;         // index into the buffer; 
    uint16_t c = w * h;
    uint16_t dontHang;
	spiBegin();

	setAddr(x, y, x+w-1, y+h-1);
   	writecommand_cont(ILI9341_RAMRD); // read from RAM
    dcHigh();
    // We will need to transfer c*3+1 bytes
    uint16_t cWrite = c*3+1;   // see how many bytes we need to write
    uint16_t cRead = cWrite + 0; // keep count of how many bytes we have read. 
    volatile uint8_t sr;
    ib = 0;    // we will ignore first N bytes returned. 
    dontHang = DONTHANGCOUNT;
    while (cRead && dontHang--) {
        sr = _pKSPI->S;
        if ((sr) & SPI_S_SPRF) {
            // we have a byte; 
                ab[ib++] = _pKSPI->DL; // read in the byte;
            cRead--;
            dontHang = DONTHANGCOUNT;
        }

        if (cWrite) {
            if ((sr) & SPI_S_SPTEF) {
                _pKSPI->DL = 0;     // push a 0 to start next transfer
                cWrite--;
            }    
        }
        if (ib == 4) {
            // we have a pixel so lets build it 
            *pcolors++ = color565(ab[1],ab[2],ab[3]);
            ib = 1;
        }

    }  
    if (cRead)
        digitalWriteFast(4, !digitalReadFast(4));
    csHigh();

	spiEnd();
}
It figures out how many bytes need to be output. My assumption from the documents is that the SPI data registers (DL, and DH in 16-bit mode) get moved into a shift register for output. As soon as the data is moved into the shift register, SPTEF will be set? Once a byte completes being output, the system will then automatically move the data from DL into the shift register; likewise the data received from MISO will be moved into the receive buffer and the SPRF flag will be set. This somewhat works, but there is some nuance that I am probably overlooking, as a large percentage of the time my timeout (keep from hanging) code executes and only some percentage of the bytes are transferred.... Still trying to figure this out. Then maybe try turning on the FIFO.

Some questions I have about fifo include, can I detect when the last bits have transferred as to be able to closely control CS/DC...

Kurt
 
I want to get back to this.

Has anyone else done much work understanding how the SPI status register states work (when DMA and FIFO is turned off)?
 
Run time check SPI0 vs 1

Code:
Sketch uses 39,524 bytes (62%) of program storage space. Maximum is 63,488 bytes.
Global variables use 4,424 bytes (54%) of dynamic memory, leaving 3,768 bytes for local variables. Maximum is 8,192 bytes.

Code:
ILI9341 Test!
Display Power Mode: 0x9C
MADCTL Mode: 0x48
Pixel Format: 0x5
Image Format: 0x9C
Self Diagnostic: 0xC0
Benchmark                Time (microseconds)
Screen fill              563658
Text                     75284
Lines                    349683
Horiz/Vert Lines         49492
Rectangles (outline)     32890
Rectangles (filled)      1171236
Circles (filled)         335826
Circles (outline)        449034
Triangles (outline)      79842
Triangles (filled)       489153
Rounded rects (outline)  157372
Rounded rects (filled)   1328104
Done!

Now for the fun of it, instead of doing run time checking if I am on SPI0 or SPI1, I try to compile the code in for one or the other. With first pass at this, I see:
Code size:
Code:
Sketch uses 38,964 bytes (61%) of program storage space. Maximum is 63,488 bytes.
Global variables use 4,412 bytes (53%) of dynamic memory, leaving 3,780 bytes for local variables. Maximum is 8,192 bytes.
Timings as expected are better!
Code:
Benchmark                Time (microseconds)
Screen fill              467235
Text                     70512
Lines                    322588
Horiz/Vert Lines         41521
Rectangles (outline)     28878
Rectangles (filled)      971307
Circles (filled)         303236
Circles (outline)        421385
Triangles (outline)      73314
Triangles (filled)       421744
Rounded rects (outline)  145550
Rounded rects (filled)   1109212
Done!

So I obviously think this is the way to go. But question is how best to do it? That is how best to setup objects that can compile for SPI0 or SPI1, without run time checks.
a) Other IDE's would simply use a project parameter -DUSE_SPI1 that gets passed through and compiles specifically for some project.
b) Put all of the code in header file. Have done this before, but I think it is not the optimal solution...

c) If compiling for the same functionality (for example, not using the FIFO), I could (and did) keep a pointer to the KINETISL SPI object and use that for all references to the SPI registers. However, I still needed run time checking to know whether I should call SPI.beginTransaction or SPI1.beginTransaction (likewise endTransaction). Since the SPI and SPI1 objects are not from the same class or a base class with virtual functions, I cannot hold on to a pointer to the object. I could hack it and have my objects hold a pointer to the function to call for begin and end (do my own logical VTable).

c1) But I am wanting to use FIFO on spi1, so functions change. Example checking if I can output a byte(or word) to SPI. Without FIFO you look at the SPI_S_SPTEF status bit (Transmit Empty flag) but with FIFO, you look at SPI_S_TXFULLF (TX Fifo full) and the case of the if is different...

d) One option is to simply create two distinct classes for this ILI9341_TLC_S0 and ILI9341_TLC_S1. This is probably the cleanest way for end users, but sort of a pain...
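A variation on (d) that avoids maintaining two copies of the source is a class template parameterised on the bus, with the FIFO versus non-FIFO check resolved at compile time. The sketch below is host-only and hypothetical throughout: FakeRegs, BusTraits and DisplayT are made-up names, and the status bit values (0x20 for SPTEF, 0x02 for TXFULLF) are assumptions to verify against the headers.

```cpp
#include <cstdint>

// Reduced stand-in for an SPI register block (hypothetical).
struct FakeRegs {
    volatile uint8_t S;
    volatile uint8_t DL;
};

// Per-bus policy: how to tell whether another byte can be written.
template <int BUS> struct BusTraits;

template <> struct BusTraits<0> {   // no FIFO: SPTEF set means room
    static bool canWrite(uint8_t status) { return (status & 0x20) != 0; }
};
template <> struct BusTraits<1> {   // FIFO: TXFULLF clear means room
    static bool canWrite(uint8_t status) { return (status & 0x02) == 0; }
};

// One source, two concrete classes; no run time bus check in the hot path.
template <int BUS>
class DisplayT {
public:
    explicit DisplayT(FakeRegs &regs) : regs_(regs) {}
    bool tryWrite(uint8_t b) {
        if (!BusTraits<BUS>::canWrite(regs_.S)) return false;
        regs_.DL = b;
        return true;
    }
private:
    FakeRegs &regs_;
};

using Display_S0 = DisplayT<0>;
using Display_S1 = DisplayT<1>;
```

End users would still see two class names, as in (d), but the drawing code would be written once. The usual cost is that template code has to live in the header, which overlaps with the drawback noted for (b).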

Thoughts?

P.S. - both versions of the code are up in the github project (https://github.com/KurtE/ILI9341_tlc/) under two different branches
 
Run time check SPI0 vs 1
<snip>
So I obviously think this is the way to go. But question is how best to do it? That is how best to setup objects that can compile for SPI0 or SPI1, without run time checks.
<snip>
Thoughts?

Couldn't entries be made in boards.txt that reflect the specific compiler flags you want to set and then you would select that "board" from the drop down menu? I admit that I'm not entirely familiar with what the limitations of that file are. But speed and optimization flags can be set there.

Matt
 