You should see my desk when it is cluttered
Nice, now I don't feel so bad about my clean desks
Only timing tests were with the graphics test sketch:
Benchmark | Micromod SPI (microseconds) | MM Parallel (microseconds) |
Screen fill | 615251 | 2592 |
Text | 10915 | 430 |
Lines | 154798 | 4847 |
Horiz/Vert Lines | 51142 | 1092 |
Rectangles (outline) | 28386 | 413 |
Rectangles (filled) | 1487087 | 6436 |
Circles (filled) | 177501 | 2558 |
Circles (outline) | 103593 | 1827 |
Triangles (outline) | 33562 | 932 |
Triangles (filled) | 487144 | 2105 |
Rounded rects (outline) | 54398 | 838 |
Rounded rects (filled) | 1626225 | 12433 |
Screen fill: 615251/2592 = 237.4 times faster
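The same ratio can be worked out for every row, not just the screen fill. Below is a minimal sketch in plain C++ that runs on a desktop compiler. The struct and function names are made up for illustration, and the numbers are copied from the Micromod SPI and MM Parallel results above.

```cpp
#include <cstdint>
#include <cstdio>

// Benchmark times in microseconds, copied from the results above.
struct BenchmarkRow {
    const char *name;
    uint32_t spiMicros;       // Micromod SPI
    uint32_t parallelMicros;  // MM Parallel
};

static const BenchmarkRow kRows[] = {
    {"Screen fill", 615251, 2592},
    {"Text", 10915, 430},
    {"Lines", 154798, 4847},
    {"Horiz/Vert Lines", 51142, 1092},
    {"Rectangles (outline)", 28386, 413},
    {"Rectangles (filled)", 1487087, 6436},
    {"Circles (filled)", 177501, 2558},
    {"Circles (outline)", 103593, 1827},
    {"Triangles (outline)", 33562, 932},
    {"Triangles (filled)", 487144, 2105},
    {"Rounded rects (outline)", 54398, 838},
    {"Rounded rects (filled)", 1626225, 12433},
};

// Ratio of SPI time to parallel time: how many times faster parallel is.
double speedup(const BenchmarkRow &row) {
    return static_cast<double>(row.spiMicros) / row.parallelMicros;
}

// Print one line per benchmark with its speedup factor.
void printSpeedups() {
    for (const BenchmarkRow &row : kRows) {
        std::printf("%-26s %8.1fx\n", row.name, speedup(row));
    }
}
```

Calling printSpeedups() from main() prints one factor per benchmark. Keep in mind the two columns may not be measuring exactly the same work if a frame buffer is in use on one side.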
Note: mine appear to be a bit different:
Device Status: F4530400
Order: BGR
interface pixel format: 16 bit
Benchmark Time (microseconds)
Screen fill 138255
Text 7357
Lines 213504
Horiz/Vert Lines 11567
Circles (filled) 76909
Circles (outline) 91726
Rectangles (outline) 6867
Rectangles (filled) 334102
Triangles (outline) 40666
Triangles (filled) 122419
Rounded rects (outline) 33574
Rounded rects (filled) 372867
Done!
ILI948x_t4x_p::MulBeatWR_nPrm_DMA(2c, 60001670, 153600
unsigned long testFillScreen() {
  unsigned long start = micros();
  lcd.fillScreen(ILI9488_BLACK);
  lcd.fillScreen(ILI9488_RED);
  lcd.fillScreen(ILI9488_GREEN);
  lcd.fillScreen(ILI9488_BLUE);
  lcd.fillScreen(ILI9488_BLACK);
  return micros() - start;
}
Benchmark Time (microseconds)
Screen fill 138254
Text 7652
Lines 217629
Horiz/Vert Lines 11572
Circles (filled) 77834
Circles (outline) 94868
Rectangles (outline) 6886
Rectangles (filled) 334115
Triangles (outline) 41759
Triangles (filled) 123022
Rounded rects (outline) 34875
Rounded rects (filled) 373094
Done!
We fill the screen 5 times... Now if you have the frame buffer turned on, this function will not update the screen at all; any update is done outside of it.
If you put an updateScreen() call just before the return, the timing will include only one screen update. The rest of the time is simply how long it takes to write to memory.
Screen fill 169861
Text 449
Lines 5589
Horiz/Vert Lines 1587
Rectangles (outline) 409
Rectangles (filled) 5786
Circles (filled) 2761
Circles (outline) 2139
Triangles (outline) 1083
Triangles (filled) 2559
Rounded rects (outline) 931
Rounded rects (filled) 11922
Done!
unsigned long testFillScreen() {
  unsigned long start = micros();
  lcd.fillScreen(ILI9488_BLACK);
  lcd.updateScreen();
  lcd.fillScreen(ILI9488_RED);
  lcd.updateScreen();
  lcd.updateScreen();
  lcd.fillScreen(ILI9488_GREEN);
  lcd.updateScreen();
  lcd.fillScreen(ILI9488_BLUE);
  lcd.updateScreen();
  lcd.fillScreen(ILI9488_BLACK);
  lcd.updateScreen();
  return (micros() - start);
}
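To make the frame buffer point concrete, here is a rough host-side model. MockDisplay is a hypothetical stand-in, not the real library class. It only counts how often pixel data would actually go out over the bus, assuming that fillScreen() touches memory only and updateScreen() is the only call that transfers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a frame-buffered display driver (not the real library class).
class MockDisplay {
public:
    MockDisplay(int width, int height)
        : frame_(static_cast<size_t>(width) * static_cast<size_t>(height), 0) {}

    // With the frame buffer on, a fill only writes to memory.
    void fillScreen(uint16_t color) {
        for (uint16_t &pixel : frame_) pixel = color;
    }

    // Only this call would push the buffer out to the panel.
    void updateScreen() { ++busTransfers_; }

    int busTransfers() const { return busTransfers_; }

private:
    std::vector<uint16_t> frame_;
    int busTransfers_ = 0;
};

// Five fills, no update: nothing reaches the panel while the timer runs.
int transfersWithoutUpdate() {
    MockDisplay lcd(480, 320);
    for (int i = 0; i < 5; ++i) lcd.fillScreen(static_cast<uint16_t>(i));
    return lcd.busTransfers();
}

// Five fills, each followed by an update: five real transfers get timed.
int transfersWithUpdate() {
    MockDisplay lcd(480, 320);
    for (int i = 0; i < 5; ++i) {
        lcd.fillScreen(static_cast<uint16_t>(i));
        lcd.updateScreen();
    }
    return lcd.busTransfers();
}
```

Under that assumption the first variant times memory writes only, which would explain very small numbers that say little about the bus speed.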
Thanks, we fixed a few issues yesterday with the MMOD, like needing to flush the cache before doing DMA...
@KurtE has been working on async updates to screen buffering on the Teensy MM and T41. So far MM seems to be working.
Thanks, that is the library I mentioned in the previous post.
@KurtE check out my T4.1 library (first post in this thread)
It should have interrupts implemented for the T4.1 if my memory serves right
With 8 shifters it is close, although it feels like the alignment is off.
The T41 library works sort of, but found it breaks down if you try for example to change from using 8 shifters to 4.
FASTRUN void RA8876_t3::flexIRQ_Callback(){
  if (p->TIMSTAT & (1 << TIMER_IRQ)) { // interrupt from end of burst
    p->TIMSTAT = (1 << TIMER_IRQ); // clear timer interrupt signal
    bursts_to_complete--;
    if (bursts_to_complete == 0) {
      p->TIMIEN &= ~(1 << TIMER_IRQ); // disable timer interrupt
      asm("dsb");
      WR_IRQTransferDone = true;
      CSHigh();
      _onCompleteCB();
      return;
    }
  }
  if (p->SHIFTSTAT & (1 << SHIFTER_IRQ)) { // interrupt from empty shifter buffer
    // note, the interrupt signal is cleared automatically when writing data to the shifter buffers
    if (bytes_remaining == 0) { // just started final burst, no data to load
      p->SHIFTSIEN &= ~(1 << SHIFTER_IRQ); // disable shifter interrupt signal
    } else if (bytes_remaining < BYTES_PER_BURST) { // just started second-to-last burst, load data for final burst
      uint8_t beats = bytes_remaining / BYTES_PER_BEAT;
      p->TIMCMP[0] = ((beats * 2U - 1) << 8) | (_baud_div / 2U - 1); // takes effect on final burst
      readPtr = finalBurstBuffer;
      bytes_remaining = 0;
      for (int i = 0; i < SHIFTNUM; i++) {
        uint32_t data = *readPtr++;
        p->SHIFTBUFHWS[i] = ((data >> 16) & 0xFFFF) | ((data << 16) & 0xFFFF0000);
        while(0 == (p->SHIFTSTAT & (1U << SHIFTER_IRQ))) {}
      }
    } else {
      bytes_remaining -= BYTES_PER_BURST;
      for (int i = 0; i < SHIFTNUM; i++) {
        uint32_t data = *readPtr++;
        p->SHIFTBUFHWS[i] = ((data >> 16) & 0xFFFF) | ((data << 16) & 0xFFFF0000);
        // Add this line: it is missing in the original code
        while(0 == (p->SHIFTSTAT & (1U << SHIFTER_IRQ))) {}
      }
    }
  }
  asm("dsb");
}
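The TIMCMP line in the callback packs two fields into one register value. Here is a small host-side sketch of that arithmetic. The function name and the sample values are made up; only the expression itself comes from the code above. My reading is that the upper byte counts clock edges for the final burst and the lower byte carries the baud divider, but treat that interpretation as an assumption.

```cpp
#include <cstdint>

// Mirrors the expression used in the callback:
//   ((beats * 2U - 1) << 8) | (_baud_div / 2U - 1)
// bytesRemaining, bytesPerBeat and baudDiv are illustrative parameters,
// standing in for bytes_remaining, BYTES_PER_BEAT and _baud_div.
uint32_t finalBurstTimcmp(uint32_t bytesRemaining, uint32_t bytesPerBeat, uint32_t baudDiv) {
    uint32_t beats = bytesRemaining / bytesPerBeat;  // beats left for the final burst
    uint32_t upper = beats * 2U - 1U;                // goes into bits 15..8
    uint32_t lower = baudDiv / 2U - 1U;              // goes into bits 7..0
    return (upper << 8) | lower;
}
```

For example, with 8 bytes left, 1 byte per beat and a divider of 20, the result is (15 << 8) | 9 = 0x0F09.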
So that was with DMA and not Async? I was using pushPixels16BitAsync() from the ILI948x_t41_p library when I encountered the problem.
I now have it working.
The issue is the order we are filling in the shift buffers. If you look at the DMA code, there is some interesting looking stuff:
Code:
sourceAddress = (uint16_t *)value + minorLoopBytes / sizeof(uint16_t) - 1; // last 16bit address within current minor loop
sourceAddressOffset = -sizeof(uint16_t); // read values in reverse order
minorLoopOffset = 2 * minorLoopBytes; // source address offset at end of minor loop to advance to next minor loop
sourceAddressLastOffset = minorLoopOffset - TotalSize; // source address offset at completion to reset to beginning
destinationAddress = (uint32_t *)&p->SHIFTBUFBYS[SHIFTNUM - 1]; // last 32bit shifter address (with reverse byte order)
destinationAddressOffset = -sizeof(uint32_t); // write words in reverse order
destinationAddressLastOffset = 0;
In the minor loop, the first item it fills is: SHIFTBUFBYS[SHIFTNUM - 1];
So I now have that portion of code in the interrupt case like:
Code:
if (p->SHIFTSTAT & (1 << SHIFTER_IRQ)) { // interrupt from empty shifter buffer
  DBGWrite('S');
  // note, the interrupt signal is cleared automatically when writing data to the shifter buffers
  if (bytes_remaining == 0) { // just started final burst, no data to load
    p->SHIFTSIEN &= ~(1 << SHIFTER_IRQ); // disable shifter interrupt signal
  } else if (bytes_remaining < BYTES_PER_BURST) { // just started second-to-last burst, load data for final burst
    uint8_t beats = bytes_remaining / BYTES_PER_BEAT;
    p->TIMCMP[0] = ((beats * 2U - 1) << 8) | (_baud_div / 2U - 1); // takes effect on final burst
    readPtr = finalBurstBuffer;
    bytes_remaining = 0;
    for (int i = SHIFTNUM - 1; i >= 0; i--) {
      digitalToggleFast(3);
      uint32_t data = readPtr[i];
      p->SHIFTBUFBYS[i] = ((data >> 16) & 0xFFFF) | ((data << 16) & 0xFFFF0000);
    }
  } else {
    bytes_remaining -= BYTES_PER_BURST;
    // try filling in reverse order
    for (int i = SHIFTNUM - 1; i >= 0; i--) {
      digitalToggleFast(3);
      uint32_t data = readPtr[i];
      p->SHIFTBUFBYS[i] = ((data >> 16) & 0xFFFF) | ((data << 16) & 0xFFFF0000);
    }
    readPtr += SHIFTNUM;
  }
  if (bytes_remaining == 0) {
    DBGWrite('L');
    p->SHIFTSIEN &= ~(1 << SHIFTER_IRQ);
  }
}
And everything started to work.
There were a few other things to fix, plus turn off debug code, but all of that is now checked into the t41_async branch...
I am thinking about PRing it back into our master branch.
Also maybe at some point, may have to try out my new shiny unused:
View attachment 34766
Side note: This line threw me for a loop:
Code:
p->SHIFTBUFHWS[i] = ((data >> 16) & 0xFFFF) | ((data << 16) & 0xFFFF0000);
If I am reading it correctly: you reverse the words that you pass in to the SHIFTBUFHWS which reverses the words?
If so you might try:
Code:
p->SHIFTBUF[i] = data;
Definitely time to merge.
Next, wire up your LCD - use Teensy pins:
pin 21 - WR
pin 20 - RD
pin 19 - D0
pin 18 - D1
pin 14 - D2
pin 15 - D3
pin 17 - D4
pin 16 - D5
pin 22 - D6
pin 23 - D7
ILI948x_t40_p lcd = ILI948x_t40_p(10, 8, 9); //(dc, cs, rst)
ILI9488 Initialized
CMD: 0x4, SHIFT: 0x4
Dummy 0x0, data 0x0
Manufacturer ID: 0x00
CMD: 0xB, SHIFT: 0xB
Dummy 0x0, data 0x0
MADCTL Mode: 0x00
CMD: 0xC, SHIFT: 0xC
Dummy 0x0, data 0x0
Pixel Format: 0x00
CMD: 0xD, SHIFT: 0xD
Dummy 0x0, data 0x0
Image Format: 0x00
CMD: 0xF, SHIFT: 0xF
Dummy 0x0, data 0x0
Self Diagnostic: Failed (0x00)
Thanks for the input. That makes it a lot more clear. Will have to experiment...
With this one:
Code:
p->SHIFTBUFBYS[i] = ((data >> 16) & 0xFFFF) | ((data << 16) & 0xFFFF0000);
if we say that data is in byte order: 0 1 2 3
The shifts and ands and ors put them into the order: 2 3 0 1
and then the SHIFTBUFBYS: into the order 1 0 3 2
So the words are still in the same order but the two bytes are swapped in each half word...
Where this one:
Code:
p->SHIFTBUFHWS[i] = ((data >> 16) & 0xFFFF) | ((data << 16) & 0xFFFF0000);
you do the shift ands and ors and have: 2 3 0 1
And then SHIFTBUFHWS I believe swaps the two words again, so you are back to: 0 1 2 3
Which I believe would give you the same results as:
Code:
p->SHIFTBUF[i] = data;
But could easily be wrong. I believe it has happened before.
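The byte-order reasoning can be checked on a desktop with a small model. This is only a sketch: it assumes that a write through SHIFTBUFHWS swaps the two half words and that a write through SHIFTBUFBYS reverses all four bytes, which matches the orders listed above but has not been verified against hardware here. The function names are made up.

```cpp
#include <cstdint>

// The shift/and/or expression from the thread: swap the two 16-bit halves.
constexpr uint32_t swapHalfWords(uint32_t v) {
    return ((v >> 16) & 0xFFFFu) | ((v << 16) & 0xFFFF0000u);
}

// Reverse all four bytes; assumed model of what a SHIFTBUFBYS write does.
constexpr uint32_t reverseBytes(uint32_t v) {
    return ((v >> 24) & 0xFFu) | ((v >> 8) & 0xFF00u) |
           ((v << 8) & 0xFF0000u) | ((v << 24) & 0xFF000000u);
}

// Assumed effective SHIFTBUF contents after each style of write.
constexpr uint32_t viaShiftBufHws(uint32_t data) { return swapHalfWords(swapHalfWords(data)); }
constexpr uint32_t viaShiftBufBys(uint32_t data) { return reverseBytes(swapHalfWords(data)); }

// Bytes labelled 0 1 2 3 from most significant to least significant.
static_assert(swapHalfWords(0x00010203u) == 0x02030001u, "manual swap gives 2 3 0 1");
static_assert(viaShiftBufBys(0x00010203u) == 0x01000302u, "BYS path gives 1 0 3 2");
static_assert(viaShiftBufHws(0x00010203u) == 0x00010203u, "HWS path is back to 0 1 2 3");
```

If the model holds, the manual swap followed by SHIFTBUFHWS cancels out, which is why a plain write to SHIFTBUF should give the same result.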
You might try the way I did it, filling the buffers in reverse order (in my case 3 2 1 0).
I believe what happens is that if you fill in 0 first, it might trigger the timer to start, and then you might be in
a race with it shifting that 0th item into the output register while buffer 1 is not loaded yet, and funny things happen...
But again, just a guess
@mjs513 I’m not sure I ever got the 4.0 version working, although if the 4.1 version works and supports an 8 bit bus, it should work on the 4.0 as well.
Thanks. @KurtE mentioned that you need 8 consecutive FlexIO pins. Looking at the T4, it is not possible to get there from here. The T4 does not have 8 consecutive pins. The pins you have:
#define DISPLAY_D0 19 // FlexIO3: 0/ 19
#define DISPLAY_D1 18 // FlexIO3: 1/ 18
#define DISPLAY_D2 14 // FlexIO3: 2/ 14
#define DISPLAY_D3 15 // FlexIO3: 3/ 15
#define DISPLAY_D4 17 // FlexIO3: 6/ 17
#define DISPLAY_D5 16 // FlexIO3: 7/ 16
#define DISPLAY_D6 22 // FlexIO3: 8/ 22
#define DISPLAY_D7 23 // FlexIO3: 9/ 23
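The gap is easy to see if you look only at the FlexIO3 pin indices in those comments. A quick host-side check, with helper names made up for illustration and the indices copied from the defines above:

```cpp
#include <cstddef>
#include <cstdint>

// FlexIO3 pin indices for D0..D7, taken from the DISPLAY_Dx comments above.
static const uint8_t kFlexioPins[8] = {0, 1, 2, 3, 6, 7, 8, 9};

// True when each index is exactly one more than the previous one.
bool areConsecutive(const uint8_t *pins, size_t count) {
    for (size_t i = 1; i < count; ++i) {
        if (pins[i] != pins[i - 1] + 1) return false;
    }
    return true;
}

// Length of the longest run of consecutive indices.
size_t longestRun(const uint8_t *pins, size_t count) {
    if (count == 0) return 0;
    size_t best = 1;
    size_t run = 1;
    for (size_t i = 1; i < count; ++i) {
        run = (pins[i] == pins[i - 1] + 1) ? run + 1 : 1;
        if (run > best) best = run;
    }
    return best;
}
```

With these indices areConsecutive() is false and the longest run is 4 (0 to 3, then 6 to 9), so an 8-bit bus cannot sit on 8 consecutive FlexIO3 pins with this wiring.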