RA8876LiteTeensy For Teensy T36 and T40

@MorganS - thanks for the OSH link! I was building a cart to order and that might get done in the coming days. Please post if it updates from V4.1. I'll aim for T_4.1 usage, given that SD/USBHost are already wired there, plus the onboard QSPI RAM.

What does this do for power - just from USB? The 7" display I got is spec'd at 600 mA - so @mjs513 suggested external power was needed.
Also - would it be good to have a pullup on the LCD_CS line? Paul notes that here : pjrc.com/better-spi-bus-design-in-3-steps/

Ordered a few SFun Qwiic devices in Jan - looks like I got 4 of the connectors too.
 
Here is the board you are dreaming of... https://oshpark.com/shared_projects/nYL18hj4

This connects a Teensy 3.x or 4.x to any one of the BuyDisplay displays - either RA8875 or RA8876. It connects the I2C bus for a capacitive touchscreen. (If you have a resistive touchscreen, then it's probably accessible through the RAiO chip's interface.) If you use one of the "big" Teensys, they will overhang the board.

I did not put a lot of connections on the board. The goal was to make it small. I put some flat-cable (FFC) connectors on to add some satellite boards which hold buttons and encoder knobs. Those boards are not published publicly yet. Because I wanted to experiment, I've put a FFC connector on to hook up the T4.0 SD connector. You can put an SD card on this board or you can FFC over to the BuyDisplay board's SD card FFC. (Note you can't directly plug a T4.0 FFC to the display board as the pins are different.) For the T4, this gives you access to a second SPI port, if you didn't want to use a memory card but you did want another SPI input.

Then, because there was space, I put footprints on for two QWIIC (SparkFun) connectors. Add sensors or knobs, buttons or even a keypad that way. Or use long breadboard headers on the Teensy and hang jumper wires on the back to connect to whatever you need.

Attached is a PDF schematic so you know where all the pins go to.

I'm going to look into using the T4 parallel bus. I would love to get 8 or 16 bits flowing at 30-50MHz. That's enough for real-time video. Unfortunately the Teensy 4.1 doesn't expose the correct pins for the NXP chip's native LCD interface.

This is great. Are you planning a resistive touchscreen variant of this board? Or do I need to buy another display with a capacitive touchscreen?
TOUCH SCREEN (Resistive)
-------------------------------------------------------------
Pin 31 /CS1 --------------------------------> Pin 32 TP_/CS
Pin 32 SCK2 --------------------------------> Pin 35 TP_SLCK
Pin 0 MOSI1 --------------------------------> Pin 34 TP_DIN
Pin 1 MISO1 --------------------------------> Pin 36 TP_DOUT
Pin 24 TSINT <-------------------------------- Pin 33 TS_PEN
--------------------------------------------------------------
Just what I was wanting:)
 
@WWatson I mostly work with RA8875, so I'm used to using its "internal" touchscreen controller for a resistive touch panel (RTP.)

I already wired Teensy pin 0 to pin 33 on the connector for the CTP. If you want that to go somewhere else, then you'll have to scratch through that trace and solder a jumper. A couple more jumpers soldered on the "back" of this board should be pretty reliable because they are so short. I looked at making the wiring compatible with the Teensy 3.6 additional SPI ports but since SCK1 is all the way down the end of the board, reaching that one pin would have almost doubled the length of my board. You have to use a jumper. (And on a Teensy 4, the 3rd SPI port available on the SD card connector is somehow on the "wrong" pins for an SPI-mode SD card.)

Since there's probably not a lot of data going through the RTP SPI interface, I would recommend you switch to a "software SPI" and then you only need to jumper the two additional pins required. This locks up your I2C pins, unfortunately.

@defragster, I am using USB for power. My power meter shows 220mA for a Teensy 3.2 and a 7" RA8875 screen. Running a Teensy 4.0 and a 7" RA8876 screen, I'm showing 570mA. I have yet to hook up a 10" screen to test.

Why would you pull up CS? It's an output pin. The Teensy drives it high at full power. It never needs to be tri-stated or allowed to float. Paul's "Simple Workaround" is the correct way to do it, without pullups.

In fact, I debated whether to put I2C pullups on my board as the displays I have with the capacitive touchscreen seem to have 10K pullups already. 10K is usually, but not always, enough pullup for I2C.

I ordered a set of boards for myself on the standard OSH Park service, so they will arrive in a week or two. Version 4.0 of this board is already at work on my workbench.

I'll take requests (including WWatson's) for a larger version of this board that will utilise a Teensy 3.6 or 4.1. It doesn't cost me much to modify the design and publish it on OSH Park if I don't have to ship boards to myself. What else do you want on such a board? CAN bus would be great to use this as an automotive display.
 
... now I see the discussion that fell out of this thread and into the T4.1 beta thread.

I just got my T4.0 + RA8876 display working with the "redo" library. My first attempt did not work but after updating my code with that SPI.transfer16() code it is working fine. (I broke something.)

I turned the SPI speed up to 80MHz and it stopped working. 75MHz does work.

The speedup from 7.5MHz to 75MHz is not dramatic. It goes from 5000 rectangles/second to 8500 per second. SPI is not the real bottleneck.
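
Those two measurements are enough for a rough back-of-envelope split between clock-dependent time and everything else. The sketch below assumes the time per rectangle is simply `fixed + wire / f`. That model and all the names in it are my own assumptions for illustration; nothing here was measured beyond the two rates quoted above.

```cpp
#include <cassert>

// Two-point fit of t(f) = fixed_us + wire_us_mhz / f, where t is the time per
// rectangle in microseconds and f is the SPI clock in MHz. This linear model
// is an assumption for illustration, not something verified on the hardware.
struct SpiCostModel {
    double fixed_us;     // time per rectangle that does not scale with the clock
    double wire_us_mhz;  // clock-dependent part, in microseconds * MHz
};

SpiCostModel fitTwoPoints(double f1_mhz, double rate1_per_s,
                          double f2_mhz, double rate2_per_s) {
    assert(f1_mhz > 0 && f2_mhz > 0 && f1_mhz != f2_mhz);
    assert(rate1_per_s > 0 && rate2_per_s > 0);
    const double t1_us = 1e6 / rate1_per_s;  // time per rectangle at clock 1
    const double t2_us = 1e6 / rate2_per_s;  // time per rectangle at clock 2
    const double wire = (t1_us - t2_us) / (1.0 / f1_mhz - 1.0 / f2_mhz);
    return SpiCostModel{t1_us - wire / f1_mhz, wire};
}

// Predicted rectangles per second at a given SPI clock under the same model.
double predictedRate(const SpiCostModel& m, double f_mhz) {
    assert(f_mhz > 0);
    return 1e6 / (m.fixed_us + m.wire_us_mhz / f_mhz);
}
```

Calling `fitTwoPoints(7.5, 5000, 75, 8500)` gives roughly 108 microseconds of fixed time per rectangle. Under this model even an infinitely fast clock would top out near 9,200 rectangles per second with that version of the code, which fits the remark that SPI is not the real bottleneck.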
 
@morganS - nice you got the SPI speed bump. Did you try with TD 1.52 B4 yet? That or prior had some SPI change on timing.

That is a big mA jump from 220 to 570. This ili9341 OSH board of Paul's has a 5V supply on it ...

Given the best use of this display will be the DMA update - and single device SPI the CS pullup won't be important. Pullup comment was just going by the Paul blog assuring any CS not pulled low - ends up high to avoid confusion.

Your board has awesome promise - so conversation will follow you - but show up elsewhere
 
Hopefully I did not sound pushy:) I decided to buy another 10.1" display from BuyDisplay. This time with a capacitive touch screen. CAN bus would be nice for remote use.
 
Nope, not pushy at all. You have different uses planned for what I want to use this for. Between us we will create something useful for us and for others.

Question on the touchscreen: The RA8875 library had the CTP driver built in, because it was an organic growth from the RA8875 onboard RTP driver. Going forward, do you want to see all the various CTP drivers mangled into the RA8876 code or use dedicated standalone libraries? My Google searching has not found "the one" definitive FT5316 (7") library. There's one that's part of a weatherstation project (with contributions from Paul Stoffregen) but it is relatively old and not standalone.

The Sparkfun TouchInput library seems pretty good but I haven't tried it yet.
 
I'm on TeensyDuino 1.51. I will upgrade to check.

I think the Teensy 4.0 is the big power consumer. That chip gets quite warm.

I did follow the link to look at that board. It's got a linear power supply which is going to waste a lot of power on a 12V supply. 80% of the heat will be going out the power supply chip. Good enough for a quick-and-dirty test on the workbench but not something I want to squeeze into a 3D-printed minimal casing with no cooling.
 
Given that I have never dealt with CTP, and that drivers for these devices are apparently not that common, I would think keeping separate libraries until more is known about the available drivers would be the better way to go. Later they could be incorporated into the main TFT drivers after testing.

Just my thoughts:)
 
Meant to post over here yesterday but got busy with another design and other stuff. First, really cool board - when you get the T4.1 version up on OSH Park I think I will order it. The other thing is that using jumper wires we couldn't get over 47MHz, so having a board like yours seems to really allow you to boost the speed.

I do agree that even with that the bottleneck is going to be the RA8876 itself. You might get a bit of a performance boost by adjusting some of the clock settings for system clock etc.
 
The board is usable right now for Teensy 4.1. It will just overhang the end of my board. You can put your own headers or pins directly on the T4.1 to connect whatever else you need. I suppose a smaller/cheaper board would be preferable.

The library right now (I'm using your "redo" branch) is quite inefficient. Every drawing operation switches it to graphics mode, checks it's ready, changes it back, checks it's ready and every single byte of data opens and closes an SPI transaction. To send the 9 registers (18 bytes) required for a rectangle, that's a lot of transactions.
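
To put a number on "a lot of transactions", here is a small counting sketch. `CountingBus` is a hypothetical stand-in that only counts what happens on it; it is not the Teensy SPI class and not the library's code. It also leaves out the RA8876 prefix bytes and chip-select handling, so treat it as an illustration of the transaction count only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for an SPI bus that only counts what happens on it. This is a
// hypothetical mock for illustration, not the Teensy SPI class.
struct CountingBus {
    int transactions = 0;
    int bytes = 0;
    bool open = false;
    void begin() { assert(!open); open = true; ++transactions; }
    void end() { assert(open); open = false; }
    void send(uint8_t) { assert(open); ++bytes; }
};

struct RegWrite { uint8_t reg; uint8_t value; };

// Pattern described above: every byte gets its own transaction.
void writePerByte(CountingBus& bus, const std::vector<RegWrite>& regs) {
    for (const RegWrite& r : regs) {
        bus.begin(); bus.send(r.reg);   bus.end();
        bus.begin(); bus.send(r.value); bus.end();
    }
}

// Alternative: one transaction wraps the whole group of register writes.
void writeBatched(CountingBus& bus, const std::vector<RegWrite>& regs) {
    bus.begin();
    for (const RegWrite& r : regs) {
        bus.send(r.reg);
        bus.send(r.value);
    }
    bus.end();
}
```

For the 9 registers of a rectangle, the per-byte version opens 18 transactions for 18 bytes, while the batched version moves the same 18 bytes in one.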
 
I do agree that it's usable as is with the T4.1. Personally I just don't like the T4.1 overhanging a board. That wasn't meant as a criticism that it's not usable.

Actually, I was wondering about the need to open and close SPI transactions each time but haven't got around to playing with it. Been busy with other projects/things as well. Besides setting up a test for a framebuffer (really preliminary, more of a proof of concept), I was looking at getting the SPI above 38MHz. At that point I got sidetracked. Feel free to issue a PR for any improvements.
 
The T_4.0 power use is near 100 mA - @manitou posted a speed vs. mA chart. It seems it was loaded with a benchmark; if not, it will be more with intense or different usage. He found 174 mA when running the Ethernet adapter.

Yes, that LDO would be wasteful and hot. I have a 1.5 A capable unit handy - that needs min 10V - and I found 12V supply that fits a jack that fits the board - so I might build one to test with. But burning off 7V at ~500 mA isn't pretty. I have some breadboard supplies - want to make sure I have the proper power and not ruin the display.
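
The arithmetic behind "burning off 7V at ~500 mA" can be written down in a few lines. This is only a sketch: it ignores the regulator's quiescent current, and the 5 V output is my reading of the 12 V minus 7 V above rather than a figure from a datasheet.

```cpp
#include <cassert>

// Power burned in a linear regulator: the full load current flows through the
// voltage drop, so P = (Vin - Vout) * I. Quiescent current is ignored.
double ldoHeatWatts(double vin, double vout, double load_amps) {
    assert(vin >= vout && vout > 0 && load_amps >= 0);
    return (vin - vout) * load_amps;
}

// Fraction of the input power that ends up as heat in the regulator.
double ldoWastedFraction(double vin, double vout) {
    assert(vin >= vout && vout > 0);
    return (vin - vout) / vin;
}
```

With 12 V in, 5 V out and 0.5 A, that is 3.5 W in the regulator, or about 58% of the input power. The fraction climbs to about 73% if the drop goes all the way from 12 V to 3.3 V.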

Good you looked at the code and are seeing some ways to make the display transfer more efficient.
 
Originally Posted by mjs513
Ok just finished a new T4.1 breakout board to test a few of the different displays. Getting ready to order them and then debug. Here is a screen shot. Unfortunately the RA8875/76 is setup for capacitive touch - sorry @wwatson.

Yes added pins for a Sparkfun QWIIC adapter board. Not sure if I need anything else just for Testing displays.
@wwatson said:
This is great I am planning to order another 10.1" from BuyDisplay with capacitive touch. Will you be selling these?

First batch of PCBs is still being manufactured at PCBWay, so I haven't even tested yet. Had no plans on selling the boards but I usually post the design on GitHub. Let me get it and test it first. May be overkill if you just want an adapter for a RA8876. Have another in design with just the RA8876 and the QWIIC connector.
 
Today I'm doing more experimenting with the speed. I've upgraded to TeensyDuino 1.52 Beta 4 and I've hooked up the oscilloscope to check what's really going on.

1. The Teensy 4.0 doesn't seem to do any SPI speed between 60MHz and 79MHz. Use any number in that range and you only get 60. At 80MHz the scope shows that it is trying to give 80MHz but it's not strong enough to drive proper digital transmissions. It just wobbles around 2V.

2. Beating on the library to pull out redundant data transfers and unnecessary begin/end SPI transactions brings measurable improvements but nothing major. At 35MHz it went from 8168 rectangles per second to 8563 rectangles per second.

3. We are really spending most of our time waiting for the RAiO chip to become ready after sending a drawing command. A rectangle might take 100-150 microseconds to finish drawing and a big filled rectangle might take 3,300 microseconds to draw. So it's making hundreds and thousands of requests to the status register waiting for the chip to say it's ready.

4. So, obviously, don't wait. Go do something else. Only wait when you've got data ready to draw to the screen and it's still processing the last command. But this doesn't work. Shapes only get partially drawn (actually they come out white in the lower corners.) I'm not sure why yet. I'm waiting at the top of the drawing routine instead of the bottom and it takes about the same time.
 
1. The Teensy 4.0 doesn't seem to do any SPI speed between 60MHz and 79MHz. Use any number in that range and you only get 60. At 80MHz the scope shows that it is trying to give 80MHz but it's not strong enough to drive proper digital transmissions. It just wobbles around 2V.
Interesting. When we were doing T4 beta testing we had it running at 72MHz with no problem. I believe we measured this as well but can't find the posts at this point.

2. Beating on the library to pull out redundant data transfers and unnecessary begin/end SPI transactions brings measurable improvements but nothing major. At 35MHz it went from 8168 rectangles per second to 8563 rectangles per second.
Well, it never hurts to make the lib more efficient. If you want, push the changes and I can incorporate them into the library.

3. We are really spending most of our time waiting for the RAiO chip to become ready after sending a drawing command. A rectangle might take 100-150 microseconds to finish drawing and a big filled rectangle might take 3,300 microseconds to draw. So it's making hundreds and thousands of requests to the status register waiting for the chip to say it's ready.

4. So, obviously, don't wait. Go do something else. Only wait when you've got data ready to draw to the screen and it's still processing the last command. But this doesn't work. Shapes only get partially drawn (actually they come out white in the lower corners.) I'm not sure why yet. I'm waiting at the top of the drawing routine instead of the bottom and it takes about the same time.
This doesn't surprise me. Did you try playing around with the display settings for system clock etc. to see if you get any improvement?
 
I don't want to mess with the clock yet. That's another incremental improvement. I just want to use the chip the way it was intended to be used: it does the time-consuming operations while I go do something else. I don't want to spend 3300 microseconds waiting for a single (large) element to draw on the screen. I have better things to do with my microseconds.

Even just issuing the drawing command and then a delay(3300) doesn't successfully draw the shape. It seems like it must get poked in the status register a million times per second.
 
OK, I found the problem. The wrapper functions like fillRect() were setting the text foreground color back to the old color. That was white, so it would paint the lower part of the rectangle in white. (You could actually change color many times while a large shape was being drawn.)

Allowing drawing functions to permanently change the foreground color means that you don't have to wait for the shape to finish drawing. You can do more complex calculations for your next shape without waiting. But as soon as you try to draw, change color or a few other actions, it will check if it needs to wait.
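
A toy timing model shows why waiting at the top of the drawing routine only pays off when there is other work to do in between. Everything here is simulated: `SimDisplay` and its members are hypothetical names, the clock is just a counter in microseconds, and no real RA8876 behaviour is modelled beyond "a drawing command takes a while".

```cpp
#include <cassert>
#include <cstdint>

// Simulated display controller with a simulated clock, in microseconds. All
// names here are hypothetical; this models the timing only, not the RA8876.
struct SimDisplay {
    uint64_t now_us = 0;         // simulated wall clock
    uint64_t busy_until_us = 0;  // when the current drawing command finishes
    uint64_t waited_us = 0;      // total time the host spent blocked

    void waitReady() {
        if (now_us < busy_until_us) {
            waited_us += busy_until_us - now_us;
            now_us = busy_until_us;
        }
    }
    void startDraw(uint64_t draw_us) { busy_until_us = now_us + draw_us; }
    void hostWork(uint64_t work_us) { now_us += work_us; }
};

// Wait at the bottom: block until each shape is finished before moving on.
uint64_t runWaitAfter(int shapes, uint64_t draw_us, uint64_t work_us) {
    assert(shapes >= 0);
    SimDisplay d;
    for (int i = 0; i < shapes; ++i) {
        d.startDraw(draw_us);
        d.waitReady();
        d.hostWork(work_us);
    }
    return d.now_us;
}

// Wait at the top: only block when the next command is ready to be sent.
uint64_t runWaitBefore(int shapes, uint64_t draw_us, uint64_t work_us) {
    assert(shapes >= 0);
    SimDisplay d;
    for (int i = 0; i < shapes; ++i) {
        d.waitReady();
        d.startDraw(draw_us);
        d.hostWork(work_us);
    }
    d.waitReady();  // let the last shape finish so both totals are comparable
    return d.now_us;
}
```

With three 3,300 microsecond fills and 2,000 microseconds of calculation per shape, waiting afterwards takes 15,900 microseconds in this model and waiting beforehand takes 9,900. With no calculation in between, both come out the same, which is what a tight benchmark loop would show.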

I've added this change to my open pull request.

Works at 60MHz and gets almost 10,000 rectangles per second in my test code. Running at 5MHz doesn't slow you too much: you still get over 5700 rectangles per second. But 60MHz is really going to shine with big block transfers, like what LittlevGL needs. Maybe we can get DMA working for that use-case?

I'm planning to work on a BTE example for the library next, as I use BTE in some of my RA8875 projects and I need it to work on the RA8876.
 
I saw what you did with the changes and issues you mentioned. Nice work on fixing them. I went ahead and merged your changes into the redo branch and gave it a quick test with 3d rendering and didn't see any issues pop up.
 
@MorganS, @wwatson
Well, the good news is that I am now running at 75MHz. I made up a Dupont 2x20 header with short wires to the T4.1, and with the updated transfer16s and other changes it started working at 75MHz. Just figured I would let you all know.
 
Sounds great! Mine still runs at 0MHz :p :D Still no change since:
Code:
2020-04-17 22:40 Shenzhen, delivered to air transport

Sounds like good progress
 
Short wires and 75 Mhz sounds great though!

Mine still good, though at 0.00 GHz - here almost a week. Adding things like 2x20 Dupont to cart - amazon delivery 5/8 for those. Though have on hand enough to make a test board if I tried.
 
ROP Cheat Sheet

I'm trying out some of the image-loading options. The "Raster Operation" ROP codes were a bit of a mystery so I've made a "cheat sheet"...
[Attached image: RA8876 ROP Codes.jpg]

I'll push the code for this to GitHub as an example sketch for the library. I haven't finished tweaking it yet.

But, to explain the image, first it uses normal drawing commands (rectangle, circle, triangle) to draw 15 "background" images. Then the picture of the Rubik's cube is loaded from a PROGMEM array on the Teensy. 172x172 pixels uncompressed RGB565 in a .h file is 348KB. Kind of large, but not excessive. Sending this to the display is relatively slow but 75MHz gets it done in 42 milliseconds. (220ms @5MHz). Then we send the image again with the chromakey option. The image has been edited so the "white" background is not exactly the same white as the white in the image.
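
For comparison with those timings, here is the ideal wire time for the raw pixel data. The sketch assumes RGB565 means 2 bytes per pixel, so the binary payload is 172 x 172 x 2 = 59,168 bytes; the 348KB would then be the size of the .h text rather than of the data sent. It ignores all protocol and software overhead, and the function names are mine.

```cpp
#include <cassert>
#include <cstdint>

// Raw size of an uncompressed image in bytes.
uint32_t imageBytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel) {
    assert(bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

// Ideal time to clock that many bytes over SPI, ignoring all protocol and
// software overhead. Returns milliseconds.
double wireTimeMs(uint32_t bytes, double spi_mhz) {
    assert(spi_mhz > 0);
    return (bytes * 8.0) / (spi_mhz * 1000.0);
}
```

That works out to about 6.3 ms at 75MHz and about 94.7 ms at 5MHz, against the measured 42 ms and 220 ms. If the assumption about the payload holds, most of the time at 75MHz is going somewhere other than the wire.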

Now that we have the entire image in the display's memory, further copies can be done a lot faster. Really fast if you consider that you can send it the commands in 0.049 milliseconds and then go away to leave it working. Timing how long it keeps working reveals some interesting facts about the ROP operation. Plain black or white takes 0.7ms to fill the area. XORing the images together is the slowest and takes 2.1ms.

The ROP operation takes 2 sources and a destination. The strange part is operations that only use one source are 16% faster to use the second source. A straight copy can take either 1.3ms for source 0 or 1.1ms for source 1. I still need to drill into this further.
 
@MorganS
Thanks for posting the cheat sheet. I was going through it last night trying to understand the difference! One of the things I added to the lib was drawing a 3D render of an image to one of the memory buffers and then transferring it back to the main display. If I am not doing Gouraud or Phong shading it works beautifully, but I am just using a straight memcopy and was wondering if doing a memorycopywithROP would make a difference. Here are the functions that do that:
Code:
void RA8876_t3::useCanvas()
{
	displayImageStartAddress(PAGE1_START_ADDR);
	displayImageWidth(_width);
	displayWindowStartXY(0,0);
	
	canvasImageStartAddress(PAGE2_START_ADDR);
	canvasImageWidth(_width);
	activeWindowXY(0, 0);
	activeWindowWH(_width, _height);
	check2dBusy();
	ramAccessPrepare();
}

void RA8876_t3::updateScreen() {
	bteMemoryCopy(PAGE2_START_ADDR,_width,0,0,
				  PAGE1_START_ADDR,_width, 0,0,
				 _width,_height);	
}
You can see this in post 32 (https://forum.pjrc.com/threads/5856...sy-T36-and-T40?p=237441&viewfull=1#post237441): at about 12 seconds it shows how fast it is. Without it the rendering is not smooth, as you might imagine.
 
Yes, that's certainly one method to reduce flickering. I use the RA8875 with its two "layers". The RA8876 has a lot of "pages" so I think you should be using 2 pages to do that kind of thing. Paint the next frame to the page that isn't displayed and then switch pages. Using bteMemoryCopy() adds another delay and it won't align with the true refresh rate of the screen. Switching pages should update the entire screen starting on the next refresh rather than risking copying pixels while it's in the middle of a refresh. (I have not tested this.)
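
The bookkeeping for that page switch is small. The class below is a hypothetical sketch, as untested on hardware as the idea itself: it only tracks which of two page addresses is shown and which is being drawn to.

```cpp
#include <cassert>
#include <cstdint>

// Ping-pong between two display pages. The class and its names are a
// hypothetical sketch; only the idea of "draw to the hidden page, then show
// it" comes from the post above.
class PageFlipper {
public:
    PageFlipper(uint32_t page_a_addr, uint32_t page_b_addr)
        : pages_{page_a_addr, page_b_addr} {
        assert(page_a_addr != page_b_addr);
    }
    // Address the panel should currently scan out.
    uint32_t displayAddr() const { return pages_[shown_]; }
    // Address drawing commands should target (the hidden page).
    uint32_t canvasAddr() const { return pages_[1 - shown_]; }
    // Call when the hidden page holds a complete frame.
    void flip() { shown_ = 1 - shown_; }

private:
    uint32_t pages_[2];
    int shown_ = 0;
};
```

After each `flip()`, the two addresses would presumably be handed to `displayImageStartAddress()` and `canvasImageStartAddress()`, the calls used in `useCanvas()` above, with `PAGE1_START_ADDR` and `PAGE2_START_ADDR` as the two pages.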

The ROP is used to do slightly more complex tricks than chromakey. Like if you wanted to mask an image to a circle (delete everything outside the circle) then drawing the not-circle in chromakey color is more difficult than drawing a circle of white on a black base and then doing an AND with the original image. But since it's only just slightly more capable than chromakey, I don't see it being used very often.
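
The circle trick is easy to try on the host first. This sketch does the same AND in plain C++ on RGB565 buffers; it illustrates the logic only and makes no claim about how the RA8876 performs it internally. The function names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build a mask of the same size as the image: white (0xFFFF) inside the
// circle, black (0x0000) outside.
std::vector<uint16_t> circleMask(int width, int height, int cx, int cy, int radius) {
    assert(width > 0 && height > 0 && radius >= 0);
    std::vector<uint16_t> mask(static_cast<std::size_t>(width) * height, 0x0000);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int dx = x - cx;
            const int dy = y - cy;
            if (dx * dx + dy * dy <= radius * radius) {
                mask[static_cast<std::size_t>(y) * width + x] = 0xFFFF;
            }
        }
    }
    return mask;
}

// Bitwise AND of two RGB565 buffers, pixel by pixel.
std::vector<uint16_t> andPixels(const std::vector<uint16_t>& image,
                                const std::vector<uint16_t>& mask) {
    assert(image.size() == mask.size());
    std::vector<uint16_t> out(image.size());
    for (std::size_t i = 0; i < image.size(); ++i) {
        out[i] = static_cast<uint16_t>(image[i] & mask[i]);
    }
    return out;
}
```

Pixels under the white part of the mask pass through unchanged and everything else becomes black, because any value ANDed with 0xFFFF is itself and any value ANDed with 0x0000 is zero.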

The chip in the Teensy 4 has the same ROP abilities. If you needed to do that kind of stuff then it might be easier to do it there rather than at arm's length on the RA8876 memory.

For the 3D rendering, I would like to try something down the path of what LittlevGL does. It doesn't need a big frame buffer to hold the entire screen in memory. It tries very hard to identify which areas of the screen have changed and then spin up a small buffer which gets re-used if the area to be repainted is bigger than the buffer. That system does very well to reduce the data to be pushed down SPI. Combine that with layers/pages so you don't see the blocks being painted and you could get good animation speed on non-full-screen areas.
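
The buffer re-use part of that can be sketched as splitting the changed area into horizontal bands that each fit in the small buffer. This is not LittlevGL's code and the names are hypothetical; it only shows the "render a band, send it, re-use the buffer" idea.

```cpp
#include <cassert>
#include <vector>

struct Strip { int y; int rows; };  // horizontal band of the dirty area

// Split a dirty rectangle into bands that each fit in a small pixel buffer.
// The buffer must hold at least one full row of the dirty area.
std::vector<Strip> splitIntoStrips(int dirty_y, int dirty_w, int dirty_h,
                                   int buffer_pixels) {
    assert(dirty_w > 0 && dirty_h >= 0 && buffer_pixels >= dirty_w);
    const int rows_per_strip = buffer_pixels / dirty_w;
    std::vector<Strip> strips;
    for (int done = 0; done < dirty_h; done += rows_per_strip) {
        const int left = dirty_h - done;
        const int rows = (left < rows_per_strip) ? left : rows_per_strip;
        strips.push_back(Strip{dirty_y + done, rows});
    }
    return strips;
}
```

A 200 x 50 pixel change starting at row 100, with a 4,000 pixel buffer, comes out as three bands of 20, 20 and 10 rows, each rendered into the same buffer and pushed down SPI in turn.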

You would think that the memory-copy functions would be useful to have a mouse cursor or game character move over a background without erasing it. Save the piece of background you are going to damage and replace it after the character moves away. But what if your industrial computer display wants to update the numbers under the cursor? You end up just repainting the whole display and drawing the mouse cursor last.
 