Game Console with iMXRT1062

sRGB565

Member
The system itself was mentioned earlier in the SDRAM thread but I decided to give the concept some love and bring it into its own thread.

This system has three iMXRT chips (one acts as the CPU with 128 MB RAM, one as the GPU with 64 MB RAM, and one as the security processor), with an ESP32 as the boot processor and Wi-Fi controller. The system will use an SSD for storage and output HDMI video via the ADV7513.

If you would like more details, please consult this design document: https://docs.google.com/document/d/1T3CWqKoCLykcZJrLABx4obLHwfnEyQMU5Q8tE1WKycc/edit?usp=drivesdk

We are currently in the PCB design stage, which will probably be the hardest part for my team, since our existing experience is in low-level programming and drivers rather than hardware. This will be a long journey, but as I already told @Dogbone06, I plan to bring a lot of the proprietary code I write, in modified form, over to the Teensy ecosystem.

If anyone has any questions, comments, or critiques, please feel free to respond below!
 
This is a very cool and interesting project!
Is this a product you are developing, or is it just a very advanced project?
 
Is this a product you are developing, or is it just a very advanced project?
Currently it's an advanced project, but if we can do a half-decent job of marketing we'll do a true product release. That'll probably be a couple of years out at best.
 
This is a super cool idea!!

I'm curious about the link between the CPU and the GPU.
Graphically, what does the CPU handle?

Does it send vertex data etc every frame?

Or is that all loaded initially on the gpu and the gpu will handle all screen space transformations?

Are you using a parallel data link between the 2?

The Teensy does have a pixel pipeline built into it. Have you looked into using this on your GPU?

What kind of games are you designing for it?
 
Thanks! It has taken no small amount of time to come up with the concept, and we're still refining it. The current major step is PCB design, which, since I'm not at all good at it, is going to take a while.

I'm curious about the link between the CPU and the GPU.
Graphically, what does the CPU handle?
I'll detail the link further below. As for the CPU's data handling, I'd like the CPU to upload mesh data once if possible and then provide a base transform for it (so the GPU does T&L). I'm looking at an OpenGL-like command set (maybe OpenGL ES if that makes more sense). The end goal is to have the CPU run the game tick and dispatch rendering commands so the GPU can use what is in VRAM and render the game.
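To give a rough idea of what I mean by an OpenGL-like command set, here's a sketch of what one command packet on the link could look like. Every name and field here is made up for illustration; nothing about the real protocol is final.

Code:
#include <cstdint>

// Hypothetical command IDs for the CPU -> GPU link (illustrative only).
enum class GpuCmd : uint8_t {
    UploadMesh,     // one-time upload of vertex/index data into VRAM
    UploadTexture,  // one-time upload of texture data into VRAM
    SetTransform,   // per frame: base model/view/projection matrix for a mesh
    DrawMesh,       // per frame: draw a mesh already resident in VRAM
    Present         // flip the finished frame to the display
};

// Fixed-size header sent over the parallel link; the payload (vertex data,
// a 4x4 matrix, etc.) follows immediately after it.
struct GpuCmdPacket {
    GpuCmd   cmd;
    uint8_t  flags;
    uint16_t objectId;      // handle for a mesh/texture already in VRAM
    uint32_t payloadBytes;  // length of the data following this header
};

// Per frame the CPU would send something like:
//   SetTransform(objectId = 3, payload = 4x4 float matrix)
//   DrawMesh(objectId = 3)
//   Present()
// while UploadMesh/UploadTexture happen only when assets are first loaded.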

Does it send vertex data etc every frame?
From my limited knowledge of OpenGL it does do this, but I'd like to avoid re-uploading mesh data constantly. I'll have to find a balance between performance and flexibility.

Or is that all loaded initially on the gpu and the gpu will handle all screen space transformations?
The GPU will have a rendering code stub loaded onto it at boot time by the OS, and shaders can be linked against this stub. The GPU will handle T&L, the geometry, vertex, and fragment shading steps, and render output (though, for lack of TMUs, texturing may not play too nicely). All meshes and textures are uploaded at runtime when the programmer uploads them.
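For a rough idea of the per-vertex work that puts on the GPU core, a minimal software T&L step looks something like the sketch below (placeholder math types, a 1280x720 target assumed; the real stub would add clipping, culling, and the shader hooks).

Code:
#include <cstdint>

// Minimal software T&L for one vertex: multiply by a combined
// model-view-projection matrix, perspective divide, map to screen space.
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };   // row-major

static Vec4 mul(const Mat4& M, const Vec3& v) {
    return {
        M.m[0][0]*v.x + M.m[0][1]*v.y + M.m[0][2]*v.z + M.m[0][3],
        M.m[1][0]*v.x + M.m[1][1]*v.y + M.m[1][2]*v.z + M.m[1][3],
        M.m[2][0]*v.x + M.m[2][1]*v.y + M.m[2][2]*v.z + M.m[2][3],
        M.m[3][0]*v.x + M.m[3][1]*v.y + M.m[3][2]*v.z + M.m[3][3],
    };
}

// Returns screen-space x/y plus a depth value for sorting/culling.
static Vec3 transformVertex(const Mat4& mvp, const Vec3& v) {
    Vec4 clip = mul(mvp, v);
    float invW = 1.0f / clip.w;                        // perspective divide
    return { (clip.x * invW * 0.5f + 0.5f) * 1280.0f,  // viewport map
             (1.0f - (clip.y * invW * 0.5f + 0.5f)) * 720.0f,
             clip.z * invW };
}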

Are you using a parallel data link between the 2?
Yes. Since I already have to route SDRAM, and the PS3 uses a large parallel link to communicate with the RSX, I'm looking at either an 8-bit or 16-bit parallel FlexIO link at 120 MHz for the CPU and GPU to communicate.
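As a rough sanity check on that link (assuming one transfer per FlexIO clock and zero protocol overhead, which is optimistic), the raw bandwidth works out as below; it's one reason I'd rather keep meshes resident in VRAM than re-send them every frame.

Code:
#include <cstdio>

int main() {
    // Assumed link parameters (not measured): 16-bit-wide FlexIO bus at
    // 120 MHz, one word per clock, no handshake/protocol overhead.
    constexpr double clockHz  = 120e6;
    constexpr double busBytes = 2.0;                       // 16-bit link
    constexpr double rawMBps  = clockHz * busBytes / 1e6;  // ~240 MB/s (~120 for 8-bit)

    // For scale: one 1280x720 24-bit frame is ~2.76 MB, so streaming a whole
    // framebuffer's worth of data every frame at 60 fps (~166 MB/s) would eat
    // most of that budget.
    constexpr double frameMB = 1280.0 * 720.0 * 3.0 / 1e6;
    printf("raw link : %.0f MB/s\n", rawMBps);
    printf("per frame: %.2f MB (%.0f MB/s at 60 fps)\n", frameMB, frameMB * 60.0);
    return 0;
}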

The Teensy does have a pixel pipeline built into it. Have you looked into using this on your GPU?
I'm currently planning on using the PXP for hardware-accelerated blitting (so good for UI). I'm not sure how much of the main 3D render pass I can do with it, but I also plan to have a 2D blob for the GPU, so it shouldn't be too bad.

What kind of games are you designing for it?
We're planning on 2D game development for a while until the 3D blob matures. We'll probably port a tech demo built on a Java engine I wrote, and also release a 2D RPG, as launch titles.

Hope this gives you a better idea so far of what we plan for this system and what it may be capable of.
 
Finally got around to PCB design. We took the Gen. 5 design provided by @Dogbone06 and redesigned it for our current console design.

However, I have a few questions. I'm curious about this, first of all:
[attached image: PCB layout screenshot]

Why is there +5V running under the chip?

Additionally, how would I relocate the SDRAM? I need to do this so I can add another SDRAM. Is there a specific way this has to be length-tuned, or does any length work, so long as I mitigate signal skew?

And lastly, while unrelated, where would I find the toolchain for this chip? I've personally not really liked using the Arduino IDE and would far rather get a bare toolchain. I do understand that I will lose a lot of library support by doing this, but it was kind of a given that my team and I would have to write a ton of code anyway. I'd also assume the barrier isn't so massive that I can't also port the libraries I write back to Teensyduino.

As I said, I am not good at PCB design. I would find someone else to do it, but I don't really have budget for that.
 
* I am far from an expert; I've only designed a few BGA boards thus far. It's sometimes hard to route, and the 5V is there because it was the easiest way. Going to a 6-layer board would make life many times easier. I just got my first 6-layer board today. And just as you said about yourself, I am also not very good at PCB design. I just try my best, as no one else will do it for me. I've learned a lot, but I've got heaps left to learn.

* Moving the SDRAM isn't something I'm familiar with. I simply copied the traces from the 1062 EVAL board to save myself lots of headache. I pasted in the traces and then renamed them manually. Moving the SDRAM would probably need equal-length tuning and such.

* @Rezo and I both use PlatformIO; it's quite good and highly customisable. I haven't done much customising though; I just use it together with the Teensy. Below is a bare Teensy config. It goes in the root of your PlatformIO project folder and is called "platformio.ini".

platformio.ini
Code:
[env:stable]
; pin a specific platform version instead of the latest:
;platform = teensy@4.17.0
platform = teensy
board = teensymm
framework = arduino
; change MCU frequency: 240 MHz here, 600 MHz is the stock speed
board_build.f_cpu = 240000000L
;board_build.f_cpu = 600000000L
 
Thanks for the answers. If you can, give some progress reports as you go.

Rolling your own GPU is a really cool idea (and a large amount of work).

So the GPU will be handling a lot on one core: pixel transformations, shader pixel manipulation, etc., as well as listening for incoming commands.

I'm guessing you will be using multiple buffers on the GPU for the pixel manipulation. Are you then going to use two final buffers for the output: one buffer that the finished frame is written to, and the other being output to the screen via DMA, switching between the two each frame?

In my current project I'm using an 800x480 16-bit parallel display, running constant full-screen refreshes at 60 Hz and using the display's vertical sync to know when to start outputting the next frame.

I'm using an 8-bit back buffer with a 256-colour look-up table on one Teensy (not enough memory for a 16-bit buffer). It's fast enough to run primitive 3D graphics (triangles with depth sorting and back-face culling) and it works well, but I can't do any post-processing with a look-up table. (No DMA is used either.) Depending on the number of pixel colour changes in the output, I can push all bytes to the display in 2 to 6 milliseconds.

But one of the benefits of an 8-bit back buffer: for fast screen clearing, drawing rectangles, etc., I can just use memset to change large numbers of pixels quickly. Switching to a 16- or 24-bit back buffer, you would lose this ability.

Unless each colour byte of the buffer is stored in its own array. You could then still use memset for fast sequential pixel changes; the trade-off is that when grabbing the next pixel for the output you're having to access different arrays. I've wondered if that trade-off is worth it.
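Something like this is what I mean; sizes and names are just for illustration, and the planar pair below is the layout I'm wondering about, not something that actually fits in RAM on a single Teensy alongside everything else.

Code:
#include <cstdint>
#include <cstring>

constexpr int W = 800, H = 480;

// Packed 8-bit back buffer + 256-entry colour LUT: a full-screen clear or a
// solid rectangle row is a single memset.
uint8_t  backbuf[W * H];
uint16_t lut[256];                       // RGB565 looked up while streaming out

void clearPacked(uint8_t colourIndex) {
    memset(backbuf, colourIndex, sizeof(backbuf));
}

// Planar 16-bit alternative: each byte of the RGB565 pixel in its own array,
// so memset still works for solid fills, but every pixel read during output
// touches two arrays instead of one.
uint8_t planeLo[W * H];
uint8_t planeHi[W * H];

void clearPlanar(uint16_t rgb565) {
    memset(planeLo, rgb565 & 0xFF, sizeof(planeLo));
    memset(planeHi, rgb565 >> 8,   sizeof(planeHi));
}

uint16_t readPlanar(int i) {             // per-pixel cost when streaming out
    return uint16_t(planeHi[i]) << 8 | uint16_t(planeLo[i]);
}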
 
Currently, due to our severe lack of advanced PCB design skills, we're going to use existing x64 boards (i.e. old laptops) as a testbed for a while. We will continue with this project, but right now we need resources that we just cannot acquire at this time.

However, according to our design, I was planning on putting two 24-bit RGB buffers (1280x720) in SDRAM (since we'll have 64 MB of "VRAM") and rendering to those. While blits won't be as fast as with an 8-bit buffer, we do get full colour. We will use the eLCDIF to write the data to an HDMI chip, which will display it on the TV.
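Roughly, the buffer budget and the flip would look like the sketch below. Names are placeholders, the eLCDIF/ADV7513 setup isn't shown, and on the real hardware both buffers would have to be placed in the external SDRAM region rather than on-chip RAM.

Code:
#include <cstdint>
#include <cstddef>

constexpr int    WIDTH = 1280, HEIGHT = 720;
constexpr size_t BYTES_PER_PIXEL = 3;                              // packed RGB888
constexpr size_t FRAME_BYTES = WIDTH * HEIGHT * BYTES_PER_PIXEL;   // ~2.76 MB
// Two buffers use ~5.5 MB of the 64 MB of SDRAM reserved as "VRAM".

uint8_t framebufA[FRAME_BYTES];   // would live in SDRAM on the real board
uint8_t framebufB[FRAME_BYTES];

uint8_t* drawBuf = framebufA;     // the GPU core renders into this one
uint8_t* scanBuf = framebufB;     // eLCDIF streams this one to the HDMI chip

// Called once per frame, ideally from the eLCDIF frame-done/vsync interrupt.
void flipBuffers() {
    uint8_t* tmp = drawBuf;
    drawBuf = scanBuf;
    scanBuf = tmp;
    // ...at this point the eLCDIF "next buffer" address would be updated to
    // scanBuf so the new frame is what gets scanned out.
}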
 
3 bytes × 1280 × 720 is 2.64 MB, so two of those buffers fit easily in the single 32 MB SDRAM/VRAM as on the DevBoard design. That leaves 26.7 MB of the 32 MB free: enough room for a larger display, or 10 more such buffers?

Noted on the other thread that NXP's 1170 EVAL shows twin SDRAMs; that would give a PCB layout to copy, the same way @Dogbone06 used the single-SDRAM layout from the 1060 EVAL. But that seems like extra complexity for SDRAM that isn't required yet?

The interesting/complex part seems to be the 'BUS' interconnect of the three 1062's? If a simple SINGLE PCB design were made where all boards were the same and each of the roles expected in post #1 was accommodated, then a single batch of boards could be used to test the concept and evolve the BUS soft/hardware as needed to get functionality established at minimal cost/effort. Those first-run boards could then be replaced/repurposed where only 32 MB SDRAM was needed, and the other boards evolved with advanced specs as needed.
 
This is a really good idea. I had originally thought of making a test batch of boards, but discarded the idea earlier for lack of necessity.

The bus interconnects are all point-to-point in a kind of star topology. Unfortunately for us, these aren't the main source of complexity. Rather, we have to learn KiCad and other tools so we can figure out what to route and how to route it. It will take a lot of time, but we'll get there eventually.

Once we get this board down, we will re-enter familiar territory for my team.

I do appreciate the time you have given thus far to assisting me and my team on this project. Thanks for everything so far!
 
This is a really good idea.
:)
If there is a 'best' common feature set, then getting that working in one batch would be a good start for proof of concept.

The star interconnect should 'simply' work - but look how long sliced bread took :) Getting it wired, and then the software refined for usefully fast and efficient transfer, will be 'trivial' after it works. There is a Master/Slave SPI library tonton81 made for T_3.6/3.5 interconnect when it was needed that was fabulous. He has updated it for T_4.x as well, though I haven't used that. It provided error-checked messages over SPI at high throughput.

The 1062+SDRAM work @Dogbone06 has done should be a great help. His first version 4.0 Development Board made the rounds for the SDRAM development; minor expansions for v4.5 and v5.0 extended display and camera support with USB host, CAN, and more robust USB-C power, as well as (???) given 16 pins on two ports; and now a batch of boards has just finished first bench testing to fully prove the design. Those designs are available as a starting point ... Good luck!
 