A Teensy 4.0 demo. Playing the old Stunt Island DOS game. Real time 3D.

Status
Not open for further replies.

Mike Chambers

Well-known member
The Teensy 4.0 is a powerhouse, I'm impressed! Playing an old DOS game, emulated 8086 code on the T4.0 with real time 3D rendered gameplay. It handles it with no problem. I suspect it would be a lot smoother if the SPI LCD module wasn't such a bottleneck, but even still this is very playable. :D

I'm looking to release this emulator project soon, too, if anyone else wants to mess with it. I just need to tidy a few things up. It runs just about any real-mode DOS program you throw at it.

CPU and system emulation is very intensive, so the speed this thing is cranking out is pretty insane for a microcontroller.

 
Jean-Marc, VERY nice project! I'll have to give yours a try and check out the various systems.

It looks like you're actually using my own 8086 code in yours! :)

It didn't run so well on the AVR, and I didn't expect it to, but it was fun/funny seeing an Arduino do that.

The T3.6 with SPI RAM project was mine too. That one didn't support MCGA/VGA though, this one does.

Looks like you also did NES! That's actually another one I was planning to do myself after this. I already wrote one for the Mega2560, but of course it was again far too slow to be useful.
 
Hi Mike,
So you are actually the person who did the initial ATmega port and also the T3.6 version with SPI RAM. I indeed started from that version. I have a greetings section at the bottom of the page; I will add your name there too ;-)
I had ordered the SPI RAM you used with the T3.6 but never tried it on the T4. As I use MQS audio together with the display, the only full SPI bus available is the one on the SDIO pads underneath the Teensy. For the Atari ST emu, I was planning to use it and move to USB for the mass storage device, but I finally managed to fit everything in the 1 MB of the T4. I have an Amiga emu port that requires 6 MB; I would be interested in using SPI RAM for it. I also have MAME working, and it is in my git project too. Some games would benefit from 4 MB as well.
 
Yes, it's actually a port of a Windows/Linux version I started writing from scratch back in 2011 called Fake86. I was very new to the C language then, and it shows. The code is very ugly! It is at least functional, though.

You actually don't need SPI RAM on the T4 at all to get a full 640 KB. I allocated 512 KB in the second 512 KB RAM bank (OCRAM) with the DMAMEM keyword when defining the array, and the rest fits easily into the standard global variable area, with enough room left over for MCGA graphics and then some! That was the main draw of the 4.0 for me; the SPI RAM was very much bottlenecking the emulation speed on the 3.6.
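A minimal sketch of that kind of split, assuming (hypothetically) a 512 KB array tagged DMAMEM for the bottom of conventional memory and an ordinary 128 KB global for the rest. The names ramLow, ramHigh, readConventional and writeConventional are made up for illustration, and the empty DMAMEM fallback only exists so the snippet builds on a desktop compiler. Note that a 512 KB DMAMEM array claims essentially all of the second RAM bank, leaving little for malloc() or for libraries that use DMAMEM themselves.

```cpp
#include <cstdint>

// On Teensy 4.x the core defines DMAMEM to place a variable in the second RAM bank.
// This empty fallback is only so the sketch compiles on a desktop host.
#ifndef DMAMEM
#define DMAMEM
#endif

constexpr uint32_t kLowSize  = 512UL * 1024;  // 0x00000..0x7FFFF of the 8086 address space
constexpr uint32_t kHighSize = 128UL * 1024;  // 0x80000..0x9FFFF, up to 640 KB

DMAMEM uint8_t ramLow[kLowSize];  // hypothetical: second RAM bank on the Teensy
uint8_t ramHigh[kHighSize];       // hypothetical: ordinary global (first RAM bank)

// Hypothetical accessors mapping a 20-bit physical address onto the two arrays.
uint8_t readConventional(uint32_t addr) {
  addr &= 0xFFFFF;  // 8086 physical addresses are 20 bits
  if (addr < kLowSize) return ramLow[addr];
  if (addr < kLowSize + kHighSize) return ramHigh[addr - kLowSize];
  return 0xFF;      // above 640 KB: video/ROM handlers would go here (not modelled)
}

void writeConventional(uint32_t addr, uint8_t value) {
  addr &= 0xFFFFF;
  if (addr < kLowSize) ramLow[addr] = value;
  else if (addr < kLowSize + kHighSize) ramHigh[addr - kLowSize] = value;
  // writes above 640 KB would be routed to video memory etc. (not modelled)
}
```

The split point is arbitrary as long as the two arrays together cover 640 KB; keeping the large half in DMAMEM is what frees the first bank for code, stack and the other globals.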

Anyway, I love seeing other microcontroller emulator projects, and you're doing a great job if your GitHub is any indication. I will take some time tonight to try a few of your emulators. Porting MAME is impressive! So does this mean I can play Street Fighter 2 on my Teensy? Or is that one too resource-heavy? :)
 
No, don't dream ;-)
Only early '80s classics work (Dig Dug, Moon Patrol, Pac-Man...). Even Xevious is problematic.
The main issue is that MAME expands all graphics to at least 8 bits per pixel in RAM before emulation starts.
The 512 KB malloc region quickly runs out of memory. I also use the trick of an extra static array, but it all depends on the game. I prototype on Windows, tracing the memory used, and then compile for the Teensy. I also had to split the port into 8 binaries depending on the game you want to run. I will post a video, but I am busy making a PCB for the T4.
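To put the 8-bits-per-pixel expansion in numbers: packed graphics data at b bits per pixel grows by a factor of 8/b once every pixel takes a whole byte. The helper below is only illustrative arithmetic, and the ROM sizes in the usage note are hypothetical, not taken from any particular game.

```cpp
#include <cstdint>

// Bytes needed once packed tile/sprite data is decoded to one byte per pixel.
// romBytes * 8 is the number of bits; dividing by bitsPerPixel gives the pixel count,
// and each decoded pixel occupies one byte.
constexpr uint32_t decodedBytes(uint32_t romBytes, uint32_t bitsPerPixel) {
  return romBytes * 8 / bitsPerPixel;
}
```

For example, 8 KB of 2 bpp tiles becomes 32 KB, and 16 KB of 4 bpp sprites also becomes 32 KB, so a handful of graphics ROMs can take a large share of a 512 KB malloc region before any emulation state is allocated.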

By the way, when you compile for the T4, please use "smallest code". Otherwise the code gets duplicated in RAM too. What is the gain of DMAMEM versus a normal array? Do you mean for the DMA ILI driver? Normally malloc uses an "almost" dedicated 512 KB. The rest holds bss/data and the stack.
 
Nice looking work!

AFAIK the answer to this is: DMAMEM provides a compile-time allocation in the same area that malloc() uses at runtime, the second 512 KB RAM block.
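One way to confirm where a buffer ended up is to check its address against the Teensy 4.x memory map. The ranges below (ITCM at 0x00000000, DTCM at 0x20000000, and the malloc/DMAMEM bank, RAM2/OCRAM, at 0x20200000, each up to 512 KB) follow the i.MX RT1062 map as used by the Teensy 4 linker script; treat them as assumptions and check them against imxrt1062.ld for your core version. Region and regionOf are hypothetical names.

```cpp
#include <cstdint>

enum class Region { ITCM, DTCM, OCRAM, Other };

// Classify an address using the (assumed) Teensy 4.x memory map.
constexpr Region regionOf(uintptr_t a) {
  return a < 0x00080000 ? Region::ITCM                                   // code in RAM1
       : (a >= 0x20000000 && a < 0x20080000) ? Region::DTCM              // data/bss/stack in RAM1
       : (a >= 0x20200000 && a < 0x20280000) ? Region::OCRAM             // RAM2: malloc + DMAMEM
       : Region::Other;                                                  // flash, peripherals, ...
}
```

On the board you could print (int)regionOf((uintptr_t)ptr) for a DMAMEM array and for a malloc() result; per the explanation above, both should report OCRAM, while an ordinary global should report DTCM.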
 
That makes sense. Paul once indicated that the second 512 KB was almost entirely available to malloc, except for the bit of DMAMEM used by some libraries/drivers.
My experience is that as soon as an application goes over 500 KB of standard bss/data variables (not DMAMEM-annotated!), most of the time it does not boot at all (you need to press the T4 button to recover). Probably because the stack also has to fit in the first 512 KB of RAM.
At first I was confused because I thought the T4 had 1.5 MB of RAM, but it has 1 MB.
 
Indeed, only two banks of 512 KB. 1.5 MB would be cool … maybe in the next version.

I've never filled the first 512 KB of RAM and had the T4 halt or fail to boot. If you have a repro case, I wonder if the linker could be made to throw a warning? Something the linker script could trigger, maybe? Though I suppose there is no way to know the stack usage in advance of it being abused ...
 
I can provide you a sample, but indeed you could only warn the user, since the stack usage in the remaining memory is application dependent.
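Since the stack depth can't be known at link time, a common runtime workaround is "stack painting": fill the gap between the end of static data and the stack with a known pattern early in setup(), then later count how much of it is still intact. The sketch below shows only the technique on a plain buffer; on the Teensy the bounds would come from linker symbols (names such as _ebss and _estack are assumptions, check the linker script), and the painting must stop safely below the current stack pointer.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint32_t kPaint = 0xDEADBEEF;  // arbitrary marker pattern

// Fill [bottom, top) with the marker pattern.
void paintRegion(uint32_t* bottom, uint32_t* top) {
  for (uint32_t* p = bottom; p < top; ++p) *p = kPaint;
}

// The Cortex-M stack grows downward, so words the stack never reached stay at the
// bottom of the region. Count them from the bottom up to the first overwritten word.
size_t untouchedBytes(const uint32_t* bottom, const uint32_t* top) {
  const uint32_t* p = bottom;
  while (p < top && *p == kPaint) ++p;
  return static_cast<size_t>(p - bottom) * sizeof(uint32_t);
}
```

Reporting the untouched byte count over Serial after exercising the application gives a measured stack high-water mark, which is the closest thing to the warning discussed above without knowing the call graph in advance.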
 
Hi Mike,
About the SPI display: you should use DMA mode on the T4 (and T3.6).
There are various versions on the T4 forum, but the one in the MCUME project is derived from Frank Boesing's (from his C64 emu port). I adapted it for the ESP32, where only 32 KB buffers were allowed, so it uses 4 x 32 KB buffers. The DMA runs in a loop (interrupt based) to copy them over SPI. It runs at 50 FPS. I also added support for both the ILI9341 (320x240) and ST (240x240) displays. It runs on both the T4 and the T3.6. Not sure why you say that SPI is the bottleneck, because most emulators I ported run at (close to) 50 FPS with that driver.
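A quick sanity check on the 50 FPS figure: when DMA streams whole frames, the frame rate is bounded by the SPI clock divided by the bits per frame. The 60 MHz clock used below is an assumption for illustration, not a number from this thread, and command and DMA setup overhead are ignored.

```cpp
// Upper bound on full-screen refreshes per second over SPI, ignoring command overhead.
constexpr double maxFps(double spiHz, int width, int height, int bitsPerPixel) {
  return spiHz / (static_cast<double>(width) * height * bitsPerPixel);
}
```

At an assumed 60 MHz this gives about 48.8 FPS for a 320x240 16-bit frame and about 65 FPS for 240x240, in line with the "close to 50 FPS" reported above.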
 
Thanks! I was not even aware that there was a DMA mode. I was using tft.drawPixel. I knew there had to be a better way. 50 FPS sounds great to me! It takes about 200-300 ms (!!) to fill a 320x240 frame by drawing pixels individually.
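Most of that per-pixel cost is protocol overhead. Assuming a typical drawPixel() that re-sends the address window on every call (column-address command + 4 bytes, page-address command + 4 bytes, memory-write command + 2 colour bytes, 13 bytes in total), the raw SPI time alone at an assumed 60 MHz can be estimated as below; chip-select toggling and per-call transaction setup add more on top.

```cpp
// Assumed bytes on the wire per drawPixel(): CASET + 4, PASET + 4, RAMWR + 2 colour bytes.
constexpr int kBytesPerDrawPixel = 1 + 4 + 1 + 4 + 1 + 2;  // 13

// Raw SPI time in milliseconds to fill a w x h frame one drawPixel() call at a time.
constexpr double pixelFillMs(double spiHz, int w, int h) {
  return static_cast<double>(w) * h * kBytesPerDrawPixel * 8 / spiHz * 1000.0;
}
```

That is about 133 ms per 320x240 frame, versus about 20 ms for the same frame streamed as 2 bytes per pixel, so 200-300 ms once per-call overhead is included is plausible.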
 
Is there a decent 3D graphics library suitable for the T3/T4? My needs are simple: just wireframe with hidden-line removal. I'd be outputting to a vector generator driving an XY display.
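Not a library recommendation, but for a single convex object the hidden-line part can be as simple as back-face culling: keep only faces whose outward normal points toward the camera, then draw each of their edges once. Below is a minimal sketch on a hard-coded cube; all names are hypothetical, the projection assumes a camera on the +z axis looking at the origin, and non-convex or overlapping objects would need a real hidden-line algorithm.

```cpp
#include <algorithm>
#include <array>
#include <set>
#include <utility>
#include <vector>

struct Vec3 { double x, y, z; };
struct Vec2 { double x, y; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Face { std::array<int, 4> v; Vec3 normal; };  // quad (perimeter order) + outward normal

// Cube from -1 to +1 on each axis, centred on the origin.
const std::array<Vec3, 8> kCubeVerts = {{
    {-1, -1, -1}, {1, -1, -1}, {1, 1, -1}, {-1, 1, -1},
    {-1, -1, 1},  {1, -1, 1},  {1, 1, 1},  {-1, 1, 1}}};

const std::array<Face, 6> kCubeFaces = {{
    {{0, 1, 2, 3}, {0, 0, -1}}, {{4, 5, 6, 7}, {0, 0, 1}},
    {{0, 1, 5, 4}, {0, -1, 0}}, {{3, 2, 6, 7}, {0, 1, 0}},
    {{0, 3, 7, 4}, {-1, 0, 0}}, {{1, 2, 6, 5}, {1, 0, 0}}}};

// Back-face culling: a face is drawn when the camera is on the outward side of its plane.
// For one convex object this removes exactly the hidden lines.
std::vector<std::pair<int, int>> visibleEdges(Vec3 camera) {
  std::set<std::pair<int, int>> edges;  // set removes edges shared by two visible faces
  for (const Face& f : kCubeFaces) {
    if (dot(f.normal, sub(camera, kCubeVerts[f.v[0]])) <= 0) continue;  // back-facing or edge-on
    for (int i = 0; i < 4; ++i) {
      int a = f.v[i], b = f.v[(i + 1) % 4];
      edges.insert({std::min(a, b), std::max(a, b)});
    }
  }
  return std::vector<std::pair<int, int>>(edges.begin(), edges.end());
}

// Perspective projection for a camera at (0, 0, camZ) looking toward the origin.
Vec2 project(Vec3 p, double camZ, double focal) {
  double depth = camZ - p.z;  // distance in front of the camera (assumed > 0)
  return {focal * p.x / depth, focal * p.y / depth};
}
```

Each visible edge, projected at both ends, becomes one line segment for the vector generator; how those coordinates are scaled to the DAC range depends on your hardware.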
 
@Jean-Marc: Please do send that sample. I could use it as a test for my debug_tt to enable FAULT detection and see how it might be handled. It might be a good enough reason to get back to that and make it generally usable.

@KurtE - can your imxrt-size code, while scanning the linker output, see signs of stack allocations? E.g. from BigList[] in: void foo() { int BigList[20000]; // … }
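For the stack question, GCC itself can report per-function frame sizes: compiling with -fstack-usage writes a .su file next to each object, with one tab-separated line per function (location and name, bytes, and a qualifier such as static or dynamic). Whether imxrt-size could pick these up is KurtE's call; the parser below is only a hypothetical sketch, and the sample line in the test is made up rather than real compiler output.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

struct StackUsage {
  std::string where;      // "file:line:col:function"
  long bytes;             // frame size reported by GCC
  std::string qualifier;  // "static", "dynamic" or "dynamic,bounded"
};

// Parse one line of a GCC -fstack-usage (.su) file. Split from the right,
// because C++ function names can themselves contain ':' characters.
std::optional<StackUsage> parseSuLine(const std::string& line) {
  const auto t2 = line.rfind('\t');
  if (t2 == std::string::npos || t2 == 0) return std::nullopt;
  const auto t1 = line.rfind('\t', t2 - 1);
  if (t1 == std::string::npos) return std::nullopt;
  const std::string num = line.substr(t1 + 1, t2 - t1 - 1);
  char* end = nullptr;
  const long bytes = std::strtol(num.c_str(), &end, 10);
  if (num.empty() || *end != '\0') return std::nullopt;
  return StackUsage{line.substr(0, t1), bytes, line.substr(t2 + 1)};
}

// Conservative flag: large frames, or frames whose size is not fully static.
bool isSuspicious(const StackUsage& u, long limitBytes) {
  return u.bytes > limitBytes || u.qualifier.find("dynamic") != std::string::npos;
}
```

This catches a large local like BigList, but per-function sizes say nothing about call depth or recursion, so it complements rather than replaces a runtime check.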
 
Hi,

You can find the example that blocks the Teensy 4.0 at startup (it needs a button press to recover) in my MAME port git project.

https://github.com/Jean-MarcHarvengt/teensyMAME

Load the teensyMAMEClassic2 sketch and compile with "smallest code".
The project uses 227504 bytes of code and 500336 bytes of RAM (less than 512 KB).
That project will work correctly (at least you will be able to flash it, and while it is running you can flash it a second time).

The sketch teensyMAMEClassic2BAD is the same, but the static backup HEAP in "emuapi.h" has been increased from 0x16000 to 0x19000 bytes. RAM usage is now 512624 bytes.
Flashing that one just blocks the Teensy 4.0, and even after powering it down you won't be able to flash anything while it "runs". You need to recompile teensyMAMEClassic2, try to flash it (it won't work), and press the button. Then the Teensy will accept the upload and be "alive" again.

You don't need an SD card, a display or anything else connected to run the project. Pinout.txt details the connections if you really want to see something on the screen.
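The two builds differ by exactly the heap bump, which is easy to check: the hex sizes and the reported RAM numbers line up as below. This is only arithmetic on the figures in the post; what the reported RAM figure covers, and how much of the remaining headroom the stack actually needs, depends on the toolchain report and the application.

```cpp
#include <cstdint>

constexpr uint32_t kHeapGood = 0x16000;              // 90112 bytes (teensyMAMEClassic2)
constexpr uint32_t kHeapBad  = 0x19000;              // 102400 bytes (teensyMAMEClassic2BAD)
constexpr uint32_t kGrowth   = kHeapBad - kHeapGood; // 0x3000 = 12288 bytes
constexpr uint32_t kRamGood  = 500336;               // reported RAM usage, working build
constexpr uint32_t kRamBad   = kRamGood + kGrowth;   // matches the reported 512624
constexpr uint32_t kBank     = 512UL * 1024;         // 524288 bytes, one 512 KiB bank

static_assert(kGrowth == 12288, "0x19000 - 0x16000 is 12 KiB");
static_assert(kRamBad == 512624, "the heap bump accounts for the whole RAM increase");

// Bytes left below the nominal 512 KiB bank for a given reported usage.
constexpr uint32_t headroomBytes(uint32_t reported) { return kBank - reported; }
```

So the failing build still reports 11664 bytes below 512 KiB (versus 23952 for the working one), which fits the earlier observation that the stack needs more room than is left once static data gets close to the bank size.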
 
I got the DMA code working well. The difference is noticeable, especially during VGA palette fades. Very smooth! Thanks for the tip. I ended up settling on 25 FPS; otherwise the frequent 64 KB memory block copies in TFT_T_DMA::writeScreen started to take a slightly noticeable toll on emulator performance. 25 FPS is more than enough.
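A minimal sketch of capping screen pushes at 25 FPS, assuming the emulator loop can run a check on each iteration with micros() and only then trigger the frame copy (e.g. TFT_T_DMA::writeScreen, whose arguments are not shown here). FrameLimiter is a hypothetical helper; unsigned subtraction keeps it correct when micros() wraps around.

```cpp
#include <cstdint>

// Hypothetical helper: allow at most `fps` screen pushes per second.
class FrameLimiter {
 public:
  explicit FrameLimiter(uint32_t fps) : intervalUs_(1000000UL / fps) {}

  // Returns true when a frame may be pushed at time nowUs (e.g. micros()).
  bool due(uint32_t nowUs) {
    if (!started_ || nowUs - lastUs_ >= intervalUs_) {  // wrap-safe unsigned difference
      started_ = true;
      lastUs_ = nowUs;
      return true;
    }
    return false;
  }

 private:
  uint32_t intervalUs_;
  uint32_t lastUs_ = 0;
  bool started_ = false;
};
```

On the Teensy this would be used as if (limiter.due(micros())) { /* push frame */ }. Resetting the timestamp to the current time lets small delays accumulate; advancing it by the interval instead keeps a steadier average rate.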

Unfortunately, adding that used up quite a bit of memory for what seemed to be the driver's static framebuffer. I ended up having to share the top half of the CGA modes' video memory with the VGA memory, meaning that B8000 to BA000 is now a mirror of A0000 to A2000 in the PC memory map. The good news is that this hack really shouldn't cause any issues, and I still get 640 KB to play with in DOS. Any software I can think of only uses either the VGA or the CGA memory area at any given time and doesn't care what happens in the other.
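The aliasing can be expressed as a small address translation applied before video memory accesses. This sketch assumes BA000 and A2000 are exclusive end addresses (an 8 KB window); resolveVideoAddr and the constant names are hypothetical.

```cpp
#include <cstdint>

constexpr uint32_t kCgaAliasBase = 0xB8000;
constexpr uint32_t kVgaBase      = 0xA0000;
constexpr uint32_t kAliasSize    = 0x2000;  // 8 KB: B8000 up to, not including, BA000

// Redirect accesses in the aliased CGA window onto the shared VGA backing store.
constexpr uint32_t resolveVideoAddr(uint32_t addr) {
  return (addr >= kCgaAliasBase && addr < kCgaAliasBase + kAliasSize)
             ? kVgaBase + (addr - kCgaAliasBase)
             : addr;
}
```

Because the translation happens in one place, undoing the hack later (for example if memory frees up) only means changing this function.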

My first attempt at resolving the memory problem was changing the optimization level from fastest to smallest code, since the fastest setting also increases general memory usage. This worked, but the emulator was much slower: there was at least a 50% loss of speed, which wasn't acceptable to me.

There might be some edge cases where the video memory sharing causes some undesirable behavior, but that should be very rare and this seems to be a reasonable compromise to maintain performance.

I also started playing with getting one of your emulators running on my Teensy, I'll have another go at it this weekend. :)
 
Hello guys,

I am still blown away by what you have done... I mean... an Atari ST... on a Teensy... I still can't believe it... amazing! *thumbs up*
Is there a chance of using the UART for MIDI? I'm wondering whether I could run Cubase on this ST emulator.
 
Should be doable. With a bit more memory, USB MIDI could be used. I will put it on my TODO list. Resolution might be the biggest problem.
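On the timing side of the UART route, MIDI 1.0 fixes the wire format at 31250 baud with 10 bits per byte (start bit, 8 data bits, stop bit), so each byte takes 320 µs. On a Teensy the physical port would typically be a hardware serial port opened with Serial1.begin(31250), with the emulated ST MIDI ACIA (an MC6850 on the real machine) forwarding bytes to it; that wiring is an assumption for illustration, not something the emulator is stated to do.

```cpp
#include <cstdint>

constexpr uint32_t kMidiBaud    = 31250;  // MIDI 1.0 serial rate
constexpr uint32_t kBitsPerByte = 10;     // 1 start + 8 data + 1 stop

// Time on the wire for one MIDI byte, in microseconds.
constexpr uint32_t byteTimeUs() { return 1000000UL * kBitsPerByte / kMidiBaud; }

// Time on the wire for a message of `bytes` bytes (running status not considered).
constexpr uint32_t messageTimeUs(uint32_t bytes) { return bytes * byteTimeUs(); }
```

A 3-byte Note On therefore occupies just under 1 ms, which gives a feel for how tightly the emulated ACIA has to keep pace with the real port.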
 
Could you actually run Voyetra Sequencer Plus Gold on Teensy 4.0 using this emulator? See http://www.vgmpf.com/Wiki/index.php?title=Sequencer_Plus_Gold

How about the timing? You should also be able to emulate the MPU-401 MIDI interface in UART mode, though some other MIDI interfaces are also supported. The software actually supports multiple MIDI interfaces; I guess you just need a dedicated I/O range and interrupt for each one.
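For the PC side, here is a rough sketch of an MPU-401 in UART mode as an emulator might expose it on I/O ports: writing 0x3F to the command port enters UART mode and queues an acknowledge byte (0xFE), after which data-port writes are raw MIDI bytes. The 0x330/0x331 base is the common default, the IRQ is not modelled, and Mpu401Uart with its callback is a hypothetical interface, not part of either emulator.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// 0x330 = data port, 0x331 = status (read) / command (write).
// Status bit 0x40 clear: ready to accept a byte. Bit 0x80 clear: a byte is waiting.
class Mpu401Uart {
 public:
  explicit Mpu401Uart(std::function<void(uint8_t)> midiOut) : midiOut_(std::move(midiOut)) {}

  void portWrite(uint16_t port, uint8_t value) {
    if (port == 0x331) {                         // command port
      if (value == 0xFF) {                       // reset: back to intelligent mode
        uartMode_ = false;
        rx_.clear();
        rx_.push_back(0xFE);                     // assumption: always acknowledge a reset
      } else if (!uartMode_ && value == 0x3F) {  // enter UART mode
        uartMode_ = true;
        rx_.push_back(0xFE);                     // acknowledge
      }                                          // other commands are not modelled
    } else if (port == 0x330 && uartMode_) {
      midiOut_(value);                           // raw MIDI byte, e.g. to a hardware UART
    }
  }

  uint8_t portRead(uint16_t port) {
    if (port == 0x331) return rx_.empty() ? 0x80 : 0x00;  // never busy in this model
    if (port == 0x330 && !rx_.empty()) {
      const uint8_t b = rx_.front();
      rx_.pop_front();
      return b;
    }
    return 0xFF;
  }

  // Byte arriving from the MIDI input; a full emulation would also raise the IRQ here.
  void midiIn(uint8_t b) { if (uartMode_) rx_.push_back(b); }

 private:
  std::function<void(uint8_t)> midiOut_;
  std::deque<uint8_t> rx_;
  bool uartMode_ = false;
};
```

In the emulator's port I/O handlers you would route 0x330/0x331 to portRead/portWrite and pass a callback that writes to the hardware UART; timing then mostly comes down to the 31250 baud wire rate rather than the emulation itself.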
 