Thread: PC Engine emulator, PSRAM experiment

  1. #1

    PC Engine emulator, PSRAM experiment

    I would like to share my first experience using PSRAM (SPI RAM) on the Teensy 4.0.

    A few weeks ago I decided to port one more emulator core to my MCUME project.
    https://github.com/Jean-MarcHarvengt/MCUME

    The core is TGEmu, a PC Engine emulator written by Charles MacDonald.
    It is a challenging emulator with respect to RAM requirements.
    In addition to the 128k ILI9341 frame buffer, the core has about 200k of local variables + 128k of background cache + 512k of sprite object cache. Finally, ROMs loaded into RAM are between 256 KB and 1 MB!
    There was clearly not enough memory in the T4 for it, but I had ordered a few IPS6404 PSRAM devices some time ago, so I had an opportunity to try them.
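    The memory budget described above can be tallied in a few lines. This is only a sketch: the constant and function names are made up for illustration, "k" is taken as 1024 bytes, and the 1024 KB on-chip total is an assumption (512 KB FlexRAM + 512 KB OCRAM, as shown in the memory report later in this thread).

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kKiB = 1024;

// Buffer sizes quoted in the post (treating "k" as 1024 bytes).
constexpr std::uint32_t kFrameBuffer = 128 * kKiB;   // ILI9341 frame buffer
constexpr std::uint32_t kCoreVars    = 200 * kKiB;   // "about 200k" of core variables
constexpr std::uint32_t kBgCache     = 128 * kKiB;   // background cache
constexpr std::uint32_t kSpriteCache = 512 * kKiB;   // sprite object cache
constexpr std::uint32_t kRomMin      = 256 * kKiB;   // smallest ROM image
constexpr std::uint32_t kRomMax      = 1024 * kKiB;  // largest ROM image

// Assumption: total on-chip RAM (512 KB FlexRAM + 512 KB OCRAM).
constexpr std::uint32_t kOnChipRam = 1024 * kKiB;

// Everything the emulator needs in RAM, excluding the ROM image.
inline std::uint32_t emulatorBuffers() {
    return kFrameBuffer + kCoreVars + kBgCache + kSpriteCache;
}

// Total requirement if the ROM image were also kept in on-chip RAM.
inline std::uint32_t totalWithRom(std::uint32_t romBytes) {
    assert(romBytes >= kRomMin && romBytes <= kRomMax);
    return emulatorBuffers() + romBytes;
}

inline bool fitsOnChip(std::uint32_t bytes) {
    return bytes <= kOnChipRam;
}
```

    With these figures the buffers alone come to 968 KB, so even the smallest 256 KB ROM pushes the total past 1024 KB. That is before counting code placed in RAM, the stack and the libraries.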

    I connected the PSRAM to SPI2 (SPI mode, not quad SPI).
    That means that I could forget the 'built-in' SD card for my disk I/O.
    The ILI9341 is connected on SPI0.
    I use one MQS channel for audio (pin 10).
    My plan was to use the PSRAM for storing the game ROM image, so the PSRAM is used mostly for reading.
    The rest I could almost store in the RAM of the Teensy.
    The game image is loaded into the PSRAM at startup. Initially it was read from USB storage, but I had some hangs when reading the image from USB to PSRAM (not clear why). Another problem is that the uFS + USB libraries use almost 70k of RAM.
    So I decided to go back to the good old SD library and connect the SD port of the ILI display on SPI0, together with the display (using another CS of course).
    As the image is copied at startup from SD to PSRAM, SPI0 can be used exclusively in DMA mode for the display later on.

    For the PSRAM driver, I used a cache of 16 pages of 16/32 bytes in RAM. Depending on the game, the emulated CPU accesses and jumps between only a few locations.
    A bigger page size results in the game freezing continuously.
    CPU usage in TGEmu is also heavy. I had to compile the code with the "fastest" optimization setting and overclock the T4 to 800 MHz.
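    To illustrate the idea of a small page cache in front of the PSRAM, here is a minimal sketch. It is not the code from the actual driver: `FakePsram` is a hypothetical stand-in that replaces the SPI transfer with a memory copy, `PageCache` is an invented name, and the direct-mapped replacement policy is chosen only for brevity. Only the 16 pages of 16 bytes come from the post.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::uint32_t kPageSize = 16;          // bytes per cached page (from the post)
constexpr std::uint32_t kNumPages = 16;          // number of cached pages (from the post)
constexpr std::uint32_t kInvalidTag = 0xFFFFFFFFu;

// Hypothetical stand-in for the external PSRAM: a byte vector plus a counter
// of simulated SPI transactions.
struct FakePsram {
    std::vector<std::uint8_t> data;
    std::uint32_t burstReads = 0;

    explicit FakePsram(std::uint32_t size) : data(size) {
        for (std::uint32_t i = 0; i < size; ++i) {
            data[i] = static_cast<std::uint8_t>(i * 7u);  // arbitrary test pattern
        }
    }

    // Simulates one page-sized read starting at a page-aligned address.
    void readPage(std::uint32_t addr, std::uint8_t* dst) {
        assert(addr % kPageSize == 0 && addr + kPageSize <= data.size());
        std::memcpy(dst, &data[addr], kPageSize);
        ++burstReads;
    }
};

// Direct-mapped read cache: page number modulo kNumPages selects the slot.
class PageCache {
public:
    explicit PageCache(FakePsram& ram) : ram_(ram) {
        for (std::uint32_t i = 0; i < kNumPages; ++i) tags_[i] = kInvalidTag;
    }

    std::uint8_t read(std::uint32_t addr) {
        const std::uint32_t page = addr / kPageSize;
        const std::uint32_t slot = page % kNumPages;
        if (tags_[slot] != page) {  // miss: fetch the whole page in one burst
            ram_.readPage(page * kPageSize, lines_[slot]);
            tags_[slot] = page;
        }
        return lines_[slot][addr % kPageSize];
    }

private:
    FakePsram& ram_;
    std::uint32_t tags_[kNumPages];
    std::uint8_t lines_[kNumPages][kPageSize];
};
```

    Reading 16 consecutive bytes from one page costs a single simulated burst. Two addresses that map to the same slot (for example 5 and 261) evict each other, which is the price of the simple direct-mapped layout.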

    You can see the result in this video
    https://youtu.be/Ot9RgDMqdF4

    I said that I could almost fit all RAM buffers (except the game image) into the T4 memory.
    In fact I had to cheat for the 512k sprite object buffer, which results in a buggy display in some games. To work around this I would really need the whole heap (malloc area) to be available for the emulator.

  2. #2
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    10,125
    Just posted this: T4-0-Memory-trying-to-make-sense-of-the-different-regions

    If you do the math for sizeofsomememory - what does it show in your code? Wondering if there is a usable chunk of RAM going to waste in this case - of course it will vary as code changes.

    On that thread are notes on leaving code in FLASH - has that been done to free up RAM? Also on that thread should be the imxrt-size code that will display some memory map details, including how full the last 32KB ITCM code block is.

    Very cool that you got PSRAM working. It would be cool to see that code broken out.

  3. #3
    If I run Kurt's tool I get:

    FlexRAM section ITCM+DTCM = 512 KB
    Config : aaaaaaff
    ITCM : 110976 B (84.67% of 128 KB)
    DTCM : 377536 B (96.01% of 384 KB)
    Available for Stack: 15680
    OCRAM: 512KB
    DMAMEM: 4672 B ( 0.89% of 512 KB)
    Available for Heap: 519616 B (99.11% of 512 KB)
    Flash: 405536 B (19.96% of 1984 KB)
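
    The figures in this report can be cross-checked with a little arithmetic. The sketch below uses invented helper names. The 2-bits-per-bank decoding of the `Config` word (01 = OCRAM, 10 = DTCM, 11 = ITCM, one field per 32 KB bank) is my reading of the i.MX RT1060 FlexRAM bank configuration and should be treated as an assumption.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kKiB = 1024;

// Bank type codes, 2 bits per 32 KB FlexRAM bank (assumed encoding, see above).
enum BankType : std::uint32_t { kUnused = 0, kOcram = 1, kDtcm = 2, kItcm = 3 };

// Counts how many of the 16 banks in the config word have the given type.
inline std::uint32_t countBanks(std::uint32_t config, BankType type) {
    std::uint32_t n = 0;
    for (int bank = 0; bank < 16; ++bank) {
        if (((config >> (2 * bank)) & 0x3u) == type) ++n;
    }
    return n;
}

// Bytes left in a region.
inline std::uint32_t bytesFree(std::uint32_t used, std::uint32_t total) {
    assert(used <= total);
    return total - used;
}

// Usage in hundredths of a percent, rounded to nearest (8467 means 84.67%).
inline std::uint32_t usedHundredthsOfPercent(std::uint32_t used, std::uint32_t total) {
    assert(total > 0);
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(used) * 10000u + total / 2u) / total);
}
```

    Under that assumption, `aaaaaaff` decodes to 4 ITCM banks (128 KB) and 12 DTCM banks (384 KB), matching the header line. The stack figure is simply the DTCM remainder, `bytesFree(377536, 384 * kKiB)`, which gives 15680.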


    I will isolate the PSRAM code in a new project.
    I would also like to investigate why I cannot use DMA transfers with the PSRAM as I do for the display.

    I also have doubts about the clock setting in the SPI driver.
    The PSRAM supports 84MHz.
    If I pass that parameter, I don't reach that speed over SPI.

  4. #4
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    10,125
    It looks like that currently leaves a 12KB region orphaned in ITCM RAM. I just posted code here that locates that free space and gets a pointer to it: T4-0-Memory-trying-to-make-sense-of-the-different-regions

    TD 1.48 shipped with an SPI limit of 60 MHz (?) on the clock. @KurtE did a mod the other week that allows an 80 MHz clock, which would give a bit of a boost. It seems there was a pull request because it worked for the tested displays, so it may make it into the TD 1.49 beta. KurtE posted his change on GitHub.

  5. #5
    Thanks for the tip.
    My understanding from the post is that the ILI9341 was running at about 40MHz with the old limit.

    With the change I can set it up to 60MHz for the display, and the same for the PSRAM.
    Going above that with the PSRAM (80MHz) results in errors.
    Same for the display, BTW.

    So there is a noticeable improvement, but it is still far from the spec of the IPS6404 (104MHz for the SQ version).

    Does the current SPI driver support quad SPI transfers? How do I set up the extra pins for quad SPI? Is there any example shared somewhere? Thanks.

  6. #6
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    10,125
    Yeah - speed may have been 36M max.

    The ili9341 I tried went to 80 MHz fine

  7. #7
    As I also have the SD card on the same bus as the display, the load is probably slightly different.

  8. #8
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    10,125
    Indeed, something is different with the display or connections. It was cool that KurtE's update ran on my ili9341 as connected when I tested it, and @KurtE and @mjs513 both tested with ili9488s IIRC.

    But those wouldn't relate to the PSRAM connections, depending on how they were made, except to show that with good lines to the device the 1062 with KurtE's change was able to run at 80 MHz.

    Perhaps showing how it was connected would allow others to set up and test. I have some of those chips, but it would take KurtE's interest {or similar} to set up with a logic analyzer, or perhaps adjust the code timing or SPI ordering, to get it functioning at 80 MHz.

    … prior post was cut short as the doorbell rang. Nice that it went up to 60 MHz.

    Other than the SDIO bus running 4 data lines, QSPI support on the other SPIs didn't come up as an option with the pins presented.

  9. #9
    Hi,

    I tried another setup where only the ILI9341 (2.2") was connected on SPI0.
    I can reach a 100MHz clock with DMA, and I also did not notice any issue without DMA.
    This weekend I will create a PCB for the version where the SD card is connected on the same SPI0 bus. I probably had some weird long wire connections.
    I will confirm with pictures!

    How do I post pictures on this forum, BTW?

  10. #10
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    10,125
    Cool that the display works faster. My tests so far show it hit 80 MHz, with no jumps until 120 MHz as the math works out when running a benchmark.

    To post pics: typically JPEG or PNG, and they must be under 1MB AFAIK or they are rejected. There is an 'insert image' icon on the normal toolbar, and in 'Go Advanced', where any file can be attached, you can pick a pic too.

  11. #11
    Here are a few pictures of the T4+PSRAM piggyback and the PCB breakout board I created for the emulation project.
    [Image: T4piggy.png]
    [Image: pcbT4.png]
    With both the ILI9341 and the SD card (from the ILI) on SPI0, I cannot go above 60MHz (with the ILI9341 alone I could go to 100MHz).
    I am not sure what the default SPI clock is in the SD library. It probably fails there when I go above 60MHz.
    The PSRAM alone on its SPI2 bus is still at 60MHz too. Anything above gives errors.

  12. #12
    More emulators are now using the PSRAM module on the Teensy 4.0.
    Next to PC Engine, now Gameboy, Sega Master System, Megadrive and AtariST are using it.
    I had to struggle for a week with the software before figuring out that the soldered PSRAM chip was defective, and that this was not always visible. Now all OK!
    https://youtu.be/j2sKw7KYpEo
    https://github.com/Jean-MarcHarvengt...ster/README.md
    The list becomes bigger every day: Atari 2600, Odyssey, ColecoVision, Atari 5200, Vectrex, NES, PC Engine, Sega Master System, Sega Game Gear, Sega Megadrive and Gameboy.
    And for the computers: ZX81 and Spectrum, Atari 800, C64, AtariST, 8086 XT.
    Only the Teensy 4.0 could do them all!

  13. #13
    Senior Member
    Join Date
    Apr 2014
    Location
    Germany
    Posts
    578
    Any chance that you share your code to access the IPS6404? Just ordered a bunch of them for experimenting, so some working examples to start from might be useful.

  14. #14
    Code is within every emulator but I created a separate project at:
    https://github.com/Jean-MarcHarvengt/psramips6404

  15. #15
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    10,125
    Thanks for the share Jean-Marc … Did a fork and clone to test with.

  16. #16
    Senior Member
    Join Date
    Apr 2014
    Location
    Germany
    Posts
    578
    Looks good, thanks a lot for sharing. I'll give it a try when the chips arrive.

  17. #17
    Junior Member
    Join Date
    Jul 2017
    Posts
    3
    Just a few thoughts about using PSRAMs after struggling with them in my projects. Maybe they will be helpful in some way.
    Although from the outside these chips invite you to use them as regular serial SRAM, there is one potential trap.
    If we look at the datasheet:
    https://github.com/Edragon/Datasheet...S%2064Mbit.pdf
    Page 21
    tCEM parameter: CE# low pulse width, max value 8us
    PSRAMs, being DRAMs with all the refresh circuitry and a serial I/O built in, need some time for internal operations, hence a single I/O burst is limited to 8us.
    If that is exceeded, the chip will return garbage.
    So, depending on the SPI clock frequency, the PAGE_SIZE value should be limited so that a burst does not exceed the allowed 8us maximum.
    E.g., for a 70MHz clock:
    (8us * 70MHz) / 8 = 70 bytes total
    Subtracting 5 bytes (1 command, 3 address bytes, 1 for wait cycles) gives a maximum data burst of 65 bytes in theory, so a PAGE_SIZE of 64 bytes should work.
    For 60MHz the burst length goes down to 60 bytes, or 55 bytes of data.
    Looking at the PSRAM_T::begin code, the default linear burst access is used. The max clock frequency is limited to 84MHz in that mode.
    Perhaps a way to speed up the transfer would be to set the chip to work in 32-byte burst wrap mode and access data in 32-byte chunks at a higher clock rate.
    That of course assumes the buffers stored in the PSRAM are aligned with its page size of 1k. For accesses crossing the page boundary the clock is limited to 84MHz.
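
    The tCEM arithmetic above can be written as a small helper. This is only a sketch: the function names are invented, the 8us limit and the 5-byte overhead are taken from this post, and rounding the payload down to a power of two is simply one convenient way to pick a PAGE_SIZE.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kTcemNs = 8000;      // tCEM max from the post: 8 us
constexpr std::uint32_t kOverheadBytes = 5;  // 1 command + 3 address + 1 wait

// Total bytes that can be clocked while CE# stays low for at most tCEM.
inline std::uint32_t maxBurstBytes(std::uint32_t clockHz) {
    const std::uint64_t bits =
        static_cast<std::uint64_t>(clockHz) * kTcemNs / 1000000000ull;
    return static_cast<std::uint32_t>(bits / 8);
}

// Data bytes left after the command, address and wait bytes.
inline std::uint32_t maxPayloadBytes(std::uint32_t clockHz) {
    const std::uint32_t total = maxBurstBytes(clockHz);
    return total > kOverheadBytes ? total - kOverheadBytes : 0;
}

// Largest power-of-two page size that still fits in the payload.
inline std::uint32_t largestPow2PageSize(std::uint32_t payloadBytes) {
    assert(payloadBytes > 0);
    std::uint32_t size = 1;
    while (size <= payloadBytes / 2) size *= 2;
    return size;
}
```

    For example, `maxPayloadBytes(70000000)` gives 65 and `maxPayloadBytes(60000000)` gives 55, which round down to page sizes of 64 and 32 bytes respectively.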

  18. #18
    Thanks a lot for the hidden spec detail, but by luck the current page size in the driver is 16 bytes.
    So it reads at most 16 bytes per SPI transaction (always from a 16-byte page boundary, at 0, 16, 32...).
    So we are still below a 4us CS low pulse in total.
    I tried increasing from 70MHz to 80MHz and it starts failing immediately, with or without the burst mode command after the reset.
    I am a bit confused, to be honest...

  19. #19
    I meant 16 bytes + the 5 bytes of overhead (1 command + 3 address + 1 wait), but that is still below 4us in total (2.4us at 70MHz if I am correct).
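
    As a check of that figure, here is a sketch of one page read as described: a 5-byte header followed by 16 data bytes, always starting on a 16-byte boundary. The helper names are invented, and the `0x0B` fast-read opcode is an assumption based on the command set usually listed for this PSRAM family, so it should be verified against the datasheet.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint8_t  kFastReadOpcode = 0x0B;  // assumption, check the datasheet
constexpr std::uint32_t kPageSize = 16;          // driver page size from the post
constexpr std::uint32_t kHeaderBytes = 5;        // 1 command + 3 address + 1 wait

// Rounds an address down to the start of its 16-byte page.
inline std::uint32_t pageBase(std::uint32_t addr) {
    return addr & ~(kPageSize - 1u);
}

// Builds the bytes sent before the data phase of a page read.
inline std::array<std::uint8_t, 5> fastReadHeader(std::uint32_t addr) {
    assert(addr < (1u << 24));  // 3 address bytes
    const std::uint32_t base = pageBase(addr);
    return {{kFastReadOpcode,
             static_cast<std::uint8_t>((base >> 16) & 0xFFu),
             static_cast<std::uint8_t>((base >> 8) & 0xFFu),
             static_cast<std::uint8_t>(base & 0xFFu),
             0x00}};  // dummy byte covering the wait cycles
}

// Time CS stays low for header plus payload, ignoring setup and hold times.
inline std::uint32_t csLowNanoseconds(std::uint32_t payloadBytes, std::uint32_t clockHz) {
    assert(clockHz > 0);
    const std::uint64_t bits = (kHeaderBytes + payloadBytes) * 8ull;
    return static_cast<std::uint32_t>(bits * 1000000000ull / clockHz);
}
```

    `csLowNanoseconds(16, 70000000)` returns 2400, i.e. the 2.4us mentioned above, and at 60MHz the same transaction takes 2800 ns, both well under the 8us tCEM limit.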
