Teensy 3.6 VGA driver

Hi qix67, at 180 MHz, you would have 180,000,000/(320*240)/60 = 39 cycles for each single pixel for a 320x240 frame. Not a lot, but perhaps doable for a simple RLE decompression, with >50% CPU still free.
I think you take a big shortcut in your computation. First, even if the frame buffer is 320x240, at least currently, nearly all modelines use a line repeat of 2, which means the DMA works as if the frame buffer were 320x480. This doubles the number of cycles used.
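To make the two figures easy to compare, here is a minimal sketch of the arithmetic. The helper name is made up for illustration and is not part of uVGA; it only counts visible pixels at an assumed 60 Hz refresh and ignores blanking.

```cpp
// Illustrative helper (not part of uVGA): CPU cycles available per displayed pixel.
constexpr double cycles_per_pixel(double cpu_hz, unsigned width, unsigned lines,
                                  double refresh_hz) {
    return cpu_hz / (static_cast<double>(width) * lines) / refresh_hz;
}

// 320x240 as in the first estimate: about 39 cycles per pixel.
constexpr double kBudget240 = cycles_per_pixel(180e6, 320, 240, 60.0);
// Line repeat of 2, so the DMA scans out 480 lines: about 19.5 cycles per pixel.
constexpr double kBudget480 = cycles_per_pixel(180e6, 320, 480, 60.0);
```

The second value is simply half of the first, which is the "doubling" described above.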

When DMA works, due to byte access, it should use at least 5-6 cycles on SRAM per pixel. During this time, the CPU won't be able to access the frame buffer.

When a DMA TCD must be loaded and stored, 32 bytes must be stored to SRAM and 32 bytes must be loaded from SRAM. It is possible to avoid SRAM access conflicts by storing the frame buffer in SRAM_L (lower 64KB) and keeping other data in SRAM_U (upper 192KB), but a conflict will always happen when drawing inside the frame buffer. Moreover, SRAM_U requires an additional wait state when accessed (that's what the Kinetis reference manual says).

Then, you might have optimisations like an interrupt only every 8 lines which fills a 320x8 buffer to save on interrupts.

However, in this case, the solution is not 100% hardware anymore. Moreover, this means you only use sprites because graphic primitives won't be able to draw easily in a compressed frame buffer. But I agree with you, working one line at a time is inefficient. Also, 2 buffers should be used to have one currently displayed while building the other.

If everything works as expected, I hope to be able to adjust the pixel width as I want, improve DMA bandwidth usage by using 32-bit accesses instead of 8-bit ones, and solve the memory problem. At first, I want to reach a frame buffer of 800x600 pixels (really) in RGB332 and later 800x600 in RGB565 with only 2 or 3 additional cheap components :) but I may hit the DMA speed limit because in RGB565, this means it has to copy ~57.5MB/s
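As a rough check of that bandwidth figure, here is a small sketch. It assumes a 60 Hz refresh rate and counts only the visible pixels; the function name is hypothetical.

```cpp
#include <cstdint>

// Illustrative helper: bytes the DMA must move per second for the visible area.
// The 60 Hz refresh used below is an assumption, not a value taken from uVGA.
constexpr std::uint64_t scanout_bytes_per_second(std::uint64_t width, std::uint64_t height,
                                                 std::uint64_t bytes_per_pixel,
                                                 std::uint64_t refresh_hz) {
    return width * height * bytes_per_pixel * refresh_hz;
}

constexpr std::uint64_t kRgb332Rate = scanout_bytes_per_second(800, 600, 1, 60);  // 28.8 MB/s
constexpr std::uint64_t kRgb565Rate = scanout_bytes_per_second(800, 600, 2, 60);  // 57.6 MB/s
```

Under those assumptions RGB565 comes out at 57.6 million bytes per second, in line with the ~57.5MB/s quoted above.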
 
I think you take a big shortcut in your computation. First, even if the frame buffer is 320x240, at least currently, nearly all modelines use a line repeat of 2, which means the DMA works as if the frame buffer were 320x480. This doubles the number of cycles used.

Not necessarily. The CPU might prepare a line once, then DMA would reuse it twice to generate the signals. Don't know if it can be done on Teensy.

When DMA works, due to byte access, it should use at least 5-6 cycles on SRAM per pixel.

But is it freezing the CPU totally, or only in case of RAM access? If the latter, RLE may decrease RAM use a lot.

graphic primitives won't be able to draw easily in a compressed frame buffer

A few fine algorithms would do the job :) Add some padding after each line so that every point added does not require moving half a buffer etc.
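For reference, a minimal sketch of what a per-line RLE expansion could look like. The (run length, pixel) pair format is an assumption chosen for illustration; nothing like this exists in uVGA, and the padding idea is not implemented here.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed format (not from uVGA): a line is a sequence of (run_length, pixel) byte pairs.
// Expands at most dst_cap pixels into dst and returns how many were written.
constexpr std::size_t rle_decode_line(const std::uint8_t* src, std::size_t src_len,
                                      std::uint8_t* dst, std::size_t dst_cap) {
    std::size_t out = 0;
    for (std::size_t i = 0; i + 1 < src_len; i += 2) {
        const std::uint8_t run = src[i];
        const std::uint8_t pixel = src[i + 1];
        for (std::uint8_t k = 0; k < run && out < dst_cap; ++k) {
            dst[out++] = pixel;
        }
    }
    return out;
}

// Usage demo: 3 red pixels then 2 green ones (RGB332 values) expand to 5 bytes.
constexpr std::size_t demo_decoded_length() {
    const std::uint8_t packed[] = {3, 0xE0, 2, 0x1C};
    std::uint8_t line[8] = {};
    return rle_decode_line(packed, sizeof(packed), line, sizeof(line));
}
```

The inner loop is the part that has to fit in the per-pixel cycle budget discussed earlier.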
 
Not necessarily. The CPU might prepare a line once, then DMA would reuse it twice to generate the signals. Don't know if it can be done on Teensy.

Yes, it can. In the current code, when a line is duplicated, at the end of the first duplicated line, the DMA returns to the start of the line and restarts to display it.

But is it freezing the CPU totally, or only in case of RAM access? If the latter, RLE may decrease RAM use a lot.

It depends :) If the CPU has the data in its cache, it can work without stopping. If the data is not in cache but the CPU uses SRAM_U while the DMA uses SRAM_L, it still works fine. But if both use the same SRAM "bank", the DMA is configured to have the highest priority and in this case, the CPU is stopped (perhaps still able to process incoming interrupts?).


A few fine algorithms would do the job :) Add some padding after each line so that every point added does not require moving half a buffer etc.

It will take a bit of work to do all this :D
 
A pic of my experimental board :

VGA.jpg

Nope.. I won't show the other side.. ;) just a few resistors and tons of hot glue.

@Paul Stoffregen: If you have a few minutes, you really should try it.. !
 
Thanks Frank. Is the cap something you added?

It's a remnant part of another experiment.
Note, you need to drill a big hole under the DSUB-15 (female) - the pins do not fit. I added two smaller holes on the sides for the "springs" (<- is this translation correct?) and a lot of hot glue - after soldering - to make it a bit more robust.
 
Yes 'Spring' is a good word to me for those mounting pieces. It would be nice to see the bottom - but it is probably just ugly and not secret :) Just following the wiring provided on github.com/qix67/uVGA

Adding some 1% Metal Film resistors and VGA connector to build to an order.

Frank - why don't you have an OSH PCB posted yet :)
 
Indeed it is a bit early to build something when it may change for the better with added support chips. I'll have connectors for your prototype in about a week.
 
I think qix will post a board ;)

Currently, the only PCB I am trying to make is a breakout PCB with a VGA connector, the two 82E resistors for Hsync and Vsync, and pins to plug the PCB into a breadboard. But yesterday evening, when I ordered the PCB, Aisler failed to recognize my bitcoin payment (mollie.com in fact) and asked me to pay again using a different payment method, despite the fact that the invoice was paid within the minute and the transaction confirmed within 3 minutes. I hope everything will be solved today else it will be a shitty day (again, like the 3 last ones at work).

I will try to produce a full PCB, or at least a full schematic, once I have a working solution. But be cool, I only use Fritzing (I have been doing this kind of stuff for less than 9 months). I found KiCad a bit complex, especially the process from schematic to PCB, but I think schematics in KiCad are a lot cleaner and easier to read, especially when you have 8 or 16 pins working as a bus.
 
I hope everything will be solved today else it will be a shitty day (again, like the 3 last ones at work).

I hope everything is OK .. (?!)

Have you tried to read the DDC channel? LOL, I don't know if it is useful but it sounds interesting.. seems to be I2C @ address 50h - right ?
 
I hope everything is OK .. (?!)

Yes but I'm not on holiday anymore :( However I have good news: yesterday I received my SRAM with its very small pins (TSOP-II 44). The remaining components should not be very far behind. I will solder it on an adapter this weekend.

Have you tried to read the DDC channel? LOL, I don't know if it is useful but it sounds interesting.. seems to be I2C @ address 50h - right ?

No, I had not even thought of it :) I just took a quick look, and retrieving the data should be easy because it uses 2 dedicated pins and just requires 2 additional 2k2 resistors. Maybe later
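If someone tries it, the data read from address 50h is normally a 128-byte EDID block, which starts with a fixed 8-byte header and ends with a checksum byte that makes all 128 bytes sum to 0 modulo 256. The sketch below only validates such a block once it is in memory; the I2C read itself is not shown, and the function names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kEdidBlockSize = 128;

// The base EDID block starts with the fixed header 00 FF FF FF FF FF FF 00.
constexpr bool edid_header_ok(const std::uint8_t* block) {
    if (block[0] != 0x00 || block[7] != 0x00) return false;
    for (std::size_t i = 1; i <= 6; ++i) {
        if (block[i] != 0xFF) return false;
    }
    return true;
}

// All 128 bytes, including the final checksum byte, must sum to 0 modulo 256.
constexpr bool edid_checksum_ok(const std::uint8_t* block) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < kEdidBlockSize; ++i) sum += block[i];
    return (sum & 0xFFu) == 0;
}

constexpr bool edid_block_valid(const std::uint8_t* block) {
    return edid_header_ok(block) && edid_checksum_ok(block);
}

// Usage demo: header only, everything else zero, with a caller-chosen checksum byte.
constexpr bool demo_block_valid(std::uint8_t checksum_byte) {
    std::uint8_t blk[kEdidBlockSize] = {0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00};
    blk[kEdidBlockSize - 1] = checksum_byte;
    return edid_block_valid(blk);
}
```

In the demo, the six FF bytes sum to 1530, so a checksum byte of 0x06 brings the total to 1536, a multiple of 256.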
 
Yes, my holidays are over, too.. :-(

I tried to use uVGA with my c64-emulation. I get error UVGA_FAIL_TO_ALLOCATE_SRAM_L_BUFFER_IN_SRAM_L.
Is there any way to solve this ? The heap is always after the variables, so, currently, it is not possible to use more than sizeof(SRAM_L)-sizeof(first line) variables.
And no, no chance to use malloc for my vars... :-(

Another question: How much memory will be used with 452x300 ? I thought it is roughly 452x300 bytes.. seems to be more in 3x DMA mode ?
 
It is a bit problematic because without the sram_l line buffer, the DMA will not be able to always obtain access fast enough.

First, you should use UVGA_DMA_SINGLE instead of UVGA_DMA_AUTO. Auto mode copies 1 sram_u frame buffer line into sram_l buffer but if you don't have one, this will just waste bandwidth.

Then, you should comment lines 487-488 and 493-494 in uVGA.cpp.
Code:
487:                        if(((int)sram_l_dma_address) >= SRAM_U_START_ADDRESS)
488:                          return UVGA_FAIL_TO_ALLOCATE_SRAM_L_BUFFER_IN_SRAM_L;

Code:
493:                        if(((int)frame_buffer) >= SRAM_U_START_ADDRESS)
494:                           return UVGA_FRAME_BUFFER_FIRST_LINE_NOT_IN_SRAM_L;

This should allow the library to start, however lines may be horizontally unstable. Moreover, I cannot say if all (UVGA_DMA, line repeat) combinations will work. UVGA_DMA_SINGLE with line repeat=1 is the simplest mode, it should never fail. UVGA_DMA_SINGLE with line repeat > 1 should also work without problem. UVGA_DMA_AUTO will probably have issues because the code expects to have at least the first frame buffer line in SRAM_L. I think the first line will be duplicated, which increases the screen height, delays the Vsync signal by 1 line and may disturb the monitor.

Another question: How much memory will be used with 452x300 ? I thought it is roughly 452x300 bytes.. seems to be more in 3x DMA mode ?

It will use 464*301 bytes for the frame buffer. 464 is (452+1) rounded up to the next multiple of 16; the +1 comes from the black pixel added at the end of each line to power off the beam. 301 is 300 lines + 1 sram_l line buffer.

The library also allocates a pointer array (300 * sizeof(uint8_t*)). A second array of the same size is allocated if UVGA_DMA_AUTO is used. Finally, the DMA requires TCDs (32 bytes per TCD). The minimum is 4 TCDs in UVGA_DMA_SINGLE without line repeat. UVGA_DMA_SINGLE with a 2 line repeat uses (3 + frame_buffer_height) TCDs. UVGA_DMA_AUTO is a lot more hungry. With a 2 line repeat, it uses (3 + 1 + number of lines in SRAM_U*2) TCDs. Image stability has a big cost :(

Assuming 452x300 is a 2 line repeat, in UVGA_DMA_SINGLE, it should use ~150 560 bytes. UVGA_DMA_AUTO will use ~161 392 if all frame buffer lines are in SRAM_U.
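The totals above can be reproduced with the sketch below. The function names are invented for this example and are not uVGA API; the 4-byte pointer size is an assumption that holds for the 32-bit Cortex-M4, and both functions cover the 2 line repeat case only.

```cpp
#include <cstddef>

constexpr std::size_t kTcdBytes = 32;  // one DMA TCD
constexpr std::size_t kPtrBytes = 4;   // assumed pointer size on the Teensy 3.6

// Line width plus one black pixel, rounded up to a multiple of 16.
constexpr std::size_t row_stride(std::size_t width) {
    return (width + 1 + 15) & ~static_cast<std::size_t>(15);
}

// Visible lines plus one extra SRAM_L line buffer.
constexpr std::size_t frame_buffer_bytes(std::size_t width, std::size_t height) {
    return row_stride(width) * (height + 1);
}

// UVGA_DMA_SINGLE, 2 line repeat: one pointer array, (3 + height) TCDs.
constexpr std::size_t single_mode_bytes(std::size_t width, std::size_t height) {
    return frame_buffer_bytes(width, height) + height * kPtrBytes
         + (3 + height) * kTcdBytes;
}

// UVGA_DMA_AUTO, 2 line repeat, every line in SRAM_U:
// two pointer arrays, (3 + 1 + 2 * height) TCDs.
constexpr std::size_t auto_mode_bytes(std::size_t width, std::size_t height) {
    return frame_buffer_bytes(width, height) + 2 * height * kPtrBytes
         + (3 + 1 + 2 * height) * kTcdBytes;
}
```

For 452x300 this gives 139,664 bytes of frame buffer, then 150,560 bytes in total for UVGA_DMA_SINGLE and 161,392 bytes for UVGA_DMA_AUTO.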


On a different matter, 3 minutes ago, I received my latch (74HC573N). It is not the fast one (only 18ns) but it is in DIP20 (breadboard friendly :) ). I will be able to resume my flexbus test. In case the latch is not fast enough, I also ordered a faster one (3ns) which is a bit harder to use (SO-20). I think I will solder everything on DIP adapters this afternoon and start my test tomorrow. The flexbus version may be a solution to your problem because flexbus will act like a 4-byte cache, allowing the DMA to read 4 bytes at a time instead of 1, reducing the number of DMA accesses and improving efficiency.
 
Hi,

I made a very dirty hack, have it static now:
Code:
#define WIDTH 452
#define HEIGHT 600
#define REPEAT_LINE 2
#define FB_ROW_STRIDE    ((WIDTH + 1 + 15) & 0xFFF0)
#define FB_HEIGHT        ((HEIGHT + REPEAT_LINE - 1) / REPEAT_LINE)
DMAMEM uint8_t _frame_buffer[FB_ROW_STRIDE * (FB_HEIGHT + 1) + 15];
DMAMEM uint8_t _fb_row_pointer[sizeof(uint8_t *) * FB_HEIGHT];
DMAMEM uint8_t _dma_row_pointer[sizeof(uint8_t *) * (FB_HEIGHT + REPEAT_LINE)];
...
(removed all "malloc")

This works with the demos, but not with my emu.. something crashes. I'll find the problem, but it will take some time... I don't know where and why this happens, at the moment.

Thank you very much for your help. It makes it easier to understand the code (which is pretty good!)
 
Nope.. it's another problem!

solved... I see "**** COMMODORE 64 BASIC V2 **** 64K RAM SYSTEM 38911 BASIC BYTES FREE" and a blinking cursor ;)
Good quality!!
Tomorrow, I'll try to make a photo and post it.

I'm still unsure if the whole timing is OK and all works.. will take a week or more to rewrite & test all.

Good night :)
 
Congratulations Frank! Great news to start.
 
fb_row_pointer and dma_row_pointer should be uint8_t*. They don't need to be in DMA memory, they can be anywhere in SRAM. In debug mode, do you have a "DMA crashed" message ? This can happen in 2 ways. The frame buffer address is not a multiple of 16, but this should only occur in UVGA_DMA_AUTO. It can also occur if a DMA transfer starts in SRAM_L and ends in SRAM_U.
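A minimal sketch of how the static buffers posted earlier might look with those two remarks applied: pointer-typed row tables in ordinary SRAM and a 16-byte aligned frame buffer. This is an illustration, not the code that ended up in uVGA. DMAMEM is defined empty when the Teensy core is absent so the snippet also compiles off-target, and init_row_pointers is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>

#ifndef DMAMEM
#define DMAMEM  // Teensy core macro; defined empty here so the sketch also builds off-target
#endif

constexpr std::size_t WIDTH = 452;
constexpr std::size_t HEIGHT = 600;
constexpr std::size_t REPEAT_LINE = 2;
constexpr std::size_t FB_ROW_STRIDE = (WIDTH + 1 + 15) & ~static_cast<std::size_t>(15);
constexpr std::size_t FB_HEIGHT = (HEIGHT + REPEAT_LINE - 1) / REPEAT_LINE;

// Pixel storage: one extra row for the SRAM_L line buffer, 16-byte aligned.
DMAMEM std::uint8_t frame_buffer[FB_ROW_STRIDE * (FB_HEIGHT + 1)] __attribute__((aligned(16)));

// The row tables hold pointers, so they are pointer arrays; plain SRAM is enough for them.
std::uint8_t* fb_row_pointer[FB_HEIGHT];
std::uint8_t* dma_row_pointer[FB_HEIGHT + REPEAT_LINE];

// Hypothetical helper: point each frame buffer table entry at the start of its row.
inline void init_row_pointers() {
    for (std::size_t y = 0; y < FB_HEIGHT; ++y) {
        fb_row_pointer[y] = frame_buffer + y * FB_ROW_STRIDE;
    }
}
```

How dma_row_pointer is filled depends on the DMA mode inside the library, so it is left untouched here.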
 
Great news. Did you have to modify something in uVGA ? I can find a way to include your changes.

Yesterday I soldered my components. TSOP-II-44 was surprisingly easy to solder, faster than SOIC-20 in fact. I just tested my flexbus code and I encountered a little problem. One of the signals I use is inverted and I currently have no NOT gates; they should arrive in the next 5 days. I tried a quick hack using a 2N2222 but the signal frequency seems too high.
 
Great news. Did you have to modify something in uVGA ? I can find a way to include your changes.

I had to change only one thing:
- My "hack" with the arrays above (and it requires a modeline "240Mhz 452x300" + I've modified .horizontal_position_shift = 14+23+12)

Can you add a "begin" that accepts pointers to the three buffers - instead of allocating them ?

Then, - and I did not see this yesterday, because the display is too far away from me -, there is some pixel-flickering. The picture itself is rock stable.
I can send you a pre-compiled hexfile if you want to see it. Or the full source-code, but it needs some fiddling to compile it.

Yesterday I soldered my components. TSOP-II-44 was surprisingly easy to solder, faster than SOIC-20 in fact. I just tested my flexbus code and I encountered a little problem. One of the signals I use is inverted and I currently have no NOT gates; they should arrive in the next 5 days. I tried a quick hack using a 2N2222 but the signal frequency seems too high.

I'm very curious ;)
 
In debug mode, do you have a "DMA crashed" message ?
Nope, everything ok, it was my code that caused the problems :) All good (and bug fixed) :)

I've attached the hex-file if you want to take a look.
(If you want to try a bit, you can use the Arduino Serial Monitor to send commands - And pls note, it is a very untested and preliminary "alpha" version)
 

Attachments

  • c64.ino.zip
I am curious too about adding the SRAM - at least to see if Frank can use it :)

I soldered a VGA connector to a PCB - but too late in the day to organize the resistors and solder them to try Frank's posted code. I cut down the middle row of offset pins on the VGA connector and soldered a 30ga wire across those GND pins to pin #5 and clipped and skipped the open #9 pin.

I got DigiKey 1% resistors 392, 511, 825, 1K, 2K - but forgot the 82 Ohm for the SYNC lines - hoping 10% stock 100 Ohm will work?

@qix67 - how touchy are the SYNC lines? Will 100 Ohm work? I could do Parallel (100,330) for 76.7 Ohm if needed or even parallel { 220,220,330 } to get 82.5?
 
I am curious too about adding the SRAM - at least to see if Frank can use it :)

I am working on 2 solutions. The first is a 16-bit SRAM connected with an 8-bit bus width, but I am waiting for my NOT gate to do so. The second is a 16-bit SRAM connected with a 16-bit bus width. It should be a lot faster, however because the Teensy does not wire some signals, only aligned 16-bit accesses should be possible (requires 1 TSOP2-44 (SRAM) + 2 SOIC-20 (2 * 8-bit latch)).

My main problem is the number of wires to connect. Just to use 64KB SRAM, it requires nearly 30 wires, breadboard is just a big mess :mad:

I soldered a VGA connector to a PCB - but too late in the day to organize the resistors and solder them to try Frank's posted code. I cut down the middle row of offset pins on the VGA connector and soldered a 30ga wire across those GND pins to pin #5 and clipped and skipped the open #9 pin.

I got DigiKey 1% resistors 392, 511, 825, 1K, 2K - but forgot the 82 Ohm for the SYNC lines - hoping 10% stock 100 Ohm will work?

@qix67 - how touchy are the SYNC lines? Will 100 Ohm work? I could do Parallel (100,330) for 76.7 Ohm if needed or even parallel { 220,220,330 } to get 82.5?

It will probably work, it will depend on your monitor. Going below 82E is probably a bad idea because it will put too high a voltage on the monitor pin, but 100E will probably be accepted by the monitor.
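For the parallel combinations mentioned in the question, a quick sketch of the arithmetic (helper names are illustrative only):

```cpp
// Equivalent resistance, in ohms, of two or three resistors in parallel.
constexpr double parallel(double a, double b) {
    return 1.0 / (1.0 / a + 1.0 / b);
}
constexpr double parallel(double a, double b, double c) {
    return 1.0 / (1.0 / a + 1.0 / b + 1.0 / c);
}

constexpr double kTwoResistors = parallel(100.0, 330.0);           // about 76.7 ohm
constexpr double kThreeResistors = parallel(220.0, 220.0, 330.0);  // about 82.5 ohm
```

So 100 in parallel with 330 lands below 82 ohm, while the three-resistor combination is within a fraction of an ohm of it.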
 