Teensy 3.6 VGA driver

qix67

Hello,

I have just released a VGA library running without any CPU help, 100% hardware. It can be downloaded here: https://github.com/qix67/uVGA

The library works with pixels in RGB332 format (256 different colors).
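
To illustrate the RGB332 format mentioned above, here is a minimal sketch of packing a 24-bit color into one byte. The helper name `rgb332` and the bit order (red in the top 3 bits, green in the middle 3, blue in the bottom 2) are assumptions for illustration, not taken from the uVGA sources.

```cpp
#include <cstdint>

// Pack 8-bit R, G, B into a single RGB332 byte laid out as RRRGGGBB.
// ASSUMPTION: this bit order is the common RGB332 convention; check the
// library before relying on it.
constexpr uint8_t rgb332(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint8_t>((r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6));
}
```

With 3 bits of red, 3 of green and 2 of blue, there are 8 * 8 * 4 = 256 possible colors.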

Image resolution depends on CPU frequency. When running at 24MHz (yes, 24MHz), image resolution is 200x240. At 240MHz, image resolution is 703x300. Video frequencies are chosen to best fit the image resolution. 703x300 fits inside 800x600@60 video settings and looks like a full screen image. 200x240 fits inside 640x480@60.

Various graphical primitives are provided; most of them share the same API as the DueVGA project.

The library was developed on Teensy 3.6 but will probably work on Teensy 3.5 without any change; you may have to define additional modelines if the CPU cannot run at the same speed. It also works on Teensy 3.2 but, due to memory size, resolution will be rather low.

Enjoy and feel free to comment and contribute.
 

WOW, Great.
I ordered a few SAA7121H from China a few days ago, since I planned to use it for a TV PAL/NTSC output on the Teensy.
Do you think it is possible to modify your lib to output PAL (or NTSC) RGB signals? That would be much better than the chip (which is already out of production and would need complicated CCIR 656).
 
Have you tried to use the audio-lib together with your VGA ? The audio uses DMA, too. I'd need the DAC output :)
 
In fact, my initial goal was to generate a PAL signal. However, after understanding how PAL/NTSC works, I concluded it is too complex, and it does not seem possible to do 100% in hardware without external components.

I took a look at the DueVGA code (the code has no comments and is very hard to read). To change color in VGA, you need to generate a signal between 0 and 1V, which can be done using a resistor ladder. In PAL/NTSC, you must generate a signal between 0.3V and 1V for the luminosity and mix it with another signal (the PAL or NTSC color frequency) which is phase shifted to change the hue. In fact, I wonder how someone could even have such an idea, but it was during the analog era :)

I think it should be possible to generate a VGA signal and feed it to an AD723. It "just" requires a modeline compatible with the AD723. Unfortunately, I do not have such a component (but I plan to) and the monitor I use for my tests does not accept PAL or NTSC modelines; these modes use a pixel clock that is too low. But I don't despair: the very first version of my library (never released) was a lot more complex and had several problems, and I solved all of them by reading the Kinetis reference manual (2200 pages) in 3 weeks.
 
No.

I had no problems under heavy CPU or memory load. However, accessing DMA may generate video artifacts. The library itself can work with any DMA channel between 0 and 15; it will modify the priority of the video DMA channel so that it remains the fastest one.

This morning, I tried a first test where I draw horizontal and vertical lines using DMA, and it clearly does not work as expected (comment out the line #define NO_DMA_GFX in uVGA_gfx.cpp to see the result). Vertical lines seem to work rather well; however, horizontal lines, which are easier to draw, generate a huge number of SRAM accesses without pause. While drawing this kind of line, the video DMA fails to obtain access fast enough and the monitor loses sync. Moreover, when the sync comes back, the 2nd DMA channel has missed several starts, which partially scrolls the screen (lines in SRAM_U are shifted one or more lines towards the bottom of the screen). It is very weird.

Now, it is a matter of size (always :) ). The most demanding video mode (703x300) copies 703*600*60 bytes per second, 24.13MB/s. At this CPU frequency and in the best case (burst copy), I have seen somewhere that the bandwidth is more than 200MB/s. Even if the video generation is clearly not the best case, the 2nd DMA channel works in the best combination possible and the 3rd DMA channel has no cost.
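
The bandwidth figure above can be reproduced with a few lines of arithmetic. This is only a sketch; the function names are made up for illustration. It assumes one byte per RGB332 pixel and 600 scanlines per frame, since each of the 300 framebuffer lines is sent twice in the 800x600 mode.

```cpp
#include <cstdint>

// Bytes the video DMA has to move every second: one byte per pixel,
// for every scanline actually sent to the monitor.
constexpr uint64_t video_bytes_per_second(uint32_t pixels_per_line,
                                          uint32_t lines_per_frame,
                                          uint32_t frames_per_second) {
    return uint64_t{pixels_per_line} * lines_per_frame * frames_per_second;
}

// Convert a byte count to binary megabytes (1MB = 1024 * 1024 bytes).
constexpr double to_binary_megabytes(uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

For 703 pixels, 600 lines and 60 frames this gives 25,308,000 bytes per second, which is about 24.1MB/s when a megabyte is counted as 1024 * 1024 bytes.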

Assuming 44.1kHz stereo 16-bit audio, one second requires ~172KB (176,400 bytes). In 800x600@60Hz, line frequency is 37.87kHz, which means the DMA will have to copy approximately 4.6 bytes per line. I think the Teensy should be able to support this; such a small DMA access should "probably" not be worse than the CPU drawing figures or scrolling the screen.
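
As a quick check of the audio estimate, the sketch below computes the audio data rate and how many audio bytes fall within one video line. The function names are hypothetical, and the 37.87kHz line frequency is the value quoted in the post.

```cpp
#include <cstdint>

// Raw PCM data rate in bytes per second.
constexpr uint32_t audio_bytes_per_second(uint32_t sample_rate_hz,
                                          uint32_t channels,
                                          uint32_t bytes_per_sample) {
    return sample_rate_hz * channels * bytes_per_sample;
}

// Average number of audio bytes that must be moved during one video line.
constexpr double audio_bytes_per_line(uint32_t audio_bytes_per_sec,
                                      double line_frequency_hz) {
    return static_cast<double>(audio_bytes_per_sec) / line_frequency_hz;
}
```

44.1kHz stereo 16-bit audio is 44100 * 2 * 2 = 176,400 bytes per second, or roughly 4.66 bytes per video line at 37.87kHz.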
 
Ok, cool :)

There are cheap VGA->TV or VGA->HDMI converters available, around $10 from China. It's not worth building one :)
I want to try it with my C64 Emulation - may take some time because I have to do some heavy code-changes.
Would be really cool if it works :)

Edit: I need only 16 colors.
 
perhaps we can see 1080P videos from frank when the T4 is out ;)

perhaps since it does support video on T3, T4 should have an HDMI prop shield ;)
 
The main problem, even on teensy 3.6, is the memory limit.

1080p is possible but not 1920x1080, something like 242x1080 will use nearly all memory...
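
The 242x1080 figure follows from the RAM size. Below is a rough sketch with hypothetical names; it assumes one byte per pixel and the 256KB of RAM on the Teensy 3.6, and it ignores the stack and every other buffer, so the real limit is lower.

```cpp
#include <cstdint>

// Total RAM on the Teensy 3.6 (256KB).
constexpr uint32_t kTeensy36RamBytes = 256u * 1024u;

// Size of an RGB332 framebuffer: one byte per pixel.
constexpr uint32_t framebuffer_bytes(uint32_t width, uint32_t height) {
    return width * height;
}

// Widest framebuffer that fits in ram_bytes for a given height,
// if nothing else used any RAM at all.
constexpr uint32_t max_width_for_height(uint32_t ram_bytes, uint32_t height) {
    return ram_bytes / height;
}
```

A 242x1080 framebuffer takes 261,360 of the 262,144 bytes, while a full 1920x1080 one would need 2,073,600 bytes.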

However, this resolution is more than 1/4 of one side of 1080p 3D in side by side mode :cool:
 
I am trying to improve the uVGA library using flexbus. Flexbus has several advantages over the GPIO solution. It is possible to have variable wait states (0-63), which allows a more precise pixel width. Moreover, the DMA can be used more efficiently with 16 or 32 bit reads from memory and 16 or 32 bit writes (flexbus will itself perform the conversion from 32 bit to 8 bit writes).

I have nearly working code, however I have a little problem: the MK66 on the Teensy 3.6 cannot use flexbus in non-multiplexed mode (someone on nxp.com told me some signals are not available on the package itself), therefore I am stuck in multiplexed mode.

My first, minor problem is that it divides bandwidth by 2, but with the DMA speed improvement I had to add wait states anyway to avoid pixels that are too small :)

My main problem is the multiplexed mode itself, because I receive address and data on the same lines. This creates a more or less black-blue column between each pixel.

Fortunately, a signal (pin14, FB_CS0) goes LOW when data is available and HIGH when address is on the bus.

Now, my very big problem is my somewhat limited knowledge of electronics. To filter the bus and remove the address, I think it should be possible to use an 8 bit latch connected to the flexbus data/address lines and either a logic inverter or an NPN transistor to connect pin 14 to the latch enable input.

Can someone confirm this is correct (I assume it is ;) ) and suggest component references (DIP please, I suck at SMD soldering :( )? Are the SN74HC373 and 2N2222 fast enough? Everything will send and receive 3.3V signals and should work at F_BUS speed (max 60MHz on Teensy 3.6).

Thanks.
 
Can anyone post a photo of it working? Then Robin could post on the blog about it. :)

Ping me when the code is fairly stable/mature and you'd like to see this included with the Teensyduino installer.
 
Here is the output produced by a Teensy 3.6@240MHz in the highest resolution (800x600 with a framebuffer of 703x300); roughly 16KB of RAM remains free with these settings.
uVGA_703x300_GPIO.jpg

And here is the same using the flexbus code (not yet on github), 4 wait states, burst mode (which reduces the number of times the address appears). The pixel clock goes faster, that's why the image is narrower. Black lines occur when the address appears on the flexbus data lines due to line multiplexing. The flexbus pins used are 21,20,6,8,7,37,36,35 (bit 0 => bit 7 of the byte). Flexbus CSCR is set with a data path of 8 bits, BLS=1, AA=1. CSAR = 0x6000_0000, CSMR = 1. This is more or less similar to the LCD example of AN4393 (Using FlexBus Interface for Kinetis Microcontroller).

uVGA_703x300_FLEXBUS.jpg
 
I've ordered two VGA 15-pin connectors and some 1% resistors. I'm very curious ;) and want to try it next weekend. I'm still looking where I can order Mini VGA connectors and adapters. Any hints ? :)

For my project, the best resolution is about ~400 x 284 visible pixels at around ~50 Hz (exactly 50.125 Hz) for "C64-PAL" - and a memory usage < 150KB - do you think this is possible ? A black border on the left and/or right would be ok.

Edit: Is a "letterbox" format with ~400 pixels x-resolution and black borders for a 16:9 display possible ?
Edit: I'm using 240MHz F_CPU.
 
I've ordered two VGA 15-pin connectors and some 1% resistors. I'm very curious ;) and want to try it next weekend. I'm still looking where I can order Mini VGA connectors and adapters. Any hints ? :)

Not at all, I did not know they existed :)

For my project, the best resolution is about ~400 x 284 visible pixels at around ~50 Hz (exactly 50.125 Hz) for "C64-PAL" - and a memory usage < 150KB - do you think this is possible ? A black border on the left and/or right would be ok.

Edit: Is a "letterbox" format with ~400 pixels x-resolution and black borders for a 16:9 display possible ?
Edit: I'm using 240MHz F_CPU.

At this frequency, the stable resolution I have is 452*300 @ 60Hz. I will try to find something at 50Hz, but I'm working on the next version of the library. If everything goes as expected, the library should be able to produce something close to what you want with 2 additional cheap components. I ordered them 5 days ago; they should arrive soon.

I am currently very busy and unfortunately, the end of my holidays is close ('til August 31st). I currently have 3 projects using Teensy 3.6, but most of them are stopped due to the lack of components (they are on their way). If everything goes as expected, with very few additional cheap & common components, the uVGA library will be able to do something far better than the current version. Depending on my ability to solder components (TSOP-II-44), I hope to produce something a lot better, and when I say a lot better, it is really really a lot better :D. If everything goes as I expect, the uVGA library will do something beyond possible :cool:
 
The TSOP-II-44, is it a RAM Chip ?

Yes, a 512KB SRAM, IS61LV25616AL-10TL. If everything goes as expected, I plan to switch to an even bigger one after (1MB)

But before this, I am waiting for the latch I ordered, to be able to generate a more accurate image using flexbus. I ordered an 8 bit latch (74HC573), which is very cheap and should not be too hard to solder (SOIC-20 package). It is a ~20ns latch; at 240MHz, flexbus runs at 60MHz (16.66ns) and I have at least 2 wait states in my current code. I also ordered several 74LVC573AD (3.21€ for 20pcs). They are a lot faster (~1ns), just in case.
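
A very crude way to compare these numbers is sketched below. It assumes each byte stays on the bus for roughly (1 + wait states) bus clocks. That is a simplification made up for this example, not something taken from the reference manual, and the function names are hypothetical. Real timing also depends on setup and hold times and on where FB_CS0 toggles inside the cycle.

```cpp
// Duration of one bus clock in nanoseconds.
constexpr double bus_period_ns(double bus_hz) {
    return 1e9 / bus_hz;
}

// ASSUMPTION: the data phase lasts about (1 + wait_states) bus clocks.
constexpr double data_window_ns(double bus_hz, unsigned wait_states) {
    return bus_period_ns(bus_hz) * (1 + wait_states);
}

// True if the latch propagation delay is shorter than the assumed window.
constexpr bool latch_fits(double latch_delay_ns, double bus_hz,
                          unsigned wait_states) {
    return latch_delay_ns < data_window_ns(bus_hz, wait_states);
}
```

Under this assumption, a 20ns latch fits in the 50ns window of a 60MHz bus with 2 wait states, but not in a single 16.67ns clock with no wait state. A ~1ns part fits in both cases.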

During my tests, I found several interesting things when sending image over flexbus:
  • Instead of reading the image byte by byte (one pixel at a time), it is possible to read pixels 2 or 4 at once. The DMA will read 4 bytes at once and write 4 at once, so SRAM accesses are more efficient. The flexbus controller will automatically split the 32 bit data into 4 bytes, acting like a little cache.
  • Another advantage is that flexbus is a different slave on the crossbar, and its port is not used by anything else. This means no delay should occur. As nothing else uses flexbus (it is normally powered off), its usage comes at no cost :)
  • Because data is sent via flexbus and not a GPIO port (currently port D), GPIO port access should not suffer from any delay. Moreover, the pins of port D are no longer all reserved, only 8 pins (21,20,6,8,7,37,36,35) + the 2 sync pins + flexbus FB_CS0 (pin 14).
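
To give an idea of what wider DMA reads save, this small sketch counts the SRAM reads needed for one line at different transfer widths. The function name is hypothetical and the model only counts reads; it says nothing about actual bus timing.

```cpp
#include <cstdint>

// Number of DMA reads needed to fetch one line of pixels_per_line bytes
// when each read moves bytes_per_read bytes (rounded up).
constexpr uint32_t dma_reads_per_line(uint32_t pixels_per_line,
                                      uint32_t bytes_per_read) {
    return (pixels_per_line + bytes_per_read - 1) / bytes_per_read;
}
```

For a 703 pixel line this is 703 reads at 8 bits, 352 at 16 bits and 176 at 32 bits. Since 703 is not a multiple of 4, the last 32 bit read would only be partly used, so a real setup would probably pad or shorten the line.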
 
Is a "letterbox" format with ~400 pixels x-resolution and black borders for a 16:9 display possible ?

Yes, no, well, not like you expect it :)

When the sync frequencies are 800x600@60, if you use the 703x300 frame buffer setting, it appears as a full image because each line is duplicated. If line repetition is disabled, you visually obtain something like 800x300 (2.66 aspect, nearly 24:9). However, unlike a normal letterbox image, where black lines are above and below the image, here all black lines are after the image. This is easy to modify. Instead of having 1 DMA TCD to generate all black lines after the image, I just have to add another one before the image and split the number of black lines evenly between them. It only costs 1 TCD (32 bytes).
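
The even split of black lines described above could look like the sketch below. The struct and function names are made up for illustration; the real library would put these counts into two DMA TCDs rather than return them.

```cpp
#include <cstdint>

struct BlankSplit {
    uint32_t before;  // black lines emitted above the image
    uint32_t after;   // black lines emitted below the image
};

// Split the unused visible lines evenly above and below the image.
// If the count is odd, the extra line goes below the image.
inline BlankSplit split_blank_lines(uint32_t visible_lines,
                                    uint32_t image_lines) {
    if (image_lines >= visible_lines) return {0, 0};
    const uint32_t blank = visible_lines - image_lines;
    return {blank / 2, blank - blank / 2};
}
```

With 600 visible lines and a 300 line image shown without line repetition, this gives 150 black lines above and 150 below.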

Black borders on the left and right of the image are "automatically" generated. To increase the size of the left border, increase the value of horizontal_position_shift in the modeline settings. The right border is created when the DMA stops. There are two ways to modify it. One is to change the horizontal stretch: UVGA_HSTRETCH_NORMAL, UVGA_HSTRETCH_WIDE or UVGA_HSTRETCH_ULTRA_WIDE (try taking a setting using ULTRA_WIDE and replacing it with NORMAL, you will get a very large black border). The other is to reduce the number of pixels per line. It is probably also possible to modify hsync_start and hsync_stop, but I am not sure all monitors will support such custom settings.

Beware: despite having valid sync frequencies (a black image is properly recognized by the monitor), it is possible to create a combination of pixel_h_stretch & hres that is not valid for the monitor. My monitor tolerates several pixels of "over-scan" on the right of the image, but if there are too many, the RGB signals will not be at 0V during HSYNC. The monitor should "probably" not explode ;) however, it will totally and surely lose the video sync and display nothing (at least on my TFT; a CRT monitor will probably display very weird things).
 
Wow.. I made a quick'n dirty board - your VGA output is AMAZING.
Congratulations - it is really really good. This has the potential to become the new standard display for Teensy.

I'm very impressed.

The only thing is, that my display complains about a wrong resolution every minute and displays an annoying blue window with a warning message - it likes higher resolutions more. But that's a problem of this particular display, I think. It's a large computer display.

Good work,
Frank.

Edit: I found a way to disable the message in the menu.
 
I have a similar problem when I try anything at 50Hz. Sometimes, when playing with different resolutions, I even had to force my monitor to recalibrate itself. It is also one of the reasons I tried to use standard modelines.

Soon, a friend of mine will give me an old CRT monitor. I hope to be able to produce an image using frequencies suitable for a VGA=>PAL/NTSC adapter.
 
Could CPU help with RLE decompression/sprites/character map? Just allow the user to define a function which generates that 640 pixels every h-retrace and returns, while your library changes these into signals. No more problems with frame size, and sprites!
 
Technically, it is already possible. However, due to DMA TCD optimizations, it is not available everywhere (search DMA_TCD_CSR_INTMAJOR in uVGA_DMA_RGB332.cpp).

IMHO, this replaces the frame size problem with a CPU load + code complexity problem. I don't think the Teensy has enough power to build such a line during the hsync. It is possible to use line double buffering to have more time. Another problem arises if the screen is filled with a full screen rectangle. In this case, you gain nothing because the rectangle uses as much memory as the full size frame buffer. A solution would be to use vector graphics, but here again the CPU load will skyrocket.

As far as I remember, what you describe is the way the old Atari Jaguar GPU works.

Another full hardware solution would be to build one set of DMA TCDs per line instead of just one TCD per line. With all these TCDs, the DMA itself will copy the correct line of each sprite at the correct moment into the single line buffer. I don't say it is impossible, but I don't know if the DMA will be fast enough to do this, especially when another DMA channel is displaying a line. I ran tests drawing GFX primitives using DMA, and drawing a long enough horizontal line disrupts the image display. In such a case, it should even be possible to work without the line buffer: the DMA would copy the sprite pixels directly to the GPIO port. Technically, it is the most efficient solution because the DMA does no work on empty lines. However, if big sprites are to be displayed, you will have to manage sprite overlapping. Building the TCDs will be a real hell, and CPU load will go high as soon as anything moves on the screen.

I think it is a valid solution to display, for example, 10-12 side-by-side icons on a screen. In such a case, each icon will use 1 DMA channel (32 available on Teensy 3.6) and only 1 TCD, which means the TCD is stored directly in the DMA controller and thus costs nothing (a TCD in RAM uses 32 bytes).
 
Hi qix67, at 180MHz, you would have 180,000,000/(320*240)/60 = 39 cycles for each single pixel for a 320x240 frame. Not a lot, but perhaps doable for a simple RLE decompression, with >50% CPU still free. Then, you might have optimisations like an interrupt only every 8 lines which fills a 320x8 buffer to save on interrupts.
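
The cycle budget above is simple to recompute for other clock speeds or frame sizes. This is only a sketch with a hypothetical function name, and like the estimate in the post it spreads the cycles over visible pixels only, ignoring blanking intervals and interrupt overhead.

```cpp
#include <cstdint>

// Average CPU cycles available per visible pixel.
constexpr double cycles_per_pixel(double cpu_hz, uint32_t width,
                                  uint32_t height, uint32_t frames_per_second) {
    return cpu_hz / (static_cast<double>(width) * height * frames_per_second);
}
```

For 180MHz and 320x240 at 60 frames per second this gives about 39.06 cycles per pixel.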

A full screen rectangle would not use as much as a full frame buffer, but rather no more than a few hundred bytes: https://en.wikipedia.org/wiki/Run-length_encoding.
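
To make the run-length idea concrete, here is a minimal sketch of one possible scheme: each run is stored as a (count, value) byte pair with a count of 1 to 255. The function names and this exact encoding are assumptions for illustration; nothing like this exists in the uVGA library as far as this thread shows.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode pixels as (count, value) pairs; a run holds at most 255 pixels.
std::vector<uint8_t> rle_encode(const std::vector<uint8_t>& pixels) {
    std::vector<uint8_t> out;
    std::size_t i = 0;
    while (i < pixels.size()) {
        const uint8_t value = pixels[i];
        std::size_t run = 1;
        while (i + run < pixels.size() && pixels[i + run] == value && run < 255) {
            ++run;
        }
        out.push_back(static_cast<uint8_t>(run));
        out.push_back(value);
        i += run;
    }
    return out;
}

// Expand (count, value) pairs back into pixels.
std::vector<uint8_t> rle_decode(const std::vector<uint8_t>& encoded) {
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i + 1 < encoded.size(); i += 2) {
        out.insert(out.end(), static_cast<std::size_t>(encoded[i]), encoded[i + 1]);
    }
    return out;
}
```

With this scheme, a 320 pixel line of a single color becomes two runs (255 + 65), so 4 bytes. If every line is encoded separately, a full 320x240 rectangle takes 960 bytes instead of 76,800.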
 