Next steps on NTSC output from Teensy 4.0

Status
Not open for further replies.

hypothete

New member
I'm looking for advice on using a Teensy 4.0 to generate NTSC composite video from a buffer. By attaching an R2R resistor ladder to pins 14-17, I'm currently able to generate an image 52px wide with 15 shades of gray. You can see my source code and some pictures here: https://github.com/hypothete/teensytv

I'm able to get 1us timing by reading elapsedMicros in a while loop, but that limits my horizontal resolution to 52px because each row drawn in NTSC holds 52us of image data (63.5us overall). My goal is to get 320px horizontal resolution. For that, I need to accomplish a few things:

  • Pack 320 values plus start/end pulses into a signal that lasts 63.5us for each horizontal line
  • Output that signal to 4 pins as efficiently as possible, probably not using digitalWriteFast()
  • Long term, I'd like to pass updates at >3.58MHz so I can color-encode pixels, but improving the resolution takes precedence
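A quick sanity check on the timing budget implied by the list above, as a minimal sketch. It uses only the 52us and 63.5us figures from this post; the function names are hypothetical and the numbers ignore any rounding forced by real hardware clocks.

```cpp
#include <cassert>

// NTSC line timing figures quoted in the post above.
const double kLineUs   = 63.5;  // full horizontal line, including sync and blanking
const double kActiveUs = 52.0;  // visible part of the line

// Time available for one pixel, in nanoseconds.
double pixelPeriodNs(int pixelsPerLine) {
    assert(pixelsPerLine > 0);
    return kActiveUs * 1000.0 / pixelsPerLine;
}

// Pixel clock in MHz that corresponds to that period.
double pixelClockMHz(int pixelsPerLine) {
    return 1000.0 / pixelPeriodNs(pixelsPerLine);
}

// Output samples needed to cover the whole line (sync and blanking included)
// when every sample lasts one pixel period. Rounded to the nearest sample.
int samplesPerLine(int pixelsPerLine) {
    return static_cast<int>(kLineUs * 1000.0 / pixelPeriodNs(pixelsPerLine) + 0.5);
}
```

With these assumptions, 320 pixels works out to 162.5 ns per pixel (a pixel clock of about 6.15 MHz) and roughly 391 samples for a full line, so the output has to be updated far faster than a 1us polling loop allows.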

Here's what I think I could use to accomplish these tasks, but I could use a gut check on my assumptions. I'm fairly new to MCU programming, so I'm still shaky on some of these concepts.

  • A DMA channel seems like the right tool to transfer data from my buffer to the various pins
  • It looks like most people use PDB to trigger the DMA transfer for sub-microsecond timings
  • The Teensy 4.0 does not have GPIO "ports" in the same way that older models do, so I need to use something else for my DMA destination if I want to write to multiple pins at once. It seems like this is one use of FlexIO, but the reference manual is kind of vague about it.
  • The Teensy 4.0 also does not have a DAC, so I'll need to stick with the resistor ladder for now. I do own the Rev D Audio Shield, but I am not sure if it would be a better substitute.

Am I on the right track here? Any suggestions for next steps?

Thanks,
Duncan
 
For the T_3.6 there was VGA driver code (forum thread and GitHub) that drove the VGA color pins through a few resistors. Not sure whether it has any reference value here, or whether it could be updated for the T_4.

There are also thread notes on 'port-like' group writes/reads of pins on the T_4.

The VGA thread: Teensy-3-6-VGA-driver
 
Hi @defragster, thanks for the reply and the link. The VGA library looks like good reading on DMA usage. I wasn't able to spot what you were talking about regarding port-like writes for the T4 in that thread. Are there search terms or other discussions I could check out?
 
That thread doesn't cover T4 port group writes, and the VGA code has not been updated for T4 AFAIK.

Other threads have notes on port-like GPIO on T4, but you would have to search for them.
 
Yes, this might be a good use for FlexIO. Unlike the GPIO ports, which only accept full 32-bit writes, FlexIO lets you output parallel data to 4 pins (or 8, 16, or 32) and have it clocked out at a precise frequency. FlexIO2 pins 0-3 are available on T4.

The way FlexIO works is that you transfer your data to a series of buffered shift registers (up to 128 bits wide) and then it automatically shifts out to your output pins 4 bits at a time. You can load the buffers with DMA or just with normal register accesses.
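To make the "4 bits at a time" idea concrete, here is a minimal sketch of packing 4-bit pixel values into the 32-bit words a shifter buffer would be loaded with. The function name is hypothetical, and the nibble order (first pixel in the lowest nibble) is an assumption that mirrors the dataBuffer initialisation in the example code later in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack 4-bit pixel values into 32-bit words, eight pixels per word.
// Assumption: pixel 0 goes in bits 0-3, pixel 1 in bits 4-7, and so on,
// so the lowest nibble is the first one shifted out.
std::vector<uint32_t> packPixels4(const std::vector<uint8_t>& pixels) {
    // Round up so a partial final word is still allocated (padded with zeros).
    std::vector<uint32_t> words((pixels.size() + 7) / 8, 0u);
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        const uint32_t nibble = pixels[i] & 0x0Fu;   // keep only 4 bits
        words[i / 8] |= nibble << (4 * (i % 8));     // place pixel i in nibble i % 8
    }
    return words;
}
```

Under this layout a 320-pixel line occupies 40 words (160 bytes), which is the amount DMA would have to move per line of active video.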

I'm working on some code using FlexIO to drive a LED panel which I hope to share soon. It uses DMA to transfer to the shift buffers and outputs 16bit parallel at 24 MHz (42 ns data pulse width). With 4bit output you could probably go faster than that.
 
@easone Thanks for the notes on FlexIO - I'm reading up on it more & have moved the output pins to FlexIO port 1 /GPIO9 in preparation. Is there anything else you'd recommend as a reference for a barebones setup before porting my project over? So far I'm just going through forum posts and trying to understand the reference manual section. Thanks!
 
Here's some example code... this shifts out data in parallel to pins 10-13 every 25 ns. It requires KurtE's FlexIO_t4 library (https://github.com/KurtE/FlexIO_t4). (Note that there's a typo in that library that needs to be fixed before this will compile: In FlexIO_t4.cpp, the initialization for "flex2_hardware" needs an ampersand before "IRQHandler_FlexIO2" to match the one for "flex1_hardware".)

Code:
// Transfer data from dataBuffer to FlexIO
// FlexIO hardware shifts out 4 bits at a time 

#include <Arduino.h>
#include "FlexIO_t4.h"
#include "DMAChannel.h"

FlexIOHandler *pFlex;
IMXRT_FLEXIO_t *p;
const FlexIOHandler::FLEXIO_Hardware_t *hw;
DMAChannel myDMA;
volatile uint32_t DMAMEM dataBuffer[2] __attribute__((aligned(32)));

void setup() {
  Serial.begin(115200);
  delay(1000);
  Serial.println("Start setup");
  delay(1000);

  /* Get a FlexIO channel */
  pFlex = FlexIOHandler::flexIOHandler_list[1]; // use FlexIO2

  /* Pointer to the port structure in the FlexIO channel */
  p = &pFlex->port();
  
  /* Pointer to the hardware structure in the FlexIO channel */
  hw = &pFlex->hardware();

  /* Basic pin setup */
  pinMode(10, OUTPUT); // FlexIO2:0
  pinMode(12, OUTPUT); // FlexIO2:1
  pinMode(11, OUTPUT); // FlexIO2:2
  pinMode(13, OUTPUT); // FlexIO2:3
  
  /* High speed and drive strength configuration */
  *(portControlRegister(10)) = 0xFF; 
  *(portControlRegister(12)) = 0xFF;
  *(portControlRegister(11)) = 0xFF;
  *(portControlRegister(13)) = 0xFF;

  /* Set clock */
  pFlex->setClockSettings(3, 0, 0); // 480 MHz

  /* Set up pin mux */
  pFlex->setIOPinToFlexMode(10);
  pFlex->setIOPinToFlexMode(12);
  pFlex->setIOPinToFlexMode(11);
  pFlex->setIOPinToFlexMode(13);

  /* Enable the clock */
  hw->clock_gate_register |= hw->clock_gate_mask;
  
  /* Enable the FlexIO with fast access */
  p->CTRL = FLEXIO_CTRL_FLEXEN | FLEXIO_CTRL_FASTACC;

  /* Shifter 0 registers */ 
  #define S0_PWIDTH 3 // 4-bit parallel shift width
  #define S0_INSRC 1 // Input source from Shifter 1
  #define S0_SSTOP 0 // Stop bit disabled
  #define S0_SSTART 0 // Start bit disabled, transmitter loads data on enable 
  #define S0_TIMSEL 0 // Use timer 0
  #define S0_TIMPOL 1 // Shift on negedge of clock 
  #define S0_PINCFG 3 // Shifter pin output
  #define S0_PINSEL 0 // Select pins FXIO_D0 through FXIO_D3
  #define S0_PINPOL 0 // Shifter pin active high polarity
  #define S0_SMOD 2 // Shifter transmit mode
  p->SHIFTCFG[0] = (S0_PWIDTH<<16) | (S0_INSRC<<8) | (S0_SSTOP<<4) | (S0_SSTART<<0);
  p->SHIFTCTL[0] = (S0_TIMSEL<<24) | (S0_TIMPOL<<23) | (S0_PINCFG<<16) | (S0_PINSEL<<8) | (S0_PINPOL<<7) | (S0_SMOD<<0);

  /* Timer 0 registers */ 
  #define T0_TIMOUT 1 // Timer output is logic zero when enabled and is not affected by the Timer reset
  #define T0_TIMDEC 0 // Timer decrements on FlexIO clock, shift clock equals timer output
  #define T0_TIMRST 0 // Timer never reset
  #define T0_TIMDIS 2 // Timer disabled on Timer compare
  #define T0_TIMENA 2 // Timer enabled on Trigger high
  #define T0_TSTOP 0 // Stop bit disabled
  #define T0_TSTART 0 // Start bit disabled
  #define T0_TRGSEL (4*0+1) // Trigger select Shifter 0 status flag
  #define T0_TRGPOL 1 // Trigger active low
  #define T0_TRGSRC 1 // Internal trigger selected
  #define T0_PINCFG 0 // Timer pin output disabled
  #define T0_PINSEL 0 // Select pin FXIO_D0
  #define T0_PINPOL 0 // Timer pin polarity active high
  #define T0_TIMOD 1 // Dual 8-bit counters baud mode
  p->TIMCFG[0] = (T0_TIMOUT<<24) | (T0_TIMDEC<<20) | (T0_TIMRST<<16) | (T0_TIMDIS<<12) | (T0_TIMENA<<8) | (T0_TSTOP<<4) | (T0_TSTART<<1);
  p->TIMCTL[0] = (T0_TRGSEL<<24) | (T0_TRGPOL<<23) | (T0_TRGSRC<<22) | (T0_PINCFG<<16) | (T0_PINSEL<<8) | (T0_PINPOL<<7) | (T0_TIMOD<<0);

  #define CLOCK_DIVIDER 6 // Timer output toggles every 6 FlexIO clocks (12.5 ns at 480 MHz), so the shift clock is 40 MHz: one 4-bit shift every 25 ns
  #define SHIFTS_PER_TRANSFER 8 // Shift out 8 times with every transfer = one 32-bit word = contents of Shifter 0
  p->TIMCMP[0] = ((SHIFTS_PER_TRANSFER*2-1)<<8) | ((CLOCK_DIVIDER-1)<<0);

  Serial.println("FlexIO setup complete");

  Serial.println("Start DMA setup");

  /* Enable DMA trigger on Shifter 0 */
  p->SHIFTSDEN |= (1<<0); // DMA request is generated when data is transferred from buffer0 to shifter0

  /* Disable DMA channel so it doesn't start transferring yet */
  myDMA.disable();

  /* Configure DMA to transfer 32-bit words from dataBuffer to Shifter 0 buffer */
  unsigned int transferBytes = sizeof(dataBuffer);
  myDMA.sourceBuffer(&(dataBuffer[0]), transferBytes); // transfer entire dataBuffer
  myDMA.destination(p->SHIFTBUF[0]);

  /* Set DMA channel to automatically disable on completion (we have to manually enable it every loop) */
  myDMA.disableOnCompletion();

  /* Set up DMA channel to use Shifter 0 trigger */
  myDMA.triggerAtHardwareEvent(hw->shifters_dma_channel[0]);

  Serial.println("DMA setup complete");

  /* initialize data buffer */
  /* unsigned literals avoid signed overflow when shifting into bit 31 */
  dataBuffer[0] = (0u<<0) | (1u<<4) | (2u<<8) | (3u<<12) | (4u<<16) | (5u<<20) | (6u<<24) | (7u<<28);
  dataBuffer[1] = (8u<<0) | (9u<<4) | (10u<<8) | (11u<<12) | (12u<<16) | (13u<<20) | (14u<<24) | (15u<<28);

  /* flush DMAMEM cache */
  arm_dcache_flush((void*)dataBuffer, sizeof(dataBuffer));
}

void loop() {
  myDMA.enable();
  // DMA transfer and FlexIO shifts happen in background, no CPU involvement
  
  delay(4000);
}

Here's an oscilloscope screenshot with channels 1,2,3,4 hooked to pins 10,12,11,13:
oscope_output.jpg
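
The timer arithmetic in the example above can be reused to estimate settings for other shift rates. The sketch below is untested on hardware and makes two assumptions taken from the code comments: the FlexIO clock is 480 MHz, and the time between shifts is twice the divider in FlexIO clocks (which is what gives 25 ns for a divider of 6). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Build a TIMCMP value for dual 8-bit counters baud mode using the same
// formula as the example: upper byte = shifts * 2 - 1, lower byte = divider - 1.
uint32_t flexioTimcmp(uint32_t clockDivider, uint32_t shiftsPerTransfer) {
    assert(clockDivider >= 1 && clockDivider <= 256);            // must fit in 8 bits
    assert(shiftsPerTransfer >= 1 && shiftsPerTransfer <= 128);  // must fit in 8 bits
    return ((shiftsPerTransfer * 2 - 1) << 8) | (clockDivider - 1);
}

// Time between shifts in nanoseconds. The timer output toggles every
// clockDivider FlexIO clocks, so one full shift period is 2 * clockDivider clocks.
double shiftPeriodNs(double flexioClockMHz, uint32_t clockDivider) {
    assert(flexioClockMHz > 0.0);
    return 2.0 * clockDivider * 1000.0 / flexioClockMHz;
}
```

For instance, under these assumptions a divider of 39 gives 162.5 ns per shift, which is 320 pixels in 52us. Sync pulses and blanking would still have to be encoded in the data or handled separately.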
 
I'm currently working on bit-banging a VGA display using a T4, and defragster has already helped me out with the timing to get nanosecond accuracy. digitalWriteFast() seems to operate within a few cycles when setting the state of a single pin, but setting multiple pins can take many tens of cycles, so I'm going to need to use port writes.
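
One way to picture a port-style group write is as a pair of masks: one for the bits to drive high and one for the bits to drive low. The sketch below only computes those masks and is hypothetical throughout (the struct, the function, and the bit positions in the usage note are made up for illustration). On a Teensy 4.x the masks would then be written to the port's set and clear registers, whose names and pin-to-bit mapping should be checked against the core headers and pinout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: which register bits to set and which to clear.
struct PortMasks {
    uint32_t setMask;    // bits to drive high
    uint32_t clearMask;  // bits to drive low
};

// Spread the low `count` bits of `value` across arbitrary bit positions of
// one 32-bit GPIO register. bitPos[i] is the register bit carrying bit i.
PortMasks groupWriteMasks(uint32_t value, const uint8_t* bitPos, int count) {
    assert(count >= 0 && count <= 32);
    PortMasks m = {0u, 0u};
    for (int i = 0; i < count; ++i) {
        assert(bitPos[i] < 32);
        const uint32_t bit = 1u << bitPos[i];
        if ((value >> i) & 1u) {
            m.setMask |= bit;
        } else {
            m.clearMask |= bit;
        }
    }
    return m;
}
```

For example, with made-up bit positions {18, 19, 23, 22} and the value 0x5, the set mask covers bits 18 and 23 and the clear mask covers bits 19 and 22. Two register writes then update all four pins, instead of one digitalWriteFast() call per pin.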
 

Many thanks for this. After much reading, it seems I was rather optimistic about driving a VGA display with 8 bits per pixel directly from the GPIO. Perhaps the larger form factor T4 might offer more options. Maybe 4 bits per pixel would be doable, but that's not really where I want to go.

I have had much success using PWM with a low-pass filter to generate audio output. Is there any resource on the PWM hardware for the Teensy 4.0?
 
Looks like the T_3.6 VGA used 8 bits? Never looked into the code. I suppose the problem may be having the 8 bits on a port?

That first linked read example pulls from multiple ports and combines IIRC - doing the inverse with the write might put the output bits in parallel on the T4.0? The 4.1 pin thread does suggest hope toward what a T_3.6 could do with more pins.

@mjs513 bought a VGA adapter - it wasn't cheap - not sure what came of that?
 
I phrased my reply poorly: I meant 24 bits per pixel, 8 bits per colour channel. The T3.6 VGA driver uses a 2/3/3 bits per colour channel scheme, which would be far too limited for my project.

I'll probably get a T4.1 anyway (I have a T3.6 here I haven't used yet)... it might be a better option.
 