T4 Pixel Pipeline Library

vjmuzik

I started working on adding support for the Pixel Pipeline (PXP) built into the T4.
So far I have the overlay and output stages working correctly; rotate/flip in the output stage also works.
I have tested color key and alpha for the overlay stage so I can confirm that it's working.

As for the input stage, I can't figure out exactly why it's not working as it should.
It draws in the correct place with the correct background color, but it's only drawing black where the input buffer should be.
I haven't tried any color space conversions yet, since the input stage isn't fully working.

You will need to update your Teensy core from here: https://github.com/PaulStoffregen/cores
The PXP library can be found here: https://github.com/vjmuzik/T4_PXP
An example for a ST7789 display is included in this library here: https://github.com/vjmuzik/Adafruit_GFX_Buffer

If you use the example above you can see a black square inside of a blue square.
The black square should be a copy of the green square that is next to it.

I'd like to get a second set of eyes to look at this to find where the issue occurs.
 
I just started putting together some pixel pipeline code today, aimed at comparing its rotate function with a software version in terms of speed and complexity. I haven't even started on the overlay stuff, so you're way ahead of me. I'll look over your code and see what I can do to help. I started with the Kinetis SDK examples, but that code goes so far toward hardware abstraction that it makes my head spin to read it. I've been working close to the metal for so many years that multiple layers of #defines and other abstractions, which force me to drill through layers of header files to find the bits corresponding to the register bits defined in the reference manual, get discouraging. (WOW! Was that ever a run-on sentence!)
 
That is very cool! I did not know about the PXP pipeline. How efficient is the flip/rotation of a buffer?
 
I haven’t done much benchmarking, so I can’t say for certain. I can say that adding a rotation or flip did not increase the number of cycles the PXP took to process. So if you are already using the PXP, it’s really efficient. I’m going to be using it for its overlay alpha blending and for drawing bitmaps to a frame buffer. Of course, this all depends on getting the input stage working; otherwise the overlay is kind of useless.
 
I did a simple test to measure the time needed to rotate a QVGA image by 90 degrees using either a programmed rotation or a PXP rotation. Here are the results:
Code:
Rotation times as a function of memory source and destination

Src    --> Dest      PGM (uSec)   PXP (uSec)
--------------------------------------------
DTCM   --> DTCM             387          578
DTCM   --> DMAMEM           721          562
DTCM   --> EXTMEM          5997         6487
DMAMEM --> DTCM             547          561
DMAMEM --> DMAMEM           701          583
DMAMEM --> EXTMEM          8681         6041
EXTMEM --> DTCM            5567        12745
EXTMEM --> DMAMEM          5589        12751
EXTMEM --> EXTMEM         28630        19204
The results seem to be mixed: sometimes the programmatic rotate is faster, sometimes the PXP is faster. The major difference is that the CPU is free for other tasks while the PXP does the rotation in the background. Some other things that are interesting:

* The PXP is slower than a programmed rotation for EXTMEM to DMAMEM and EXTMEM to DTCM. Apparently the PXP has issues with cache misses when EXTMEM is the source.
* When either source or destination is in EXTMEM, the rotate times are roughly 8 to 20 times higher for both rotation methods, a reflection of the overall slower access to EXTMEM.

The program has a lot of issues that I need to address before posting it. Among the most annoying is that my optimized algorithm for minimizing cache misses when EXTMEM is the destination results in an inverted display image. I used vjmuzik's library as an example, but simplified my code to use direct register writes instead of his library functions. My main issue with the library is the degree to which the PXP_Next code is entwined with the other functions. I haven't yet wrapped my head around the advantages and requirements of the PXP_Next functionality, so I opted to go with a simpler implementation.
 
Based on the manual, the only advantage of using PXP_NEXT is that you can queue up another command while one is already running; as soon as the previous one finishes, the PXP starts the next one. If nothing is running, it starts immediately. It works by copying the data at the address you write to it into all of the PXP registers, except a couple that the manual notes are not copied over. Aside from the queue functionality, it's theoretically the same as writing the registers directly, just without being able to use the SET, TOGGLE, and CLEAR registers.
 
There seems to be a lot of mental overhead in updating the next_PXP structure for what I see as a slight gain. I see it as necessary only if you are going to do two different things in succession that require major changes to the PXP setup. Then you could set up two (or more) next_PXP structures and switch between the image processing functions by alternating the structure pointers. If you're just doing one thing, it hardly seems worth the trouble. One thing that might make it worthwhile is to have a number of next_PXP structures saved in flash or on an SD card. You could then read them into memory and have a very short initialization routine that would only have to set up the clock and point to the saved structure.

It also seems that there may be a simpler way to set up a next_PXP structure:
Code:
//    WARNING!!!!   Untested code!!!!
// function to set next_PXP structure from PXP registers after they are set up
void Set_next_PXP(struct NEXT_PXP_t *nextptr) {
  volatile uint32_t *srcptr = &PXP_CTRL;   // first PXP register
  uint32_t *dstptr = (uint32_t *)nextptr;  // fill the structure word by word
  uint16_t i;

  for (i = 0; i < 48; i++) {
    *dstptr++ = *srcptr;
    srcptr += 4;  // PXP saved registers are 16 bytes apart--I hope!
  }
}
Once you have a tested PXP setup, you could call the function to fill the next_PXP structure. You could then save it to memory or call a function that would print out the structure contents in a form that could easily be added to source code.
 
For my use case I’m going to be making multiple calls in succession, so I preemptively added support for it. Most of the registers are 16 bytes apart except for the last 3. To me it also doesn’t make sense to manually copy them over in a for loop when the PXP hardware will copy them by itself when you just write to the NEXT register. Though I do see the appeal of having NEXT_PXP structures already precomputed as opposed to setting them up on the fly. But again, for my use case everything is going to be done dynamically, so my thought process was more focused on that when I wrote it. It’s hard to explain the idea I have in my head of how I want it to work. It’s also necessary to do cache flushes for your buffers before running a PXP operation if they are anywhere besides RAM1, since the PXP accesses memory like DMA and does not see the CPU’s data cache.
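A minimal host-side sketch of the "does this buffer need cache maintenance?" decision follows. It assumes the Teensy 4.x memory map, where RAM1 (ITCM/DTCM) lies below 0x20200000 and is not cached while DMAMEM (starting at 0x20200000) and EXTMEM are cached, and 32-byte Cortex-M7 cache lines. The helper names are hypothetical; on the Teensy, a true result would be followed by `arm_dcache_flush()` or `arm_dcache_delete()` on the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Assumed Teensy 4.x layout: RAM1 below 0x20200000 is uncached;
// DMAMEM from 0x20200000 and EXTMEM at 0x70000000 are cached.
#define DMAMEM_START 0x20200000u
#define CACHE_LINE   32u   // Cortex-M7 data cache line size in bytes

// True when a buffer at 'addr' is in cached memory, so the cache must be
// flushed before the PXP reads it, or deleted so the CPU does not read
// stale lines after the PXP writes it.
static bool needs_cache_maintenance(uint32_t addr) {
  return addr >= DMAMEM_START;
}

// Widen [addr, addr + len) to whole cache lines, the unit the cache works in.
static void cache_line_span(uint32_t addr, uint32_t len,
                            uint32_t *start, uint32_t *span) {
  assert(len > 0);
  uint32_t first = addr & ~(CACHE_LINE - 1u);
  uint32_t end = (addr + len + CACHE_LINE - 1u) & ~(CACHE_LINE - 1u);
  *start = first;
  *span = end - first;
}
```

The span helper shows why a buffer that does not start and end on 32-byte boundaries shares cache lines with neighboring variables; aligning PXP buffers avoids that.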
 
I also have a couple of projects that will benefit from dynamic changes in the PXP operation with the NEXT_PXP capabilities. For example:

1. Set up the OV7670 camera to produce a VGA image in the YUV422 format.
2. Option 1: Pass the YUV422 buffered image to a function to convert it to JPEG for compression.
Option 2: Pass the YUV422 buffered image to the PXP and convert it to RGB888 for storage in a PC-compatible bitmap file.
3. After step 2, set up the PXP to scale the YUV or VGA RGB888 image to a QVGA RGB565 image for a TFT display.

I think that step 2 should help eliminate some of the banding I see in gradients in RGB565 images on the PC, as it should give better color resolution (although with lower spatial resolution in the colors, since YUV422 shares U and V between pairs of pixels).

I think this type of problem can be approached by having two different PXP setups and saving each to a separate NEXT_PXP structure. You then simply store the appropriate structure address in the NEXT register, and there is no need to update any of the individual PXP registers.

Yesterday I spent some time thinking about ways to save the NEXT_PXP structure in nonvolatile memory (EEPROM) or on SD. (Or print it out as an initialized array of uint32_t that you can copy and paste into other source code). There seems to be some utility in that, but there is the sticking point that any changes in the program are likely to change the addresses of buffers defined as standard variables. I see two ways to get around that issue:

1. After you load a PXP setup from nonvolatile memory, you call the functions necessary to set the pointers to the buffers you have defined in your program.

2. If you are working with VGA images, you are constrained to using EXTMEM, so you could set fixed memory addresses in EXTMEM by declaring buffer pointers as constants, and remember to avoid declaring other variables in EXTMEM that might conflict. You could set four 1MB buffers at the 4, 5, 6, and 7MB boundaries. Each could hold a VGA RGB888 image (about 921KB) or one of the smaller formats. If you are manipulating images in one of the smaller formats, you could do the same thing in a portion of DMAMEM for speed, but you will have to make sure you don't conflict with things like USB buffers or any heap variables.
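As a sketch of option 2, assuming EXTMEM is mapped at 0x70000000 and at least 8MB of PSRAM is fitted, the fixed 1MB slots and the VGA RGB888 size check work out as below. The names are illustrative, not from any library.

```c
#include <assert.h>
#include <stdint.h>

// Assumptions: EXTMEM (PSRAM) is mapped at 0x70000000 and at least 8MB is
// fitted, so the 4MB..7MB boundaries are backed by real memory.
#define EXTMEM_BASE      0x70000000u
#define SLOT_BYTES       0x00100000u          // 1MB per fixed buffer
#define VGA_RGB888_BYTES (640u * 480u * 3u)   // 921,600 bytes

// Address of the fixed buffer that starts at the given megabyte boundary.
static uint32_t extmem_slot(unsigned mb) {
  assert(mb >= 4 && mb <= 7);   // only the upper four megabytes are reserved
  return EXTMEM_BASE + mb * SLOT_BYTES;
}
```

On the Teensy, `(uint8_t *)extmem_slot(4)` would then point at the first slot, and any EXTMEM variables the linker places from 0x70000000 upward would need to stay below 0x70400000.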

PXP setup then simplifies to:
* Call function to Setup Clock and reset PXP
* Retrieve a saved PXP_NEXT Structure and store pointer to it in the NEXT register
* (Optional: update buffer addresses for variables declared in program)

The advantage of this process is that a new user can write programs to use the stored PXP setups without having to learn the PXP internals. If the canned setups are not exactly what they want, they can then modify the PXP as needed and as their familiarity with the hardware improves.

I'm hoping that something like conversion from RGB565 to RGB888 would need only about 60 lines of source, including the predefined PXP_NEXT structure and about 30 lines of code and function calls. I guess I'll see if I can make that happen with a rotate function before I move on to color space conversions. (Even though rotation is something that can be done just as well with a 10-line function, unless you have a way to use the foreground time while the PXP does the work.)
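For comparison, a programmed rotation of the kind timed in the PGM column can be about that short. The sketch below is a plain nested loop, not the poster's benchmark code (which isn't shown in the thread), and it rotates clockwise, which may not match the direction that PXP_CTRL_ROTATE(3) produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Rotate a w x h RGB565 image 90 degrees clockwise into dst, which becomes
// h pixels wide and w pixels tall. Plain loops, no cache-aware tiling.
static void rotate90_cw(const uint16_t *src, uint16_t *dst,
                        size_t w, size_t h) {
  assert(src != dst);   // in-place rotation is not supported
  for (size_t y = 0; y < h; y++) {
    for (size_t x = 0; x < w; x++) {
      // source column x becomes destination row x; source row y lands
      // in destination column (h - 1 - y)
      dst[x * h + (h - 1 - y)] = src[y * w + x];
    }
  }
}
```

For a QVGA frame this would be called as `rotate90_cw(src, dst, 320, 240)`, producing a 240-pixel-wide output like the PXP setup in the first demo program.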



I think I've got enough of a handle on the PXP vs Program rotation stuff that I will next move on to trying some of the color-space conversions.
 
Here are two demo programs that set up the pixel pipeline to do rotations.

The first program does the actual PXP setup, with OV7670 Camera output rotated and displayed on an ILI9341 TFT module. The PXP setup is also converted to an array of 32 uint32_t entries that can be used with the PXP_NEXT functionality to do a setup of the PXP in other programs by cutting and pasting the array initialization code.

Code:
/****************************************************
  Collect camera data into two different sets of
  buffers and set up PXP rotations then print PXP_Next
  data for cut and paste into another program
  M Borgerson 12/5/2020
 ******************************************************** */

#include <OV7670.h>
#include <ILI9341_t3n.h>

// need to include PXP definitions if not using the
// latest imxrt.h from GitHub (as of 12/4/2020).
#ifndef PXP_CTRL_SET
#include "PXP_Defs.h"
#endif

//Specify the pins used for Non-SPI functions of display
#define TFT_CS   10  // AD_B0_02
#define TFT_DC   9  // AD_B0_03
#define TFT_RST  8

ILI9341_t3n tft = ILI9341_t3n(TFT_CS, TFT_DC, TFT_RST);

#define OUTBUFIDX 3
#define PSBUFIDX 12
const uint16_t  bwidth = 320;
const uint16_t  bheight = 240;


// PXP_Next struct is 32 register settings, but we just save it as an array
uint32_t PXP_Next_0[32];


// define memory buffers in different locations for speed testing
uint16_t srcdma[320l * 240l]__attribute__ ((aligned (64))) DMAMEM;
uint16_t dstdma[320l * 240l]__attribute__ ((aligned (64))) DMAMEM;

uint16_t srcext[320l * 240l]__attribute__ ((aligned (64))) EXTMEM;
uint16_t dstext[320l * 240l]__attribute__ ((aligned (64))) EXTMEM;

// CSI frame buffer 2 isn't used by PXP rotations
uint16_t fb2[320l * 240l];

uint16_t *srcptr = (uint16_t *)&srcdma;
uint16_t *dstptr = (uint16_t *)&dstdma;

const char compileTime [] = " Compiled on " __DATE__ " " __TIME__;

const int pinCamReset = 14;

void setup() {
  Serial.begin(9600);
  delay(200);
  Wire.begin();

  pinMode(pinCamReset, OUTPUT);


  digitalWriteFast(pinCamReset, LOW);
  delay(10);
  digitalWriteFast(pinCamReset, HIGH);  // subsequent resets via SCB


  if (OV7670.begin(QVGA, (uint8_t *)srcptr, (uint8_t *)&fb2)) {
    Serial.println("OV7670 camera initialized.");

  } else {
    Serial.println("Error initializing OV7670");
  }
  // 12 MHz gives 15FPS.  16MHz will do 20FPS, but leaves little time
  // for anything but video display.
  OV7670.SetCamClock(12);

  // Start ILI9341
  tft.begin();
  tft.setRotation(0);  // testing external rotation

  CMSI();

  Serial.println("Initializing PXP");
  delay(50);
  PXP_Init(srcptr, dstptr);
  delay(10);
  Serial.println("Ready");
}

void loop() {
  // Only 3 choices:  's' System Info  'f' capture single frame 't' run rotate tests
  char ch;
  if (Serial.available()) {
    ch = Serial.read();
    if (ch == 's') CMSI();
    if (ch == 'f') CMGF();
    if (ch == 't') TestSpeeds();
  }

}

elapsedMicros rutime;

void CMSI(void) {
  Serial.printf("\n\nOV7670 Camera and ILI9341  QVGA Test 3 %s\n", compileTime);
  OV7670.ShowCamConfig();
}

void ShowPXP(void) {
  Serial.printf("PXP_OUT_BUF:%08X \n", PXP_OUT_BUF);
  Serial.printf("PXP_PS_BUF:%08X \n", PXP_PS_BUF);
}

// save the PXP registers to the PXP_Next array
void SavePXPNext(uint32_t pxnptr[]) {
  uint16_t i;
  volatile uint32_t *pxptr = &PXP_CTRL;  // set first address
  uint32_t *nxptr = &pxnptr[0];
  for (i = 0; i < 29; i++) { // first 29 are at 16-byte intervals
    *nxptr++ = *pxptr;
    pxptr += 4; // skips ahead 16 bytes at input
  }
  // the last three entries are oddly spaced
  *nxptr++ = PXP_POWER;
  *nxptr++ = PXP_NEXT;
  *nxptr = PXP_PORTER_DUFF_CTRL;
}

// print out a PXP_Next array in a format that can be pasted into
// source code to get the same PXP behavior
void PrintPXPNext(const char *arrayname, uint32_t pxnptr[]) {
  uint16_t i;
  Serial.printf("\nuint32_t %s[32] = {", arrayname);
  for (i = 0; i < 31; i++) {
    if ((i % 6) == 0) Serial.println();
    Serial.printf("0x%08X, ", *pxnptr++);
  }
  // last one can't have a comma and needs a bracket and semicolon
  Serial.printf("0x%08X };", *pxnptr);
  Serial.println();
}

bool PXP_Done(void) {
  return PXP_STAT & PXP_STAT_IRQ;
}

// updated with PXP definitions from new imxrt.h
// and using constants bwidth = 320, bheight = 240 for QVGA
void PXP_Init(uint16_t *inbuff, uint16_t *outbuff) {
  // turn on the PXP Clock
  CCM_CCGR2 |= CCM_CCGR2_PXP(CCM_CCGR_ON);

  PXP_CTRL_SET = PXP_CTRL_SFTRST; //Reset the PXP
  PXP_CTRL_CLR = PXP_CTRL_SFTRST | PXP_CTRL_CLKGATE; //Clear reset and gate
  delay(10);

  PXP_CTRL_SET = PXP_CTRL_ROTATE(3) | PXP_CTRL_BLOCK_SIZE;  // Set Rotation 3 block size 16x16

  PXP_CSC1_COEF0 |= PXP_COEF0_BYPASS; 

  PXP_OUT_CTRL_SET = 0x0E;  // specify RGB565 output
  PXP_OUT_BUF = (volatile void *)outbuff;
  PXP_OUT_PITCH = bheight * 2; // output is 240 pixels by 2 bytes after rotation
  PXP_OUT_LRC = 0;
  PXP_OUT_LRC = ((bwidth) << 16) | (bheight);

  PXP_OUT_AS_ULC = 0xFFFFFFFF;  // not using the alpha surface
  PXP_OUT_AS_LRC = 0;

  PXP_OUT_PS_ULC = 0;  // start processing at upper left 0,0
  PXP_OUT_PS_LRC = ((bwidth) << 16) | (bheight); // same as output

  PXP_PS_CTRL_SET = 0x0E;  // PS buffer format is RGB565
  PXP_PS_BUF = (volatile void *)inbuff;
  PXP_PS_UBUF = 0;  // not using YUV
  PXP_PS_VBUF = 0;  // not using YUV
  PXP_PS_PITCH = 640; // input is 320 pixels by 2 bytes wide before rotation
  PXP_PS_SCALE = 0x10001000; // 1:1 scaling (0x1.000)
  PXP_PS_CLRKEYLOW_0 = 0xFFFFFF;  // this disables color keying
  PXP_PS_CLRKEYHIGH_0 = 0x0;  //  this disables color keying

  PXP_CTRL_SET = PXP_CTRL_IRQ_ENABLE;
  // we don't actually use the interrupt but need to enable the bits
  // in the PXP_STAT register
}

void PXP_Rotate(void) {
  uint32_t etime;
  SavePXPNext((uint32_t*)&PXP_Next_0);
  PXP_STAT_CLR = PXP_STAT;  // clears all flags
  PXP_CTRL_SET =  PXP_CTRL_ENABLE;  // start the PXP
  rutime = 0;
  // wait until rotation finished
  while (!PXP_Done()) {};
  PXP_CTRL_CLR =  PXP_CTRL_ENABLE;  // stop the PXP
  etime = rutime;
  Serial.printf("PXP Rotation took %lu microseconds\n", etime);
}

// Capture, rotate and display a single frame from OV7670
void CMGF(void) {
  uint16_t readyframe;
  uint32_t imagesize;
  imagesize = OV7670.ImageSize();
  OV7670.begin(QVGA, (uint8_t *)PXP_PS_BUF, (uint8_t *)&fb2);
  OV7670.ClearFrameReady();
  do {
    readyframe = OV7670.FrameReady();
  } while (readyframe != 1 ); // wait until  frame 1 just completed

  if ((uint32_t)PXP_PS_BUF >= 0x20200000) { // makes camera DMA data visible (DMAMEM/EXTMEM only)
    arm_dcache_delete((void *)PXP_PS_BUF, imagesize);
  }
  if ((uint32_t)PXP_OUT_BUF >= 0x20200000) { // needed when doing DMA into memory
    arm_dcache_delete((void *)PXP_OUT_BUF, imagesize);
  }
  PXP_Rotate();
  Serial.printf("Output buffer at %p\n", PXP_OUT_BUF);
  if ((uint32_t)PXP_OUT_BUF >= 0x20200000) {
    arm_dcache_flush((void *)PXP_OUT_BUF, imagesize); // needed when doing DMA out of memory
  }
  tft.writeRect(0, 0, tft.width(), tft.height(), (uint16_t *)PXP_OUT_BUF);

}

void TestFrame(uint16_t *psrc, uint16_t *pdst) {
  // set up frame 1 to store in psrc, frame 2 to fb2
  uint32_t imagesize;
  imagesize = OV7670.ImageSize();
  PXP_PS_BUF = (void *)psrc;
  PXP_OUT_BUF = (void *)pdst; // set the PXP OUT buffer pointer
  if ((uint32_t)psrc >= 0x20200000) arm_dcache_flush((void*)psrc, imagesize);
  if ((uint32_t)pdst >= 0x20200000) arm_dcache_flush((void*)pdst, imagesize);

  CMGF();
}

// try various combinations of source and destination memory
// to compare the rotation speeds
void TestSpeeds(void) {
  Serial.println("\nDMAMEM to DMAMEM");
  TestFrame((uint16_t *)&srcdma, (uint16_t *)&dstdma);
  PrintPXPNext("Rot_DMA_DMA ", (uint32_t*)&PXP_Next_0);

  Serial.println("\nDMAMEM to EXTMEM");
  TestFrame((uint16_t *)&srcdma, (uint16_t *)&dstext);
  PrintPXPNext("Rot_DMA_EXT ", (uint32_t*)&PXP_Next_0);

  Serial.println("\nEXTMEM to DMAMEM");
  TestFrame((uint16_t *)&srcext, (uint16_t *)&dstdma);
  PrintPXPNext("Rot_EXT_DMA ", (uint32_t*)&PXP_Next_0);

  Serial.println("\nEXTMEM to EXTMEM");
  TestFrame((uint16_t *)&srcext, (uint16_t *)&dstext);
  PrintPXPNext("Rot_EXT_EXT ", (uint32_t*)&PXP_Next_0);
}

The second program restores those PXP setups without going through the setup code, and it switches between two setups without modifying any individual registers. The program does require that you specify the input and output buffers for the rotation, in case the new program places the arrays at different addresses than the original program.

Code:
/****************************************************
  Collect camera data and rotate using restored PXP
  setups copied from output of SavePXP program
  m. borgerson   12/5/2020
 ******************************************************** */
#include <OV7670.h>
#include <ILI9341_t3n.h>

// need to include PXP definitions if not using the
// latest imxrt.h from GitHub (as of 12/4/2020).
#ifndef PXP_CTRL_SET
#include "PXP_Defs.h"
#endif

//Specify the pins used  display for Non-SPI functions
#define TFT_CS   10  // AD_B0_02
#define TFT_DC   9  // AD_B0_03
#define TFT_RST  8

ILI9341_t3n tft = ILI9341_t3n(TFT_CS, TFT_DC, TFT_RST);

// we are using QVGA settings for camera
const uint16_t  bwidth = 320;
const uint16_t  bheight = 240;

// define memory buffers in different locations for speed testing
uint16_t srcdma[320l * 240l]__attribute__ ((aligned (64))) DMAMEM;
uint16_t dstdma[320l * 240l]__attribute__ ((aligned (64))) DMAMEM;

uint16_t srcext[320l * 240l]__attribute__ ((aligned (64))) EXTMEM;
uint16_t dstext[320l * 240l]__attribute__ ((aligned (64))) EXTMEM;

#define OUTBUFIDX 3
#define PSBUFIDX 12
// PXP_Next struct is 32 register settings, but we just save it as an array

uint32_t Rot_EXT_EXT [32] = {
  0x00800302, 0x00000000, 0x0000000E, 0x70000000, 0x00000000, 0x000001E0,
  0x014000F0, 0x00000000, 0x014000F0, 0x3FFF3FFF, 0x00000000, 0x0000000E,
  0x70025800, 0x00000000, 0x00000000, 0x00000280, 0x00000000, 0x10001000,
  0x00000000, 0x00FFFFFF, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00FFFFFF, 0x00000000, 0x44000000, 0x01230208, 0x079B076C, 0x00000000,
  0x00000000, 0x00000000
};

uint32_t Rot_DMA_DMA [32] = {
  0x00800302, 0x00000000, 0x0000000E, 0x20200000, 0x00000000, 0x000001E0,
  0x014000F0, 0x00000000, 0x014000F0, 0x3FFF3FFF, 0x00000000, 0x0000000E,
  0x20225800, 0x00000000, 0x00000000, 0x00000280, 0x00000000, 0x10001000,
  0x00000000, 0x00FFFFFF, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00FFFFFF, 0x00000000, 0x44000000, 0x01230208, 0x079B076C, 0x00000000,
  0x00000000, 0x00000000
};


// CSI frame buffer 2 isn't used by PXP rotations
uint16_t fb2[320l * 240l];

const char compileTime [] = " Compiled on " __DATE__ " " __TIME__;

const int pinCamReset = 14;

void setup() {
  uint8_t *srcptr = (uint8_t *)srcdma;  // same buffer SetNextBuffers() assigns to Rot_DMA_DMA
  Serial.begin(9600);
  delay(200);
  Wire.begin();

  pinMode(pinCamReset, OUTPUT);
  digitalWriteFast(pinCamReset, LOW);
  delay(10);
  digitalWriteFast(pinCamReset, HIGH);  // subsequent resets via SCB

  if (OV7670.begin(QVGA, (uint8_t *)srcptr, (uint8_t *)&fb2)) {
    Serial.println("OV7670 camera initialized.");

  } else {
    Serial.println("Error initializing OV7670");
  }
  // 12 MHz gives 15FPS.  16MHz will do 20FPS, but leaves little time
  // for anything but video display.
  OV7670.SetCamClock(12);
  // Start ILI9341
  tft.begin();
  tft.setRotation(0);  // testing external rotation

  CMSI();
  Serial.println("Adjusting PXP buffer addresses.");
  delay(10);
  SetNextBuffers(Rot_EXT_EXT, srcext, dstext);
  SetNextBuffers(Rot_DMA_DMA, srcdma, dstdma);

  Serial.println("Initializing PXP");
  delay(50);
  PXP_Start((uint32_t)&Rot_DMA_DMA);
  delay(10);
  Serial.println("Ready");
}

void loop() {
  // Only 3 choices:  's' System Info  'f' capture single frame 't' run rotate tests
  char ch;
  if (Serial.available()) {
    ch = Serial.read();
    if (ch == 's') CMSI();
    if (ch == 'f') CMGF();
    if (ch == 't') TestSpeeds();
  }
}

// adjust the source and destination buffers in the saved PXP settings to match
// the variables declared in this program
void SetNextBuffers(uint32_t pxpnxt[], uint16_t src[], uint16_t dst[]) {
  pxpnxt[PSBUFIDX] = (uint32_t)src;
  pxpnxt[OUTBUFIDX] = (uint32_t)dst;
}

void CMSI(void) {
  Serial.printf("\n\nOV7670 Camera and ILI9341  QVGA Test 3 %s\n", compileTime);
  OV7670.ShowCamConfig();
}

// Capture, rotate and display a single frame from OV7670
void CMGF(void) {
  uint16_t readyframe;
  uint32_t imagesize;
  imagesize = OV7670.ImageSize();
  OV7670.begin(QVGA, (uint8_t *)PXP_PS_BUF, (uint8_t *)&fb2);
  OV7670.ClearFrameReady();
  do {
    readyframe = OV7670.FrameReady();
  } while (readyframe != 1 ); // wait until  frame 1 just completed

  if ((uint32_t)PXP_PS_BUF >= 0x20200000) { // makes camera DMA data visible (DMAMEM/EXTMEM only)
    arm_dcache_delete((void *)PXP_PS_BUF, imagesize);
  }
  if ((uint32_t)PXP_OUT_BUF >= 0x20200000) { // needed when doing DMA into memory
    arm_dcache_delete((void *)PXP_OUT_BUF, imagesize);
  }
  PXP_Rotate();
  Serial.printf("Output buffer at %p\n", PXP_OUT_BUF);
  if ((uint32_t)PXP_OUT_BUF >= 0x20200000) {
    arm_dcache_flush((void *)PXP_OUT_BUF, imagesize); // needed when doing DMA out of memory
  }
  tft.writeRect(0, 0, tft.width(), tft.height(), (uint16_t *)PXP_OUT_BUF);

}

// Use two different PXP_NEXT settings to compare the rotation speeds
void TestSpeeds(void) {
  Serial.println("\nDMAMEM to DMAMEM");
  PXP_NEXT = (uint32_t)&Rot_DMA_DMA;
  PXP_CTRL_CLR = PXP_CTRL_ENABLE;  // stop automatic execution on PXP_NEXT write
  CMGF();// get, rotate and display a frame
  delay(1000);
  Serial.println("\nEXTMEM to EXTMEM");
  PXP_NEXT = (uint32_t)&Rot_EXT_EXT;
  PXP_CTRL_CLR = PXP_CTRL_ENABLE;  // stop automatic execution on PXP_NEXT write
  CMGF();  // get, rotate and display a frame
}

bool PXP_Done(void) {
  return PXP_STAT & PXP_STAT_IRQ;
}

// Restart PXP with settings from a PXP_Next array
void PXP_Start(uint32_t pxnptr) {
  // turn on clock to PXP
  CCM_CCGR2 |= CCM_CCGR2_PXP(CCM_CCGR_ON);

  PXP_CTRL_SET = PXP_CTRL_SFTRST; //Reset
  PXP_CTRL_CLR = PXP_CTRL_SFTRST | PXP_CTRL_CLKGATE; //Clear reset and gate
  delay(10);
  // storing pointer in PXP_NEXT causes PXP to restore settings
  PXP_NEXT = pxnptr;
}

elapsedMicros rutime;
void PXP_Rotate(void) {
  uint32_t etime;
  PXP_STAT_CLR = PXP_STAT;  // clears all flags
  PXP_CTRL_SET =  PXP_CTRL_ENABLE;  // start the PXP
  rutime = 0;  // reset the timing counter
  // wait until rotation finished
  while (!PXP_Done()) { };

  PXP_CTRL_CLR =  PXP_CTRL_ENABLE;  // stop the PXP
  etime = rutime;
  Serial.printf("PXP Rotation took %lu microseconds\n", etime);
}

The primary advantage of using the PXP_Next arrays for setup is that you can switch from one PXP setup to another with minimal code. These rotation examples are excessively simple, in that the PXP rotation could be avoided by just setting the TFT display rotation to 3 instead of 0. In my case, this example code was a first step toward simplifying setup and restore for more complex operations like scaling and color space conversions.
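The SetNextBuffers() idea generalizes to any of the first 29 registers: because they sit 16 bytes apart starting at PXP_CTRL, a register's index in the saved array is its offset divided by 16, which is where OUTBUFIDX = 3 and PSBUFIDX = 12 come from. A host-side sketch of that bookkeeping follows; the helper names are hypothetical and the OUT_BUF/PS_BUF offsets are derived from those two indices.

```c
#include <assert.h>
#include <stdint.h>

#define PXP_NEXT_WORDS 32

// Offsets from PXP_CTRL implied by OUTBUFIDX (3) and PSBUFIDX (12).
#define OFF_OUT_BUF 0x30u
#define OFF_PS_BUF  0xC0u

// Index of a register in a saved PXP_NEXT array; valid only for the first
// 29 registers, which are spaced 16 bytes apart.
static unsigned pxp_next_index(uint32_t reg_offset) {
  assert(reg_offset % 16u == 0 && reg_offset / 16u < 29u);
  return reg_offset / 16u;
}

// Point a saved setup at this program's buffers (the same job as
// SetNextBuffers in the second demo program).
static void patch_buffers(uint32_t next[PXP_NEXT_WORDS],
                          uint32_t src_addr, uint32_t dst_addr) {
  next[pxp_next_index(OFF_PS_BUF)] = src_addr;
  next[pxp_next_index(OFF_OUT_BUF)] = dst_addr;
}
```

The last three words (POWER, NEXT, PORTER_DUFF_CTRL) are not 16 bytes apart, so they are deliberately outside what pxp_next_index() accepts.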
 
Very cool signs of utility and progress. Using another hardware capability for 'background' processing keeps loop() free to loop()!
 
Would be good to have the needed pins for the rest of the video hardware... let's hope the MM (MicroMod) has them instead of 8 serial ports. ;)
 
I've made good progress on one of my goals: having the PXP convert incoming YUV422 image buffers from the OV7670 camera to QVGA RGB565 buffers for display on an ILI9341 board. I have a test program that accepts YUV422 QQVGA, QVGA, and VGA images, scales them as necessary to display at QVGA size, then converts from YUV to RGB565 for the ILI9341. I'll post some demo code when I clean up the debugging cruft and simplify the user interface.
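The PS_SCALE values used in the thread follow a simple pattern. The first demo program describes 0x10001000 as 1:1 scaling (0x1.000), and the VGA-to-QVGA register dump in this post shows 0x20002000, which suggests a source-to-destination ratio with 12 fractional bits, Y in the upper halfword and X in the lower. A sketch under that assumption follows; the function name is hypothetical, and the upscaling QQVGA case is extrapolated rather than read from a dump.

```c
#include <assert.h>
#include <stdint.h>

// Assumed PS_SCALE layout: source/destination ratio with 12 fractional
// bits (0x1000 = 1.0, 0x2000 = 2.0), Y ratio in the upper halfword and
// X ratio in the lower halfword.
static uint32_t pxp_ps_scale(uint32_t src_w, uint32_t src_h,
                             uint32_t dst_w, uint32_t dst_h) {
  assert(dst_w > 0 && dst_h > 0);
  uint32_t xscale = (src_w << 12) / dst_w;
  uint32_t yscale = (src_h << 12) / dst_h;
  return (yscale << 16) | (xscale & 0xFFFFu);
}
```

Under this assumption, VGA to QVGA gives 0x20002000 (halve each axis) and QQVGA to QVGA gives 0x08000800 (double each axis).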

One of the key tools in the development was getting a readable display of the PXP setup. I ended up with this:
Code:
CTRL:         00800002       STAT:         00200001
OUT_CTRL:     0000000E       OUT_BUF:      70000000    OUT_BUF2: 00000000
OUT_PITCH:         640       OUT_LRC:       320,240
OUT_PS_ULC:      0,  0       OUT_PS_LRC:    320,240
OUT_AS_ULC:   16383,16383    OUT_AS_LRC:      0,  0

PS_CTRL:      00000032       PS_BUF:       70300000
PS_UBUF:      00000000       PS_VBUF:      00000000
PS_PITCH:         1280       PS_BKGND:     00000080
PS_SCALE:     20002000       PS_OFFSET:    00000000
PS_CLRKEYLOW: 00FFFFFF       PS_CLRKEYLHI: 00000000

AS_CTRL:      00000000       AS_BUF:       00000000    AS_PITCH:      0
AS_CLRKEYLOW: 00FFFFFF       AS_CLRKEYLHI: 00000000

CSC1_COEF0:   84030000       CSC1_COEF1:   01230208    CSC1_COEF2: 076B079C

POWER:        00000000       NEXT:         00000000
PORTER_DUFF:  00000000

Old Fart Digression: Why is the default font for code display not a monospaced font? If I want to have my columns line up nicely, I have to change the font to Courier New.

Here is the code for the display, which I hope will help out others working with the Pixel Pipeline:

Code:
void ShowPXP(void) {
  Serial.printf("CTRL:         %08X       STAT:         %08X\n", PXP_CTRL, PXP_STAT);
  Serial.printf("OUT_CTRL:     %08X       OUT_BUF:      %08X    OUT_BUF2: %08X\n", PXP_OUT_CTRL,PXP_OUT_BUF,PXP_OUT_BUF2);
  Serial.printf("OUT_PITCH:    %8lu       OUT_LRC:       %3u,%3u\n", PXP_OUT_PITCH, PXP_OUT_LRC>>16, PXP_OUT_LRC&0xFFFF);

  Serial.printf("OUT_PS_ULC:    %3u,%3u       OUT_PS_LRC:    %3u,%3u\n", PXP_OUT_PS_ULC>>16, PXP_OUT_PS_ULC&0xFFFF,
                                                               PXP_OUT_PS_LRC>>16, PXP_OUT_PS_LRC&0xFFFF);
  Serial.printf("OUT_AS_ULC:   %3u,%3u    OUT_AS_LRC:    %3u,%3u\n", PXP_OUT_AS_ULC>>16, PXP_OUT_AS_ULC&0xFFFF,
                                                               PXP_OUT_AS_LRC>>16, PXP_OUT_AS_LRC&0xFFFF);
  Serial.println();
  Serial.printf("PS_CTRL:      %08X       PS_BUF:       %08X\n", PXP_PS_CTRL,PXP_PS_BUF);
  Serial.printf("PS_UBUF:      %08X       PS_VBUF:      %08X\n", PXP_PS_UBUF, PXP_PS_VBUF);
  Serial.printf("PS_PITCH:     %8lu       PS_BKGND:     %08X\n", PXP_PS_PITCH, PXP_PS_BACKGROUND_0);
  Serial.printf("PS_SCALE:     %08X       PS_OFFSET:    %08X\n", PXP_PS_SCALE,PXP_PS_OFFSET);
  Serial.printf("PS_CLRKEYLOW: %08X       PS_CLRKEYLHI: %08X\n", PXP_PS_CLRKEYLOW_0,PXP_PS_CLRKEYHIGH_0);
  Serial.println();
  Serial.printf("AS_CTRL:      %08X       AS_BUF:       %08X    AS_PITCH: %6u\n", PXP_AS_CTRL,PXP_AS_BUF, PXP_AS_PITCH & 0xFFFF);
  Serial.printf("AS_CLRKEYLOW: %08X       AS_CLRKEYLHI: %08X\n", PXP_AS_CLRKEYLOW_0,PXP_AS_CLRKEYHIGH_0);
  Serial.println();
  Serial.printf("CSC1_COEF0:   %08X       CSC1_COEF1:   %08X    CSC1_COEF2: %08X\n", 
                                                                PXP_CSC1_COEF0,PXP_CSC1_COEF1,PXP_CSC1_COEF2);
  Serial.println();
  Serial.printf("POWER:        %08X       NEXT:         %08X\n", PXP_POWER,PXP_NEXT);
  Serial.printf("PORTER_DUFF:  %08X\n", PXP_PORTER_DUFF_CTRL);
}

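ShowPXP() splits each coordinate register into an X half and a Y half. Below is a sketch of matching pack and unpack helpers (hypothetical names). The 14-bit field width is an assumption based on the first demo program writing 0xFFFFFFFF to OUT_AS_ULC while the saved setups hold 0x3FFF3FFF for that register.

```c
#include <assert.h>
#include <stdint.h>

// Coordinate registers (OUT_LRC, OUT_PS_ULC/LRC, OUT_AS_ULC/LRC) hold X in
// the upper halfword and Y in the lower one; each field is assumed to be
// 14 bits wide.
#define PXP_COORD_MASK 0x3FFFu

static uint32_t pxp_pack_xy(uint32_t x, uint32_t y) {
  return ((x & PXP_COORD_MASK) << 16) | (y & PXP_COORD_MASK);
}

static void pxp_unpack_xy(uint32_t reg, uint32_t *x, uint32_t *y) {
  assert(x && y);
  *x = (reg >> 16) & PXP_COORD_MASK;
  *y = reg & PXP_COORD_MASK;
}
```

pxp_pack_xy(320, 240) reproduces the 0x014000F0 OUT_LRC value stored in the saved rotation setups.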
 
A follow up note with some PXP conversion timings:

VGA YUV422 to QVGA RGB565 70.43mSec
QVGA YUV422 to QVGA RGB565 22.15mSec
QQVGA YUV422 to QVGA RGB565 10.13mSec

Another anomaly that annoys me: When I switch the OV7670 from full-frame VGA mode to QVGA the downsampling process seems to switch the output from UYVY to VYUY and I have to change the YUV output bit order settings to get the colors right.
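A small software reference can help pin down which byte order the camera is producing. The sketch below decodes one four-byte YUV422 group (two pixels sharing U and V) as either UYVY or VYUY. Swapping the two chroma bytes turns a reddish pixel bluish, which is the kind of color error described above. It uses common BT.601-style coefficients, not the PXP's CSC1 settings, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clamp8(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

static uint16_t pack565(uint8_t r, uint8_t g, uint8_t b) {
  return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// BT.601-style conversion with coefficients scaled by 256
// (1.402 -> 359, 0.344 -> 88, 0.714 -> 183, 1.772 -> 454).
static uint16_t yuv_to_rgb565(int y, int u, int v) {
  int du = u - 128, dv = v - 128;
  int r = y + (359 * dv) / 256;
  int g = y - (88 * du + 183 * dv) / 256;
  int b = y + (454 * du) / 256;
  return pack565(clamp8(r), clamp8(g), clamp8(b));
}

// vyuy == 0: bytes are U Y0 V Y1 (UYVY); vyuy != 0: V Y0 U Y1 (VYUY).
static void yuv422_pair_to_rgb565(const uint8_t in[4], int vyuy,
                                  uint16_t out[2]) {
  int u = vyuy ? in[2] : in[0];
  int v = vyuy ? in[0] : in[2];
  out[0] = yuv_to_rgb565(in[1], u, v);
  out[1] = yuv_to_rgb565(in[3], u, v);
}
```

Neutral chroma (U = V = 128) decodes the same either way, which is why a byte-order mix-up only shows up in colored areas of the image.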

There are other issues with the setting of the camera and PXP YUV to RGB conversion coefficients. I can understand how the guys who write the camera and display drivers for smartphones have to spend hundreds or thousands of hours tweaking registers to get the photo and display quality we expect from our phones.
 
Now that my test boards are in, I loaded this up again. Thanks to your ShowPXP method, I was able to determine that the manual has a slight error in the PXP_NEXT description that caused the issues for me, specifically this line:
Code:
All registers will be reloaded with the exception of the following: STAT, CSCCOEFn, NEXT.
As it turns out, CSCCOEFn is in fact reloaded. Since I never set it in my NEXT structure, the bypass bit wasn't staying turned on like I thought it would, and I would never have checked for it otherwise.
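Since CSC1_COEF0 is reloaded from the NEXT array, the bypass setting has to be carried in the array itself. The sketch below assumes CSC1_COEF0 is the word at index 26 of the 32-word array (where the saved rotation setups hold 0x44000000) and that BYPASS is bit 30, inferred from 0x44000000 in those bypassed setups versus 0x84030000 in the YUV-conversion register dump; check the reference manual before relying on it. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PXP_NEXT_WORDS    32
#define NEXT_CSC1_COEF0   26u          // index in the saved arrays
#define CSC1_COEF0_BYPASS (1u << 30)   // assumed bit position

// Set or clear the CSC bypass bit inside a saved NEXT array, so it
// survives the reload that happens when the array address is written
// to the NEXT register.
static void next_set_csc_bypass(uint32_t next[PXP_NEXT_WORDS], bool bypass) {
  if (bypass)
    next[NEXT_CSC1_COEF0] |= CSC1_COEF0_BYPASS;
  else
    next[NEXT_CSC1_COEF0] &= ~CSC1_COEF0_BYPASS;
}

static bool next_csc_bypassed(const uint32_t next[PXP_NEXT_WORDS]) {
  return (next[NEXT_CSC1_COEF0] & CSC1_COEF0_BYPASS) != 0;
}
```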

Since everything is now working correctly on my end I was able to finally test some dynamic color space conversions with alpha overlays and I'm happy to report that it is working wonderfully for my application.
 
Nice to see you have things working for your application. I didn't run into the glitch with CSCCOEFn restoration, as my scheme for saving and restoring a Next_PXP array pulls all the data from the PXP and stores it in the array with a simple for() loop and three transfers for the items at the end of the array. I haven't yet tackled alpha overlays, but I can see a possible use for time-stamping captured camera frames by writing the time stamp into a small bitmap and using overlays to show it in the captured frame. Another possible project is to display a video or camera frame capture in a window on my ILI9341 while putting some touch-screen start and stop buttons in the foreground.

Right now, my PXP operations are so tightly tied to my OV7670 camera setup that I need to make sure that the camera code is stable before posting more PXP code.

I've started cleaning up the camera code and have posted it on GitHub at https://github.com/mjborgerson/OV7670

I need to go through all the examples in the library folder to make sure that they work with the latest OV7670 object and there is a lot of work to be done on the pdf documentation file.
 
I’m using it almost like a texture map: I draw “realistic” controls on screen and draw the color separately underneath them. The color space conversion makes it easy to use transparent bitmaps in ARGB8888 format for that, so I don’t have to store a separate bitmap for every color I want to use, which would waste memory. It can push the limits of what the PXP can handle, though, if I draw too many controls too quickly.
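Per pixel, that overlay amounts to mixing the ARGB8888 "texture" over the solid color drawn underneath. A software reference of the math can be handy for checking what the PXP output should look like. This sketch assumes straight (non-premultiplied) alpha and round-to-nearest integer math, which may differ slightly from the hardware's rounding:

```cpp
#include <cassert>
#include <cstdint>

// Blend one straight-alpha ARGB8888 foreground pixel over an opaque
// RGB888 background color: out = (a*fg + (255-a)*bg) / 255 per channel.
uint32_t blendOver(uint32_t argbFg, uint32_t rgbBg) {
  const uint32_t a = argbFg >> 24;
  uint32_t out = 0xFF000000u;  // the result is opaque
  for (int shift = 0; shift <= 16; shift += 8) {  // B, G, R channels
    const uint32_t fg = (argbFg >> shift) & 0xFFu;
    const uint32_t bg = (rgbBg >> shift) & 0xFFu;
    // +127 rounds to nearest instead of truncating.
    const uint32_t c = (fg * a + bg * (255u - a) + 127u) / 255u;
    out |= c << shift;
  }
  return out;
}
```

For example, `blendOver(0x80FFFFFF, 0x000000)` (half-transparent white over black) gives `0xFF808080`.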
 
@vjmuzik I would love to test the PXP library with LVGL and my custom display driver.

A bit of background: I use an ILI9488 (480x320) in landscape mode, and data is transferred over an 8-bit parallel bus (8080). I'm using FlexIO and DMA to achieve fast async transfers.
My only issue is that it suffers from some visible tearing, because the panel actually refreshes left to right in portrait orientation.
By rotating the image and syncing the display update with vsync (using the TE signal + IRQ), I could eliminate the tearing.

I have tried some methods of SW rotation with @vindar, but as I had expected, it slows down LVGL rendering, so when I run animations the latency is quite visible. This is where the PXP would come in handy.
Now the question is, how would I plug the PXP into this?
LVGL can be configured with either one screen-sized frame buffer or, at most, two half-screen-sized frame buffers; they sit in DMAMEM.
The display driver does a DMA transfer from the buffer(s) into FlexIO and out to the display.
I assume I would need another frame buffer for the PXP to draw into, and then have the display driver read from that additional buffer?

I would highly appreciate it if you could steer me in the right direction on how to rotate a full-screen buffer (if possible) using the PXP.

I'm using a Teensy MicroMod, BTW.
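For comparison, the software rotation that the PXP would offload is a per-pixel transpose. A minimal sketch of a 90° clockwise rotation of an RGB565 buffer, assuming row-major buffers with no padding between rows (the function name is hypothetical). Consecutive source pixels land on different destination rows, which is one reason this is slow in software compared with the PXP's block-based approach:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rotate a w x h RGB565 image 90 degrees clockwise into dst (h x w).
// Source pixel (x, y) moves to column (h - 1 - y), row x of the output.
void rotate90cw(const uint16_t *src, uint16_t *dst, size_t w, size_t h) {
  assert(src != dst);  // separate buffers; this loop cannot rotate in place
  for (size_t y = 0; y < h; ++y) {
    for (size_t x = 0; x < w; ++x) {
      dst[x * h + (h - 1 - y)] = src[y * w + x];
    }
  }
}
```

A 3x2 image with rows `1 2 3` / `4 5 6` becomes a 2x3 image with rows `4 1` / `5 2` / `6 3`.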
 
That last sentence--using the MicroMod--could present some problems. A 480 x 320 buffer of RGB565 data is 307200 bytes, and DMAMEM only has room for one of those. If your program allows you to put another buffer in DTCM, you may be able to use the pixel pipeline to rotate from one buffer to another. On a T4.1, you could put one or both in EXTMEM PSRAM.
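The arithmetic behind that, written as compile-time checks. The 512 KB size of the RAM2 (OCRAM) region that DMAMEM uses comes from the Teensy 4.x specs; the constant names are just for illustration:

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t kWidth = 480;
constexpr size_t kHeight = 320;
constexpr size_t kBytesPerPixel = 2;          // RGB565
constexpr size_t kFrameBytes = kWidth * kHeight * kBytesPerPixel;
constexpr size_t kDmamemBytes = 512 * 1024;   // RAM2 (OCRAM) on Teensy 4.x

static_assert(kFrameBytes == 307200, "one 480x320 RGB565 frame");
static_assert(kFrameBytes <= kDmamemBytes, "one frame fits in DMAMEM");
static_assert(2 * kFrameBytes > kDmamemBytes, "two frames do not");
```

The 217088 bytes left over also have to hold the malloc heap and anything else placed in DMAMEM, so in practice even less is free.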

I'll have to look back at some of my tests, but I think a pass of a full VGA buffer through the PXP took around a hundred milliseconds. I was able to move the whole process of taking a VGA image to a QVGA buffer and then to an ILI9341 into a background routine, using an interrupt from the CSI at the end of an image capture to tell the PXP to start the transfer. The PXP-done interrupt started a background transfer from memory to the ILI9341 using KurtE's excellent library. I think I was OK updating the LCD at 10 Hz, because it was only being used as a viewfinder for the camera.

You may be able to do something similar with your FlexIO transfers, but the memory requirements may be the deciding issue.
 
I’m a little at a loss as to why the rotation is needed in software; don’t these types of displays usually have built-in rotation to match the buffer you send them? I’ll admit I’ve never used one of the ILI-type displays, or had one with the tear effect line broken out, so I couldn’t have used it even if I wanted to. Maybe the display rotation just changes the write direction, but not the read direction from the internal buffer.

I haven’t touched anything electronics-related in a while, so my knowledge is a little rusty. I thought the PXP rotation was done in blocks of a certain size and could be rotated back into the same buffer without issues.
 
That last sentence--using the Micromod--could present some problems. A 480 x320 buffer of RGB565 data is 307200 bytes. DMAMEM only has room for one of those. If your program allows you to put another buffer in DTCM, you may be able to use the pixel pipeline to rotate from one buffer to another. On a T4.1, you could put one or both in EXTMEM PSRAM.
True, there is not enough DMAMEM for two full-screen buffers.
What I can do is have one full-screen buffer for the display itself and a partial buffer from LVGL that draws the page in several chunks. While LVGL draws a chunk, the PXP could flip it, I think.
I have built a custom Teensy MM/4.1. It has the pinout of a MicroMod with all the extra FlexIO2 pins exposed (which also happens to be the entire interface for the eLCDIF peripheral), and it has the PSRAM/flash chips routed as well. So I might be able to have a source buffer in DMAMEM and a destination buffer in EXTMEM. That should take 6-10 ms based on the benchmarks posted above, which leaves enough headroom since the display updates every 12-15 ms.
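When LVGL hands over one horizontal strip at a time, each strip rotated 90° becomes a vertical band of the portrait-order destination buffer. A hypothetical helper that rotates just rows [y0, y0 + rows) of a w x h landscape frame into the full h x w destination, using the same clockwise mapping as a full-frame rotation. On hardware the PXP would be pointed at the equivalent sub-rectangle instead; that part is not shown because it depends on the library:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rotate one horizontal strip (rows y0 .. y0+rows-1 of a w x h frame)
// 90 degrees clockwise into a full-frame destination of h x w pixels.
// `strip` holds only the strip's pixels, row-major, w pixels per row.
void rotateStrip90cw(const uint16_t *strip, size_t y0, size_t rows,
                     uint16_t *dst, size_t w, size_t h) {
  assert(y0 + rows <= h);
  for (size_t r = 0; r < rows; ++r) {
    const size_t y = y0 + r;  // row index in the full landscape frame
    for (size_t x = 0; x < w; ++x) {
      dst[x * h + (h - 1 - y)] = strip[r * w + x];
    }
  }
}
```

Rotating every strip this way produces the same buffer as rotating the whole frame at once. With the 6-10 ms estimate above and a 12-15 ms refresh period, even a full-frame rotation fits inside one refresh.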

I’m a little at a loss for why the rotation is needed in software, don’t these type of displays usually have a built in rotation to match the buffer you send it? I’ll admit I’ve never used one of the ILI type displays or had one that has the tear effect line broken out so I haven’t been able to use it even if I wanted to. Maybe the display rotation just changes the write direction, but not the read direction from the internal buffer.

I haven’t touched anything electronic related in a little bit so my knowledge is a little rusty. I thought the PXP rotation was done in blocks of a certain size and it could be rotated back into the same buffer without issues.
You answered your own question :)
If you place the display in portrait orientation, the scan line runs left to right, from the top left down to the bottom right.
While you can set the rotation of the image in the display's MEMORY_ACCESS_CONTROL register, it only changes the direction in which data is written into GRAM; the panel still reads GRAM out in its native portrait order, and that mismatch is the cause of the tearing.
The PXP does the rotation in blocks, either 8x8 or 16x16. The larger the block size, the faster it goes, but both dimensions of your refresh area need to be divisible by the block size.
E.g. 320x480 (153600 pixels) / 8x8 (64) = (320/8) x (480/8) = 40 x 60 = 2400 blocks
But I don't think the PXP can use the same buffer as both source and destination. I wish it could.
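The divisibility requirement applies to each dimension separately, not just to the total pixel count. A small check that counts blocks for a given area, assuming the two block sizes offered by the PXP_CTRL BLOCK_SIZE field (8x8 and 16x16); the function name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Number of square blocks needed to cover a width x height area,
// or 0 if either dimension is not a multiple of the block edge.
size_t blockCount(size_t width, size_t height, size_t blockEdge) {
  if (blockEdge == 0 || width % blockEdge != 0 || height % blockEdge != 0) {
    return 0;
  }
  return (width / blockEdge) * (height / blockEdge);
}
```

A 4x32 area has 128 pixels, an exact multiple of 64, yet it cannot be tiled with 8x8 blocks because 4 is not a multiple of 8.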
 
@vjmuzik circling back to this as I have my custom T4.1/MM board with all the FlexIO2 pins/eLCDIF pins exposed, as well as the external PSRAM, so now is the time to start playing with this wonderful library :)
1. What do I call to rotate an image? I understand that I init, then set the rotation and color formats - then I just call process() each time I want to rotate what's in the source buffer?
2. Is there a callback I can register to when the entire rotation is done? I would need this to trigger the transfer to the display. Or can I just add my own function call at the end of PXP_isr?

Thanks!
 
Truthfully, I haven’t done much with this since making the library (very little even then), so my memory of it isn’t the best.
1. That sounds about right: all the functions just change placeholder variables, then you call process() to start it.
2. After calling process(), you can call finish() to make sure it’s done before moving on, or, if you don’t want to block your program while it’s working, you can add a function call to the end of the ISR.
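If you'd rather keep the interrupt handler short than call into the display driver from it, another option is to have the end-of-PXP interrupt only set a flag and start the display transfer from loop(). A host-side sketch of that pattern; all names are hypothetical, and the interrupt is simulated by calling `onPxpComplete()` directly:

```cpp
#include <cassert>

// Set from the (hypothetical) end-of-PXP interrupt, consumed in loop().
volatile bool pxpDone = false;
int transfersStarted = 0;  // stands in for kicking off the display DMA

void onPxpComplete() {     // would be called from the PXP interrupt handler
  pxpDone = true;
}

void startDisplayTransfer() { ++transfersStarted; }

// Call repeatedly from loop(): starts one display transfer per finished
// PXP job and never blocks while the PXP is still working.
// Assumes only one PXP job is in flight at a time.
void servicePxp() {
  if (pxpDone) {
    pxpDone = false;
    startDisplayTransfer();
  }
}
```

The trade-off is latency: the transfer starts on the next pass through loop() rather than immediately in the ISR.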
 
I used the PXP end-of-conversion interrupt to put together an all-background camera viewfinder using the asynchronous update capability of the ILI9341_T3N library. The process looked like this:

1. Set up the camera CSI interface for an END-OF-FRAME interrupt. That interrupt calls a function that sets some buffer pointers and tells the PXP to rotate, resize, and do any necessary color-space conversions. The CSI runs continuously in the background. The PXP input buffer is VGA-sized and is in EXTMEM.
2. Set a PXP end-of-conversion interrupt that calls the LCD asynchronous update function, which runs in the background.
3. The LCD asynchronous update doesn't require an interrupt handler.

A lot of extra code is needed if you are getting images from the CSI running continuously. The CSI and PXP do not play nicely together if both are trying to access EXTMEM at the same time in the background. I found it necessary to have the CSI interrupt handler halt the CSI for one or more frame intervals while the PXP did its thing; when the PXP is done, it lets the CSI start again. Since the QVGA output buffer is in DMAMEM or DTCM, the asynchronous LCD updates don't interfere with the CSI.
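The sequencing above (CSI end-of-frame starts the PXP and pauses the CSI; PXP completion restarts the CSI and starts the LCD update) can be modeled as a small state machine. This is a host-side sketch only: the struct, its event functions and counters are hypothetical stand-ins for the real CSI, PXP and ILI9341_T3N calls:

```cpp
#include <cassert>

enum class Stage { Capturing, Converting };

struct Pipeline {
  Stage stage = Stage::Capturing;
  bool csiRunning = true;
  int pxpJobs = 0;
  int lcdUpdates = 0;

  // Camera end-of-frame interrupt: hand the frame to the PXP and pause
  // the CSI so the two don't contend for EXTMEM.
  void onCsiEndOfFrame() {
    if (stage != Stage::Capturing) return;  // ignore frames while converting
    csiRunning = false;
    ++pxpJobs;                              // start rotate/resize/CSC
    stage = Stage::Converting;
  }

  // PXP end-of-conversion interrupt: the output buffer is not in EXTMEM,
  // so the async LCD update can run while the CSI captures again.
  void onPxpDone() {
    if (stage != Stage::Converting) return;
    ++lcdUpdates;                           // e.g. an async screen update
    csiRunning = true;
    stage = Stage::Capturing;
  }
};
```

Keeping the state explicit makes it easy to see that a second end-of-frame event arriving mid-conversion is dropped rather than restarting the PXP.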

My PXP library doesn't have callbacks built in, so you have to add your own:

Code:
void (*EndPXPHook)(void)  = NULL; 

// Set the callback function pointer for end of PXP operation
void SetEndPXPHook(void(*epxpptr)(void)) {
	EndPXPHook = epxpptr;
}


/********************************************************
 * PXP interrupt handler.  Clears interrupt bit, then
 * starts the asynchronous screen update.  Since the
 * PXP output buffer is in DTCM, not EXTMEM, the
 * asynchronous update seems to play well with the CSI
 *********************************************************** */
void PXP_Handler(void) {
	pxp.Stop();                           // Stop PXP until next frame
	PXP_STAT_CLR = PXP_STAT_IRQ;          // clear PXP interrupt
	if (EndPXPHook != NULL) EndPXPHook(); // call user end-of-PXP function
	tft.updateScreenAsync();              // Update the LCD in the background
	asm volatile("dsb");                  // let the flag clear finish before returning
}

In your setup() function, add something like this:
Code:
	attachInterruptVector(IRQ_PXP, &PXP_Handler);
	NVIC_ENABLE_IRQ(IRQ_PXP);

	// Set up interrupt at camera EOF, but only AFTER the PXP interrupt is set
	camLib.SetEOFHook(&eofLCD);
 