T4 Pixel Pipeline Library

@mborgerson thanks for the breakdown and code example.
I'm generating an image (480*320px @16 bit color) using LVGL and feeding that into a buffer placed in DMAMEM, which is also the source buffer for PXP.
PXP then fills a destination buffer with the rotated image (90 degrees) that is placed in EXTMEM. I have my own 8 bit 8080 ILI9488 display driver that I wrote for the Teensy MicroMod, it transfers the data via DMA/FlexIO.

I configured my buffers this way because of the timings below (the configurations I considered are marked with *):
Code:
Rotation times as a function of memory source and destination

Src --> Dest        PGM (µs)   PXP (µs)
-----------------------------------------
DTCM   --> DTCM          387        578
DTCM   --> DMAMEM        721        562
DTCM   --> EXTMEM       5997       6487
DMAMEM --> DTCM          547        561
DMAMEM --> DMAMEM        701        583
DMAMEM --> EXTMEM       8681       6041  *
EXTMEM --> DTCM         5567      12745
EXTMEM --> DMAMEM       5589      12751  *
EXTMEM --> EXTMEM      28630      19204  *


I have the sample app written and compiled, but haven't actually tested it yet - I need to wire up the display to my custom PCB, and also set some variables to test the timing of the full flow/each step.
 
Got around to wiring up my ILI9486 breakout display to my custom board Teensy MM with PSRAM :cool:

Tested the sketch, but I am running into some issues:

1. The image is being rotated by 90 degrees but it looks like so:
[Photo: IMG_9194.jpg]
It's filling only two thirds of the screen (320x320) and the artifact on the side is usually black (picture was taken with some trash data remaining on the GRAM)
Just remember, I am taking a 480x320px landscape image, rotating it by 90 degrees and pushing it in portrait orientation to the display (320x480)

2. I added a callback to the interrupt, but I see it executing three times for a single process() call.
I placed my function at the end of PXP_isr(), but maybe it should be elsewhere?
Code:
void PXP_isr(){
  if((PXP_STAT & PXP_STAT_LUT_DMA_LOAD_DONE_IRQ) != 0){
    PXP_STAT_CLR = PXP_STAT_LUT_DMA_LOAD_DONE_IRQ;
  }
  if((PXP_STAT & PXP_STAT_NEXT_IRQ) != 0){
    PXP_STAT_CLR = PXP_STAT_NEXT_IRQ;
  }
  if((PXP_STAT & PXP_STAT_AXI_READ_ERROR) != 0){
    PXP_STAT_CLR = PXP_STAT_AXI_READ_ERROR;
  }
  if((PXP_STAT & PXP_STAT_AXI_WRITE_ERROR) != 0){
    PXP_STAT_CLR = PXP_STAT_AXI_WRITE_ERROR;
  }
  if((PXP_STAT & PXP_STAT_IRQ) != 0){
    PXP_STAT_CLR = PXP_STAT_IRQ;
    PXP_done = true;
    // SHOULD IT GO HERE?
  }
#if defined(__IMXRT1062__) // Teensy 4.x
  asm("DSB");
#endif
// MY FUNCTION IS HERE
}

Here is my sketch:
Code:
#include "ILI948x_t4_mm.h"
#include "flexio_teensy_mm.c"
#include "T4_PXP.h"
ILI948x_t4_mm lcd = ILI948x_t4_mm(13,11,12); //(dc, cs, rst)
DMAMEM uint16_t s_fb[480*320];
EXTMEM uint16_t d_fb[480*320];
uint32_t etime;
elapsedMicros runtime;

FASTRUN void pushData(){
  etime = runtime;
  Serial.printf("PXP Rotation took %lu microseconds\n", etime);
  runtime = 0;
  etime = 0;
  arm_dcache_flush(d_fb, sizeof(d_fb)); // always flush cache after writing to DMAMEM variable that will be accessed by DMA
  lcd.pushPixels16bitDMA(d_fb,0,0,319,479); // DMA transfer
  //lcd.pushPixels16bit(d_fb,0,0,319,479); // Polling transfer
}

FASTRUN void lcdCallback(){
  etime = runtime;
  Serial.printf("Display DMA transfer took %lu microseconds\n", etime); 
}


void setup() {
  Serial.begin(115200);
  delay(1000);
  Serial.print(CrashReport);
  
  lcd.begin(24);
  lcd.setRotation(0);
  lcd.onCompleteCB(&lcdCallback);

  memcpy(s_fb, flexio_teensy_mm, sizeof(flexio_teensy_mm));
  arm_dcache_flush((uint16_t*)s_fb, sizeof(s_fb)); // always flush cache after writing to DMAMEM variable that will be accessed by DMA
  
  PXP_init();
  PXP_input_buffer(s_fb, 2, 480, 320);
  PXP_input_format(PXP_RGB565);
  PXP_output_buffer(d_fb, 2, 320, 480);
  PXP_output_format(PXP_RGB565);
  PXP_rotate(1);
  PXP_complete_cb(&pushData); // <------ MY CUSTOM PXP COMPLETE CALLBACK - CALLED FROM PXP_isr()
 
  runtime = 0;
  PXP_process();
}

void loop() {

}

My display library and the image used can be found on my github:
Driver: https://github.com/david-res/ILI948x_t4_mm
Image: https://github.com/david-res/ILI948x_t4_mm/blob/master/examples/ILI948x_mm_test/flexio_teensy_mm.c
 

1. In the context of rotation, it makes sense logically that the output is clipped to 320x320, though I understand that doesn't really solve your problem when transferring to the screen. Now that I think about it, a standard rotation might not be what you need, at least not without extra steps. I believe the destination buffer would have to be 480x480 to rotate the whole image; then you would have to either write it to the display as 480 separate lines clipped to 320 pixels each, or copy it to a separate 320x480 buffer and send that all at once.

The issue is that you want the picture to stay landscape but be stored as if it were a portrait picture, which isn't a standard rotation. In that context, the PXP method probably won't save any time compared with a for loop that shuffles the pixels around within the same buffer in DMAMEM, so there are no costly EXTMEM operations.
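For comparison, the for-loop approach can be sketched as a plain 90-degree clockwise rotation. This is a minimal sketch, not the poster's code: `rotate90cw` is a hypothetical helper, it writes into a second buffer (rotating a non-square image in place needs a cycle-following algorithm and is considerably more involved), and whether the panel needs clockwise or counter-clockwise order is an assumption to check against the display.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rotate a w x h RGB565 image 90 degrees clockwise into a separate h x w buffer.
// Source pixel (x, y) lands in destination row x, column (h - 1 - y).
// Out-of-place: src and dst must not overlap.
void rotate90cw(const uint16_t *src, uint16_t *dst, size_t w, size_t h) {
  for (size_t y = 0; y < h; y++) {
    const uint16_t *row = src + y * w;     // sequential reads along each source row
    for (size_t x = 0; x < w; x++) {
      dst[x * h + (h - 1 - y)] = row[x];   // each destination row is h pixels wide
    }
  }
}
```

For the 480x320 LVGL frame this would be called as `rotate90cw(s_fb, d_fb, 480, 320)`, producing 320-pixel-wide rows in portrait order. The reads walk each source row sequentially while the writes jump by `h` pixels, which is the access pattern that makes rotation expensive when a buffer sits in EXTMEM.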

2. I believe the function should go where PXP_done is set. I never paid attention to how many times each interrupt was called, though.
 

I think your function code should be executed only on (PXP_STAT & PXP_STAT_IRQ) != 0. Otherwise it will be executed for all the
other cases. I suggest you increment some global counters for all the cases so that you can see which other interrupt cases are occurring.
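The counter idea can be tried off-target first. This is a minimal host-side model of `PXP_isr()`, assuming the PXP_STAT bit positions from the i.MX RT1060 reference manual (on the Teensy, use the `PXP_STAT_*` macros from imxrt.h instead); `handlePxpStatus`, `IsrCounters` and `completeCb` are hypothetical names. It counts every cause separately and invokes the user callback only in the `PXP_STAT_IRQ` branch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed PXP_STAT bit positions (check against imxrt.h on the Teensy).
constexpr uint32_t STAT_IRQ             = 1u << 0;
constexpr uint32_t STAT_AXI_WRITE_ERROR = 1u << 1;
constexpr uint32_t STAT_AXI_READ_ERROR  = 1u << 2;
constexpr uint32_t STAT_NEXT_IRQ        = 1u << 3;
constexpr uint32_t STAT_LUT_DMA_DONE    = 1u << 8;

// One counter per interrupt cause (declare these volatile on the Teensy).
struct IsrCounters {
  uint32_t irq = 0, next = 0, lut = 0, readErr = 0, writeErr = 0, unknown = 0;
};

IsrCounters counts;
void (*completeCb)() = nullptr;  // hypothetical completion callback, like the sketch's PXP_complete_cb()

// Handles one interrupt with status word `stat`; returns the bits the real
// ISR would write to PXP_STAT_CLR.
uint32_t handlePxpStatus(uint32_t stat) {
  uint32_t clr = 0;
  if (stat & STAT_LUT_DMA_DONE)    { counts.lut++;      clr |= STAT_LUT_DMA_DONE; }
  if (stat & STAT_NEXT_IRQ)        { counts.next++;     clr |= STAT_NEXT_IRQ; }
  if (stat & STAT_AXI_READ_ERROR)  { counts.readErr++;  clr |= STAT_AXI_READ_ERROR; }
  if (stat & STAT_AXI_WRITE_ERROR) { counts.writeErr++; clr |= STAT_AXI_WRITE_ERROR; }
  if (stat & STAT_IRQ) {
    counts.irq++;
    clr |= STAT_IRQ;
    if (completeCb) completeCb();  // runs only when the frame is complete
  }
  if (clr == 0) counts.unknown++;  // none of the known cause bits were set
  return clr;
}
```

On the hardware, printing the counters from `loop()` after a single `PXP_process()` call should show which causes account for the extra invocations.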
 
I've moved the callback execution into the (PXP_STAT & PXP_STAT_IRQ) != 0 branch, and that seems to fix the multiple calls.
But the rotation is still not 100% right; it looks the same as before.
I can confirm the PXP is not rotating the image properly: if I push the source buffer content to the display from RAM1/DMAMEM/EXTMEM, it works just fine.
But as soon as I try to rotate it, it gets all messed up.
 
So after much testing, I decided to pass the image through without rotating it to see what comes out.
The image passes through fine, but with noise in the first 20-30 lines. Also, as far as I can tell, rotation does nothing:
I tried 90, 180 and 270 degree rotations and the image always comes out at 0 degrees.

So either my PXP config is bad, or the library has some sort of bug.
 

There's a note in the illustration on page 1923 of the reference manual in the section about PXP rotation:

Note that fetches from the source buffer are from non-incrementing addresses, and are thus very inefficient from a bus memory controller standpoint.

I believe that the PXP fetches and stores using direct memory access and will not benefit from caches for EXTMEM or DMAMEM. That would explain the much longer rotation time when the source for a rotation is in either DMAMEM or EXTMEM.
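Because the PXP is a separate bus master, only buffers in cached regions need cache maintenance around a transfer. A minimal sketch of an address classifier, assuming the standard Teensy 4.x memory map (DTCM at 0x20000000, RAM2/DMAMEM at 0x20200000, PSRAM EXTMEM at 0x70000000 with up to 16 MB); `regionOf` and `needsCacheMaintenance` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

enum class Region { DTCM, OCRAM, EXTMEM, Other };

// Assumed Teensy 4.x address ranges.
Region regionOf(uintptr_t addr) {
  if (addr >= 0x20000000u && addr < 0x20080000u) return Region::DTCM;   // RAM1 data, not cached
  if (addr >= 0x20200000u && addr < 0x20280000u) return Region::OCRAM;  // RAM2 / DMAMEM, cached
  if (addr >= 0x70000000u && addr < 0x71000000u) return Region::EXTMEM; // up to 16 MB PSRAM, cached
  return Region::Other;
}

// DMA masters such as the PXP bypass the Cortex-M7 data cache, so only cached
// regions need arm_dcache_flush()/arm_dcache_delete() around a transfer.
bool needsCacheMaintenance(uintptr_t addr) {
  Region r = regionOf(addr);
  return r == Region::OCRAM || r == Region::EXTMEM;
}
```

Note that 0x20200000 has eight hex digits; a seven-digit constant such as 0x2020000 lies below DTCM, so a `>` test against it matches every buffer.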

I spent a lot of hours yesterday developing a minimalist PXP rotation test. I started with a buffer that I filled with a recognizable pattern of stripes and corner markers. I set up the PXP to use 8x8 pixel blocks so I could get the smallest buffers that would require moving and rotating multiple blocks (16x16) and (16x8).

With a 16x16 buffer, rotation from DTCM to DTCM works properly.
If I use either DMAMEM or EXTMEM for the source, the rotation fails.

With a 16x8 buffer, rotation fails for all memory types. I am currently of the opinion that this is the result of an error in my setup of the PXP. One clue is that I get a stripe of background pixels between the two 8x8 pixel blocks.


I will reduce the test code to a minimal example showing the problem with library-independent setup of the PXP and post the code later today.
 
So I actually used some of your sample code in the beginning of the post and got the rotation to work!
Source buffer is in RAM1 and destination is in RAM2.
Takes about 3ms to rotate 90/270 degrees and 13ms to transfer the data to the display.
With EXTMEM it takes WAYYYY longer (50ms for transfers)
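As a rough bound from these numbers, assuming the rotation of one frame could fully overlap the transfer of the previous one (not measured here):

$$
t_{\text{seq}} = 3\,\text{ms} + 13\,\text{ms} = 16\,\text{ms} \;\Rightarrow\; \approx 62\ \text{fps},
\qquad
t_{\text{overlap}} \approx \max(3, 13)\,\text{ms} = 13\,\text{ms} \;\Rightarrow\; \approx 77\ \text{fps}
$$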

I can only sacrifice RAM2 for buffers, so now I wonder if I can have one full-frame screen buffer as the source and two small ones in a double-buffer setup, where one fills with rotated data while the other transmits to the display.
I know the PXP can do this with the eLCDIF, but I'm not sure it can be done with FlexIO.
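A quick RAM2 budget check for that layout, assuming RGB565, a 480x320 source frame and small buffers of 480*16 pixels each (the strip size is a hypothetical choice):

```cpp
#include <cassert>
#include <cstddef>

// Rough RAM2 budget for one full frame plus two rotation strips (RGB565).
constexpr size_t kBytesPerPixel = 2;
constexpr size_t kFrameBytes    = 480 * 320 * kBytesPerPixel;  // LVGL source frame
constexpr size_t kStripPixels   = 480 * 16;                    // hypothetical strip size
constexpr size_t kStripBytes    = kStripPixels * kBytesPerPixel;
constexpr size_t kRam2Bytes     = 512 * 1024;                  // Teensy 4.x RAM2 (OCRAM)
constexpr size_t kUsedBytes     = kFrameBytes + 2 * kStripBytes;

static_assert(kUsedBytes <= kRam2Bytes, "frame + two strips should fit in RAM2");
static_assert(2 * kFrameBytes > kRam2Bytes, "two full frames would not fit in RAM2");
```

RAM2 also holds the heap and any other DMAMEM variables, so the real headroom is smaller than the difference computed here.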
 
I’m curious, since you are using FlexIO, whether it would be beneficial to make use of its buffers rather than maintaining two complete buffers for the screen. Since each FlexIO has eight 32-bit shift registers, it can buffer 16 pixels at a time if set up correctly. So theoretically you could buffer/rotate just 16 pixels at a time, concurrently with the FlexIO transfer, and not really eat up any time during the whole LCD transfer.
 
Attached is my sample rotation app that works well so far

FlexIO actually has 4 buffers, so that would be 8 pixels per shift command until the buffers are empty. (Data sheet says 8, but many people here have had no luck using more than 4)
Now, my DMA setup does everything in one shot, e.g. DMA fills the FlexIO buffers 8 pixels at a time until the entire source buffer has been sent through.
If I was to trigger a DMA setup for every 8 pixels it would cause a lot of latency and lengthen the overall transfer.

My idea is actually to use one big buffer that is filled by LVGL, and then two smaller differential buffers (an idea from vindar's GFX library) of about 480*16 pixels each, alternating between them: one sends data to the display while the other gets rotated data fed from the PXP, then they swap.
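A CPU-side sketch of the two-strip idea, under stated assumptions: `rotateStrip`, `sendRotatedFrame` and the `SendFn` sink are hypothetical names, the rotation is 90 degrees clockwise, and `send` here is synchronous. On the Teensy the sink would start the FlexIO DMA transfer and return, and the loop would have to wait for the transfer that last used a buffer before refilling it. Having the PXP fill the strips instead is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fill `strip` with output rows [firstRow, firstRow + rows) of the 90-degree
// clockwise rotation of a w x h source. Output rows are h pixels wide.
void rotateStrip(const uint16_t *src, size_t w, size_t h,
                 uint16_t *strip, size_t firstRow, size_t rows) {
  for (size_t r = 0; r < rows; r++) {
    size_t x = firstRow + r;               // output row == source column x
    uint16_t *out = strip + r * h;
    for (size_t c = 0; c < h; c++) {
      out[c] = src[(h - 1 - c) * w + x];   // output column c == source row h-1-c
    }
  }
}

// Hypothetical sink: on hardware this would start a DMA transfer of `count` pixels.
using SendFn = void (*)(const uint16_t *pixels, size_t count);

// Alternate between two strip buffers: fill one while the other is being sent.
void sendRotatedFrame(const uint16_t *src, size_t w, size_t h,
                      uint16_t *bufA, uint16_t *bufB, size_t stripRows, SendFn send) {
  uint16_t *bufs[2] = {bufA, bufB};
  int which = 0;
  for (size_t row = 0; row < w; row += stripRows) {
    size_t rows = (w - row < stripRows) ? (w - row) : stripRows;  // last strip may be short
    // On hardware: wait here until the transfer that last used bufs[which] is done.
    rotateStrip(src, w, h, bufs[which], row, rows);
    send(bufs[which], rows * h);
    which ^= 1;  // fill the other buffer next
  }
}
```

Each buffer must hold `stripRows * h` pixels; with a 320-row LVGL frame (`h = 320`) and 16-row strips that is 5120 pixels per buffer.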

Here is a description in the data sheet:
[Figures: pxp_elcd_handshake1.jpg, pxp_elcd_handshake2.jpg, PXP/eLCDIF handshake description from the data sheet]

[Attachment: pxp_8bit_lcd.ino]
 
I wasn’t aware the other 4 buffers were broken, good to know in the future though. Sounds like you’re more familiar with it than I am so I can only help so much.

However, I don’t think the PXP rotation is going to give you the result you want. From what I remember, you're trying to match the scan line direction of your screen, so you want the image on screen to stay the same while the buffer is written in a different order. The differential buffer is likely still the approach you want, but you’ll probably have to fill it with a for loop instead of having the PXP do it automatically.

With this approach you would probably want two DMA setting objects that trigger each other on completion, each with its own completion ISR, so that each one can fill the other's buffer while the other is running. I don’t have any hardware set up at the moment to test this myself, but I believe it would give you what you’re looking for with minimal impact.
 

I cleaned up my test code and ran some tests:

DTCM(RAM1) to DTCM, DMAMEM or EXTMEM rotates properly for square buffers.

DTCM source is showing issues with rectangular buffers for all destinations. I suspect this is a PXP setup issue and I hope to find a solution in Rezo's code.

Here is the source for my test sketch. In my sketch, I had all the basic PXP functions in a separate .ino file in the sketch folder, purely for editing convenience. In the code block below, the PXP functions are concatenated at the end of the test code. This is a minimalist test, but has the advantages of using small buffers that produce output that can be viewed in the Serial monitor. No hardware other than the T4.1 is required.

Code:
/*****************************************************************
// Demonstration program to show results of PXP rotations on small 
// buffers in two configurations:  16x8 pixels and 16x16 pixels.
// The internal pixel block size is set to 8x8, so each of these
// sizes requires the rotation and movement of multiple blocks.
// The buffers are of type uint16_t and occupy the same space
// as RGB565 and other two-byte pixels.  The buffers are easily
// small enough to fit in DTCM with the program code.
***********************************************************************/



#define WIDTH 16
//#define HEIGHT 16   // use this one for square buffer
#define HEIGHT 8  // uncomment for rectangular buffer

#define BUFFSIZE (WIDTH * HEIGHT)  // 2 bytes per pixel for RGB565 so use uint16_t

// To select the source and destination memory locations, uncomment ONE
// set of buffer definitions for source and destination, recompile and upload.
// Yes, this could all be done with a set of #ifdefs, but I turn a cold shoulder to
// #ifdef blizzards, and even one little instance irritates me.   Either way, you
// have to edit something somewhere and recompile.

uint16_t srcBuff[2 * BUFFSIZE] __attribute__((aligned(64)));  // DTCM by default;
//uint16_t srcBuff[2*BUFFSIZE]__attribute__ ((aligned (64))) DMAMEM;
//uint16_t srcBuff[2*BUFFSIZE]__attribute__ ((aligned (64))) EXTMEM;

uint16_t dstBuff[2 * BUFFSIZE] __attribute__((aligned(64)));  // DTCM by default;
//uint16_t dstBuff[2*BUFFSIZE]__attribute__ ((aligned (64))) DMAMEM;
//uint16_t dstBuff[2*BUFFSIZE]__attribute__ ((aligned (64))) EXTMEM;


// We build the origin buffer here and memcpy it to srcBuff
uint16_t orgBuff[BUFFSIZE] EXTMEM;

const char compileTime[] = " Compiled on " __DATE__ " " __TIME__;

enum tRotval { ROT0,
               ROT90,
               ROT180,
               ROT270 };
Stream *pxpstrm = &Serial;  // used to send output from PXP functions
void setup() {
  while (!Serial && millis() < 3000) {}

  Serial.begin(9600);

  delay(1000);
  Serial.println("\n\n\n");
  Serial.printf("\nPXP Rotation test 2 %s\n", compileTime);

  memset(orgBuff, 0, sizeof(orgBuff));  // buffers may be in memory areas
  memset(srcBuff, 0, sizeof(srcBuff));  // that are not automatically cleared at
  memset(dstBuff, 0, sizeof(dstBuff));  // startup, so do that here
  FillBuffer(orgBuff, WIDTH, HEIGHT);   // put our unique pattern in origin buffer
  Serial.println("origin Bufffer filled");
  memcpy(srcBuff, orgBuff, sizeof(orgBuff));
  Serial.println("origin buffer copied to srcBuff");
  Serial.println("srcBuff corners: ");
  ShowCorners(srcBuff, WIDTH, HEIGHT);
  ShowBuffer(srcBuff, WIDTH, HEIGHT);
  delay(100);
  PXPInit();
}

#define LSTART (y * wd)
void FillBuffer(uint16_t buff[], uint32_t wd, uint32_t ht) {
  uint32_t x, y;
  for (x = 0; x < wd; x++) {
    for (y = 0; y < ht; y++) {
      buff[LSTART + x] = y;
    }
  }
  buff[0] = 0x1111;
  buff[wd - 1] = 0x2222;
  buff[(ht - 1) * wd] = 0x3333;
  buff[ht * wd - 1] = 0x4444;
  Serial.println();
}

void ShowBuffer(uint16_t buff[], uint32_t wd, uint32_t ht) {
  uint32_t x, y;
  y = 0;
  for (y = 0; y < ht; y++) {
    for (x = 0; x < wd; x++) {                              // these values only work as long as wd and ht < 256
      if ((x % wd) == 0) Serial.printf("\n%5d: ", LSTART);  // one horizontal row per output line
      Serial.printf("%04X ", buff[LSTART + x]);
    }
  }
  Serial.println();
}

uint32_t BufferDifferences(uint16_t *ptr1, uint16_t *ptr2, uint32_t numpixels) {
  uint32_t i, errCount;
  errCount = 0;
  for (i = 0; i < numpixels; i++) {
    if (*ptr1++ != *ptr2++) errCount++;
  }
  return errCount;
}

void ShowCorners(uint16_t buff[], uint32_t wd, uint32_t ht) {
  Serial.printf("Upper Left  :%04X  ", buff[0]);
  Serial.printf("Upper Right :%04X  ", buff[wd - 1]);
  Serial.printf("Bottom  Left:%04X  ", buff[wd * (ht - 1)]);
  Serial.printf("Bottom Right:%04X  ", buff[wd * ht - 1]);
  Serial.println();
}

// Rotate the src buffer by 90 degrees clockwise into dst buffer
void Rot90(uint16_t *psrc, uint16_t *pdst, uint16_t *pwd, uint16_t *pht) {
  uint32_t newbuff;
  uint16_t wd, ht, newwd, newht;
  wd = *pwd;
  ht = *pht;
  uint32_t bsize = wd * ht * 2;  // two bytes per pixel
  PXPSetPS(psrc, wd, ht, PXP_RGB565);
  PXPSetOutput(pdst, ht, wd, PXP_RGB565);  // swap output ht and width for rotation
  // DTCM (0x2000xxxx) is not cached; DMAMEM (0x2020xxxx) and EXTMEM (0x7xxxxxxx) are.
  if ((uint32_t)PXP_PS_BUF >= 0x20200000) {  // write back CPU writes so the PXP reads current data
    arm_dcache_flush((void *)PXP_PS_BUF, bsize);
  }
  if ((uint32_t)PXP_OUT_BUF >= 0x20200000) {  // drop cached lines so they can't overwrite PXP output later
    arm_dcache_delete((void *)PXP_OUT_BUF, bsize);
  }
  PXPRotate(ROT90);  // waits until done
  delay(100);
  //Serial.printf("Output buffer at %p\n", PXP_OUT_BUF);
  if ((uint32_t)PXP_OUT_BUF >= 0x20200000) {
    arm_dcache_delete((void *)PXP_OUT_BUF, bsize);  // invalidate again before the CPU reads the PXP output
  }
  delay(100);
  Serial.print("\n   90-degree rotation done.");
  PXPGetOutput(&newbuff, &newwd, &newht);
  Serial.printf("  new width:  %u   new height: %u\n", newwd, newht);
  *pwd = newwd;
  *pht = newht;  // returns new width to input variables
  Serial.println("Destination Buffer Corners: ");
  ShowCorners(pdst, newwd, newht);
  ShowBuffer(pdst, newwd, newht);
}

void RunTest(uint16_t numrotations) {
  uint32_t errCount;
  uint16_t newwd, newht;
  uint32_t newbuff;
  memcpy(srcBuff, orgBuff, sizeof(orgBuff));  // put fresh data in srcBuff
  Serial.println("\n\nTesting rotation");
  newwd = WIDTH;
  newht = HEIGHT;

  delay(100);
  // first rotate 90 degrees
  Rot90(srcBuff, dstBuff, &newwd, &newht);
  delay(100);
  if (numrotations == 4) {
    // Now rotate another 90 degrees 3 times
    Rot90(dstBuff, srcBuff, &newwd, &newht);
    delay(100);
    Rot90(srcBuff, dstBuff, &newwd, &newht);
    delay(100);
    Rot90(dstBuff, srcBuff, &newwd, &newht);  // Back to source buffer at end
    delay(100);

    Serial.println("360-degree rotation done");
    PXPGetOutput(&newbuff, &newwd, &newht);
    Serial.printf("new width:  %u   new height: %u\n", newwd, newht);
    Serial.println("Result Buffer Corners: ");
    ShowCorners(srcBuff, newwd, newht);
    delay(100);
    // srcBuff should now be the same as orgBuff,  Let's verify
    errCount = BufferDifferences(srcBuff, orgBuff, BUFFSIZE);
    Serial.printf("Differences between final bufffer and original buffer: %d\n\n", errCount);
  }
  Serial.println();
}

void loop() {
  char ch;
  if (Serial.available()) {
    ch = Serial.read();
    if (ch == '1') RunTest(1);
    if (ch == '4') RunTest(4);
    if (ch == 's') PXPShow();
  }
}

/**********************************************************
PXP  Direct Control functions to avoid using library
I'm assuming that TD 1.57 has up-to-date PXP Register Definitions
***************************************/
// table of bytes per pixel for various output formats
// used for automatic setting of pitch at setup
const uint16_t bytesperpixel[32] = {
	4,4,0,0,4,3,0,0,
	2, 2,2,2,2,2,2,0,
	4,0,2,2,1,1,0,0,
	2,2,2,2,0,0,2,2
};

#define SetCorner(reg, h, v)  reg = (((h) << 16) | (v))

  // Set the default output pointer to USB Serial


// simplified initialization--now you have to set up ps, as, and output surfaces separately
void PXPInit(void){
  // turn on the PXP Clock
  CCM_CCGR2 |= CCM_CCGR2_PXP(CCM_CCGR_ON);

  PXP_CTRL_SET = PXP_CTRL_SFTRST; //Reset the PXP
  PXP_CTRL_CLR = PXP_CTRL_SFTRST | PXP_CTRL_CLKGATE; //Clear reset and gate
  delay(10);
  PXP_CTRL_CLR =  PXP_CTRL_ROT_POS;  // make sure rotate is done into output buffer
  PXP_CTRL_CLR =  PXP_CTRL_BLOCK_SIZE; // set block size to 8
  PXP_CTRL_CLR = PXP_CTRL_ROTATE(0x03);   // Set Rotation  to zero 
  PXP_CSC1_COEF0 |= PXP_COEF0_BYPASS;   
  PXP_CTRL_SET = PXP_CTRL_IRQ_ENABLE;
  // we don't actually use the interrupt but need to enable the bits
  // in the PXP_STAT register
}

uint16_t BytesPerPixel(uint16_t mpt){
	return bytesperpixel[mpt];
}


// Map type is PXP_RGB565 for example program
void PXPSetPS(	void *psbuff,uint16_t inh, uint16_t inv,  uint16_t maptype){

	PXP_PS_CTRL_CLR = 0x1F;
	PXP_PS_CTRL_SET = maptype;  // PS buffer format specification
	PXP_PS_BUF = (volatile void *)psbuff;
	PXP_PS_UBUF = 0;  // not using YUV planes
	PXP_PS_VBUF = 0;  // not using YUV planes
	PXP_PS_BACKGROUND_0 = 0xFFFF;  // white
	PXP_OUT_PS_ULC = 0;  // start processing at upper left 0,0
 	SetCorner(PXP_OUT_PS_LRC, (inh-1), (inv-1)); // end processing at h-1 and v-1

	PXP_PS_PITCH = inh*bytesperpixel[maptype]; // input is width * format bytes per pixel
	PXP_PS_SCALE = 0x10001000; // 1:1 scaling (0x1.000)
	PXP_PS_CLRKEYLOW_0 = 0xFFFFFF;  // this disables color keying
	PXP_PS_CLRKEYHIGH_0 = 0x0;  //  this disables color keying		
}

void PXPSetOutput(	void *outbuff, uint16_t outh, uint16_t outv,  uint16_t maptype){
	PXP_OUT_CTRL_CLR = 0x0080031F;  // clear format, interlaced output and alpha output bits
	PXP_OUT_CTRL_SET = maptype;  // specify  output color format in bits 0:4
	PXP_OUT_BUF = (volatile void *)outbuff;
	PXP_OUT_PITCH = outh * bytesperpixel[maptype]; // assume Same as width * bytesperpixel
	//PXP_OUT_LRC = ((outh-1) << 16 | (outv-1));	
  SetCorner(PXP_OUT_LRC, (outh-1) ,(outv-1));
}

// Return buffer address, width and height to variables pointed to by parameters
void PXPGetOutput(	uint32_t* outbuffptr, uint16_t *pouth, uint16_t *poutv){
	*outbuffptr = (uint32_t)PXP_OUT_BUF;
	*pouth = ((PXP_OUT_LRC >> 16) & 0x03FFF) +1;
	*poutv = (PXP_OUT_LRC & 0x03FFF) +1;		
}

void PXPSetStream(Stream *psptr){
  pxpstrm = psptr;
}

// This  function prints a nicely formatted output of the PXP register settings
// The formatting does require using a monospaced font, like Courier
void PXPShow(void) {
  pxpstrm->printf("CTRL:         %08X       STAT:         %08X\n", PXP_CTRL, PXP_STAT);
  pxpstrm->printf("OUT_CTRL:     %08X       OUT_BUF:      %08X    \nOUT_BUF2:     %08X\n", PXP_OUT_CTRL,PXP_OUT_BUF,PXP_OUT_BUF2);
  pxpstrm->printf("OUT_PITCH:    %8lu       OUT_LRC:       %3u,%3u\n", PXP_OUT_PITCH, PXP_OUT_LRC>>16, PXP_OUT_LRC&0xFFFF);

  pxpstrm->printf("OUT_PS_ULC:    %3u,%3u       OUT_PS_LRC:    %3u,%3u\n", PXP_OUT_PS_ULC>>16, PXP_OUT_PS_ULC&0xFFFF,
                                                               PXP_OUT_PS_LRC>>16, PXP_OUT_PS_LRC&0xFFFF);
  pxpstrm->printf("OUT_AS_ULC:    %3u,%3u       OUT_AS_LRC:    %3u,%3u\n", PXP_OUT_AS_ULC>>16, PXP_OUT_AS_ULC&0xFFFF,
                                                               PXP_OUT_AS_LRC>>16, PXP_OUT_AS_LRC&0xFFFF);
  pxpstrm->println();  // section separator
  pxpstrm->printf("PS_CTRL:      %08X       PS_BUF:       %08X\n", PXP_PS_CTRL,PXP_PS_BUF);
  pxpstrm->printf("PS_UBUF:      %08X       PS_VBUF:      %08X\n", PXP_PS_UBUF, PXP_PS_VBUF);
  pxpstrm->printf("PS_PITCH:     %8lu       PS_BKGND:     %08X\n", PXP_PS_PITCH, PXP_PS_BACKGROUND_0);
  pxpstrm->printf("PS_SCALE:     %08X       PS_OFFSET:    %08X\n", PXP_PS_SCALE,PXP_PS_OFFSET);
  pxpstrm->printf("PS_CLRKEYLOW: %08X       PS_CLRKEYLHI: %08X\n", PXP_PS_CLRKEYLOW_0,PXP_PS_CLRKEYHIGH_0);
  pxpstrm->println();
  pxpstrm->printf("AS_CTRL:      %08X       AS_BUF:       %08X    AS_PITCH: %6u\n", PXP_AS_CTRL,PXP_AS_BUF, PXP_AS_PITCH & 0xFFFF);
  pxpstrm->printf("AS_CLRKEYLOW: %08X       AS_CLRKEYLHI: %08X\n", PXP_AS_CLRKEYLOW_0,PXP_AS_CLRKEYHIGH_0);
  pxpstrm->println();
  pxpstrm->printf("CSC1_COEF0:   %08X       CSC1_COEF1:   %08X    \nCSC1_COEF2:   %08X\n", 
                                                                PXP_CSC1_COEF0,PXP_CSC1_COEF1,PXP_CSC1_COEF2);
  pxpstrm->println();  // section separator
  pxpstrm->printf("POWER:        %08X       NEXT:         %08X\n", PXP_POWER,PXP_NEXT);
  pxpstrm->printf("PORTER_DUFF:  %08X\n", PXP_PORTER_DUFF_CTRL);
}


// The next three functions could all be macros
bool PXPDone(void) {
  return PXP_STAT & PXP_STAT_IRQ;
}

void PXPStart(void){
  PXP_STAT_CLR = PXP_STAT;  // clears all flags
  PXP_CTRL_SET =  PXP_CTRL_ENABLE;  // start the PXP	
}

void PXPStop(void){
	PXP_CTRL_CLR =  PXP_CTRL_ENABLE;  // stop the PXP	
}

// Rotate output by 0,90,180,270 degrees
void PXPRotate(tRotval rot) {
  PXP_CTRL_CLR = PXP_CTRL_ROTATE(3);  // clear previous settings
  PXP_CTRL_SET = PXP_CTRL_ROTATE(rot); // set requested rotation

  PXPStart();
  // wait until rotation finished
  while (!PXPDone()) {};
  PXPStop();  // stop the PXP
  //Serial.printf("PXP Status = %04X\n", PXP_STAT &0xFF);  
}


Here is the output from a 90 degree rotation of a 16 x 8 (width x height) pixel buffer:
Code:
PXP Rotation test 2  Compiled on Jan 26 2023 12:28:40
origin Bufffer filled
origin buffer copied to srcBuff
srcBuff corners: 
Upper Left  :1111  Upper Right :2222  Bottom  Left:3333  Bottom Right:4444  

    0: 1111 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 2222 
   16: 0001 0001 0001 0001 0001 0001 0001 0001 0001 0001 0001 0001 0001 0001 0001 0001 
   32: 0002 0002 0002 0002 0002 0002 0002 0002 0002 0002 0002 0002 0002 0002 0002 0002 
   48: 0003 0003 0003 0003 0003 0003 0003 0003 0003 0003 0003 0003 0003 0003 0003 0003 
   64: 0004 0004 0004 0004 0004 0004 0004 0004 0004 0004 0004 0004 0004 0004 0004 0004 
   80: 0005 0005 0005 0005 0005 0005 0005 0005 0005 0005 0005 0005 0005 0005 0005 0005 
   96: 0006 0006 0006 0006 0006 0006 0006 0006 0006 0006 0006 0006 0006 0006 0006 0006 
  112: 3333 0007 0007 0007 0007 0007 0007 0007 0007 0007 0007 0007 0007 0007 0007 4444 


Testing rotation

   90-degree rotation done.  new width:  8   new height: 16
Destination Buffer Corners: 
Upper Left  :07FF  Upper Right :07FF  Bottom  Left:0000  Bottom Right:0000  

    0: 07FF 07FF 07FF 07FF 07FF 07FF 07FF 07FF <-----Where did these come from!
    8: 3333 0006 0005 0004 0003 0002 0001 1111 
   16: 0007 0006 0005 0004 0003 0002 0001 0000 
   24: 0007 0006 0005 0004 0003 0002 0001 0000 
   32: 0007 0006 0005 0004 0003 0002 0001 0000 <------This 8x8 looks like
   40: 0007 0006 0005 0004 0003 0002 0001 0000        it was properly rotated
   48: 0007 0006 0005 0004 0003 0002 0001 0000 
   56: 0007 0006 0005 0004 0003 0002 0001 0000 
   64: 0007 0006 0005 0004 0003 0002 0001 0000 
   72: 0000 0000 0000 0000 0000 0000 0000 0000 <------Why are these zeros?
   80: 0000 0000 0000 0000 0000 0000 0000 0000 
   88: 0000 0000 0000 0000 0000 0000 0000 0000 
   96: 0000 0000 0000 0000 0000 0000 0000 0000 
  104: 0000 0000 0000 0000 0000 0000 0000 0000 
  112: 0000 0000 0000 0000 0000 0000 0000 0000 
  120: 0000 0000 0000 0000 0000 0000 0000 0000 

PXP Registers---from 's' command
CTRL:         00000102       STAT:         01010001
OUT_CTRL:     0000000E       OUT_BUF:      20001AC0    
OUT_BUF2:     00000000
OUT_PITCH:          16       OUT_LRC:         7, 15
OUT_PS_ULC:      0,  0       OUT_PS_LRC:     15,  7
OUT_AS_ULC:      0,  0       OUT_AS_LRC:      0,  0
PS_CTRL:      0000000E       PS_BUF:       20001CC0
PS_UBUF:      00000000       PS_VBUF:      00000000
PS_PITCH:           32       PS_BKGND:     0000FFFF
PS_SCALE:     10001000       PS_OFFSET:    00000000
PS_CLRKEYLOW: 00FFFFFF       PS_CLRKEYLHI: 00000000
AS_CTRL:      00000000       AS_BUF:       00000000    AS_PITCH:      0
AS_CLRKEYLOW: 00FFFFFF       AS_CLRKEYLHI: 00000000
CSC1_COEF0:   44000000       CSC1_COEF1:   01230208    
CSC1_COEF2:   079B076C
POWER:        00000000       NEXT:         00000000
 
I wasn’t aware the other 4 buffers were broken, good to know in the future though. Sounds like you’re more familiar with it than I am so I can only help so much.

However, I don't think the PXP rotation is going to give you the result you want. From what I remember, you're trying to match the scan-line direction of your screen: the image on screen should keep the same orientation, but the buffer needs to be written in a different order. So the differential buffer is likely still the approach you want, but you'll probably have to fill it with a for loop instead of having the PXP do it automatically.
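For reference, here is a minimal sketch of what that for-loop fill looks like for a 90-degree clockwise rotation. It assumes a row-major RGB565 buffer, and the function name is hypothetical, not from any library. Every pixel passes through the CPU once, which is why this competes with LVGL for CPU time on full-size frames.

```c
#include <stdint.h>
#include <stddef.h>

/* Rotate an RGB565 image 90 degrees clockwise on the CPU.
 * src is w pixels wide and h pixels high, stored row by row.
 * dst must hold w*h pixels; the result is h pixels wide and w pixels high. */
void rotate90_cw_rgb565(const uint16_t *src, uint16_t *dst, size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++) {
        for (size_t x = 0; x < w; x++) {
            /* Source column x becomes destination row x, read bottom to top. */
            dst[x * h + (h - 1 - y)] = src[y * w + x];
        }
    }
}
```

With a 3x2 source {1,2,3 / 4,5,6}, the result is the 2x3 image {4,1 / 5,2 / 6,3}.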

With this approach you would probably want two DMA setting objects that trigger each other on completion, each with its own completion ISR, so that each one can refill its buffer while the other one is running. I don't have any hardware set up at the moment to test this myself, but I believe that would give you what you're looking for with minimal impact.

I tried this approach using the for loop, and while it works for fairly light graphics, you can really see the for loops hanging the CPU when LVGL animations are triggered. If DMA could do the transfer/rotation of the data, it would offload the conversion from the CPU just like the PXP would.

I cleaned up my test code and ran some tests:

DTCM(RAM1) to DTCM, DMAMEM or EXTMEM rotates properly for square buffers.

DTCM source is showing issues with rectangular buffers for all destinations. I suspect this is a PXP setup issue and I hope to find a solution in Rezo's code.

Here is the source for my test sketch. In my sketch, I had all the basic PXP functions in a separate .ino file in the sketch folder, purely for editing convenience. In the code block below, the PXP functions are concatenated at the end of the test code. This is a minimalist test, but has the advantages of using small buffers that produce output that can be viewed in the Serial monitor. No hardware other than the T4.1 is required.
I used your code example in post #10 and the rectangular rotation works fine! Not sure why it's not working for you now on all memory regions!
 
I've figured out the problem I was having with the rotations. I was not properly setting the PXP_OUT_PITCH register before doing the rotation. That register sets the number of bytes between two vertically adjacent pixels in the output buffer.

For 90 and 270-degree rotations, the output pitch is the input height (in pixels) times the bytes per pixel.
For 0 and 180-degree rotations, the output pitch is the same as the input pitch (input width x bytes per pixel).

When things are set up right, it doesn't matter if width is greater or less than height for the input image. It also doesn't matter if the width or height is not an even multiple of the PXP rotation block size.
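As a quick sanity check of the rule above, here is a minimal sketch of the pitch arithmetic. The helper name is hypothetical; it only computes the value and leaves writing PXP_OUT_PITCH to the caller. For the 480x320 RGB565 frame in this thread, a 90-degree rotation gives 320 x 2 = 640 bytes.

```c
#include <stdint.h>

/* Hypothetical helper: number of bytes between two vertically adjacent
 * pixels in the PXP output buffer for a given rotation. */
uint32_t pxp_out_pitch(uint32_t in_width, uint32_t in_height,
                       uint32_t bytes_per_pixel, unsigned rotation_deg)
{
    if (rotation_deg == 90 || rotation_deg == 270)
        return in_height * bytes_per_pixel; /* output rows are input columns */
    return in_width * bytes_per_pixel;      /* 0 or 180: same as input pitch */
}
```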

It does still seem to matter if the source image is located other than in DTCM. I still get some errors when rotating from DMAMEM or EXTMEM into any other memory area. When the source image is in DTCM, the rotated images seem fine.

I'm going to fix the rotation function in the PXP library and clean up my demo program. I'll post the results sometime this weekend.
 
The errors with source images outside DTCM were due to the fact that I was not calling arm_dcache_flush_delete((void *)PXP_PS_BUF, bsize); BEFORE the rotation. I was writing the pixel pattern into the source image in a foreground function, and some of the data stayed in the cache rather than in the actual memory read by the PXP. Now the only difference between the memory areas for source and destination seems to be the time the PXP takes to do the rotation. I'm now caught up to where Rezo was after post #38.

Incidentally, the IMXRT1060 Reference manual says in section 36.3.1.28 that it is possible to do in-place processing of the process buffer by setting the PXP output buffer to be the same memory area as the PXP process buffer. I can see how this would work for merging the alpha surface, or doing color-space conversions for formats with the same pixel byte widths. I can't see how it would work for scaling operations or rotations. I tried it for rotation and it didn't work for me. But then, a lot of things that didn't work for me in the last few days work now. I ascribe my issues with the PXP to having forgotten many of the things I learned when first starting with the PXP about two years ago. My next revision of the PXP library will have better descriptions and comments.
 
Yep, unfortunately you can't rotate from and into the same buffer. Also, the PS/AS buffer can't be bigger than the output buffer (page 1907):
Figure 36-2 represents a sample output buffer configuration with both an AS and PS included. The alignment of each AS and PS within the output buffer can be at any arbitrary pixel locations. For example, the PS has an upper left coordinate (ULC) of (2,2) and a lower right coordinate (LRC) at pixel (13,13). The maximum value for the ULC and LRC for each of the AS and PS is bounded by the LRC of the output buffer, (15,15) for this example.

So my whole idea of rotating portions of the PS buffer into two smaller output buffers won't work :(
This method can only be done if using the eLCDIF in bus master mode, and there is no way to latch onto that and read out the values of the LCDIF output buffers.

Luckily, my custom PCB has all these LCD pins exposed, so I WILL give this a try on the RGB interface (good luck to me!)
 
I felt like I was missing something, so I have done more reading in the reference manual and modified my sketch to work as follows:
PS buffer is 480*320
OUT buffer is 320*48

IF I offset the PS buffer source, I can move through the source image, basically loop/double buffer and write sections of rotated frames to my display without the need of two screen sized frame buffers:

36.3.1.26 Clipping source images
A subset of the PS buffer can be used in rendering the output buffer. The PXP_PS_BUF register can indicate an offset into the PS buffer that will be used for display within the OUTPUT buffer.
The pixel at the address defined in the PXP_PS_BUF register will be the pixel that is displayed at the pixel coordinate indicated by PXP_OUT_PS_ULC within the output buffer. Essentially, the PXP_PS_BUF register can be used to establish an offset into the PS buffer thus clipping all PS buffer pixels that are at a lower address. The PXP_PS_PITCH will always indicate the number of bytes that are vertically adjacent in the PS buffer. The settings in the PXP_PS_BUF, PXP_OUT_PS_ULC, and PXP_OUT_PS_LRC will determine the subset of the PS buffer, or clipped PS source buffer, that will be used in the output buffer.

I have tested this by manually setting the offset on the PS buffer and it works!
Now I need to build the entire logic of switching between two smaller output buffers, incrementing the PS buffer address and looping through this entire full frame rotation-iteration.
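The bookkeeping for that loop is mostly address arithmetic. Below is a minimal sketch, assuming the 480x320 RGB565 source and 48-column strips described above. The struct and function names are hypothetical, and it only computes the PS_BUF byte offset and which of the two output buffers to use; it does not touch any PXP registers.

```c
#include <stdint.h>

/* Hypothetical description of one strip of a piecewise 90-degree rotation. */
typedef struct {
    uint32_t ps_offset_bytes; /* added to the source base address for PXP_PS_BUF */
    uint32_t out_index;       /* which of the two output buffers to use (0 or 1) */
    uint32_t lcd_x0, lcd_x1;  /* source columns covered: [lcd_x0, lcd_x1) */
} strip_t;

strip_t strip_for(uint32_t n, uint32_t strip_cols, uint32_t bytes_per_pixel)
{
    strip_t s;
    s.lcd_x0 = n * strip_cols;
    s.lcd_x1 = s.lcd_x0 + strip_cols;
    /* Starting PXP_PS_BUF lcd_x0 pixels into the first row skips the columns
     * to the left; PXP_PS_PITCH still spans the full source row. */
    s.ps_offset_bytes = s.lcd_x0 * bytes_per_pixel;
    /* Alternate between the two output buffers (ping-pong). */
    s.out_index = n % 2;
    return s;
}
```

Note that the offset here is in bytes, so strip 1 starts 48 pixels (96 bytes) into the source for RGB565.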
 
I think you're on track. I did a lot of experimenting with buffer offsets and PS buffer definitions when I did my slide show application. I was doing image scaling for zoom effects, offsets to move the slide images, and alpha buffers for fade-in and fade-out. I never needed rotation, but it seems sensible that the same techniques can be used with piecewise rotation.

You may also see an overall increase in performance if the PXP segment rotations can run at the same time as the output to the display, both using DMA in the background. If that works, you may be able to do a screen update in about 110 to 120% of the time for just the rotation (or the screen update---whichever takes longer). However, you may find that you have issues if both the PXP and output functions are competing for access to the same memory area. I see this issue when the CSI camera interface and PXP are competing for access to EXTMEM. However, after my recent experiences with ARM_DCACHE_FLUSH_DELETE before and after rotation, I may have to reexamine my code to make sure I'm doing cache management properly.
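To see where a figure like that comes from, here is a rough timing model of the overlap, assuming two output buffers, fixed per-strip rotation and transfer times, and no memory contention (which, as noted above, may not hold in practice). The function names are hypothetical and the times are illustrative, not measurements.

```c
/* Hypothetical model: strips are rotated one after another by the PXP and
 * sent one after another by the display DMA, using two output buffers.
 * A rotation waits for the PXP and for its output buffer to be free;
 * a transfer waits for its rotation and for the previous transfer. */
unsigned pipelined_time_us(unsigned rot_us, unsigned xfer_us, unsigned strips)
{
    unsigned rot_end = 0, xfer_end = 0;
    unsigned buf_free[2] = {0, 0}; /* when each output buffer is free again */
    for (unsigned i = 0; i < strips; i++) {
        unsigned b = i % 2;
        unsigned rot_start = rot_end > buf_free[b] ? rot_end : buf_free[b];
        rot_end = rot_start + rot_us;
        unsigned xfer_start = rot_end > xfer_end ? rot_end : xfer_end;
        xfer_end = xfer_start + xfer_us;
        buf_free[b] = xfer_end;
    }
    return xfer_end;
}

/* The same work done strictly one step at a time, for comparison. */
unsigned serial_time_us(unsigned rot_us, unsigned xfer_us, unsigned strips)
{
    return strips * (rot_us + xfer_us);
}
```

With 10 strips at 100 us per rotation and 150 us per transfer, the overlapped version takes 1600 us versus 2500 us serially: about the total time of the slower stage plus one step of the faster one.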

Have you considered the option of doing your LVGL graphics drawing into a 240 x 160 buffer, then scaling it up at the same time you do the rotation? The smaller origin image will affect the output quality, but if the images are changing rapidly, the loss of resolution may not be noticeable. The 1/4 size origin buffers might allow you to have multiple source buffers in DTCM. I'm also not sure if rotation and scaling can occur at the same time in the PXP. It might take multiple consecutive operations with the PXP. The first could be a rotation into a 240 x 160 buffer in DTCM, for speed. The subsequent operation(s) would scale up the image to the output dimensions. This could be done in piecewise or single operations to overlap the scaling and output.
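If you try the scaling route, the scale factor is a fixed-point ratio of source size to output size. Here is a minimal sketch, assuming the PXP_PS_SCALE layout is XSCALE in bits 14:0 and YSCALE in bits 30:16, each in 2.12 fixed point with 0x1000 = 1.0. That matches the PS_SCALE value of 10001000 (no scaling) in the register dump above, but please check the field layout against the reference manual. The helper name is hypothetical.

```c
#include <stdint.h>

/* Pack a PXP_PS_SCALE value (assumed layout, see lead-in).
 * Each factor is source size / output size, so upscaling gives < 1.0. */
uint32_t pxp_ps_scale(uint32_t src_w, uint32_t src_h,
                      uint32_t out_w, uint32_t out_h)
{
    uint32_t xscale = (src_w << 12) / out_w;
    uint32_t yscale = (src_h << 12) / out_h;
    return (yscale << 16) | xscale;
}
```

Scaling a 240 x 160 image up to 480 x 320 gives 0.5 in both directions, i.e. 0x08000800.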
 
So I plan to use one 480*320 16 bit frame buffer that LVGL writes to and PXP reads from.
Then I will have two smaller 48*320 16 bit output buffers that I will switch between so that I can rotate one while the other transfers data to the display over DMA.
All three buffers will be in DMAMEM
I think this together will yield good performance.
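A quick memory budget for that plan, assuming RGB565 (2 bytes per pixel) and the 512 KB of RAM2 that DMAMEM lives in on Teensy 4.x. As far as I know, the malloc heap also comes out of RAM2, so the remainder is not all free for other buffers.

```c
#include <stdint.h>

#define BYTES_PER_PIXEL 2u                            /* RGB565 */
#define SRC_BYTES    (480u * 320u * BYTES_PER_PIXEL)  /* full-frame LVGL/PXP source */
#define OUT_BYTES    (48u * 320u * BYTES_PER_PIXEL)   /* one rotated strip */
#define DMAMEM_BYTES (512u * 1024u)                   /* Teensy 4.x RAM2 */

/* One full-frame source buffer plus two strip output buffers. */
uint32_t plan_bytes(void)
{
    return SRC_BYTES + 2u * OUT_BYTES;
}
```

That works out to 307200 + 2 x 30720 = 368640 bytes, leaving about 152 KB of RAM2.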

LVGL writes left to right, top to bottom, just like the display updates, so I can't rotate a partial source buffer for the 90-degree rotation.
 
Making some progress with my test sketch but have very odd behavior.
For some reason one of the transfer steps happens twice (I have no idea why) and portions of the data are incorrect in some cases.

Attached is my test sketch, and my Serial print log below and an image of the output:
ILI9486 Initialized
Initializing PXP
offset: 0, lcd_x 0 lcd_x1: 48 cnt: 0
PXP Rotation took 174 microseconds
Output buffer at 0x20200000
offset: 48, lcd_x 48 lcd_x1: 96 cnt: 1
PXP Rotation took 174 microseconds
Output buffer at 0x20200000
offset: 96, lcd_x 96 lcd_x1: 144 cnt: 2
PXP Rotation took 174 microseconds
Output buffer at 0x20200000
offset: 144, lcd_x 144 lcd_x1: 192 cnt: 3
PXP Rotation took 174 microseconds
Output buffer at 0x20200000
offset: 192, lcd_x 192 lcd_x1: 240 cnt: 4
PXP Rotation took 174 microseconds
Output buffer at 0x20200000
offset: 240, lcd_x 240 lcd_x1: 288 cnt: 5
offset: 240, lcd_x 240 lcd_x1: 288 cnt: 5
PXP Rotation took 174 microseconds
Output buffer at 0x20200000
PXP Rotation took 174 microseconds
Output buffer at 0x20200000
offset: 288, lcd_x 288 lcd_x1: 336 cnt: 6
PXP Rotation took 174 microseconds
Output buffer at 0x20200000
offset: 336, lcd_x 336 lcd_x1: 384 cnt: 7
PXP Rotation took 174 microseconds
Output buffer at 0x20200000
offset: 384, lcd_x 384 lcd_x1: 432 cnt: 8
PXP Rotation took 174 microseconds
Output buffer at 0x20200000
offset: 432, lcd_x 432 lcd_x1: 480 cnt: 9
PXP Rotation took 174 microseconds
Output buffer at 0x20200000
Display DMA transfer took 8 microseconds
Display DMA transfer took 1515 microseconds

IMG_9394.jpg

Attachment: pxp_lcd_diff_buff.ino

Would highly appreciate it if someone could have a gawk at it and see what I am doing wrong
 
Something odd that I can't figure out:
If I move the following code from PXP_Init() into its own function (let's call it PXP_Start):
Code:
void PXP_Start(){
  CCM_CCGR2 |= CCM_CCGR2_PXP(CCM_CCGR_ON); //Enable the PXP clock
  PXP_CTRL_SET = PXP_CTRL_SFTRST; //Reset the PXP
  PXP_CTRL_CLR = PXP_CTRL_SFTRST | PXP_CTRL_CLKGATE; //Clear reset and gate
  delay(10);
}
and call that in setup() before calling PXP_Init(), nothing works, as if the PXP is not doing anything.

I can't figure out why the hell it would behave like this. Any insights?
 
Did you remove the lines in PXP_Start from PXP_Init? I tried the same thing, but named the new function PXP_XStart(), since I already had PXPStart. Everything seemed to work as expected, even after I removed the PXP_XStart lines from PXP_Init.
 
Yeah everything that’s in pxp_start was removed from pxp_init, but it only works if everything is in pxp_init.
Super odd but I’ll keep playing with it.
 
I found your code in pxp_lcd_diff_buff.ino a bit confusing. It seems that you call PXP_Init every time through the loop. I'm not sure it is a good idea to reset the PXP and set the buffers each time through the loop.

I think PXP_Init should be called once in setup(). After that, you can set the process surface and output bounds each time through loop(). There's a lot of stuff in PXP_Init() that doesn't need to be done every time through the loop.

Part of my confusion may be because my updated library has pxp.Start() function to start the conversion when running the pxp in the background.
 
Okay, I have revised the code. I still have one screen-sized source buffer and two 320*96 px destination buffers, but I'm using just one of them for the test.
Overall, it works! I successfully rotated the image 270 degrees, and including the DMA transfers the whole update takes roughly 15 ms, whereas the DMA transfer without the PXP takes roughly 12-13 ms.
IMG_9496.jpg

My only issue, I have some noise on the top of the image (1st "scan" line):
IMG_9497.jpg
Not sure what the root cause is or how to fix it..


Attachment: pxp_lcd_diff_buff_2.ino
 