T4 Pixel Pipeline Library

Quick follow-up: @mjs513 and I are making some progress on some of the above :D

For now, I turned off the output rotation, to make it easier to concentrate on the other stuff.

We fixed the issues with it wiping out memory. I am running a variant of the function mentioned earlier that has
a lot of input parameters. I am also bypassing some of the functions in the library.

The key to not screwing up memory is to make sure the PITCH members are correct as well as the OUT_PS_LRC.

Some answers to the above:

a) Output image smaller than the screen in one or both directions: you set OUT_PS_ULC and OUT_PS_LRC
to the offsets where the image should start and end within your output buffer, like:
Code:
OUT_PS_ULC:     40, 10
OUT_PS_LRC:    439,309
while the actual size of the output buffer is given by:
Code:
OUT_LRC:       479,319

b) and c) - Input image clipping - You set PS_BUF (for example PS_BUF: 70000020) to the
memory address of the first pixel you wish to output. That is, if you want your image to start at ROW: 50 COL: 100,
you would set the buffer to something like buffer + (50 * CAMERA_WIDTH + 100).
You need to make sure that the pitch is set properly so that it advances the correct number of bytes per row.
It is also up to your code to make sure that your offsets are correct, so that you don't extend outside the bounds of your actual image.
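
To make answers a) to c) concrete, here is a rough sketch of the arithmetic only. The struct and function names are made up for illustration and are not part of the T4_PXP library, and it assumes 16-bit pixels and an image that is no larger than the output:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper types, NOT part of the T4_PXP library.
struct PsWindow {
  uint16_t ulc_x, ulc_y;  // upper-left corner (inclusive), as for OUT_PS_ULC
  uint16_t lrc_x, lrc_y;  // lower-right corner (inclusive), as for OUT_PS_LRC
};

struct ClipView {
  const uint16_t* start;    // address of the first pixel to hand to the PS
  std::size_t pitch_bytes;  // bytes from the start of one row to the next
  bool valid;               // false if the clip would leave the frame
};

// a) Centre an img_w x img_h image inside an out_w x out_h output buffer.
// Assumes img_w <= out_w and img_h <= out_h.
PsWindow center_ps_window(uint16_t out_w, uint16_t out_h,
                          uint16_t img_w, uint16_t img_h) {
  PsWindow w;
  w.ulc_x = static_cast<uint16_t>((out_w - img_w) / 2);
  w.ulc_y = static_cast<uint16_t>((out_h - img_h) / 2);
  w.lrc_x = static_cast<uint16_t>(w.ulc_x + img_w - 1);
  w.lrc_y = static_cast<uint16_t>(w.ulc_y + img_h - 1);
  return w;
}

// b) and c) Clip a clip_w x clip_h region starting at (col, row) out of a
// frame_w x frame_h frame. The pitch stays that of the full frame row.
ClipView make_clip_view(const uint16_t* frame,
                        std::size_t frame_w, std::size_t frame_h,
                        std::size_t col, std::size_t row,
                        std::size_t clip_w, std::size_t clip_h) {
  ClipView v{nullptr, 0, false};
  if (clip_w == 0 || clip_h == 0) return v;
  // Reject clips that would read outside the actual image.
  if (col + clip_w > frame_w || row + clip_h > frame_h) return v;
  v.start = frame + (row * frame_w + col);
  v.pitch_bytes = frame_w * sizeof(uint16_t);
  v.valid = true;
  return v;
}
```

With a 480x320 output and a 400x300 image, center_ps_window gives the same 40,10 and 439,309 corners shown above.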

With the above understanding, the sketch is now set up so that I can type in a scale factor, and it centers and/or clips the image accordingly,
which is working better. I started off doing this with VGA input from the HM0360; because it is monochrome, the VGA image fits into normal memory.

I am now playing with the OV5640 again and trying VGA/SVGA, which requires PSRAM on the T4.1. I am having some luck, but there are issues
with the PSRAM, probably because it is not keeping up. It is doing better now that I updated it to run at 120 MHz.
Changed in startup.c (configure_external_ram):
Code:
CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK))
        | CCM_CBCMR_FLEXSPI2_PODF(5) | CCM_CBCMR_FLEXSPI2_CLK_SEL(1); // 720/6 = 120 MHz (PODF field 5 = divide by 6)

I tried the 133 MHz version, but it failed to work on the current board.
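
For anyone checking the divider arithmetic, a tiny sketch follows. It assumes the PODF field encodes (divisor - 1) and that CLK_SEL(1) selects a 720 MHz source; the function name is made up for illustration:

```cpp
#include <cstdint>

// Resulting clock in Hz for a divider field that encodes (divisor - 1).
// Assumption: this is how the FLEXSPI2 PODF field behaves.
constexpr uint32_t flexspi2_clock_hz(uint32_t source_hz, uint32_t podf_field) {
  return source_hz / (podf_field + 1);
}
```

Under those assumptions, flexspi2_clock_hz(720000000, 5) works out to 120000000, matching the 120 MHz setting above.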

Next up: a little more work on getting the PSRAM sizes to work.
Then add some support to the sketch so that, when zoomed in, you can pan around the camera area...
 
As you know, @KurtE and I have been making a few more mods to the PXP library, which seem to resolve a couple of other issues we came across with further testing. This includes fixing decimation, which has now been tested down to a scale factor of 10 (anything smaller just shows a dot).

So I decided to test the PXP sketch with the Arduino style version of the ILI9488 parallel display:
Code:
Rotation 0:
Capture time (millis): 4, PXP_input_position(0, 0, 479, 319)
PXP_input_position(0, 80, 319, 399)
Input clip Buffer(0x20200000 -> 0x202000a0)
PXP time(micros) : 158, Display time: 25

Rotation 1:
Capture time (millis): 5, PXP_input_position(0, 0, 479, 319)
PXP_input_position(0, 0, 479, 319)
PXP time(micros) : 154, Display time: 24

Rotation 2:
Capture time (millis): 4, PXP_input_position(0, 0, 479, 319)
PXP_input_position(0, 80, 319, 399)
Input clip Buffer(0x20200000 -> 0x202000a0)
PXP time(micros) : 158, Display time: 25

Rotation 3:
Capture time (millis): 4, PXP_input_position(0, 0, 479, 319)
PXP_input_position(0, 0, 479, 319)
PXP time(micros) : 155, Display time: 25

Rotation 0 w/flip:
Capture time (millis): 4, PXP_input_position(0, 0, 479, 319)
PXP_input_position(0, 80, 319, 399)
Input clip Buffer(0x20200000 -> 0x202000a0)
PXP time(micros) : 158, Display time: 25

Rotation 2 w/scaling:
Capture time (millis): 4, PXP_input_position(0, 0, 479, 319)
PXP_input_position(0, 133, 319, 345)
PXP time(micros) : 154, Display time: 25

Rotation 3 w/Scaling:
Capture time (millis): 5, PXP_input_position(0, 0, 479, 319)
PXP_input_position(80, 53, 399, 265)
PXP time(micros) : 155, Display time: 25

While the PXP time went up (~85 micros to ~155 micros for the flexion_teensy_mm image), the display time dropped significantly, from about 232 millis to about 25 millis.

EDIT: Note that this test was using the non-DMA version of pushPixels16bit. When using the DMA version, images get displayed in about 3-4 microseconds. However, if I try to scale the image, nothing is displayed.

I pushed up the new changes to separate branch: https://github.com/mjs513/T4_PXP/tree/pxp_fixes_kurte
 
I've got the PXP in use for a new project. Here's the HW spec:
DB5 with SDRAM
800*480px display, 2bpp running off the eLCDIF
LVGL v9.3

I'm running LVGL in partial mode, with two 800*48px partial buffers that reside in DMAMEM.
I have another full-screen-sized buffer in SDRAM that is used for the eLCDIF.

I use the PXP to copy (async) data from the partial buffers to the main buffer

Here's a general data flow:
[Attached image: 1758782605228.png]

This is how I have the PXP setup in the setup function:
Code:
EXTMEM uint16_t lcdBuffer1[SCREEN_WIDTH * SCREEN_HEIGHT] __attribute__((aligned(64)));
EXTMEM uint16_t lcdBuffer2[SCREEN_WIDTH * SCREEN_HEIGHT] __attribute__((aligned(64))); //Only used when PXP is not in use
//EXTMEM uint16_t tempDisplayBuf[SCREEN_WIDTH * SCREEN_HEIGHT] __attribute__((aligned(64)));
//lv_display_t * disp;
#ifdef USE_PXP
DMAMEM uint16_t lvglBuffer1[SCREEN_WIDTH * (SCREEN_HEIGHT/4)] __attribute__((aligned(64)));
DMAMEM uint16_t lvglBuffer2[SCREEN_WIDTH * (SCREEN_HEIGHT/4)] __attribute__((aligned(64)));
#endif

#ifdef USE_PXP
  PXP_init();
  PXP_block_size(1); //16*16 block sized processing
  PXP_input_format(PXP_RGB565, 0, 0, 0); //Static, always apply these settings to the input buffer
  PXP_output_format(PXP_RGB565, 0, 0, 0); //Static, always apply these settings to the output buffer
  PXP_output_buffer((uint16_t*)lcdBuffer1, 2, 800, 480);
  PXP_callback(pxpCallback); //Triggers when copy is done, telling LVGL the flush is complete

....

  lv_init();
  disp = lv_display_create(SCREEN_WIDTH, SCREEN_HEIGHT);
  lv_display_set_flush_cb(disp, my_disp_flush);
  #ifdef USE_PXP
  lv_display_set_buffers(disp, lvglBuffer1, lvglBuffer2, SCREEN_WIDTH * (SCREEN_HEIGHT/4) * 2, LV_DISPLAY_RENDER_MODE_PARTIAL);
  #endif


And here are my LVGL flush and PXP callback functions:

Code:
#ifdef LCDDISP
lv_display_t * disp;
volatile bool s_framePending = false;
#ifdef USE_PXP
FASTRUN void my_disp_flush(lv_display_t *display, const lv_area_t *area, uint8_t * px_map){
 

    // Calculate the actual area dimensions
    uint16_t area_width = (area->x2 - area->x1) + 1;
    uint16_t area_height = (area->y2 - area->y1) + 1;

    // Set up PXP input: the small LVGL buffer portion
    PXP_input_buffer((uint16_t*)px_map, 2, area_width, area_height);
    PXP_input_position(area->x1, area->y1, area->x2, area->y2);
    // Start the transfer
    PXP_process();
    s_framePending = true;
}


FASTRUN void pxpCallback(){
  if(s_framePending){
        lv_disp_flush_ready(disp);
        s_framePending = false;
    }
}

#else


Now, it works, kind of.
It will copy data over, and when I transition screens, for a glimpse, I will see the full image.
But in fact, it's blacking out everything and only displaying areas that need refreshing.
[Attached image: WhatsApp Image 2025-09-25 at 09.53.15.jpeg]

In the image above, you can see the performance monitor on the bottom right of the display, as that object is constantly being updated.
If I click on buttons that need to render, they will appear and disappear.

So what I am trying to figure out, is, why are the sections that no longer need to be updated being blacked out - is it the PXP setting it to black, or the eLCDIF?

@jmarsh @KurtE @mjs513 @defragster @mborgerson tagging all of you since you have experience with either or both of the pxp/elcdif
 
Move the updates to s_framePending before the function calls.
The moment you call PXP_process(), the interrupt can fire, which may result in pxpCallback executing before you have set s_framePending to true. Likewise for calling lv_disp_flush_ready(): that may trigger another flush before you have cleared s_framePending.
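
Something like this shows the ordering I mean. It is only a sketch: the functions are stand-ins (not the real PXP or LVGL calls), and start_transfer() simulates the worst case where the completion interrupt fires immediately:

```cpp
#include <atomic>

std::atomic<bool> frame_pending{false};
int flushes_completed = 0;  // stand-in for notifying LVGL that a flush is done

// Stand-in for the PXP completion callback.
void on_transfer_done() {
  // Clear the flag BEFORE notifying, so a flush started from inside the
  // notification does not get its freshly set flag wiped afterwards.
  if (frame_pending.exchange(false)) {
    ++flushes_completed;
  }
}

// Stand-in for starting the hardware. Worst case: the interrupt fires
// before this function even returns.
void start_transfer() {
  on_transfer_done();
}

// Stand-in for the display flush callback.
void flush() {
  frame_pending.store(true);  // set BEFORE starting the transfer
  start_transfer();
}
```

If the flag were set after start_transfer(), the callback in this worst case would see it still false and the completion would be lost.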

I don't know how LVGL handles alpha (if it uses it at all), I suspect there's some PXP configuration missing for that and it thinks every pixel needs to be redrawn instead of only the ones being updated.
 
Thanks for the suggestion!

Just tried moving the s_framePending above the PXP function calls in the
my_disp_flush function, but I get the same behavior.

Where is the ideal place to put the callback trigger in the pxp isr?

I have it set up here:
Code:
static void (*pxp_callback_function)() = nullptr;

void PXP_isr(){
  if((PXP_STAT & PXP_STAT_LUT_DMA_LOAD_DONE_IRQ) != 0){
    PXP_STAT_CLR = PXP_STAT_LUT_DMA_LOAD_DONE_IRQ;
  }
  if((PXP_STAT & PXP_STAT_NEXT_IRQ) != 0){
    PXP_STAT_CLR = PXP_STAT_NEXT_IRQ;
  }
  if((PXP_STAT & PXP_STAT_AXI_READ_ERROR) != 0){
    PXP_STAT_CLR = PXP_STAT_AXI_READ_ERROR;
  }
  if((PXP_STAT & PXP_STAT_AXI_WRITE_ERROR) != 0){
    PXP_STAT_CLR = PXP_STAT_AXI_WRITE_ERROR;
  }
  if((PXP_STAT & PXP_STAT_IRQ) != 0){
    PXP_STAT_CLR = PXP_STAT_IRQ;
    PXP_done = true;

    // Call the callback function if it's set
    if(pxp_callback_function != nullptr) {
      pxp_callback_function();
    }
  }
#if defined(__IMXRT1062__) // Teensy 4.x
  asm("DSB");
#endif
}

PS: I'm using this PXP lib: https://github.com/mjs513/T4_PXP
 
So, if you draw a full width line each refresh at an increasing Y (with wrap) it would only ever show the last line drawn?

How is the lower right corner stats area properly drawn? (assuming it is?) Not all of that redraws every pixel each time?
 
LVGL will draw regions when set to partial buffer mode.
So if I were to draw a line across the screen in a loop, moving down one row on each iteration, then based on the behavior exhibited you would see a single line moving from top to bottom, rather than the screen filling up with lines one by one.
 
I think this is the issue:
[Attached image: 1758793021411.png]

Because I am not doing a full screen update, it might be setting everything else to the background color (which by default is probably black)
 
So, I've confirmed the above is the reason why anything that is not updating turns black.
Was able to overcome it partially by offsetting the output buffer start address with this code mod:


Code:
FASTRUN void my_disp_flush(lv_display_t *display, const lv_area_t *area, uint8_t * px_map){
    uint16_t area_width = (area->x2 - area->x1) + 1;
    uint16_t area_height = (area->y2 - area->y1) + 1;
    s_framePending = true;
    // Calculate the byte offset in the LCD buffer where this area starts
    uint32_t lcd_offset = (area->y1 * SCREEN_WIDTH + area->x1) * 2;
    void* lcd_dest = ((uint8_t*)lcdBuffer1) + lcd_offset;
    
    //arm_dcache_flush_delete((uint16_t*)px_map, area_width * area_height * 2);
    
    PXP_input_buffer((uint16_t*)px_map, 2, area_width, area_height);
    // Output buffer is just the area size, starting at the calculated offset
    PXP_output_buffer(lcd_dest, 2, area_width, area_height); 
    // Input covers the entire output - no letterboxing
    PXP_input_position(0, 0, area_width-1, area_height-1);
    
    PXP_process();
    
}

Oh well, not what I expected - it just goes to show how much more limited the 1062 is compared to newer MCUs out there.

I guess I'll just use two full lvgl buffers in SDRAM, transfer them to the output buffer in SDRAM, and then blend in my canvas with the Alpha surface buffer.
 
I don't see how this is a limitation of the MCU - you just can't update arbitrary pixels inside a region without an alpha plane...
Are you sure LVGL isn't using 1555 rather than 565 pixel format?
 
LVGL is definitely using RGB565.

The limitation I see here is that if you're not updating the entire buffer but just a part of it, by default it will apply the value of REG_PS_BACKGROUND to the rest of the buffer, and this behavior cannot be turned off:

Code:
At each pixel coordinate, the control logic determines if the PS pixel (argument also
applies to AS pixels) will be used in rendering the output pixel.
This is determined by checking the output pixel's coordinates against the
REG_OUT_PS_ULC and REG_OUT_PS_LRC (ULC and LRC in short) register
contents. For pixels outside this region, the PS pixel will be loaded with the pixel value
from REG_PS_BACKGROUND, which can be used to effectively control the
letterboxing color. There are no block size or block boundary restrictions when setting
the ULC or LRC for either the AS or PS. The only restriction is that the ULC and LRC
are within the OUT LRC extents.
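
As a toy software model of the rule quoted above (my reading of it, not actual PXP code; all names are made up for illustration): every output pixel gets written, and anything outside the PS window gets the background value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Inclusive corners, in output coordinates (like OUT_PS_ULC / OUT_PS_LRC).
struct Rect { int ulc_x, ulc_y, lrc_x, lrc_y; };

// Toy model: pixels inside the PS window come from src (tightly packed,
// window-sized), all other output pixels are set to the background value.
std::vector<uint16_t> render_output(int out_w, int out_h, Rect ps,
                                    const std::vector<uint16_t>& src,
                                    uint16_t background) {
  std::vector<uint16_t> out(static_cast<std::size_t>(out_w) * out_h);
  const int ps_w = ps.lrc_x - ps.ulc_x + 1;
  for (int y = 0; y < out_h; ++y) {
    for (int x = 0; x < out_w; ++x) {
      const bool inside = x >= ps.ulc_x && x <= ps.lrc_x &&
                          y >= ps.ulc_y && y <= ps.lrc_y;
      const std::size_t o = static_cast<std::size_t>(y) * out_w + x;
      if (inside) {
        const std::size_t s =
            static_cast<std::size_t>(y - ps.ulc_y) * ps_w + (x - ps.ulc_x);
        out[o] = src[s];
      } else {
        out[o] = background;  // the letterbox colour overwrites old content
      }
    }
  }
  return out;
}
```

That is why shrinking the output extents to just the updated area (and moving the output start address) avoids the blackout: there is then no "outside" region left to fill.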
 
I remember having to do this as well when I was testing it out for drawing transparent icons on the screen. I feel like it was stated somewhere in the manual or I just tried it and it worked.
 
So basically set the source and destination as the same buffer, and offset the start address of the input buffer to the 0,0 area of where you want to draw a block.

Would have been so much more dev friendly to just implement a register setting to disable the background color for the areas outside of the updated region
 

Planning on working on something over the weekend - unfortunately, I'm in the middle of something for the next couple of days :)
 
My comment was directed to no one specific, just a general comment on the lack of sense and intuitiveness the NXP developers applied to the pxp

I can try to add it into the lib as a slightly more user-friendly API to call, and share it when it works - don’t feel stressed to get on it!
 
Understood. I was already planning to work on it this weekend, starting this morning - I just forgot to post.
 
The thing is, what you're doing here doesn't really need to be done with the PXP... it's just a straight memcpy of each row. It probably takes longer to prepare the PXP to do it than just performing it using the CPU.
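
For reference, the CPU version would look roughly like this. It is a sketch assuming 16-bit pixels and a tightly packed source area; the function name is made up, and on real hardware you would still need whatever cache maintenance your buffers require:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy an area_w x area_h block (tightly packed in src) into a framebuffer
// that is fb_width pixels wide, with its top-left corner at (x1, y1).
void copy_area(uint16_t* fb, std::size_t fb_width, const uint16_t* src,
               std::size_t x1, std::size_t y1,
               std::size_t area_w, std::size_t area_h) {
  for (std::size_t row = 0; row < area_h; ++row) {
    // The destination advances by the full framebuffer width per row,
    // the source only by the area width.
    uint16_t* dst = fb + (y1 + row) * fb_width + x1;
    std::memcpy(dst, src + row * area_w, area_w * sizeof(uint16_t));
  }
}
```

Pixels outside the area are simply never touched, so nothing gets blacked out.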
 
@Rezo
Sorry it took me so long to play around with this, but I kept getting distracted by other issues :)

Anyway, I took a slightly different approach. I decided to play around with the PXP overlay (using AS with PS), so I gave it a try and it seems to work - but I am using a ST7796 IPS display on @KenHahn min-prototyping board for testing. Basically, it takes one image in the PS buffer and a second image in the AS buffer, and moves the second one around on the screen :)

A snippet of code:
C++:
tft.setRotation(3);
  capture_frame(false);

  PXP_input_buffer(s_fb, 2, 420, 315);
  PXP_input_format(PXP_RGB565);

  uint32_t psUlcX = 0;
  uint32_t psUlcY = 0;
  uint32_t psLrcX, psLrcY;
  psLrcX = psUlcX + tft.width() - 1U;
  psLrcY = psUlcY + tft.height() - 1U;
  PXP_input_position(psUlcX, psUlcY, psLrcX, psLrcY);

  PXP_output_buffer(d_fb, 2, tft.width(), tft.height());
  PXP_output_format(PXP_RGB565, 0, 0, 0);
  PXP_process();
  draw_frame( tft.width(), tft.height(), d_fb);

  capture_frame1(false);
  PXP_overlay_format(PXP_RGB565);
  PXP_overlay_buffer(logo_fb, 2, logo_width, logo_height);
  uint32_t image_posx = 100;
  uint32_t image_posy = 12;
  PXP_overlay_position(image_posx, image_posy, image_posx + logo_width, image_posy + logo_height);
  PXP_process();
  draw_frame( tft.width(), tft.height(), d_fb);

  delay(2000);

  image_posx = 200;
  image_posy = 50;
  PXP_overlay_position( image_posx, image_posy, image_posx + logo_width, image_posy + logo_height);
  PXP_process();
  draw_frame( tft.width(), tft.height(), d_fb);

[Attached image: IMG_1645.png]

This might work as well. I am attaching the whole sketch

Cheers
 

Attachments

  • pxpBltOperation-251003a.zip (235.1 KB)
@mjs513 thanks for getting the example together Mike!
I was able to get the PS and AS to blend without an issue - see the big waveform here on the AS with LVGL on the PS

But what I actually wanted to do is draw a full frame with a partial buffer on the PS, drawing one section at a time.
The only way to do this is to offset the output buffer start address. It works based on my tests, but the performance is not as great as I expected, even though it’s all running async with a custom callback to load the next frame.

So, I’ll stick with full frame buffers for LVGL and maintain the current implementation
 