T4 Pixel Pipeline Library

OK, just managed to get a color conversion from Y8/Y4 to RGB565 working. That will address issues with the HM01B0 and HM0360 cameras, so everything can be done from the PXP now.

Image from conversion:
[Attachment: 1715966933605.jpeg]
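For anyone curious what that conversion amounts to per pixel, a rough CPU-side equivalent (just for illustration; the PXP does this in hardware, presumably via its color space converter) is to replicate the gray Y value into all three channels and pack it 5-6-5:
Code:
// Illustration only: CPU-side equivalent of the Y8 -> RGB565 conversion the PXP now does.
static inline uint16_t y8_to_rgb565(uint8_t y) {
  return ((y >> 3) << 11) | ((y >> 2) << 5) | (y >> 3);   // gray replicated to R, G, B
}

void convert_y8_to_rgb565(const uint8_t *src, uint16_t *dst, uint32_t count) {
  for (uint32_t i = 0; i < count; i++) dst[i] = y8_to_rgb565(src[i]);
}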
 
@mjs513 are you using SDRAM buffers or internal RAM buffers for the PXP source and destination?
Right now this is all on internal buffers on the T4.1 and MicroMod. The T4.1 is using external RAM (EXTMEM) running at 132 MHz with an ILI9488 on SPI.

At some point soonish I am going to try it on the SDRAM board.
 
Looks great!
Thanks. Been at it again. Since this is the set of functions I am using all the time, I just created a new one that I am testing. This way I don't have to keep copying and pasting code.

Code:
        PXP_ps_output(tft.width(), tft.height(),      /* Display width and height */
                      FRAME_WIDTH, FRAME_HEIGHT,      /* Image width and height */
                      s_fb, PXP_Y8, 1,                /* Input buffer configuration */
                      d_fb, PXP_RGB565, 2,            /* Output buffer configuration */
                      3, false, 1.5,                  /* Rotation, flip, scaling */
                      &outputWidth, &outputHeight);   /* Frame Out size for drawing */
 
OK, pushed the changes mentioned up to my fork of the lib and updated the examples if anyone wants to try it.


The basic calling function is now (including the option for byte swap):
Code:
void PXP_ps_output(uint16_t disp_width, uint16_t disp_height, uint16_t image_width, uint16_t image_height,
                   void* buf_in, uint8_t format_in, uint8_t bpp_in, uint8_t byte_swap_in,
                   void* buf_out, uint8_t format_out, uint8_t bpp_out, uint8_t byte_swap_out,
                   uint8_t rotation, bool flip, float scaling,
                   uint16_t* scr_width, uint16_t* scr_height);

Think it's pretty much self-explanatory except maybe scr_width and scr_height. Basically that's the final image width and height after the rotation/scaling effects are applied. They are values returned from the PXP.
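To make the pointer-return part concrete, usage looks roughly like this (the writeRect() call is just an example of an ILI9341_t3/ILI9488_t3-style draw call; adjust for whatever display driver you are using):
Code:
uint16_t outputWidth, outputHeight;

PXP_ps_output(tft.width(), tft.height(),      /* Display width and height */
              FRAME_WIDTH, FRAME_HEIGHT,      /* Image width and height */
              s_fb, PXP_Y8, 1, 0,             /* Input buffer configuration */
              d_fb, PXP_RGB565, 2, 0,         /* Output buffer configuration */
              3, false, 1.5,                  /* Rotation, flip, scaling */
              &outputWidth, &outputHeight);   /* Frame Out size for drawing */

// outputWidth/outputHeight come back adjusted for rotation and scaling,
// so they are what gets handed to the draw call:
tft.writeRect(0, 0, outputWidth, outputHeight, d_fb);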
 
Using the non-camera sketch (another_pxp_test.ino), which just loads images from PROGMEM, I did some more timing tests. Based on using 198 MHz for the SDRAM clock:
Code:
Using flexio_teensy_mm image from @Rezo
=================================================================
s_fb: RAM/d_fb: DMAMEM
Rotation 0:
Capture time (millis): 128, PXP time(micros) : 114, Display time: 205
Rotation 1:
Capture time (millis): 127, PXP time(micros) : 114, Display time: 246
Rotation 2:
Capture time (millis): 128, PXP time(micros) : 113, Display time: 205
Rotation 3:
Capture time (millis): 128, PXP time(micros) : 114, Display time: 246
Rotation 0 w/flip:
Capture time (millis): 128, PXP time(micros) : 113, Display time: 205
Rotation 2 w/scaling:
Capture time (millis): 127, PXP time(micros) : 102, Display time: 178
Rotation 3 w/Scaling:
Capture time (millis): 127, PXP time(micros) : 102, Display time: 178

==================================================================
sdram/sdram
Rotation 0:
Capture time (millis): 127, PXP time(micros) : 258, Display time: 205
Rotation 1:
Capture time (millis): 127, PXP time(micros) : 261, Display time: 246
Rotation 2:
Capture time (millis): 128, PXP time(micros) : 258, Display time: 205
Rotation 3:
Capture time (millis): 127, PXP time(micros) : 261, Display time: 247
Rotation 0 w/flip:
Capture time (millis): 127, PXP time(micros) : 257, Display time: 205
Rotation 2 w/scaling:
Capture time (millis): 127, PXP time(micros) : 352, Display time: 178
Rotation 3 w/Scaling:
Capture time (millis): 127, PXP time(micros) : 358, Display time: 178

SDRAM is roughly twice as slow as internal memory for the PXP pass; that will probably change if you run the SDRAM clock faster.
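For reference, the two configurations above only differ in where the buffers live. The first case is the usual Teensy 4.x placement; the sdram/sdram case points both buffers at the external SDRAM on the dev board (the actual SDRAM allocation depends on how the board is set up, so that part is omitted here):
Code:
// s_fb: RAM / d_fb: DMAMEM case - plain Teensy 4.x buffer placement (sizes are placeholders)
uint8_t s_fb[FRAME_WIDTH * FRAME_HEIGHT] __attribute__((aligned(64)));            // source in DTCM (RAM1)
DMAMEM uint16_t d_fb[FRAME_WIDTH * FRAME_HEIGHT] __attribute__((aligned(64)));    // destination in RAM2

// sdram/sdram case: both s_fb and d_fb point at buffers placed in external SDRAM instead.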
 
Hi, I tried the new sketch with the HM0360 reading QVGA on the MMOD, doing the conversion to RGB and the rotation. It worked :D

I tried changing to VGA and as I sort of expected, not enough memory...

Wondering if I can setup the camera to support the size: FRAMESIZE_480X320,
And if we have enough memory for that...

May have to try it.
 
I tried changing to VGA and as I sort of expected, not enough memory...

Wondering if I can setup the camera to support the size: FRAMESIZE_480X320,
And if we have enough memory for that...
Actually there is. There are two rules of thumb:
  1. The source buffer needs to be the image size, and
  2. the destination buffer should be the display size.
So in this case, if you set your buffers like this for an ILI9488 display:
Code:
uint8_t s_fb[640 * 480] __attribute__((aligned(64)));           // source: 640x480 Y8 camera frame
DMAMEM uint16_t d_fb[480 * 320] __attribute__((aligned(64)));   // destination: 480x320 RGB565 for the display

it should work. Think I tried it yesterday.
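The memory arithmetic roughly works out: the Y8 source is 640 * 480 = 307,200 bytes and the RGB565 destination is 480 * 320 * 2 = 307,200 bytes, so each can fit in its own 512 KB RAM region, whereas a full VGA RGB565 destination would need 640 * 480 * 2 = 614,400 bytes and would not.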

Oh, don't forget to set the scale factor to something like 1.5.
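(If I am reading the scaling parameter right, it acts as a downscale divisor, so the 1.5 is what fits the 640x480 frame into that buffer: 480 / 1.5 = 320 and 640 / 1.5 ≈ 427, which fits within the 480-pixel width.)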
 
It works...
[Attachment: 1716085981318.png]


It is interesting that the PXP takes almost as much time as either the capture or the display.

Code:
Not DMA
Finished reading frame
Capture time (millis): 210, PXP time(micros) : 204, Display time: 232
 PXP rotation 2....
$$ImageSensor::readFrameFlexIO(0x200056c0, 307200, 0x0, 0, 0, 0) 307200
    Not DMA
Finished reading frame
Capture time (millis): 178, PXP time(micros) : 204, Display time: 205
 PXP rotation 1....
$$ImageSensor::readFrameFlexIO(0x200056c0, 307200, 0x0, 0, 0, 0) 307200
    Not DMA
Finished reading frame
Capture time (millis): 282, PXP time(micros) : 204, Display time: 232
 PXP rotation 1....
$$ImageSensor::readFrameFlexIO(0x200056c0, 307200, 0x0, 0, 0, 0) 307200
    Not DMA
Finished reading frame
Capture time (millis): 187, PXP time(micros) : 205, Display time: 232
 PXP rotation 1....
$$ImageSensor::readFrameFlexIO(0x200056c0, 307200, 0x0, 0, 0, 0) 307200
    Not DMA
Finished reading frame
Capture time (millis): 207, PXP time(micros) : 205, Display time: 232
 PXP rotation 1....
$$ImageSensor::readFrameFlexIO(0x200056c0, 307200, 0x0, 0, 0, 0) 307200
    Not DMA
Finished reading frame
Capture time (millis): 149, PXP time(micros) : 205, Display time: 232
 
Just as a quick update: @vjmuzik incorporated a PR that we generated into his library, so one call sets up and runs the PXP:
Code:
void PXP_ps_output(uint16_t disp_width, uint16_t disp_height, uint16_t image_width, uint16_t image_height,
                   void* buf_in, uint8_t format_in, uint8_t bpp_in, uint8_t byte_swap_in,
                   void* buf_out, uint8_t format_out, uint8_t bpp_out, uint8_t byte_swap_out,
                   uint8_t rotation, bool flip, float scaling,
                   uint16_t* scr_width, uint16_t* scr_height);

So one function will set up the buffers, conversions, rotation, and flip without having to mess with setting anything else up (or forgetting to set it up). We have been using it with our OV5640/OV2640/HM0360 cameras with a lot of success. Basically we set up a single function and call it before displaying the frame buffer:

Code:
inline void do_pxp_conversion(uint16_t &outputWidth, uint16_t &outputHeight) {
#if defined(CAMERA_USES_MONO_PALETTE)
  PXP_ps_output(tft.width(), tft.height(),       /* Display width and height */
                camera.width(), camera.height(), /* Image width and height */
                camera_buffer, PXP_Y8, 1, 0,     /* Input buffer configuration */
                screen_buffer, PXP_RGB565, 2, 0, /* Output buffer configuration */
                TFT_ROTATION, 0, 480.0 / 320.0,  /* Rotation, flip, scaling */
                &outputWidth, &outputHeight);    /* Frame Out size for drawing */
#else
  PXP_ps_output(tft.width(), tft.height(),       /* Display width and height */
                camera.width(), camera.height(), /* Image width and height */
                camera_buffer, PXP_RGB565, 2, 0, /* Input buffer configuration */
                screen_buffer, PXP_RGB565, 2, 0, /* Output buffer configuration */
                TFT_ROTATION, true, 0.0,         /* Rotation, flip, scaling */
                &outputWidth, &outputHeight);    /* Frame Out size for drawing */
#endif
}

Pretty much each grouping of arguments is explained in the comments.

Now to play with eLCD.
 
I saw several posts in different forums that indicate a maximum of 5 arguments is an 'industry standard'. Seventeen arguments (if I counted right) seem to beg for a defined structure and passing a single pointer.

In the good old days, with dumber compilers and processors, passing a single pointer didn't even need stack space for arguments: the compiler reserved 3 or 4 registers for passing arguments and for internal use in functions, so they didn't even have to be saved and restored with stack pushes and pops in function calls. If this were a function whose execution time wasn't measured in milliseconds, a single pointer might be faster, although the ARM CPUs and compilers are pretty good at fetching arguments with a single instruction using a base register with an offset. It probably doesn't matter whether the base register is pointing to a stack frame or a structure elsewhere in memory.

If all the arguments are variables (not defined values or constants), you may use twice as much memory, as the values will exist once where they are allocated in the calling function and once on the stack at the function call. The fact that you inline the call with four function results (the camera and TFT width and height) as arguments must make for an interesting piece of inline code. ;-) If you were passing a structure pointer, you would only have to call the width and height functions when you changed something in the TFT or camera.
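Something along these lines is what I have in mind (purely a sketch of the idea, not anything that exists in the library):
Code:
// Hypothetical parameter block - just to illustrate the suggestion, not library code.
typedef struct {
    uint16_t disp_width, disp_height;
    uint16_t image_width, image_height;
    void*    buf_in;
    uint8_t  format_in, bpp_in, byte_swap_in;
    void*    buf_out;
    uint8_t  format_out, bpp_out, byte_swap_out;
    uint8_t  rotation;
    bool     flip;
    float    scaling;
    uint16_t scr_width, scr_height;   // filled in by the call instead of passed by pointer
} PXP_ps_params_t;

void PXP_ps_output(PXP_ps_params_t* p);   // one pointer instead of seventeen arguments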
 
@mborgerson - yeah, there are a lot of things I am passing, and two being returned as well.

Not a bad idea to think about using a struct. Will put it on the to-do list, which seems to keep growing.
 
Personally, I don't really think I want to use a struct to initialize another struct.... 😉
That is, almost all of the calls within the T4_PXP library are set up to fill in a structure:
Code:
typedef struct
{
    volatile uint32_t CTRL;
    volatile uint32_t STAT;
    volatile uint32_t OUT_CTRL;
    volatile void*    OUT_BUF;
    volatile void*    OUT_BUF2;
    volatile uint32_t OUT_PITCH;
    volatile uint32_t OUT_LRC;
    volatile uint32_t OUT_PS_ULC;
    volatile uint32_t OUT_PS_LRC;
    volatile uint32_t OUT_AS_ULC;
    volatile uint32_t OUT_AS_LRC;
    volatile uint32_t PS_CTRL;
    volatile void*    PS_BUF;
    volatile void*    PS_UBUF;
    volatile void*    PS_VBUF;
    volatile uint32_t PS_PITCH;
    volatile uint32_t PS_BACKGROUND;
    volatile uint32_t PS_SCALE;
    volatile uint32_t PS_OFFSET;
    volatile uint32_t PS_CLRKEYLOW;
    volatile uint32_t PS_CLRKEYHIGH;
    volatile uint32_t AS_CTRL;
    volatile void*    AS_BUF;
    volatile uint32_t AS_PITCH;
    volatile uint32_t AS_CLRKEYLOW;
    volatile uint32_t AS_CLRKEYHIGH;
    volatile uint32_t CSC1_COEF0;
    volatile uint32_t CSC1_COEF1;
    volatile uint32_t CSC1_COEF2;
    volatile uint32_t POWER;
    volatile uint32_t NEXT;
    volatile uint32_t PORTER_DUFF_CTRL;
} IMXRT_NEXT_PXP_t;
And if nothing changes, we don't need to do much calling; we simply set the NEXT pointer in the PXP...
For example, I did a quick and dirty edit of our example sketch:

Code:
#ifdef USE_T4_PXP
inline void do_pxp_conversion(uint16_t &outputWidth, uint16_t &outputHeight) {
    static bool first_time = true;

    if (!first_time) {
        PXP_process();
        return;
    }
    first_time = false;
#if defined(CAMERA_USES_MONO_PALETTE)
    PXP_ps_output(tft.width(), tft.height(),       /* Display width and height */
                  camera.width(), camera.height(), /* Image width and height */
                  camera_buffer, PXP_Y8, 1, 0,     /* Input buffer configuration */
                  screen_buffer, PXP_RGB565, 2, 0, /* Output buffer configuration */
                  TFT_ROTATION, 0, 480.0 / 320.0,  /* Rotation, flip, scaling */
                  &outputWidth, &outputHeight);    /* Frame Out size for drawing */
#else
    PXP_ps_output(tft.width(), tft.height(),       /* Display width and height */
                  camera.width(), camera.height(), /* Image width and height */
                  camera_buffer, PXP_RGB565, 2, 0, /* Input buffer configuration */
                  screen_buffer, PXP_RGB565, 2, 0, /* Output buffer configuration */
                  TFT_ROTATION, true, 0.0,            /* Rotation, flip, scaling */
                  &outputWidth, &outputHeight);    /* Frame Out size for drawing */
#endif
}
#endif
So the first time through, it calls the helper...
After that it simply calls PXP_process(), which waits for any other PXP operation to complete and sets the NEXT pointer.
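My rough understanding of the mechanics (could be off): once PXP_ps_output() has filled in the IMXRT_NEXT_PXP_t block, re-queuing a frame boils down to handing the PXP the address of that block, something like:
Code:
// Sketch of the NEXT mechanism as I understand it - not the actual library code.
IMXRT_NEXT_PXP_t next_pxp;           // filled in once by the setup call
// ...
// Queue the next frame: the PXP reloads all of its registers from the block.
// (Assumes the core's PXP_NEXT register define.)
PXP_NEXT = (uint32_t)&next_pxp;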

And I think it is still working :D
 
And I think it is still working :D
Maybe sort of... That is, I am not sure we are doing what we are trying to do in the proper way.
Currently the code goes something like:
C++:
bool camera_read_callback(void *pfb) {
    digitalWriteFast(3, HIGH);
    frame_count_camera++;
    if (tft.asyncUpdateActive()) {
        digitalWriteFast(3, LOW);
        return true;
    }
    frame_count_tft++;

    uint16_t outputWidth, outputHeight;
    do_pxp_conversion(outputWidth, outputHeight);
    tft.updateScreenAsync();
    digitalWriteFast(1, HIGH);
    digitalWriteFast(3, LOW);
    return true;
}
Note: some of this is debug code, and the rest is there so we don't process the camera frame if the display is still working on the previous frame...

Now the issue: PXP_process() only queues up the PXP conversion; it does not wait for it to complete.

There is a method that waits for it to complete: PXP_finish().
If I add a call to that in the another... example sketch I mentioned above:
 PXP rotation 1....
$$ImageSensor::readFrameFlexIO(0x200056c0, 307200, 0x0, 0, 0, 0) 307200
    Not DMA
Finished reading frame
Capture time (millis): 187, PXP time(micros) : 205, Display time: 232
The time for the actual PXP operation is:
Code:
Rotation 1:
Capture time (millis): 128, PXP time(micros) : 1342, Display time: 246
That is quite a bit larger...
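The difference presumably just comes from where the measurement ends, i.e. roughly:
Code:
uint32_t t0 = micros();
do_pxp_conversion(outputWidth, outputHeight);  // only queues the PXP operation
PXP_finish();                                  // blocks until the ISR sets PXP_done
uint32_t pxp_time_us = micros() - t0;          // now includes the whole conversion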

I am guessing that in most cases our processing to start up tft.updateScreenAsync() takes up some of this time, but I am wondering whether it is likely we are outputting some contents from the previous frame...

Note: the PXP does have an interrupt that is triggered when the frame completes, and this library sets it up. The PXP_done flag that PXP_finish() and PXP_process() look at is set in this ISR.

Question to self and others: wondering if the library should have a method to set a callback for when PXP_isr() sets the done flag, which would then allow us to start updateScreenAsync() only after the PXP completes?
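Something like this is the kind of thing I have in mind (purely hypothetical; nothing like it is in the library yet):
Code:
// Hypothetical callback hook - nothing like this exists in the library yet.
static void (*s_pxp_done_cb)(void) = nullptr;

void PXP_set_done_callback(void (*cb)(void)) { s_pxp_done_cb = cb; }

// ...and at the end of PXP_isr(), right after PXP_done is set:
//     if (s_pxp_done_cb) s_pxp_done_cb();

// Sketch side: only kick off the screen update once the PXP frame is complete
// (or just set a flag here and start the update from loop() if calling it from
// interrupt context turns out to be a problem).
void on_pxp_done() {
  tft.updateScreenAsync();
}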
 