T4 Pixel Pipeline Library

Ok, just managed to get a color conversion from Y8/Y4 to RGB565 working. That will address issues with the HM01B0 and HM0360 cameras - so everything can be done from the PXP now.

Image from conversion:
[attachment: 1715966933605.jpeg]
 
@mjs513 are you using SDRAM buffers or internal RAM buffers for the PXP source and destination?
Right now this is all on internal buffers on the T4.1 and MicroMod. The T4.1 is using EXTMEM running at 132 MHz with an ILI9488 on SPI.

At some point soonish going to try on SDRAM board.
 
Looks great!
Thanks. Been at it again. Since it's the set of functions I am using all the time, I just created a new one that I am testing. This way I don't have to keep copying and pasting code.

Code:
        PXP_ps_output(tft.width(), tft.height(),      /* Display width and height */
                      FRAME_WIDTH, FRAME_HEIGHT,      /* Image width and height */
                      s_fb, PXP_Y8, 1,                /* Input buffer configuration */
                      d_fb, PXP_RGB565, 2,            /* Output buffer configuration */
                      3, false, 1.5,                  /* Rotation, flip, scaling */
                      &outputWidth, &outputHeight);   /* Frame Out size for drawing */
 
Ok pushed the changes mentioned up to my fork of lib and updated the examples if anyone wants to try it


The basic calling function is now (I included the option for byte swap):
Code:
void PXP_ps_output(uint16_t disp_width, uint16_t disp_height, uint16_t image_width, uint16_t image_height,
                   void* buf_in, uint8_t format_in, uint8_t bpp_in, uint8_t byte_swap_in,
                   void* buf_out, uint8_t format_out, uint8_t bpp_out, uint8_t byte_swap_out,
                   uint8_t rotation, bool flip, float scaling,
                   uint16_t* scr_width, uint16_t* scr_height);

Think it's pretty much self-explanatory, except maybe scr_width and scr_height. Basically, that's the final image width and height after the rotation/scaling effects are applied. They are values returned from the PXP.
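As a rough illustration of what comes back, here is a sketch (hypothetical helper, not part of the library) of how the returned size could be derived, assuming a scale factor >= 1 shrinks the image (out = in / scale) and rotations 1 and 3 swap width and height:

```cpp
#include <cstdint>

// Hypothetical helper, NOT part of T4_PXP: estimates the frame size returned
// in scr_width/scr_height, assuming scale >= 1 shrinks (out = in / scale)
// and rotations 1 and 3 (90/270 degrees) swap the axes.
void estimate_pxp_output(uint16_t image_w, uint16_t image_h,
                         uint8_t rotation, float scale,
                         uint16_t *out_w, uint16_t *out_h) {
  uint16_t w = image_w, h = image_h;
  if (scale > 1.0f) {                    // 0 or 1 means "no scaling"
    w = (uint16_t)((float)w / scale);
    h = (uint16_t)((float)h / scale);
  }
  if (rotation == 1 || rotation == 3) {  // 90 or 270 degrees
    uint16_t t = w;
    w = h;
    h = t;
  }
  *out_w = w;
  *out_h = h;
}
```

For example, a 640x480 source with rotation 1 and a 1.5 scale comes out as 320x426.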
 
Using the non-camera sketch (another_pxp_test.ino), which just loads images from PROGMEM, I did some more timing tests. Based on a 198 MHz SDRAM clock:
Code:
Using flexio_teensy_mm image from @Rezo
=================================================================
s_fb: RAM/d_fb: DMAMEM
Rotation 0:
Capture time (millis): 128, PXP time(micros) : 114, Display time: 205
Rotation 1:
Capture time (millis): 127, PXP time(micros) : 114, Display time: 246
Rotation 2:
Capture time (millis): 128, PXP time(micros) : 113, Display time: 205
Rotation 3:
Capture time (millis): 128, PXP time(micros) : 114, Display time: 246
Rotation 0 w/flip:
Capture time (millis): 128, PXP time(micros) : 113, Display time: 205
Rotation 2 w/scaling:
Capture time (millis): 127, PXP time(micros) : 102, Display time: 178
Rotation 3 w/Scaling:
Capture time (millis): 127, PXP time(micros) : 102, Display time: 178

==================================================================
s_fb: SDRAM/d_fb: SDRAM
Rotation 0:
Capture time (millis): 127, PXP time(micros) : 258, Display time: 205
Rotation 1:
Capture time (millis): 127, PXP time(micros) : 261, Display time: 246
Rotation 2:
Capture time (millis): 128, PXP time(micros) : 258, Display time: 205
Rotation 3:
Capture time (millis): 127, PXP time(micros) : 261, Display time: 247
Rotation 0 w/flip:
Capture time (millis): 127, PXP time(micros) : 257, Display time: 205
Rotation 2 w/scaling:
Capture time (millis): 127, PXP time(micros) : 352, Display time: 178
Rotation 3 w/Scaling:
Capture time (millis): 127, PXP time(micros) : 358, Display time: 178

SDRAM is about twice as slow as internal memory - that will probably change if you run the clock faster.
 
Hi, I tried the new sketch with the HM0360 reading QVGA on the MMOD and doing the conversion to RGB and rotation. It worked :D

I tried changing to VGA and as I sort of expected, not enough memory...

Wondering if I can set up the camera to support the size FRAMESIZE_480X320,
and whether we have enough memory for that...

May have to try it.
 
I tried changing to VGA and as I sort of expected, not enough memory...

Wondering if I can setup the camera to support the size: FRAMESIZE_480X320,
And if we have enough memory for that...
Actually there is. There are 2 rules of thumb:
1. The source buffer needs to be the image size, and
2. The destination buffer should be the display size.
So in this case, if you set your buffers like this for an ILI9488 display:
Code:
uint8_t s_fb[(640) * 480] __attribute__((aligned(64)));
DMAMEM uint16_t d_fb[(480) * 320] __attribute__ ((aligned (64)));

it should work. I think I tried it yesterday.

Oh, don't forget to set the scale factor to something like 1.5.
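For a quick sanity check, the two rules of thumb amount to this arithmetic (hypothetical helper names, just illustrating the sizing math):

```cpp
#include <cstdint>
#include <cstddef>

// Rule-of-thumb buffer sizing, sketched as plain arithmetic (the names are
// illustrative, not library API): the source buffer holds the full camera
// image at the input bytes-per-pixel, and the destination buffer holds the
// display frame at the output bytes-per-pixel.
size_t pxp_src_bytes(uint16_t img_w, uint16_t img_h, uint8_t bpp_in) {
  return (size_t)img_w * img_h * bpp_in;
}

size_t pxp_dst_bytes(uint16_t disp_w, uint16_t disp_h, uint8_t bpp_out) {
  return (size_t)disp_w * disp_h * bpp_out;
}
```

With a Y8 VGA source and an RGB565 480x320 destination, both buffers come out to 640\*480\*1 = 480\*320\*2 = 307200 bytes, which is why the mono VGA case can still fit.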
 
It works...
[attachment: 1716085981318.png]


It is interesting that the PXP takes almost as much time as either the capture or the display.

Code:
Not DMA
Finished reading frame
Capture time (millis): 210, PXP time(micros) : 204, Display time: 232
 PXP rotation 2....
$$ImageSensor::readFrameFlexIO(0x200056c0, 307200, 0x0, 0, 0, 0) 307200
    Not DMA
Finished reading frame
Capture time (millis): 178, PXP time(micros) : 204, Display time: 205
 PXP rotation 1....
$$ImageSensor::readFrameFlexIO(0x200056c0, 307200, 0x0, 0, 0, 0) 307200
    Not DMA
Finished reading frame
Capture time (millis): 282, PXP time(micros) : 204, Display time: 232
 PXP rotation 1....
$$ImageSensor::readFrameFlexIO(0x200056c0, 307200, 0x0, 0, 0, 0) 307200
    Not DMA
Finished reading frame
Capture time (millis): 187, PXP time(micros) : 205, Display time: 232
 PXP rotation 1....
$$ImageSensor::readFrameFlexIO(0x200056c0, 307200, 0x0, 0, 0, 0) 307200
    Not DMA
Finished reading frame
Capture time (millis): 207, PXP time(micros) : 205, Display time: 232
 PXP rotation 1....
$$ImageSensor::readFrameFlexIO(0x200056c0, 307200, 0x0, 0, 0, 0) 307200
    Not DMA
Finished reading frame
Capture time (millis): 149, PXP time(micros) : 205, Display time: 232
 
Just as a quick update: @vjmuzik incorporated a PR that we generated to have one call set up and run the PXP into his library:
Code:
PXP_ps_output(uint16_t disp_width, uint16_t disp_height, uint16_t image_width, uint16_t image_height,
                   void* buf_in, uint8_t format_in, uint8_t bpp_in, uint8_t byte_swap_in,
                   void* buf_out, uint8_t format_out, uint8_t bpp_out, uint8_t byte_swap_out,
                   uint8_t rotation, bool flip, float scaling,
                   uint16_t* scr_width, uint16_t* scr_height)

So one function will set up the buffers, conversions, rotation, and flip without having to mess with setting anything else up (or forgetting to set it up). We have been using it with our OV5640/OV2640/HM0360 cameras with a lot of success. Basically we set up a single function and call it before displaying the framebuffer:

Code:
inline void do_pxp_conversion(uint16_t &outputWidth, uint16_t &outputHeight) {
#if defined(CAMERA_USES_MONO_PALETTE)
  PXP_ps_output(tft.width(), tft.height(),       /* Display width and height */
                camera.width(), camera.height(), /* Image width and height */
                camera_buffer, PXP_Y8, 1, 0,     /* Input buffer configuration */
                screen_buffer, PXP_RGB565, 2, 0, /* Output buffer configuration */
                TFT_ROTATION, 0, 480.0 / 320.0,  /* Rotation, flip, scaling */
                &outputWidth, &outputHeight);    /* Frame Out size for drawing */
#else
  PXP_ps_output(tft.width(), tft.height(),       /* Display width and height */
                camera.width(), camera.height(), /* Image width and height */
                camera_buffer, PXP_RGB565, 2, 0, /* Input buffer configuration */
                screen_buffer, PXP_RGB565, 2, 0, /* Output buffer configuration */
                TFT_ROTATION, true, 0.0,         /* Rotation, flip, scaling */
                &outputWidth, &outputHeight);    /* Frame Out size for drawing */
#endif
}

Pretty much each grouping of parameters is explained in the comments.

Now to play with eLCD.
 
I saw several posts in different forums that indicate that a maximum of 5 arguments is an 'industry standard'. Seventeen arguments (if I counted right) seems to beg for a defined structure and passing a single pointer.

In the good old days, with dumber compilers and processors, passing a single pointer didn't even need stack space for arguments---the compiler reserved 3 or 4 registers for passing arguments and for internal use in functions so that they didn't even have to be saved and restored with stack pushes and pops in function calls. If this were a function whose execution time wasn't measured in milliseconds, a single pointer might be faster---although the ARM CPUs and compilers are pretty good at fetching arguments with a single instruction using a base register with an offset. It probably doesn't matter whether the base register is pointing to a stack frame or a structure elsewhere in memory.

If all the arguments are variables (not defined values or constants), you may use twice as much memory, as the values will exist once where they are allocated in the calling function, and once on the stack at the function call. The fact that you inline the call with four function results (the camera and TFT width and height) as arguments must make for an interesting piece of inline code. ;-) If you were passing a structure pointer, you would only have to call the width and height functions if you changed something in the TFT or camera.
 
@mborgerson - yeah, there are a lot of things I am passing, and returning 2 values as well.

Not a bad idea to think about using a struct. Will put it on the to-do list - which seems to keep growing.
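A struct along these lines is roughly what the suggestion amounts to (a sketch only; the names are illustrative, not an actual T4_PXP API):

```cpp
#include <cstdint>

// Sketch of the struct-parameter idea (illustrative names, not library API):
// the caller fills this in once and passes a single pointer; the TFT/camera
// width and height getters only need to be called again when something changes.
struct PXPOutputConfig {
  uint16_t disp_width, disp_height;    // display size
  uint16_t image_width, image_height;  // camera image size
  void*    buf_in;                     // input buffer
  uint8_t  format_in, bpp_in, byte_swap_in;
  void*    buf_out;                    // output buffer
  uint8_t  format_out, bpp_out, byte_swap_out;
  uint8_t  rotation;
  bool     flip;
  float    scaling;
  uint16_t scr_width, scr_height;      // outputs, filled in by the call
};

// A wrapper could then simply be (hypothetical):
//   void PXP_ps_output(PXPOutputConfig* cfg);
```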
 
Personally, I don't really think I want to use a struct to initialize another struct.... 😉
That is, almost all of the calls within the T4_PXP library are set up to fill in a structure:
Code:
typedef struct
{
    volatile uint32_t CTRL;
    volatile uint32_t STAT;
    volatile uint32_t OUT_CTRL;
    volatile void*    OUT_BUF;
    volatile void*    OUT_BUF2;
    volatile uint32_t OUT_PITCH;
    volatile uint32_t OUT_LRC;
    volatile uint32_t OUT_PS_ULC;
    volatile uint32_t OUT_PS_LRC;
    volatile uint32_t OUT_AS_ULC;
    volatile uint32_t OUT_AS_LRC;
    volatile uint32_t PS_CTRL;
    volatile void*    PS_BUF;
    volatile void*    PS_UBUF;
    volatile void*    PS_VBUF;
    volatile uint32_t PS_PITCH;
    volatile uint32_t PS_BACKGROUND;
    volatile uint32_t PS_SCALE;
    volatile uint32_t PS_OFFSET;
    volatile uint32_t PS_CLRKEYLOW;
    volatile uint32_t PS_CLRKEYHIGH;
    volatile uint32_t AS_CTRL;
    volatile void*    AS_BUF;
    volatile uint32_t AS_PITCH;
    volatile uint32_t AS_CLRKEYLOW;
    volatile uint32_t AS_CLRKEYHIGH;
    volatile uint32_t CSC1_COEF0;
    volatile uint32_t CSC1_COEF1;
    volatile uint32_t CSC1_COEF2;
    volatile uint32_t POWER;
    volatile uint32_t NEXT;
    volatile uint32_t PORTER_DUFF_CTRL;
} IMXRT_NEXT_PXP_t;
And if nothing changes, we don't need much calling; we simply set the NEXT pointer in the PXP...
For example, I did a quick and dirty edit of our example sketch:

Code:
#ifdef USE_T4_PXP
inline void do_pxp_conversion(uint16_t &outputWidth, uint16_t &outputHeight) {
    static bool first_time = true;

    if (!first_time) {
        PXP_process();
        return;
    }
    first_time = false;
#if defined(CAMERA_USES_MONO_PALETTE)
    PXP_ps_output(tft.width(), tft.height(),       /* Display width and height */
                  camera.width(), camera.height(), /* Image width and height */
                  camera_buffer, PXP_Y8, 1, 0,     /* Input buffer configuration */
                  screen_buffer, PXP_RGB565, 2, 0, /* Output buffer configuration */
                  TFT_ROTATION, 0, 480.0 / 320.0,  /* Rotation, flip, scaling */
                  &outputWidth, &outputHeight);    /* Frame Out size for drawing */
#else
    PXP_ps_output(tft.width(), tft.height(),       /* Display width and height */
                  camera.width(), camera.height(), /* Image width and height */
                  camera_buffer, PXP_RGB565, 2, 0, /* Input buffer configuration */
                  screen_buffer, PXP_RGB565, 2, 0, /* Output buffer configuration */
                  TFT_ROTATION, true, 0.0,            /* Rotation, flip, scaling */
                  &outputWidth, &outputHeight);    /* Frame Out size for drawing */
#endif
}
#endif
So the first time through, it calls the helper...
After that it simply calls PXP_process(), which waits for any other PXP operations to complete and sets the NEXT pointer.

And I think it is still working :D
 
And I think it is still working :D
Maybe sort of... That is, I am not sure this is the proper way to do what we are trying to do.
Currently the code goes something like:
C++:
bool camera_read_callback(void *pfb) {
    digitalWriteFast(3, HIGH);
    frame_count_camera++;
    if (tft.asyncUpdateActive()) {
        digitalWriteFast(3, LOW);
        return true;
    }
    frame_count_tft++;

    uint16_t outputWidth, outputHeight;
    do_pxp_conversion(outputWidth, outputHeight);
    tft.updateScreenAsync();
    digitalWriteFast(1, HIGH);
    digitalWriteFast(3, LOW);
    return true;
}
Note: some of this is debug code; other parts are there so we don't process a camera frame while the display is still working on the previous frame...

Now the issue: PXP_process() only queues up the PXP conversion; it does not wait for it to complete.

There is a method that waits for it to complete: PXP_finish().
If I add a call to that in the another_pxp_test.ino example sketch I mentioned above:
Code:
PXP rotation 1....
$$ImageSensor::readFrameFlexIO(0x200056c0, 307200, 0x0, 0, 0, 0) 307200
    Not DMA
Finished reading frame
Capture time (millis): 187, PXP time(micros) : 205, Display time: 232
The time for the actual PXP operation is larger:
Code:
Rotation 1:
Capture time (millis): 128, PXP time(micros) : 1342, Display time: 246

I am guessing that in most cases our processing to start up tft.updateScreenAsync() takes up some of this time, but I am wondering if it is likely that we are outputting some contents from the previous frame...

Note: the PXP setup does have an interrupt that is triggered when the frame completes, and this library sets it up. The PXP_done flag that PXP_finish() and PXP_process() look at is set in this ISR.

Question to self and others: should the library have a method to set a callback for when PXP_isr() sets the done flag, which would then allow us to start updateScreenAsync() only after the conversion completes?
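A minimal sketch of that callback idea (hypothetical names; this is not in the library): besides setting the done flag, the ISR would invoke a user-registered function, which is where the display update could be kicked off:

```cpp
// Sketch of a completion callback (hypothetical API, not in T4_PXP).
// The ISR sets the done flag as it does today, and additionally invokes a
// user-registered function - the natural place to start the display update.
typedef void (*pxp_done_cb_t)(void);

static volatile bool pxp_done = false;
static pxp_done_cb_t pxp_done_callback = nullptr;
static int pxp_cb_count = 0;  // demo counter, bumped by the demo hook below

void PXP_attach_done_callback(pxp_done_cb_t cb) {
  pxp_done_callback = cb;
}

// What the existing PXP_isr() could do when the frame-complete flag fires:
void pxp_isr_sketch() {
  pxp_done = true;
  if (pxp_done_callback) {
    pxp_done_callback();  // e.g. start tft.updateScreenAsync() here
  }
}

// Demo hook a sketch might register:
void demo_on_pxp_done() {
  pxp_cb_count++;
}
```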
 
I used that technique in my years-old OV7670 camera libraries:

CSI frame-capture IRQ -> sets up and starts the PXP to convert YUV422 to RGB565 and shrink for the ILI9341

PXP-end IRQ -> sets up and starts the async transfer to the LCD

This sequence captured VGA frames in a circular buffer in EXTMEM to be used by the foreground loop and stored to the SD card when requested. I found that having the CSI and PXP competing for access to EXTMEM caused lots of glitches in the data, so I had to stop the CSI while the PXP was running, which reduced the overall frame rate to about 5 FPS.

Have you seen similar issues with FlexIO-based camera drivers? Hopefully, avoiding the use of EXTMEM and splitting the frame buffer between DTCM and DMAMEM avoids the conflict with PXP usage.
 
Personally, I don't really think I want to use a struct to initialize another struct.... 😉
That is almost all of the calls within the T4_PXP library are setup to fill in a structure:
If your IMXRT_NEXT_PXP_t structure definition is available to the user, that could be the structure used to set up the PXP. Initializing the PXP then becomes a simple matter of setting the PXP_NEXT register with a pointer to your structure. If you want to keep a local copy of the structure, you can do that with a simple memcpy(). After that is done, you can use the internal structure inside the class.

There may be some situations where you want to make lots of changes to the setup. In such cases, either your user code or the class could have an array of IMXRT_NEXT_PXP_t structures and a selector method to put a pointer to the desired structure into PXP_NEXT.

If the PXP itself doesn't care what is in memory after the structure, you could add a pointer to a callback function at the end of the PXP data. If that is done, the user should be sure to set that pointer to NULL if no callback is desired. You never know when someone might decide to put either their own structure or a PXP class instance in DMAMEM or some other uninitialized memory!
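The array-of-setups idea, sketched in a hedged form (PXP_NEXT is mocked as a plain variable here; on the hardware it is the actual register, and the structures would likely need appropriate alignment - check the reference manual):

```cpp
#include <cstdint>

// Sketch of keeping several prebuilt register images and selecting one by
// pointing PXP_NEXT at it. PXP_NEXT is mocked as a variable here; on the
// i.MX RT it is the real register, and the images need whatever alignment
// the reference manual requires for the NEXT command.
struct PxpNextImage {
  uint32_t regs[32];  // layout compatible with a 32-word IMXRT_NEXT_PXP_t
};

static PxpNextImage setups[4];       // e.g. one prebuilt setup per rotation
static uintptr_t mock_PXP_NEXT = 0;  // stand-in for the PXP_NEXT register

void select_pxp_setup(int idx) {
  mock_PXP_NEXT = (uintptr_t)&setups[idx];
}
```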

I found it useful to have a member function to pass the current PXP register contents back to the user in an array compatible with PXP_NEXT:

Code:
// Save the PXP registers to the PXP_Next array passed as parameter.
// The input pointer should point to an array of 32 uint32_t elements.
void clPXP::SaveNext(uint32_t pxnptr[]) {
  uint16_t i;
  volatile uint32_t *pxptr = &PXP_CTRL;  // set pointer to first register
  uint32_t *nxptr = &pxnptr[0]; // set pointer to first saved value

  for (i = 0; i < 29; i++) { // first 29 are at 16-byte intervals
    *nxptr++ = *pxptr;
    pxptr += 4; // skips ahead 16 bytes to next register
  }
  // the last three entries are differently spaced so use register definitions
  *nxptr++ = PXP_POWER;
  *nxptr++ = PXP_NEXT;
  *nxptr = PXP_PORTER_DUFF_CTRL;
}
 
I used that technique in my years-old OV7670 camera libraries:

CSI frame-capture IRQ -> sets up and starts the PXP to convert YUV422 to RGB565 and shrink for the ILI9341

PXP-end IRQ -> sets up and starts the async transfer to the LCD

This sequence captured VGA frames in a circular buffer in EXTMEM to be used by the foreground loop and stored to the SD card when requested. I found that having the CSI and PXP competing for access to EXTMEM caused lots of glitches in the data, so I had to stop the CSI while the PXP was running, which reduced the overall frame rate to about 5 FPS.

Have you seen similar issues with FlexIO-based camera drivers? Hopefully, avoiding the use of EXTMEM and splitting the frame buffer between DTCM and DMAMEM avoids the conflict with PXP usage.
@mjs513 has played a lot more with this stuff than I have. I am still trying to understand some of the basics...
Just doing this for the fun of it with the different cameras.

Simply trying to read from the camera, massage the data, and show it on an ILI9488 (or HX8357) display...

For the OV2640 or OV5640, I actually have it set up for the camera to return a 480x320 image, which will fit into either of these regions.
I could simply have the display's rotation set properly and output directly to the screen... But I was curious about comments that with output in rotation 0 there is less tearing... So I thought I would try it...

Right now I am playing with the HM0360, which is returning a mono VGA image... I am trying to set up the PXP to convert the mono image into RGB565 format at 480x320, rotated by 90 or 270 degrees. The current setup works if I set the scaling to 1.5... It sort of works, as the 1.5 converts the size to 320 by 426, so it fits into the 320 by 480 frame buffer, justified to one end (black at the other).

a) But what I would like in this case is for the image to be centered, so I would like to offset it in the one direction by (480-426)/2 = 27.

b) Or alternatively, maybe I want to reduce the zoom and clip... For example, suppose I pass in a scale factor of 640/480 = 1.333333. Then 640/1.333333 = 480 and 480/1.333333 = 360.

If I currently do this, it completely wipes out our memory, as I pass in an array of 480*320*2 bytes and it writes 480*360*2 = 345600 bytes. That could still fit in DMAMEM, especially if it is the last area.

But what I want is for it to constrain its output to 480x320, probably centered...
How to do that? An input clip?

c) A subset of b), with no scaling: just take a 480x320 window of the source, rotate it, convert the output format, and write it into the memory used to display it...

Now just trying to figure out how best to do this...
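The arithmetic for case a) is at least straightforward; a tiny sketch (hypothetical helper, not library code) of the centering offset:

```cpp
#include <cstdint>

// Centering offset for case a): how far to shift the scaled image so it sits
// centered in the display extent (0 if the image already fills or overfills it).
uint16_t center_offset(uint16_t display_extent, uint16_t image_extent) {
  return (display_extent > image_extent)
             ? (uint16_t)((display_extent - image_extent) / 2)
             : 0;
}
```

For the 320x426 result in a 320x480 buffer, that gives (480 - 426) / 2 = 27 pixels.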

Now back to pulling some more hair out of my bald head. 😆
 