
Thread: Uncanny Eyes is getting expensive

  1. #1
    Senior Member+ MichaelMeissner's Avatar
    Join Date
    Nov 2012
    Location
    Ayer Massachussetts
    Posts
    4,325

    Uncanny Eyes is getting expensive

    As some/many of you know, I have a thing for the Adafruit Uncanny Eyes project, and I have followed it across the Teensy 3.2, 3.5, 3.6, 4.0, and 4.1. In the 3.x era, I used the 128x128 TFT and OLED displays (both from Adafruit and elsewhere). With the Teensy 4.0 and 4.1, I now use the 240x240 IPS displays, both from Adafruit and elsewhere.

    Recently there was discussion about the round eye displays, and needless to say, I needed to get some. KurtE and mjs513 (maybe defragster also) have modified the ILI9341_t3 (and ST7789_t3/ST7789_t3n) library to work with the Waveshare round eye displays. So naturally I needed to update my set of displays. I got it, and it works well (creepy of course, but that is the nature of the beast).

    I'm in the middle of modding a staff from Spirit Halloween to have an interactive display. At the moment I'm using a Hallowing M4 for the project (because the eye that I was using has not yet been ported to the Teensy), but I would like to move it to the Teensy and the round eye display. I looked around eBay for suppliers to buy an additional round display. It looks like the stock at US suppliers is down from when I ordered in the past. But I discovered a Canadian supplier, which had a new variant of the display. So I ordered another 2 displays from them.

    I decided I need to combine the pin-outs for the various SPI displays so that I could solder up a prototype board and easily move from display to display without having to dedicate a Teensy for each display (or change the pin connections). Unfortunately in moving the connections around, I mis-plugged one of the Adafruit 240x240 square displays, and wound up burning it out. So I went to Adafruit and ordered a replacement.

    Finally after finalizing the sketch for the square display, I decided to do the round display next. In setting it up, I noticed I had cracked one of the displays, so I destroyed two displays in one day (or more likely, I had cracked the round display earlier, and I just noticed it today).

    If you don't know what I'm talking about, the original uncanny eyes for the Teensy 3.2 is at:


    Note, the author has stopped working with Teensies and is now concentrating on other processors (after the Teensy he went to the Raspberry Pi, and then onto the Adafruit M4 processors, including the Hallowing M0 with the 128x128 display and Hallowing M4/Monster M4SK with the 240x240 display).

    After the Teensy 4.0 release, the above people reworked the uncanny eyes for the Teensy 4.x processors. The ST7735_t3 library has both the original uncanny Eyes (uncannyEyes7735) for 128x128 displays as well as the Teensy 4.x version for 240x240 displays (uncannyEyes_async_st7789_240x240). The GC9A01A driver on GitHub has its version of the program (uncannyEyes_GC9A01A).

    Thread that announced the new driver:


    <edit>
    In terms of USA suppliers, unlike 2-3 years ago, none of the cheap 240x240 square displays (without the CS pin) seem to be sold by USA dealers. Before the supply chain shortages, these were fairly plentiful. It looks like you can still get the displays if you are willing to order from China and wait for the shipping. You can get the Adafruit square displays (the 1.3" and 1.54") from Adafruit and its distributors.

    When I ordered the round displays previously in June, they were also more plentiful, but now the stock seems to be tighter.
    Last edited by MichaelMeissner; 09-10-2022 at 08:50 AM.

  2. #2
    I run a pair of Uncanny Eyes on a 10" LCD (ER-TFTM10-1) with an RA8876 controller using a Teensy 3.6. I'm getting about 50 frames a second for both 128x128 eyes by only outputting rows that have data. When building an eye's row I mark a boolean true if a pixel is non-zero. When sending out the data to the controller I output the line only if the boolean is true or if the boolean was true the last time the same line & eye was written. I then save the boolean in an array, one boolean per row per eye. If you have gobs of memory you could save the scanline after outputting and compare it next time through, outputting the new scanline only if it has changed. That doubles the memory needed for the eye, so I thought one boolean per scanline was a good compromise.
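
    For readers who want to try the "gobs of memory" variant mentioned above (keep a copy of each scanline and resend only when it changes), here is a minimal sketch. The sizes, array names, and the rowNeedsUpdate() helper are assumptions for illustration and are not taken from the poster's code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical sizes; a real sketch would take these from its config header.
constexpr int kNumEyes = 2;
constexpr int kScreenWidth = 128;
constexpr int kScreenHeight = 128;

// Copy of the last scanline sent for each eye and row.
// This is the extra memory cost: one full frame per eye.
static uint16_t lastSent[kNumEyes][kScreenHeight][kScreenWidth];
static bool everSent[kNumEyes][kScreenHeight];

// Returns true if this row differs from what was last sent (or was never
// sent) and therefore has to be written to the display. The cache is
// updated whenever true is returned.
bool rowNeedsUpdate(int eye, int row, const uint16_t *pixels) {
  uint16_t *cached = lastSent[eye][row];
  const size_t bytes = sizeof(uint16_t) * kScreenWidth;
  if (everSent[eye][row] && std::memcmp(cached, pixels, bytes) == 0) {
    return false;  // identical to what the display already shows
  }
  std::memcpy(cached, pixels, bytes);
  everSent[eye][row] = true;
  return true;
}
```

    In a render loop the display write for a row would be wrapped in `if (rowNeedsUpdate(e, screenY, bp)) { ... }`. Unlike the one-boolean-per-row approach, this also skips rows that are non-blank but unchanged, at the cost of the compare and the extra buffer.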

  3. #3
    Senior Member+ MichaelMeissner's Avatar

    @dundakitty: Sounds like a reasonable optimization for the driver (I assume you did it in the driver and not in the uncanny eyes code). You could use less memory if, instead of using a single byte for the boolean for each row, you used a uint8_t variable with shifting/masking so that each byte holds the modified status of 8 rows.

    Of course it depends how much each row changes from display to display.
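
    A minimal sketch of the shifting/masking idea, assuming 2 eyes and 128 rows. The array and helper names are made up for illustration.

```cpp
#include <cstdint>

constexpr int kNumEyes = 2;         // assumed
constexpr int kScreenHeight = 128;  // assumed

// One bit per row: 8 rows share a byte, so 128 rows need 16 bytes per eye
// instead of 128 bytes with one boolean per row.
static uint8_t rowFlagBits[kNumEyes][(kScreenHeight + 7) / 8];

// Read the flag for one row of one eye.
bool getRowFlag(int eye, int row) {
  return ((rowFlagBits[eye][row >> 3] >> (row & 7)) & 1u) != 0;
}

// Set or clear the flag for one row of one eye.
void setRowFlag(int eye, int row, bool value) {
  const uint8_t mask = static_cast<uint8_t>(1u << (row & 7));
  if (value) {
    rowFlagBits[eye][row >> 3] |= mask;
  } else {
    rowFlagBits[eye][row >> 3] &= static_cast<uint8_t>(~mask);
  }
}
```

    The saving is small in absolute terms (a few hundred bytes for two eyes), so it mostly matters on the smaller parts or with many eyes.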

    On the chance people are interested in what my pinout scheme is, there were some design considerations:
    • Each pinout has 10 pins: Ground, power, clock, MOSI, MISO, main CS pin, D/C pin, reset pin, secondary CS pin, blink pin. I put the 10 pins in a standard layout. I use a display specific cable to hook up the display to those pins so switching displays is a matter of switching just a cable. The blink pin is often just wired to 3.3v.
    • I color code the wires, fortunately there are 10 standard wire colors.
    • The secondary CS pin should be able to do PWM, analog read, as well as being a serial TX pin to run WS2812Serial on it (the intention is if you aren't reading data from a micro SD card, you might want to hook up a servo, potentiometer, and/or neopixel to that pin).
    • The first display should only use pins 0..23, so that you can use a Teensy 4.0 without soldering wires underneath when using a single display.
    • The second display should use pins common to the Teensy 4.0 and 4.1 (pins 0..33). This means using pin 1 as the MISO pin for the secondary display and not pin 39 as the alternate MISO pin. It assumes that a Teensy 4.0 being used has pins 24..33 in the same position as with the Teensy 4.1.
    • I avoided pins used by the audio adapter other than the standard I2C and SPI pins (6, 7, 8, 10, 20, 21, 23).
    • I avoided pin 3 since I often use that as a common push button pin.
    • I avoided pin 17/A3 since I often use that as my neopixel pin.
    • I avoided pin 2 since sometimes I have used that as the I2C interrupt pin (the prop shield also uses pin 2 in this fashion).
    • I left 2 standard pins for use with analog read (pins 15/A1 and 16/A2) -- note the audio adapter also uses pin 15/A1 for analog inputs to allow a potentiometer to be soldered to the board.
    • I left 2 pins for one Serial UART (pins 28 and 29 for Serial7). Serial1 (pins 0 and 1) could also be used if you are just using one display.


    The pins I chose are:
    • Power (red wire): Typically 3.3v (optionally VIN)
    • Ground (black wire): Ground
    • Clock (green wire): pin 13 and pin 27/A13 (required)
    • MOSI (purple wire): pin 11 and pin 26/A12 (required)
    • MISO (gray wire): pin 12 and pin 1 (pin 1 is required for Teensy 4.0 operation)
    • Standard CS (yellow wire): pin 4 and pin 0
    • D/C (brown wire): pin 5 and pin 22/A8
    • Reset (blue wire): pin 9 and pin 25/A11
    • Secondary CS (white wire): pin 14/A0 and pin 24/A10
    • Blink (orange wire): Typically hard wired to 3.3v.


    In the potential designs for protoboard layout, many of the pins can be overridden by jumper wires, so I'm not locked into those defaults. The power can be switched between 3.3v and VIN. The standard clock, MOSI, and first display MISO are fixed, but the second display MISO can be jumpered to pin 39.
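
    The pin choices above can be written down as a small table with compile-time checks of the stated constraints. This is only a sketch: the DisplayPins struct and the helper names are assumptions for illustration, while the pin numbers are the ones listed above.

```cpp
#include <cstdint>

// Illustrative struct, not part of any library.
struct DisplayPins {
  uint8_t sck, mosi, miso, cs, dc, rst, cs2;
};

// Pin numbers from the list above (first display, second display).
constexpr DisplayPins kDisplay1 = {13, 11, 12, 4, 5, 9, 14};
constexpr DisplayPins kDisplay2 = {27, 26, 1, 0, 22, 25, 24};

constexpr uint8_t max2(uint8_t a, uint8_t b) { return a > b ? a : b; }

// Highest pin number used by a display.
constexpr uint8_t maxPin(const DisplayPins &p) {
  return max2(max2(max2(p.sck, p.mosi), max2(p.miso, p.cs)),
              max2(max2(p.dc, p.rst), p.cs2));
}

// True if the display uses the given pin for any signal.
constexpr bool usesPin(const DisplayPins &p, uint8_t pin) {
  return p.sck == pin || p.mosi == pin || p.miso == pin || p.cs == pin ||
         p.dc == pin || p.rst == pin || p.cs2 == pin;
}

// Constraints stated in the design considerations.
static_assert(maxPin(kDisplay1) <= 23, "display 1 must only use pins 0..23");
static_assert(maxPin(kDisplay2) <= 33, "display 2 must only use pins 0..33");
static_assert(!usesPin(kDisplay1, 2) && !usesPin(kDisplay2, 2), "pin 2 is reserved");
static_assert(!usesPin(kDisplay1, 3) && !usesPin(kDisplay2, 3), "pin 3 is reserved");
static_assert(!usesPin(kDisplay1, 17) && !usesPin(kDisplay2, 17), "pin 17 is reserved");
```

    If a jumper changes one of the defaults, editing the table makes the compiler re-check the constraints.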
    Last edited by MichaelMeissner; 09-10-2022 at 08:05 PM.

  4. #4
    Oh wow, this looks like a fun project, thanks for bringing it to my attention. I might order a couple of these and see how it goes. I don't suppose anyone's tried adding a camera and face detection via e.g. OpenCV so the eyes can follow people's faces? Imagine a costume/mask with a couple of these eyes out on stalks covered with these that could look at people...

  5. #5
    Senior Member+ MichaelMeissner's Avatar
    Quote Originally Posted by chris.nz View Post
    Oh wow, this looks like a fun project, thanks for bringing it to my attention. I might order a couple of these and see how it goes. I don't suppose anyone's tried adding a camera and face detection via e.g. OpenCV so the eyes can follow people's faces? Imagine a costume/mask with a couple of these eyes out on stalks covered with these that could look at people...
    The Adafruit learning guide has a how-to on adding PIR (infrared heat sensor) support to the original uncanny eyes (that supported the Teensy 3.2) to build a skull where the eyes track you:


    And as I've said elsewhere, now that we have the MTP support, it would be nice if we could retrofit the new code (that uses flash memory to hold the bitmaps to be used, and you can change which eye to use without recompiling it). The trouble is the new code changed how the display is updated to use the M4 zero DMA support, and for the Teensy 4.x support, I would imagine the DMA is quite different.

  6. #6
    Quote Originally Posted by MichaelMeissner View Post
    @dundakitty: Sounds like a reasonable optimization for the driver (I assume you did it in the driver and not in the uncanny eyes code). You could use less memory by instead of using a single byte for the boolean for each row, use an uint8_t variable and do shifting/masking to make each byte hold 8 rows modified status.

    Of course it depends how much each row changes from display to display.
    ...
    My project is a Literary Clock, see https://www.youtube.com/watch?v=YcSkkDHdfg0
    On Halloween there is a one-in-ten chance that the uncanny eyes code is activated. The text is drawn first, then a few seconds later a skull is drawn in the center of the display and two eyes are animated for the remainder of the minute. The screen is then cleared and a new literary quote is displayed, possibly followed by the skull.

    The optimization is in the uncanny eye code, not the RA8876 driver. This allows the optimization to run separately per eye, as each eye is only updating a portion of the 1024 x 600 10" display.
    The following code should look familiar:
    Code:
    typedef struct {        // Struct is defined before including config.h --
      int8_t  wink;         // and wink button (or -1 if none) specified there,
      uint8_t rotation;     // also display rotation.
      int16_t x_off;
      int16_t y_off;
    } eyeInfo_t;
    
    #include "eye_config.h"
    #define NUM_EYES (sizeof eyeInfo / sizeof eyeInfo[0])
    
    // A simple state machine is used to control eye blinks/winks:
    #define NOBLINK 0       // Not currently engaged in a blink
    #define ENBLINK 1       // Eyelid is currently closing
    #define DEBLINK 2       // Eyelid is currently opening
    typedef struct {
      uint8_t  state;       // NOBLINK/ENBLINK/DEBLINK
      uint32_t duration;    // Duration of blink state (micros)
      uint32_t startTime;   // Time (micros) of last state change
    } eyeBlink;
    
    struct {                // One-per-eye structure
      eyeBlink     blink;   // Current blink/wink state
    } eye[NUM_EYES];
    
    boolean lineNotBlank[NUM_EYES][SCREEN_HEIGHT];
    
    void drawEye( // Renders one eye.  Inputs must be pre-clipped & valid.
      uint8_t  e,       // Eye array index; 0 or 1 for left/right
      uint16_t iScale,  // Scale factor for iris (0-1023)
      uint16_t  scleraX, // First pixel X offset into sclera image
      uint16_t  scleraY, // First pixel Y offset into sclera image
      uint8_t  uT,      // Upper eyelid threshold value
      uint8_t  lT) {    // Lower eyelid threshold value
    
      uint8_t  screenX, screenY, er;
      boolean notBlank;
      uint16_t scleraXsave;
      int16_t  irisX, irisY;
      uint16_t p, a;
      uint32_t d;
    
      uint16_t colors[2][SCREEN_WIDTH];
      uint16_t *bp;
    
      uint32_t irisThreshold = (SCREEN_WIDTH * (1023 - iScale) + 512) / 1024;
      uint32_t irisScale     = IRIS_MAP_HEIGHT * 65536 / irisThreshold;
    
      // Set up raw pixel dump to entire screen.  Although such writes can wrap
      // around automatically from end of rect back to beginning, the region is
      // reset on each frame here in case of an SPI glitch.
    
      scleraXsave = scleraX; // Save initial X value to reset on each line
      irisY       = scleraY - (SCLERA_HEIGHT - IRIS_HEIGHT) / 2;
      er = eyeInfo[e].rotation;
      for (screenY = 0; screenY < SCREEN_HEIGHT; screenY++, scleraY++, irisY++) {
        notBlank = false; // Reset for every scanline so each row is judged on its own pixels
        bp = colors[screenY & 1];
        scleraX = scleraXsave;
        irisX   = scleraXsave - (SCLERA_WIDTH - IRIS_WIDTH) / 2;
        for (screenX = 0; screenX < SCREEN_WIDTH; screenX++, scleraX++, irisX++) {
          if ((lower[screenY][screenX] <= lT) ||
              (upper[screenY][screenX] <= uT)) {             // Covered by eyelid
            p = 0;
          } else if ((irisY < 0) || (irisY >= IRIS_HEIGHT) ||
                     (irisX < 0) || (irisX >= IRIS_WIDTH)) { // In sclera
            p = sclera[scleraY][scleraX];
          } else {                                          // Maybe iris...
            p = polar[irisY][irisX];                        // Polar angle/dist
            d = p & 0x7F;                                   // Distance from edge (0-127)
            if (d < irisThreshold) {                        // Within scaled iris area
              d = d * irisScale / 65536;                    // d scaled to iris image height
              a = (IRIS_MAP_WIDTH * (p >> 7)) / 512;        // Angle (X)
              p = iris[d][a];                               // Pixel = iris
            } else {                                        // Not in iris
              p = sclera[scleraY][scleraX];                 // Pixel = sclera
            }
          }
    
          if (er != 0) {
            bp[(SCREEN_WIDTH-1)-screenX] = p;
          } else {
            bp[screenX] = p;
          }
          if (p != 0) notBlank = true;
        } // end column
        if (notBlank || lineNotBlank[e][screenY]) {
          while(tft.activeDMA) delayMicroseconds(20);
          tft.writeRect(eyeInfo[e].x_off, eyeInfo[e].y_off+screenY, SCREEN_WIDTH, 1, bp);
        }
        lineNotBlank[e][screenY] = notBlank;
      } // end scanline
    }
    I agree, if you're animating many eyes it would be more space efficient to use a uint8_t (or uint16_t if needed) instead of a boolean for lineNotBlank, then use a bitmask per eye (1<<eye).
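
    A minimal sketch of that per-eye bitmask layout, assuming up to 8 eyes and 128 rows. The shouldSendRow() helper and the array name are hypothetical. It folds the "send if non-blank now or non-blank last time" test and the bookkeeping into one call.

```cpp
#include <cstdint>

constexpr int kScreenHeight = 128;  // assumed

// One byte per row; bit e holds the "not blank" state of eye e (up to 8 eyes).
static uint8_t lineNotBlankBits[kScreenHeight];

// Returns true if the row must be sent: it has non-zero pixels now, or it
// had them last time and so must be overwritten once with the blank row.
// Records the current state for the next frame.
bool shouldSendRow(uint8_t eye, int row, bool notBlank) {
  const uint8_t mask = static_cast<uint8_t>(1u << eye);
  const bool wasNotBlank = (lineNotBlankBits[row] & mask) != 0;
  if (notBlank) {
    lineNotBlankBits[row] |= mask;
  } else {
    lineNotBlankBits[row] &= static_cast<uint8_t>(~mask);
  }
  return notBlank || wasNotBlank;
}
```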

  7. #7
    Senior Member+ MichaelMeissner's Avatar
    For the record, I paid for the round eyes from the Canadian supplier on Thursday September 1st, and they arrived today (Monday September 12th). It might have shaved a few days off the delivery if there had been a USA supplier, but it still beats the average Asian suppliers in terms of delivery speed.

    Both eyes look great and work well. For debugging, it is nice that the eyes have 7 male pins that I can plug directly into a breadboard with 36 rows of pins. Ultimately of course, I will need to use cables to install the displays, but it is nice for debugging to keep things all together. The Waveshare board had a 2mm connector on the board, and 8 separate female 0.1" pins. From a mounting perspective, it is better to have two holes in the new display that I can easily drill through, rather than having M2 screws that I have to line up when drilling the mounting holes.

  8. #8
    Senior Member+ MichaelMeissner's Avatar

    And my luck is continuing. I now have moved Uncanny Eyes code into a separate library that supports both the ST7789 (240x240 square display) and GC9A01A (240x240 round display). After I got the 2nd Adafruit display and I verified both of my Adafruit displays now work, I dug out the 2 non-CS boards that I had and I attached them, changing the CS to be -1.

    One display works great. The other display corrupts the bottom 1/2 of the screen. But with the working display, I verified that it does work also.

    So I now have separate .ino sketches that pull in the library:

    • 2 GC9A01A round displays;
    • 1 GC9A01A round display;
    • 2 ST7789 square displays, both with a CS pin;
    • 2 ST7789 square displays, one with a CS pin, one without;
    • 2 ST7789 square displays, both without a CS pin;
    • 1 ST7789 square display with a CS pin; (and)
    • 1 ST7789 square display without a CS pin.


    I discovered that the arguments to GC9A01A's displayType constructor are in a different order to the ST7789 displayType constructor. This means if you call the ST7789 class constructor with GC9A01A argument order, it will crash when initializing the displays:

    Code:
      // Initialize eye objects based on eyeInfo list in config.h:
      for (e = 0; e < NUM_EYES; e++) {
        Serial.print("Create display #"); Serial.println(e);
    #if USE_GC9A01A
        //eye[e].display     = new displayType(&TFT_SPI, eyeInfo[e].cs,
        //                       DISPLAY_DC, -1);
        // GC9A01A argument order:
        //(TFT_CS, TFT_DC, TFT_RST, TFT_MOSI, TFT_SCLK);
        eye[e].display = new displayType(eyeInfo[e].cs, eyeInfo[e].dc, eyeInfo[e].rst,
                                         eyeInfo[e].mosi, eyeInfo[e].sck);
    #endif
    
    #if USE_ST7789
        //eye[e].display     = new displayType(&TFT_SPI, eyeInfo[e].cs,
        //                       DISPLAY_DC, -1);
        // ST7789 argument order:
        //(TFT_CS, TFT_DC, TFT_MOSI, TFT_SCLK, TFT_RST);
        eye[e].display = new displayType(eyeInfo[e].cs, eyeInfo[e].dc,
                                         eyeInfo[e].mosi, eyeInfo[e].sck, eyeInfo[e].rst);
    #endif
    
        // ...
      }
    Similarly, how you start up the display is different:

    Code:
      // After all-displays reset, now call init/begin func for each display:
      for (e = 0; e < NUM_EYES; e++) {
    
    #if USE_GC9A01A
        eye[e].display->begin();
        Serial.printf("Init GC9A01A display #%d, rotation %d\n",
    		  e, eyeInfo[e].rotation);
    #endif
    
    #if USE_ST7789
        // Try to handle the ST7789 displays without CS PINS.
        if (eyeInfo[e].cs < 0)
          eye[e].display->init(240, 240, SPI_MODE2);
        else
          eye[e].display->init();
    
        Serial.printf("Init ST7789 display #%d, rotation %d\n",
    		  e, eyeInfo[e].rotation);
    #endif
    
        eye[e].display->setRotation(eyeInfo[e].rotation);
      }
      Serial.println("done");
    Originally in creating the library, I moved the main parts of the code into separate .cpp files. But I discovered that doesn't work: the linker reports duplicate definitions for a lot of the font functions, because each of the drivers wants to pull in the library functions, and each has the same name. Of course, I'm probably the one person crazy enough to want two separate displays with different drivers in one program. So I just made the code into a .h file, and the .ino sketch uses #include to bring it in.

    One other thing I learned (which is obvious once I thought about it): originally, I had all of the constants (sizes, pin numbers, etc.) declared as 'extern const' instead of just using '#define' or a normal const, so that I could set these in the .ino file and call the library. But that makes the code a little slower, since the compiler can't do constant optimization on values it cannot see. So when I moved the drivers back to being in .h files, and removed the 'extern const' declarations, it ran a little faster (by 1-2 fps).
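
    To illustrate the difference, here is a small sketch with hypothetical names (it is not the library's actual header). With 'extern const' the code being compiled only sees a declaration, so the value cannot be folded into expressions unless link-time optimization happens to do it. With the definition visible in the header, it can.

```cpp
#include <cstdint>

// Variant A (the slower form): only a declaration is visible here, the
// value itself lives in another file such as the .ino.
//   extern const uint16_t SCREEN_WIDTH;

// Variant B: the definition is visible wherever the header is included,
// so the compiler can treat it as a compile-time constant.
constexpr uint16_t kScreenWidth = 240;
constexpr uint16_t kScreenHeight = 240;
constexpr uint32_t kPixelsPerFrame = uint32_t(kScreenWidth) * kScreenHeight;

// Only possible because the values are known at compile time.
static_assert(kPixelsPerFrame == 57600, "240x240 frame");

// The multiply by kScreenWidth uses a known constant, which the compiler
// is free to strength-reduce or fold into surrounding address arithmetic.
uint32_t pixelIndex(uint16_t x, uint16_t y) {
  return uint32_t(y) * kScreenWidth + x;
}
```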

    The GC9A01A display is faster than the ST7789 display. I suspect this is due to the GC9A01A driver not sending pixels that won't be displayed.

    Another thing that I noticed is that if you run 2 eyes, the fps rate is about double that with 1 eye. That is because on the Teensy 4.x the code uses asynchronous updates with DMA, and the Teensy can go and work on setting up the second eye display while the data is still being transferred to the first display. With a single eye, it has to wait for the eye to finish before returning from loop.

  9. #9
    Have you tried the "skip blank scanline" optimization? I'm curious what speed up you see.

  10. #10
    Senior Member+ MichaelMeissner's Avatar
    Quote Originally Posted by dundakitty View Post
    Have you tried the "skip blank scanline" optimization? I'm curious what speed up you see.
    No, unless the ST7789_t3 or GC9A01A libraries do it. I haven't really dug into the code, I've mostly just been re-packaging it so that I could more easily change which pins are used, etc. The code I'm using right now just does a drawpixel and lets the library handle updating the display via DMA. The original Uncanny Eyes code for the 128x128 displays did use the lower level details to optimize things via SPI.

    Do you have a pointer to the optimization?

    There is a complete rewrite of the code for the Adafruit M4 processors (specifically the Hallowing M4 and the Monster M4SK boards) that uses the M4 Zero DMA code to build the display. Each eye is handled separately, rather than doing both eyes, and waiting until the transfer is done. Ultimately, it would be nice if we could use the high level code, and possibly switch to the Teensy libraries. But I'm not sure I want to dive in at that level.

  11. #11
    The M4-specific code is much more complex than the original. The blank-scanline optimization helps during eye blinks. The optimization is already partially present in the M4 version, handling the upper eyelid. I'll look at adding additional code for the lower eyelid and for the case of eyelid color not black.

  12. #12
    Senior Member+ MichaelMeissner's Avatar
    Quote Originally Posted by dundakitty View Post
    The M4-specific code is much more complex than the original. The blank-scanline optimization helps during eye blinks. The optimization is already partially present in the M4 version, handling the upper eyelid. I'll look at adding additional code for the lower eyelid and for the case of eyelid color not black.
    Great, thanks.

  13. #13
    The screens I ordered arrived yesterday and they worked first go using the default set of pins in the uncanny eyes code. Very cool!

    [Attached image: PXL_20221002_145112180.jpg]

  14. #14
    Senior Member+ MichaelMeissner's Avatar
    Some timings:

    • Teensy 4.1 using the GC9A010A_t3 driver (round 240x240), 2 eyes: 42 frames/second. The SPI bus is 48 MHz. Bumping the SPI bus to 99 MHz did not change the fps.
    • Teensy 4.1 using the ST7789_t3 driver (square 240x240), 2 eyes: 36 frames/second. The SPI bus is 48 MHz.
    • Teensy 4.1 using the ST7735_t3 driver (square 128x128 TFT), 2 eyes: 6 frames/second. Note, this uses the newer code that doesn't do the special optimizations only available in the Teensy 3.x processors. It just uses the standard ST7735_t3 driver. It uses 2 SPI buses. I had to disable USE_ASYNC_UPDATES, since it doesn't seem to work with the ST7735 driver.
    • Teensy 3.5 using the original uncanny Eyes with the built-in optimizations instead of the library driver (square 128x128 OLED), 2 eyes: 37 frames/second. Note, the OLED displays will start glitching if I use a SPI bus speed over 11 MHz.
    • Teensy 3.2 using the original uncanny Eyes with the built-in optimizations instead of the library driver (square 128x128 TFT), 2 eyes: 60 frames/second. SPI bus speed is set to 23 MHz.
    • I don't (yet) have a version of SSD1351 that uses the library instead of the built-in optimizations, so I can't use my new combined code with it.


    Note, I suspect the reason the GC9A010A_t3 driver is somewhat faster than the ST7789_t3 driver may be that the library is smart and does not transfer pixels that the display does not have. But this is a guess; I haven't investigated.

    The 128x128 displays might be faster than the 240x240 displays, since the 240x240 displays have 3.5 times more pixels than the 128x128 displays, so it has to send more data down the SPI bus.
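
    As a rough check on the pixel-count argument, the sketch below computes the frame sizes and the raw SPI time for one full frame. It assumes 16 bits per pixel (the eye code uses uint16_t colors) and ignores command, chip-select, and DMA setup overhead, so the result is a lower bound and not a prediction of the measured fps. The function names are made up for illustration.

```cpp
#include <cstdint>

// Pixels in one full frame.
constexpr uint32_t pixelCount(uint32_t w, uint32_t h) { return w * h; }

// Raw time in milliseconds to clock one full frame over SPI, assuming
// 16 bits per pixel and no protocol overhead.
constexpr double frameTransferMs(uint32_t w, uint32_t h, double spiHz) {
  return pixelCount(w, h) * 16.0 / spiHz * 1000.0;
}

static_assert(pixelCount(240, 240) == 57600, "240x240 frame");
static_assert(pixelCount(128, 128) == 16384, "128x128 frame");

// 57600 / 16384 = 3.515625, the "3.5 times more pixels" figure.
constexpr double kPixelRatio =
    double(pixelCount(240, 240)) / double(pixelCount(128, 128));
```

    For example, frameTransferMs(240, 240, 48e6) comes to 19.2 ms for one 240x240 frame at 48 MHz.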

    At least with the 5 year old TFT 128x128 displays that I have, you pretty much have to look at the display directly, or you can't see the image. The OLED 128x128 and the 240x240 displays can be seen at a much wider angle than the TFT 128x128 display can be seen.

    The OLED 128x128 vs TFT 128x128 display shows how much the SPI bus speed can affect things (11 MHz vs. 23 MHz).

    I need to build the variant of the code that uses the ST7735_t3 driver on the Teensy 3.2, so that I can compare the library against the original code that writes to the screen directly. On a Teensy 3.x microprocessor, the ST7735_t3 library knows how to use the special CS/DC pins that the Teensy 3.x microprocessors have. Unfortunately, the current code I have overflows the flash memory of the Teensy 3.2 and 3.5. I suspect some of the larger sections for the 240x240 displays aren't garbage collected as being unused.

    I don't believe the ST7735_t3 library on the Teensy 4.x uses the special pins (pins 10, 37, and 36 on the T4.1 first SPI bus, 0 and 38 on the 2nd SPI bus). Even if it does, the DC or CS in my setup doesn't use the special pins. I am currently using CS = 22, DC = 9 on the first SPI display, and CS = 0, DC = 24 on the second SPI display.

    Alternatively, I need to experiment with not using the special CS pins on the Teensy 3.x, to see whether the slowdown comes from losing that optimization.
    Last edited by MichaelMeissner; 10-03-2022 at 08:01 AM.

  15. #15
    Quote Originally Posted by MichaelMeissner View Post
    Teensy 4.1 using the GC9A010A_t3 driver (round 240x240), 2 eyes: 42 frames/second.
    Does this mean 42 fps per screen, or 42 fps in total, 21 fps on each screen? I'm seeing around 46 fps across my two GC9A010A screens here, i.e. ~23 fps per screen.

    I've started thinking about a few optimisations, currently not for speed but for space (but I hope to look at speed at some point). The first thing I've tried is to change the eyelid data format so it just stores the start/stop row of each eyelid column rather than a full greyscale map. The code is here (proof of concept, not well tested and only supporting 240x240 defaultEye currently). This is similar to what the M4_Eyes code does, only I've pregenerated a lookup table in Python rather than at runtime like the M4 code does. This change results in defaultEye.h being about 40% smaller, without any performance impact. I'm not likely to try getting this working with any of the 128x128 eyes since I don't have a screen to test them with (and I'm not especially interested in using 128x128 anyway), but the space savings should be similar if implemented for those.
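
    To make the start/stop idea concrete, here is one possible shape for such a table. This is a guess at the general technique and not the format used in the linked code: the struct, the field names, and the linear interpolation between the open and closed rows are all assumptions.

```cpp
#include <cstdint>

// Hypothetical per-column record for the upper eyelid: the row where the
// lid edge sits with the eye fully open and with the eye fully closed.
struct EyelidColumn {
  uint8_t openRow;
  uint8_t closedRow;
};

// Row of the lid edge in this column for a closure amount of 0 (open)
// to 255 (closed), interpolated linearly between the two stored rows.
int lidEdgeRow(const EyelidColumn &c, uint8_t closure) {
  return c.openRow + ((c.closedRow - c.openRow) * closure) / 255;
}

// A pixel is hidden by the upper lid if it lies above the lid edge.
bool coveredByUpperLid(const EyelidColumn &c, int y, uint8_t closure) {
  return y < lidEdgeRow(c, closure);
}
```

    With this layout a 240-column eyelid needs 2 bytes per column, compared with one byte per pixel for a full 240x240 threshold map.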

    I'm also considering converting all the other 240x240 M4_Eyes over to work with this codebase, adding some of the M4 features in the process. The M4 generates all its lookup tables on-the-fly when loading an eye, but I'm leaning more towards a mix of pre-processing and on-the-fly generation to try and get a better balance of flash/memory requirements vs runtime CPU. It might mean creating an optimised data/file format to store eyes in, ultimately so they can be easily and quickly be loaded dynamically. Before I go too far down this route though, I'd be interested to hear if you have any view on this or have maybe done some related work already?

  16. #16
    Senior Member+ MichaelMeissner's Avatar

    Quote Originally Posted by chris.nz View Post
    Does this mean 42 fps per screen, or 42 fps in total, 21 fps on each screen? I'm seeing around 46 fps across my two GC9A010A screens here, i.e. ~23 fps per screen.
    Yes and no. When you are doing 2 eyes, it does both eyes with frame buffers and starts the transfer. At the end, it waits for both eyes to finish before returning, and bumping the frame count as two. If you are only doing one eye, then the fps is 1/2 since it does the eye, and it has to wait for the transfer to finish.

    Quote Originally Posted by chris.nz View Post
    I've started thinking about a few optimisations, currently not for speed but for space (but I hope to look at speed at some point). The first thing I've tried is to change the eyelid data format so it just stores the start/stop row of each eyelid column rather than a full greyscale map. The code is here (proof of concept, not well tested and only supporting 240x240 defaultEye currently). This is similar to what the M4_Eyes code does, only I've pregenerated a lookup table in Python rather than at runtime like the M4 code does. This change results in defaultEye.h being about 40% smaller, without any performance impact. I'm not likely to try getting this working with any of the 128x128 eyes since I don't have a screen to test them with (and I'm not especially interested in using 128x128 anyway), but the space savings should be similar if implemented for those.
    Fair enough. Of course with Teensy 3.2, 3.5, and 3.6 being hard to get, and the 128x128 displays being older tech, they are somewhat less interesting. However, they do have a bunch of alternate eyes that are available.

    Quote Originally Posted by chris.nz View Post
    I'm also considering converting all the other 240x240 M4_Eyes over to work with this codebase, adding some of the M4 features in the process. The M4 generates all its lookup tables on-the-fly when loading an eye, but I'm leaning more towards a mix of pre-processing and on-the-fly generation to try and get a better balance of flash/memory requirements vs runtime CPU. It might mean creating an optimised data/file format to store eyes in, ultimately so they can be easily and quickly be loaded dynamically. Before I go too far down this route though, I'd be interested to hear if you have any view on this or have maybe done some related work already?
    Sounds interesting. So far, I haven't really looked at the code to actually display the eye. Besides having other eyes, I do like the ability on the M4 eyes system to change the eyes being displayed without having to rebuild the code. For example, I recently built a dragon staff using a dragon eye cane from Spirit Halloween, and a Hallowing M4. It had a red demon eye, and I wanted a green demon eye. Because the M4 eye code just reads BMP files and a text config.eye file, I was able to go in with gimp, and change the BMP from being red based to being green based, and fiddle with the colors in the config.eye file, all without having to convert the file to a header file to include in the program and rebuild it.

    I do wish the M4 code had an option to cycle through eyes under button control. I understand why they did it (basically the M4 doesn't have space to hold more than 1 eye pattern, and if you do it on the fly, you risk fragmenting the local memory). But on a Teensy 4.1, we have a lot more space for local SRAM, flash memory, and the ability to use the remainder of flash as a file system.

    [Attached image: 2022-09-11-17-57-023-staff.jpg]

    The new round eyes can add new dimensions to how eyes are used in props.

  17. #17
    Quote Originally Posted by MichaelMeissner View Post
    Yes and no. When you are doing 2 eyes, it does both eyes with frame buffers and starts the transfer. At the end, it waits for both eyes to finish before returning, and bumping the frame count as two. If you are only doing one eye, then the fps is 1/2 since it does the eye, and it has to wait for the transfer to finish.
    It sounds like the performance I'm seeing is similar to yours then, thanks for clarifying.

    Quote Originally Posted by MichaelMeissner View Post
    However, they do have a bunch of alternate eyes that are available.
    That's kinda how I ended up where I am - I started by trying to convert the existing 128x128 eyes to 240x240. Some were easy (doe, newt, terminator) but the others were proving quite tricky. Then I saw all the M4 240x240 eyes so started to look into using those too/instead.

    Quote Originally Posted by MichaelMeissner View Post
    Besides having other eyes, I do like the ability on the M4 eyes system to change the eyes being displayed without having to rebuild the code.
    Yes, that's where I want to get to too. Initially I'll probably just try to support multiple eyes compiled into the binary that can be swapped at runtime, but from there it shouldn't be too hard to add support for loading the same sort of (precomputed) eye data from flash/SD. The precomputing could ultimately be done by either the PC or Teensy. If done on the Teensy the results could be cached on SD so the work only needs to happen once per eye, rather than every time the eye is loaded (which I think is the case with the M4 code). Lots of possibilities...

  18. #18
    Senior Member+ MichaelMeissner's Avatar
    Join Date
    Nov 2012
    Location
    Ayer Massachussetts
    Posts
    4,325
    Quote Originally Posted by chris.nz View Post
    Yes, that's where I want to get to too. Initially I'll probably just try to support multiple eyes compiled into the binary that can be swapped at runtime, but from there it shouldn't be too hard to add support for loading the same sort of (precomputed) eye data from flash/SD. The precomputing could ultimately be done by either the PC or Teensy. If done on the Teensy the results could be cached on SD so the work only needs to happen once per eye, rather than every time the eye is loaded (which I think is the case with the M4 code). Lots of possibilities...
    Given you only read the eye images from storage once when the microprocessor is booted, and it doesn't take that long, I'm not convinced that trying to optimize the loading will mean the code runs faster during the main loop. Changing the internal data structures to be more efficient might help, since that is run every time the loop is done. Optimizing the display so less data is transmitted via SPI would presumably help quite a bit.

    If you need to hold the entire frame in memory, that would be about 113 kilobytes (240 * 240 * 2 = 115,200 bytes). Given the Teensy 4.x has 512 kilobytes of main SRAM, that is somewhat tight. You could only hold 1 or 2 full frames. And of course the M4 processors the code is written for don't have that much memory, so I believe the code has to process things in chunks, and it only deals with the area of the screen that changes (i.e. the iris).
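
    As a rough check of that arithmetic, here is a minimal sketch in plain C++ (it builds on a desktop compiler as well as on a Teensy). The function names are made up for illustration and are not taken from the eye code. It assumes 16-bit RGB565 pixels, which is where the "* 2" comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed for one full frame of 16-bit (RGB565) pixels.
std::size_t frameBytes(std::size_t width, std::size_t height) {
    return width * height * sizeof(std::uint16_t);
}

// Upper bound on how many whole frames fit in a RAM budget (in bytes).
// This ignores everything else that lives in RAM (stack, variables, code
// copied to RAM), so the practical number is lower.
std::size_t framesThatFit(std::size_t ramBytes, std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    return ramBytes / frameBytes(width, height);
}
```

    For a 240x240 display, frameBytes(240, 240) is 115,200 bytes, and framesThatFit(512 * 1024, 240, 240) comes out at 4. That is only the raw division; once the rest of the program takes its share of the RAM, 1 or 2 full frames is the more realistic figure.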

  19. #19
    Quote Originally Posted by dundakitty View Post
    Have you tried the "skip blank scanline" optimization? I'm curious what speed up you see.
    The M4 code draws one column of pixels at a time, rather than row by row, so their logic ends up somewhat convoluted. I just implemented a similar optimisation however. I keep track of where the upper/lower eyelids extend to for each column, and only call drawPixel() if the pixel wasn't already an eyelid pixel on the previous frame. This reduces the drawPixel() calls by about half or more. The GC9A01A driver (when in framebuffer mode anyway) also keeps track of the min/max extents of the buffer that have been affected (see GC9A01A_t3n::updateChangedRange()). It then uses that to optimise what is output when updateScreen() is called, skipping the unchanged rows. The end result is a speedup of about 10%; I'm now getting around 52 fps.
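
    To illustrate the general idea (not the actual GC9A01A_t3n code), here is a simplified sketch of a framebuffer that skips redundant pixel writes and remembers the min/max rows touched since the last update. The TrackedFrame class and its methods are hypothetical. It also compares pixel values rather than tracking eyelid extents per column, so it is a stand-in for the technique described above rather than a copy of it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical framebuffer wrapper that tracks which rows changed.
class TrackedFrame {
public:
    TrackedFrame(int width, int height)
        : width_(width), height_(height),
          pixels_(static_cast<std::size_t>(width) * height, 0) {
        clearDirty();
    }

    // Write a pixel only if its value changes, and widen the dirty row range.
    void drawPixel(int x, int y, std::uint16_t colour) {
        assert(x >= 0 && x < width_ && y >= 0 && y < height_);
        std::uint16_t &p = pixels_[static_cast<std::size_t>(y) * width_ + x];
        if (p == colour) return;  // unchanged since last frame, nothing to send
        p = colour;
        minRow_ = std::min(minRow_, y);
        maxRow_ = std::max(maxRow_, y);
    }

    // Number of rows that would need to go out over SPI for this frame.
    int dirtyRows() const { return maxRow_ < minRow_ ? 0 : maxRow_ - minRow_ + 1; }

    // Call once the dirty rows have been pushed to the display.
    void clearDirty() { minRow_ = height_; maxRow_ = -1; }

private:
    int width_, height_;
    std::vector<std::uint16_t> pixels_;
    int minRow_, maxRow_;
};
```

    With a frame like this, an update routine would only transmit rows minRow_ through maxRow_ and then call clearDirty(). A frame in which nothing changed would send nothing at all.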

  20. #20
    Quote Originally Posted by MichaelMeissner View Post
    Given you only read the eye images from storage once when the microprocessor is booted, and it doesn't take that long, I'm not convinced that trying to optimize the loading will mean the code runs faster during the main loop.
    It's not so much the storage loading time I think will need improving, it's the various transformations required. E.g. look at the comment on the bottom of this page: "Note that using slitPupilRadius makes the program a bit slower to initialize… you’ll just see blank screens for several seconds while it works. This is normal and just an unfortunate math thing".

    Quote Originally Posted by MichaelMeissner View Post
    Changing the internal data structures to be more efficient might help, since that is run every time the loop is done.
    Exactly. Like many optimisations, it comes down to a tradeoff between space (storage and memory) vs CPU.

    Quote Originally Posted by MichaelMeissner View Post
    Optimizing the display so less data is transmitted via SPI would presumably help quite a bit.
    See my previous post for an initial attempt at this.
    Last edited by chris.nz; 10-04-2022 at 04:54 PM.

  21. #21
    It's still very much a work-in-progress, but I've got a proof-of-concept working that demonstrates hot-swapping eyes at runtime:

    https://youtu.be/8gHoKqfpp0w

    There's still lots I need to do before this is genuinely useable, though that's mostly to do with configuring and precomputing eyes with all their different options, loading them from SD etc, rather than the hotswapping itself.

  22. #22
    Senior Member+ MichaelMeissner's Avatar
    Join Date
    Nov 2012
    Location
    Ayer Massachussetts
    Posts
    4,325
    Quote Originally Posted by chris.nz View Post
    It's still very much a work-in-progress, but I've got a proof-of-concept working that demonstrates hot-swapping eyes at runtime:

    https://youtu.be/8gHoKqfpp0w

    There's still lots I need to do before this is genuinely useable, though that's mostly to do with configuring and precomputing eyes with all their different options, loading them from SD etc, rather than the hotswapping itself.
    Cool! I can imagine that second eye pattern being effective with some halloween props.

  23. #23
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    16,479
    Quote Originally Posted by MichaelMeissner View Post
    Cool! I can imagine that second eye pattern being effective with some halloween props.
    Indeed, like movies/TV where a character's eyes go from normal - to demonic

  24. #24
    Quote Originally Posted by defragster View Post
    Indeed, like movies/TV where a character's eyes go from normal - to demonic
    That eye is just the built in "newt" eye from Uncanny Eyes, which I believe originated here. But yes, that eye definitely has something of a Halloween feel to it. One thing I'm planning on trying out at some point is cross-fading between two eyes as I think that might look pretty interesting (or disturbing!). If both eyes have the same mappings (as these two do) then the performance won't be impacted much as it really just comes down to a lerp between the two chosen colours which is very cheap on CPU.
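
    For anyone curious what that lerp might look like, here is a minimal sketch assuming 16-bit RGB565 pixels (5 bits red, 6 bits green, 5 bits blue). The lerp565 name and the 0..256 blend scale are my own choices for illustration, not something from the Uncanny Eyes or M4 code.

```cpp
#include <cassert>
#include <cstdint>

// Blend two RGB565 colours. t = 0 returns a, t = 256 returns b.
std::uint16_t lerp565(std::uint16_t a, std::uint16_t b, int t) {
    assert(t >= 0 && t <= 256);
    // Split each colour into its red, green and blue channels.
    const int ar = (a >> 11) & 0x1F, ag = (a >> 5) & 0x3F, ab = a & 0x1F;
    const int br = (b >> 11) & 0x1F, bg = (b >> 5) & 0x3F, bb = b & 0x1F;
    // Weighted average per channel, integer maths only (no floats, no divide).
    const int r = (ar * (256 - t) + br * t) >> 8;
    const int g = (ag * (256 - t) + bg * t) >> 8;
    const int bl = (ab * (256 - t) + bb * t) >> 8;
    // Pack the channels back into one 16-bit pixel.
    return static_cast<std::uint16_t>((r << 11) | (g << 5) | bl);
}
```

    The per-pixel cost is a handful of shifts, multiplies and adds, so sweeping t from 0 to 256 over a number of frames should give a cross-fade without much impact on frame rate, provided both eyes share the same mappings.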

  25. #25
    Senior Member+ MichaelMeissner's Avatar
    Join Date
    Nov 2012
    Location
    Ayer Massachussetts
    Posts
    4,325
    Hmmm, what I need to do is solder up another Audio adapter to see if I can add sound and/or connect an external I2S device.

    I suspect we will need to add some yield calls to get the interrupts handled for sound playback.

    A simpler solution is to add a dfplayer chip to do sound outside of the Teensy.

    <edit>
    One of my Teensy 4.1's had the stacking headers installed, so I could use the audio adapter I had already soldered up (*). I put in some quick code to play some Halloween mono audio sounds I had converted from:


    via 'wav2sketch' to .h/.cpp files. It seemed to work well, but there was some dead air time. I suspect I need to dive into the eye drivers to see if there is a convenient place to do a user callback so it can immediately know when to start playing the next audio sample.

    I also discovered that the way I have built these audio files will get a linker error (DTCM size exceeded) on the Teensy 4.1 (they did all fit on the Teensy 3.5 back in the day), so I need to rework the files to put them in flash memory or similar.
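
    One possible way to do that rework, as far as I know, is to mark the sample arrays with PROGMEM, since on the Teensy 4.x a plain const array is still placed in RAM1 unless it is tagged that way. The sketch below is only an illustration: demoSample and its values are dummies standing in for a wav2sketch-generated array, and the #ifndef fallback is there only so the snippet also compiles on a desktop compiler.

```cpp
#include <cassert>
#include <cstddef>

// PROGMEM is provided by the Teensy/Arduino core. This fallback is a no-op
// for desktop builds only.
#ifndef PROGMEM
#define PROGMEM
#endif

// Hypothetical stand-in for a generated sample array. The values are
// dummies, not real audio data.
PROGMEM const unsigned int demoSample[4] = { 0u, 1u, 2u, 3u };

// Number of elements in the sample array.
std::size_t demoSampleWords() {
    return sizeof(demoSample) / sizeof(demoSample[0]);
}

// Bounds-checked read. On the Teensy 4.x flash is memory mapped, so a
// normal array access is enough.
unsigned int demoSampleAt(std::size_t i) {
    assert(i < demoSampleWords());
    return demoSample[i];
}
```

    The array is still read through an ordinary pointer, so code that takes a const unsigned int * should not need to change. The trade-off is that reads from flash are slower than reads from tightly coupled RAM.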

    But the real solution is to move the audio files to an SD card and/or LittleFS flash memory rather than keeping them in the local program. That would allow me eventually to use stereo files and vary them without having to rebuild.

    Getting back to the original posting, I have all of the Teensies with the eyes (Teensy 4.1 with the round 240x240, Teensy 4.1 with the square 240x240, Teensy 3.5 with the OLED 128x128, and Teensy 3.2 with the TFT 128x128) perched on top of the closed cover for my work laptop (I use external monitors and keyboards). The Teensy with the round display got knocked to the ground as I was adjusting one of the other Teensies, and a part came off of one of the displays. It is somewhat flimsy, but it works (for now). I suspect, however, that I will need to think about getting yet another of these devices, as it may not be robust.

    If you are curious, the last 2 Teensy 4.1's I got (that I was using for the two 240x240 displays), I bought from protosupplies with the PSRAM and flash memory chips soldered in. But the Teensies didn't come with stacking headers. So at some point, I need to solder up a shim so the audio shield sits on a prototype board that can mount on the adjacent pins in the breadboard.
    Last edited by MichaelMeissner; 10-07-2022 at 04:10 AM.
