SARCASM: an (over-engineered) Rubik's cube solving robot.

vindar

Hi everyone,

Here is a project I have been working on, on and off, since covid, and which I finally found some time to complete this summer... So I proudly present:

S.A.R.C.A.S.M.: a Slightly Annoying Rubik's Cube Automatic Solving Robot.

Here is a short video in action:


My goal was to create a "desk toy" I can play with when I get bored at work... :cool:

Its main features are:
  1. Compact: it does not take much desk space.
  2. Autonomous: battery powered with USB-C charging port (also used for flashing code), no internet or external computer required.
  3. Near instant power on/off: this ruled out using an rPi Zero for the mainboard...
  4. Compatible with any standard Rubik’s Cubes: no alteration of the cube is required (it works with hard-to-turn cubes, non-standard sticker colors...).
  5. Robust: it can withstand children mishandling the machine (e.g. interfering with the motors, misplacing the cube...) without risk of damage.
  6. Fun, perfectly useless, and completely over-engineered: many unnecessary gimmicks (rgb lights, audio, 3D graphics...). Well, as you can guess from the design, I am a Portal fan.
Below is another (longer) video of the machine showing some of its features.


Currently, there are over 900 dialogs and 100 songs loaded in (thanks to ChatGPT for helping create the dialogs!), so using it still feels fresh every time. I am quite pleased with the result (and my kids love to play with it).

From a technical point of view, I think this project really showcases the power of the Teensy 4.1. In particular, the T4.1 performs:
  • JPEG decoding and color analysis from cube images (taken by an ESPCAM camera).
  • Running the cube-solving algorithm (using my port of Kociemba's algorithm)
  • Driving the display with real-time 2D/3D graphics (using my graphics library TGX and my display driver library ILI9341_T4).
  • Running real-time audio speech synthesis (using my port of the espeak-ng library for T4).
  • Driving the audio chip via I2S (using PJRC's Audio lib).
  • Driving 2x servos and 1x stepper motor using custom drivers with precise real-time feedback monitoring.
  • Driving the RGBW led strip (using the OctoWS2811 library).

On the hardware side, I created a custom PCB and a 3D-printed enclosure (it's a tight fit!). On the software side, this means using many peripherals simultaneously: 2× SPI, 2× Serial, 1× I2C, 1× I2S, SDIO, 2× ADCs, many GPIOs... several interrupts/hardware timers/DMA channels, multithreading... and a lot of memory (980 KB RAM, 3 MB of EXTMEM, 7.7 MB flash and 400 MB on the SD card). I always like to use Teensies for my projects but, for once, using a T4.1 does not feel like overkill :)

Here are a few more photos from its inside (a bit outdated: cable routing was improved and the usb data line is now properly shielded).
im1-small.jpg

im2-small.jpg


im3-small.jpg



I will try to provide technical details on the build and the code in another post.
I may also make the github repo public at some point but it is currently too messy...

Cheers,
Arvind
 
That is an amazing project, and I thought that I was THE NURD. The software, hardware, 3D printing, and maybe some electrical design, all by one person, are inspiring. You even made it fun for your kids, which I could never do. You have inspired me to submit my automatic bicycle shifter that I have been fiddling with for ~10 years and never published. It is Teensy 3.2 based and will have 9000 miles on it in a week or so. Debugging it requires exercise, so that's good... But it still sucks, unlike your project.
 
Hi,

Thanks for all the kind words :) Here are some details about the build for those interested. It is a long post, so I am splitting it into two parts: one about the hardware and the other about the software.

Hardware

Enclosure:

It was designed with Fusion 360 and 3D-printed on an old Prusa MK3S in white, black, and red PETG (except for the rubber feet, which are TPU). It took a few iterations and a couple of filament spools to get it right. I tried to make the design as compact as possible while still making it easy to insert and remove the cube. One difficulty is that the ESPCam has to be far enough from the cube that it can still focus and capture a whole face. The tumbling mechanism also requires some space, but fortunately it can share the space already needed for the camera.

Electronics:
  1. Power supply.
  • Everything is powered from 2×18650 Li-ion batteries in series (so the voltage ranges from 6.4V to 8.4V).
  • The batteries are monitored by a 2S BMS and charged through a 1A/2S charger module (DDTCCRU) connected to the USB-C port on the back.
  • Charging is at 1A, so there is no need for a special power delivery scheme. It could go to 2A, but this is already fast enough (the machine works several hours with a single charge).
  • The mainboard PCB also contains 2 independent step-down regulators that convert the battery voltage down to 5V. One of them is used to power the servos and the other one for the rest of the electronics.
  • An INA219 voltage/current sensor (connected via I2C to the T4.1) is used to monitor voltage and current through the USB port. The T4.1 also has an analog pin dedicated to sensing the battery voltage (after a voltage divider and clamping by a Zener diode for safety).
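To give an idea of the battery-sensing path, here is a tiny sketch of converting the ADC reading back to a battery voltage. Note: the divider values and ADC resolution below are just assumptions for illustration, not the actual values from my board.

```cpp
#include <cstdint>
#include <cmath>

// Hypothetical divider: 10k over 3.3k, so Vpin = Vbat * 3.3/(10+3.3).
// A Zener across the bottom resistor clamps the pin for safety.
constexpr float R_TOP = 10000.0f;
constexpr float R_BOT = 3300.0f;
constexpr float VREF  = 3.3f;   // ADC reference voltage
constexpr int ADC_MAX = 1023;   // assuming 10-bit resolution

// Convert a raw ADC reading back to the battery voltage.
float batteryVolts(int adcRaw) {
    float vPin = (adcRaw * VREF) / ADC_MAX;
    return vPin * (R_TOP + R_BOT) / R_BOT;
}
```

With these assumed values, a fully charged 2S pack (8.4V) would read around 646 ADC counts.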
  2. Back panel.
  • A big momentary push-button on the back is used to turn on/off the device. It is connected to a Pololu "mini power switch board" (MOSFET-based) that controls all power going out of the batteries and into the machine.
  • The Pololu switch is also controlled by a GPIO of the T4.1 so that the machine can turn itself completely off via software.
  • The USB-C female port on the back is used both for charging and for flashing with the USB data lines connected to the T4.1.
  3. Stepper motor
  • The "cradle" that holds the cube is rotated by a stepper motor (17HM15-0904S, 400 steps/turn, 0.9 A per coil).
  • The stepper is driven by a low-voltage DRV8834 motor driver controlled by the T4.1.
  • An AS5600 magnetic rotary sensor is fixed on the back side of the stepper motor. It is used to monitor the absolute stepper position and is connected to the T4.1 via I2C.
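The AS5600 reports a 12-bit absolute angle (0..4095 counts per turn). A quick sketch of how raw readings can be turned into degrees, with a wrap-aware delta of the kind needed to detect someone spinning the cradle (function names are mine, for illustration only):

```cpp
#include <cstdint>
#include <cmath>

constexpr float COUNTS_PER_TURN = 4096.0f;  // AS5600 is 12-bit

// Convert a raw AS5600 reading into degrees.
float rawToDegrees(uint16_t raw) {
    return (raw % 4096) * 360.0f / COUNTS_PER_TURN;
}

// Signed smallest rotation from one angle to another, in degrees.
// Handles the 359° -> 1° wraparound correctly.
float angleDelta(float fromDeg, float toDeg) {
    return std::fmod(toDeg - fromDeg + 540.0f, 360.0f) - 180.0f;
}
```

Polling `angleDelta` against the last known position gives a clean "the cradle moved" signal regardless of wraparound.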
  4. Servos
  • 2 servos are used: one for the "grip" that holds the top face of the cube and the other to "tumble" the cube.
  • Both servos are identical cheap "9G micro servo" types, but with full metal gears and an additional 4th feedback wire: this is just a wire connected to the servo's internal potentiometer, so reading the analog voltage on this line provides an approximate position (though the signal is crude and noisy).
  • The servos are powered by one of the 5V regulators on the mainboard. Each servo may draw up to 1A (the regulator can provide up to 3A). The regulator for the servo is also controlled by a GPIO on the T4.1 so it can programmatically be turned on/off.
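The feedback wire is what makes stall detection possible: if the commanded position keeps moving but the measured position stops following, the grip has probably hit the cube. A toy version of that logic (thresholds and names are illustrative, not from my actual code):

```cpp
#include <cmath>

// Declare a stall when command and feedback disagree by more than a
// threshold for several consecutive samples.
struct StallDetector {
    float thresholdDeg;   // max allowed command/feedback gap
    int   maxLagSamples;  // consecutive bad samples before declaring a stall
    int   lagCount = 0;

    // Call once per feedback sample; returns true once a stall is detected.
    bool update(float commandedDeg, float measuredDeg) {
        if (std::fabs(commandedDeg - measuredDeg) > thresholdDeg)
            lagCount++;
        else
            lagCount = 0;
        return lagCount >= maxLagSamples;
    }
};
```

On a detected stall, the command can simply be frozen at the current position so the servo stops pushing.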
  5. Display.
  • This is just the usual Chinese red-board ILI9341 display in 2.4" format with the XPT2046 touchscreen. It is connected to the T4.1 on SPI1.
  • I created a simple PCB for the display that breaks out an IDC connector so that it can be connected to the mainboard by a flat ribbon cable.
  6. ESPCam/Camera.
  • Image acquisition is done using an ESPCam module (ESP32 + OV2640 camera).
  • The ESPCam is connected to the T4.1 via both Serial and SPI (with the ESP32 as master and the T4.1 as slave).
  • The T4.1 controls the ESPCam power through a high-side MOSFET on the mainboard as well as the reset and boot pins of the ESP32. Thus, the T4.1 has complete control over the ESPCam and can turn the module on/off and reprogram it.
  • I designed a simple PCB for the ESPCam which, again, breaks out an IDC connector so that the ESPCam can be cleanly connected to the mainboard by a ribbon cable.
  7. Lighting
  • 25 LEDs from an SK6812 RGBW LED strip (144 LEDs/m). I chose the SK6812 over the WS2812 because it pulses at a higher frequency (1.2 kHz), which improves camera image consistency.
  • The LED strip is controlled by the T4.1 GPIO via a 5V level shifter (SN74LV1T34) located on the mainboard.
  8. Audio
  • A small 3W/4ohms speaker is located at the front of the machine. It may not be the best-quality speaker, but I do not mind because I am looking for old-school/metallic sounds anyway.
  • The speaker is driven by a MAX98357A I2S audio chip located on the mainboard and connected to the T4.1 via I2S.

Mainboard:

I designed a PCB to connect all the components above. I am no expert in PCB design, so I went the easy way: I used EasyEDA to design the board and JLCPCB to fabricate it. I used their SMT assembly service for the small SMT components, so I only had to hand-solder the PTH ones... Honestly, I am really impressed by the cost/quality ratio offered by these Chinese companies. It may not be suitable for professionals, but for hobbyists such as myself, it is a godsend: I ordered 2 boards with SMT assembly and they worked perfectly out of the box...

I used a 4-layer PCB for the mainboard. I might have gotten away with a 2-layer board, but using 4 layers made things much cleaner and let me create proper ground and power planes. As already mentioned, the board itself contains two 5V/3A step-down regulators based on TI's TPS5430 chip. It was fun to learn how to lay out a step-down circuit. As I am a beginner, I stayed pretty close to the recommended design in the chip's datasheet. The PCB also contains the audio circuit based on the MAX98357A I2S audio chip. Again, I played it safe, and my design is heavily inspired by Adafruit's breakout board for this chip. Finally, the mainboard breaks out all the connectors to the display, ESPCam, and servos, and holds the Pololu switch module, the stepper driver, and the transistors/MOSFETs used to power on/off different parts of the circuitry.

Here is a photo of the boards.

boards.jpg
 

Software


The codebase is large and makes use of many libraries (several of which I created/ported especially for this project). All the development was done in Visual Studio with the Visual Micro extension. I prefer this setup over VS Code/PlatformIO or the Arduino IDE, and I think it is really good for large projects like this one.


1. Graphics:
  • The ILI9341 display is driven by my ILI9341_T4 library with the SPI clock set to 40 MHz. The driver uses an internal framebuffer in DMAMEM that mirrors the screen contents and uploads only the pixels that change between frames. This enables a high frame rate. On the technical side, the driver uses interrupt-driven DMA transfers, the SPI FIFO, and also requires one IntervalTimer object.
  • All graphics, 2D & 3D, are created using my TGX library. 3D graphics require a z-buffer (150KB) which is stored in DMAMEM.
  • The Rubik's Cube displayed on screen is composed of 26 cubelets. Each cubelet is a mesh with textured faces, so it can either display a single color or a full image on it. Furthermore, for the “idle” animation where the machine shows itself on the screen, I used .obj files obtained from Fusion 360 to create the 3D mesh of the machine. It consists of around 1500 triangles.
  • Fun detail: in the animation where the machine shows itself, the on-screen display recursively displays itself, etc... To get this effect, I reused the internal framebuffer (holding the currently displayed image) as the texture for the screen in the 3D mesh. I guess a similar trick is used in the Portal games for creating the recursive effect inside portals...
  • When the machine speaks, a robotic face appears in the background. While it speaks, the lips move and the eyes blink. This is done by using multiple images of the robot (stored in flash).
  • Lip syncing for the voice is controlled by an audio object (derived from PJRC's AudioStream base class) that monitors the speech and chooses an open/intermediate/closed mouth depending on the sonority and current state. Similarly, the blinking follows a random pattern but is reinforced at the beginning and end of a speech to make it more natural.
  • Additional post-processing effects (adding signal noise/wave deformation) are also applied under some specific conditions to make the display more “lively”.
  • The machine initializes everything at boot while it plays the intro sequence on a dedicated thread (using TeensyThread with GPT1 timer). However, this intro is unnecessarily long; in fact, everything is set up after 2 seconds, but I like it longer so it feels like the machine is coming alive :)
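The diff-upload idea behind the display driver can be sketched very simply: compare the new frame against a mirror of what the screen currently shows and collect only the runs of pixels that changed. (This is a simplified model, not the actual ILI9341_T4 internals, which use a compressed diff format and DMA.)

```cpp
#include <cstdint>
#include <vector>
#include <utility>

// Return the list of (start index, length) runs where the two
// framebuffers differ: only these spans need to go over SPI.
std::vector<std::pair<int,int>> changedSpans(const uint16_t* oldFb,
                                             const uint16_t* newFb,
                                             int nPixels) {
    std::vector<std::pair<int,int>> spans;
    int i = 0;
    while (i < nPixels) {
        if (oldFb[i] != newFb[i]) {
            int start = i;
            while (i < nPixels && oldFb[i] != newFb[i]) i++;
            spans.push_back({start, i - start});
        } else {
            i++;
        }
    }
    return spans;
}
```

When most of the frame is static (menus, idle animations), the spans are tiny and the upload cost drops accordingly.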

2. Audio:
  • Songs and music animations are stored on the T4.1 SD card in .wav mono, 44.1 kHz format. There is no need for anything more sophisticated here since the SD card offers more than enough storage. Music is simply played with the audio library using a wavPlayer object.
  • Speech is done with my port of the espeak-ng library to T4.1. Dialogs are stored in flash in simple text format with some SSML tags (for specific intonation or pauses). In this way, thousands of sentences can be stored easily. When speaking a sentence, the espeak-ng module first parses the sentence, computes the word timestamps, and then sends the audio to the audio library through a custom AudioStream object. This object is also responsible for sending each letter to the console object that displays itself on the screen (audio timestamps are per-word and then linearly interpolated per letter).
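The per-letter timing mentioned above (word timestamps linearly interpolated per letter) can be sketched like this, with illustrative names rather than my actual classes:

```cpp
#include <vector>
#include <string>
#include <cmath>

// espeak-ng gives one timestamp per word; each letter inside the word
// is assigned a time by linear interpolation across the word's duration.
struct WordStamp { std::string word; float startMs, endMs; };

std::vector<float> letterTimes(const WordStamp& w) {
    std::vector<float> t;
    const int n = (int)w.word.size();
    for (int i = 0; i < n; i++)
        t.push_back(w.startMs + (w.endMs - w.startMs) * i / n);
    return t;
}
```

The console object then prints each letter when the audio playback position crosses its interpolated timestamp.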

3. Lighting
  • The LED strip is driven using the OctoWS2811 library (uses QuadTimer4) and follows the current speech/music using yet another AudioStream object. Different light effects are implemented depending on the current context (speech, cube resolution, idle animation...). In particular, when nothing is happening, the LEDs simulate a slow “breathing” of the machine. The LED strip is also used to illuminate the cube uniformly when taking pictures before a solve.

4. ESPCam / Image Capture
  • The T4.1 mainly communicates with the ESPCam via Serial6 with a custom protocol (a bit like TCP/UDP but much simpler). Packets are numbered and have a checksum to prevent any loss of message. To this end, I created a “SerialPacket” class that is used on both sides (the T4.1 and the ESP32).
  • The ESPCam firmware is flashed directly from the USB-C port of the machine by using the T4.1 as a passthrough: the T4.1 powers on the ESPCam module and puts it into boot mode (it controls its reset/boot and power pins). Then, it forwards data incoming from USBSerial directly to hardware Serial6 to flash the ESP32, so I can just use the usual Arduino environment for flashing... This method works well and does not require physical access to the ESPCam. I could have used OTA updates instead, but this way I can disable the wifi (and flashing still works even if the code crashes the ESP32 on startup).
  • The code on the ESPCam is rather simple. All it does is configure the camera and then take pictures whenever requested and send them to the T4.1 in 800×600 JPEG format. To do so, it uses the SPI bus @ 60 MHz with the T4.1 acting as slave. The images received on the T4.1 are decompressed in EXTMEM on the fly using the TJpgDec library. In my tests, this made it possible to stream images up to 25 fps (at the lower 640×480 resolution, if I recall correctly).
  • Note: the ESPCam is currently underused but I planned for possible upgrades later. In particular, I am thinking of using the wifi of the ESP32 to (optionally) connect to an external LLM when a connection is available so that, with the correct prompt, it can play the part of the robot and provide even more varied dialogs... Also, I made a new version of the PCB holding the ESPCam that breaks out the pins to connect an I2S microphone that could be used, together with the LLM, to “speak” with the machine... Well, maybe someday...

5. Cube recognition and moves computation
  • In order to solve the cube, the machine must determine the exact position of each color sticker. To do so, 6 images (one of each face) are taken. The color recognition algorithm is then applied to determine each sticker color. First, color correction is applied to each image to account for the non-uniform lighting of the cube face. Then, a custom “k-means”-style clustering algorithm is used to group facet colors. Here, we can do a little better than a classic k-means because we know that each group possesses 9 elements, that 4 of them are corners, 4 are edges, and 1 is a center. These conditions restrict the state space and make color detection much more robust. The algorithm also does not rely on any “absolute” colors; it works with any cube, provided it really has different colors on each face. It also treats separately the color with the “highest variance” (usually white), which may have markings on the cube. In practice, it works very well.
  • Once the cube state is found, it is fed to my library port of “Kociemba's algorithm,” which is a very efficient algorithm to find a near-optimal cube solution. The basic idea of the algorithm is to look for a factorized solution by considering a well-chosen normal subgroup of the Rubik's cube group. The algorithm then looks for a sequence of moves that bring the cube into the subgroup and then subsequently another sequence of moves (remaining in the subgroup) that brings it to a solved state. Here, the algorithm always finds a solution with at most 22 moves (not far from the cube's God's number, 20). This is a computationally intensive task and may take up to a couple of seconds on the T4.1. It also uses a lot of memory (4 MB of flash for precomputed tables and 470 KB of DMAMEM, so the graphics buffer needs to be temporarily moved to EXTMEM while computation is ongoing). This task is performed on a separate thread (using TeensyThread) to keep the machine responsive.
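A very simplified sketch of the capacity-constrained clustering idea: each of the 54 stickers is assigned to the nearest face-center color, but no face may receive more than 9 stickers, with the most confident matches assigned first and overflow spilling to the next-nearest center. (The real algorithm also exploits the corner/edge/center structure and handles markings; this is only the core constraint.)

```cpp
#include <array>
#include <vector>
#include <algorithm>

struct RGB { float r, g, b; };

static float dist2(RGB a, RGB b) {
    return (a.r-b.r)*(a.r-b.r) + (a.g-b.g)*(a.g-b.g) + (a.b-b.b)*(a.b-b.b);
}

// Assign each sticker color to one of 6 centers, at most 9 per center.
std::vector<int> assignStickers(const std::vector<RGB>& stickers,
                                const std::array<RGB,6>& centers) {
    const int n = (int)stickers.size();
    std::vector<int> order(n), label(n, -1);
    for (int i = 0; i < n; i++) order[i] = i;
    // Most confident stickers (closest to some center) are assigned first.
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        auto best = [&](int s) {
            float m = 1e30f;
            for (auto& c : centers) m = std::min(m, dist2(stickers[s], c));
            return m;
        };
        return best(a) < best(b);
    });
    std::array<int,6> count{};   // capacity: exactly 9 stickers per face
    for (int s : order) {
        std::array<int,6> idx = {0, 1, 2, 3, 4, 5};
        std::sort(idx.begin(), idx.end(), [&](int a, int b) {
            return dist2(stickers[s], centers[a]) < dist2(stickers[s], centers[b]);
        });
        for (int c : idx)
            if (count[c] < 9) { label[s] = c; count[c]++; break; }
    }
    return label;
}
```

The capacity constraint is what makes the detection robust: a borderline sticker cannot land on a face that already has its 9 members, so a single noisy reading cannot corrupt a whole face.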

6. Motor/stepper movement
  • The stepper motor is controlled by a custom driver. I did not use the (great) TeensyStep library because I needed real-time monitoring of the cradle position. Here the stepper rotation is interrupt-driven (using teensyTimer::GPT2). The driver implements smooth acceleration/deceleration and can reach pretty high speed without missing a step (as shown on the video when spinning the cube fast while running the Kociemba algorithm to find a solution).
  • The servos are also controlled by custom interrupt-driven code (using QuadTimer3 at 50 Hz) that also uses the feedback voltage to compare the theoretical and real positions of the servos to detect a possible stall. This is very nice as it allows detecting precisely when the grip reaches the cube and then stops it at that position to prevent putting excessive force (which makes noise and may damage the servo).
  • The AS5600 sensor for the cradle position as well as the feedback voltages for the servos (and also battery voltage and INA219 sensor) are all polled in yet another timer interrupt (using QuadTimer1) clocked at 400 Hz. I use Pedvide's library with both of the T4.1 ADCs to acquire the analog readings while concurrently using the I2C bus to check the AS5600 and INA219 sensors to minimize the interrupt duration.
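The smooth acceleration/deceleration can be modeled as a trapezoidal speed profile: the step rate ramps up linearly, cruises at the maximum, then ramps down so the last steps are slow again. A toy version computing the target rate for each step of a move (all values illustrative, not from my driver):

```cpp
#include <vector>
#include <algorithm>
#include <cmath>

// Target step rate (steps/s) for each step of a move of totalSteps.
// accelSteps is the number of steps used to reach full speed vMax.
std::vector<float> stepRates(int totalSteps, float vMax, float accelSteps) {
    std::vector<float> rate(totalSteps);
    for (int i = 0; i < totalSteps; i++) {
        float up   = vMax * (i + 1) / accelSteps;          // acceleration ramp
        float down = vMax * (totalSteps - i) / accelSteps; // deceleration ramp
        rate[i] = std::min({up, down, vMax});
    }
    return rate;
}
```

In an interrupt-driven driver, the reciprocal of each rate gives the delay to program into the timer before the next step pulse.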

7. Misc.
  • The main loop. After the intro sequence, the machine enters the main loop, where it waits for user input, which can be either touching the screen (detected by a GPIO IRQ) or spinning the cradle (detected within the AS5600 sensor polling interrupt). Spinning the cradle triggers the cube-solving routine, while touching the screen just “wakes up” the machine (making it say something). The machine has a general “mood” state that evolves following a simple Markov chain. Its mood oscillates over time, depending on cube solves, between curious/dreaming/sarcastic/... The messages stored in flash are categorized and selected according to the current mood.
  • Randomness. The code uses random numbers in many places to add variety and keep the experience fresh. The generator state is saved in the (emulated) EEPROM of the T4.1, together with the list of recently used sentences, and when picking a new dialog, the program makes sure not to select one used recently, even across power cycles.
  • Memory management. The whole program is big, and most of the code has to be put in flash (using FLASHMEM). I also needed to make a few tweaks to Teensyduino's core to move around some buffers between DTCM and DMAMEM. In the end, only 8 KB of DMAMEM and 14 KB of DTCM remain available (and up to 9KB are used by the stack...). The program also leaves only 400 KB free out of the 8 MB of flash. Flashing the code of such a big program also becomes quite lengthy, and I had a few times where the upload failed (but I guess this was the USB cable’s fault)...
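The mood Markov chain can be illustrated with a toy transition matrix; the actual moods and probabilities in my code differ, these numbers are just for the example:

```cpp
// A small Markov chain over a few moods, stepped with a supplied
// uniform random value u in [0,1).
enum Mood { CURIOUS, DREAMING, SARCASTIC, N_MOODS };

Mood nextMood(Mood current, float u) {
    // Each row sums to 1: probability of moving to each mood.
    static const float P[N_MOODS][N_MOODS] = {
        {0.60f, 0.20f, 0.20f},  // from CURIOUS
        {0.30f, 0.50f, 0.20f},  // from DREAMING
        {0.25f, 0.25f, 0.50f},  // from SARCASTIC
    };
    float acc = 0.0f;
    for (int m = 0; m < N_MOODS; m++) {
        acc += P[current][m];
        if (u < acc) return (Mood)m;
    }
    return current;   // guard against rounding at u ~ 1
}
```

Events like a successful solve can simply bias the random draw (or swap the matrix) so the mood drifts in the desired direction over time.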
Ok, my post is already far too long, so I’ll stop here :) In case anyone is interested in even more details, I created a GitHub repo with the code and schematics of the build: https://github.com/vindar/SARCASM

But beware, it is very messy...
 
On top of all the technical achievements this thing packs a lot of personality! It's one thing to be amazed, but I was amazed and you put a smile on my face!
 
Hi,
Thanks for the nice write-up about the project! :)
Reading the @Minimachines write up and seeing:

"small screen presents a "slow-motion" of the solving operation, in 3D"

I wondered if the SARCASM robot might offer to restore the cube to the original condition so the Human might attempt the moves to complete it?

Though I don't see any buttons for the user to request that and step through.
 
Yes, indeed. It should be possible to add features like the one you suggest (or maybe performing a random scrambling of the cube and then guiding the user to solve it).

The ILI9341 screen on the machine has the usual XPT2046 touch controller, so it can be used as an input device. Currently, the touchscreen is only used during idle mode: if a touch is detected, it "wakes" the machine, making it speak some sentences or play some music... However, doing this would mean reworking the UI to incorporate "on-screen buttons" while still trying to preserve the general old-school feel of the interface... And I think this might be the most tedious part! :)
 


It's impressive!
Poor cube... Apparently, they had to replace the cube's core with a reinforced one to prevent it from blowing up every time.
SARCASM may purposely annoy humans, but it does no harm to cubes. :)

I found on reddit the following breakdown of the solve:
  1. startup communication: 4ms
  2. Image acquisition: 4ms
  3. Color recognition: 0.2ms
  4. Solution computation: 0.4ms
  5. Making moves: 94ms
Only 0.4ms to compute a near-optimal solution is impressive... Well, I guess we really need the Teensy 5 to come out. :p
 
Just curious, since this is such a large project / makes extensive use of the flash memory, would you be able to test my pull request that speeds up flash reading time to see if it makes any observable difference in speed?
 