VoiceBot, a second language fluency-builder built on the amazing Teensy 4.1

VoiceBotX

Hi Paul,

I've been a longtime visitor to the Teensy forum pages when I needed help or inspiration to build up my VoiceBot project. VoiceBot is an easy-to-use and somewhat cheeky fluency-builder for second language learners, independent of the target language or the learner's voice. It's my ESL-teacher's somewhat geeky response to my students asking the eternal question, "Yes, but teacher, HOW do I practice speaking?" We all know what it's like to learn a second language: how DO we practice speaking? VoiceBot lets learners capture samples from any audio they have access to for speaking practice in class, at home, or with friends. Listen carefully and repeat. VoiceBot captures the learner's attempt, analyzes how closely it matches the original, and scores the result. Over a short time, samples are repeated and re-scored, the results let VoiceBot target specific speech-sound groupings for concentrated practice, and a model of the learner's speech begins to form.

VoiceBot is built on the Teensy 4.1 architecture and takes advantage of the audio adapter and SD card, haptic motor drivers, external RAM, and an LED matrix. When connected to my website via Ethernet, or over USB serial to Chrome using Javascript, it can deliver a radically new approach to learning language on Twitch or YouTube via OBS, with VoiceBot driving the graphics. VoiceBot decodes the audio stream and re-encodes it as a viseme stream. I feed this stream to my website GUI to simulate a speaker, or what one might call lip sync. I can also see the results of the viseme stream on VoiceBot's built-in DotStar LED matrix. VoiceBot's serial output stream animates a mouth in the GUI in real time with the speech (the input stream can also receive commands to change VoiceBot's internal operations, like turning the audio stream on or off).
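To give a picture of the Chrome side, here's a minimal sketch (not VoiceBot's actual protocol; it just assumes one viseme ID per line over serial, and the drawMouth function and "AUDIO OFF" command are placeholders of my own) of how a page can read the stream with the Web Serial API and send a command back:

```javascript
// Minimal sketch: read a newline-delimited viseme stream from VoiceBot over
// USB serial in Chrome (Web Serial API), and send a command back upstream.
// The one-viseme-per-line framing and the "AUDIO OFF" command are
// illustrative assumptions, not VoiceBot's real protocol.

let port;

async function connectVoiceBot() {
  port = await navigator.serial.requestPort();   // user picks the Teensy
  await port.open({ baudRate: 115200 });

  const decoder = new TextDecoderStream();
  port.readable.pipeTo(decoder.writable);
  const reader = decoder.readable.getReader();

  let buffer = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    let idx;
    while ((idx = buffer.indexOf('\n')) >= 0) {
      const visemeId = buffer.slice(0, idx).trim();
      buffer = buffer.slice(idx + 1);
      if (visemeId) drawMouth(visemeId);         // update the GUI mouth shape
    }
  }
}

// Send a command upstream, e.g. to mute VoiceBot's audio stream.
async function sendCommand(cmd) {
  const writer = port.writable.getWriter();
  await writer.write(new TextEncoder().encode(cmd + '\n'));
  writer.releaseLock();
}

function drawMouth(visemeId) {
  // Placeholder: map the viseme ID to a mouth sprite/frame in the GUI.
  console.log('viseme', visemeId);
}
```

In this sketch, something like sendCommand('AUDIO OFF') is how the GUI would flip settings without touching the hardware buttons. One Chrome quirk worth noting: navigator.serial.requestPort() has to be called from a user gesture, such as a Connect button click.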

VoiceBot is also programmed to help learners come back to practicing their speaking: it calls out into the real world a bit like a budgerigar while flashing little animations across its face. The face also turns into an animated mouth and an animated orb-like character. The whole project fits nicely into a 3D-printed shell about the size of a large stress ball and rests nicely in a 3D-printed holder. The haptic response to audio input simulates the vibrations of the speaker's voicebox during speech and lets the learner feel their own speech compared to a sample. The buttons (the eyes) are programmable, as you might expect, and currently toggle the haptic response without having to go through the GUI (useful if one is watching a movie). This has been a highly engaging project, and at some point I hope to develop it into a Kickstarter project. I've attached a couple of YouTube video clips of VoiceBot.

You can see in the demo video that the lip sync is quite nice, though there are obvious areas to improve. Artifacts in the audio like echoes, and musical instruments like pianos and guitars in background music, can alter how VoiceBot interprets the visemes in the stream. The best audio is clean audio, produced in a quiet room. I'm happy, though, that many of the issues one might expect with VoiceBot in terms of ambient microphone noise are not much of a problem, and even speech recorded over other audio can produce crisp speech visuals if the volume is moderated. At lower volumes, VoiceBot produces better speech visuals from noisy speech, which helps when better-produced samples are unavailable.

I'm working now to transform my existing web GUI into a fluency-training program for ESL learners on Twitch (check it out! VoiceBotX) or in other online scenarios like realtime chat, or perhaps a video-game-like thing using the Godot or Unreal engines. Both have Python capabilities, and Unreal Engine, I know, can read VoiceBot's serial input perfectly. I'd like learners to have the tech in their own hands, but this may be beyond my meagre capabilities.

At the outset, what I can do, though, is broadcast VoiceBot's viseme stream via websockets to a connected user's VoiceBot GUI on their PC across the web. This would control the lip-sync viseme model in their GUI. I'm currently working with a Node.js websockets implementation, modified to send the viseme stream from my own VoiceBot to a learner's GUI. I'm just not ready to send my viseme algorithm into the wild yet; I'm hoping to make a mint on an IoT platform and keep the code under wraps for a while longer. Who wouldn't want a realtime lip-sync controller displaying their own speech visually?
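As a rough idea of the relay (just a sketch, not the production code: it assumes the serialport and ws npm packages, a newline-delimited viseme stream, and an illustrative port path and JSON frame format), the Node.js side looks something like this:

```javascript
// Sketch of the relay idea: read VoiceBot's viseme stream from the local
// serial port and broadcast each frame to every connected GUI over
// WebSockets. The port path, frame format, and one-viseme-per-line framing
// are assumptions for illustration only.
const { SerialPort } = require('serialport');
const { ReadlineParser } = require('@serialport/parser-readline');
const { WebSocketServer, WebSocket } = require('ws');

const voicebot = new SerialPort({ path: '/dev/ttyACM0', baudRate: 115200 });
const visemes = voicebot.pipe(new ReadlineParser({ delimiter: '\n' }));

const wss = new WebSocketServer({ port: 8080 });

visemes.on('data', (line) => {
  const frame = JSON.stringify({ viseme: line.trim(), t: Date.now() });
  for (const client of wss.clients) {
    if (client.readyState === WebSocket.OPEN) client.send(frame);
  }
});
```

On the learner's side, the GUI just opens a WebSocket to the relay and feeds each received frame into the same mouth-drawing code it would use for a local VoiceBot.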



Demo: https://youtu.be/XiHN6hFuGSE


Slideshow: https://youtu.be/zTDE8FaR8pA


 