vindar
Hi,
I am working on a project where I need a Teensy 4 to speak, so I made a Teensy-compatible port of the MBROLA program. It is a library for "phoneme based" speech synthesis. Unlike the usual TTS program, you cannot enter text directly. Instead, you have to provide the list of phonemes to speak (which may depend on the voice/language used). This means that you must first convert your text to phonemes. This can be done offline, for instance by using espeak-ng.
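To make the "list of phonemes" concrete: as far as I know, standard MBROLA input (a `.pho` file) has one phoneme per line, written as a symbol, a duration in milliseconds, and optional pairs of (position in %, pitch in Hz); lines starting with `;` are comments and `_` is silence. Below is a minimal sketch of a parser for one such line. The names `PhoLine` and `parsePhoLine` are mine for illustration and are not part of the library, and the valid phoneme symbols depend on the voice.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One MBROLA phoneme command (hypothetical helper type, not from the library).
struct PhoLine {
    std::string phoneme;                     // symbol, e.g. "_" for silence
    int duration_ms = 0;                     // duration in milliseconds
    std::vector<std::pair<int, int>> pitch;  // (position in %, pitch in Hz)
};

// Parse one line of .pho text into `out`.
// Returns false for blank lines, comment lines (starting with ';')
// and lines without a positive duration.
bool parsePhoLine(const std::string& line, PhoLine& out) {
    std::istringstream in(line);
    std::string sym;
    if (!(in >> sym) || sym[0] == ';') return false;

    int dur = 0;
    if (!(in >> dur) || dur <= 0) return false;

    out.phoneme = sym;
    out.duration_ms = dur;
    out.pitch.clear();

    // Remaining tokens come in (position, pitch) pairs; stop at the first
    // incomplete or non-numeric pair.
    int pos = 0, hz = 0;
    while (in >> pos >> hz) out.pitch.emplace_back(pos, hz);
    return true;
}
```

For example, the line `h 80 0 120 100 130` would parse as phoneme `h`, lasting 80 ms, with pitch targets of 120 Hz at the start and 130 Hz at the end.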
I think the quality of the voices is really good for something running on an MCU. Each voice takes between 1.5MB and 4MB of PROGMEM, which is huge, but the code itself is compact and it requires only around 50KB of RAM at runtime (which can be allocated anywhere: DTCM, DMAMEM or EXTMEM). Once the initial cost of storing the voice data is paid, the text-based phoneme instructions take very little memory, so the Teensy can store hours of speech...
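As a rough back-of-the-envelope check on the "hours of speech" claim, here is a small sketch. The figures in the example (about 10 bytes of text per phoneme line and about 12 phonemes per second of speech) are my own illustrative guesses, not measurements, and `hoursOfSpeech` is a hypothetical helper, not part of the library.

```cpp
#include <cstdint>

// Hours of speech that fit in `freeBytes` of storage, given an ASSUMED
// average size of one phoneme line and an ASSUMED speaking rate.
constexpr double hoursOfSpeech(std::uint64_t freeBytes,
                               double bytesPerPhoneme,
                               double phonemesPerSecond) {
    // bytes per second of speech = bytesPerPhoneme * phonemesPerSecond
    return static_cast<double>(freeBytes)
           / (bytesPerPhoneme * phonemesPerSecond)
           / 3600.0;
}
```

With those guesses, `hoursOfSpeech(1000000, 10.0, 12.0)` comes out at roughly 2.3, i.e. about two hours of speech per megabyte of phoneme text.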
The library is integrated into PJRC's audio library by subclassing the `AudioStream` class, and it follows the interface and naming convention of a "play" object. The main class is `AudioPlayMBROLA`, which can itself be subclassed to be associated with a given voice. The MBROLA voices can be downloaded here and then converted to a C++ file with the `voice2cpp` program available in the subdirectory of the library.
This library was made by shamelessly hacking the original MBROLA code and the mimbrola ESP32 port by ethanak. My contribution is mostly limited to writing the interface with the audio lib...
If anyone wants to try it, the library is on my GitHub: https://github.com/vindar/MBROLA_T4
Cheers,