Maker Arpit Sengar has upcycled an old PC speaker into an ultra-low-cost large language model (LLM) voice assistant built atop Google’s Gemini platform, for less than ₹1,000 (around $12).
“This project combines embedded systems and AI [Artificial Intelligence] inference to create an end-to-end conversational assistant,” Sengar explains of the compact box, which includes an integrated battery for wire-free operation. “The [Espressif] ESP32 handles real-time audio recording and playback, while a Python backend performs: speech-to-text (STT), language understanding, [and] text-to-speech (TTS).”
The project is based around an old 2″ PC speaker, which provides the audio output. This is installed in a housing which includes an Espressif ESP32-WROOM-32 microcontroller development board with built-in Wi-Fi connectivity, connected to an LM386 amplifier and a TDK InvenSense INMP441 MEMS microphone for input. There’s also a tactile push-button input to the side, and a Top Power TP4056 charging module to handle the battery.
Rather than attempting to run a large language model directly on-device, Sengar’s design uses WebSockets to communicate with a remote system over Wi-Fi. First, audio is streamed to a speech recognition model based on Whisper; then the resulting text is fed to Google’s Gemini large language model as a prompt; Gemini’s output is then fed to the Piper speech synthesis model and streamed back to the ESP32 for playback, all using a Python backend.
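The three-stage flow described above can be sketched in a few lines of Python. This is an illustrative outline only, not Sengar’s code: the function names (`run_turn`, `fake_transcribe`, `fake_generate`, `fake_synthesize`) and the `chunk_size` parameter are hypothetical, the network transport to the ESP32 is left out, and the stand-in stages would need to be swapped for real calls to the speech recognition, LLM, and speech synthesis services.

```python
from typing import Callable, Iterable, Iterator


def run_turn(
    audio_chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    chunk_size: int = 1024,
) -> Iterator[bytes]:
    """One conversational turn: recorded audio in, chunked speech audio out."""
    recording = b"".join(audio_chunks)  # audio received from the microcontroller
    prompt = transcribe(recording)      # speech-to-text stage
    reply = generate(prompt)            # language model stage
    speech = synthesize(reply)          # text-to-speech stage
    # Send the synthesized audio back in small pieces for playback.
    for start in range(0, len(speech), chunk_size):
        yield speech[start:start + chunk_size]


# Stand-in stages (assumptions) so the sketch runs without models or a network.
def fake_transcribe(audio: bytes) -> str:
    return f"{len(audio)} bytes of speech"


def fake_generate(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


chunks = list(
    run_turn(
        [b"\x00" * 4, b"\x01" * 4],
        fake_transcribe,
        fake_generate,
        fake_synthesize,
        chunk_size=8,
    )
)
print(b"".join(chunks).decode("utf-8"))
```

Passing the stages in as plain callables keeps the ordering of the pipeline separate from whichever model or service sits behind each step, so any one stage could be replaced without touching the others.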
The ESP32 connects to a Python-powered backend, which then connects to speech recognition, speech synthesis, and LLM assistant services. (📷: Arpit Sengar)
“Consider this backend as the brain of the system because this is where all the processing happens,” Sengar explains. “For this I’d recommend hosting an [Amazon] AWS EC2 instance assigned with a static IP. Alternatively you can run a local server on your laptop and connect your ESP32 through [a] hotspot.”
The project is documented in full over on Instructables; source code is available on GitHub under the permissive MIT license.

