Building your own customized AI assistant powered by a locally running large language model is easy these days. Techniques like model pruning and quantization have made it possible to run reasonably powerful models on modest computing platforms. Furthermore, many hobbyists have already designed their own AI assistants and published their work, giving us all plenty of resources to lean on when building our own projects.
But there are AI assistants and then there are AI assistants. Sure, you could slap together a few components and create something that might look great hidden away in your closet (like this system built by yours truly). But what if you want an AI assistant sitting on your desk or bookshelf without scaring away non-techie guests? If you want to step up your hacking game, then Simone Marzulli has a project that you will want to check out.
A high-level overview of system operation (📷: Simone Marzulli)
Marzulli built an AI assistant, but this one has a nice case and a large display with an animated character, making it not only smart but also presentable. Better still, it is not just a chatbot, but rather an AI agent. That means you can give it a list of digital tasks to do, and it will find the best tools to accomplish them, then carry out your requests. Even with these added capabilities, the system, called Max Headbox, still runs everything 100% locally.
Max Headbox is built around a Raspberry Pi 5, which provides just enough horsepower to run small but capable open-source AI models. A custom case houses the Pi, a cooling fan, and a compact display that shows the assistant's cheerful green emoji face. A colored ribbon circling the head indicates what the system is doing: blue when it is listening, red while recording, and rainbow when the model is busy generating a response.
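The write-up does not include the status-light code, but the behavior amounts to a simple mapping from assistant state to ring color. Here is a minimal sketch of that idea, assuming a NeoPixel-style RGB ring driven from the Pi; the pin, pixel count, and function names are illustrative, not Marzulli's actual implementation:

```python
# Minimal sketch of the status-ring behavior described above.
# Assumes a NeoPixel-style RGB ring on GPIO18 (Adafruit Blinka + neopixel);
# names and wiring are illustrative assumptions, not the project's real code.
import board
import neopixel

NUM_PIXELS = 16
ring = neopixel.NeoPixel(board.D18, NUM_PIXELS, brightness=0.3)

STATE_COLORS = {
    "listening": (0, 0, 255),   # blue while waiting for speech
    "recording": (255, 0, 0),   # red while capturing audio
}

def colorwheel(pos: int) -> tuple:
    """Map a value from 0-255 to a color on the RGB wheel."""
    if pos < 85:
        return (255 - pos * 3, pos * 3, 0)
    if pos < 170:
        pos -= 85
        return (0, 255 - pos * 3, pos * 3)
    pos -= 170
    return (pos * 3, 0, 255 - pos * 3)

def show_state(state: str) -> None:
    """Set the ring to the color for the current assistant state."""
    if state == "thinking":
        # Rainbow spread across the ring while the model generates a response
        for i in range(NUM_PIXELS):
            ring[i] = colorwheel(int(i * 255 / NUM_PIXELS))
    else:
        ring.fill(STATE_COLORS.get(state, (0, 0, 0)))
```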
On the software side, Marzulli leveraged a stack of established open-source tools. A React Vite front end communicates with a Sinatra backend, which handles microphone control and recording. Audio input is passed to faster-whisper, a high-performance reimplementation of OpenAI's Whisper, for speech-to-text transcription. To enable hands-free operation, a wake word detection system built with Vosk listens in standby mode and automatically shuts down when the language model is active to avoid CPU contention.
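The blog post walks through the full stack, but the core of the voice pipeline comes down to two steps: Vosk watching a live microphone stream for the wake word, then faster-whisper transcribing the recorded request. A minimal sketch of that flow is below; the model paths, wake word, and 16 kHz mono capture settings are assumptions, not Marzulli's exact configuration:

```python
# Minimal sketch of the listen -> transcribe flow described above.
# Model paths, wake word, and audio settings are illustrative assumptions.
import json
import pyaudio
from vosk import Model, KaldiRecognizer
from faster_whisper import WhisperModel

WAKE_WORD = "max"                      # assumed wake word
vosk_model = Model("vosk-model-small-en-us-0.15")
whisper = WhisperModel("base.en", device="cpu", compute_type="int8")

def wait_for_wake_word() -> None:
    """Block until the wake word shows up in Vosk's results."""
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1,
                     rate=16000, input=True, frames_per_buffer=4000)
    recognizer = KaldiRecognizer(vosk_model, 16000)
    while True:
        data = stream.read(4000, exception_on_overflow=False)
        if recognizer.AcceptWaveform(data):
            text = json.loads(recognizer.Result()).get("text", "")
            if WAKE_WORD in text:
                break
    stream.close()
    pa.terminate()

def transcribe(wav_path: str) -> str:
    """Transcribe a recorded request with faster-whisper."""
    segments, _info = whisper.transcribe(wav_path)
    return " ".join(segment.text for segment in segments)

if __name__ == "__main__":
    wait_for_wake_word()          # standby: Vosk listens for the wake word
    # ...the backend records the user's request to request.wav here...
    print(transcribe("request.wav"))
```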
For managing the large language models themselves, Marzulli chose Ollama, a lightweight framework for running open models locally via an API. It may not be a permanent solution, but it provided a quick path to a working proof of concept. The models he settled on were Qwen3 1.7B and Gemma3 1B. Larger models like Qwen2.5 3B performed better in terms of accuracy, but added too much latency on the Raspberry Pi's modest hardware.
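Ollama serves models over a simple HTTP API on the Pi itself, so the rest of the stack only needs to make local REST calls. A minimal sketch of chatting with one of the small models over that API is shown here; the model tag and prompt are placeholders, while the endpoint is Ollama's standard local default:

```python
# Minimal sketch of calling a locally running Ollama server.
# Assumes Ollama is serving on its default port with a Gemma3 1B model pulled.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

def ask(prompt: str, model: str = "gemma3:1b") -> str:
    """Send a single-turn chat request to the local Ollama API."""
    response = requests.post(OLLAMA_URL, json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,          # wait for the full reply instead of streaming
    }, timeout=120)
    response.raise_for_status()
    return response.json()["message"]["content"]

if __name__ == "__main__":
    print(ask("Introduce yourself in one sentence."))
```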
The Qwen3 model handles agentic task execution, while Gemma3 serves as the conversational model. Responses from the conversational model include both a reply for the user and an associated “feeling,” such as happy or confused. These map to emoji graphics displayed alongside the text, giving the assistant a touch of character.
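Marzulli's post does not spell out the exact response format, but one common way to get a reply plus a "feeling" label out of a small model is to ask it for JSON and then map the label to an emoji on the display. A hedged sketch of that approach follows; the prompt wording, feeling names, and emoji table are assumptions, not the project's actual implementation:

```python
# Hedged sketch: asking the conversational model for a reply plus a "feeling"
# label, then mapping that label to an emoji for the display.
# The JSON prompt format and emoji table are assumptions.
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

FEELING_EMOJI = {"happy": "😊", "confused": "😕", "neutral": "🙂"}

SYSTEM_PROMPT = (
    "Answer the user, then report how you feel. "
    'Respond only with JSON like {"reply": "...", "feeling": "happy"}.'
)

def chat_with_feeling(prompt: str, model: str = "gemma3:1b") -> tuple:
    """Return the model's reply and an emoji for its reported feeling."""
    response = requests.post(OLLAMA_URL, json={
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
        "stream": False,
        "format": "json",   # ask Ollama to constrain output to valid JSON
    }, timeout=120)
    response.raise_for_status()
    payload = json.loads(response.json()["message"]["content"])
    emoji = FEELING_EMOJI.get(payload.get("feeling", ""), "🙂")
    return payload.get("reply", ""), emoji

if __name__ == "__main__":
    reply, emoji = chat_with_feeling("What's the weather like on Mars?")
    print(f"{emoji} {reply}")
```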
Plenty of details are available in the project's blog post, so be sure to give it a read if you are interested in building your own AI-powered assistant any time soon.