
Emergency Operator Voice Chatbot: Empowering Help


Language models have been evolving rapidly, and with multimodal LLMs now at the forefront of this race, it is important to understand how we can leverage their capabilities. From traditional text-based AI-powered chatbots, we are transitioning to voice-based chatbots that act as personal assistants, available at a moment's notice to tend to our needs. In this blog, we'll build an emergency-operator voice chatbot. The idea is fairly simple:

  • We speak to the chatbot
  • It listens to and understands what we've said
  • It responds with a voice note

Our Use-Case

Let's consider a real-world scenario. We live in a country of over 1.4 billion people, and with such a huge population, emergencies are bound to occur, whether it's a medical issue, a fire breakout, police intervention, or even mental health support such as suicide prevention.

In such moments, every second counts, and there is a shortage of emergency operators relative to the overwhelming volume of issues raised. That's where a voice-based chatbot can make a big difference, offering quick, spoken assistance when people need it the most.

  • Emergency Help: Immediate assistance for health, fire, crime, or disaster-related queries without waiting for a human operator (when one is not available).
  • Mental Health Helpline: A voice-based emotional support assistant that guides users with compassion.
  • Rural Accessibility: Areas with limited access to mobile apps can benefit from a simple voice interface, since people in such areas often communicate by speaking.

That's exactly what we're going to build. We will act as someone seeking help, and the chatbot will play the role of an emergency responder, powered by a large language model.

To implement our voice chatbot, we will be using the following AI models:

  • Whisper (Large) – OpenAI's speech-to-text model, running via GroqCloud, to convert voice into text.
  • GPT-4.1-mini – Powered by CometAPI (a free LLM provider), this is the brain of our chatbot: it understands our queries and generates meaningful responses.
  • ElevenLabs / Google Text-to-Speech (gTTS) – Converts the chatbot's responses back into voice so it can talk to us.
  • FFmpeg – A handy library that helps us record and manage audio easily.

Requirements

Before we start coding, we need to set up a few things:

  1. GroqCloud API Key: Get it from here: https://console.groq.com/keys
  2. CometAPI Key: Register and store your API key from: https://api.cometapi.com/
  3. ElevenLabs API Key: Register and store your API key from: https://elevenlabs.io/app/home
  4. FFmpeg Installation: If you don't already have it, follow this guide to install FFmpeg on your system: https://itsfoss.com/ffmpeg/

Confirm the installation by typing "ffmpeg -version" in your terminal.

Once you have these set up, you're ready to dive into building your very own voice-enabled chatbot!

Project Structure

The project structure will be fairly simple and rudimentary, and most of our work will happen in the app.py and utils.py Python scripts.

VOICE-CHATBOT/

├── venv/                  # Virtual environment for dependencies
├── .env                   # Environment variables (API keys, etc.)
├── app.py                 # Main application script
├── emergency.png          # Emergency-related image asset
├── README.md              # Project documentation (optional)
├── requirements.txt       # Python dependencies
├── utils.py               # Utility/helper functions

There are a couple of files to fill in to make sure all our dependencies are satisfied:

In the .env file (the original shows only the first key, truncated; the other two key names follow from the requirements above and the code below):

GROQ_API_KEY = "<your-groq-api-key>"
COMET_API_KEY = "<your-cometapi-key>"
ELEVENLABS_API_KEY = "<your-elevenlabs-api-key>"

In the requirements.txt:

ffmpeg-python
pydub
pyttsx3
langchain
langchain-community
langchain-core
langchain-groq
langchain_openai
python-dotenv
streamlit==1.37.0
audio-recorder-streamlit
dotenv
elevenlabs
gtts

Setting Up the Virtual Environment

We will also set up a virtual environment (a good practice). We'll do this in the terminal.

  1. Create the virtual environment:

~/Desktop/Emergency-Voice-Chatbot$ conda create -p venv python==3.12 -y

  2. Activate the virtual environment:

~/Desktop/Emergency-Voice-Chatbot$ conda activate venv/

  3. After you finish running the application, you can deactivate the virtual environment:

~/Desktop/Emergency-Voice-Chatbot$ conda deactivate
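With the environment active, install the dependencies listed in requirements.txt (standard pip usage, in case you haven't installed them yet):

~/Desktop/Emergency-Voice-Chatbot$ pip install -r requirements.txt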

Main Python Scripts

Let's first explore the utils.py script.

1. Main Imports

time, tempfile, os, re, BytesIO – Handle timing, temporary files, environment variables, regex, and in-memory files.

requests – Makes HTTP requests (e.g., calling APIs).

gTTS, elevenlabs, pydub – Convert text to speech and speech to text, and play/manipulate audio.

groq, langchain_* – Use Groq/OpenAI LLMs with LangChain to process and generate text.

streamlit – Build interactive web apps.

dotenv – Load environment variables (like API keys) from a .env file.

import time
import requests
import tempfile
import re
import os
from io import BytesIO

from gtts import gTTS
from elevenlabs.client import ElevenLabs
from elevenlabs import play
from pydub import AudioSegment
from groq import Groq
from langchain_groq import ChatGroq
from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
import streamlit as st
from dotenv import load_dotenv

load_dotenv()

2. Load your API keys and initialize your models

# Initialize the Groq client (used for Whisper speech-to-text)
client = Groq(api_key=os.getenv('GROQ_API_KEY'))

# Initialize the LLM for responses, served through CometAPI
llm = ChatOpenAI(
    model_name="gpt-4.1-mini",
    openai_api_key=os.getenv("COMET_API_KEY"),
    openai_api_base="https://api.cometapi.com/v1"
)

# Initialize the ElevenLabs TTS client used by the helpers below
# (this line is assumed: the excerpt omits it, but tts_client is referenced later)
tts_client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Set the path to the ffmpeg executable
AudioSegment.converter = "/bin/ffmpeg"
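The hard-coded /bin/ffmpeg path won't exist on every system (many Linux distributions install it at /usr/bin/ffmpeg). A slightly more portable variant, a sketch assuming ffmpeg is on your PATH:

import shutil

# Prefer whatever ffmpeg the shell would find; fall back to the hard-coded path
AudioSegment.converter = shutil.which("ffmpeg") or "/bin/ffmpeg"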
 

3. Converting the audio file (our voice recording) into .wav format

Here, we read the recorded audio bytes with BytesIO and AudioSegment and convert them into WAV format:

def audio_bytes_to_wav(audio_bytes):
    try:
        with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as temp_wav:
            audio = AudioSegment.from_file(BytesIO(audio_bytes))
            # Downsample to reduce file size if needed
            audio = audio.set_frame_rate(16000).set_channels(1)
            audio.export(temp_wav.name, format="wav")
            return temp_wav.name
    except Exception as e:
        st.error(f"Error during WAV file conversion: {e}")
        return None

4. Splitting Audio

We'll write a function to split our audio into chunks of chunk_length_ms milliseconds. We'll also write a function to strip punctuation with the help of regex:

def split_audio(file_path, chunk_length_ms):
    audio = AudioSegment.from_wav(file_path)
    return [audio[i:i + chunk_length_ms] for i in range(0, len(audio), chunk_length_ms)]


def remove_punctuation(text):
    return re.sub(r'[^\w\s]', '', text)
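app.py (shown later) also calls a speech_to_text helper that this excerpt does not reproduce. Below is a minimal sketch using Groq's hosted Whisper model; the model name "whisper-large-v3" and the single-shot (unchunked) transcription are assumptions, so the original helper may differ:

def speech_to_text(audio_bytes):
    try:
        # Reuse the WAV conversion helper from step 3
        wav_path = audio_bytes_to_wav(audio_bytes)
        if wav_path is None:
            return None
        with open(wav_path, "rb") as f:
            transcription = client.audio.transcriptions.create(
                file=(os.path.basename(wav_path), f.read()),
                model="whisper-large-v3",
                response_format="text",
            )
        # With response_format="text" the API returns a plain string
        return transcription if isinstance(transcription, str) else transcription.text
    except Exception as e:
        st.error(f"Error during speech-to-text: {e}")
        return None

For recordings longer than the API's file limit, split_audio from above can be used to transcribe chunk by chunk and join the results.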

5. LLM Response Generation

Now for the main responder functionality, where the LLM generates an apt response to our queries. In the prompt template, we give the LLM instructions on how it should respond. We will use LangChain Expression Language (LCEL) to wire this up.

def get_llm_response(query, chat_history):
    try:
        template = """
        You are an experienced Emergency Response Phone Operator trained to handle critical situations in India.
        Your role is to guide users calmly and clearly during emergencies involving:

        - Medical crises (accidents, heart attacks, etc.)
        - Fire incidents
        - Police/law enforcement assistance
        - Suicide prevention or mental health crises

        You must:

        1. **Remain calm and assertive**, as if speaking on a phone call.
        2. **Ask for and confirm key details** such as the location, the condition of the person, the number of people involved, etc.
        3. **Provide immediate and practical steps** the user can take before help arrives.
        4. **Share accurate, India-based emergency helpline numbers** (e.g., 112, 102, 108, 1091, 1098, 9152987821, etc.).
        5. **Prioritize user safety**, and clearly instruct them what *not* to do as well.
        6. If the situation involves **suicidal thoughts or mental distress**, respond with compassion and direct them to appropriate mental health helplines and safety actions.

        If the user's query is not related to an emergency, reply with:
        "I can only assist with urgent emergency-related issues. Please contact a general support line for non-emergency questions."

        Use an authoritative, supportive tone, short and direct sentences, and tailor your guidance to **urban and rural Indian contexts**.

        **Chat History:** {chat_history}

        **User:** {user_query}
        """

        prompt = ChatPromptTemplate.from_template(template)
        chain = prompt | llm | StrOutputParser()

        response_gen = chain.stream({
            "chat_history": chat_history,
            "user_query": query
        })

        response_text = "".join(list(response_gen))
        response_text = remove_punctuation(response_text)

        # Remove repeated lines from the streamed response
        response_lines = response_text.split('\n')
        unique_lines = list(dict.fromkeys(response_lines))  # removing duplicates
        cleaned_response = "\n".join(unique_lines)
        return cleaned_response
    except Exception as e:
        st.error(f"Error during LLM response generation: {e}")
        return "Error"
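A quick way to smoke-test the responder outside the Streamlit UI (a hypothetical query; assumes your CometAPI key is set in .env):

if __name__ == "__main__":
    # Ad-hoc check with an empty chat history and one sample emergency query
    print(get_llm_response("A scooter rider is injured on the highway near me, what should I do?", chat_history=[]))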

6. Text to Speech

We'll build a function to convert text to speech with the help of the ElevenLabs TTS client, which returns the audio as a pydub AudioSegment. We could also use other TTS models, such as Nari Labs' Dia or Google's gTTS: ElevenLabs gives us some free credits at the start and charges for more afterwards, while gTTS is completely free to use. The body of this function was cut off in the article, so the retry loop below is a reconstruction sketched around the same ElevenLabs call used in create_welcome_message:

def text_to_speech(text: str, retries: int = 3, delay: int = 5):
    attempt = 0
    while attempt < retries:
        try:
            # Request speech synthesis (a streaming generator of MP3 bytes)
            response_stream = tts_client.text_to_speech.convert(
                text=text,
                voice_id="JBFqnCBsd6RMkjVDRZzb",
                model_id="eleven_multilingual_v2",
                output_format="mp3_44100_128",
            )
            audio_bytes = b"".join(response_stream)
            return AudioSegment.from_file(BytesIO(audio_bytes), format="mp3")
        except Exception as e:
            attempt += 1
            st.warning(f"TTS attempt {attempt} failed: {e}. Retrying in {delay} seconds...")
            time.sleep(delay)
    st.error("Text-to-speech failed after multiple retries.")
    return None
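As a free alternative, here is a sketch of a gTTS-based variant (the function name is ours; lang and tld are gTTS's own options, explained in the next section):

def text_to_speech_gtts(text: str):
    try:
        # The "co.in" tld gives an Indian-English accent; see the notes below
        tts = gTTS(text=text, lang="en", tld="co.in")
        with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as f:
            tts.save(f.name)
            return AudioSegment.from_file(f.name, format="mp3")
    except Exception as e:
        st.error(f"Error during gTTS conversion: {e}")
        return None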

7. Create Introductory Message

We'll also create an introductory text and pass it to our TTS model, since a respondent would normally introduce themselves and ask what assistance the user might need. Here we return the path of the generated mp3 file.

For the gTTS variant above, two options control the voice:

lang="en" -> English

tld="co.in" -> produces different localized 'accents' for a given language (the default is "com")

def create_welcome_message():
    welcome_text = (
        "Hello, you’ve reached the Emergency Help Desk. "
        "Please let me know if it's a medical, fire, police, or mental health emergency—"
        "I am here to guide you right away."
    )
    try:
        # Request speech synthesis (streaming generator)
        response_stream = tts_client.text_to_speech.convert(
            text=welcome_text,
            voice_id="JBFqnCBsd6RMkjVDRZzb",
            model_id="eleven_multilingual_v2",
            output_format="mp3_44100_128",
        )
        # Save streamed bytes to a temp file
        with tempfile.NamedTemporaryFile(delete=False, suffix='.mp3') as f:
            for chunk in response_stream:
                f.write(chunk)
            return f.name
    except requests.ConnectionError:
        st.error("Failed to generate welcome message due to a connection error.")
    except Exception as e:
        st.error(f"Error creating welcome message: {e}")
    return None

Streamlit App

Now, let's jump into the app.py script, where we use Streamlit to visualize our chatbot.

Import Libraries and Functions

Import our libraries and the functions we built in utils.py:

import tempfile
import re  # This can be removed if not used
from io import BytesIO

from pydub import AudioSegment
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
import streamlit as st
from audio_recorder_streamlit import audio_recorder

from utils import *

Streamlit Setup

Now we'll set our page title and a nice "Emergency" visual image:

st.title(":blue[Emergency Help Bot] 🚨🚑🆘")
st.sidebar.image('./emergency.png', use_column_width=True)

We'll set up our session state to keep track of our chats and audio:

if "chat_history" not in st.session_state:
   st.session_state.chat_history = []
if "chat_histories" not in st.session_state:
   st.session_state.chat_histories = []
if "played_audios" not in st.session_state:
   st.session_state.played_audios = {}
 

Invoking our utils functions

We'll create our welcome message introduction from the respondent's side. This will be the start of our conversation.

if len(st.session_state.chat_history) == 0:
   welcome_audio_path = create_welcome_message()
   st.session_state.chat_history = [
       AIMessage(content="Hello, you’ve reached the Emergency Help Desk. Please let me know if it's a medical, fire, police, or mental health emergency—I'm here to guide you right away.", audio_file=welcome_audio_path)
   ]
   st.session_state.played_audios[welcome_audio_path] = False 

Now, in the sidebar we'll set up our voice recorder along with the speech-to-text, LLM-response, and text-to-speech logic, which is the main crux of this project. The article's snippet cuts off mid-handler, so the final lines below are a sketch of the omitted remainder (exporting the reply audio and appending it to the chat history); see the linked repository for the exact version:

with st.sidebar:
    audio_bytes = audio_recorder(
        energy_threshold=0.01,
        pause_threshold=0.8,
        text="Speak on clicking the ICON (Max 5 min) \n",
        recording_color="#e9b61d",   # yellow
        neutral_color="#2abf37",     # green
        icon_name="microphone",
        icon_size="2x"
    )
    if audio_bytes:
        temp_audio_path = audio_bytes_to_wav(audio_bytes)
        if temp_audio_path:
            try:
                user_input = speech_to_text(audio_bytes)
                if user_input:
                    st.session_state.chat_history.append(HumanMessage(content=user_input, audio_file=temp_audio_path))
                    response = get_llm_response(user_input, st.session_state.chat_history)
                    audio_response = text_to_speech(response)
                    # Sketch: export the reply audio and record it in the history
                    if audio_response is not None:
                        with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as f:
                            audio_response.export(f.name, format="mp3")
                            reply_audio_path = f.name
                        st.session_state.chat_history.append(
                            AIMessage(content=response, audio_file=reply_audio_path)
                        )
            except Exception as e:
                st.error(f"Error while processing your message: {e}")

We'll also set up a button in the sidebar that lets us restart the session if need be, replaying the introductory voice note from the respondent's side.

if st.button("Start New Chat"):
    st.session_state.chat_histories.append(st.session_state.chat_history)
    welcome_audio_path = create_welcome_message()
    st.session_state.chat_history = [
        AIMessage(content="Hello, you’ve reached the Emergency Help Desk. Please let me know if it's a medical, fire, police, or mental health emergency—I'm here to guide you right away.", audio_file=welcome_audio_path)
    ]

And on the main page of our app, we'll visualize our chat history as click-to-play audio files:

for msg in st.session_state.chat_history:
    if isinstance(msg, AIMessage):
        with st.chat_message("AI"):
            st.audio(msg.audio_file, format="audio/mp3")
    else:  # HumanMessage
        with st.chat_message("user"):
            st.audio(msg.audio_file, format="audio/wav")

Now we're done with all the Python scripts needed to run our app. We'll run the Streamlit app using the following command:

streamlit run app.py 

So, this is what our project workflow looks like:

[User speaks] → audio_recorder → audio_bytes_to_wav → speech_to_text → get_llm_response → text_to_speech → st.audio 

For the complete code, go to this GitHub repository.

Final Output


The Streamlit app looks quite clean and is functioning correctly!

Let's look at some of its responses:

  1. User: Hi, someone is having a heart attack right now, what should I do?

We then had a conversation about the location and the state of the person, after which the chatbot provided appropriate guidance.

  2. User: Hello, there has been a huge fire breakout in Delhi. Please send help quickly.

The respondent enquires about the situation and my current location, and then proceeds to give preventive measures accordingly.

  3. User: Hey there, there's a person standing alone at the edge of the bridge, how should I proceed?

The respondent enquires about my location and the mental state of the person I've mentioned.

Overall, our chatbot is able to respond to our queries according to the situation and ask the relevant questions to provide preventive measures.

Read More: How to build a chatbot in Python?

What Improvements can be made?

  • Multilingual Support: Integrating LLMs with strong multilingual capabilities would allow the chatbot to interact seamlessly with users from different regions and dialects.
  • Real-Time Transcription and Translation: Adding speech-to-text and real-time translation can help bridge communication gaps.
  • Location-Based Services: By integrating GPS or other real-time location-based APIs, the system can detect a user's location and direct them to the nearest emergency facilities.
  • Speech-to-Speech Interaction: We could also use speech-to-speech models, which would make conversations feel more natural since they are built for such functionality.
  • Fine-tuning the LLM: Custom fine-tuning of the LLM on emergency-specific data can improve its understanding and provide more accurate responses.


Conclusion

In this article, we successfully built a voice-based emergency response chatbot using a combination of AI models and some relevant tools. The chatbot replicates the role of a trained emergency operator capable of handling high-stress situations, from medical crises and fire incidents to mental health support, in a calm, assertive manner. The prompt tailors the LLM's behaviour to a variety of real-world emergencies, making the experience more lifelike for both urban and rural scenarios.
