
Voice Assistant using GPT-3 in Python
Is it possible to build a voice assistant that leverages the GPT language model?
Since writing the post about creating a chatbot using GPT-3, I have been asking myself: "can I use this language model in a voice assistant?"
To find out, I did the obvious thing: I opened the documentation of the API released by OpenAI, hoping there was already something about it. (Spoiler: there wasn't.)
That's when the idea came to me: use two Python libraries to convert text to audio and vice versa, so that the API only has to receive text and return a text string with the answer, which is then translated into a voice message.
And how would you use a voice assistant that leverages GPT-3? Tell me in the comments.
Code Analysis
Phase 1
First of all, we create a new virtual environment via venv and install the openai, pyttsx3 and SpeechRecognition packages via pip:
pip install openai
pip install pyttsx3
pip install SpeechRecognition
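Note that to use the microphone as an audio source, SpeechRecognition relies on the PyAudio package, which is not installed automatically. If the microphone code later complains about a missing dependency, install it as well:
pip install pyaudio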
In our project folder, we create a JSON file called secrets.json. We will use this file to store our API access key:
{
    "api_key": "your-secret-key"
}
Phase 2
We create a Python file, import the modules we need, and read our secret key from the JSON file so we can authenticate with the OpenAI API.
import sys
import json
import openai
import pyttsx3 # Text -> Audio conversion
import speech_recognition as sr # Audio -> Text conversion
# Import the API key
with open(sys.path[0] + '/../secrets.json') as f:  # secrets.json lives in the parent directory relative to this script
    secrets = json.load(f)

api_key = secrets["api_key"]
openai.api_key = api_key
We now create three functions: listen(), openaiCompletion() and textToAudio().

These functions allow us, respectively, to listen to what we say and convert it into text, to call the OpenAI API, and finally to turn the response provided by the API into a voice message.

As we saw in the last article, to communicate with the API we use openai.Completion.create(), which provides a response that we store in the response variable.
# Function that listens to the microphone and converts the audio into text
def listen():
    error = 0
    recognizer = sr.Recognizer()  # Create a recognizer object

    # Set the microphone as the audio source
    with sr.Microphone() as source:
        print("Speak now:")
        audioMessage = recognizer.listen(source)

    # Convert Audio -> Text
    try:
        outputMessage = recognizer.recognize_google(audioMessage, language='it-IT')  # Language set to Italian, you can easily change it
    except (sr.UnknownValueError, sr.RequestError):
        outputMessage = "An error occurred!"
        error = 1

    print(outputMessage)
    return outputMessage, error
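If recognition struggles in a noisy environment, the recognizer can be calibrated against background noise before listening. This is an optional tweak to the with block above, using SpeechRecognition's adjust_for_ambient_noise():
# Optional: calibrate against background noise before listening
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source, duration=0.5)  # Sample ~0.5s of ambient noise
    print("Speak now:")
    audioMessage = recognizer.listen(source)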
# Function that calls the OpenAI service with a prompt and returns the response
def openaiCompletion(inputMessage):
    response = openai.Completion.create(
        model = "text-davinci-003",
        prompt = inputMessage,
        temperature = 1,
        max_tokens = 500,  # Maximum tokens for the completion; 1000 tokens is roughly 750 words
        top_p = 1,
        best_of = 20,
        frequency_penalty = 0.5,
        presence_penalty = 0
    )
    print(response.choices[0].text)
    return response.choices[0].text
# Function that converts text into audio
def textToAudio(text):
    engine = pyttsx3.init()  # Initialize the Text -> Audio engine

    # Set the Text -> Audio engine properties
    engine.setProperty('rate', 200)   # Speaking rate (words per minute)
    engine.setProperty('volume', 1)   # Volume (0 to 1)

    engine.say(text)
    engine.runAndWait()
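By default pyttsx3 uses the system's default voice, which may not match the it-IT recognition language set above. The engine can list the voices installed on your machine and switch to one of them; the voice id below is only a placeholder, since the available ids depend on your operating system:
engine = pyttsx3.init()
for voice in engine.getProperty('voices'):  # Print the voices available on this machine
    print(voice.id, voice.name)
engine.setProperty('voice', 'your-voice-id')  # Placeholder: copy one of the ids printed above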
Phase 3
inputMessage, error = listen()  # User audio input

if error == 0:
    try:
        message = openaiCompletion(inputMessage)  # Call OpenAI with the user's voice message
    except Exception:
        message = "I can't answer"
else:
    message = "I didn't understand"

textToAudio(message)  # Text -> Audio the response from OpenAI
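As it stands, the script handles a single question and answer. One possible extension, not part of the original code, is to wrap Phase 3 in a loop and stop on a spoken exit phrase (here the word "stop", chosen arbitrarily):
while True:
    inputMessage, error = listen()
    if error == 1:
        textToAudio("I didn't understand")
        continue
    if "stop" in inputMessage.lower():  # Arbitrary exit phrase
        textToAudio("Goodbye")
        break
    try:
        message = openaiCompletion(inputMessage)
    except Exception:
        message = "I can't answer"
    textToAudio(message)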