Voice Assistant using GPT-3 in Python
🔵🟡 Is it possible to have a voice assistant that exploits the GPT language model? 🟡🔵
❓ Ever since I wrote the post about creating a chatbot with GPT-3, I have been asking myself: “Can I use this language model in a voice assistant?”
To find out, I did the obvious thing: I opened the documentation of the API released by OpenAI, hoping that there was already something about it. (Spoiler: there wasn’t.)
💡 That’s when the idea came to me: use two Python libraries to convert audio to text and text to audio. That way the API only has to interpret text and return a text string with the answer, which is then converted into a voice message.
👤 And how would you use a voice assistant that leverages GPT-3? Tell me in the comments.
First of all, we create a new virtual environment with venv and install the openai, pyttsx3 and SpeechRecognition packages with pip:

    pip install openai
    pip install pyttsx3
    pip install SpeechRecognition
In our project folder, we create a JSON file called secrets.json. We will use it to store our access key under the "api_key" field.
We then create a Python file, import the modules we need, and authenticate ourselves by reading the secret key from the JSON file with the json module.
    import sys
    import json
    import openai
    import pyttsx3                   # Text -> Audio conversion
    import speech_recognition as sr  # Audio -> Text conversion

    # Import API KEY
    # sys.path[0] is the script's directory; I used it because
    # the secrets.json file is in the parent directory
    with open(sys.path[0] + '/../secrets.json') as f:
        secrets = json.load(f)
    api_key = secrets["api_key"]
    openai.api_key = api_key
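If the secrets file is missing or has no key in it, the script above fails with a fairly cryptic error. The following is a minimal sketch of a more defensive loader, not part of the original script: the helper name `load_api_key` is hypothetical, and it assumes the file holds a JSON object with an `api_key` field, as in the code above.

```python
import json
from pathlib import Path


def load_api_key(secrets_path):
    """Read the "api_key" field from a JSON secrets file (hypothetical helper)."""
    path = Path(secrets_path)
    if not path.is_file():
        raise FileNotFoundError(f"Secrets file not found: {path}")
    with path.open(encoding="utf-8") as f:
        secrets = json.load(f)
    key = secrets.get("api_key")
    if not key:
        # Covers both a missing field and an empty string
        raise KeyError('The secrets file has no "api_key" entry')
    return key
```

With a helper like this, the assignment could become `openai.api_key = load_api_key(Path(__file__).resolve().parent.parent / "secrets.json")`, which resolves the parent directory from the script's own location.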
We now create three functions: listen(), openaiCompletion() and textToAudio().
Respectively, these functions listen to what we say and convert it into text, call the OpenAI API, and turn the response provided by the API into a voice message.
As we saw in the last article, to communicate with the API we use the function openai.Completion.create(), whose result we store in the response variable.
    # Function that accepts audio from the microphone and converts it into text
    def listen():
        error = 0
        recognizer = sr.Recognizer()  # Create a recognizer object

        # Set the microphone as the audio source
        with sr.Microphone() as source:
            print("Speak now:")
            audioMessage = recognizer.listen(source)

        # Convert Audio -> Text
        try:
            # Language set to Italian; you can easily change it
            outputMessage = recognizer.recognize_google(audioMessage, language='it-IT')
        except Exception:
            outputMessage = "An error occurred!"
            error = 1

        print(outputMessage)
        return outputMessage, error

    # Function that calls the OpenAI service with a prompt and returns the response text
    def openaiCompletion(inputMessage):
        # Completion.create is the legacy interface (openai package versions before 1.0)
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=inputMessage,
            temperature=1,
            max_tokens=500,  # Maximum tokens in the completion (1000 tokens is around 750 words)
            top_p=1,
            best_of=20,
            frequency_penalty=0.5,
            presence_penalty=0
        )
        text = response.choices[0].text  # choices is a list; take the first completion
        print(text)
        return text

    # Function that converts text into audio
    def textToAudio(text):
        engine = pyttsx3.init()  # Initialize the Text -> Audio engine

        # Set the Text -> Audio engine properties
        engine.setProperty('rate', 200)  # Set the speaking rate (words per minute)
        engine.setProperty('volume', 1)  # Set the volume (0 to 1)

        engine.say(text)
        engine.runAndWait()
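The completion text can come back with leading blank lines, which the speech engine has no use for. Here is a small sketch of cleaning a response before speaking it. The helper `first_choice_text` is hypothetical, it assumes the response supports dictionary-style access, and the dictionary below is a hand-written stand-in shaped like a completion response, not real API output.

```python
def first_choice_text(response):
    """Return the stripped text of the first choice, or "" if there is none."""
    choices = response.get("choices") or []
    if not choices:
        return ""
    return (choices[0].get("text") or "").strip()


# Hand-written stand-in for a response, for illustration only
fake_response = {"choices": [{"text": "\n\nHello! How can I help?", "index": 0}]}
print(first_choice_text(fake_response))  # prints: Hello! How can I help?
```

Returning an empty string when there are no choices lets the caller decide what to say instead of failing on an index error.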
Finally, we put the three functions together:

    inputMessage, error = listen()  # User audio input
    if error == 0:
        try:
            message = openaiCompletion(inputMessage)  # Call OpenAI with the user's voice message
        except Exception:
            message = "I can't answer"
    else:
        message = "I didn't understand"
    textToAudio(message)  # Text -> Audio: speak the response from OpenAI
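Because the flow needs a microphone, speakers and an API key, it is hard to try out without the hardware or the key at hand. One possible way around this, sketched below, is to pass the three steps in as arguments so they can be replaced with stand-ins. The function name `run_once` and the lambda stand-ins are hypothetical; only the fallback messages come from the script above.

```python
def run_once(listen, complete, speak):
    """Run one listen -> complete -> speak cycle and return the spoken message."""
    text, error = listen()
    if error:
        message = "I didn't understand"
    else:
        try:
            message = complete(text)
        except Exception:
            message = "I can't answer"
    speak(message)
    return message


# Stand-ins for illustration: a fixed "heard" phrase, an upper-casing
# "model", and a list that collects what would have been spoken
spoken = []
run_once(lambda: ("hello", 0), lambda prompt: prompt.upper(), spoken.append)
print(spoken)  # prints: ['HELLO']
```

In the real assistant the call would be `run_once(listen, openaiCompletion, textToAudio)`.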