By enricolacchin | 26 April 2023 | In Code

Voice Assistant using GPT-3 in Python

🔵🟡 Is it possible to have a voice assistant that uses the GPT language model? 🟡🔵

โ“Since I made the post about creating a chatbot using GPT3, I asked myself: “can I exploit this language model on a voice assistant?”

To find out, I did the obvious thing: I opened the documentation of the API released by OpenAI, hoping there was already something about it. (Spoiler: there wasn't.)

💡 That's when it hit me: use two Python libraries to convert audio to text and text to audio, so that the API only has to interpret a text prompt and return a text string with the answer, which is then converted into a voice message.
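The idea boils down to a three-stage pipeline in which the language model only ever sees and returns text. A minimal sketch, assuming each stage is passed in as a plain function; the names `run_voice_turn`, `speech_to_text`, `ask_model` and `text_to_speech` are hypothetical and not part of any library, and fake stages stand in for the microphone, the API and the speaker:

```python
from typing import Callable


def run_voice_turn(
    speech_to_text: Callable[[], str],
    ask_model: Callable[[str], str],
    text_to_speech: Callable[[str], None],
) -> str:
    """Run one question/answer turn: audio in -> text -> model -> text -> audio out."""
    question = speech_to_text()   # Audio -> Text (e.g. SpeechRecognition)
    answer = ask_model(question)  # Text -> Text (e.g. the OpenAI API)
    text_to_speech(answer)        # Text -> Audio (e.g. pyttsx3)
    return answer


if __name__ == "__main__":
    spoken = []  # Collects what the fake speaker "says"
    reply = run_voice_turn(
        speech_to_text=lambda: "What is the capital of Italy?",
        ask_model=lambda q: "Rome" if "Italy" in q else "I don't know",
        text_to_speech=spoken.append,
    )
    print(reply, spoken)
```

Because each stage only exchanges strings with its neighbours, any of them (another speech engine, another model) can be swapped without touching the others.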

👤 And how would you use a voice assistant that leverages GPT-3? Tell me in the comments.

Code Analysis

Phase 1

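First, we create a new virtual environment with venv and install the openai, pyttsx3 and SpeechRecognition packages with pip. SpeechRecognition also needs PyAudio to access the microphone, and the code below uses the openai.Completion interface, which exists only in openai versions before 1.0:

pip install "openai<1.0"
pip install pyttsx3
pip install SpeechRecognition
pip install PyAudio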
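Before writing any code, it can help to confirm which packages actually landed in the virtual environment. A small sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_versions` is hypothetical. The reported openai version is worth a look, since `openai.Completion` is only available in releases before 1.0:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["openai", "pyttsx3", "SpeechRecognition", "PyAudio"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```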

In the project folder, we create a JSON file called secrets.json, which we use to store our API key:

{
    "api_key": "your-secret-key"
}

Phase 2

We create a Python file and import the modules we need: json and openai, plus pyttsx3 and speech_recognition for the audio conversions. We then read our secret key from the JSON file with the json module and use it to authenticate with the API.

import sys
import json
import openai
import pyttsx3  # Text -> Audio conversion
import speech_recognition as sr  # Audio -> Text conversion

# Import the API key
# sys.path[0] is the directory of the running script; secrets.json is in its parent directory
with open(sys.path[0] + '/../secrets.json') as f:
    secrets = json.load(f)
    api_key = secrets["api_key"]

openai.api_key = api_key
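Building the path with `sys.path[0]` works when the file is run as a script, but `sys.path[0]` is not the script's directory in every launch mode (in an interactive session it is an empty string). A slightly more robust sketch, assuming the same layout (secrets.json in the parent directory of the script) and falling back to the `OPENAI_API_KEY` environment variable, which the openai library also reads on its own; the function name `load_api_key` is hypothetical:

```python
import json
import os
from pathlib import Path


def load_api_key(secrets_path=None):
    """Return the API key from secrets.json, falling back to OPENAI_API_KEY."""
    if secrets_path is None:
        # Assumed layout: secrets.json sits in the parent directory of this file
        secrets_path = Path(__file__).resolve().parent.parent / "secrets.json"
    path = Path(secrets_path)
    if path.is_file():
        with path.open(encoding="utf-8") as f:
            return json.load(f)["api_key"]
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(f"No API key found in {path} or in OPENAI_API_KEY")
    return key
```

In the script, the `with open(...)` block would then become `openai.api_key = load_api_key()`.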

We now create three functions: listen(), openaiCompletion() and textToAudio().

These functions, respectively, listen to what we say and convert it into text, call the OpenAI API, and turn the API's response into a voice message.

As we saw in the last article, to communicate with the API we use the openai.Completion.create() function; its result is stored in the response variable.
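The generated text lives in `response.choices[0].text`. With text-davinci-003 this string can start with blank lines, so stripping it before printing or speaking it is a cheap improvement. A minimal sketch; the helper name `reply_text` and its fallback sentence are hypothetical, and the fake response below only mimics the attribute layout of a real one:

```python
from types import SimpleNamespace


def reply_text(response, fallback="I can't answer"):
    """Return the first completion's text without surrounding whitespace.

    Falls back to `fallback` when the model returned no usable text.
    """
    if not response.choices:
        return fallback
    text = response.choices[0].text.strip()
    return text or fallback


if __name__ == "__main__":
    # Fake object shaped like a Completion response (only the fields used here)
    fake = SimpleNamespace(choices=[SimpleNamespace(text="\n\nRome is the capital of Italy.")])
    print(reply_text(fake))  # -> Rome is the capital of Italy.
```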

# Function that listens through the microphone and converts the audio into text
def listen():
    error = 0
    recognizer = sr.Recognizer()  # Create a recognizer object

    # Set the microphone as audio source
    with sr.Microphone() as source:
        print("Speak now:")
        audioMessage = recognizer.listen(source)

    # Convert Audio -> Text
    try:
        outputMessage = recognizer.recognize_google(audioMessage, language='it-IT')  # Language set to Italian, you can easily change it
    except (sr.UnknownValueError, sr.RequestError):  # Speech not understood, or recognition service unreachable
        outputMessage = "An error occurred!"
        error = 1

    print(outputMessage)
    return outputMessage, error

# Function that calls the OpenAI service with a prompt and returns the response
def openaiCompletion(inputMessage):
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=inputMessage,
        temperature=1,
        max_tokens=500,  # Maximum number of tokens in the answer (1000 tokens are roughly 750 words)
        top_p=1,
        best_of=20,  # Generates 20 candidates server-side and returns the best one (all 20 are billed)
        frequency_penalty=0.5,
        presence_penalty=0,
    )
    print(response.choices[0].text)
    return response.choices[0].text

# Function that converts text into audio
def textToAudio(text):
    engine = pyttsx3.init()  # Initialize the Text -> Audio engine

    # Set the Text -> Audio engine properties
    engine.setProperty('rate', 200)  # Speaking rate (words per minute)
    engine.setProperty('volume', 1)  # Volume (0 to 1)

    engine.say(text)  # Queue the text to be spoken
    engine.runAndWait()  # Speak and block until finished
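Since listen() recognizes Italian, the answer is likely to be in Italian too, but pyttsx3 speaks with the system's default voice, which may be an English one. pyttsx3 lists the installed voices through `engine.getProperty('voices')` (each with `id` and `name` attributes) and selects one with `engine.setProperty('voice', voice_id)`. A sketch of the selection logic, kept separate from the engine so it can be tried without audio hardware; the helper `pick_voice_id` and matching on a substring of the voice name are assumptions, because voice names differ between operating systems:

```python
from types import SimpleNamespace


def pick_voice_id(voices, keyword):
    """Return the id of the first voice whose name contains `keyword` (case-insensitive).

    Returns None when nothing matches, so the caller can keep the default voice.
    """
    keyword = keyword.lower()
    for voice in voices:
        if keyword in (voice.name or "").lower():
            return voice.id
    return None


if __name__ == "__main__":
    # Hypothetical voices shaped like pyttsx3 Voice objects (only id and name are used)
    voices = [
        SimpleNamespace(id="voice-en", name="Example English voice"),
        SimpleNamespace(id="voice-it", name="Example Italian voice"),
    ]
    print(pick_voice_id(voices, "italian"))  # -> voice-it

    # With a real engine (requires pyttsx3 and a speech backend):
    # engine = pyttsx3.init()
    # voice_id = pick_voice_id(engine.getProperty('voices'), "italian")
    # if voice_id is not None:
    #     engine.setProperty('voice', voice_id)
```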

Phase 3

Having completed phases 1 and 2, all we need to do is call the functions we created, in order, and handle the exceptions in the actual application:

inputMessage, error = listen()  # User audio input

if error == 0:
    try:
        message = openaiCompletion(inputMessage)  # Call the OpenAI service with the user's voice message
    except Exception:  # e.g. network or API errors
        message = "I can't answer"
else:
    message = "I didn't understand"

textToAudio(message)  # Text -> Audio of the response from OpenAI
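The control flow of phase 3 decides which sentence gets spoken in each case. The same logic written as a pure function, a sketch in which the OpenAI call is passed in as a parameter so the fallback messages can be checked without a microphone or network access (the name `choose_reply` is hypothetical):

```python
def choose_reply(input_message, error, ask):
    """Pick the sentence to speak, mirroring the phase 3 control flow.

    error != 0  -> speech was not recognized
    ask raises  -> the OpenAI call failed
    otherwise   -> the model's answer
    """
    if error != 0:
        return "I didn't understand"
    try:
        return ask(input_message)
    except Exception:  # e.g. network or API errors raised by the client
        return "I can't answer"


if __name__ == "__main__":
    def failing_ask(_):
        raise ConnectionError("no network")

    print(choose_reply("Ciao", 0, lambda m: m + "!"))  # -> Ciao!
    print(choose_reply("", 1, lambda m: m))            # -> I didn't understand
    print(choose_reply("Ciao", 0, failing_ask))        # -> I can't answer
```

In the script, the whole phase could then read `textToAudio(choose_reply(*listen(), openaiCompletion))`, since listen() returns the message and the error flag as a tuple.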