First, we create a new virtual environment with venv and install the openai, pyttsx3 and SpeechRecognition packages with pip.
In our project folder, we create a JSON file called secrets.json, which we will use to store our API access key.
We then create a Python file, import the json and openai modules, and read our secret key from the JSON file to authenticate ourselves.
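Before moving on, it can help to confirm that the three packages are importable from the active virtual environment. The following is a minimal sketch, not part of the original project: the helper name missing_packages is hypothetical, and the only fact it relies on is that SpeechRecognition is imported as speech_recognition. Note also that sr.Microphone needs the PyAudio package, which is not in the list above.

```python
import importlib.util

# pip name -> import name (SpeechRecognition is imported as speech_recognition)
REQUIRED = {
    "openai": "openai",
    "pyttsx3": "pyttsx3",
    "SpeechRecognition": "speech_recognition",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```

Running the script inside the virtual environment prints either the pip command for whatever is missing or a confirmation message.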
import sys
import json
import openai
import pyttsx3  # Text -> Audio conversion
import speech_recognition as sr  # Audio -> Text conversion

# Import API KEY
# sys.path[0] is the directory of the running script; I build the path this way
# because the secrets.json file is in the parent directory
with open(sys.path[0] + '/../secrets.json') as f:
    secrets = json.load(f)
api_key = secrets["api_key"]
openai.api_key = api_key
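The key-loading step above expects secrets.json to contain a single JSON object with an "api_key" entry, for example {"api_key": "YOUR-KEY-HERE"}. As a minimal sketch, the same step could be wrapped in a small function that fails with a clear message when the file or the entry is missing. The function name load_api_key is an assumption made for illustration and does not appear in the original code.

```python
import json
from pathlib import Path


def load_api_key(secrets_path):
    """Read the "api_key" entry from a JSON secrets file."""
    path = Path(secrets_path)
    if not path.is_file():
        raise FileNotFoundError(f"Secrets file not found: {path}")
    with path.open(encoding="utf-8") as f:
        secrets = json.load(f)
    if "api_key" not in secrets:
        raise KeyError('The secrets file has no "api_key" entry')
    return secrets["api_key"]


# Hypothetical usage, with secrets.json in the parent directory of the script:
# openai.api_key = load_api_key(Path(__file__).resolve().parent.parent / "secrets.json")
```

Keeping the key in a separate file also makes it easy to exclude it from version control.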
We now create 3 functions: listen(), openaiCompletion() and textToAudio().
These functions allow us, respectively, to listen to what we say and convert it into text, to call the OpenAI API, and finally to convert the response provided by the API into a voice message.
As we saw in the last article, to communicate with the API we use the function openai.Completion.create(), whose result is stored in the response variable.
# Function that accepts audio from the microphone and converts it into text
def listen():
    error = 0
    recognizer = sr.Recognizer()  # Create a recognizer object
    # Set microphone as audio source
    with sr.Microphone() as source:
        print("Speak now:")
        audioMessage = recognizer.listen(source)
    # Convert Audio -> Text
    try:
        # Language set to Italian, you can easily change it
        outputMessage = recognizer.recognize_google(audioMessage, language='it-IT')
    except Exception:
        outputMessage = "An error occurred!"
        error = 1
    print(outputMessage)
    return outputMessage, error

# Function that calls the OpenAI service with a prompt and returns the response
def openaiCompletion(inputMessage):
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=inputMessage,
        temperature=1,
        max_tokens=500,  # Max tokens used per call; 1000 tokens is around 750 words
        top_p=1,
        best_of=20,
        frequency_penalty=0.5,
        presence_penalty=0
    )
    # choices is a list; the generated text is in its first element
    print(response.choices[0].text)
    return response.choices[0].text

# Function that converts text into audio
def textToAudio(text):
    engine = pyttsx3.init()  # Initialize Text -> Audio engine
    # Set the Text -> Audio engine properties
    engine.setProperty('rate', 200)  # Set the speaking rate (words per minute)
    engine.setProperty('volume', 1)  # Set the volume (0 to 1)
    engine.say(text)
    engine.runAndWait()
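The response returned by openai.Completion.create() carries the generated text inside a list called choices, so the text has to be read from the first element of that list. The sketch below illustrates this on a plain dictionary shaped like such a response, which makes it possible to try without calling the API. The helper first_choice_text and the sample data are assumptions made for illustration. The openai.Completion interface itself belongs to releases of the openai package before 1.0.

```python
def first_choice_text(response):
    """Return the stripped text of the first completion choice, or None if there is none."""
    choices = response.get("choices") or []
    if not choices:
        return None
    return choices[0].get("text", "").strip()


# Hand-written sample shaped like a completion response (not real API output).
sample_response = {
    "choices": [
        {"text": "\n\nHello, how can I help you?", "index": 0, "finish_reason": "stop"}
    ]
}

if __name__ == "__main__":
    print(first_choice_text(sample_response))
```

Stripping the text removes the leading blank lines that completions often start with, which would otherwise be passed on to the text-to-speech engine.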
inputMessage, error = listen()  # User audio input
if error == 0:
    try:
        message = openaiCompletion(inputMessage)  # Call OpenAI services with user voice message
    except Exception:
        message = "I can't answer"
else:
    message = "I didn't understand"
textToAudio(message)  # Convert the response from OpenAI from text to audio
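The flow above (listen, ask the API, speak the answer) can be tried out without a microphone or an API key by passing the three steps in as arguments. This is a minimal sketch under that assumption: the function run_once and its parameter names are hypothetical, while the fallback messages are the ones used in the script.

```python
def run_once(listen_fn, complete_fn, speak_fn):
    """Run one listen -> complete -> speak cycle and return the spoken message."""
    input_message, error = listen_fn()
    if error == 0:
        try:
            message = complete_fn(input_message)
        except Exception:
            # Any failure while calling the completion step ends up here
            message = "I can't answer"
    else:
        message = "I didn't understand"
    speak_fn(message)
    return message


if __name__ == "__main__":
    # Stand-ins for listen(), openaiCompletion() and textToAudio()
    run_once(
        lambda: ("What is the capital of Italy?", 0),
        lambda prompt: "You asked: " + prompt,
        print,
    )
```

With the real functions in place, the call would be run_once(listen, openaiCompletion, textToAudio).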