In this Python speech recognition tutorial for beginners, we will build several examples that convert audio to text and text to audio. For speech recognition in Python we are going to use a third-party library called SpeechRecognition. It is a library for performing speech recognition, with support for several engines and APIs, online and offline.
Speech recognition engine/API support:
- CMU Sphinx (works offline)
- Google Speech Recognition
- Google Cloud Speech API
- Wit.ai
- Microsoft Bing Voice Recognition
- Houndify API
- IBM Speech to Text
- Snowboy Hotword Detection (works offline)
Installation
For the installation you can use pip. Alternatively, you can download the source distribution from PyPI, extract the archive, and run python setup.py install in the extracted folder.
    pip install SpeechRecognition
PyAudio (for microphone users)
PyAudio is required if and only if you want to use microphone input (Microphone). PyAudio version 0.2.11+ is required, as earlier versions have known memory management bugs when recording from microphones in certain situations. If you are using Python 3.6, you can install PyAudio with pip install pyaudio. If you are using Python 3.7 or 3.8 on Windows, you need to download the matching .whl file from this website: PyAudio Whl Download. For example, for Python 3.7 on 64-bit Windows you can use PyAudio-0.2.11-cp37-cp37m-win_amd64.whl. Go to the download directory and run this command.
    pip install PyAudio-0.2.11-cp37-cp37m-win_amd64.whl
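To confirm that both packages were installed, you can ask Python for their versions. This is a minimal sketch that uses only the standard library; it assumes Python 3.8 or newer (where importlib.metadata is available), and installed_version is a helper name made up for this example.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages used in this tutorial.
for name in ("SpeechRecognition", "PyAudio"):
    version = installed_version(name)
    print(f"{name}: {version or 'not installed'}")
```

If PyAudio shows as not installed, the file examples will still work, but the microphone examples will not.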

OK, now let's create our first example, which converts audio to text. We say something into the microphone, the speech is automatically converted to text, and the recorded audio is saved in our working directory.
    import speech_recognition as sr


    def main():
        r = sr.Recognizer()

        with sr.Microphone() as source:
            r.adjust_for_ambient_noise(source)

            print("Please say something")
            audio = r.listen(source)

        print("Recognizing Now .... ")

        # recognize speech using google
        try:
            print("You have said \n" + r.recognize_google(audio))
            print("Audio Recorded Successfully \n ")
        except Exception as e:
            print("Error : " + str(e))

        # write audio
        with open("recorded.wav", "wb") as f:
            f.write(audio.get_wav_data())


    if __name__ == "__main__":
        main()
Here we create the Recognizer object. The Microphone is used as the audio source.

    r = sr.Recognizer()
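We also need to add this line of code. It listens to the source for a moment and calibrates the recognizer's energy threshold to the ambient noise level, so that background noise is less likely to be treated as speech.

    r.adjust_for_ambient_noise(source)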
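The calibration can be tuned with the duration argument of adjust_for_ambient_noise, and the result is stored in the recognizer's energy_threshold attribute. As a small sketch, the helper below does both and returns the threshold so you can print it. The name calibrate is made up for this example and is not part of the library.

```python
def calibrate(recognizer, source, seconds=1.0):
    """Calibrate for ambient noise and return the resulting energy threshold.

    `calibrate` is an illustrative helper, not a SpeechRecognition function.
    """
    # Sample the source for `seconds` to estimate the background noise level.
    recognizer.adjust_for_ambient_noise(source, duration=seconds)
    # Audio louder than this threshold is treated as speech.
    return recognizer.energy_threshold
```

Inside the with sr.Microphone() as source: block you could call calibrate(r, source, seconds=2) before r.listen(source). A longer sample may help in a noisy room, at the cost of a longer wait before you can speak.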
Here we recognize the speech using the Google Speech Recognition engine.

    print("You have said \n" + r.recognize_google(audio))
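The example catches every Exception, but recognize_google raises two specific errors that are worth telling apart: sr.UnknownValueError when the speech could not be understood, and sr.RequestError when the service could not be reached. The sketch below handles them separately. The helper name transcribe is hypothetical, and the language argument is shown with the library's default value of "en-US".

```python
def transcribe(recognizer, audio, language="en-US"):
    """Return the recognized text, or None if nothing could be recognized.

    `transcribe` is a hypothetical helper. `recognizer` is an sr.Recognizer
    and `audio` is the AudioData object returned by listen().
    """
    # Imported here so the helper can be defined before the package is installed.
    import speech_recognition as sr

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The audio was received but the speech was unintelligible.
        print("Could not understand the audio")
    except sr.RequestError as exc:
        # The request failed, for example because there is no internet connection.
        print("Could not reach the recognition service: " + str(exc))
    return None
```

In the first example you could replace the try block with text = transcribe(r, audio) and print the text only when it is not None.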
If you need to save the recorded audio, you can use this code.

    with open("recorded.wav", "wb") as f:
        f.write(audio.get_wav_data())
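To check that the saved file is a valid recording, you can inspect it with the wave module from the standard library. This is only a sketch; wav_info is a name chosen for this example.

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, frames / float(rate)
```

After running the first example, print(wav_info("recorded.wav")) shows how many channels were recorded, the sample rate, and how long the recording is.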
Run the code and say something into the microphone. The recognized text is printed in the console and the audio is saved as recorded.wav.

Opening Website Using Speech Recognition
OK, now let's create another example. This time we want to open a website using speech recognition. For example, we say google.com into the microphone and the website opens automatically. This is the code for this example.
    import speech_recognition as sr
    import webbrowser as web


    def main():
        # browser command; %s is replaced with the URL to open
        path = "C:/Program Files/Google/Chrome/Application/chrome.exe %s"

        r = sr.Recognizer()

        with sr.Microphone() as source:
            r.adjust_for_ambient_noise(source)

            print("Please say something ")
            audio = r.listen(source)

        print("Recognizing Now ... ")

        try:
            dest = r.recognize_google(audio)
            print("You have said : " + dest)
            web.get(path).open(dest)
        except Exception as e:
            print("Error : " + str(e))


    if __name__ == "__main__":
        main()
First of all you need to specify the path of your browser. I am using Google Chrome, so this is the path for my browser. The %s placeholder is replaced with the address to open.

    path = "C:/Program Files/Google/Chrome/Application/chrome.exe %s"
As in the first example, this line calibrates the recognizer to the ambient noise.

    r.adjust_for_ambient_noise(source)
Here we first recognize the audio, and after that we open the website.

    dest = r.recognize_google(audio)
    print("You have said : " + dest)
    web.get(path).open(dest)
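The recognized text is passed to the browser exactly as it was heard, so it can help to tidy it up first. The sketch below assumes the recognizer may return something like "Google dot com" or an address without a scheme; that is an assumption about typical output, not documented behavior. The helper names to_url and open_spoken_site are made up for this example. It also uses webbrowser.open, which opens the system default browser, so no browser path is needed.

```python
import webbrowser


def to_url(spoken):
    """Turn recognized text such as 'Google dot com' into a URL."""
    text = spoken.strip().lower()
    # Handle a spoken "dot" between words.
    text = text.replace(" dot ", ".")
    # Remove any remaining whitespace inside the address.
    text = "".join(text.split())
    # Add a scheme if the speaker did not say one.
    if not text.startswith(("http://", "https://")):
        text = "https://" + text
    return text


def open_spoken_site(spoken):
    """Open the spoken address in the system default browser."""
    return webbrowser.open(to_url(spoken))
```

In the example above you could replace web.get(path).open(dest) with open_spoken_site(dest).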
Run the code and say a website address into the microphone. The website opens in your browser.

Convert Recorded Audio To Text
All right guys, so far we have learned how to convert audio from the microphone to text in Python. Sometimes you need to convert an existing recording instead: we have a recorded audio file and we want to convert it to text. This is the complete code for this.
    import speech_recognition as sr


    def main():
        sound = "recorded.wav"

        r = sr.Recognizer()

        with sr.AudioFile(sound) as source:
            r.adjust_for_ambient_noise(source)

            print("Converting Audio To Text ..... ")
            audio = r.listen(source)

        try:
            print("Converted Audio Is : \n" + r.recognize_google(audio))
        except Exception as e:
            print("Error : {}".format(e))


    if __name__ == "__main__":
        main()
In this code we have only changed the source. This time we are not using Microphone, we are using AudioFile.

    with sr.AudioFile(sound) as source:
Now run the code. Make sure that you have already added a recorded audio file named recorded.wav to your working directory.
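For files, the library also provides r.record(source), which reads the audio directly and accepts offset and duration arguments. That makes it possible to transcribe a long recording piece by piece. The following is a sketch under a few assumptions: chunk_offsets and transcribe_file are names made up for this example, the 30 second chunk size is arbitrary, and chunks that cannot be understood are simply skipped. Note that adjust_for_ambient_noise reads from the beginning of the file, so this sketch leaves it out to avoid losing the start of the recording.

```python
def chunk_offsets(total_seconds, chunk_seconds):
    """Return (offset, duration) pairs that cover total_seconds."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    offsets = []
    start = 0.0
    while start < total_seconds:
        # The last chunk may be shorter than the others.
        offsets.append((start, min(chunk_seconds, total_seconds - start)))
        start += chunk_seconds
    return offsets


def transcribe_file(path, chunk_seconds=30):
    """Transcribe a WAV file chunk by chunk and join the results."""
    # Imported here so chunk_offsets can be used without the package.
    import speech_recognition as sr

    r = sr.Recognizer()

    # DURATION is only available while the file is open.
    with sr.AudioFile(path) as source:
        total = source.DURATION

    texts = []
    for offset, duration in chunk_offsets(total, chunk_seconds):
        with sr.AudioFile(path) as source:
            audio = r.record(source, offset=offset, duration=duration)
        try:
            texts.append(r.recognize_google(audio))
        except sr.UnknownValueError:
            # Nothing intelligible in this chunk, move on to the next one.
            continue
    return " ".join(texts)
```

Calling print(transcribe_file("recorded.wav")) should give a result similar to the example above for a short recording. A word that falls on a chunk boundary may be cut in two, so this simple splitting is only a starting point.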

Converting Text To Speech in Python
OK, we have learned how to convert audio to text using Google Speech Recognition in Python. Now we want to learn how to convert text to audio, and for this we use other libraries. There are two ways to convert your text to audio or speech: the first way is the Google Text To Speech (gTTS) library and the second way is the pyttsx3 library.
What is Google Text To Speech (gTTS)?
gTTS (Google Text-to-Speech) is a Python library and CLI tool to interface with Google Translate's text-to-speech API. It writes spoken mp3 data to a file, a file-like object (bytestring) for further audio manipulation, or stdout.
gTTS Installation
You can use pip for the installation.
    pip install gTTS
So now this is the code for our example.
    from gtts import gTTS

    tts = gTTS(text="welcome to my website", lang='en')
    tts.save("record.mp3")

    print("Text Converted Successfully ")
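As mentioned above, gTTS can also write to a file-like object instead of a file on disk. The sketch below uses the write_to_fp method with an in-memory buffer and returns the mp3 data as bytes. The helper name text_to_mp3_bytes is made up for this example, and like tts.save it needs an internet connection.

```python
import io


def text_to_mp3_bytes(text, lang="en"):
    """Return the spoken text as mp3 bytes without writing a file."""
    # Imported here so the helper can be defined before gTTS is installed.
    from gtts import gTTS

    buffer = io.BytesIO()
    # write_to_fp streams the mp3 data into any file-like object.
    gTTS(text=text, lang=lang).write_to_fp(buffer)
    return buffer.getvalue()
```

The returned bytes can be passed to another audio library or sent over a network without creating record.mp3 first.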
This code converts our text to audio and saves it as record.mp3 in our working directory.
The second way uses the pyttsx3 library.
What is the pyttsx3 Library?
pyttsx3 is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline, and is compatible with both Python 2 and 3.
pyttsx3 Installation
You can use pip for the installation.

    pip install pyttsx3
If you receive errors such as No module named win32com.client, No module named win32, or No module named win32api, you will also need to install pypiwin32.
This is the complete code.
    import pyttsx3

    engine = pyttsx3.init()

    # set the speaking rate and volume before queuing the text,
    # so that they apply to it
    engine.setProperty('rate', 120)
    engine.setProperty('volume', 0.9)

    engine.say("welcome to geekscoders website ")
    engine.runAndWait()
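Besides rate and volume, pyttsx3 lets you choose between the voices installed on your system through the voices and voice properties. This is a minimal sketch: pick_voice_id and speak_with_voice are names made up for this example, and which voice names exist depends on your operating system.

```python
def pick_voice_id(voices, keyword):
    """Return the id of the first voice whose name contains keyword, else None."""
    keyword = keyword.lower()
    for voice in voices:
        if keyword in (voice.name or "").lower():
            return voice.id
    return None


def speak_with_voice(text, keyword):
    """Speak text with the first installed voice that matches keyword."""
    # Imported here so pick_voice_id can be used without pyttsx3.
    import pyttsx3

    engine = pyttsx3.init()
    voice_id = pick_voice_id(engine.getProperty("voices"), keyword)
    if voice_id is not None:
        # Select the voice before queuing the text.
        engine.setProperty("voice", voice_id)
    engine.say(text)
    engine.runAndWait()
```

To see which names you can search for, print voice.name for each entry in engine.getProperty("voices"). If no voice matches, the sketch falls back to the default voice.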
Run the code and you will hear your text spoken aloud.