Python Speech Recognition Tutorial for Beginners

In this article we are going to talk about Python speech recognition for beginners. We will create different examples that convert audio to text and also text to audio. For speech recognition in Python we are going to use a third-party library called SpeechRecognition, together with the Google Speech Recognition engine. It is a library for performing speech recognition, with support for several engines and APIs, online and offline. Speech recognition engine/API support:

 

  • CMU Sphinx (works offline)
  • Google Speech Recognition
  • Google Cloud Speech API
  • Wit.ai
  • Microsoft Bing Voice Recognition
  • Houndify API
  • IBM Speech to Text
  • Snowboy Hotword Detection (works offline)

 

 

 

 

Installation 

For the installation you can just use pip (pip install SpeechRecognition). Alternatively, you can download the source distribution from PyPI and extract the archive, then run python setup.py install in the extracted folder.
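One way to confirm that the installation worked is to ask Python whether it can find the package. The sketch below assumes the import name speech_recognition, which is the module name the SpeechRecognition package installs under; the helper is_installed is a hypothetical name used only for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the given top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Report whether the speech recognition package can be imported.
status = "found" if is_installed("speech_recognition") else "missing"
print(f"speech_recognition: {status}")
```

If the script prints "missing", the package was probably installed into a different Python environment than the one running the script.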

 

 

 


 

 

 

PyAudio (for microphone users)

PyAudio is required if and only if you want to use microphone input (Microphone). PyAudio version 0.2.11+ is required, as earlier versions have known memory management bugs when recording from microphones in certain situations. If you are using Python 3.6, you can install PyAudio with pip install pyaudio. If you are using Python 3.7 or 3.8 on Windows, you need to download the .whl file from this website: PyAudio Whl Download. For example, for Python 3.7 you can use PyAudio-0.2.11-cp37-cp37m-win_amd64.whl. Go to the download directory and install the file with pip install followed by the file name.
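The cp37 part of the wheel file name has to match the Python version you are running. As a small sketch, the snippet below builds that tag from the running interpreter so you know which file to look for; cpython_tag is a hypothetical helper name, not part of PyAudio or pip.

```python
import sys


def cpython_tag(version_info=sys.version_info):
    """Build the 'cpXY' tag used in wheel file names, e.g. cp37 for Python 3.7."""
    return f"cp{version_info[0]}{version_info[1]}"


# Show which tag the downloaded wheel should contain for this interpreter.
print("Look for a wheel whose name contains:", cpython_tag())
```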

 

 

 

This is the installation.

Python Installation

 

 

OK, now let's create our first example. In this example we are going to convert audio to text: we say something into the microphone, and it is automatically converted to text and saved in our working directory.
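A minimal sketch of such a program could look like the following. It assumes SpeechRecognition and PyAudio are installed and an internet connection is available; the function names listen_once and save_text and the file name speech.txt are illustrative assumptions, not taken from the original code.

```python
from pathlib import Path


def listen_once():
    """Record one phrase from the default microphone and return it as text."""
    # Imported here so this file still loads when the package is missing.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:  # Microphone requires PyAudio
        print("Say something...")
        # Calibrate the energy threshold against background noise.
        recognizer.adjust_for_ambient_noise(source)
        # Record until the speaker pauses.
        audio = recognizer.listen(source)
    # Send the audio to Google Speech Recognition (needs internet access).
    return recognizer.recognize_google(audio)


def save_text(text, path="speech.txt"):
    """Write the recognized text to a file and return its path."""
    target = Path(path)
    target.write_text(text + "\n", encoding="utf-8")
    return target
```

To try it, call save_text(listen_once()) and speak into the microphone; the recognized sentence should end up in speech.txt in the working directory.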

 

 

Here we create a Recognizer object and use Microphone as the audio source.

 

 

We also need to add this line of code, which reduces the effect of background noise in the sound.

 

 

And here we recognize the speech using Google Speech Recognition.
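Recognition can fail, either because the speech was unintelligible or because the online service could not be reached. As a hedged sketch, the helper below (recognize_or_none is a hypothetical name) wraps recognize_google and handles the two exception types SpeechRecognition raises in those cases.

```python
def recognize_or_none(recognizer, audio):
    """Return the recognized text, or None if recognition failed."""
    # Imported here so this file still loads when the package is missing.
    import speech_recognition as sr

    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The service answered but could not understand the audio.
        print("Could not understand the audio")
    except sr.RequestError as error:
        # The service could not be reached, e.g. no internet connection.
        print(f"Could not reach the recognition service: {error}")
    return None
```

Here recognizer is expected to be a Recognizer object and audio the result of its listen or record call.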

 

 

If you need to save the recorded audio, then you can use this code.
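One possible way to do this is to write the captured audio out as WAV data. The sketch below relies on the get_wav_data method of the audio object returned by listen; save_recording and the file name recording.wav are assumed names for illustration.

```python
def save_recording(audio, path="recording.wav"):
    """Write captured audio to a WAV file and return the file path."""
    # get_wav_data() returns the recording as the bytes of a WAV file.
    with open(path, "wb") as wav_file:
        wav_file.write(audio.get_wav_data())
    return path
```

Pass in the object returned by the recognizer's listen call, for example save_recording(audio), and the recording is written to the working directory.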

 

 

 

Run the code, say something into the microphone, and this is the result.

Python Speech Recognition Tutorial for Beginners

 

 

 

Opening Website Using Speech Recognition 

OK, now let's create another example. This time I want to open a website using speech recognition: for example, I say google.com into my microphone and the website opens automatically. This is the code for this example.
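As a rough sketch of the idea, the code below listens for a site name, turns it into a URL, and opens it with Python's standard webbrowser module. It uses the system default browser rather than a specific one, and the names spoken_to_url and open_spoken_site are assumptions made for this illustration.

```python
import webbrowser


def spoken_to_url(spoken):
    """Turn recognized text such as 'google.com' into a full URL."""
    site = spoken.strip().lower().replace(" ", "")
    if not site.startswith(("http://", "https://")):
        site = "https://" + site
    return site


def open_spoken_site():
    """Listen for a website name and open it in the default browser."""
    # Imported here so this file still loads when the package is missing.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say a website, for example google.com")
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    url = spoken_to_url(recognizer.recognize_google(audio))
    webbrowser.open(url)
    return url
```

Calling open_spoken_site() and saying "google.com" should open https://google.com, provided the recognizer hears the name correctly.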

 

 

 

First of all you need to specify the path of your browser. I am using Google Chrome, so this is the path for my browser.
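A browser path can be handed to the webbrowser module as a command line in which %s stands for the URL. The path below is only an assumed, typical Windows location for Chrome and may differ on your machine; browser_for is a hypothetical helper name.

```python
import webbrowser

# Assumed install location; adjust it to where chrome.exe lives on your machine.
CHROME_PATH = "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe"


def browser_for(path):
    """Return a webbrowser controller that launches the executable at path."""
    # The quotes keep the spaces in the path together; %s is replaced by the URL.
    return webbrowser.get(f'"{path}" %s')


chrome = browser_for(CHROME_PATH)
# chrome.open("https://google.com") would then launch Chrome with that page.
```

Forward slashes are used in the path because the command line is split with shell-style rules, which treat backslashes as escape characters.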

 

 

Also, to reduce background noise, we need to add this line of code.

 

 

Here we first recognize the audio, and after that we open the website.

 

 

 

Run the code and this is the result.

Python Speech Recognition Opening Website

 

 

 

Convert Recorded Audio To Text

All right guys, so far we have learned how to convert audio from the microphone to text in Python. Sometimes you need to convert recorded audio to text instead: for example, we have an audio recording and we want to convert it to text. This is the complete code for that.

 

 

OK, in this code we have just changed the source: this time we are not using Microphone, we are using AudioFile.
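A minimal sketch of the AudioFile variant is shown below. The function name transcribe_file and the extension check are assumptions added for illustration; the check reflects that AudioFile reads WAV, AIFF, and FLAC files, so other formats such as MP3 would need converting first.

```python
from pathlib import Path

# File types AudioFile can read; anything else is rejected up front.
SUPPORTED_EXTENSIONS = {".wav", ".aiff", ".aif", ".flac"}


def transcribe_file(path):
    """Convert a recorded audio file to text with Google Speech Recognition."""
    if Path(path).suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {path}")
    # Imported here so this file still loads when the package is missing.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(path)) as source:
        # record() reads the whole file into memory as audio data.
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)
```

For example, print(transcribe_file("recorded.wav")) would print the text of a file named recorded.wav in the working directory; the file name here is only a placeholder.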

 

 

Now run the code and this is the result. Make sure that you have already added a recorded audio file to your working directory.

Python Speech Recognition Convert Recorded Audio

 

 

 

Converting Text To Speech in Python

OK, we have learned how to convert audio to text using Google Speech in Python. Now we want to learn how to convert text to audio, and for this we use another library. There are two ways to convert text to audio or speech: the first is the Google Text To Speech (gTTS) library, and the second is the pyttsx3 library.

 

 

What is Google Text To Speech (gTTS)?

gTTS (Google Text-to-Speech) is a Python library and CLI tool to interface with Google Translate’s text-to-speech API. It writes spoken mp3 data to a file, a file-like object (bytestring) for further audio manipulation, or stdout.

 

 

gTTS Installation

You can use pip for the installation (pip install gTTS).

 

 

So now this is the code for our example.
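A short sketch of what this can look like with gTTS is given below. It assumes gTTS is installed and an internet connection is available; the function name text_to_mp3 and the default file name welcome.mp3 are illustrative choices, not from the original code.

```python
def text_to_mp3(text, path="welcome.mp3", lang="en"):
    """Convert text to speech with gTTS and save it as an mp3 file."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Imported here so this file still loads when the package is missing.
    from gtts import gTTS

    speech = gTTS(text=text, lang=lang)
    speech.save(path)  # writes the mp3 file; requires internet access
    return path
```

For example, text_to_mp3("Welcome to the Python speech tutorial") would create welcome.mp3 in the working directory, which you can then open in any audio player.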

 

This code converts our text to audio and then saves it in our working directory.

 

The second way uses the pyttsx3 library.

 

 

What is the pyttsx3 Library?

pyttsx3 is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline, and is compatible with both Python 2 and 3. 

 

pyttsx3 Installation

You can use pip for the installation (pip install pyttsx3).

If you receive errors such as No module named win32com.client, No module named win32, or No module named win32api, you will need to additionally install pypiwin32.

 

 

This is the complete code.
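As a minimal sketch, assuming pyttsx3 is installed and a speech engine is available on the system, the code could look like this. The function name speak and the default rate of 150 words per minute are illustrative assumptions.

```python
def speak(text, rate=150):
    """Speak the given text aloud using the offline pyttsx3 engine."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Imported here so this file still loads when the package is missing.
    import pyttsx3

    engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # speaking speed in words per minute
    engine.say(text)                  # queue the text to be spoken
    engine.runAndWait()               # block until speaking has finished
```

Calling speak("Welcome to the Python speech tutorial") should read the sentence through your speakers without needing an internet connection.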

 

 

Run the code and you will see your text converted to audio.

 

 

 
