Speech to Text conversion




I am interested in converting speech/audio files to text and then applying data science techniques to analyse the resulting text.
Can someone recommend an open source API that converts audio files to text with high accuracy?

I tried SpeechRecognition (https://pypi.org/project/SpeechRecognition/), but I am not getting the correct text for a given audio file.

A pointer to a good tutorial in Python or Java would also be helpful.


Hi @premdutt,

Before analyzing the speech/audio data, the data needs to be preprocessed. You can follow this article:

It explains the preprocessing steps for audio data and also gives a practical overview based on a real-life project.
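
As a small illustration of one preprocessing check (this is not taken from the article), here is a minimal sketch that reads the basic properties of a WAV file using only Python's standard `wave` module. The function name `describe_wav` and the dictionary keys are hypothetical, chosen just for this example. Telephone-quality recordings are often sampled at 8 kHz, while many recognizers are tuned for 16 kHz audio, so knowing the sample rate and channel count of your files can help explain poor transcripts.

```python
import wave


def describe_wav(path):
    """Return basic properties of an uncompressed PCM WAV file as a dict."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),              # 1 = mono, 2 = stereo
            "sample_rate_hz": rate,                      # e.g. 8000 or 16000
            "bits_per_sample": wav.getsampwidth() * 8,   # sample width in bits
            "duration_s": frames / rate,                 # length in seconds
        }


# Example (the file name is a placeholder):
# print(describe_wav("sample.wav"))
```

If the properties do not match what your recognizer expects, resampling or converting to mono before transcription may be worth trying.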


Thank you, Pulkit, for your response.

I did not find any reference to converting audio to text in the given link.
The analysis is required on the text, not on the audio data itself. I am trying the sample at https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py

but it does not convert the audio to text correctly for the audio files below. These are simple sentences spoken by one person.
Sample audio files: http://www.voiptroubleshooter.com/open_speech/american.html

Is there a way to configure speech_recognition in the sample to improve the accuracy?

Thanks again for your input.


Hi @premdutt,

Here is a resource which you can follow to learn various techniques for speech to text conversion:

It also provides the implementations in Python.
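
On the question about configuring speech_recognition: below is a minimal sketch (not code from the resource above) of transcribing a file with the SpeechRecognition package while making the engine and language explicit, so that different settings can be compared on the same audio. The function name `transcribe_wav` and the `ENGINES` table are hypothetical helpers for this example. `recognize_sphinx` runs offline with the open source CMU Sphinx engine and needs the `pocketsphinx` package installed, while `recognize_google` sends the audio to Google's online Web Speech API. Which one gives better results on your files is something you would have to test.

```python
# Maps a short engine name to the Recognizer method that implements it.
ENGINES = {
    "sphinx": "recognize_sphinx",  # offline CMU Sphinx; requires pocketsphinx
    "google": "recognize_google",  # online Google Web Speech API
}


def transcribe_wav(path, engine="sphinx", language="en-US"):
    """Transcribe one audio file; return the text, or None if unintelligible."""
    if engine not in ENGINES:
        raise ValueError(
            f"unknown engine {engine!r}; expected one of {sorted(ENGINES)}"
        )

    # Imported here so the argument check above works even before the
    # package is installed (pip install SpeechRecognition).
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    recognize = getattr(recognizer, ENGINES[engine])
    try:
        return recognize(audio, language=language)
    except sr.UnknownValueError:
        # The engine ran but could not make sense of the audio.
        return None
    # sr.RequestError (engine missing or service unreachable) is left to
    # propagate so that setup problems are not mistaken for bad audio.


# Example (the file name is a placeholder):
# print(transcribe_wav("sample.wav", engine="google"))
```

Running the same file through both engines and comparing the output against the known sentences is a simple way to see which setup is more accurate for your recordings.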