Reply To: Speech to Text Converter / Oct 9 2021

Oct 9, 2021 at 6:17 pm #18609299

Philip o staiger

Guest

I know some of the developers. They’re a very cool gang and they make some fabulous tools.

I’ve even been involved with one of them, the subtitle translator tool.

I have been using Speech to Text from VovSoft for a while. Yes, it does require an IBM Watson account, but it’s free. Once you have the account, you need to look in your account interface for where to create an API key. It’s not trivial, but it’s doable and definitely worth the effort of looking up the instructions and following them. I don’t know of many other voice transcription systems that are free, but of course there are probably some open source alternatives if you want to go that route. I like this one because the front end is very easy to use, and you don’t even see the IBM connection once the API key is properly configured. It just works.

They also have another free tool that turns text back into speech. It uses Windows Narrator underneath, so you already have what you need for that on your PC, and they add a fabulous, easy-to-use interface to either hear the text read to you or save it to a WAV file or MP3… but that’s a different tool.

Ping me if you need some tips on how to configure that IBM API key. My name is Phil and you can find me at the best 3d dot com > about. I don’t recall exactly where to go in the IBM account, but it didn’t take too long to figure out, and I did use the instructions from VovSoft, which were very helpful.

If you have never dealt with API keys, you might encounter the concept in future dealings with crypto software too. Imagine you had a username and a password to log in somewhere, but instead of those two separate items you had them merged into a single unique key. From an end-user perspective, that’s basically what an API key is. It identifies you as a legit user when you make a request, in this case a request to transcribe something.
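
To make the API key idea a bit more concrete, here is a minimal Python sketch of how a key can travel with a transcription request. It assumes the pattern IBM documents for its Speech to Text service: HTTP Basic auth with the literal username "apikey" and your key as the password, sent to a /v1/recognize endpoint. The key, the service URL and the function name build_transcribe_request are made-up placeholders, and I can't say whether the VovSoft tool does it exactly this way under the hood. Nothing is sent over the network here; the code only builds the request so you can see where the key ends up.

```python
import base64
import urllib.request


def build_transcribe_request(api_key: str, service_url: str, audio: bytes) -> urllib.request.Request:
    """Build (but do not send) a speech recognition request that carries the API key."""
    # Assumption: Basic auth with the fixed username "apikey" and the key as the password.
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=service_url.rstrip("/") + "/v1/recognize",
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",  # this header is what identifies you
            "Content-Type": "audio/wav",
        },
    )


# Hypothetical values, for illustration only.
req = build_transcribe_request("my-secret-key", "https://example.invalid/speech-to-text", b"RIFF")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

The single key stands in for the username and password pair: whoever holds it can make requests as you, so treat it like a password.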

Have fun, or just trust me: this is worth the effort, it’s really good. Of course, underneath it all it does depend on the quality of the voice recognition in the IBM Watson engine. I can tell you I have seen better, but I have also seen much worse. The technology behind proper automatic speech recognition (ASR) is constantly evolving through deep learning / AI neural technology. Some of the competing ASR engines do better than others, but it also depends on the language and the quality of the audio recording. Some will return the sentence with the first letter capitalized, others will not. You can see the same in your web browser: between Chrome and Edge, Edge does much better. Some engines will not add the proper punctuation at the end, so you might have to say “question mark” in order to get a “?” at the end of a question. You don’t have to do that with the Microsoft Edge Web Speech API, but you do have to do that with the Google Chrome Web Speech API.
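
If your engine leaves the first letter in lowercase or returns spoken punctuation as words, a little clean-up afterwards can help. Below is a rough Python sketch of that idea, not something any of these tools ship with: the function name tidy_transcript and the small table of spoken punctuation names are my own assumptions, and it is naive on purpose (it will also convert the word "period" when you actually meant a period of time).

```python
import re

# Assumption: a small, hand-picked table of spoken punctuation names.
SPOKEN_PUNCTUATION = {
    "question mark": "?",
    "exclamation mark": "!",
    "full stop": ".",
    "period": ".",
    "comma": ",",
}


def tidy_transcript(raw: str) -> str:
    """Replace spoken punctuation names with symbols and capitalize sentence starts."""
    text = raw.strip()
    for spoken, symbol in SPOKEN_PUNCTUATION.items():
        # Swallow the whitespace before the spoken name so "today question mark" becomes "today?".
        text = re.sub(rf"\s*\b{re.escape(spoken)}\b", symbol, text, flags=re.IGNORECASE)
    # Capitalize the first letter of the text and of each sentence after ., ? or !
    return re.sub(r"(^|[.?!]\s+)([a-z])", lambda m: m.group(1) + m.group(2).upper(), text)


print(tidy_transcript("how are you today question mark"))
print(tidy_transcript("i am fine period how are you question mark"))
```

Text that already comes back capitalized and punctuated passes through unchanged, so it is safe to run on output from the better engines too.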

In summary, if you’re doing any sort of audio recording, such as for training material in a video, you will find this a very useful tool for part of the work that comes afterwards, whether you want the speech written down, subtitled as captions, or even taken further with translations. It doesn’t do everything, but it does the transcription really well in many cases.

-Philip