“In this 10-year time frame, I believe that we won’t only be using the keyboard and the mouse to interact, but during that time we will have perfected speech recognition and speech output well enough that those will become a standard part of the interface.”

Technology has come a long way, and with each new advancement, we grow more attached to it and come to expect these cool new features in every device.

With the advent of Siri, Alexa and Google Assistant, users of technology have longed for speech recognition in their daily use of the Internet. In this post, I will explain how to integrate native speech recognition and speech synthesis in the browser using the JavaScript WebSpeech API.

Empowering our Speech Recognition App with the WebSpeech API

At the time of writing, the WebSpeech API is only available in Firefox and Chrome. Its speech synthesis interface lives on the browser’s window object as speechSynthesis, while its speech recognition interface lives on the window object as SpeechRecognition in Firefox and webkitSpeechRecognition in Chrome.
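For example, a cross-browser way to pick up whichever constructor the browser exposes might look like this (a sketch, not the finished app’s exact code):

```javascript
// Use whichever constructor the browser exposes (prefixed in Chrome).
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
```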

Alongside instantiating speech recognition, we also select the microphone icon, text-box and sound elements on the page. We then create a paragraph element that will hold the words we say and append it to the text-box.
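A sketch of that setup follows; the class names are assumptions, since the original markup isn’t shown here:

```javascript
// Hypothetical selectors: the actual class names depend on your own markup.
const icon = document.querySelector('.microphone-icon');
const sound = document.querySelector('.sound');
const textBox = document.querySelector('.text-box');

// Paragraph element that will hold the recognized words.
const paragraph = document.createElement('p');
textBox.appendChild(paragraph);
```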

We want to play our sound and start the speech recognition service whenever the microphone icon is clicked on the page.
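A minimal sketch of that listener, assuming the icon and sound elements selected above:

```javascript
// Play the sound and start dictation when the microphone icon is clicked.
icon.addEventListener('click', () => {
  sound.play();
  dictate();
});
```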

In the event listener, after the sound is played, we create and call a dictate function. The dictate function starts the speech recognition service by calling the start method on the speech recognition instance.
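A sketch of the dictate function:

```javascript
// Start listening for speech.
const dictate = () => {
  recognition.start();
};
```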

To return the result of whatever the user says, we need to add a result event to our speech recognition instance.

The result event returns a SpeechRecognitionEvent that contains a results object. This in turn includes the transcript property holding the recognized speech as text. We save the recognized text in a variable called speechToText and put it in the paragraph element on the page.
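Extending the dictate sketch from above, the result handler might look like this:

```javascript
const dictate = () => {
  recognition.start();

  // Fires when the recognition service returns a result.
  recognition.onresult = (event) => {
    const speechToText = event.results[0][0].transcript;
    paragraph.textContent = speechToText;
  };
};
```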

If we run the app at this point, click the icon and say something, the words should appear on the page.

Next, we want the browser to talk back to the user. The speak function takes a function called action as a parameter. The function returns a string that is passed to SpeechSynthesisUtterance.

SpeechSynthesisUtterance is the WebSpeech API interface that holds the content the speech synthesis service should read. The speechSynthesis speak method is called on its instance and passed the content to read.
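A sketch of the speak function, assuming action is a function that returns the string to be read aloud:

```javascript
// Read out whatever string the passed-in action function returns.
const speak = (action) => {
  const utterance = new SpeechSynthesisUtterance(action());
  speechSynthesis.speak(utterance);
};
```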

To test this, we need to know when the user has finished speaking and has said a keyword.

Inside the result handler, we check the isFinal property on our event result, which tells us whether the user has finished speaking.

If the user has finished speaking, we check whether the transcript of what was said contains a keyword such as what is the time, and so on. If it does, we call our speak function and pass it one of three functions, getTime, getDate or getTheWeather, which all return a string for the browser to read. A sketch of this check follows below.
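The sketch below extends the earlier result handler. The exact keyword phrases for the date and weather are assumptions, and getTime is shown as one example of the three helper functions:

```javascript
recognition.onresult = (event) => {
  const speechToText = event.results[0][0].transcript;
  paragraph.textContent = speechToText;

  // isFinal is true once the user has finished speaking.
  if (event.results[0].isFinal) {
    if (speechToText.includes('what is the time')) {
      speak(getTime);
    } else if (speechToText.includes('what is the date')) {
      speak(getDate);
    } else if (speechToText.includes('what is the weather')) {
      speak(getTheWeather);
    }
  }
};

// Example helper; getDate and getTheWeather return strings in the same way.
const getTime = () => `The time is ${new Date().toLocaleTimeString()}`;
```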

Conclusion
In this article, we built a simple speech recognition app. There are a few other cool things we could do, like letting users pick a different voice for the browser to read in, but I’ll leave that for you to explore.

If you have any questions or feedback, please leave them as a comment below.
