How to easily convert voice to text with JavaScript (webkitSpeechRecognition API)

About the webkitSpeechRecognition API

The Web Speech API, introduced at the end of 2012, allows web developers to provide speech input and text-to-speech output features in a web browser. Typically, these features aren’t available when using standard speech recognition or screen reader software. The API also protects the user's privacy: the user must explicitly grant permission before a website can access the microphone.

Some important points you need to know:

  • As of this writing (23.02.2016), it is only available in Google Chrome.
  • Local files (file:// protocol) are not allowed; the file needs to be hosted on a server (or localhost).
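
The second point can be checked at runtime before you try to start recognition. The following is only a minimal sketch: isHostedPage is a hypothetical helper name (it is not part of the Web Speech API), and it simply inspects the protocol of a location-like object such as window.location.

```javascript
// Hypothetical helper (not part of the Web Speech API): returns true when the
// page is served over http(s) and false for local files opened via file://.
function isHostedPage(loc) {
  return loc.protocol === 'http:' || loc.protocol === 'https:';
}

// In a browser, pass window.location. The guard keeps this snippet harmless
// when it is run outside a browser.
if (typeof window !== 'undefined' && !isHostedPage(window.location)) {
  console.warn('Speech recognition needs the page to be served from a server or localhost.');
}
```

Passing the location in as a parameter, instead of reading window.location inside the function, makes the check easy to try out with plain objects.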

Basic example

The following code provides the most basic support for retrieving what the user says. You can use interim_transcript and final_transcript to show the user the recognized text.

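var recognition = new webkitSpeechRecognition();
recognition.continuous = true;      // keep listening after the user pauses
recognition.interimResults = true;  // also report results that are not final yet
recognition.lang = "en-GB";

// Declared outside the handler so the final text accumulates across events
var final_transcript = '';

recognition.onresult = function(event) {
    var interim_transcript = '';

    // Only walk the results that changed since the last event
    for (var i = event.resultIndex; i < event.results.length; ++i) {
      if (event.results[i].isFinal) {
        final_transcript += event.results[i][0].transcript;
      } else {
        interim_transcript += event.results[i][0].transcript;
      }
    }

    console.log(interim_transcript, final_transcript);
};

// Start listening (the browser asks the user for microphone permission)
recognition.start();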

Google's repository on GitHub has a very complete example (with many language codes, error handling, etc.). You can download the demo from the repository here.
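
To give an idea of what handling errors can look like, here is a small sketch (not the code from that demo). The error codes are the ones defined by the Web Speech API for the error property of the error event; the messages and the helper names describeRecognitionError and attachErrorHandler are assumptions made for this example.

```javascript
// Maps the `error` code of a recognition error event to a readable message.
// The codes come from the Web Speech API; the messages are our own wording.
function describeRecognitionError(code) {
  switch (code) {
    case 'no-speech':
      return 'No speech was detected.';
    case 'audio-capture':
      return 'No microphone was found.';
    case 'not-allowed':
      return 'Permission to use the microphone was denied.';
    case 'network':
      return 'A network error occurred.';
    default:
      return 'Recognition error: ' + code;
  }
}

// Hypothetical helper: sets onerror on a recognition object and forwards a
// readable message to the `report` callback.
function attachErrorHandler(recognition, report) {
  recognition.onerror = function (event) {
    report(describeRecognitionError(event.error));
  };
  return recognition;
}
```

With the recognition object from the basic example, you could call attachErrorHandler(recognition, function (message) { console.log(message); }); before calling recognition.start().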

Using a library

Artyom.js is a robust wrapper library for the webkitSpeechRecognition API. It allows you to do awesome tricks like voice commands, voice prompts, speech synthesis and many more. In this case we are interested in the artyom.newDictation function, which wraps all the previous code in something simpler. First you need to include the library in your project; your HTML file should look like this:

<!DOCTYPE html>
<html>
  <head>
    <title>Dictation example </title>

    <script type="text/javascript" src="path/to/artyom.min.js"></script>
  </head>
  <body>
   <input type="button" onclick="startRecognition();" value="Recognize text" />
   <input type="button" onclick="stopRecognition();" value="stop recognition" />
   <script>
      // we will write the javascript here
   </script>
</body>
</html>

If you have already linked the artyom library in your document, your JavaScript will look something like this:

var settings = {
    continuous:true, // Never stop listening; this works because the page is served over https
    onResult:function(text){
        // text = the recognized text
        console.log(text);
    },
    onStart:function(){
        console.log("Dictation started by the user");
    },
    onEnd:function(){
        alert("Dictation stopped by the user");
    }
};

var UserDictation = artyom.newDictation(settings);

function startRecognition(){
  UserDictation.start();
}

function stopRecognition(){
  UserDictation.stop();
}

You only have to handle the initialization; the magic then happens in the onResult property of the settings object. Although artyom makes things a lot easier, you should consider whether you really need it. If you are just getting started with this topic, it is advisable to use the plain code first so that you understand how the API works. If you are still interested, you can use artyom later.

The potential of this API is really incredible; however, it is a shame that only Google Chrome supports it. You can improve all the previous code, for example by detecting whether the browser allows you to initialize webkitSpeechRecognition.
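
A minimal sketch of such a check could look like this. The helper name getSpeechRecognition is hypothetical, and also looking for the unprefixed SpeechRecognition name (the one used in the specification) is an assumption about browsers other than the Chrome version discussed here.

```javascript
// Hypothetical helper: returns the recognition constructor exposed by the
// given global object (normally window), or null if there is none.
function getSpeechRecognition(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// Usage in a browser. The typeof guard keeps the snippet harmless elsewhere.
if (typeof window !== 'undefined') {
  var Recognition = getSpeechRecognition(window);

  if (Recognition) {
    var supportedRecognition = new Recognition();
    // configure it and call supportedRecognition.start() as in the basic example
  } else {
    console.log('Speech recognition is not supported in this browser.');
  }
}
```

Testing for the constructor itself, rather than checking the browser name, keeps the code working if another browser adds support later.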
