Learn how to convert voice to text with the Speech Recognition API in Cordova.

How to use the Speech Recognition API (convert voice to text) in Cordova

In the Web development world, there's a really useful (although experimental) API that allows you to convert voice to text easily: the SpeechRecognition API. This interface of the Web Speech API is the controller interface for the recognition service, and it also handles the SpeechRecognitionEvent sent from the recognition service.

However, this API is only available in Google Chrome (which leaves iOS out), and as if that were not enough, it isn't available in the Android WebView either. That completely rules out the use of this API in a Cordova app. Therefore, the only option is to use the native speech recognition service of the device.

In this article you will learn how to use the native speech recognition interface of the device within your Cordova Project through an open source plugin.

Requirements

To use voice recognition you will need a Cordova plugin that handles the native code of the speech recognizer. In this case, we are going to use the cordova-plugin-speechrecognition plugin, which lets you use the device's native speech recognition easily.

This plugin supports the Android and iOS platforms. To install the plugin in your project, execute the following command in the terminal:

cordova plugin add cordova-plugin-speechrecognition

Once the plugin is installed in your project, the window.plugins.speechRecognition object will be available. You can read more about the plugin in its official GitHub repository. The plugin itself has the following requirements:

  • cordova-android v5.0.0
  • Android API level 14
  • <android:launchMode> must not be singleInstance. It can be singleTask, standard, singleTop.
  • RECORD_AUDIO permission
  • Internet Connection (obviously)
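Because window.plugins.speechRecognition only exists once Cordova has fired the deviceready event (and only if the plugin is actually installed), it's worth guarding access to it. The following is a minimal sketch; the helper name getSpeechRecognition is our own, not part of the plugin:

```javascript
// Defensive accessor for the plugin object. Returns the plugin's API
// object if it is available, or null if the plugin is missing or
// "deviceready" has not fired yet.
function getSpeechRecognition(win) {
    if (win && win.plugins && win.plugins.speechRecognition) {
        return win.plugins.speechRecognition;
    }
    return null;
}
```

You would typically call it inside your deviceready handler, e.g. `document.addEventListener("deviceready", function(){ var sr = getSpeechRecognition(window); ... })`, and bail out with an error message when it returns null.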

Methods

The plugin offers 6 methods to handle the Speech Recognition:

1. isRecognitionAvailable

This method allows you to check whether speech recognition can be used on your device or not. The first (success) callback receives a boolean with this value:

window.plugins.speechRecognition.isRecognitionAvailable(function(available){
    if(available){
        // You can use the speechRecognition
    }
}, function(err){
    console.error(err);
});

2. hasPermission

This method verifies whether the application has permission to use the microphone:

window.plugins.speechRecognition.hasPermission(function (isGranted){
    if(isGranted){
        // Do other things as the initialization here
    }else{
        // You need to request the permissions
    }
}, function(err){
    console.log(err);
});

3. requestPermission

You can easily request permission to use the microphone with this method:

window.plugins.speechRecognition.requestPermission(function (){
    // Requested
}, function (err){
    // Oops, request denied
});

4. getSupportedLanguages

This method retrieves all the languages available on the device as an array:

window.plugins.speechRecognition.getSupportedLanguages(function(data){
    console.log(data); // ["es-ES","de-DE","id-ID" ........ ]
}, function(err){
    console.error(err);
});
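Since the list of supported languages varies per device, one practical use of this method is picking a sensible recognition language before calling startListening. Below is a hedged sketch; the helper name pickLanguage and its fallback logic are our own, not part of the plugin:

```javascript
// Given the array returned by getSupportedLanguages and a preferred
// locale (e.g. navigator.language), return the best match:
// 1. the exact locale if supported,
// 2. otherwise any variant with the same base language (e.g. "es-MX" for "es-ES"),
// 3. otherwise fall back to "en-US".
function pickLanguage(supported, preferred) {
    if (supported.indexOf(preferred) !== -1) {
        return preferred;
    }
    var base = preferred.split("-")[0];
    for (var i = 0; i < supported.length; i++) {
        if (supported[i].split("-")[0] === base) {
            return supported[i];
        }
    }
    return "en-US";
}
```

You would call it inside the success callback of getSupportedLanguages and pass the result as the language option of startListening.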

5. startListening

This method starts the speech recognition. It expects an options object as its third parameter:

  • language {String} used language for recognition (default "en-US")
  • matches {Number} number of return matches (default 5, on iOS: maximum number of matches)
  • prompt {String} displayed prompt of listener popup window (default "", Android only)
  • showPopup {Boolean} display listener popup window with prompt (default true, Android only)
  • showPartial {Boolean} Allow partial results to be returned (default false, iOS only)

There is a difference between the Android and iOS platforms. On Android, speech recognition stops when the speaker finishes speaking (at the end of a sentence). On iOS, the user has to stop the recognition process manually by calling the stopListening() method:

var settings = {
    language: "en-US",
    showPopup: true
};

window.plugins.speechRecognition.startListening(function(result){
    console.log(result);
    // By default just 5 options
    // ["Hello","Hallou", "Hellou" ...]
}, function(err){
    console.log(err);
}, settings);

6. stopListening

This method stops the recognition, but it is only available on iOS:

window.plugins.speechRecognition.stopListening(function(){
    // No more recognition
}, function(err){
    console.log(err);
});

Usage

The correct way to use the speech recognition is the following:

  1. You need to check if the speech recognition is supported.
  2. If it's supported, then check for permissions.
  3. If there are no permissions to use the microphone, request them.
  4. Once you have the permissions, initialize the Speech Recognizer.

With the available methods of the plugin, you can easily start the recognition with the following code (adjust the options as needed):

// Handle results
function startRecognition(){
    window.plugins.speechRecognition.startListening(function(result){
        // Show results in the console
        console.log(result);
    }, function(err){
        console.error(err);
    }, {
        language: "en-US",
        showPopup: true
    });
}

// Verify if recognition is available
window.plugins.speechRecognition.isRecognitionAvailable(function(available){
    if(!available){
        console.log("Sorry, speech recognition is not available");
        return;
    }
    // Check if has permission to use the microphone
    window.plugins.speechRecognition.hasPermission(function (isGranted){
        if(isGranted){
            startRecognition();
        }else{
            // Request the permission
            window.plugins.speechRecognition.requestPermission(function (){
                // Request accepted, start recognition
                startRecognition();
            }, function (err){
                console.log(err);
            });
        }
    }, function(err){
        console.log(err);
    });
}, function(err){
    console.log(err);
});

If you prefer to work with promises, you can create a small object that wraps the same functions provided by the plugin in Promises, as shown in the following example:

window["speechRecognition"] = {
    hasPermission: function(){
        return new Promise(function(resolve, reject){
            window.plugins.speechRecognition.hasPermission(function (isGranted){
                resolve(isGranted);
            }, function(err){
                reject(err);
            });
        });
    },
    requestPermission: function(){
        return new Promise(function(resolve, reject){
            window.plugins.speechRecognition.requestPermission(function (){
                resolve();
            }, function (err){
                reject();
            });
        });
    },
    startRecognition: function(settings){
        return new Promise(function(resolve, reject){
            window.plugins.speechRecognition.startListening(function(result){
                resolve(result);
            }, function(err){
                reject(err);
            }, settings);
        });
    },
    getSupportedLanguages: function(){
        return new Promise(function(resolve, reject){
            window.plugins.speechRecognition.getSupportedLanguages(function(result){
                resolve(result);
            }, function(err){
                reject(err);
            });
        });
    },
    isRecognitionAvailable: function(){
        return new Promise(function(resolve, reject){
            window.plugins.speechRecognition.isRecognitionAvailable(function(available){
                resolve(available);
            }, function(err){
                reject(err);
            });
        });
    },
    stopListening: function(){
        return new Promise(function(resolve, reject){
            window.plugins.speechRecognition.stopListening(function(){
                resolve();
            }, function(err){
                reject(err);
            });
        });
    }
};

This creates the speechRecognition variable on the window object, which you can use in the following way:

window.speechRecognition.isRecognitionAvailable().then(function(available){
    if(!available){
        throw new Error("Speech recognition is not available");
    }
    return window.speechRecognition.hasPermission();
}).then(function(hasPermission){

    function startRecognition(){
        return window.speechRecognition.startRecognition({
            language:"en-US",
            showPopup: true
        }).then(function(data){
            console.log("Results",data);
        }).catch(function(err){
            console.error(err);
        });
    }


    if(!hasPermission){
        window.speechRecognition.requestPermission().then(function(){
            startRecognition();
        }).catch(function(err){
            console.error("Cannot get permission", err);
        });
    }else{
        startRecognition();
    }
}).catch(function(err){
    console.error(err);
});

Pretty easy, isn't it?
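If your target WebViews support async/await, the same flow reads even more linearly. This is a minimal sketch assuming the window.speechRecognition promise wrapper defined above is in place; the helper name listen is our own:

```javascript
// Check availability, ensure microphone permission, then start listening.
// Resolves with the array of recognition matches.
async function listen(settings) {
    var available = await window.speechRecognition.isRecognitionAvailable();
    if (!available) {
        throw new Error("Speech recognition is not available on this device");
    }
    var granted = await window.speechRecognition.hasPermission();
    if (!granted) {
        // requestPermission rejects if the user denies the microphone
        await window.speechRecognition.requestPermission();
    }
    return window.speechRecognition.startRecognition(settings);
}
```

You would then call `listen({ language: "en-US", showPopup: true })` and handle the matches in .then() and any failure in .catch(), exactly like the chained version above.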

Voice commands

You can use a voice commands library like Artyom.js to process the voice commands. Although the webkitSpeechRecognition and speechSynthesis APIs aren't available in the WebView, you can still use Artyom's command processor:

artyom.addCommands([
    {
        indexes: ["Hello","Hi"],
        action: function(){
            console.log("Hello, how are you?");
        }
    },
    {
        indexes: ["Translate * in Spanish"],
        smart: true,
        action: function(i, wildcard){
            console.log("I cannot translate" + wildcard);
        }
    },
]);

// Start the recognition and say "hello"
window.plugins.speechRecognition.startListening(function (result) {

    // The hello command should be triggered
    result.some(function(option){
        if(artyom.simulateInstruction(option)){
            console.log("Matched: " + option, result);
            return true; // stop checking the remaining matches
        }
        return false;
    });

}, function (err) {
    console.error(err);
}, {
    language: "en-US",
    showPopup: true
});

Unfortunately, the native recognition doesn't support continuous recognition (at least not on Android; on iOS it keeps listening until you call stopListening), so you will only be able to achieve an "Ok Google..." feeling in your project.
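One common workaround on Android is to restart the recognizer every time a session ends. This is a hedged sketch, not a plugin feature: the helper name listenForever is our own, and the plugin object is passed in as a parameter. In a real app you would add a delay or abort condition on repeated errors, and probably set showPopup: false so the dialog doesn't reappear on every restart:

```javascript
// Emulate continuous listening by starting a new recognition session
// whenever the previous one finishes (with results or with an error).
function listenForever(recognizer, settings, onResult) {
    recognizer.startListening(function (matches) {
        onResult(matches);
        listenForever(recognizer, settings, onResult); // start a new session
    }, function () {
        listenForever(recognizer, settings, onResult); // retry after an error
    }, settings);
}
```

You could then feed each batch of matches into Artyom's simulateInstruction as shown above.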

Happy coding !


Senior Software Engineer at Software Medico. Interested in programming since he was 14 years old, Carlos is a self-taught programmer and founder and author of most of the articles at Our Code World.
