How to convert text to speech (speech synthesis) in Cordova

Speech synthesis is the computer-generated simulation of human speech. It turns written information into audio where that is more convenient, especially in mobile applications such as voice-enabled e-mail and unified messaging. If you build hybrid applications with JavaScript, you probably know the Speech Synthesis API available in most web browsers. That API is easy to use, but it isn't available inside a Cordova application, so you need to reach the native API of the device through a Cordova plugin.

In this article you will learn how to convert text to speech easily in your Cordova project.

Requirements

To use the native speech synthesis API of the device, we are going to use a plugin. Cordova plugin TTS, written by @vilic, lets you call the native speech synthesis API of the device from JavaScript. The plugin supports the following platforms:

iOS 7+
Windows Phone 8
Android 4.0.3+ (API Level 15+)

Install the plugin in your project by executing the following command in a terminal:

cordova plugin add cordova-plugin-tts

After the installation, the TTS object is available on window. For more information, visit the plugin's official GitHub repository. The plugin uses the AVSpeechSynthesizer class on iOS, the TextToSpeech class on Android and the Windows Phone Speech Synthesis class on Windows Phone.
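Because the TTS object only exists where the plugin is installed, a small guard can help when the same code also runs in a desktop browser during development. The sketch below is an illustration, not part of the plugin: getTts is a hypothetical helper name, and the only thing it assumes about the plugin is that window.TTS exposes a speak function.

```javascript
// Hypothetical helper (not part of the plugin): returns the TTS object
// if it is present on the given root object and has a speak function,
// otherwise returns null.
function getTts(root) {
    var tts = root && root.TTS;
    return (tts && typeof tts.speak === 'function') ? tts : null;
}

// Possible usage inside a Cordova page, once deviceready has fired:
//
//   var tts = getTts(window);
//   if (tts) {
//       tts.speak({ text: 'Hello', locale: 'en-US', rate: 1 });
//   } else {
//       console.log('TTS plugin not available');
//   }
```

Returning null instead of throwing lets the caller decide what to do when the plugin is missing, for example skipping speech entirely.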

Text to speech

To convert text to speech with JavaScript, use the speak method of the TTS object. Its first parameter is an object with up to 3 properties, followed by a success callback and an error callback:

Property name	Type	Description
`text`	String	Specifies the text that will be spoken.
`locale`	String	A locale code (for example en-US) that specifies the language that should be used to synthesize the text.
`rate`	Float	The speaking rate of the SpeechSynthesizer object, from 0.1 to 1.

It can be used as in the following example:

TTS.speak({
    text: 'Good morning, how are you?',
    locale: 'en-US',
    rate: 1
}, function () {
    console.log('Text successfully spoken');
}, function (reason) {
    console.log(reason);
});
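If the text, locale and rate come from user input or settings, it can be convenient to build the options object in one place. The following is a minimal sketch under stated assumptions: buildSpeakOptions is a hypothetical helper, not a plugin function, and it clamps rate to the 0.1 to 1 range given in the table above.

```javascript
// Hypothetical helper (not part of the plugin): builds the options object
// for TTS.speak. The locale is only set when one is given, and rate is
// clamped to the 0.1 to 1 range described in the table above.
function buildSpeakOptions(text, locale, rate) {
    var options = { text: String(text) };

    if (locale) {
        options.locale = locale;
    }

    if (typeof rate === 'number' && !isNaN(rate)) {
        options.rate = Math.min(1, Math.max(0.1, rate));
    }

    return options;
}

// Possible usage in a Cordova app:
//
//   TTS.speak(buildSpeakOptions('Good morning', 'en-US', 0.8), function () {
//       console.log('done');
//   }, function (reason) {
//       console.log(reason);
//   });
```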

Note that if you want to synthesize text when your app starts, you need to wrap your code inside a deviceready event listener. This example passes a plain string as the first argument instead of an options object:

document.addEventListener('deviceready', function () {
    TTS.speak('Welcome to my awesome app', function () {
        console.log('Ready !');
    }, function (reason) {
        console.log(reason);
    });
});
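When several parts of an app want to speak at startup, the deviceready wrapping can be collected in one place. The sketch below is one possible way to do it, not something the plugin provides: createReadyQueue is a hypothetical helper that takes the document object and returns a function which runs a task immediately if deviceready has already fired, or queues it until then.

```javascript
// Hypothetical helper (not part of Cordova or the plugin): returns a
// whenReady(task) function. Tasks registered before deviceready are
// queued and run in order when the event fires; tasks registered
// afterwards run immediately.
function createReadyQueue(doc) {
    var ready = false;
    var pending = [];

    doc.addEventListener('deviceready', function () {
        ready = true;
        // Empty the queue and run every task that was waiting.
        pending.splice(0).forEach(function (task) {
            task();
        });
    }, false);

    return function whenReady(task) {
        if (ready) {
            task();
        } else {
            pending.push(task);
        }
    };
}

// Possible usage in a Cordova page:
//
//   var whenReady = createReadyQueue(document);
//   whenReady(function () {
//       TTS.speak('Welcome to my awesome app');
//   });
```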

Why is there no stop method?

Technically, the native API lets you stop any speech utterance. With this plugin, however, there is no way to stop the speech synthesis (at least not in the traditional way) once it is running, and the reason is simple (no, the plugin is not bad or incomplete).

Because we are using Cordova, we are limited to sending and receiving data. JavaScript and native code can interact through callbacks, but we cannot interact dynamically with a single native instance of the plugin class, because the execution of every JavaScript method creates a new instance of the native class of the plugin.

Every time you execute an action, in this case start, Cordova instantiates, for example, the Java TTS class that extends CordovaPlugin and stores the TTS object in a local variable of that class. From the stored TTS object you could call the stop method, and the speech synthesis would stop as planned. However, if you execute another method from JavaScript, in this case stop, a new instance of the Java TTS class that extends CordovaPlugin is created, which means we no longer have access to the TTS object created by the start call. The same thing happens on iOS and Windows.

Although the plugin doesn't include a stop method, you can use a little trick to get the same effect: speak an empty string and the speech synthesis will stop. You can extend the plugin and create your own stop method:

// Extend the TTS object with the new stop method.
TTS.stop = function () {
    TTS.speak({text: ''});
};

// Speak some text
TTS.speak({
    text: 'Hola buenos días. Hoy para el desayuno hay pan con mantequilla.',
    locale: 'es-ES',
    rate: 0.75
}, function () {
    console.log('success');
}, function (reason) {
    console.log(reason);
});

// Stop after 3 seconds
setTimeout(function() {
    TTS.stop();
    console.log("Speech synthesis stopped");
}, 3000);
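The empty-string trick above can also be wrapped so that stop only does something while an utterance is in progress. This is a sketch under stated assumptions, not the plugin's API: createSpeaker is a hypothetical wrapper, and because it is not confirmed here whether the plugin fires the callbacks of an interrupted utterance, the wrapper ignores late callbacks when tracking its own state.

```javascript
// Hypothetical wrapper (not part of the plugin): tracks whether an
// utterance is in progress and only speaks the empty string when there
// is something to interrupt. `tts` is expected to be the plugin's TTS
// object.
function createSpeaker(tts) {
    var current = 0;       // id of the most recent utterance
    var speaking = false;

    function finish(id, callback, arg) {
        // Only the latest utterance may change the speaking state.
        if (id === current) {
            speaking = false;
        }
        if (typeof callback === 'function') {
            callback(arg);
        }
    }

    return {
        speak: function (options, onSuccess, onError) {
            var id = ++current;
            speaking = true;
            tts.speak(options, function () {
                finish(id, onSuccess);
            }, function (reason) {
                finish(id, onError, reason);
            });
        },
        // Returns true if a stop was attempted, false if nothing was playing.
        stop: function () {
            if (!speaking) {
                return false;
            }
            current++;          // invalidate the interrupted utterance
            speaking = false;
            tts.speak({ text: '' });
            return true;
        },
        isSpeaking: function () {
            return speaking;
        }
    };
}

// Possible usage in a Cordova app:
//
//   var speaker = createSpeaker(TTS);
//   speaker.speak({ text: 'A long sentence', locale: 'en-US', rate: 1 });
//   setTimeout(function () { speaker.stop(); }, 3000);
```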

Happy coding!