1. Introduction

IVONA Speech Cloud (SaaS) is a SOAP interface to the IVONA Text To Speech services. It is referred to as the “API” in the rest of this document. At the moment the API allows its users to:

  • generate sound files from imported texts

  • get information about the available voices and codecs

  • modify pronunciation rule sets to improve the pronunciation of specific words.

More functionalities will be available in the future.

2. Version information

IVONA Speech Cloud (SaaS) version: 0.3.0

Last modification date: 2011-03-24

Table 1. Change Log

version | date | changes
0.3.0 | 2011-03-24 | Product name change: IVONA TTS SaaS API ⇒ IVONA Speech Cloud (SaaS). Non-streamable formats available only on demand. Token time-to-live reduced to 5 minutes (a longer lifetime isn’t necessary). Downloading process and returned HTTP status codes explained.
0.2.6 | 2011-02-15 | Additional voices in new languages available: de_hans, de_marlene, es_conchita, es_enrique, es_us_penelope, es_us_miguel.
0.2.5 | 2011-01-27 | Additional voice available: us_chipmunk. Additional contentType available in the createSpeechFile() method: "text/ssml".
0.2.4 | 2010-12-15 | Voice modifications available only by special agreement.
0.2.3 | 2010-12-07 | Additional voices available: us_kimberly and gb_emma. Error result of the getToken() method correctly described.
0.2.2 | 2010-10-12 | Additional voices available: us_joey and us_kendra.
0.2.1 | 2010-04-07 | Additional contentType available in the createSpeechFile() method: "text/html".
0.2.0 | 2010-03-10 | Sound effects documentation. Additional error codes for sound effects. Additional values returned from getUserAgreementData() for monthly renewed SaaS agreements.
0.1.3 | 2010-02-26 | Additional error codes for the Pronunciation Rules methods.
0.1.2 | 2010-02-24 | Additional explanations of the authorization procedure and of the SoapClient object initialization in the PHP example; a few tables visually reformatted.
0.1.1 | 2010-02-19 | Error code list cleanup, a few parameter descriptions of the createSpeechFile() method corrected, and a small refactoring of the pronunciation rule methods for consistency with the speech file methods.
0.1.0 | 2010-02-15 | Initial release

2.1. System components

Account

The account of a registered IVONA.com user. An account with an active IVONA TTS SaaS service is required to use the API. Registration (creation of new accounts) isn’t available through the API at the moment; new accounts can be created at http://secure.ivona.com/register.php (the registration page on the IVONA website). Each account is identified by a pair of strings, email and password, which are used in the request authorization process (the email in the getToken() method, the password in the md5 parameter of authorized requests).

Speech File

A sound file generated by the IVONA TTS SaaS text-to-speech process from UTF-8 encoded text supplied by the user. Besides the text, the speech file is generated according to additional parameters supplied by the user: the voice that will read the text, the codec that determines the output format and sound quality, and additional sound parameters that modify the speech in the desired way (change its speed or volume, modify other sound parameters, or set ID3 tags for MP3 files). All speech file data is stored in the database and can be accessed only by its owner. Each speech file is identified by a unique file identifier. Every download of a speech file decreases the number of characters available in the active SaaS service of the user’s account.

Text

The text uploaded by the user with the createSpeechFile() method. The text should be UTF-8 encoded, and its MIME type should be selected from the list of available content types. The text is stored in the IVONA.com website database and can be accessed and deleted only by its owner (the uploader).

Table 2. Available content types

content type | description
text/plain | The text will be processed with the pronunciation rules and then read as is.
text/html | The text will be converted from HTML to plain text (all tags will be removed or replaced by pauses, making the text suitable for reading). After the conversion, the pronunciation rules will be applied.
text/ssml | The text will be interpreted as SSML 1.1 and validated against the SSML 1.1 basic schema (http://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/synthesis). All SSML elements except <audio> and <lexicon> will be interpreted; those two will be ignored. The pronunciation rules should work, unless they make the SSML document invalid.

Voice

A single Text To Speech synthesiser selected to process the text. Only one voice can be selected for a single speech file, and it is identified by the voice identifier parameter (voiceId). The following voices are currently available:

Table 3. Available voices list

voice id | voice name | voice language | voice gender
us_eric | Eric | American English | male
us_jennifer | Jennifer | American English | female
us_joey | Joey | American English | male
us_kendra | Kendra | American English | female
us_kimberly | Kimberly | American English | female
us_chipmunk | Chipmunk | American English | none
gb_amy | Amy | British English | female
gb_brian | Brian | British English | male
gb_emma | Emma | British English | female
de_marlene | Marlene | German | female
de_hans | Hans | German | male
es_conchita | Conchita | Castilian Spanish | female
es_enrique | Enrique | Castilian Spanish | male
es_us_penelope | Penelope | American Spanish | female
es_us_miguel | Miguel | American Spanish | male
pl_ewa | Ewa | Polish | female
pl_jacek | Jacek | Polish | male
pl_jan | Jan | Polish | male
pl_maja | Maja | Polish | female
ro_carmen | Carmen | Romanian | female

Pronunciation Rules

A table of rules (simple text substitutions and regular expression substitutions) used to preprocess the uploaded texts before they are processed (synthesised) by the voice. The main reason for using pronunciation rules is to improve the pronunciation of specific words that the selected voice reads differently from the intended way (especially abbreviations, foreign words, etc.), or to remove parts of the text (specific sections, symbols, etc.) that shouldn’t be heard in the spoken text.

There are two types of pronunciation rules: internal pronunciation rules, which are part of IVONA TTS SaaS (supporting the pronunciation of the most popular abbreviations, foreign names and specific grammatical constructions) and are always applied to the uploaded text, and user pronunciation rules, which can be added by the user and are visible only to their owner and the IVONA TTS SaaS engine.

All pronunciation rules are assigned to a specific language. When a speech file is generated with a voice intended for a specific language (for example Brian for English), the user pronunciation rules created for that language are applied automatically BEFORE the internal pronunciation rules. The character price of a single download of the speech file is determined AFTER the text has been processed with the pronunciation rules. Pronunciation rules are divided into the following languages, in which voices are available:

Table 4. Available languages for pronunciation rules sets

language id | language name | list of voices assigned
en | English | us_chipmunk, us_jennifer, us_eric, us_kendra, us_joey, us_kimberly, gb_amy, gb_brian, gb_emma
pl | Polish | pl_ewa, pl_maja, pl_jacek, pl_jan
ro | Romanian | ro_carmen
de | German | de_hans, de_marlene
es | Spanish | es_conchita, es_enrique, es_us_miguel, es_us_penelope

Codec

The name of the audio codec used to generate the speech file. The codec identifier is supplied as one of the parameters of the createSpeechFile() method (codecId). The following codecs are currently available through the API:

Table 5. The list of available codecs

codec id | codec description
mp3/22050 | MP3, 64 kbit/s, 22.05 kHz
ogg/22050 | OGG, 45 kbit/s, 22.05 kHz
pcm16/22050* | Uncompressed WAV file, 16 bit, 22.05 kHz
pcm16/8000* | Uncompressed WAV file, 16 bit, 8 kHz
alaw/8000* | WAV companded with the A-law algorithm (for telecom purposes)
ulaw/8000* | WAV companded with the µ-law algorithm (for telecom purposes)

(*) Non-streamable formats are available on demand - contact: sales@ivona.com
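Whether a codec is streamable decides how its download behaves (see section 5), so a client may want to derive that from the codec id. A minimal sketch, assuming ids are passed without the footnote asterisk; the function name ivonaCodecInfo and the file extensions it suggests are our own conventions, not part of the API:

```php
<?php
// Hypothetical helper (not part of the API): splits a codec id such as "mp3/22050"
// into its format and sample rate, and flags whether the format is streamable.
function ivonaCodecInfo(string $codecId): array
{
    if (preg_match('~^([a-z0-9]+)/(\d+)$~', $codecId, $m) !== 1) {
        throw new InvalidArgumentException("Malformed codec id: $codecId");
    }
    $format = $m[1];
    // mp3 and ogg are streamable; pcm16, alaw and ulaw are WAV-based and are not.
    $streamable = in_array($format, ['mp3', 'ogg'], true);
    return [
        'format'     => $format,
        'rateHz'     => (int) $m[2],
        'streamable' => $streamable,
        // File extension for saving the download locally (our own convention).
        'extension'  => $streamable ? $format : 'wav',
    ];
}

print_r(ivonaCodecInfo('alaw/8000'));
```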

Sound file parameters

Parameters affecting the format of the speech file. These parameters can, for example, change the audio speed, volume, pitch and other sound properties. They can also set specific values for the ID3v2 tags of a file. All parameters are optional and have default values set by IVONA TTS SaaS. The list of available parameters keeps growing, and new ones will become available in the future. The following parameters are currently available:

Table 6. Sound file parameters list

BASIC PARAMETERS

parameter name | parameter description | parameter value range | default value | additional info
Prosody-Volume | the volume of the recording as a percentage of the original volume of the voice | 0-100 | 100 | changes only the default volume used in the sound encoding process; it can be further changed by the sound player or device on which the file is played
Prosody-Rate | the speed of the recording as a percentage of the original speed of the voice | 50-200 | 100 | useful in solutions for visually impaired people (who are accustomed to faster speech) or in foreign language learning solutions (which are better served by slower speech)
Sentence-Break | the pause between sentences, in milliseconds | 0-3000 | 400 | useful in solutions intended to dictate texts to their recipients
Paragraph-Break | the pause between paragraphs (separated by empty lines in the uploaded text), in milliseconds | 0-5000 | 650 | useful in solutions based on splitting speech into separate blocks

ID3v2 TAGS SET FOR MP3 FILES

parameter name | parameter description | interpreted by IVONA.com Flash Player? | default value (if not set by user) | value example
Id3v2-TIT2 | Frame TIT2 in ID3v2.4 | yes (shows the name of the file in modes 1 and 2 of the player) | - | my speech file
Id3v2-TPE1 | Frame TPE1 in ID3v2.4 | yes (shows the author of the file in modes 1 and 2 of the player) | www.ivona.com | John Smith
Id3v2-TPE3 | Frame TPE3 in ID3v2.4 | yes (links the name of the file in modes 1 and 2 of the player) | - | http://hostname/somepage
Id3v2-TPE4 | Frame TPE4 in ID3v2.4 | yes (shows the image assigned to the file in modes 1 and 2 of the player) | - | http://hostname/imagepath
Id3v2-TDTG | Frame TDTG in ID3v2.4 | no | (the time of file encoding) | 2010-02-01T12:00:05
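To catch out-of-range values before a request is sent, the basic parameter ranges can be checked on the client. A minimal sketch, assuming the params argument of createSpeechFile() is a list of key/value pairs as in the createSpeechFile() examples; ivonaSoundParams and IVONA_PARAM_RANGES are hypothetical names. Parameters not listed in the table (such as the ID3v2 tags) are passed through unchecked, since the list of parameters keeps growing:

```php
<?php
// Value ranges of the basic parameters from Table 6 (inclusive).
const IVONA_PARAM_RANGES = [
    'Prosody-Volume'  => [0, 100],
    'Prosody-Rate'    => [50, 200],
    'Sentence-Break'  => [0, 3000],
    'Paragraph-Break' => [0, 5000],
];

// Hypothetical helper: validates known parameters and converts an associative array
// (name => value) into the list of key/value pairs used for the params argument.
function ivonaSoundParams(array $options): array
{
    $list = [];
    foreach ($options as $key => $value) {
        if (array_key_exists($key, IVONA_PARAM_RANGES)) {
            [$min, $max] = IVONA_PARAM_RANGES[$key];
            if (!is_numeric($value) || $value < $min || $value > $max) {
                throw new InvalidArgumentException("$key must be between $min and $max");
            }
        }
        // Other parameters (e.g. Id3v2-* tags) are passed through unchecked.
        $list[] = ['key' => (string) $key, 'value' => (string) $value];
    }
    return $list;
}

// Half speed and 2.5 s between paragraphs.
$params = ivonaSoundParams(['Prosody-Rate' => 50, 'Paragraph-Break' => 2500]);
```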

Sound effects

Additional sound effects can be added on special request. Contact sales@ivona.com to arrange a separate agreement for creating a modified voice.

Characters price

The “price” of downloading a file, deducted from the user’s account. When a user activates an IVONA TTS SaaS service on their account, a specific number of characters is added to the account. The number of characters depends on the type of agreement the user has signed with the IVONA.com sales department (for trial services this number is standardized; see http://www.ivona.com/saas.php for details). For each download of a speech file, the number of characters calculated by IVONA TTS SaaS is deducted from the user’s account. This price depends on the size of the uploaded text after it has been processed with the pronunciation rules. The user can always check the price of a specific text with the checkTextPrice() API method. Every subsequent download of the same speech file deducts its character price again.

3. Authorization

Every IVONA.com functionality is available only to registered users with an IVONA TTS SaaS service active on their account, so almost every API request must be authorized. The only unauthorized request available through the API is the request for a token (getToken()), which is then used in the authorization process.

Authorization is based on an MD5 hash calculated from the user’s password and a token received from the API with the getToken() request. Every request other than getToken() itself must be preceded by a getToken() call. Every authorized request carries the token in its token parameter and a hash of the password and the token in its md5 parameter. The value of the md5 parameter is calculated with the following formula:

The formula for md5 parameter calculation
$md5 = md5 ( md5 ("user password") . $tokenFromGetTokenMethod );
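The formula can be wrapped in a small helper so that every authorized request computes the md5 parameter the same way. A minimal sketch; ivonaAuthHash is a hypothetical name:

```php
<?php
// Hypothetical helper: computes the md5 parameter of an authorized request,
// md5( md5(password) . token ), exactly as in the formula above.
function ivonaAuthHash(string $password, string $token): string
{
    // md5() returns a 32-character lowercase hexadecimal string.
    return md5(md5($password) . $token);
}

// Placeholder values; a real token comes from getToken() and is valid for one request.
echo ivonaAuthHash('user password', 'token-from-getToken'), "\n";
```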
Example authorization procedure for a sequence of method calls
  1. getToken() method call for a new token.

  2. listVoices() method call using token acquired in step 1 to calculate the md5 parameter value.

  3. getToken() method call for a new token.

  4. listCodecs() method call using token acquired in step 3 to calculate the md5 parameter value.

  5. etc…
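The procedure above (a fresh token before every authorized call) can be wrapped in one function. A sketch, assuming $client behaves like a PHP SoapClient created with 'exceptions' => 0 (as in section 4), so that failures come back as SoapFault objects; ivonaAuthorizedCall is a hypothetical helper, not an API method:

```php
<?php
// Hypothetical helper (not an API method): performs one authorized call, fetching a
// fresh token first because every token can authorize only a single request.
// $client should behave like new SoapClient($wsdl, ['exceptions' => 0]), i.e. return
// a SoapFault object instead of throwing on errors.
function ivonaAuthorizedCall($client, string $email, string $password, string $method, array $args = [])
{
    // Step 1: request a new token (the only unauthorized method).
    $token = $client->__soapCall('getToken', ['email' => $email]);
    if ($token instanceof SoapFault) {
        return $token; // let the caller inspect faultcode/faultstring
    }
    // Step 2: token and md5 come first, followed by the method's own parameters
    // (which must already be in the documented order).
    $input = ['token' => $token, 'md5' => md5(md5($password) . $token)] + $args;
    return $client->__soapCall($method, $input);
}

// Usage against the real service (needs the SOAP extension and network access):
// $client = new SoapClient('http://www.ivona.com/saasapiwsdl.php', ['exceptions' => 0]);
// $voices = ivonaAuthorizedCall($client, 'mail@hostname.com', 'user password', 'listVoices');
// $codecs = ivonaAuthorizedCall($client, 'mail@hostname.com', 'user password', 'listCodecs');
```

Each call costs two SOAP round trips, which mirrors steps 1 to 4 of the list above.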

Note
Token Time-To-Live

Every token is valid for 5 minutes or until it is used in an authorized request, whichever comes first. Once a token has been used, the next method call needs a new token (acquired with another getToken() call) for its authorization.
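A client that keeps tokens around has to respect both expiry conditions. A minimal sketch of that bookkeeping; the IvonaToken class is hypothetical, and treating a token as expired exactly at the 300-second mark is our own assumption:

```php
<?php
// Hypothetical client-side bookkeeping for a token returned by getToken():
// it expires 5 minutes after being issued, or as soon as it has been used once.
final class IvonaToken
{
    const TTL_SECONDS = 300; // 5 minutes

    private string $value;
    private int $issuedAt; // Unix timestamp of the getToken() call
    private bool $used = false;

    public function __construct(string $value, int $issuedAt)
    {
        $this->value = $value;
        $this->issuedAt = $issuedAt;
    }

    // True while the token is unused and younger than the time-to-live.
    public function isValid(int $now): bool
    {
        return !$this->used && ($now - $this->issuedAt) < self::TTL_SECONDS;
    }

    // Returns the token for exactly one authorized request and marks it as used.
    public function consume(int $now): string
    {
        if (!$this->isValid($now)) {
            throw new RuntimeException('Token expired or already used; call getToken() again.');
        }
        $this->used = true;
        return $this->value;
    }
}

// Usage: $token = new IvonaToken($client->__soapCall('getToken', $input), time());
```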

4. Calling IVONA Speech Cloud (SaaS) methods through SOAP

SOAP requests must comply with the API WSDL, which is available at http://www.ivona.com/saasapiwsdl.php.

Every functional request available through the API must be preceded by a getToken() request with the email parameter.

Here is a PHP example of getting a token and using it for an authorized request:

<?php
// IVONA.com account credentials (placeholders, replace with your own)
$email = 'mail@hostname.com';
$password = 'user password';

// get the SOAP client object; with 'exceptions' => 0, errors are returned as SoapFault objects instead of being thrown
$client = new SoapClient('http://www.ivona.com/saasapiwsdl.php', array('exceptions' => 0));

// preparing variables for a SOAP method (every method and variable is defined in WSDL)
// @param string $email IVONA.com user email
$input = array('email' => $email);

// requesting a new token for an authorized request
// (a single token can be used only once)
$result = $client->__soapCall('getToken', $input);

// checking if there was an error during the SOAP request
if (is_soap_fault($result)) {
        // if there was an error, check the faultstring, correct the request and send a new, valid one
        echo "\nSOAP Fault:\n faultcode:[{$result->faultcode}]\n faultstring:[{$result->faultstring}]\n faultactor:[{$result->faultactor}]\n";
        exit(1);
}

// the request was successful, so the result is a token
$token = $result;

// preparing variables for an authorized request (every method and variable is defined in WSDL)
// @param string $token received token
// @param string $md5 md5(md5(user password) . token), see the "Authorization" section
$input = array('token' => $token, 'md5' => md5(md5($password) . $token));

// requesting the list of available voices
$result = $client->__soapCall('listVoices', $input);

// checking if there was an error during the SOAP request
if (is_soap_fault($result)) {
        // if there was an error, check the faultstring, correct the request and send a new, valid one
        echo "\nSOAP Fault:\n faultcode:[{$result->faultcode}]\n faultstring:[{$result->faultstring}]\n faultactor:[{$result->faultactor}]\n";
        exit(1);
}

// there was no error, so the result is valid and can be used as needed
foreach ($result as $k => $v) {
        echo "\n\nNo. " . $k;
        echo ",\t id = " . $v->voiceId;
        echo ",\t name = " . $v->voiceName;
        echo ",\t description = " . $v->voiceDescription;
        echo ",\t gender = " . $v->gender;
        echo ",\t language = " . $v->langId;
        echo ",\t productName = " . $v->productName;
        echo ",\t providerName = " . $v->providerName;
}

Any SOAP request can fail, so the result of every call should be checked for a SOAP fault (a list of error codes for easier processing will be published in the future).

5. Downloading speech files - redirections and returned HTTP codes

Calling the createSpeechFile() method doesn’t actually start any speech synthesis. It only stores the synthesis requirements in the IVONA Speech Cloud (SaaS) database and returns the URL from which the file will be available. Synthesis starts only when that URL is requested.

The file download process involves a couple of redirects that eventually lead to the final file location, from which the file is streamed (over HTTP) to the client. Some audio formats aren’t streamable: "wav" files, for example, require information about the complete audio data in the file header, so they cannot be sent to the user before they are fully synthesized. For such files the stream isn’t available immediately after the URL request; instead of the file, the server returns HTTP headers carrying additional information about the file’s availability.

The following HTTP responses can be returned on file requests:

FOR STREAMABLE FORMATS (mp3/ogg)

HTTP code | additional headers | description
200 | - | The file is ready to download; streaming starts immediately.
302 | "Location" | Redirection to the selected TTS host, from which the file will be streamed. The client should follow the location (most client applications, for example the Flash Sound object, proceed to the new address automatically, without notifying the user).
449 | - | The file isn’t ready for synthesis yet; some additional preparation (for example content analysis) is required. The client should repeat the request after a few seconds.

FOR NON-STREAMABLE FORMATS (wav/alaw/ulaw)

HTTP code | additional headers | description
200 | - | The file is ready to download; it will be sent to the user immediately.
302 | "Location" | Redirection to the selected TTS host, from which the file will be served. The client should follow the location.
449 | "Retry-After", "Refresh" | Synthesis is in progress and the client should wait until it is done. The expected synthesis duration (in seconds) is returned in the "Retry-After" and "Refresh" headers (the latter automatically refreshes the page when the file is downloaded with a web browser). The client should repeat the file request after the suggested duration.
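The two tables above translate into a small decision function for a download client. A sketch under stated assumptions: ivonaDownloadAction is hypothetical, response headers are passed as a name-to-value array, and the 3-second fallback for a 449 without Retry-After is our own choice for the "few seconds" mentioned above:

```php
<?php
// Hypothetical helper: maps the response to a soundUrl request onto the next step of a
// download client, following the two tables above. $headers maps header names to values.
function ivonaDownloadAction(int $status, array $headers, int $defaultRetrySeconds = 3): array
{
    // HTTP header names are case-insensitive, so normalise them first.
    $headers = array_change_key_case($headers, CASE_LOWER);

    switch ($status) {
        case 200:
            return ['action' => 'download'];
        case 302:
            if (!isset($headers['location'])) {
                return ['action' => 'error', 'reason' => '302 without a Location header'];
            }
            return ['action' => 'follow', 'url' => $headers['location']];
        case 449:
            // Non-streamable formats announce the expected synthesis time in Retry-After;
            // streamable formats send no hint, so wait a default number of seconds.
            $retryAfter = (string) ($headers['retry-after'] ?? '');
            $after = preg_match('/^\d+$/', $retryAfter) === 1 ? (int) $retryAfter : $defaultRetrySeconds;
            return ['action' => 'retry', 'after' => $after];
        default:
            return ['action' => 'error', 'reason' => "unexpected HTTP status $status"];
    }
}
```

A download loop would call this after every response and stop once it returns "download" or "error".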

When a command line application (for example wget or curl) is used to download the file from the returned soundUrl, the URL must be enclosed in quotes or apostrophes, because some characters in it may be interpreted by the shell (for example the ampersand ("&") character).
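In PHP, the quoting can be left to escapeshellarg(). A minimal sketch that builds a curl command line; the helper name and the example URL are hypothetical. The -L option makes curl follow the 302 redirections, while a 449 response still has to be retried manually:

```php
<?php
// Hypothetical helper: builds a shell command that downloads a soundUrl with curl.
// escapeshellarg() quotes each argument, so "&" in the URL is not interpreted by the shell.
function ivonaCurlCommand(string $soundUrl, string $outputFile): string
{
    // -L follows redirects (302); -o writes the response body to the given file.
    return 'curl -L -o ' . escapeshellarg($outputFile) . ' ' . escapeshellarg($soundUrl);
}

// Hypothetical URL, for illustration only.
echo ivonaCurlCommand('http://tts.example.com/speech?fileId=123&key=abc', 'speech.mp3'), "\n";
```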

Note
Why the "449 Retry" HTTP code is returned on file requests instead of "503 Service Temporarily Unavailable"

Many Flash multimedia applications use the "Sound" object to load an audio file into a Flash movie. In most environments Flash cannot detect the exact HTTP status code returned for the audio file request; in most cases only 4xx status codes are reported as "file not available" events, for which a retry procedure can easily be implemented in the application. A Flash developer can, for example, implement a "retry after 3 seconds" function (or any other delay) and call it on such an event. That is the reason for returning this "exotic" status code when the file isn’t ready yet, instead of a standard server response from the 5xx family.

6. SSML Content-Type guidelines

The text/ssml content type of the createSpeechFile() method allows text to be imported in SSML format. The text must be correct according to the SSML 1.1 recommendation (http://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/) and is validated against the SSML 1.1 basic schema (http://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/synthesis) when createSpeechFile() is called. If validation fails, the ERR_INVALID_SSML error is returned (with additional validation notes) and the Speech File is not created.

The SSML header and footer can be omitted when importing a text. IVONA Speech Cloud (SaaS) detects their absence and adds them before the validation step, so the following example is still valid:

Alice was awakened by a strange noise...
<voice name="Kimberly">"Who's there?"</voice> she asked fearfully.
<prosody rate="75%"><voice name="Emma">"It's Sally here, don't be afraid. Sleep well!"</voice></prosody> Sally answered in her phlegmatic voice.
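To validate such a fragment locally (for example with an XML parser), it has to be wrapped the same way. A sketch, assuming the standard SSML 1.1 speak root element and namespace; the exact header inserted by IVONA Speech Cloud (SaaS) is not documented here, and ivonaWrapSsml is a hypothetical name:

```php
<?php
// Sketch: adds an SSML 1.1 header/footer to a fragment, e.g. to check it locally before
// calling createSpeechFile(). The header the service itself inserts may differ.
function ivonaWrapSsml(string $fragment, string $lang = 'en-US'): string
{
    // Complete documents (already containing a speak element) are returned unchanged.
    if (preg_match('/<speak[\s>]/', $fragment) === 1) {
        return $fragment;
    }
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="'
        . htmlspecialchars($lang, ENT_QUOTES, 'UTF-8') . '">' . "\n"
        . $fragment . "\n"
        . '</speak>';
}
```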

When the characters price of a text is calculated, all SSML tags are omitted; only the text that will be read counts. It therefore doesn’t matter whether the full SSML header is used or skipped entirely.
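A rough local estimate of the count therefore ignores the markup. The sketch below is only an approximation under our own assumption that every remaining character, including whitespace, counts; the real price is computed after the pronunciation rules have been applied, so checkTextPrice() remains the authoritative source. ivonaEstimateSsmlCharacters is a hypothetical name:

```php
<?php
// Rough estimate only (our own assumption): counts the characters left after removing
// SSML markup. The service computes the real price after applying pronunciation rules.
function ivonaEstimateSsmlCharacters(string $ssml): int
{
    $text = strip_tags($ssml); // removes the markup, keeps the spoken text
    $text = html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8'); // &amp; becomes &
    return (int) preg_match_all('/./us', $text); // counts UTF-8 characters, not bytes
}
```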

Note
Changing the voice with the <voice> element

The <voice> element accepts voice names (see Table 3), not voice ids. For example: <voice name="Kimberly">What’s up?</voice>. The rest of the text (outside the parts where the voice is changed with the <voice> element) is read with the default voice given in the voiceId parameter of the createSpeechFile() method.

All SSML elements in an imported text, with the exception of <audio> and <lexicon>, will be interpreted; the <audio> and <lexicon> elements are ignored completely. The pronunciation rules should work unless they make the SSML document invalid; in that case the Speech File is not created and a validation error message is returned.

7. Functions available through API

All strings should be UTF-8 encoded.

Note
The order of parameters in SOAP requests

SOAP requests are essentially function (method) calls. It’s not safe to change the order of parameters in a function call: the call will probably be invalid, or at least give unexpected results. Some SOAP clients reorder the parameters by name according to the WSDL file, but others don’t. To avoid communication errors, keep the parameters in the order given in the method parameter lists in this document.
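For clients that do not reorder parameters, one option is to rebuild every input array in the documented order before the call. A sketch; the helper ivonaOrderedParams and the constant holding the createSpeechFile() parameter order (copied from section 7.2) are hypothetical names:

```php
<?php
// Parameter order of createSpeechFile(), as listed in section 7.2.
const IVONA_CREATE_SPEECH_FILE_ORDER = ['token', 'md5', 'text', 'contentType', 'voiceId', 'codecId', 'params'];

// Hypothetical helper: returns the parameters in the documented order and rejects
// missing or unknown names instead of silently sending a malformed request.
function ivonaOrderedParams(array $params, array $documentedOrder): array
{
    $missing = array_diff($documentedOrder, array_keys($params));
    if ($missing !== []) {
        throw new InvalidArgumentException('Missing parameters: ' . implode(', ', $missing));
    }
    $unknown = array_diff(array_keys($params), $documentedOrder);
    if ($unknown !== []) {
        throw new InvalidArgumentException('Unknown parameters: ' . implode(', ', $unknown));
    }
    $ordered = [];
    foreach ($documentedOrder as $name) {
        $ordered[$name] = $params[$name];
    }
    return $ordered;
}
```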

7.1. Authorization methods

For detailed information about the authorization procedure, see the “Authorization” section.

getToken()

Request a new token for an authorized request

parameters

email (string) - email of IVONA.com registered user

result

token (string)

checkToken()

Check whether the token/md5 pair is correct and valid (useful for checking whether an email/password pair is correct)

parameters

token (string)
md5 (string) - tokenized password

result

value of (int: 1) - if token is correct, or SOAP Fault if token is incorrect

7.2. Speech generation methods

The lists of available voices, codecs and sound parameters are also located in the “System components” section.

createSpeechFile()

Get a URL for a speech file generated from the uploaded text.

parameters

token (string)
md5 (string) - tokenized password
text (string) - text to process
contentType (string) - content type of the uploaded text, currently there are three contentTypes supported: "text/plain", "text/html" and "text/ssml"
voiceId (string) - voice identifier (from listVoices() method)
codecId (string) - codec identifier (from listCodecs() method)
params (list of key/value pairs) - additional sound parameters for speech file encoding

  • key (string) - additional parameter name (e.g. Id3v2-TIT2)

  • value (string) - additional parameter value (e.g. "Welcome to our website")

result

fileId (string) - identifier of a speech file
charactersPrice (int) - characters price for each download of the file
soundUrl (string) - url for sound file available for download
embedCode (string) - HTML with flash player embedding code playing generated speech file

Example of createSpeechFile() method parameters array (PHP style)
                // WARNING! The order of parameters is important!
$input = array (
                'token' => $token,
                'md5' => md5(md5($password).$token),
                        // UTF-8 encoded text to synthesize:
                'text' => 'Hello John! How are you? Yours sincerely... James.',
                        // the content type of the text above
                'contentType' => 'text/plain',
                        // the voice identifier
                'voiceId' => 'us_eric',
                        // the codec identifier
                'codecId' => 'mp3/22050',
                        // no additional parameters
                'params' => array()
                );
Example of additional parameters array for the createSpeechFile() method (PHP style)
$params = array (
                '0' => array(
                                // the text should be read at half speed
                                'key' => 'Prosody-Rate',
                                'value' => '50'
                        ),
                '1' => array(
                                // 2.5 s of pause between paragraphs
                                'key' => 'Paragraph-Break',
                                'value' => '2500'
                        ),
                );
Note
The actual speech synthesis procedure

The createSpeechFile() request does not start the speech synthesis process. Synthesis happens when the speech file is requested through the returned soundUrl. The mechanics behind a soundUrl request are explained in the “Downloading speech files” section.

deleteSpeechFile()

Delete the data of a single speech file

parameters

token (string)
md5 (string) - tokenized password
fileId (string) - speech file identifier (returned from createSpeechFile(), getSpeechFileData() and listSpeechFiles() methods)

result

success (int: 0/1) - success (1) or failure (0) of the operation

Note
The files limit and disk space consumption

The deleteSpeechFile() method is irreversible!

Each user has an upper limit on the number of active speech files, which can be checked with the getUserAgreementData() method (maxNumberOfSpeechFiles). To avoid reaching the limit, the user should delete speech files with the deleteSpeechFile() method.

There is no need to worry about the disk space used by generated files on the IVONA Speech Cloud (SaaS) speech servers. Several automated processes clean up physical files that haven’t been downloaded recently, so the user only needs to make sure not to reach the file limit of their Speech Cloud agreement.
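Staying below the limit can be automated by deleting the oldest files first. A sketch, assuming listSpeechFiles() returns objects with the fileId and created fields described for that method, and that the limit comes from maxNumberOfSpeechFiles in getUserAgreementData(); ivonaFilesToDelete is hypothetical, and since deleteSpeechFile() is irreversible, the returned ids should be reviewed before use:

```php
<?php
// Hypothetical helper: returns the fileIds of the oldest speech files that must be
// deleted (with deleteSpeechFile()) to leave room for $needed new files.
function ivonaFilesToDelete(array $files, int $maxFiles, int $needed = 1): array
{
    $excess = count($files) + $needed - $maxFiles;
    if ($excess <= 0) {
        return []; // enough room already
    }
    // Oldest first; "created" is an ISO 8601 timestamp that strtotime() understands.
    usort($files, function ($a, $b) {
        return strtotime($a->created) <=> strtotime($b->created);
    });
    return array_map(function ($f) { return $f->fileId; }, array_slice($files, 0, $excess));
}
```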

listSpeechFiles()

Get a list of all active speech files for the current user

parameters

token (string)
md5 (string) - tokenized password

result

files (array of:) - list of user speech files

  • fileId (int) - speech file identifier

  • textHead (string) - first 100 characters of the speech file’s text

  • voiceId (string) - voice identifier

  • codecId (string) - codec identifier

  • created (string) - the GMT time of the file creation in ISO 8601 format YYYY-MM-DDTHH:II:SS+HH:II (e.g. 2010-01-01T15:15:10+00:00)

getSpeechFileData()

Get the data of a single speech file

parameters

token (string)
md5 (string) - tokenized password
fileId (string) - speech file identifier

result

text (string) - text of speech
charactersPrice (int) - characters price of single text download
voiceId (string) - voice identifier
codecId (string) - codec identifier
created (string) - the GMT time of the file creation in ISO 8601 format YYYY-MM-DDTHH:II:SS+HH:II (e.g. 2010-01-01T15:15:10+00:00)
soundUrl (string) - url for sound file available for download
embedCode (string) - HTML with flash player embedding code playing generated speech file
params (list of key/value pairs) - additional parameters for speech file encoding

  • key (string) - additional parameter name

  • value (string) - additional parameter value

7.3. Pronunciation Rules methods

The list of available pronunciation rule languages can be found in Table 4.

listPronunciationRules()

Get all of the user’s pronunciation rules for the selected language

parameters

token (string)
md5 (string) - tokenized password
langId (string) - identifier of a language

result

pronunciationRules (list of:) - list of pronunciation rules

  • id (int) – rule identifier

  • stat (int: 1/2/3) – type of substitution

    • 1 – simple substitutions, case insensitive

    • 2 – simple substitutions, case sensitive

    • 3 – regular expressions (currently not available – will be available soon)

  • key (string) – pattern to search (“from” part of replacement)

  • value (string) – replacement value (“to” part of replacement)
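Before adding a rule it can be useful to preview its effect locally. The sketch below only mimics the two simple substitution types with PHP’s str_replace() and str_ireplace(); it is an assumption, not the engine’s implementation, which may for example respect word boundaries (these PHP functions also replace matches inside longer words). ivonaApplySimpleRule is hypothetical:

```php
<?php
// Hypothetical local preview of stat 1 and stat 2 rules; not the engine's implementation.
function ivonaApplySimpleRule(string $text, int $stat, string $key, string $value): string
{
    switch ($stat) {
        case 1: // simple substitution, case insensitive
            return str_ireplace($key, $value, $text);
        case 2: // simple substitution, case sensitive
            return str_replace($key, $value, $text);
        default: // 3 = regular expressions, not available yet
            throw new InvalidArgumentException("Unsupported rule type: $stat");
    }
}

echo ivonaApplySimpleRule('WSG and wsg', 2, 'WSG', 'Water Savings Group'), "\n";
```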

addPronunciationRule()

Add any number of pronunciation rules for the selected language to the user’s account

parameters

token (string)
md5 (string) - tokenized password
langId (string) - identifier of a language
stat (int: 1/2/3) – type of substitution (see listPronunciationRules() method)
key (string) – pattern to search (“from” part of replacement)
value (string) – replacement value (“to” part of replacement)

result

value of (int: 0/1) - success (1) or failure (0) of the operation

Example of addPronunciationRule parameters array (PHP style)
$input = array (
                'token' => $token,
                'md5' => md5(md5($password).$token),
                'langId' => 'en',
                'stat' => 2,    // case sensitive
                'key' => 'WSG',
                'value' => 'Water Savings Group'
                );
Warning
Modifying and deleting pronunciation rules

The methods for modifying and deleting pronunciation rules are irreversible!

modifyPronunciationRule()

Modify any number of the user’s pronunciation rules for the selected language

parameters

token (string)
md5 (string) - tokenized password
langId (string) - identifier of a language
id (int) – rule identifier, which can be acquired with the listPronunciationRules() method
stat (int: 1/2/3) – type of substitution (see listPronunciationRules() method)
key (string) – pattern to search (“from” part of replacement)
value (string) – replacement value (“to” part of replacement)

result

value of (int: 0/1) - success (1) or failure (0) of the operation

deletePronunciationRule()

Delete any number of the user’s pronunciation rules for the selected language.

parameters

token (string)
md5 (string) - tokenized password
langId (string) - identifier of a language
id (int) – rule identifier, which can be acquired with the listPronunciationRules() method

result

value of (int: 0/1) - success (1) or failure (0) of the operation

7.4. Additional information methods

Note
The amount of available characters

All requests for a hosted file are verified against the characters available in the user’s SaaS agreement, and the characters price of the downloaded speech file is deducted from the user’s account. If the characters price of a speech file is higher than the number of characters available on the account, the file won’t be served to the user. The characters price is calculated from several variables: the text size, the selected voice (which determines the internal pronunciation rules), and the user pronunciation rules set. The text is processed with the internal pronunciation rules for the specific voice and the user pronunciation rules for the selected voice’s language.

checkTextPrice()

Check the characters price for the given speech parameters (a text and voice pair)

parameters

token (string)
md5 (string) - tokenized password
text (string) - text to synthesize
voiceId (string) - voice identifier

result

charactersPrice (int) - characters price for a text

getUserAgreementData()

Show the user’s TTS SaaS agreement data (this method returns an error if no SaaS agreement is currently active for the user)

parameters

token (string)
md5 (string) - tokenized password

result

isMonthlyRenewed (int: 0/1) - 1 if a monthly-renewal subscription is active for this SaaS agreement, 0 if the agreement is a one-time character pool
allCharacters (int) - all characters from current active agreement, or their monthly limit if monthly renewal is active (value -1 means unlimited characters)
currentCharacters (int) - the current available characters (value -1 means unlimited characters)
previousCharacters (int) - the characters available before last operation that changed their value (value -1 means unlimited characters)
expirationDate (string) - GMT expiration time for user’s SaaS agreement in ISO 8601 format YYYY-MM-DDTHH:II:SS+HH:II (e.g. 2010-01-01T15:15:10+00:00)
renewalDate (string) - GMT renewal time for monthly character limit if user has monthly renewal option active in his SaaS agreement, time in ISO 8601 format YYYY-MM-DDTHH:II:SS+HH:II (e.g. 2010-01-01T15:15:10+00:00)
isTrial value of (int: 0/1) - 1 if there is a trial version of SaaS currently active, 0 if there is a full version of SaaS active for user’s account
limits (array of):

  • maxNumberOfSpeechFiles (int) - maximum number of speech files per user

  • maxTextLength (int) - maximum length of a processed text
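The result fields can be normalized into a typed record on the client side. This is a sketch under stated assumptions: the SOAP client is assumed to return the result as a mapping keyed by the documented field names, with limits as a nested mapping; the class and function names are hypothetical. The -1 "unlimited" value becomes None, and a missing or empty renewalDate (for example, when monthly renewal is not active) is also mapped to None, which is an assumption about how the API reports it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Mapping, Optional

UNLIMITED = -1  # documented value for "unlimited characters"


def _chars(value: Any) -> Optional[int]:
    """Convert a character count; None stands for unlimited (-1 in the API)."""
    n = int(value)
    return None if n == UNLIMITED else n


def _date(value: Any) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as 2010-01-01T15:15:10+00:00 (assumed None if absent)."""
    return datetime.fromisoformat(value) if value else None


@dataclass
class AgreementData:  # hypothetical client-side record
    is_monthly_renewed: bool
    all_characters: Optional[int]
    current_characters: Optional[int]
    previous_characters: Optional[int]
    expiration_date: Optional[datetime]
    renewal_date: Optional[datetime]
    is_trial: bool
    max_number_of_speech_files: int
    max_text_length: int


def parse_agreement(result: Mapping[str, Any]) -> AgreementData:
    """Build an AgreementData from a getUserAgreementData() result mapping."""
    limits = result["limits"]
    return AgreementData(
        is_monthly_renewed=int(result["isMonthlyRenewed"]) == 1,
        all_characters=_chars(result["allCharacters"]),
        current_characters=_chars(result["currentCharacters"]),
        previous_characters=_chars(result["previousCharacters"]),
        expiration_date=_date(result["expirationDate"]),
        renewal_date=_date(result.get("renewalDate")),
        is_trial=int(result["isTrial"]) == 1,
        max_number_of_speech_files=int(limits["maxNumberOfSpeechFiles"]),
        max_text_length=int(limits["maxTextLength"]),
    )
```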

listVoices()

Get the list of all available voices

parameters

token (string)
md5 (string) - tokenized password

result

voices (list of:)

  • voiceId (string) - voice identifier

  • langId (string) - voice language 2-letter code

  • gender (string: m/f/c) - voice gender (male, female, child)

  • voiceName (string) - voice name

  • voiceDescription (string) - voice description

  • productName (string) - general product name (from provider)

  • providerName (string) - voice provider name
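A common use of the listVoices() result is picking a voiceId for a given language and gender before calling createSpeechFile(). The sketch below assumes each list entry arrives as a mapping with the documented keys; the function name is hypothetical. Checking the identifier against this list locally avoids an ERR_INVALID_VOICEID round trip.

```python
from typing import Any, Iterable, List, Mapping, Optional

# Documented gender codes: m = male, f = female, c = child.
GENDERS = {"m": "male", "f": "female", "c": "child"}


def find_voices(
    voices: Iterable[Mapping[str, Any]], lang_id: str, gender: Optional[str] = None
) -> List[str]:
    """Return the voiceId of every voice in `lang_id`, optionally filtered by gender code."""
    if gender is not None and gender not in GENDERS:
        raise ValueError(f"gender must be one of {sorted(GENDERS)}")
    return [
        v["voiceId"]
        for v in voices
        if v["langId"] == lang_id and (gender is None or v["gender"] == gender)
    ]
```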

getVoiceData()

Get the data of a single voice

parameters

token (string)
md5 (string) - tokenized password
voiceId (string) - voice identifier

result

langId (string) - voice language 2-letter code
gender (string: m/f/c) - voice gender (male, female, child)
voiceName (string) - voice name
voiceDescription (string) - voice description
providerName (string) - voice provider name
providerDescription (string) - voice provider description
productName (string) - general product name (from provider)
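The getVoiceData() fields can be combined into a readable label, for example for a voice picker. A sketch only: the result is assumed to be a mapping with the documented keys, and the function name and output format are hypothetical choices.

```python
from typing import Any, Mapping

# Documented gender codes: m = male, f = female, c = child.
GENDER_NAMES = {"m": "male", "f": "female", "c": "child"}


def describe_voice(voice_id: str, data: Mapping[str, Any]) -> str:
    """Build a one-line summary from a getVoiceData() result."""
    gender = GENDER_NAMES.get(data["gender"], "unknown")
    return f"{voice_id}: {data['voiceName']} ({gender}, {data['langId']}) by {data['providerName']}"
```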

listCodecs()

Get the list of all available codecs

parameters

token (string)
md5 (string) - tokenized password

result

codecs (list of:)

  • codecId (string) - codec identifier

  • description (string) - codec description

  • codec (string: wav/mp3/ogg/alaw/ulaw) - encoding format for audio file

  • rate (string: 8/12/16/22.05) - sampling rate (in kHz)

  • sample (int: 0/8/16) - sample size (in bits); 0 for the mp3 and ogg formats
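Since rate is returned as a string (e.g. "22.05") and sample is 0 for compressed formats, a client choosing a codecId has to convert these values before comparing them. The sketch below picks the codec with the highest sampling rate, then the largest sample size, for a requested format; the function name and the "best" criterion are assumptions, and each entry is assumed to be a mapping with the documented keys.

```python
from typing import Any, Iterable, Mapping, Optional


def pick_codec(
    codecs: Iterable[Mapping[str, Any]], codec: str, min_rate_khz: float = 0.0
) -> Optional[str]:
    """Return the codecId of the best codec in the requested format, or None."""
    # rate is documented as a string (e.g. "22.05"), so convert it to float.
    candidates = [
        c for c in codecs if c["codec"] == codec and float(c["rate"]) >= min_rate_khz
    ]
    if not candidates:
        return None
    # Highest sampling rate first, then the largest sample size (0 for mp3/ogg).
    best = max(candidates, key=lambda c: (float(c["rate"]), int(c["sample"])))
    return best["codecId"]
```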

8. Example sequence of SOAP methods invoked for typical use of API - synthesis of a simple text

This is an example of using IVONA Speech Cloud (SaaS) for typical operations (generating and managing synthesized texts). We assume that there is a user account registered at IVONA Online.com with an active SaaS agreement and enough available characters to perform these operations.

  1. getToken() - get token for the next operation

  2. createSpeechFile() - upload the text and get the URL for downloading the speech file

  3. download the file to the user’s machine, or save the URL if IVONA.com hosting will be used

(and, if hosting will not be used):

  1. getToken() - get token for the next operation

  2. deleteSpeechFile() - delete the speech file from the system
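The sequence above can be sketched as a single function in which every API call receives a freshly requested token. To stay independent of any particular SOAP library, the steps are passed in as callables; all names are hypothetical. It assumes that the createSpeechFile() step yields both a speech file identifier and a download URL, and that the text, voice, and codec arguments of createSpeechFile() are bound inside that callable.

```python
from typing import Callable, Optional, Tuple


def synthesize(
    get_token: Callable[[], str],
    create_speech_file: Callable[[str], Tuple[str, str]],
    download: Callable[[str], bytes],
    delete_speech_file: Callable[[str, str], None],
    keep_hosted: bool = False,
) -> Tuple[str, Optional[bytes]]:
    """Run the example sequence; return (url, audio), audio is None if the file stays hosted."""
    # Steps 1-2: getToken(), then createSpeechFile() returns an id and a URL.
    speech_id, url = create_speech_file(get_token())
    if keep_hosted:
        # Step 3 with IVONA.com hosting: only the URL is kept.
        return url, None
    # Step 3 without hosting: download the audio to the local machine.
    audio = download(url)
    # Second sequence: a new token, then deleteSpeechFile() removes the file.
    delete_speech_file(get_token(), speech_id)
    return url, audio
```

Requesting a new token before each call matches the sequence above, and also suits the short token lifetime.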

9. Possible errors returned by API

In case of an error or an invalid method invocation, the API returns a SOAP fault consisting of three elements:

  • faultactor - name of the method that resulted in an error

  • faultcode - error code

  • faultstring - human-readable description of the error
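The element names faultcode, faultstring, and faultactor are those of a SOAP 1.1 Fault, so a client that works with raw responses can extract them with a standard XML parser. The sketch below assumes a SOAP 1.1 envelope (namespace http://schemas.xmlsoap.org/soap/envelope/) with unqualified fault child elements; the class and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


@dataclass
class ApiFault:
    actor: str   # faultactor: name of the method that failed
    code: str    # faultcode: error code, e.g. ERR_INVALID_TOKEN
    string: str  # faultstring: human-readable description


def parse_fault(envelope_xml: str) -> Optional[ApiFault]:
    """Return the fault carried by a SOAP 1.1 response, or None if there is none."""
    root = ET.fromstring(envelope_xml)
    fault = root.find(f"{{{SOAP_ENV}}}Body/{{{SOAP_ENV}}}Fault")
    if fault is None:
        return None
    # SOAP 1.1 fault child elements are unqualified (no namespace prefix).
    return ApiFault(
        actor=fault.findtext("faultactor", default=""),
        code=fault.findtext("faultcode", default=""),
        string=fault.findtext("faultstring", default=""),
    )
```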

Table 7. List of possible errors:
error code error description

ERR_DATABASE

Database error, try again later.

ERR_SAAS_NOT_ACTIVE

User does not have an active TTS SaaS agreement.

ERR_SAAS_BLOCKED

User has been administratively blocked. Contact support@ivona.com.

ERR_TOO_MANY_FILES

Too many files for this SaaS agreement.

ERR_INVALID_TOKEN

Invalid/inactive token.

ERR_INVALID_MD5

Invalid authorization code (md5 hash).

ERR_INVALID_SPEECHID

Invalid speech file identifier, or user is not an owner of this speech file.

ERR_INVALID_VOICEID

Invalid voice identifier (check the list of available voice identifiers returned by listVoices()).

ERR_INVALID_CODECID

Invalid codec identifier (check the list of available codec identifiers returned by listCodecs()).

ERR_INVALID_CONTENTTYPE

Invalid content-type parameter (check the list of supported content-types in the createSpeechFile() method description).

ERR_INVALID_SOUNDPARAMETERS

Invalid sound parameters (check the list of available sound parameters in the sound effects documentation).

ERR_INVALID_PRONUNCIATIONRULEID

Invalid pronunciation rule identifier.

ERR_PRONUNCIATIONRULE_EMPTY

Empty pronunciation rule key.

ERR_PRONUNCIATIONRULE_EXISTS

Such pronunciation rule already exists and cannot be added again.

ERR_TEXT_EMPTY

Empty text after processing by pronunciation rules.

ERR_TEXT_TOO_LONG

Text too long after processing by pronunciation rules.

ERR_INVALID_SSML

Invalid SSML content (the text has not passed SSML validation).

ERR_OTHER

Other error. Contact support@ivona.com if such an error occurs.
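A client may want to react differently depending on the faultcode. The mapping below is an illustrative client-side policy, not one defined by the API: ERR_DATABASE is retried later because its description says so; ERR_INVALID_TOKEN leads to requesting a new token (tokens expire after 5 minutes), together with the matching md5; the codes whose descriptions point to support@ivona.com are escalated; all other codes are treated as problems with the request or the account that should not be retried unchanged. The set and function names are hypothetical.

```python
# Illustrative grouping of the error codes above (client-side policy, not part of the API).
RETRY_LATER = {"ERR_DATABASE"}
NEW_TOKEN = {"ERR_INVALID_TOKEN"}
CONTACT_SUPPORT = {"ERR_SAAS_BLOCKED", "ERR_OTHER"}


def recovery_action(fault_code: str) -> str:
    """Suggest how a client might react to a faultcode returned by the API."""
    if fault_code in RETRY_LATER:
        return "retry_later"  # "Database error, try again later."
    if fault_code in NEW_TOKEN:
        return "new_token"  # token invalid or expired: call getToken() again
    if fault_code in CONTACT_SUPPORT:
        return "contact_support"  # the descriptions point to support@ivona.com
    # Remaining codes signal a problem with the request or the account;
    # repeating the same call unchanged is unlikely to succeed.
    return "do_not_retry"
```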