Speech Server

Get a high-performance TTS server for demanding environments


Enables TTS synthesis in a private network

The IVONA Text-to-Speech Server is a dedicated solution for managing multiple speech synthesis requests in parallel. Create large volumes of clear, natural-sounding speech files for use in applications that require multi-threaded TTS services. Customers can integrate Text-to-Speech into their client-server architecture.

Benefits for you

  • High performance - multiple speech synthesis requests can be handled in parallel.
  • Rich functionality - a wide range of features for generating speech tailored to customer needs.
  • Standards-based - supports most of the available TTS standards and related features:
    • Dynamic voice and language switching
    • Support for phonetic alphabets
    • Word highlighting (alignment of text with audio)
    • Visemes (lip-sync), SSML events
    • Prosody control (volume, speed, pitch)
    • User-level pronunciation lexicon

Licensing Model

Licensing is based on the number of concurrent TTS sessions, that is, how many sessions can generate audio files at the same time. Usage restrictions apply to the generated speech: it may be used only in predefined solutions (e.g. only in a web application for video creation). Please contact us if you are interested in purchasing IVONA Speech Server.

Text-to-Speech Cloud

If you want to embed a TTS engine into your application, device or service and need TTS functionality right away, you may benefit from IVONA Speech Cloud.

Recommended Speech Server Uses

Consumer Devices

Deploy cutting-edge speech applications in your products.

Connected TV

Add speech to enrich dynamic multimedia content.


Enable drivers to keep their hands on the wheel, stay connected and drive safely.

Announcement Systems

Generate real-time speech to support efficient, up-to-date announcements.


Remove barriers to information access and improve communications.


Improve learning outcomes with Text-to-Speech.

Digital Publishing

Create professional recordings and audio content.

IVONA Speech Server Specifications

IVONA SpeechServer

IVONA SpeechServer SAPI



Natural, lifelike voices resulting from an innovative approach to unit selection technology. Reduced unnatural discontinuities, electronic noise, and audible glitches. High accuracy through sophisticated NLP algorithms built into the TTS engine. Support for natural reading of short and long texts.

Languages and voices

See the voices list at http://www.ivona.com/en/voices-list/

Prosody control

Ability to adjust volume, speech rate and pitch at runtime.
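In SSML, these three controls map to the `volume`, `rate` and `pitch` attributes of the `<prosody>` element. As a rough illustration, the Python sketch below builds such a document using only the standard library. The function name `build_prosody_ssml` and the sample values are assumptions made for this example, and how the resulting markup is submitted to the server is not shown here.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_prosody_ssml(text, volume="medium", rate="medium",
                       pitch="medium", lang="en-US"):
    """Wrap text in an SSML <prosody> element and return the markup."""
    # Serialize SSML elements without a namespace prefix.
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    prosody = ET.SubElement(
        speak,
        f"{{{SSML_NS}}}prosody",
        {"volume": volume, "rate": rate, "pitch": pitch},
    )
    prosody.text = text  # ElementTree escapes &, < and > on output
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_prosody_ssml("Doors closing.", volume="loud", rate="slow"))
```

The labels used here (`loud`, `slow`, `medium`) come from the SSML specification. Building the document with an XML library rather than string concatenation keeps special characters in the text from breaking the markup.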

Built-in domains support

IVONA TTS has built-in mechanisms to correctly pronounce texts from specific communicative contexts such as social text, acronyms, abbreviations and numbers.

Mixing static expressive prompts

Mechanism to mix static audio prompts with dynamically generated TTS output.

Support for phonetic alphabets

IPA, X-SAMPA, TeleAtlas®, Navteq™

Standards compliance

W3C SSML 1.0/1.1, W3C PLS 1.0 (with IVONA extensions)
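A PLS 1.0 lexicon pairs written forms (graphemes) with pronunciations (phonemes) in a declared phonetic alphabet. The Python sketch below generates a small lexicon in the standard W3C format. The helper name `build_lexicon` and the sample entry are illustrative assumptions, and the IVONA-specific extensions mentioned above are not covered.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(entries, lang="en-US", alphabet="ipa"):
    """Return a PLS 1.0 document for a {grapheme: phoneme} mapping."""
    # Serialize PLS elements without a namespace prefix.
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, phoneme in entries.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(lexicon, encoding="unicode")


if __name__ == "__main__":
    # Illustrative entry only: an IPA transcription of "tomato".
    print(build_lexicon({"tomato": "təˈmɑːtəʊ"}, lang="en-GB"))
```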

Support for text highlighting

Ability to synchronize audio with text by highlighting the words and sentences as they are spoken by TTS.

Support for lip synchronization

Ability to provide applications with a synchronized stream of visemes (visual representations of sounds).


CPU requirements

IVONA SpeechServer: x86 (32/64 bit)
IVONA SpeechServer SAPI: x86 (32/64 bit)

Memory requirements

IVONA SpeechServer: recommended 128 MB per voice
IVONA SpeechServer SAPI: recommended 128 MB per voice



Windows, Windows Server

Product features

Sampling rate

IVONA SpeechServer: 22.05 kHz
IVONA SpeechServer SAPI: 22.05 kHz

Audio formats

IVONA SpeechServer: PCM 16-bit mono
IVONA SpeechServer SAPI: PCM 16-bit mono
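Raw PCM carries no header, so a player needs to be told the sampling rate, sample width and channel count. One way to handle this is to wrap the samples in a WAV container, as in the Python sketch below. It assumes the output is raw little-endian signed 16-bit samples. The page states only "PCM 16 bit mono" at 22.05 kHz, so the byte order and the absence of a header are assumptions.

```python
import io
import wave

SAMPLE_RATE_HZ = 22050     # 22.05 kHz, as listed above
SAMPLE_WIDTH_BYTES = 2     # 16-bit samples
CHANNELS = 1               # mono


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM samples (assumed little-endian) in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()


def duration_seconds(pcm: bytes) -> float:
    """Playback length of a raw PCM buffer in this format."""
    return len(pcm) / (SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)
```

At these settings one second of audio occupies 22,050 × 2 = 44,100 bytes, which gives a quick way to estimate storage for large batches of generated speech.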


IVONA SpeechServer: speech server (daemon), tools, documentation, examples (C/C++, PHP, Perl)
IVONA SpeechServer SAPI: SAPI component, tools, documentation

IVONA SpeechServer: command line, TCP/IP, Unix socket
IVONA SpeechServer SAPI: SAPI 5, command line

Standards compliance

IVONA SpeechServer: W3C SSML 1.0/1.1, W3C PLS 1.0 (with IVONA extensions)
IVONA SpeechServer SAPI: W3C SSML 1.0/1.1, W3C PLS 1.0 (with IVONA extensions), SAPI markup (with support for mixing with SSML tags)

Copyright © 2015 IVONA Software. All rights reserved.