- Silero Models
- Installation and Basics
- Text-To-Speech
- Models and Speakers
- V5
- V5 CIS Base Models
- V5 CIS Ext Models
- V4
- V3
- Dependencies
- PyTorch
- Standalone Use
- SSML
- Cyrillic languages v4
- Indic languages v4
- Example
- Supported languages
- Models and Speakers
- Contact
- Licence
- Citations
- Further reading
- English
- Chinese
- Russian
Silero Models
Our TTS models satisfy the following criteria:
- Fully end-to-end;
- Large library of voices;
- Natural-sounding speech;
- One-line usage, minimal, portable;
- Impressively fast on CPU and GPU;
- For the Russian language - automated stress and homographs;
Installation and Basics
You can use our models in three ways:
- Via PyTorch Hub: `torch.hub.load()`;
- Via pip: `pip install silero` and then `from silero import silero_tts`;
- Via caching the required models and utils manually and modifying them if necessary.
Models are downloaded on demand both by pip and PyTorch Hub. If you need caching, do it manually or invoke the required model once (it will be downloaded to a cache folder). Please see these docs for more information.
The PyTorch Hub entry point and the pip package are based on the same code. All of the torch.hub.load examples can be used with the pip package via this basic change:
from silero import silero_tts

model, example_text = silero_tts(language='ru',
                                 speaker='v5_ru')
audio = model.apply_tts(text=example_text)
Text-To-Speech
Models and Speakers
All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.
V5
V5 models support SSML. Also see Colab examples for main SSML tag usage.
Russian-only models support automated stress and homographs.
| ID | Speakers | Auto-stress / Homographs | Language | SR | Colab |
|---|---|---|---|---|---|
| v5_ru | aidar, baya, kseniya, xenia, eugene | yes / yes | ru (Russian) | 8000, 24000, 48000 | |
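As a small illustration of the v5_ru row above, the following sketch validates a speaker and sampling rate before synthesis is attempted. The `V5_RU` dictionary and the `check_request` helper are hypothetical names introduced here, not part of the package; only the values are taken from the table.

```python
# Values copied from the v5_ru table row; the dictionary layout is an assumption.
V5_RU = {
    "speakers": {"aidar", "baya", "kseniya", "xenia", "eugene"},
    "sample_rates": {8000, 24000, 48000},
}


def check_request(speaker: str, sample_rate: int, spec: dict = V5_RU) -> None:
    """Raise ValueError if the speaker or sampling rate is not listed for the model."""
    if speaker not in spec["speakers"]:
        raise ValueError(f"unknown speaker: {speaker!r}")
    if sample_rate not in spec["sample_rates"]:
        raise ValueError(f"unsupported sample rate: {sample_rate}")


check_request("xenia", 48000)  # listed in the table, so nothing is raised
```

Failing early like this gives a clearer error message than whatever the model itself might raise for an unlisted value.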
V5 CIS Base Models
- All of the below models support 8000, 24000, 48000 sampling rates and contain no auto-stress or homographs;
- v5_cis_base models assume that proper stress should be added for each word for all languages, e.g. k+oshka;
- v5_cis_base_nostress models assume that proper stress should be added for each word ONLY for Slavic languages (i.e. ru, bel, ukr);
- All of the below models are published under the MIT licence;
- V5 UTMOS and throughput metrics;
- V5 models support SSML. Also see Colab examples for main SSML tag usage;
- Use cases for the model;
- Minimal system requirements: a PyTorch-compatible system, a modern processor with AVX2 instruction set for x86/64 platform.
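The stress convention in the list above (a `+` placed before the stressed vowel, as in `k+oshka`) can be checked before text is sent to a model. The sketch below is a naive pre-flight check under stated assumptions: `needs_stress` and `words_missing_stress` are hypothetical helpers, words are split on whitespace only, and every word is expected to carry a mark, which may be stricter than the models actually require.

```python
import string

# Languages named in the list above as always needing stress marks.
SLAVIC = {"ru", "bel", "ukr"}

# ASCII punctuation to trim from word edges, keeping the '+' stress marker.
_TRIM = string.punctuation.replace("+", "")


def needs_stress(model_id: str, language: str) -> bool:
    """Return True if the input text is expected to carry stress marks."""
    if model_id.endswith("_nostress"):
        return language in SLAVIC
    return True


def words_missing_stress(text: str) -> list[str]:
    """Return the words in `text` that contain no '+' stress mark."""
    missing = []
    for raw in text.split():
        word = raw.strip(_TRIM)
        if word and "+" not in word:
            missing.append(word)
    return missing


print(words_missing_stress("k+oshka sidit"))
```

A caller could skip the check entirely when `needs_stress(model_id, language)` is false.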
Supported alphabets
Please note that Georgian and Armenian are in fact internally supported via direct translation into cyrillic script inside of the package. Azerbaijani and Uzbek support both alphabets (Cyrillic and Latin).
| ID | Name | Alphabet(s) |
|---|---|---|
| aze | aze (Azerbaijani) | abccde@fgghxiijkqlmnooprsstuuvyz |
| aze | aze (Azerbaijani) | abvgg'de@zhziyjkk'lmnooprstuufkhhchch'sh |
| hye | hye (Armenian) | abgdezeet`zhilkhtskhdzghchmynshoch`pjrhsvtrts`wp`k`ofew |
| bak | bak (Bashkir) | abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaiog'z'k'n's'uh@o |
| bel | bel (Belarusian) | abvgdezhziklmnoprstufkhtschshy'eiuiaioiu |
| kat | kat (Georgian) | abgdevzt`iklmnopzhrstup`k`g'qshch`c`z'cchxjh |
| kbd | kbd (Kab.-Cherkes) | abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaio |
| kaz | kaz (Kazakh) | abvgdezhziiklmnoprstufkhtschshshchy'eiuiaig'k'n'uu'h@o |
| xal | xal (Kalmyk) | abvgdezhziiklmnoprstufkhtschshshch'y'eiuiazh'n'uh@o |
| kir | kir (Kyrgyz) | abvgdezhziiklmnoprstufkhtschshy'eiuiaion'uo |
| mdf | mdf (Moksha) | abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaio |
| ru | ru (Russian) | abvgdeiozhziiklmnoprstufkhtschshshch'y'eiuia |
| tgk | tgk (Tajik) | abvgdezhziiklmnoprstufkhchsh'eiuiaiog'k'kh'ch'iu |
| tat | tat (Tatar) | abvgdezhziiklmnoprstufkhtschsh'y'eiuiazh'n'uh@o |
| udm | udm (Udmurt) | abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaiozhzioch |
| uzb | uzb (Uzbek) | abvgdezhziiklmnoprstufkhtschsh''eiuiaioug'k'kh' |
| uzb | uzb (Uzbek) | abcdefghijklmnopqrstuvxyz |
| ukr | ukr (Ukrainian) | abvgg'deiezhziiyiiklmnoprstufkhtschshshch'iuia |
| kjh | kjh (Khakas) | abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaioig'n'ch'ou |
| chv | chv (Chuvash) | abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaios'aieu |
| erz | erz (Erzya) | abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaio |
| sah | sah (Yakut) | abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaiog'nguho |
V5 CIS Ext Models
- All of the below models support 8000, 24000, 48000 sampling rates and contain no auto-stress or homographs;
- v5_cis_ext models assume that proper stress should be added for each word for all languages, e.g. k+oshka;
- v5_cis_ext_nostress models are coming soon;
- All of the below models are published under the CC-NC-BY licence;
- V5 models support SSML. Also see Colab examples for main SSML tag usage.
V4
V4 models support SSML. Also see Colab examples for main SSML tag usage.
V4 models: v4_ru, v4_cyrillic, v4_ua, v4_uz, v4_indic
V3
V3 models support SSML. Also see Colab examples for main SSML tag usage.
V3 models: v3_en, v3_en_indic, v3_de, v3_es, v3_fr, v3_indic
Dependencies
Basic dependencies for Colab examples:
- torch, 1.10+ for v3 models / 2.0+ for v4 and v5 models;
- torchaudio, latest version bound to PyTorch should work (required only because models are hosted together with STT; not needed for TTS itself);
- omegaconf, latest (can be removed as well, if you do not load all of the configs).
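The version requirements above can be turned into a quick pre-flight check. This is a minimal sketch, assuming that the model generation can be read from the prefix of the model ID (`v3_`, `v4_`, `v5_`); `MIN_TORCH`, `parse_version`, and `torch_is_new_enough` are hypothetical names, not part of the package.

```python
import re

# Minimum PyTorch (major, minor) per model generation, from the list above.
MIN_TORCH = {"v3": (1, 10), "v4": (2, 0), "v5": (2, 0)}


def parse_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a string such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_is_new_enough(torch_version: str, model_id: str) -> bool:
    """Compare a PyTorch version string with the minimum for the model generation."""
    generation = model_id.split("_", 1)[0]  # assumption: IDs look like 'v5_ru'
    return parse_version(torch_version) >= MIN_TORCH[generation]


print(torch_is_new_enough("1.13.1", "v3_en"))
```

In real use the first argument would be `torch.__version__`.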
PyTorch
import torch

language = 'ru'
model_id = 'v5_ru'
sample_rate = 48000
speaker = 'xenia'
device = torch.device('cpu')

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=model_id)
model.to(device)  # gpu or cpu

audio = model.apply_tts(text=example_text,
                        speaker=speaker,
                        sample_rate=sample_rate)
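To keep the result of the example above without extra dependencies, the samples can be written with the standard library. This is a minimal sketch, assuming that `audio` is a one-dimensional sequence of float samples in the range [-1.0, 1.0]; that is an assumption about the return value, and `write_wav` is a hypothetical helper, not part of the package.

```python
import struct
import wave


def write_wav(path: str, samples, sample_rate: int) -> None:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Hypothetical usage with the tensor from the example above:
# write_wav("example.wav", audio.tolist(), sample_rate)
```

The per-sample loop is slow for long audio; it is written this way only to stay within the standard library.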
Standalone Use
- Standalone usage only requires PyTorch 1.12+ and the Python Standard Library;
- Please see the detailed examples in Colab;
import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

# Download the model package once and reuse the local copy afterwards
if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v5_ru.pt',
                                   local_file)

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)

example_text = 'Меня зовут Лева Королев. Я из готов. И я уже готов открыть все ваши замки любой сложности!'
sample_rate = 48000
speaker = 'baya'

# save_wav writes the synthesized audio to a WAV file
audio_paths = model.save_wav(text=example_text,
                             speaker=speaker,
                             sample_rate=sample_rate)
SSML
Check out our TTS Wiki page.
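As a rough illustration of what SSML input can look like, the sketch below joins sentences with pauses inside a `<speak>` root. The `build_ssml` helper is hypothetical, and which tags and attributes the models actually accept should be confirmed on the Wiki page; `<speak>` and `<break time="...ms"/>` are used here as standard SSML elements, and the `ssml_text` argument in the usage comment is an assumption.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences: list[str], pause_ms: int = 500) -> str:
    """Join sentences with fixed pauses inside a single <speak> element."""
    pause = f'<break time="{int(pause_ms)}ms"/>'
    # Escape &, < and > so that plain text cannot break the markup.
    body = pause.join(escape(sentence) for sentence in sentences)
    return f"<speak>{body}</speak>"


ssml = build_ssml(["Hello & welcome.", "This is the second sentence."], pause_ms=300)
print(ssml)

# Hypothetical usage (argument name is an assumption, check the Wiki):
# audio = model.apply_tts(ssml_text=ssml, speaker=speaker, sample_rate=sample_rate)
```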
Cyrillic languages v4
To be superseded with v5 model(s) soon.
Supported tokenset:
!,-.:?iuoabvgdezhziiklmnoprstufkhtschshshch'y'eiuiaiodjgjieijnjtshkjufg'g'zh'z'k'k'n'ngs'uu'kh'ch'haaaeie@zhzioouuchy
| Speaker_ID | Language | Gender |
|---|---|---|
| b_ava | Avar | F |
| b_bashkir | Bashkir | M |
| b_bulb | Bulgarian | M |
| b_bulc | Bulgarian | M |
| b_che | Chechen | M |
| b_cv | Chuvash | M |
| cv_ekaterina | Chuvash | F |
| b_myv | Erzya | M |
| b_kalmyk | Kalmyk | M |
| b_krc | Karachay-Balkar | M |
| kz_M1 | Kazakh | M |
| kz_M2 | Kazakh | M |
| kz_F3 | Kazakh | F |
| kz_F1 | Kazakh | F |
| kz_F2 | Kazakh | F |
| b_kjh | Khakas | F |
| b_kpv | Komi-Ziryan | M |
| b_lez | Lezghian | M |
| b_mhr | Mari | F |
| b_mrj | Mari High | M |
| b_nog | Nogai | F |
| b_oss | Ossetic | M |
| b_ru | Russian | M |
| b_tat | Tatar | M |
| marat_tt | Tatar | M |
| b_tyv | Tuvinian | M |
| b_udm | Udmurt | M |
| b_uzb | Uzbek | M |
| b_sah | Yakut | M |
| kalmyk_erdni | Kalmyk | M |
| kalmyk_delghir | Kalmyk | F |
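Because the v4 Cyrillic model accepts only the tokenset listed above, it can help to look for unsupported characters before synthesis. The sketch below is a hedged example: `unsupported_chars` is a hypothetical helper, the short tokenset in the demo is a placeholder rather than the real one, and lowercasing the input and always allowing spaces are assumptions.

```python
def unsupported_chars(text: str, tokenset: str) -> list[str]:
    """Return characters of `text` (lowercased) that are not in `tokenset`."""
    allowed = set(tokenset) | {" "}  # assumption: spaces are always accepted
    found = []
    for char in text.lower():
        if char not in allowed and char not in found:
            found.append(char)
    return found


# Placeholder tokenset for illustration only; use the model's real tokenset.
DEMO_TOKENSET = "!,-.:?abc"
print(unsupported_chars("Abc, xyz!", DEMO_TOKENSET))
```

Characters reported by the helper could then be replaced or removed before the text is passed to the model.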
Indic languages v4
Example
Important: all input sentences should be romanized to ISO format using aksharamukha. An example for Hindi:
import torch
from aksharamukha import transliterate

# Loading model
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language='indic',
                                     speaker='v4_indic')

# Input in the native script (Devanagari), romanized to ISO before synthesis
orig_text = "प्रसिद्द कबीर अध्येता, पुरुषोत्तम अग्रवाल का यह शोध आलेख, उस रामानंद की खोज करता है"
roman_text = transliterate.process('Devanagari', 'ISO', orig_text)
print(roman_text)

audio = model.apply_tts(roman_text,
                        speaker='hindi_male')
Supported languages
| Language | Speakers | Romanization function |
|---|---|---|
| hindi | hindi_female, hindi_male | transliterate.process('Devanagari', 'ISO', orig_text) |
| malayalam | malayalam_female, malayalam_male | transliterate.process('Malayalam', 'ISO', orig_text) |
| manipuri | manipuri_female | transliterate.process('Bengali', 'ISO', orig_text) |
| bengali | bengali_female, bengali_male | transliterate.process('Bengali', 'ISO', orig_text) |
| rajasthani | rajasthani_female, rajasthani_female | transliterate.process('Devanagari', 'ISO', orig_text) |
| tamil | tamil_female, tamil_male | transliterate.process('Tamil', 'ISO', orig_text, pre_options=['TamilTranscribe']) |
| telugu | telugu_female, telugu_male | transliterate.process('Telugu', 'ISO', orig_text) |
| gujarati | gujarati_female, gujarati_male | transliterate.process('Gujarati', 'ISO', orig_text) |
| kannada | kannada_female, kannada_male | transliterate.process('Kannada', 'ISO', orig_text) |
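The table above can be expressed as a lookup so that the right romanization arguments are chosen per language. This is only a sketch: `ROMANIZATION` and `romanization_args` are hypothetical names, and the values are copied from the table rather than from the package.

```python
# Source script and extra keyword arguments per language, copied from the table.
ROMANIZATION = {
    "hindi": ("Devanagari", {}),
    "malayalam": ("Malayalam", {}),
    "manipuri": ("Bengali", {}),
    "bengali": ("Bengali", {}),
    "rajasthani": ("Devanagari", {}),
    "tamil": ("Tamil", {"pre_options": ["TamilTranscribe"]}),
    "telugu": ("Telugu", {}),
    "gujarati": ("Gujarati", {}),
    "kannada": ("Kannada", {}),
}


def romanization_args(language: str) -> tuple[str, str, dict]:
    """Return (source_script, target_script, extra_kwargs) for a language."""
    source_script, extra_kwargs = ROMANIZATION[language]
    return source_script, "ISO", dict(extra_kwargs)


source, target, extra = romanization_args("tamil")
print(source, target, extra)

# Usage with aksharamukha, following the call shape shown in the table:
# roman_text = transliterate.process(source, target, orig_text, **extra)
```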
Contact
Try our models, create an issue, join our chat, email us, and read the latest news.
Licence
All of the models are published under the main repo license (i.e. CC-NC-BY) except for the base cis-tts models, which are under MIT.
Citations
@misc{Silero Models,
  author = {Silero Team},
  title = {Silero Models: pre-trained text-to-speech models made embarrassingly simple},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-models}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}
Further reading
English
- STT:
- TTS:
- VAD:
- Text Enhancement:
  - We have published a model for text repunctuation and recapitalization for four languages - link
Chinese
- STT:
Russian
- STT:
  - OpenAI has solved speech recognition! Let's find out whether that is true ... - link
  - Our free speech recognition services have become better and more convenient - link
  - The Silero Telegram bot converts speech to text for free - link
  - Free speech recognition for everyone - link
  - Latest updates to the speech recognition models from Silero Models - link
  - Compressing transformers: simple, universal, and practical ways to make them compact and fast - link
  - The ultimate comparison of speech recognition systems: Ashmanov, Google, Sber, Silero, Tinkoff, Yandex - link
  - We have published modern STT models comparable in quality to Google - link
  - Lowering the barriers to entry into speech recognition - link
  - A huge open dataset of Russian speech, version 1.0 - link
  - How fast can an STT system be made? - link
  - Our Speech-To-Text system - link
  - Speech-To-Text - link
- TTS:
  - Building fast, high-quality, and accessible speech synthesis for the languages of Russia: we need your participation - link
  - We have solved the problem of homographs and stress in Russian - link
  - Our synthesis is now also available as a Telegram bot - link
  - Can speech synthesis fool a biometric identification system? - link
  - Our synthesis now covers 20 languages - link
  - Our public synthesis is now in super-high quality, 10 times faster, and free of teething problems - link
  - Synthesizing the voices of grandma, grandpa, and Lenin, plus news about our public synthesis - link
  - We have made our public speech synthesis even better - link
  - We have published high-quality, simple, accessible, and fast speech synthesis - link
- VAD:
  - New release of the public voice detector Silero VAD v6 - link
  - Our public voice detector has become better - link
  - Do you use VAD? What it is and why you need it - link
  - Models for detecting speech, numbers, and recognizing languages - link
  - We have published a modern Voice Activity Detector and more - link
- Text Enhancement: