Silero Models: pre-trained text-to-speech models made embarrassingly simple


snakers4/silero-models

  • Silero Models
    • Installation and Basics
    • Text-To-Speech
      • Models and Speakers
        • V5
        • V5 CIS Base Models
        • V5 CIS Ext Models
        • V4
        • V3
      • Dependencies
      • PyTorch
      • Standalone Use
      • SSML
      • Cyrillic languages v4
      • Indic languages v4
        • Example
        • Supported languages
    • Contact
    • Licence
    • Citations
    • Further reading
      • English
      • Chinese
      • Russian

Silero Models

Our TTS models satisfy the following criteria:

  • Fully end-to-end;
  • Large library of voices;
  • Natural-sounding speech;
  • One-line usage, minimal, portable;
  • Impressively fast on CPU and GPU;
  • For Russian: automated stress placement and homograph handling;

Installation and Basics

You can use our models in three ways:

  • Via PyTorch Hub: torch.hub.load();
  • Via pip: pip install silero and then from silero import silero_tts;
  • By caching the required models and utils manually and modifying them if necessary;

Both pip and PyTorch Hub download models on demand. If you need caching, do it manually or invoke the required model once (it will be downloaded to a cache folder). Please see these docs for more information.
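
The manual caching route can be sketched as follows. This is a minimal illustration, not the package's own caching logic: the helper names local_model_path and ensure_model and the cache_dir argument are hypothetical, while the model URL and torch.hub.download_url_to_file are the ones used in the standalone example of this README.

```python
import os
from pathlib import Path
from urllib.parse import urlparse


def local_model_path(url: str, cache_dir: str) -> Path:
    """Map a model URL to a file inside cache_dir (hypothetical layout)."""
    file_name = os.path.basename(urlparse(url).path)
    return Path(cache_dir) / file_name


def ensure_model(url: str, cache_dir: str) -> Path:
    """Download the model file once; later calls reuse the cached copy."""
    target = local_model_path(url, cache_dir)
    if not target.is_file():
        # Imported lazily so the path helper also works without PyTorch.
        import torch

        target.parent.mkdir(parents=True, exist_ok=True)
        torch.hub.download_url_to_file(url, str(target))
    return target
```

For example, ensure_model('https://models.silero.ai/models/tts/ru/v5_ru.pt', 'models') should return a path that can be handed to torch.package.PackageImporter, as in the standalone example.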

PyTorch Hub and pip package are based on the same code. All of the torch.hub.load examples can be used with the pip package via this basic change:

from silero import silero_tts
model, example_text = silero_tts(language='ru',
                                 speaker='v5_ru')
audio = model.apply_tts(text=example_text)

Text-To-Speech

Models and Speakers

All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.

V5

V5 models support SSML. Also see Colab examples for main SSML tag usage.

Russian-only models support automated stress and homographs.

ID | Speakers | Auto-stress / Homographs | Language | SR
v5_ru | aidar, baya, kseniya, xenia, eugene | yes / yes | ru (Russian) | 8000, 24000, 48000

V5 CIS Base Models

  • All of the below models support 8000, 24000, 48000 sampling rates and contain no auto-stress or homographs;
  • v5_cis_base models expect a stress mark on every word in all languages, e.g. к+ошка;
  • v5_cis_base_nostress models expect a stress mark on every word ONLY for Slavic languages (i.e. ru, bel, ukr);
  • All of the below models are published under MIT licence;
  • V5 UTMOS and throughput metrics;
  • V5 models support SSML. Also see Colab examples for main SSML tag usage;
  • Use cases for the model;
  • Minimal system requirements: a PyTorch-compatible system, a modern processor with AVX2 instruction set for x86/64 platform.
ID | Speakers | Language
v5_cis_base, v5_cis_base_nostress | aze_gamat | aze (Azerbaijani)
v5_cis_base, v5_cis_base_nostress | hye_zara | hye (Armenian)
v5_cis_base, v5_cis_base_nostress | bak_aigul, bak_alfia, bak_alfia2 | bak (Bashkir)
v5_cis_base, v5_cis_base_nostress | bak_miyau, bak_ramilia | bak (Bashkir)
v5_cis_base, v5_cis_base_nostress | bel_anatoliy, bel_dmitriy, bel_larisa | bel (Belarus)
v5_cis_base, v5_cis_base_nostress | kat_vika | kat (Georgian)
v5_cis_base, v5_cis_base_nostress | kbd_eduard | kbd (Kab.-Cherkes)
v5_cis_base, v5_cis_base_nostress | kaz_zhadyra, kaz_zhazira | kaz (Kazakh)
v5_cis_base, v5_cis_base_nostress | xal_kejilgan, xal_kermen | xal (Kalmyk)
v5_cis_base, v5_cis_base_nostress | kir_nurgul | kir (Kyrgyz)
v5_cis_base, v5_cis_base_nostress | mdf_oksana | mdf (Moksha)
v5_cis_base, v5_cis_base_nostress | all of these speakers, but with ru_ prefix | ru (Russian)
v5_cis_base, v5_cis_base_nostress | tgk_onaoy, tgk_safarhuja | tgk (Tajik)
v5_cis_base, v5_cis_base_nostress | tat_albina, tat_marat | tat (Tatar)
v5_cis_base, v5_cis_base_nostress | udm_bogdan | udm (Udmurt)
v5_cis_base, v5_cis_base_nostress | uzb_saida | uzb (Uzbek)
v5_cis_base, v5_cis_base_nostress | ukr_igor, ukr_roman | ukr (Ukrainian)
v5_cis_base, v5_cis_base_nostress | kjh_karina, kjh_sibday | kjh (Khakas)
v5_cis_base, v5_cis_base_nostress | chv_ekaterina | chv (Chuvash)
v5_cis_base, v5_cis_base_nostress | erz_alexandr | erz (Erzya)
v5_cis_base, v5_cis_base_nostress | sah_zinaida | sah (Yakut)
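
The stress and sampling-rate rules for the CIS base models can be checked before synthesis with a small pre-flight helper. This is only a sketch under stated assumptions: the function names are hypothetical, the '+' mark is assumed to sit inside the word as in к+ошка, and the check is deliberately strict (it flags every word without a mark, including short ones that the models may tolerate).

```python
import re

SAMPLE_RATES = (8000, 24000, 48000)
SLAVIC_LANGUAGES = {'ru', 'bel', 'ukr'}


def needs_stress(model_id: str, language: str) -> bool:
    """v5_cis_base wants stress everywhere; the nostress variant only for Slavic."""
    if model_id.endswith('_nostress'):
        return language in SLAVIC_LANGUAGES
    return True


def words_missing_stress(text: str) -> list:
    """Return the words that carry no '+' mark (a deliberately strict check)."""
    tokens = re.findall(r"[\w+]+", text)
    return [t for t in tokens if any(c.isalpha() for c in t) and '+' not in t]


def check_request(text: str, model_id: str, language: str, sample_rate: int) -> list:
    """Collect human-readable problems; an empty list means the request looks fine."""
    problems = []
    if sample_rate not in SAMPLE_RATES:
        problems.append(f'unsupported sample rate: {sample_rate}')
    if needs_stress(model_id, language):
        missing = words_missing_stress(text)
        if missing:
            problems.append('no stress mark in: ' + ', '.join(missing))
    return problems
```

Calling check_request before apply_tts gives an early, readable error instead of a mispronounced result.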
Supported alphabets

Please note that Georgian and Armenian are supported internally via direct transliteration into Cyrillic script inside the package. Azerbaijani and Uzbek support both alphabets (Cyrillic and Latin).

ID Name Alphabet(s)
aze aze (Azerbaijani) abccde@fgghxiijkqlmnooprsstuuvyz
aze aze (Azerbaijani) abvgg'de@zhziyjkk'lmnooprstuufkhhchch'sh
hye hye (Armenian) abgdezeet`zhilkhtskhdzghchmynshoch`pjrhsvtrts`wp`k`ofew
bak bak (Bashkir) abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaiog'z'k'n's'uh@o
bel bel (Belarus) abvgdezhziklmnoprstufkhtschshy'eiuiaioiu
kat kat (Georgian) abgdevzt`iklmnopzhrstup`k`g'qshch`c`z'cchxjh
kbd kbd (Kab.-Cherkes) abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaio
kaz kaz (Kazakh) abvgdezhziiklmnoprstufkhtschshshchy'eiuiaig'k'n'uu'h@o
xal xal (Kalmyk) abvgdezhziiklmnoprstufkhtschshshch'y'eiuiazh'n'uh@o
kir kir (Kyrgyz) abvgdezhziiklmnoprstufkhtschshy'eiuiaion'uo
mdf mdf (Moksha) abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaio
ru ru (Russian) abvgdeiozhziiklmnoprstufkhtschshshch'y'eiuia
tgk tgk (Tajik) abvgdezhziiklmnoprstufkhchsh'eiuiaiog'k'kh'ch'iu
tat tat (Tatar) abvgdezhziiklmnoprstufkhtschsh'y'eiuiazh'n'uh@o
udm udm (Udmurt) abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaiozhzioch
uzb uzb (Uzbek) abvgdezhziiklmnoprstufkhtschsh''eiuiaioug'k'kh'
uzb uzb (Uzbek) abcdefghijklmnopqrstuvxyz
ukr ukr (Ukrainian) abvgg'deiezhziiyiiklmnoprstufkhtschshshch'iuia
kjh kjh (Khakas) abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaioig'n'ch'ou
chv chv (Chuvash) abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaios'aieu
erz erz (Erzya) abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaio
sah sah (Yakut) abvgdezhziiklmnoprstufkhtschshshch'y'eiuiaiog'nguho

V5 CIS Ext Models

  • All of the below models support 8000, 24000, 48000 sampling rates and contain no auto-stress or homographs;
  • v5_cis_ext models expect a stress mark on every word in all languages, e.g. к+ошка;
  • v5_cis_ext_nostress models are coming soon;
  • All of the below models are published under CC-NC-BY licence;
  • V5 models support SSML. Also see Colab examples for main SSML tag usage.
ID | Speakers | Language
v5_cis_ext | kaz_abai, kaz_aidana, kaz_aisha, kaz_bakir, kaz_danara | kaz (Kazakh)
v5_cis_ext | xal_delghir, xal_erdni | xal (Kalmyk)
v5_cis_ext | tat_adiba, tat_alsou, tat_amir, tat_azat, tat_batir | tat (Tatar)
v5_cis_ext | tat_bulat, tat_damir, tat_guzel, tat_ildar, tat_ilgiz | tat (Tatar)
v5_cis_ext | tat_karim, tat_mansur, tat_murat, tat_rasima, tat_rustem | tat (Tatar)
v5_cis_ext | tat_timur, tat_zifa, tat_zufar, tat_zulfiya | tat (Tatar)
v5_cis_ext | uzb_anora, uzb_dilnavoz | uzb (Uzbek)
v5_cis_ext | ukr_kateryna, ukr_lada, ukr_mykyta, ukr_oleksa, ukr_tetiana | ukr (Ukrainian)
v5_cis_ext | chv_aihwa, chv_alima | chv (Chuvash)

V4

V4 models support SSML. Also see Colab examples for main SSML tag usage.

V4 models: v4_ru, v4_cyrillic, v4_ua, v4_uz, v4_indic
ID | Speakers | Auto-stress | Language | SR
v4_ru | aidar, baya, kseniya, xenia, eugene, random | yes | ru (Russian) | 8000, 24000, 48000
v4_cyrillic | b_ava, marat_tt, kalmyk_erdni... | no | cyrillic (Avar, Tatar, Kalmyk, ...) | 8000, 24000, 48000
v4_ua | mykyta, random | no | ua (Ukrainian) | 8000, 24000, 48000
v4_uz | dilnavoz | no | uz (Uzbek) | 8000, 24000, 48000
v4_indic | hindi_male, hindi_female, ..., random | no | indic (Hindi, Telugu, ...) | 8000, 24000, 48000

V3

V3 models support SSML. Also see Colab examples for main SSML tag usage.

V3 models: v3_en, v3_en_indic, v3_de, v3_es, v3_fr, v3_indic
ID | Speakers | Auto-stress | Language | SR
v3_en | en_0, en_1, ..., en_117, random | no | en (English) | 8000, 24000, 48000
v3_en_indic | tamil_female, ..., assamese_male, random | no | en (English) | 8000, 24000, 48000
v3_de | eva_k, ..., karlsson, random | no | de (German) | 8000, 24000, 48000
v3_es | es_0, es_1, es_2, random | no | es (Spanish) | 8000, 24000, 48000
v3_fr | fr_0, ..., fr_5, random | no | fr (French) | 8000, 24000, 48000
v3_indic | hindi_male, hindi_female, ..., random | no | indic (Hindi, Telugu, ...) | 8000, 24000, 48000
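
The model tables can be turned into a small lookup that builds the torch.hub.load arguments for a given model ID. This is a sketch, not part of the package: hub_kwargs and load_tts are hypothetical helpers, and the mapping assumes that the short code in the Language column is the value expected by the language argument. This README confirms that only for ru / v5_ru and indic / v4_indic; the other entries are assumptions to verify against models.yml.

```python
# Model ID -> language code, copied from the V5, V4 and V3 tables.
MODEL_LANGUAGE = {
    'v5_ru': 'ru',
    'v4_ru': 'ru',
    'v4_cyrillic': 'cyrillic',
    'v4_ua': 'ua',
    'v4_uz': 'uz',
    'v4_indic': 'indic',
    'v3_en': 'en',
    'v3_en_indic': 'en',
    'v3_de': 'de',
    'v3_es': 'es',
    'v3_fr': 'fr',
    'v3_indic': 'indic',
}


def hub_kwargs(model_id: str) -> dict:
    """Build keyword arguments for torch.hub.load from a model ID."""
    if model_id not in MODEL_LANGUAGE:
        raise ValueError(f'unknown model id: {model_id}')
    return {
        'repo_or_dir': 'snakers4/silero-models',
        'model': 'silero_tts',
        'language': MODEL_LANGUAGE[model_id],
        # The hub entry point takes the model ID in its `speaker` argument.
        'speaker': model_id,
    }


def load_tts(model_id: str):
    """Load a model through PyTorch Hub (downloads on first use)."""
    import torch  # imported lazily so hub_kwargs works without PyTorch

    return torch.hub.load(**hub_kwargs(model_id))
```

Keeping the lookup separate from the loader makes a typo in a model ID fail fast, before any download starts.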

Dependencies

Basic dependencies for Colab examples:

  • torch: 1.10+ for v3 models, 2.0+ for v4 and v5 models;
  • torchaudio: the latest version bound to your PyTorch should work (needed only because the models are hosted together with STT; TTS itself does not require it);
  • omegaconf: latest (can be dropped as well if you do not load all of the configs);
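
The torch version requirements above can be verified at start-up. The sketch below is an illustration only: parse_version, min_torch_for and torch_is_new_enough are hypothetical helpers, and the version floors are the ones listed in this section.

```python
import re


def parse_version(version: str) -> tuple:
    """Extract (major, minor) from strings such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f'cannot parse version: {version}')
    return int(match.group(1)), int(match.group(2))


def min_torch_for(model_id: str) -> tuple:
    """Minimum torch version per model generation, as listed above."""
    if model_id.startswith('v3_'):
        return (1, 10)
    if model_id.startswith(('v4_', 'v5_')):
        return (2, 0)
    raise ValueError(f'unknown model generation: {model_id}')


def torch_is_new_enough(model_id: str, torch_version: str) -> bool:
    """Compare an installed torch version string with the required floor."""
    return parse_version(torch_version) >= min_torch_for(model_id)
```

In an application this would typically be called as torch_is_new_enough('v5_ru', torch.__version__) right after importing torch.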

PyTorch

# V5
import torch

language = 'ru'
model_id = 'v5_ru'
sample_rate = 48000
speaker = 'xenia'
device = torch.device('cpu')

# Note: the hub `speaker` argument takes the model ID, not a voice name
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=model_id)
model.to(device)  # gpu or cpu

# The voice itself is selected here
audio = model.apply_tts(text=example_text,
                        speaker=speaker,
                        sample_rate=sample_rate)
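
To store the result of apply_tts without extra dependencies, the samples can be written with the standard library. This is a sketch under an assumption that this README does not spell out: audio is taken to be a one-dimensional tensor of floats in the range [-1, 1], so audio.tolist() yields plain Python floats. write_wav is a hypothetical helper, not part of the package.

```python
import struct
import wave


def write_wav(samples, sample_rate: int, path: str) -> int:
    """Write mono float samples in [-1, 1] as 16-bit PCM; return the frame count."""
    frames = bytearray()
    for value in samples:
        # Clip to the valid range before scaling to the int16 range.
        clipped = max(-1.0, min(1.0, float(value)))
        frames += struct.pack('<h', int(round(clipped * 32767)))
    with wave.open(path, 'wb') as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return len(frames) // 2
```

With the variables from the example above, this would be called as write_wav(audio.tolist(), sample_rate, 'test.wav'). The model's own save_wav method, shown in the standalone example, is the simpler choice when it is available.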

Standalone Use

  • Standalone usage only requires PyTorch 1.12+ and the Python Standard Library;
  • Please see the detailed examples in Colab;
# V5
import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v5_ru.pt',
                                   local_file)

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)

example_text = 'Меня зовут Лева Королев. Я из готов. И я уже готов открыть все ваши замки любой сложности!'
sample_rate = 48000
speaker = 'baya'

audio_paths = model.save_wav(text=example_text,
                             speaker=speaker,
                             sample_rate=sample_rate)

SSML

Check out our TTS Wiki page.
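
As a rough illustration of what SSML input can look like, the sketch below assembles a document from plain sentences. Everything here is an assumption to check against the Wiki page: the tag set (speak, s, break with a time attribute in milliseconds) follows general SSML conventions, the helper build_ssml is hypothetical, and passing the result through an ssml_text keyword of apply_tts is assumed rather than confirmed by this README.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences, pause_ms: int = 300) -> str:
    """Wrap sentences in <s> tags with a fixed pause between them."""
    parts = ['<speak>']
    for index, sentence in enumerate(sentences):
        if index > 0:
            parts.append(f'<break time="{int(pause_ms)}ms"/>')
        # Escape &, < and > so the text cannot break the markup.
        parts.append(f'<s>{escape(sentence)}</s>')
    parts.append('</speak>')
    return ''.join(parts)
```

Under the assumption above, a call might look like model.apply_tts(ssml_text=build_ssml(sentences), speaker=speaker, sample_rate=sample_rate).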

Cyrillic languages v4

To be superseded by v5 model(s) soon.

Supported tokenset: !,-.:?iuoabvgdezhziiklmnoprstufkhtschshshch'y'eiuiaiodjgjieijnjtshkjufg'g'zh'z'k'k'n'ngs'uu'kh'ch'haaaeie@zhzioouuchy

Speaker_ID Language Gender
b_ava Avar F
b_bashkir Bashkir M
b_bulb Bulgarian M
b_bulc Bulgarian M
b_che Chechen M
b_cv Chuvash M
cv_ekaterina Chuvash F
b_myv Erzya M
b_kalmyk Kalmyk M
b_krc Karachay-Balkar M
kz_M1 Kazakh M
kz_M2 Kazakh M
kz_F3 Kazakh F
kz_F1 Kazakh F
kz_F2 Kazakh F
b_kjh Khakas F
b_kpv Komi-Ziryan M
b_lez Lezghian M
b_mhr Mari F
b_mrj Mari High M
b_nog Nogai F
b_oss Ossetic M
b_ru Russian M
b_tat Tatar M
marat_tt Tatar M
b_tyv Tuvinian M
b_udm Udmurt M
b_uzb Uzbek M
b_sah Yakut M
kalmyk_erdni Kalmyk M
kalmyk_delghir Kalmyk F

Indic languages v4

Example

Important: all input sentences must be romanized to ISO format using aksharamukha. An example for Hindi:

# V4
import torch
from aksharamukha import transliterate

# Loading model
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language='indic',
                                     speaker='v4_indic')

orig_text = "प्रसिद्द कबीर अध्येता, पुरुषोत्तम अग्रवाल का यह शोध आलेख, उस रामानंद की खोज करता है"
roman_text = transliterate.process('Devanagari', 'ISO', orig_text)
print(roman_text)

audio = model.apply_tts(roman_text,
                        speaker='hindi_male')

Supported languages

Language | Speakers | Romanization function
hindi | hindi_female, hindi_male | transliterate.process('Devanagari', 'ISO', orig_text)
malayalam | malayalam_female, malayalam_male | transliterate.process('Malayalam', 'ISO', orig_text)
manipuri | manipuri_female | transliterate.process('Bengali', 'ISO', orig_text)
bengali | bengali_female, bengali_male | transliterate.process('Bengali', 'ISO', orig_text)
rajasthani | rajasthani_female, rajasthani_female | transliterate.process('Devanagari', 'ISO', orig_text)
tamil | tamil_female, tamil_male | transliterate.process('Tamil', 'ISO', orig_text, pre_options=['TamilTranscribe'])
telugu | telugu_female, telugu_male | transliterate.process('Telugu', 'ISO', orig_text)
gujarati | gujarati_female, gujarati_male | transliterate.process('Gujarati', 'ISO', orig_text)
kannada | kannada_female, kannada_male | transliterate.process('Kannada', 'ISO', orig_text)
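
The per-language romanization calls in the table can be collected into one dispatch helper. This is a sketch: ROMANIZATION, romanization_args and romanize are hypothetical names, while the source scripts, the 'ISO' target and the TamilTranscribe option are copied from the table above.

```python
# Language -> (source script, pre_options), taken from the table above.
ROMANIZATION = {
    'hindi': ('Devanagari', []),
    'malayalam': ('Malayalam', []),
    'manipuri': ('Bengali', []),
    'bengali': ('Bengali', []),
    'rajasthani': ('Devanagari', []),
    'tamil': ('Tamil', ['TamilTranscribe']),
    'telugu': ('Telugu', []),
    'gujarati': ('Gujarati', []),
    'kannada': ('Kannada', []),
}


def romanization_args(language: str):
    """Return positional and keyword arguments for transliterate.process."""
    source_script, pre_options = ROMANIZATION[language]
    kwargs = {'pre_options': list(pre_options)} if pre_options else {}
    return (source_script, 'ISO'), kwargs


def romanize(language: str, text: str) -> str:
    """Romanize text to ISO using aksharamukha (imported lazily)."""
    from aksharamukha import transliterate

    (source_script, target), kwargs = romanization_args(language)
    return transliterate.process(source_script, target, text, **kwargs)
```

With this in place, romanize('hindi', orig_text) would stand in for the explicit transliterate.process call in the example, and its result can be passed to apply_tts in the same way.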

Contact

Try our models, create an issue, join our chat, email us, and read the latest news.

Licence

All of the models are published under the main repo license (i.e. CC-NC-BY) except for the base cis-tts models, which are under MIT.

Citations

@misc{Silero Models,
  author = {Silero Team},
  title = {Silero Models: pre-trained text-to-speech models made embarrassingly simple},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-models}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}

Further reading

English

  • STT:

    • Towards an Imagenet Moment For Speech-To-Text - link
    • A Speech-To-Text Practitioners Criticisms of Industry and Academia - link
    • Modern Google-level STT Models Released - link
  • TTS:

    • Multilingual Text-to-Speech Models for Indic Languages - link
    • Our new public speech synthesis in super-high quality, 10x faster and more stable - link
    • High-Quality Text-to-Speech Made Accessible, Simple and Fast - link
  • VAD:

    • One Voice Detector to Rule Them All - link
    • Modern Portable Voice Activity Detector Released - link
  • Text Enhancement:

    • We have published a model for text repunctuation and recapitalization for four languages - link

Chinese

  • STT:
    • Towards an ImageNet Moment for Speech Recognition - link
    • The Seven Deadly Sins of Academia and Industry in the Speech Field - link

Russian

  • STT:

    • OpenAI has solved speech recognition! Let's find out whether that is true ... - link
    • Our free speech recognition services have become better and more convenient - link
    • The Silero Telegram bot converts speech to text for free - link
    • Free speech recognition for everyone - link
    • Latest updates to the speech recognition models in Silero Models - link
    • Compressing transformers: simple, universal, and practical ways to make them compact and fast - link
    • The ultimate comparison of speech recognition systems: Ashmanov, Google, Sber, Silero, Tinkoff, Yandex - link
    • We have published modern STT models comparable in quality to Google - link
    • Lowering the barriers to entry into speech recognition - link
    • A huge open dataset of Russian speech, version 1.0 - link
    • How fast can an STT system be made? - link
    • Our Speech-To-Text system - link
    • Speech-To-Text - link
  • TTS:

    • Building fast, high-quality, and accessible speech synthesis for the languages of Russia: we need your participation - link
    • We have solved the problem of homographs and stress in the Russian language - link
    • Our speech synthesis is now also available as a Telegram bot - link
    • Can speech synthesis fool a biometric identification system? - link
    • Our speech synthesis now covers 20 languages - link
    • Our public speech synthesis is now in super-high quality, 10 times faster, and free of teething problems - link
    • Synthesizing the voices of grandma, grandpa, and Lenin, plus news about our public speech synthesis - link
    • We have made our public speech synthesis even better - link
    • We have published high-quality, simple, accessible, and fast speech synthesis - link
  • VAD:

    • New release of the public voice detector Silero VAD v6 - link
    • Our public voice detector has become better - link
    • Do you use VAD? What it is and why you need it - link
    • Models for speech detection, number detection, and language recognition - link
    • We have published a modern Voice Activity Detector and more - link
  • Text Enhancement:

    • Restoring punctuation and capitalization, now on long texts too - link
    • We have published a model that restores punctuation and capitalization in text for four languages - link