This repository was archived by the owner on Mar 22, 2023. It is now read-only.
It has helpers for better training by training against auto-generated phonograms.
Training with Theano can be much faster, since the CTC computation can be done on the GPU.
Training
If you want to train with Theano, you'll need Theano>=0.10, since it has bindings for Baidu's CTC.
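To make the CTC objective concrete, here is a minimal NumPy sketch of the CTC forward algorithm that the GPU bindings compute; it is illustrative only and not the repo's implementation, which delegates to Baidu's CTC through Theano.

```python
import numpy as np

def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC.

    log_probs: (T, V) array of per-frame log-probabilities over V symbols.
    target:    list of label ids (without blanks).
    """
    # Interleave blanks around the labels: [blank, t1, blank, t2, ..., blank]
    ext = [blank]
    for t in target:
        ext += [t, blank]
    S, T = len(ext), log_probs.shape[0]

    def logsumexp(*xs):
        m = max(xs)
        if m == -np.inf:
            return -np.inf
        return m + np.log(sum(np.exp(x - m) for x in xs))

    # alpha[t, s] = log-probability of all alignments of ext[:s+1] over frames [:t+1]
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [alpha[t - 1, s]]          # stay on the same symbol
            if s > 0:
                cands.append(alpha[t - 1, s - 1])  # advance by one
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[t - 1, s - 2])  # skip a blank between distinct labels
            alpha[t, s] = logsumexp(*cands) + log_probs[t, ext[s]]
    # Valid alignments end on the last label or the trailing blank.
    return -logsumexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])
```

With two frames, two symbols, and uniform probabilities, the three alignments that collapse to the single label (`blank,1`, `1,blank`, `1,1`) each have probability 0.25, so the loss is `-log(0.75)`.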
Using Phonogram
The HalfPhonemeModelWrapper class in the model_wrp module implements training of a model in which half of the RNN layers are trained against phonograms and the rest against the actual output text. To generate phonograms, the Logios tool of CMU Sphinx can be used. Sphinx phonogram symbols are called ARPAbets. To generate ARPAbets from Baidu's DeepSpeech description files you can:
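The word-to-ARPAbet mapping step can be sketched as follows. This is a toy illustration with a hand-written mini-lexicon; the actual pipeline uses CMU Sphinx's Logios tool (or the CMU Pronouncing Dictionary) for real pronunciations.

```python
# Hand-written toy lexicon; the real one comes from Logios / CMUdict.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

def to_arpabet(transcript, lexicon=LEXICON):
    """Convert a space-separated transcript into a flat ARPAbet phone sequence."""
    phones = []
    for word in transcript.lower().split():
        if word not in lexicon:
            raise KeyError(f"no pronunciation for {word!r}")
        phones.extend(lexicon[word])
    return phones
```

For example, `to_arpabet("Hello world")` yields the concatenated phone lists of both words.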
visualize.py gives you a simple shell for testing your model by feeding it input files. There is also a models-evaluation notebook, though it is fairly messy.
Pre-trained models
These models were trained for about three days on the LibriSpeech corpus using a GTX 1080 Ti GPU:
A five-layer unidirectional RNN model trained on LibriSpeech using Theano: mega, drive
A five-layer unidirectional RNN model trained on LibriSpeech using TensorFlow: mega, drive
Validation WER/CER of these models on test-clean is about 5%; on test-other it is about 15%.
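The WER and CER figures above are both edit-distance metrics; a minimal sketch of how they are computed (not the repo's own evaluation code) is:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, one-row dynamic programming."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # prev holds dp[i-1][j-1]; dp[j] still holds dp[i-1][j]
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution / match
    return dp[-1]

def wer(ref, hyp):
    """Word error rate: word-level edit distance over reference length."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)
```

For example, one substituted word out of four gives a WER of 0.25.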