This repository was archived by the owner on Mar 22, 2023. It is now read-only.
It has helpers for better training by training against auto-generated phonograms.
Training with Theano can be much faster, since the CTC computation can be done on the GPU.
Training
If you want to train with Theano, you'll need Theano>=0.10, since it has bindings for Baidu's CTC.
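To make the CTC objective concrete, here is a minimal NumPy sketch of the CTC forward algorithm that the GPU bindings compute; it is illustrative only and not the repo's implementation, which delegates to Baidu's CTC through Theano.

```python
import numpy as np

def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC.

    log_probs: (T, V) array of per-frame log-probabilities over V symbols.
    target:    list of label ids (without blanks).
    """
    # Interleave blanks around the labels: [blank, t1, blank, t2, ..., blank]
    ext = [blank]
    for t in target:
        ext += [t, blank]
    S, T = len(ext), log_probs.shape[0]

    def logsumexp(*xs):
        m = max(xs)
        if m == -np.inf:
            return -np.inf
        return m + np.log(sum(np.exp(x - m) for x in xs))

    # alpha[t, s] = log-probability of all alignments of ext[:s+1] over frames [:t+1]
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [alpha[t - 1, s]]          # stay on the same symbol
            if s > 0:
                cands.append(alpha[t - 1, s - 1])  # advance by one
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[t - 1, s - 2])  # skip a blank between distinct labels
            alpha[t, s] = logsumexp(*cands) + log_probs[t, ext[s]]
    # Valid alignments end on the last label or the trailing blank.
    return -logsumexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])
```

With two frames, two symbols, and uniform probabilities, the three alignments that collapse to the single label (`blank,1`, `1,blank`, `1,1`) each have probability 0.25, so the loss is `-log(0.75)`.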
Using Phonogram
The HalfPhonemeModelWrapper class in the model_wrp module implements training of a model in which half of the RNN layers are trained against phonograms and the rest against the actual output text. To generate phonograms, the Logios tool of CMU Sphinx can be used. Sphinx phonogram symbols are called ARPAbets. To generate ARPAbets from Baidu's DeepSpeech description files you can:
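The word-to-ARPAbet mapping step can be sketched as follows. This is a toy illustration with a hand-written mini-lexicon; the actual pipeline uses CMU Sphinx's Logios tool (or the CMU Pronouncing Dictionary) for real pronunciations.

```python
# Hand-written toy lexicon; the real one comes from Logios / CMUdict.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

def to_arpabet(transcript, lexicon=LEXICON):
    """Convert a space-separated transcript into a flat ARPAbet phone sequence."""
    phones = []
    for word in transcript.lower().split():
        if word not in lexicon:
            raise KeyError(f"no pronunciation for {word!r}")
        phones.extend(lexicon[word])
    return phones
```

For example, `to_arpabet("Hello world")` yields the concatenated phone lists of both words.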
visualize.py gives you a simple shell for testing your model by feeding it input files. There is also a models-evaluation notebook, though it is fairly messy.
Pre-trained models
These models were trained for about three days on the LibriSpeech corpus using a GTX 1080 Ti GPU:
A five-layer unidirectional RNN model trained on LibriSpeech using Theano: mega, drive
A five-layer unidirectional RNN model trained on LibriSpeech using TensorFlow: mega, drive
Validation WER/CER of these models on test-clean is about 5%; on test-other it is about 15%.
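The WER and CER figures above are both edit-distance metrics; a minimal sketch of how they are computed (not the repo's own evaluation code) is:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, one-row dynamic programming."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # prev holds dp[i-1][j-1]; dp[j] still holds dp[i-1][j]
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution / match
    return dp[-1]

def wer(ref, hyp):
    """Word error rate: word-level edit distance over reference length."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)
```

For example, one substituted word out of four gives a WER of 0.25.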