DeepTranslit: Towards better transliteration for Indic languages.
Currently supported languages: Telugu, Kannada, Tamil, Malayalam, Marathi, and Hindi.
Usage
Via docker
# Start the container in background
docker run -d -p 8080:8080 notaitech/deeptranslit:hindi
# Query from python
import requests
requests.post('http://localhost:8080/sync', json={"data": ['mera naam amitab.']}).json()
As a Python module
pip install --upgrade deeptranslit
from deeptranslit import DeepTranslit
# hindi
transliterator = DeepTranslit('hindi')
# Single sentence prediction
transliterator.transliterate('mera naam amitab.')
# [{'pred': 'meraa naam amitaab.', 'prob': 0.25336900797483103}]
# Multiple sentence prediction
transliterator.transliterate(['mera naam amitab.', 'amitab-aur-abhishek'])
#[[{'pred': 'meraa naam amitaab.', 'prob': 0.25336900797483103}],
# [{'pred': 'amitaab-aur-abhissek', 'prob': 0.1027598988040056}]]
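The outputs above are lists of candidate dicts with 'pred' and 'prob' keys. As a minimal sketch, assuming a result may hold more than one candidate, the hypothetical helper best_prediction below (not part of deeptranslit) picks the highest-probability string. The sample data is copied from the outputs shown above, so the snippet runs without the library installed.

```python
def best_prediction(candidates):
    """Return the 'pred' string with the highest 'prob', or None if empty.

    Hypothetical helper for illustration; not part of the deeptranslit API.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["prob"])["pred"]


# Sample outputs copied from the examples above.
single = [{'pred': 'meraa naam amitaab.', 'prob': 0.25336900797483103}]
batch = [
    [{'pred': 'meraa naam amitaab.', 'prob': 0.25336900797483103}],
    [{'pred': 'amitaab-aur-abhissek', 'prob': 0.1027598988040056}],
]

print(best_prediction(single))
print([best_prediction(c) for c in batch])
```

For a batch call, apply the helper to each inner list, as in the last line.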
Notes:
- Tokens (characters) not present in the input space (the English alphabet) are copied over to the output.
  - e.g. amitab. -> amitaab., amitab-aur-abhishek -> amitaab-aur-abhissek
- Predictions are cached at the word level, i.e. computationally, transliterate('amitab amitab') is equivalent to transliterate('amitab') or transliterate('amitab amitab amitab').
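The two notes above can be illustrated with a small sketch. It is not DeepTranslit's actual implementation: toy_word_model is a hypothetical stand-in for the neural model, and transliterate_cached and its dictionary cache are assumptions made only to show the behaviour described (non-alphabet characters copied through, each distinct word computed once).

```python
import re


def toy_word_model(word):
    """Hypothetical stand-in for the real model; counts how often it runs."""
    toy_word_model.calls += 1
    return word.replace("a", "aa")


toy_word_model.calls = 0

_cache = {}  # word -> transliteration, shared across calls


def transliterate_cached(sentence, model=toy_word_model):
    # Split into runs of English letters and runs of everything else.
    pieces = re.findall(r"[A-Za-z]+|[^A-Za-z]+", sentence)
    out = []
    for piece in pieces:
        if piece.isascii() and piece.isalpha():
            # Only call the model for words not seen before.
            if piece not in _cache:
                _cache[piece] = model(piece)
            out.append(_cache[piece])
        else:
            # Characters outside the English alphabet are copied unchanged.
            out.append(piece)
    return "".join(out)


transliterate_cached("amitab amitab")
transliterate_cached("amitab amitab amitab")
print(toy_word_model.calls)  # the model ran once for the single distinct word
```

In this sketch the repeated word costs one model call no matter how many times it appears, which is the sense in which the three calls in the note are computationally equivalent.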