myanmar-tts
所属分类:模式识别(视觉/语音等)
开发工具:Python
文件大小:26KB
下载次数:0
上传日期:2020-03-04 16:05:57
上 传 者:
sh-1993
说明: no intro
(myanmar-tts,Myanmar Text-to-Speech with End-to-End Speech Synthesis)
文件列表:
app.py (4432, 2020-03-05)
constants (0, 2020-03-05)
constants\hparams.py (706, 2020-03-05)
model (0, 2020-03-05)
model\__init__.py (0, 2020-03-05)
model\feeder.py (6223, 2020-03-05)
model\helpers.py (2771, 2020-03-05)
model\modules.py (4885, 2020-03-05)
model\networks.py (6150, 2020-03-05)
model\tacotron.py (3930, 2020-03-05)
preprocess.py (4219, 2020-03-05)
requirements.txt (94, 2020-03-05)
signal_proc (0, 2020-03-05)
signal_proc\__init__.py (0, 2020-03-05)
signal_proc\audio.py (3894, 2020-03-05)
signal_proc\synthesizer.py (1768, 2020-03-05)
test.py (2164, 2020-03-05)
text (0, 2020-03-05)
text\__init__.py (0, 2020-03-05)
text\character_set.py (910, 2020-03-05)
text\mm_num2word.py (3509, 2020-03-05)
text\tokenizer.py (2135, 2020-03-05)
train.py (5619, 2020-03-05)
utils (0, 2020-03-05)
utils\__init__.py (444, 2020-03-05)
utils\logger.py (661, 2020-03-05)
utils\plotter.py (473, 2020-03-05)
# Myanmar End-to-End Text-to-Speech
This is the development of a Myanmar Text-to-Speech system with the famous End-to-End Speech Synthesis Model, Tacotron. It is a part of a thesis for B.E. Degree that I've been assigned at Yangon Technological University. My supervisor was **Dr. Yuzana Win** and she guided me throughout this development.
## License
This work is licensed under the Creative Commons Attribution-NonCommercial-Share Alike 4.0 International (CC BY-NC-SA 4.0) License. View detailed info of the [license](http://creativecommons.org/licenses/by-nc-sa/4.0/).
myanmar-tts educational purpose commercial use case
## Corpus
**[Base Technology, Expa.Ai (Myanmar)](https://expa.ai)** kindly provided Myanmar text corpus and their amazing tool for creating speech corpus.
Speech corpus (mmSpeech as I call it) is created solely on my own with a recorder tool (as previously mentioned) and it currently contains over 5,000 recorded `` pairs. I intend to upload the created corpus on some channel in future.
## Instructions
### Installing dependencies
1. Install Python 3
2. Install [TensorFlow](https://www.tensorflow.org/install/)
3. Install a number of modules
```
pip install -r requirements.txt
```
### Preparing Text and Audio Dataset
1. First of all, the corpus should reside in `~/mm-tts`, although it is not a **must** and can easily be changed by a command line argument.
```
mm-tts
| mmSpeech
| metadata.csv
| wavs
```
2. **Preprocess the data**
```
python3 preprocess.py
```
After it is done, you should see the outputs in `~/mm-tts/training/`
### Training
```
python3 train.py
```
If you want to restore the step from a checkpoint
```
python3 train.py --restore_step Number
```
### Evaluation
There are some sentences defined in test.py, you may test them out with the trained model to see how good the current model is.
```
python3 test.py --checkpoint /path/to/checkpoint
```
### Testing with Custom Inputs
There is a simple app implemented to try out the trained models for their performance.
```
python3 app.py --checkpoint /path/to/checkpoint
```
This will create a simple web app listening at port 4000 unless you specify.
Open up your browser and go to `http://localhost:4000`, you should see a simple interface with a text input to get the text from the user.
### Pretrained Model
* [mmSpeech 150K Steps](https://drive.google.com/open?id=1P3JQYjGNoPbNykOg4-45LPUsZGuWkAd7)
### Generated Audio Samples
Generated Samples are available on SoundCloud
* [mmSpeech 150K Steps](https://soundcloud.com/htoo-pyae-466960846/sets/mmspeech-outputs)
### Notes
* Google Colab which gives excellent GPU access was used for training this model.
* On average, each step tooks about 1.6 seconds and at peak, each step took about 1.2 and sometimes 1.1 seconds.
* For my thesis, I have trained this model for 150,000 steps (took me about a week).
### Loss Curve
Below is the produced loss curves from training mmSpeech for 150,000 Steps.
![Loss](https://user-images.githubusercontent.com/34838719/65132116-7353e080-da26-11e9-9299-a08883811f47.png)
### Alignment Plot
![Alignment Plot](https://user-images.githubusercontent.com/34838719/65257737-afbb3580-db27-11e9-9046-e9a9931c55d3.gif)
### References
* [Tacotron: Towards End-to-End Speech Synthesis](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=2ahUKEwiBjuL828vkAhWh6nMBHYccCdYQFjABegQIABAB&url=https%3A%2F%2Farxiv.org%2Fabs%2F1703.10135&usg=AOvVaw0_KT-Hbe9h_egPMynMsJOM)
* [keithito/tacotron](https://github.com/keithito/tacotron)
近期下载者:
相关文件:
收藏者: