shallow-wavenet
Category: Audio processing
Development tool: Python
Upload date: 2020-02-09 13:13:30
Uploader: sh-1993
Description: Shallow WaveNet (shallow-wavenet)
File list:
LICENSE (11357, 2020-02-09)
egs/ (0, 2020-02-09)
egs/shallow-wavenet/ (0, 2020-02-09)
egs/shallow-wavenet/cmd.sh (1067, 2020-02-09)
egs/shallow-wavenet/conf/ (0, 2020-02-09)
egs/shallow-wavenet/conf/VCC2SF1.f0 (8, 2020-02-09)
egs/shallow-wavenet/conf/VCC2SF2.f0 (8, 2020-02-09)
egs/shallow-wavenet/conf/VCC2SF3.f0 (8, 2020-02-09)
egs/shallow-wavenet/conf/VCC2SF4.f0 (8, 2020-02-09)
egs/shallow-wavenet/conf/VCC2SM1.f0 (7, 2020-02-09)
egs/shallow-wavenet/conf/VCC2SM2.f0 (7, 2020-02-09)
egs/shallow-wavenet/conf/VCC2SM3.f0 (7, 2020-02-09)
egs/shallow-wavenet/conf/VCC2SM4.f0 (7, 2020-02-09)
egs/shallow-wavenet/conf/VCC2TF1.f0 (8, 2020-02-09)
egs/shallow-wavenet/conf/VCC2TF2.f0 (8, 2020-02-09)
egs/shallow-wavenet/conf/VCC2TM1.f0 (7, 2020-02-09)
egs/shallow-wavenet/conf/VCC2TM2.f0 (7, 2020-02-09)
egs/shallow-wavenet/conf/bdl.f0 (7, 2020-02-09)
egs/shallow-wavenet/conf/slt.f0 (8, 2020-02-09)
egs/shallow-wavenet/conf/slurm.conf (393, 2020-02-09)
egs/shallow-wavenet/loss_summary.sh (8060, 2020-02-09)
egs/shallow-wavenet/path.sh (442, 2020-02-09)
egs/shallow-wavenet/proc_loss_log.awk (1309, 2020-02-09)
egs/shallow-wavenet/proc_loss_log_cont.awk (1883, 2020-02-09)
egs/shallow-wavenet/proc_loss_log_cont_resume.awk (1898, 2020-02-09)
egs/shallow-wavenet/proc_loss_log_resume.awk (1309, 2020-02-09)
egs/shallow-wavenet/run.sh (32582, 2020-02-09)
src/ (0, 2020-02-09)
src/bin/ (0, 2020-02-09)
src/bin/calc_stats.py (9298, 2020-02-09)
src/bin/decode_cswnv_laplace-shift1.py (11011, 2020-02-09)
src/bin/decode_dswnv_softmax.py (11270, 2020-02-09)
src/bin/feature_extract.py (12694, 2020-02-09)
src/bin/fine-tune_cswnv_laplace-stftcmplx_shift1.py (42016, 2020-02-09)
src/bin/fine-tune_dswnv_softmax.py (23888, 2020-02-09)
src/bin/noise_shaping.py (6598, 2020-02-09)
src/bin/spk_stat.py (4398, 2020-02-09)
src/bin/train_cswnv_laplace-stftcmplx_shift1.py (43520, 2020-02-09)
src/bin/train_dswnv_softmax.py (25044, 2020-02-09)
... ...
# Shallow WaveNet Vocoder with Laplacian Distribution using Multiple Samples Output based on Linear Prediction / with Softmax Output
## Install
Set up the environment (the Makefile targets Python 3.7; for Python 3.6, please edit the Makefile accordingly):
$ cd tools
$ make
$ cd ..
The main directory is 'egs/shallow-wavenet':
$ cd egs/shallow-wavenet
Waveforms are stored in:
egs/shallow-wavenet/wav_
The experiment output directory is:
egs/shallow-wavenet/exp
HDF5 (h5) files are stored in:
egs/shallow-wavenet/hdf5
Statistics and lists are stored in:
egs/shallow-wavenet/data
Training logs are stored in:
egs/shallow-wavenet/exp//log/
or
egs/shallow-wavenet/exp//train.log
Decoding logs are stored in:
egs/shallow-wavenet/exp/decoding//
## Usage
Open run.sh and set the stage variable:
set stage=0 for initializing the data list and speaker configs
set stage=init for computing F0 statistics of speakers
set stage=1 for feature extraction
set stage=2 for statistics of features
set stage=3 for noise shaping
set stage=4 for training step
set stage=5 for decoding
set stage=6 for restoring noise-shaped waveform
set stage=7 for fine-tuning step
set stage=8 for decoding of fine-tuned model
set stage=9 for restoring noise-shaped waveform from fine-tuned model
$ bash run.sh
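Switching stages by hand means editing run.sh before every run. The following is a minimal Python sketch of automating that edit. It assumes that run.sh contains a top-level line of the form "stage=..." (the README only says to set the stage in run.sh, so the exact layout is an assumption), and the helper name set_stage is hypothetical, not part of this package.

```python
import re

# Assumption: run.sh assigns the stage on its own line, e.g. "stage=0".
STAGE_LINE = re.compile(r"^stage=.*$", flags=re.MULTILINE)


def set_stage(script_text, stage):
    """Return script_text with its first 'stage=' line set to the given stage.

    This helper is hypothetical; it only rewrites text and does not run anything.
    """
    # A function is used as the replacement so that the stage string
    # is inserted literally, without backslash interpretation.
    new_text, count = STAGE_LINE.subn(lambda _m: "stage=" + stage, script_text, count=1)
    if count == 0:
        raise ValueError("no line starting with 'stage=' was found")
    return new_text
```

To use it, read run.sh into a string, pass it through set_stage, write the result back, and then run "bash run.sh" as usual.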
1. Set the *fs*, *spks*, *wav\_org\_dir*, *data\_name*, and *GPU\_device* variables accordingly.
2. Define your training/development/testing sets in STAGE 0 accordingly.
3. For a new dataset, run STAGE 0init to initialize the speaker configs and to compute the F0 histogram of each speaker.
4. Change the initial values in the "*.f0" files under "conf/" according to the calculated histograms in "exp/init\_spk\_stat/".
5. STAGE 2-4 can then be started.
6. Set the "n_jobs" variable accordingly for STAGE init/1, i.e., for running feature extraction as parallel jobs.
7. Run STAGE 7 to fine-tune on a target speaker using the pretrained model from STAGE 4.
8. Set the variables under "# DECODING/FINE-TUNING SETTING" accordingly for STAGE 5 and 6 / 7 / 8 and 9.
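Choosing the "*.f0" values from a histogram amounts to picking an F0 range that covers the bulk of a speaker's voiced frames while leaving out tracking outliers. The sketch below illustrates one way to do that with percentiles. It rests on several assumptions that the README does not confirm: that each ".f0" file holds a lower and an upper F0 bound, that raw per-frame F0 values in Hz are available (the package itself produces histograms), and that unvoiced frames are marked with 0. The function name f0_bounds and the percentile defaults are hypothetical.

```python
def f0_bounds(f0_values, lower_pct=1.0, upper_pct=99.0):
    """Return (min_f0, max_f0) in Hz covering the central mass of voiced frames.

    Assumption: values of 0 or below mark unvoiced frames and are ignored.
    """
    voiced = sorted(v for v in f0_values if v > 0)
    if not voiced:
        raise ValueError("no voiced frames found")

    def pick(pct):
        # Nearest-rank style lookup into the sorted voiced values.
        index = round(pct / 100.0 * (len(voiced) - 1))
        return voiced[index]

    return int(pick(lower_pct)), int(pick(upper_pct))
```

The bounds returned this way are only a starting point; the histograms in "exp/init\_spk\_stat/" remain the reference for the final choice.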
## Links
For examples of script settings, speaker configs, trained models, logs, and synthesized waveforms, please refer to wavenet\_trained\_example/egs/shallow-wavenet
within the following links:
* [package](https://drive.google.com/open?id=18AapBApXuiiJDFUocxn1hWs2Jit7v0or)
* [ASRU 2019](https://drive.google.com/open?id=1pBHiIj8M3jAb0N1oU_0uR4KzvvgekoXJ)
* [ICASSP 2020](https://drive.google.com/open?id=1B1FxkUdqOxAj7PtZIbvtyyIZOPN1RwYh)
* [GitHub](https://github.com/patrickltobing/shallow-wavenet)
## Logging
To summarize the training log and identify a likely optimal model, please edit
"loss\_summary.sh" in the main directory accordingly, and run it.
* For continuous output with a Laplacian distribution, the output file of "loss\_summary.sh" may look like this:
262 -5.719502 (+- 1.225024) 9.539128 dB (+- 0.800174 dB) 0.002558 (+- 0.002514) -5.836762 (+- 0.987462) 9.676376 dB (+- 0.662477 dB) 0.002122 (+- 0.001677)
> The fields correspond to:
262 --> epoch number
-5.719502 (+- 1.225024) --> negative log-likelihood (NLL) on the development set
9.539128 dB (+- 0.800174 dB) --> log-spectral distortion (LSD) on the development set
0.002558 (+- 0.002514) --> absolute error of waveform samples on the development set
-5.836762 (+- 0.987462) --> negative log-likelihood (NLL) on the training set
9.676376 dB (+- 0.662477 dB) --> log-spectral distortion (LSD) on the training set
0.002122 (+- 0.001677) --> absolute error of waveform samples on the training set
* For discrete softmax output, the output file of "loss\_summary.sh" may look like this:
13 1.911825 (+- 0.605316) 1.836777 (+- 0.607977)
> The fields correspond to:
13 --> epoch number
1.911825 (+- 0.605316) --> cross-entropy on the development set
1.836777 (+- 0.607977) --> cross-entropy on the training set
*Lower is better for all values.*
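The two summary layouts described above can be read programmatically when comparing many epochs. The following is a minimal Python sketch, assuming every line of the output file follows exactly one of the two example layouts (an epoch number followed by six or two "value (+- deviation)" pairs). The field names and the helper names parse_summary_line and best_epoch are hypothetical and are not part of "loss\_summary.sh".

```python
import re

# One "value (+- deviation)" pair; the optional " dB" suffix appears on LSD fields.
PAIR = re.compile(r"(-?\d+(?:\.\d+)?)(?: dB)? \(\+- (\d+(?:\.\d+)?)(?: dB)?\)")

# Hypothetical field names, ordered as in the correspondences listed above.
LAPLACE_FIELDS = ["dev_nll", "dev_lsd_db", "dev_abs_err",
                  "trn_nll", "trn_lsd_db", "trn_abs_err"]
SOFTMAX_FIELDS = ["dev_xent", "trn_xent"]


def parse_summary_line(line):
    """Parse one summary line into {"epoch": int, field: (mean, deviation), ...}."""
    epoch_str, rest = line.strip().split(maxsplit=1)
    pairs = [(float(mean), float(dev)) for mean, dev in PAIR.findall(rest)]
    if len(pairs) == len(LAPLACE_FIELDS):
        names = LAPLACE_FIELDS
    elif len(pairs) == len(SOFTMAX_FIELDS):
        names = SOFTMAX_FIELDS
    else:
        raise ValueError("unexpected number of fields: %d" % len(pairs))
    record = {"epoch": int(epoch_str)}
    record.update(zip(names, pairs))
    return record


def best_epoch(lines, key):
    """Return the epoch whose mean for the given field is lowest."""
    records = [parse_summary_line(line) for line in lines if line.strip()]
    return min(records, key=lambda record: record[key][0])["epoch"]
```

For example, best_epoch(lines, "dev_nll") would pick the epoch with the lowest development-set NLL for the Laplacian model, and best_epoch(lines, "dev_xent") does the same for the softmax model. Which metric to prioritize is left to the user.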
## References
This system is based on the following papers:
1. P. L. Tobing, T. Hayashi, and T. Toda, "Investigation of Shallow WaveNet Vocoder with Laplacian Distribution Output", in Proc. IEEE ASRU, Sentosa, Singapore, Dec. 2019, pp. 176--183.
2. P. L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, and T. Toda, "Efficient Shallow WaveNet Vocoder using Multiple Samples Output based on Laplacian Distribution and Linear Prediction", Accepted for ICASSP 2020.
The structure of the scripts is heavily influenced by the WaveNet implementation package at https://github.com/kan-bayashi/PytorchWaveNetVocoder
## Contact
For further information or troubleshooting, please contact:
Patrick Lumban Tobing (Patrick)
Graduate School of Informatics, Nagoya University
patrick.lumbantobing@g.sp.m.is.nagoya-u.ac.jp