Arctic voices
The CMU_ARCTIC databases were constructed at the Language Technologies Institute at Carnegie Mellon University as phonetically balanced, single-speaker US English databases designed for unit selection speech synthesis research.
The databases consist of around 1150 utterances carefully selected from out-of-copyright texts from Project Gutenberg. The databases include US English male (bdl) and female (slt) speakers (both experienced voice talent) and a few other accented speakers.
To run one of these voices, cd egs/slt_arctic/s1 and follow the steps below:
Setting up
The first step is to run the setup script, which creates the directories and downloads the required training data files.
To see the list of available voices, run:
./01_setup.sh
The next steps demonstrate how to set up the slt arctic voice.
- To run on short data (about 50 utterances for training)
./01_setup.sh slt_arctic_demo
- To run on full data (about 1000 utterances for training)
./01_setup.sh slt_arctic_full
The setup script also creates a global config file, conf/global_settings.cfg, where default settings are stored.
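Before moving on, it can help to confirm that setup actually produced the global config file. The following is a minimal sketch, not part of the recipe: the helper name check_global_config is hypothetical, and it only checks that the file exists and is non-empty, not that its contents are valid.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the recipe): confirm that setup
# produced a non-empty global config file.
# $1: path to the config file (defaults to conf/global_settings.cfg).
check_global_config() {
    cfg="${1:-conf/global_settings.cfg}"
    if [ -s "$cfg" ]; then
        echo "found: $cfg"
        return 0
    fi
    echo "missing or empty: $cfg (run ./01_setup.sh first)" >&2
    return 1
}
```

Calling check_global_config with no arguments from egs/slt_arctic/s1 checks the default path; a non-zero exit status suggests that setup should be run again.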
Prepare config files
At this point, we have to prepare two config files to train the DNN models:
- Acoustic Model
- Duration Model
To prepare config files:
./02_prepare_conf_files.sh conf/global_settings.cfg
Four config files will be generated: two for training, and two for testing.
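The later steps take the paths of these generated files as arguments, so it is useful to see what was written. The sketch below lists the regular files in a config directory. It assumes the generated files land in conf/, next to the global config file, which the text above does not state; pass a different directory if that assumption does not hold. The helper name list_conf_files is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the regular files in a config directory so
# their paths can be copied into the training and synthesis commands.
# Assumption: generated config files are written to conf/ (override via $1).
list_conf_files() {
    dir="${1:-conf}"
    for f in "$dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$f" ] && echo "$f"
    done
    return 0
}
```

The directory will typically also contain files that were not generated by this step, such as the global config file, so the output still has to be read by eye.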
Train duration model
To train the duration model:
./03_train_duration_model.sh <path_to_duration_conf_file>
Train acoustic model
To train the acoustic model:
./04_train_acoustic_model.sh <path_to_acoustic_conf_file>
Synthesize speech
To synthesize speech:
./05_run_merlin.sh <path_to_test_dur_conf_file> <path_to_test_synth_conf_file>
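The training and synthesis steps above can also be chained so that a failure in one step stops the rest. The following is a sketch under stated assumptions rather than a script from the recipe: run_pipeline is a hypothetical wrapper, it must be called from the directory containing the numbered scripts, and the four config paths are supplied by the caller because their names are not given here.

```shell
#!/bin/sh
# Hypothetical wrapper: run duration training, acoustic training and
# synthesis in the order given above, stopping at the first failure.
# Arguments: the four config files generated by ./02_prepare_conf_files.sh
#   $1 duration conf, $2 acoustic conf, $3 test duration conf, $4 test synth conf
run_pipeline() {
    if [ "$#" -ne 4 ]; then
        echo "usage: run_pipeline DUR_CONF ACOUSTIC_CONF TEST_DUR_CONF TEST_SYNTH_CONF" >&2
        return 2
    fi
    # Fail early if any config file is missing, before training starts.
    for conf in "$@"; do
        if [ ! -f "$conf" ]; then
            echo "config file not found: $conf" >&2
            return 1
        fi
    done
    ./03_train_duration_model.sh "$1" &&
    ./04_train_acoustic_model.sh "$2" &&
    ./05_run_merlin.sh "$3" "$4"
}
```

A call then looks like: run_pipeline <path_to_duration_conf_file> <path_to_acoustic_conf_file> <path_to_test_dur_conf_file> <path_to_test_synth_conf_file>. Checking the paths up front avoids discovering a typo in a synthesis config only after training has finished.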