Note that this recipe has not been updated for a long time and may be outdated!

HTS Engine

The HTS Engine is software that synthesizes speech waveforms from HMMs trained by the HMM-based speech synthesis system (HTS).
This page describes how to set up the HTS engine for demonstration purposes on a 64-bit Ubuntu 16.04 workstation.

Setup steps

Ubuntu packages

  1. Install the necessary packages (aplay is provided by the alsa-utils package):

    sudo apt-get install gcc make alsa-utils
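
Before continuing, it may help to confirm that the tools used on this page are actually on the PATH. The following is only a sketch and not part of the original recipe: missing_tools is a hypothetical helper name, and the tool list is simply the set of commands that the later steps invoke.

```shell
# Hypothetical helper (not part of the original recipe):
# print the name of every given command that is not found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
  return 0
}

# Commands invoked by the steps on this page.
missing=$(missing_tools gcc make aplay wget tar)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

If anything is reported as missing, install the corresponding package before going on.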

hts_engine API

  1. Download the hts_engine API source code:

    wget https://downloads.sourceforge.net/hts-engine/hts_engine_API-1.10.tar.gz
  2. Extract package:

    tar xvf hts_engine_API-1.10.tar.gz
  3. Compile it:

    cd hts_engine_API-1.10
    ./configure
    make
    cd ..

Flite+hts_engine

  1. Download the Flite+hts_engine source code:

    wget https://downloads.sourceforge.net/hts-engine/flite%2Bhts_engine-1.07.tar.gz
  2. Extract package:

    tar xvf flite+hts_engine-1.07.tar.gz
  3. Compile it:

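    cd flite+hts_engine-1.07
    # hts_engine API was built above but not installed system-wide,
    # so point configure at its build directory (absolute paths).
    ./configure \
      --with-hts-engine-header-path="$PWD/../hts_engine_API-1.10/include" \
      --with-hts-engine-library-path="$PWD/../hts_engine_API-1.10/lib"
    make
    cd ..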

CMU_ARCTIC English voice

  1. Download voice package:

    wget https://downloads.sourceforge.net/hts-engine/hts_voice_cmu_us_arctic_slt-1.06.tar.gz
  2. Extract voice data:

    tar xvf hts_voice_cmu_us_arctic_slt-1.06.tar.gz
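
The test script in the next section refers to the synthesizer binary and the voice file by paths relative to the current directory. A quick check along the following lines can confirm that both are in place before running it. This is only a sketch: report_missing is a hypothetical helper, and the two paths are the ones the script on this page uses.

```shell
# Hypothetical helper (not part of the original recipe):
# print every given path that does not exist as a regular file.
report_missing() {
  for path in "$@"; do
    [ -f "$path" ] || echo "missing: $path"
  done
  return 0
}

# Paths used by the test script on this page, relative to the working directory.
report_missing \
  ./flite+hts_engine-1.07/bin/flite_hts_engine \
  ./hts_voice_cmu_us_arctic_slt-1.06/cmu_us_arctic_slt.htsvoice
```

No output means both files were found.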

Bash test script

  1. Create the file flite_hts_engine.sh with the following content:

    #!/bin/bash

    # NOTE: flite+hts_engine only reads the FIRST line of a text file.
    # So, to speak a multi-line text, we must copy that text
    # one line at a time into another file (such as ./.line.txt)
    # and call flite+hts_engine repeatedly,
    # once for each new line written to ./.line.txt

    # First, always flush the FIFO buffer
    dd if=./.FIFO.wav iflag=nonblock of=/dev/null >/dev/null 2>&1

    # Define linebreak as delimiter for items
    IFS=$'\n'
    # read input line by line
    while read -r line
    do
      if [[ -n $line ]]; then
        for sentence in $(echo $line|sed 's/\. \([[:upper:]]\)/.\n\1/g'); do
          echo "$sentence" > ./.line.txt
          #   Here are relevant flite_hts_engine options explained:
          #   -m  : the location of the voice; you must have this option
          #   -s  : the sound sample rate in Hz (48000 should work; or 44100)
          #   -r  : speech speed rate (0.5=half speed, 1.0 is default, 2.0=twice speed)
          #   -fm : raise the pitch with "1", "2.3", or lower it with "-1" "-2", etc.
          #   -o  : the output; this must be a .wav filename
          #   the input must be a text file

          # And this is how you call flite_hts_engine (reading .line.txt):
          ./flite+hts_engine-1.07/bin/flite_hts_engine \
          -m ./hts_voice_cmu_us_arctic_slt-1.06/cmu_us_arctic_slt.htsvoice \
          -s 44100 \
          -r 1.2 \
          -o ./.FIFO.wav \
          ./.line.txt

          # Set up aplay to play when .FIFO.wav gets some sound
          # DON'T forget the '&' at the end of the line!
          aplay -q ./.FIFO.wav &

          # IMPORTANT: call 'wait' to make sure that both aplay and flite_hts_engine are completely done before continuing
          wait
        done
      fi
    done < "${1:-/dev/stdin}"
  2. Make file executable:

    chmod +x flite_hts_engine.sh
  3. Test engine by executing flite_hts_engine.sh file:

    ./flite_hts_engine.sh

    and type the text to be spoken, pressing Enter after each line.
    Press Ctrl+d (or Ctrl+c) to end the script.

  4. flite_hts_engine.sh also accepts one optional parameter, the name of a text file to read instead of standard input, e.g.:

    ./flite_hts_engine.sh test.txt

    reads the text from the file test.txt.
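
The least obvious part of the script is the sed expression that breaks each input line into sentences, so that every sentence is synthesized and played separately. The sketch below isolates that step for experimentation. split_sentences is a hypothetical function name, and the "\n" in the replacement relies on GNU sed, as shipped with Ubuntu.

```shell
# Hypothetical helper isolating the sentence-splitting step of the script:
# insert a line break after every ". " that is followed by an uppercase letter.
# Note: "\n" in the replacement text is a GNU sed feature.
split_sentences() {
  printf '%s\n' "$1" | sed 's/\. \([[:upper:]]\)/.\n\1/g'
}

split_sentences "Hello world. This is a test. it stays on this line."
```

With GNU sed this prints two lines: "Hello world." and "This is a test. it stays on this line.", because only a period followed by a space and an uppercase letter is treated as a sentence boundary.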


  
Created by Valdis Vītoliņš on 2018-01-10 14:05
Last modified by Valdis Vītoliņš on 2021-05-09 15:33
 
Creative Commons Attribution 3.0 Unported License