March 2012

The authors train a 4-layer MLP whose last layer outputs phoneme probabilities (the phonemes are taken from the IPA). The first 3 layers are shared across languages, and the last layer is language-specific.
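To make the shared/language-specific split concrete, here is a minimal forward-pass sketch in NumPy. It is only an illustration, not the authors' implementation: the layer sizes, the tanh activation, the per-language phoneme counts, and every name (`shared`, `heads`, `phoneme_posteriors`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the paper.
N_INPUT, N_HIDDEN = 39, 64
N_PHONEMES = {"english": 40, "german": 45}


def init_layer(n_in, n_out):
    """Return a (weights, bias) pair with small random weights."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


# First 3 layers: one set of parameters shared by all languages.
shared = [
    init_layer(N_INPUT, N_HIDDEN),
    init_layer(N_HIDDEN, N_HIDDEN),
    init_layer(N_HIDDEN, N_HIDDEN),
]
# 4th layer: one output layer per language.
heads = {lang: init_layer(N_HIDDEN, n) for lang, n in N_PHONEMES.items()}


def phoneme_posteriors(x, lang):
    """Run frames through the shared layers, then the head for `lang`."""
    h = x
    for W, b in shared:
        h = np.tanh(h @ W + b)
    W, b = heads[lang]
    logits = h @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)  # softmax over phonemes


frames = rng.normal(size=(5, N_INPUT))  # 5 made-up acoustic frames
probs_en = phoneme_posteriors(frames, "english")
probs_de = phoneme_posteriors(frames, "german")
```

Both calls reuse the same three hidden layers; only the final projection and the size of the output distribution depend on the language.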

  1. The authors first train the 4-layer MLP on a high-resource language.
  2. They then keep the first 3 layers fixed and retrain only the last layer on the low-resource language.
  3. Lastly, they retrain the whole network on the low-resource language, starting from the weights obtained in step 2.
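The three steps above can be sketched as a small NumPy training loop in which a set of layer indices controls which layers get updated. This is a toy sketch under stated assumptions, not the paper's setup: the data is random, and the sizes, tanh activation, plain gradient descent, learning rate, and all names (`init_mlp`, `train`, `trainable`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_INPUT, N_HIDDEN, N_HIGH, N_LOW = 20, 16, 8, 6  # hypothetical sizes


def init_mlp(sizes):
    """One [weights, bias] pair per consecutive pair of layer sizes."""
    return [[rng.normal(0.0, 0.1, (a, b)), np.zeros(b)]
            for a, b in zip(sizes[:-1], sizes[1:])]


def forward(layers, x):
    """Return the input of every layer and the softmax output."""
    acts = [x]
    for W, b in layers[:-1]:
        acts.append(np.tanh(acts[-1] @ W + b))
    W, b = layers[-1]
    logits = acts[-1] @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return acts, p / p.sum(axis=1, keepdims=True)


def train(layers, x, y, trainable, epochs=50, lr=0.1):
    """Gradient descent on cross-entropy; only layers in `trainable` change."""
    for _ in range(epochs):
        acts, probs = forward(layers, x)
        delta = probs.copy()
        delta[np.arange(len(y)), y] -= 1.0  # d(loss)/d(logits)
        delta /= len(y)
        for i in reversed(range(len(layers))):
            W, b = layers[i]
            grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:  # backpropagate through tanh using the pre-update W
                delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
            if i in trainable:
                layers[i][0] = W - lr * grad_W
                layers[i][1] = b - lr * grad_b
    return layers


# Random stand-ins for high- and low-resource data.
x_hi, y_hi = rng.normal(size=(200, N_INPUT)), rng.integers(0, N_HIGH, 200)
x_lo, y_lo = rng.normal(size=(40, N_INPUT)), rng.integers(0, N_LOW, 40)

# Step 1: train all 4 layers on the high-resource language.
high = train(init_mlp([N_INPUT, N_HIDDEN, N_HIDDEN, N_HIDDEN, N_HIGH]),
             x_hi, y_hi, trainable={0, 1, 2, 3})
shared_after_step1 = [W.copy() for W, _ in high[:3]]

# Step 2: reuse the first 3 layers, add a new output layer, train only that.
low = [[W.copy(), b.copy()] for W, b in high[:3]] + init_mlp([N_HIDDEN, N_LOW])
train(low, x_lo, y_lo, trainable={3})
shared_after_step2 = [W.copy() for W, _ in low[:3]]

# Step 3: fine-tune the whole network, starting from the step-2 weights.
train(low, x_lo, y_lo, trainable={0, 1, 2, 3})
```

After step 2 the first three layers are still identical to the high-resource ones; only step 3 is allowed to move them.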

Dataset: 3 languages are used (English, German, and Spanish), all from the Callhome corpora.

Their method is aimed at producing better features rather than at training the acoustic model.