MULTILINGUAL TRAINING OF DEEP NEURAL NETWORKS

May 26, 2013, ICASSP

In this work, the authors focus on multilingual acoustic modelling.

Dataset:

The GlobalPhone corpus is used, with seven languages from three different language families: Germanic, Romance, and Slavic. The languages are Czech, French, German, Polish, Brazilian Portuguese, Russian, and Costa Rican Spanish. Each language has roughly 20 hours of speech for training and two hours each for the development and evaluation sets, from a total of about 100 speakers. The detailed statistics for each language are shown in Table 1.

Main idea: use DNN features for multiple languages in the DNN-HMM setup. All hidden layers are shared across languages, and the output layer is re-trained separately for each individual language, one language after another (the sequence is shown in Table 2).
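
The shared-hidden-layer idea can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class name SharedHiddenDNN, its methods, the layer sizes, the sigmoid hidden units, and the language codes are all hypothetical choices made for illustration. Keeping the hidden layers fixed while the output layer is updated is likewise an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_layer(n_in, n_out):
    # Small random weights and zero biases (illustrative initialisation).
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


class SharedHiddenDNN:
    """Hypothetical sketch: one hidden stack, one softmax output layer per language."""

    def __init__(self, n_in, hidden_sizes):
        sizes = [n_in] + list(hidden_sizes)
        # Hidden layers shared by every language.
        self.hidden = [init_layer(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
        # Language-specific output layers, keyed by a language code.
        self.heads = {}

    def add_language(self, lang, n_states):
        # Attach a fresh output layer sized to this language's HMM state set.
        self.heads[lang] = init_layer(self.hidden[-1][0].shape[1], n_states)

    def features(self, x):
        # Forward pass through the shared sigmoid hidden layers.
        h = x
        for W, b in self.hidden:
            h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
        return h

    def posteriors(self, x, lang):
        # Softmax over the states of the selected language.
        W, b = self.heads[lang]
        z = self.features(x) @ W + b
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def train_head_step(self, x, targets, lang, lr=0.1):
        # One gradient step on cross-entropy that updates only the output
        # layer of `lang`; the shared hidden layers are left untouched
        # (an assumption of this sketch).
        W, b = self.heads[lang]
        h = self.features(x)
        grad = self.posteriors(x, lang)
        grad[np.arange(len(targets)), targets] -= 1.0
        grad /= len(targets)
        self.heads[lang] = (W - lr * (h.T @ grad), b - lr * grad.sum(axis=0))
```

A possible usage pattern: build the model once, call add_language for each language in turn, and run train_head_step on that language's frames. Because each language only needs its own output layer, languages with different numbers of HMM states can reuse the same hidden representation.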