14 June 2018
In this paper, the authors use a sequence-to-sequence Transformer-based architecture with BPE units for the multilingual ASR task.
The authors experiment with:
- No language information
- Language information during training:
  - Adding a LANG token at the start of each sub-word.
  - Adding a LANG token at the end of each sub-word.
    - Gives slightly better results than adding the token at the start (see the sketch after this list).
- Language information during training + testing:
  - Gives the best result, because it alleviates the confusion between languages during testing.
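A minimal sketch of the token-insertion idea, not code from the paper: it shows how a LANG token could be attached before or after every BPE sub-word in the decoder's target sequence. The `add_lang_token` helper, the `<en>` token string, and the example sub-words are all assumptions for illustration.

```python
def add_lang_token(subwords, lang, position="start"):
    """Attach a language token to every BPE sub-word unit.

    subwords: list of BPE sub-word strings, e.g. ["_he", "llo"]
    lang:     language ID token, e.g. "<en>"
    position: "start" (token before each sub-word) or "end" (token after).
    """
    out = []
    for sw in subwords:
        if position == "start":
            out.extend([lang, sw])
        else:  # "end"
            out.extend([sw, lang])
    return out


# Example target sequences for a hypothetical English utterance:
bpe = ["_he", "llo", "_wor", "ld"]
print(add_lang_token(bpe, "<en>", "start"))  # ['<en>', '_he', '<en>', 'llo', ...]
print(add_lang_token(bpe, "<en>", "end"))    # ['_he', '<en>', 'llo', '<en>', ...]
```

At test time, conditioning the decoder on the (known or estimated) language token in the same way is what removes the cross-language confusion noted above.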