8 JUL 2020: Facebook

In this paper the authors experiment with 51 languages, the first work of its kind in the multilingual ASR literature. They train four model variants:

  1. Monolingual baselines.
  2. Joint (one model over all languages).
  3. Joint + language ID (see the sketch after this list).
  4. Joint + 6 clusters of the 51 languages: best results.
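One common way to realize the "joint + language ID" variant is to prepend a language token to each target sequence so a single joint model can condition on the language. This is a minimal sketch of that idea; the token names and helper function are my assumptions, not necessarily the paper's exact mechanism.

```python
# Hypothetical illustration: prepend a language-ID token to each target
# sequence so one joint model can condition on the language.
# LANG_TOKENS and add_language_id are assumptions, not from the paper.

LANG_TOKENS = {"en": "<en>", "hi": "<hi>", "fr": "<fr>"}  # one token per language

def add_language_id(target_pieces, lang):
    """Prefix the subword target sequence with its language token."""
    return [LANG_TOKENS[lang]] + target_pieces

print(add_language_id(["_he", "llo"], "en"))  # ['<en>', '_he', 'llo']
```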

They use a sequence-to-sequence (encoder-decoder) architecture.
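As a rough illustration of what a seq2seq ASR model looks like, here is a minimal PyTorch sketch. The layer types, sizes, and attention setup are assumptions for readability; the paper's actual architecture is much larger and differs in detail.

```python
# Minimal seq2seq ASR sketch (illustration only; the LSTM layers and
# dimensions here are assumptions, not the paper's architecture).
import torch
import torch.nn as nn

class Seq2SeqASR(nn.Module):
    def __init__(self, n_mels=80, hidden=512, vocab=1000):
        super().__init__()
        self.encoder = nn.LSTM(n_mels, hidden, num_layers=2, batch_first=True)
        self.embed = nn.Embedding(vocab, hidden)
        self.decoder = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, feats, targets):
        # feats: (B, T, n_mels) filterbank frames; targets: (B, U) piece IDs
        enc, _ = self.encoder(feats)                # (B, T, H) encoder states
        dec, _ = self.decoder(self.embed(targets))  # (B, U, H) decoder states
        ctx, _ = self.attn(dec, enc, enc)           # attend over encoder states
        return self.out(ctx + dec)                  # (B, U, vocab) logits

model = Seq2SeqASR()
logits = model(torch.randn(2, 100, 80), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 1000])
```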

They train using a SentencePiece model for the output (subword) representations.
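SentencePiece is an open-source subword tokenizer library; here is a quick sketch of training a model on a transcript corpus and encoding text with it. The file names and vocabulary size are placeholders, not values from the paper.

```python
# Sketch of training and using a SentencePiece model (file names and
# vocab_size are placeholder assumptions).
import sentencepiece as spm

# Train a subword model on a plain-text transcript corpus.
spm.SentencePieceTrainer.train(
    input="transcripts.txt", model_prefix="asr_sp", vocab_size=8000
)

sp = spm.SentencePieceProcessor(model_file="asr_sp.model")
pieces = sp.encode("hello world", out_type=str)  # subword pieces
ids = sp.encode("hello world")                   # integer IDs for the model
print(pieces, ids)
```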

Adjusting the language sampling frequency helps when it strikes a balance between uniform sampling and the natural (data-proportional) frequency; neither extreme works well.
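A common way to interpolate between the two extremes is power (temperature) sampling: raise each language's data count to an exponent between 0 and 1 and renormalize. This particular parameterization is an assumption for illustration, not necessarily the paper's exact scheme.

```python
# Power/temperature sampling over languages: alpha=1 gives the natural
# (data-proportional) distribution, alpha=0 gives uniform; intermediate
# values balance the two. The hours and alpha values are illustrative.
import numpy as np

hours = {"en": 1000.0, "hi": 50.0, "sw": 5.0}  # per-language training data

def sampling_probs(counts, alpha=0.5):
    langs = list(counts)
    p = np.array([counts[l] for l in langs]) ** alpha
    p /= p.sum()
    return dict(zip(langs, p))

print(sampling_probs(hours, alpha=1.0))  # natural: heavily favors "en"
print(sampling_probs(hours, alpha=0.5))  # balanced
print(sampling_probs(hours, alpha=0.0))  # uniform
```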

They use curriculum learning, as explained in section 3.3.1 of the paper.
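As a generic illustration only (the paper's actual schedule is described in its section 3.3.1), one common curriculum scheme grows the pool of training languages as training progresses. Everything below, including the stage lengths and language ordering, is an assumption.

```python
# Generic curriculum-learning sketch (illustration only; not the paper's
# schedule). The pool of training languages grows with the training step.
def curriculum_pool(all_langs, step, steps_per_stage=10000, start=8):
    """Return the languages available at a given training step."""
    n = min(len(all_langs), start + step // steps_per_stage)
    return all_langs[:n]  # assumes langs are pre-sorted, e.g. by data size

langs = ["en", "fr", "de", "hi", "sw"]  # illustrative ordering
print(curriculum_pool(langs, step=0, steps_per_stage=100, start=2))
print(curriculum_pool(langs, step=250, steps_per_stage=100, start=2))
```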