11 Sep 2019

Google

In this paper, the authors experiment with RNN-T-based ASR models in a multilingual setting.

  1. Multilingual RNN-T (A0)
    1. The authors use it as a baseline.
  2. With a language ID (A1)
    1. The authors condition A0 on a one-hot language-ID vector.
  3. Imbalanced dataset (A2)
    1. Using sampling, the authors try to reduce the data/prior bias in the overall dataset.
      1. A0+sampling.
      2. A1+sampling.
      3. A1+sampling+60k. The authors stop the sampling after 60k steps.
  4. Using language adapters.
    1. A1+adapters. The authors add adapter modules trained separately for each language.
  5. Baselines:
    1. Monolingual
      1. CTC
      2. RNN-T
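The one-hot conditioning in A1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the language inventory and the choice to append the vector to every acoustic feature frame are assumptions here.

```python
# Hypothetical language inventory (assumption, not from the paper).
LANGUAGES = ["en", "hi", "ta"]

def one_hot(lang):
    """Return a one-hot vector identifying the language."""
    vec = [0.0] * len(LANGUAGES)
    vec[LANGUAGES.index(lang)] = 1.0
    return vec

def condition_frames(frames, lang):
    """Append the language one-hot vector to each feature frame,
    so the encoder sees the language ID at every time step."""
    lid = one_hot(lang)
    return [frame + lid for frame in frames]

frames = [[0.1, 0.2], [0.3, 0.4]]  # two dummy acoustic feature frames
conditioned = condition_frames(frames, "hi")
```

Each frame grows by the number of languages; the rest of the A0 architecture is unchanged.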

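The A2 sampling idea can be sketched as interpolating between the natural (data-proportional) language distribution and a uniform one, with the interpolation switched off after a step budget (as in A1+sampling+60k). The interpolation form and the hard cutoff are assumptions for illustration; the paper's exact schedule may differ.

```python
import random

def sampling_probs(counts, alpha):
    """Interpolate language sampling probabilities between
    data-proportional (alpha=0) and uniform (alpha=1)."""
    total = sum(counts.values())
    n = len(counts)
    return {lang: (1 - alpha) * c / total + alpha / n
            for lang, c in counts.items()}

def sample_language(counts, step, max_sampling_steps=60_000):
    """Sample a language for the next batch; after the step budget,
    fall back to the natural distribution (hypothetical schedule)."""
    alpha = 1.0 if step < max_sampling_steps else 0.0
    probs = sampling_probs(counts, alpha)
    langs = list(probs)
    return random.choices(langs, weights=[probs[l] for l in langs])[0]

counts = {"en": 900, "hi": 80, "ta": 20}  # dummy utterance counts
lang = sample_language(counts, step=10_000)
```

With alpha=1 every language is equally likely regardless of data size, which is what counters the prior bias toward high-resource languages.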
For a detailed comparison of the above models, see Tables 2 and 3 in the paper.
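The per-language adapter modules can be sketched as residual bottleneck layers, one set per language, inserted into a shared backbone. The bottleneck shape, activation, and placement below are illustrative assumptions, not the paper's exact configuration.

```python
def matvec(W, x):
    """Plain matrix-vector product (W is a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, v) for v in x]

def adapter(x, W_down, W_up):
    """Residual bottleneck adapter: x + up(relu(down(x))).
    W_down projects to a small bottleneck, W_up projects back."""
    h = relu(matvec(W_down, x))
    u = matvec(W_up, h)
    return [xi + ui for xi, ui in zip(x, u)]

# One (W_down, W_up) pair per language; the shared backbone
# weights stay fixed while adapters are trained (assumption).
adapters = {
    "hi": ([[1.0, 0.0]], [[0.5], [0.5]]),  # dummy 2->1->2 adapter
}
out = adapter([1.0, 2.0], *adapters["hi"])
```

Only the small adapter weights are language-specific, so adding a language does not require retraining the whole multilingual model.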