11 Sep 2019
Google
In this paper, the authors experiment with RNN-T-based ASR models in a multilingual setting.
- Multilingual RNN-T (A0)
- The authors use it as a baseline (a minimal sketch of the RNN-T joint network follows).
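As a rough illustration of the RNN-T architecture the paper builds on, here is a minimal sketch of the joint network that combines encoder frames with prediction-network states; the dimensions, the `tanh` combination, and all names are assumptions for illustration, not the paper's code:

```python
import torch
import torch.nn as nn

class RNNTJoint(nn.Module):
    """Joint network of an RNN-Transducer: combines encoder frames and
    prediction-network (label) states into logits over vocab + blank."""

    def __init__(self, enc_dim, pred_dim, joint_dim, vocab_size):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, joint_dim)
        self.pred_proj = nn.Linear(pred_dim, joint_dim)
        self.out = nn.Linear(joint_dim, vocab_size + 1)  # +1 for the blank label

    def forward(self, enc, pred):
        # enc: (batch, T, enc_dim), pred: (batch, U, pred_dim)
        # Broadcast-add over the (T, U) alignment lattice.
        joint = torch.tanh(
            self.enc_proj(enc).unsqueeze(2) + self.pred_proj(pred).unsqueeze(1)
        )
        return self.out(joint)  # (batch, T, U, vocab_size + 1)
```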
- With a language ID (A1)
- The authors condition A0 on a one-hot language ID vector (a sketch of this conditioning follows).
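A minimal sketch of one common way to inject the language ID, assuming PyTorch and assuming the one-hot vector is appended to every acoustic frame; the exact injection point and the function name are illustrative, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def condition_on_language(features, lang_id, num_langs):
    """Append a one-hot language vector to every acoustic frame.

    features: (batch, time, feat_dim) acoustic frames.
    returns:  (batch, time, feat_dim + num_langs)
    """
    batch, time, _ = features.shape
    one_hot = F.one_hot(torch.tensor(lang_id), num_classes=num_langs).float()
    # Broadcast the same language vector across the batch and time axes.
    one_hot = one_hot.expand(batch, time, num_langs)
    return torch.cat([features, one_hot], dim=-1)
```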
- Imbalanced dataset (A2)
- Using data sampling, the authors try to reduce the data/prior bias in the overall dataset (a sketch of one such scheme follows this block).
- A0+sampling.
- A1+sampling.
- A1+sampling+60k: same as A1+sampling, but the sampling is stopped after 60k steps.
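One common way to realize such sampling is temperature-style flattening of the per-language prior; this scheme is an assumption for illustration, and the paper's exact schedule may differ:

```python
import numpy as np

def sampling_probs(utterance_counts, alpha):
    """Interpolate between the natural data distribution (alpha=1)
    and a uniform distribution over languages (alpha=0)."""
    natural = np.asarray(utterance_counts, dtype=np.float64)
    natural /= natural.sum()
    smoothed = natural ** alpha  # flatten the skewed prior
    return smoothed / smoothed.sum()

# Example: a heavily skewed 3-language dataset.
counts = [1_000_000, 100_000, 10_000]
print(sampling_probs(counts, alpha=1.0))  # natural prior: ~[0.90, 0.09, 0.009]
print(sampling_probs(counts, alpha=0.5))  # partially flattened toward uniform
```

Under this reading, stopping the sampling after 60k steps (the A1+sampling+60k variant) roughly corresponds to switching back to the natural distribution, i.e. alpha=1.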
- Using language adapters.
- A1+adapters: the authors add adapter modules trained separately for each language on top of the shared model (see the sketch below).
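A minimal sketch of per-language residual adapters; the bottleneck size, layer norm, and insertion point are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class LanguageAdapters(nn.Module):
    """One small residual adapter per language, inserted after an
    encoder layer of the shared multilingual model."""

    def __init__(self, dim, num_langs, bottleneck=64):
        super().__init__()
        self.adapters = nn.ModuleList(
            nn.Sequential(
                nn.LayerNorm(dim),
                nn.Linear(dim, bottleneck),
                nn.ReLU(),
                nn.Linear(bottleneck, dim),
            )
            for _ in range(num_langs)
        )

    def forward(self, x, lang_id):
        # Residual connection: the shared representation passes through
        # unchanged, plus a small per-language correction.
        return x + self.adapters[lang_id](x)
```

Because of the residual connection, each language only learns a small correction on top of the shared representation, which is consistent with the observation below that adapters mainly help the lower-resource languages.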
- Baselines:
- Monolingual
- CTC
- RNN-T
- The authors observe that conditioning on a language ID is an essential step in a multilingual setting, yielding a performance boost on languages with a smaller footprint in the training dataset.
- The authors also observe that sampling degrades performance in both cases, with and without the language ID (A0+sampling and A1+sampling).
- The authors observe that adapter modules do not provide much of a boost overall but are important for languages with smaller training datasets.
- Finally, a multilingual system is a better choice than monolingual systems, especially in the case of low-resource languages.
For a detailed comparison of the above models, see Tables 2 and 3 in the paper.