
[1. A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning (1)](

[2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (1)](

[3. XLNet: Generalized Autoregressive Pretraining for Language Understanding (1)](

[4. RoBERTa: A Robustly Optimized BERT Pretraining Approach (1)](

[5. Improving Language Understanding by Generative Pre-Training (GPT1/GPT-1) (1)](

6. Language Models are Unsupervised Multitask Learners (1)

[7. Language Models are Few-Shot Learners (1)](

8. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (1)

[9. XLNet: Generalized Autoregressive Pretraining for Language Understanding (1)](