Papers

[1. A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning](https://jointphd.notion.site/1-A-Unified-Architecture-for-Natural-Language-Processing-Deep-Neural-Networks-with-Multitask-Learn-58e09bb6ca864c2d9e06058c6dc375ff)

[2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://jointphd.notion.site/2-BERT-Pre-training-of-Deep-Bidirectional-Transformers-for-Language-Understanding-1-2d51890f19d941649bee00f7501cedb7)

[3. XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://jointphd.notion.site/3-XLNet-Generalized-Autoregressive-Pretraining-for-Language-Understanding-1-f501e182ee0d449da41be71a45cbff7c)

[4. RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://jointphd.notion.site/4-RoBERTa-A-Robustly-Optimized-BERT-Pretraining-Approach-1-05a771aa499345fa8792c7b0f8bdfd33)

[5. Improving Language Understanding by Generative Pre-Training (GPT-1)](https://jointphd.notion.site/5-Improving-Language-Understanding-by-Generative-Pre-Traininfog-GPT1-GPT-1-1-363239536f114a47b83c8e855d9749ea)

6. Language Models are Unsupervised Multitask Learners

[7. Language Models are Few-Shot Learners](https://jointphd.notion.site/7-Language-Models-are-Few-Shot-Learners-1-fdc55d72c3e84f80b827053d58f00a05)

8. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

[9. XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://jointphd.notion.site/9-XLNet-Generalized-Autoregressive-Pretraining-for-Language-Understanding-1-49b92c72b7de41ec928ab1eb0a777f2b)