Reading Happy-LLM Notes: Preface

The emergence of ChatGPT changed people’s perception of AI and ushered in a transformative phase for NLP. LLMs, exemplified by the GPT series, have become the mainstream research focus in NLP and, increasingly, in the broader field of AI.


LLMs are in fact an outgrowth of the classic NLP research paradigm of Pretrained Language Models (PLMs). NLP focuses on the processing, understanding, and generation of natural language text written by humans, and the field has gone through several transformations:


graph LR
    A[Symbolic Stage] --> B[Statistical Learning]
    B --> C[Deep Learning]
    C --> D[Pre-training Model]
    D --> E[Large Model Stage]

    D --> F[PLM<br/>GPT & BERT]
    E --> G[LLM<br/>ChatGPT & GPT-4]
    F --> H[Attention Mechanism<br/>Pre-training & Fine-tuning]
    G --> I[Instruction Fine-tuning<br/>Human Feedback RL]

    style A fill:#e1f5fe
    style B fill:#e8f5e8
    style C fill:#fff3e0
    style D fill:#f3e5f5
    style E fill:#ffebee
    style F fill:#e3f2fd
    style G fill:#e8f5e8

  • Symbolic stage
  • Statistical learning stage
  • Deep learning stage
  • Pre-training model stage
  • Current large model stage

PLMs such as GPT and BERT are the core research achievements of the previous stage of NLP, characterized by:

  • The attention mechanism as the model architecture
  • The pre-training plus fine-tuning paradigm
  • Self-supervised pre-training on large amounts of unlabeled text (see the sketch below)
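
To make the self-supervised objective concrete, here is a minimal sketch of the GPT-style next-token prediction loss, assuming PyTorch; the toy embedding layer and vocabulary are stand-ins for a real Transformer, and BERT’s masked-token variant follows the same idea of letting raw text supply its own labels.

import torch
import torch.nn as nn

# Toy causal language model: the self-supervised objective is simply
# "predict the next token", so unlabeled text supplies its own labels.
vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, 16))  # stand-in for real text
hidden = embed(tokens)        # a real PLM would apply attention layers here
logits = lm_head(hidden)

# Shift by one position: the token at position t is the label for t-1.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()  # gradients flow with no human annotation required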

However, traditional PLMs still depend on a certain amount of labeled data for downstream-task fine-tuning, and their performance on natural language generation tasks remains unsatisfactory. Such NLP systems are still a long way from the goal of general artificial intelligence.

LLMs achieved their breakthrough by:

  • Scaling up model parameters and pre-training data
  • Introducing instruction fine-tuning and reinforcement learning from human feedback (RLHF); a sample training record is sketched below
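
To make the second ingredient concrete, here is a hypothetical instruction-tuning record; the field names and contents are invented for illustration and do not come from any specific dataset.

# A hypothetical instruction-tuning record; field names are illustrative
# and do not come from any specific dataset.
example = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "Large language models are trained on vast text corpora ...",
    "output": "LLMs learn broad language ability from massive text data.",
}

# Supervised fine-tuning trains the model to emit `output` for this prompt;
# RLHF then further aligns the model using human preference rankings over
# candidate responses.
prompt = f"{example['instruction']}\n\n{example['input']}"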

graph TB
    subgraph "PLM Era"
        A[Traditional PLM] --> B[Supervised Data Required]
        B --> C[Task-Specific Fine-tuning]
        C --> D[Limited Generation Quality]
    end
    
    subgraph "LLM Era"
        E[Large Language Model] --> F[Few-Shot Learning]
        F --> G[Instruction Following]
        G --> H[High-Quality Generation]
        H --> I[General AI Capabilities]
    end
    
    A --> E
    
    style A fill:#ffcdd2
    style E fill:#c8e6c9
    style I fill:#dcedc8


Compared with traditional PLMs, LLMs have:

  • Emergent capabilities
  • Strong in-context learning ability
  • Instruction understanding ability
  • Text generation ability

At the large model stage, NLP researchers can largely dispense with extensive supervised-data annotation. By providing just a few labeled examples in the prompt, LLMs can match the performance of large-scale fine-tuned PLMs on specified downstream tasks, as the sketch below illustrates.
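
As a minimal sketch of this few-shot setup: the task, reviews, and labels below are invented, and any instruction-tuned LLM could stand in as the model; the point is that the “supervised examples” live entirely in the prompt.

# A minimal few-shot prompt: the labeled examples live in the prompt
# itself, so no gradient updates or fine-tuning are required.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It broke after two days and support never replied.
Sentiment: Negative

Review: Setup took five minutes and everything just worked.
Sentiment:"""

# Sent to an instruction-tuned LLM, this prompt typically elicits
# "Positive": in-context learning standing in for task-specific fine-tuning.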

Their strong instruction understanding and text generation abilities let LLMs respond to user instructions directly, efficiently, and accurately, bringing them a genuine step closer to general artificial intelligence.

The breakthrough progress of LLMs has ignited research enthusiasm across NLP and the entire field of artificial intelligence. Universities, research institutes, major corporations, and even many traditional industry enterprises have joined the wave of LLM research.

Since 2023, LLM research has produced numerous achievements:

  • Model performance continually breaking through previous ceilings
  • From ChatGPT to GPT-4
  • More powerful and more specialized models:
    • Reasoning large models (e.g., DeepSeek-R1)
    • Multimodal large models (e.g., Qwen-VL)

LLM-based applications have emerged in large numbers, bringing innovations that:

  • ✅ Enhance real-world productivity
  • ✅ Empower users’ daily lives

From the “battle of a hundred models” to the “Year of the Agent”, the LLM landscape may have settled into a stable pattern, but research remains in full swing.

It is safe to say that in the not-too-distant future, LLMs and the applications built on them will become infrastructure in people’s lives, closely intertwined with everyone’s life, study, and work.
