Author: Shirley Li

Key architecture innovation behind DeepSeek-V2 and DeepSeek-V3 for faster inference
9 min read

Mastering the art of fine-tuning: Learnings for training your own LLMs.
22 min read

Scaling from 117M to 175B: Insights into GPT-2 and GPT-3.
10 min read

Understanding the Evolution of ChatGPT: Part 1-An In-Depth Look at GPT-1 and What Inspired It
Deep Learning
Tracing the roots of ChatGPT: GPT-1, the foundation of OpenAI’s LLMs
11 min read