Transformer

Reverse-engineering large language models' computation circuits to understand their decision-making processes
7 min read

Implementing CPTR (CaPtion TransformeR) from scratch with PyTorch
33 min read
![Vision Transformer architecture, quoted from [1].](https://towardsdatascience.com/wp-content/uploads/2021/07/1FReqmx_EKhrrJtLPV75kJw.png)
On the differences between Transformer and CNN, why Transformer matters, and what its weaknesses are.
24 min read

Introduction to neural machine translation (NMT) with sequence-to-sequence architectures and Transformers
19 min read