Author: Alex Dremov

Find out how Flash Attention works. Afterward, we’ll refine our understanding by writing a GPU…
7 min read

We’ll begin with torch.compile, move on to writing a custom Triton kernel, and finally dive…
5 min read

If all machine learning engineers want one thing, it’s faster model training - maybe after good test…
12 min read