Author: Alex Dremov

Understanding Flash Attention: Writing the Algorithm from Scratch in Triton
Artificial Intelligence

Find out how Flash Attention works. Afterward, we’ll refine our understanding by writing a GPU…

Alex Dremov

January 15, 2025

7 min read

Speed Up PyTorch With Custom Kernels. But It Gets Progressively Darker
Artificial Intelligence

We’ll begin with torch.compile, move on to writing a custom Triton kernel, and finally dive…

Alex Dremov

January 9, 2025

5 min read

Simple Ways to Speed Up Your PyTorch Model Training
Machine Learning

If all machine learning engineers want one thing, it’s faster model training - maybe after good test…

Alex Dremov

May 28, 2024

12 min read