Large Language Models

Understanding all versions of flash attention through a triton implementation
16 min read

Build Your Own AI Coding Assistant in JupyterLab with Ollama and Hugging Face
Artificial Intelligence
A step-by-step guide to creating a local coding assistant without sending your data to the…
8 min read

How to capitalize on ModernBERT’s extended context window to build a token-level classifier for hallucination…
10 min read

Overcome Failing Document Ingestion & RAG Strategies with Agentic Knowledge Distillation
Machine Learning
Introducing the pyramid search approach
17 min read

And how to order a cheeseburger with an LLM
28 min read

Exploring the sources of randomness in GPT-4o from the known and controllable to the opaque…
14 min read

Understanding hallucinations as emergent cognitive effects of the training pipeline
11 min read

For a long time, one of the common ways to start new Node.js projects was…
7 min read

While building my own LLM-based application, I found many prompt engineering guides, but few equivalent…
8 min read

Key architecture innovation behind DeepSeek-V2 and DeepSeek-V3 for faster inference
9 min read