Large Language Models
Understanding all versions of flash attention through a Triton implementation
16 min read -
Build Your Own AI Coding Assistant in JupyterLab with Ollama and Hugging Face
Artificial Intelligence
A step-by-step guide to creating a local coding assistant without sending your data to the…
8 min read -
How to capitalize on ModernBERT’s extended context window to build a token-level classifier for hallucination…
10 min read -
Overcome Failing Document Ingestion & RAG Strategies with Agentic Knowledge Distillation
Machine Learning
Introducing the pyramid search approach
17 min read -
And how to order a cheeseburger with an LLM
28 min read -
Exploring the sources of randomness in GPT-4o from the known and controllable to the opaque…
14 min read -
Understanding hallucinations as emergent cognitive effects of the training pipeline
11 min read -
For a long time, one of the common ways to start new Node.js projects was…
7 min read -
While building my own LLM-based application, I found many prompt engineering guides, but few equivalent…
8 min read -
Key architecture innovation behind DeepSeek-V2 and DeepSeek-V3 for faster inference
9 min read