Interpretability
-
Reverse-engineering large language models’ computation circuits to understand their decision-making processes
7 min read -
Disentangle features in complex neural networks with superposition
6 min read -
Create more interpretable models by using concise, highly predictive features, automatically engineered based on arithmetic…
41 min read -
A gentle introduction to mechanistic interpretability through simple algorithmic examples
9 min read -
These are terms commonly used to describe the transparency of a model, but what do…
7 min read -
Mechanistic interpretability for the prediction of repeated tokens
8 min read -
Patient rule induction method finds 35% better segments than previously reported
9 min read -
From the OmniXAI, Shapash, and Dalex interpretability packages to the Boruta, Relief, and Random Forest…
23 min read -
Large language models (LLMs) have revolutionized the field of natural language processing (NLP) over the last…
8 min read -
Another interpretability tool for your toolbox
7 min read