Interpretability | Towards Data Science

Circuit Tracing: A Step Closer to Understanding Large Language Models

Machine Learning

Reverse-engineering large language models’ computation circuits to understand their decision-making processes

Sudheer Singh

April 8, 2025

7 min read

Image by author: features learned by encoder

Sparse AutoEncoder: from Superposition to interpretable features

Artificial Intelligence

Disentangle features in complex neural networks with superposition

Shuyang

February 1, 2025

6 min read

FormulaFeatures: A Tool to Generate Highly Predictive Features for Interpretable Models

Create more interpretable models by using concise, highly predictive features, automatically engineered based on arithmetic…

W Brett Kennedy

October 6, 2024

41 min read

How Tiny Neural Networks Represent Basic Functions

Machine Learning

A gentle introduction to mechanistic interpretability through simple algorithmic examples

Amir Taubenfeld

September 10, 2024

9 min read

Model Insights. Screenshot by author from Xplainable.

Explainability, Interpretability and Observability in Machine Learning

Machine Learning

These are terms commonly used to describe the transparency of a model, but what do…

Jason Zhong

June 30, 2024

7 min read

Image by author: Log probs on correct tokens

How to Interpret GPT2-Small

Artificial Intelligence

Mechanistic Interpretability on prediction of repeated tokens

Shuyang

March 22, 2024

8 min read

Find Unusual Segments in Your Data with Subgroup Discovery

Machine Learning

Patient rule induction method finds 35% better segments than previously reported

Vadim Arzamasov

February 2, 2024

9 min read

A Guide to 21 Feature Importance Methods and Packages in Machine Learning (with Code)

Data Science

From the OmniXAI, Shapash, and Dalex interpretability packages to the Boruta, Relief, and Random Forest…

Dr. Theophano Mitsa

December 19, 2023

23 min read

Why and How to Achieve Longer Context Windows for LLMs

Artificial Intelligence

Language models (LLMs) have revolutionized the field of natural language processing (NLP) over the last…

Davide Ghilardi

October 13, 2023

8 min read

Deep Dive into PFI for Model Interpretability

Data Science

Another interpretability tool for your toolbox

Tiago Toledo Jr.

July 20, 2023

7 min read