Category: Transformers
-
I look at some features from a pre-trained sparse autoencoder for an MLP layer in a TinyStories model. I study the features through a statistical lens and also examine a few of them by hand.
-
I (mostly) figure out how a 2-digit subtraction transformer determines the difference between two numbers.
-
I examine a toy transformer trained to perform two-digit subtraction and find that it learns a simple linear classification algorithm to predict whether the difference is positive or negative.
-
I train a 1-layer transformer to do 2-digit subtraction and find some interesting patterns in its weights and activations.