Here we talk about features - those extracted by large-scale, sparse, monosemantic, highly linear networks built as a linear encoder, a sparsifying activation, and a linear decoder, a.k.a. SAEs (Sparse Autoencoders).
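
To make that architecture concrete, below is a minimal sketch of a vanilla ReLU SAE in PyTorch. The class name, dimensions, initialization, and the L1 coefficient are illustrative assumptions on my part; released suites such as Gemma-Scope use variants (e.g. JumpReLU) rather than exactly this form.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE sketch: linear encoder -> ReLU -> linear decoder.

    Trained to reconstruct model activations under an L1 sparsity penalty,
    so each latent (feature) fires rarely and, ideally, monosemantically.
    """

    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Feature activations: sparse and non-negative
        return torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f: torch.Tensor) -> torch.Tensor:
        # Reconstruction = sparse linear combination of decoder directions
        return f @ self.W_dec + self.b_dec

    def forward(self, x: torch.Tensor):
        f = self.encode(x)
        x_hat = self.decode(f)
        # Typical training loss: reconstruction error + L1 sparsity penalty
        loss = (x_hat - x).pow(2).mean() + 1e-3 * f.abs().sum(-1).mean()
        return x_hat, f, loss

# Illustrative usage (2304 is the Gemma-2-2B residual width, 16384 a 16k SAE)
sae = SparseAutoencoder(d_model=2304, d_sae=16384)
x = torch.randn(8, 2304)              # a batch of residual-stream activations
x_hat, features, loss = sae(x)
```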

For background knowledge on SAEs, I recommend reading the Circuits threads first.

Which kinds of problems can we solve?

Why SAE + Alignment?

How many SAEs are available?

The current landscape is worth noting: there are now roughly five or six publicly released SAE suites available for research. If we restrict ourselves to suites trained across a whole model family, the primary resources are LLaMA-Scope and Gemma-Scope, with the former covering an 8B model and the latter covering both 2B and 9B models.
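
As a quick pointer to getting started with these releases, here is a hedged sketch of downloading one Gemma-Scope SAE's weights from the Hugging Face Hub. The repository id and file path follow the per-layer/width layout of the Gemma-Scope release, but treat them as assumptions and verify them against the actual repository.

```python
import numpy as np
import torch
from huggingface_hub import hf_hub_download

# Repo id and file path are assumptions based on the Gemma-Scope release
# layout (one params.npz per layer / width / sparsity level); verify first.
path = hf_hub_download(
    repo_id="google/gemma-scope-2b-pt-res",
    filename="layer_20/width_16k/average_l0_71/params.npz",
)

params = np.load(path)
weights = {k: torch.from_numpy(params[k]) for k in params.files}
# Expect encoder/decoder weights and biases (plus a JumpReLU threshold)
print({k: tuple(v.shape) for k, v in weights.items()})
```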