interpreting GPT: the logit lens
Source: nostalgebraist Date: unknown
Summary
The logit lens popularized the simple but fruitful move of directly unembedding intermediate residual states to inspect what later layers appear to be refining. Its very brittleness is part of why tuned lenses later mattered.
Why it matters here
This is the starting point for “read” interfaces into internal computation. It is important both as a useful diagnostic and as a cautionary example of how basis dependence can make naive decoding misleading.