interpreting GPT: the logit lens

Source: nostalgebraist Date: unknown

Summary

The logit lens popularized the simple but fruitful move of directly unembedding intermediate residual states to inspect what later layers appear to be refining. Its very brittleness is part of why tuned lenses later mattered.

Why it matters here

This is the starting point for “read” interfaces into internal computation. It is important both as a useful diagnostic and as a cautionary example of how basis dependence can make naive decoding misleading.

Agent Harness Wiki

Browse

interpreting GPT: the logit lens

Summary

Why it matters here

Graph View

Table of Contents

Backlinks