An interactive web application for visualizing how prompt tokens influence LLM outputs through attention scaling interventions. It identifies overshadower tokens that encourage hallucinations and overshadowee tokens whose information is suppressed.
```bash
# Clone or copy the project files
cd overshadow_vis

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Start the server
python app.py --port 8888
```

Then open your browser to http://localhost:8888.
- Load Model: Click "Load Model" to load OLMo-2-7B (requires ~14GB GPU memory)
- Enter Prompt: Type your prompt in the text area
- Generate: Click "Generate Response" to get model output
- Select Tokens: Click on response tokens to select a sequence for analysis
- Compute Attributions: Click to run attention scaling interventions
- Analyze:
- View colored prompt tokens (red=overshadower, blue=overshadowee)
- Hover over tokens to see detailed impact metrics
- Greedy Decoding (Temperature=0): Generates deterministic model outputs
- Position Attention Scaling: Applies interventions on layers 22-28
- Interactive Token Selection: Click to select output sequences for analysis
- Real-time Attribution: Computes log probability changes under interventions
- Visual Classification:
- 🔴 Red = Overshadower (scaling up increases hallucinated sequence probability)
- 🔵 Blue = Overshadowee (scaling up decreases hallucinated sequence probability)
- Detailed Tooltips: Hover to see per-scale-factor impacts
A token is classified as an overshadower when:
- Scaling DOWN its positional attention (e.g., ×0.3) decreases the highlighted sequence's log probability
- Scaling UP its positional attention (e.g., ×3.0) increases the highlighted sequence's log probability
This means the token is "encouraging" the hallucinated output.
A token is classified as an overshadowee when:
- Scaling DOWN its positional attention increases the highlighted sequence's log probability
- Scaling UP its positional attention decreases the highlighted sequence's log probability
This suggests the token carries information that is being suppressed in favor of the hallucinated output.
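The two classification rules above can be sketched as a small helper. The function name, the example scale factors, and the `neutral` fallback are illustrative assumptions, not code from the app:

```python
def classify_token(deltas):
    """Classify a prompt token from its log-prob deltas under attention scaling.

    deltas: dict mapping scale factor -> delta log P of the highlighted sequence.
    The specific factors (0.3, 3.0) and the "neutral" fallback are illustrative.
    """
    down = [d for s, d in deltas.items() if s < 1.0]  # scale-down interventions
    up = [d for s, d in deltas.items() if s > 1.0]    # scale-up interventions
    if all(d < 0 for d in down) and all(d > 0 for d in up):
        return "overshadower"   # token encourages the hallucinated output
    if all(d > 0 for d in down) and all(d < 0 for d in up):
        return "overshadowee"   # token's information is being suppressed
    return "neutral"            # mixed or flat response to scaling

print(classify_token({0.3: -0.8, 3.0: 1.2}))  # prints "overshadower"
```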
The intervention scales the hidden states at specific positions before they enter the attention mechanism:
```python
scaled_hidden = hidden_states * scale_mask.view(1, -1, 1)
```

where `scale_mask[pos] = scale_factor` for target positions.
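As a concrete illustration of how such a mask could be built and broadcast over the hidden states, here is a toy sketch. The `make_scale_mask` helper and the tensor shapes are assumptions for this example, not the app's actual code:

```python
import torch

def make_scale_mask(seq_len, target_positions, scale_factor):
    # 1.0 everywhere except the target positions, which get scale_factor
    mask = torch.ones(seq_len)
    mask[list(target_positions)] = scale_factor
    return mask

# Toy example: 5-token sequence, scale positions 1 and 2 up by 3.0
hidden_states = torch.ones(1, 5, 4)           # (batch, seq_len, hidden_dim)
scale_mask = make_scale_mask(5, [1, 2], 3.0)  # (seq_len,)
scaled_hidden = hidden_states * scale_mask.view(1, -1, 1)
```

The `view(1, -1, 1)` reshapes the mask so it broadcasts across the batch and hidden dimensions, scaling every feature at the target positions uniformly.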
For a highlighted sequence of tokens, we compute:
log P(seq | prompt) = Σ_i log P(token_i | prompt, token_1 .. token_{i-1})

The delta is: Δ = log P(seq | prompt, intervention) − log P(seq | prompt, baseline)
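This computation can be sketched as follows. The function name and the toy logits are illustrative; in the app, the logits would come from the model's forward passes with and without the intervention:

```python
import torch
import torch.nn.functional as F

def sequence_log_prob(logits, token_ids):
    """Sum of per-token log probabilities for a highlighted sequence.

    logits:    (seq_len, vocab_size) logits predicting each highlighted token
    token_ids: (seq_len,) the highlighted token ids
    """
    log_probs = F.log_softmax(logits, dim=-1)
    return log_probs[torch.arange(len(token_ids)), token_ids].sum()

# Toy example: 3-token "sequence" over a 10-word vocabulary
torch.manual_seed(0)
baseline_logits = torch.randn(3, 10)
intervened_logits = torch.randn(3, 10)
tokens = torch.tensor([2, 7, 1])

delta = (sequence_log_prob(intervened_logits, tokens)
         - sequence_log_prob(baseline_logits, tokens))
```

A positive `delta` means the intervention made the highlighted sequence more likely; a negative one means it made it less likely.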
- Python 3.9+
- CUDA-capable GPU with ~14GB memory (for OLMo-2-7B)
- PyTorch 2.0+
- Transformers 4.35+
Prompt: "A famous rock musician from North Korea is named"
Expected Output: "Kim Jong-un ..."