# Overshadowing Attribution Visualizer

An interactive web application for visualizing how prompt tokens influence LLM outputs through attention-scaling interventions. It identifies **overshadower** tokens that encourage hallucinations and **overshadowee** tokens whose information gets suppressed.

## Usage

### Installation

```bash
# Clone or copy the project files
cd overshadow_vis

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # Linux/macOS
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt
```

### Start the App

```bash
python app.py --port 8888
```

Then open your browser to http://localhost:8888.

### Workflow

1. **Load Model**: Click "Load Model" to load OLMo-2-7B (requires ~14 GB of GPU memory).
2. **Enter Prompt**: Type your prompt in the text area.
3. **Generate**: Click "Generate Response" to get the model output.
4. **Select Tokens**: Click on response tokens to select a sequence for analysis.
5. **Compute Attributions**: Click to run the attention-scaling interventions.
6. **Analyze**:
   - View colored prompt tokens (red = overshadower, blue = overshadowee).
   - Hover over tokens to see detailed impact metrics.

## Features

- **Greedy Decoding (Temperature = 0)**: Generates deterministic model outputs.
- **Positional Attention Scaling**: Applies interventions on layers 22-28.
- **Interactive Token Selection**: Click to select output sequences for analysis.
- **Real-time Attribution**: Computes log-probability changes under interventions.
- **Visual Classification**:
  - 🔴 Red = Overshadower (scaling up increases the hallucinated sequence's probability)
  - 🔵 Blue = Overshadowee (scaling up decreases the hallucinated sequence's probability)
- **Detailed Tooltips**: Hover to see per-scale-factor impacts.

## Methodology

### Overshadower

A token is classified as an **overshadower** when:

- Scaling its positional attention DOWN (e.g. ×0.3) decreases the highlighted sequence's log probability
- Scaling its positional attention UP (e.g. ×3.0) increases the highlighted sequence's log probability

This means the token is "encouraging" the hallucinated output.

### Overshadowee

A token is classified as an **overshadowee** when:

- Scaling its positional attention DOWN increases the highlighted sequence's log probability
- Scaling its positional attention UP decreases the highlighted sequence's log probability

This means the token likely carries information that is being suppressed in favor of the hallucinated output.
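The two rules above can be sketched as a small classification function. This is a hypothetical sketch: `classify_token`, the particular scale factors (0.3 and 3.0), and the delta dictionary are illustrative, not the repo's actual API.

```python
def classify_token(deltas):
    """Classify a prompt token from its log-prob deltas under attention scaling.

    `deltas` maps a scale factor to the change (intervened minus baseline) in
    the highlighted sequence's log probability; 0.3 scales the token's
    positional attention down, 3.0 scales it up.
    """
    down, up = deltas[0.3], deltas[3.0]
    if down < 0 and up > 0:
        return "overshadower"   # the token encourages the hallucinated output
    if down > 0 and up < 0:
        return "overshadowee"   # the token's information is being suppressed
    return "neutral"            # no consistent directional effect


print(classify_token({0.3: -1.2, 3.0: 0.8}))  # overshadower
print(classify_token({0.3: 0.5, 3.0: -0.4}))  # overshadowee
```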

### Attention Scaling Intervention

The intervention scales the hidden states at target positions before they enter the attention mechanism:

```python
scaled_hidden = hidden_states * scale_mask.view(1, -1, 1)
```

where `scale_mask[pos] = scale_factor` at the target positions and 1.0 everywhere else.
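The broadcasted multiply can be demonstrated on a dummy tensor; the shapes, positions, and scale factor below are illustrative.

```python
import torch

# Hidden states shaped (batch, seq_len, hidden_dim); values are dummies.
seq_len, hidden_dim = 8, 16
hidden_states = torch.randn(1, seq_len, hidden_dim)

# scale_mask[pos] = scale_factor at target positions, 1.0 elsewhere.
scale_mask = torch.ones(seq_len)
scale_mask[torch.tensor([2, 3])] = 3.0  # scale positions 2 and 3 UP by x3

# view(1, -1, 1) broadcasts the per-position mask over batch and hidden dims.
scaled_hidden = hidden_states * scale_mask.view(1, -1, 1)

assert scaled_hidden.shape == hidden_states.shape
assert torch.allclose(scaled_hidden[0, 2], hidden_states[0, 2] * 3.0)
assert torch.allclose(scaled_hidden[0, 0], hidden_states[0, 0])  # untouched
```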

### Sequence Log Probability

For a highlighted sequence of tokens, we compute:

log P(seq | prompt) = Σᵢ log P(tokenᵢ | prompt, token₁..ᵢ₋₁)

The attribution delta is:

Δ = log P(seq | prompt, intervention) − log P(seq | prompt, baseline)
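A minimal sketch of this computation from raw logits. The function name and the alignment convention (logits at position t−1 predict the token at position t) are assumptions, not the project's exact code.

```python
import math
import torch
import torch.nn.functional as F


def sequence_log_prob(logits, token_ids, start):
    """Sum log P(token_i | prompt, tokens_<i) over the highlighted span.

    logits: (seq_len, vocab) model outputs for one sequence
    token_ids: the highlighted tokens, occupying positions start..start+len-1
    """
    log_probs = F.log_softmax(logits, dim=-1)
    total = 0.0
    for offset, tok in enumerate(token_ids):
        # logits at position start-1+offset predict the token at start+offset
        total += log_probs[start - 1 + offset, tok].item()
    return total


# Sanity check: uniform logits over a 5-token vocabulary give each token
# probability 1/5, so a 2-token sequence scores 2 * log(1/5).
logits = torch.zeros(4, 5)
lp = sequence_log_prob(logits, [1, 2], start=1)
assert abs(lp - 2 * math.log(1 / 5)) < 1e-5
```

The delta Δ is then just the difference of two such sums, one computed under the intervention and one at baseline.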

## Requirements

- Python 3.9+
- CUDA-capable GPU with ~14 GB of memory (for OLMo-2-7B)
- PyTorch 2.0+
- Transformers 4.35+

## Example Usage

Prompt: "A famous rock musician from North Korea is named"

Expected Output: "Kim Jong-un ..."
