This repository provides empirical validation of non-identifiability in persona steering vectors for language models.
Install dependencies:
uv sync
Authenticate with HuggingFace to access both models:
huggingface-cli login
Configuration files are located in the config/ directory:
prompts.json: Persona prompts for all traits
config.yml: Model configurations
Available Traits: formality, politeness, sentiment, truthfulness, and agreeableness. You can specify any combination of these traits.
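The experiment scripts in this README take the chosen traits as a space-separated list after `--traits`. As a minimal sketch, the helper below assembles such a command line for any subset of the five traits and rejects unknown names. `build_command` and `AVAILABLE_TRAITS` are hypothetical names, not part of the repository; the script path and flags are copied from the commands in this README, and not every script necessarily accepts every flag.

```python
import shlex

# Trait names as listed in this README.
AVAILABLE_TRAITS = (
    "formality", "politeness", "sentiment", "truthfulness", "agreeableness",
)


def build_command(script, traits, model=None, n_seeds=None):
    """Assemble an argv list for one experiment script (hypothetical helper)."""
    unknown = [t for t in traits if t not in AVAILABLE_TRAITS]
    if unknown:
        raise ValueError(f"unknown traits: {unknown}")
    argv = ["python", script, "--traits", *traits]
    # Optional flags are appended only when a value is given.
    if n_seeds is not None:
        argv += ["--n_seeds", str(n_seeds)]
    if model is not None:
        argv += ["--model", model]
    return argv


cmd = build_command(
    "src/experiments/test_orthogonal.py",
    ["formality", "sentiment"],
    model="Qwen/Qwen2.5-3B-Instruct",
    n_seeds=10,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the experiment.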
Test orthogonal component irrelevance:
python src/experiments/test_orthogonal.py --traits formality politeness sentiment truthfulness agreeableness --n_seeds 10 --model Qwen/Qwen2.5-3B-Instruct
Test alpha sweep (varying steering strength):
python src/experiments/alpha_sweep.py --traits formality politeness sentiment truthfulness agreeableness --alphas 0.0 0.5 1.0 2.0 --n_seeds 10 --model Qwen/Qwen2.5-3B-Instruct
Test multi-environment validation:
python src/experiments/multi_environment_validation.py --traits formality politeness sentiment truthfulness agreeableness --model Qwen/Qwen2.5-3B-Instruct
Test logit distance equivalence:
python src/experiments/logit_distance_equivalence_test.py --traits formality politeness sentiment truthfulness agreeableness
Test vector equivalence (non-orthogonal):
python src/experiments/test_vector_equivalence.py --models Qwen/Qwen2.5-3B-Instruct meta-llama/Llama-3.1-8B-Instruct --traits formality politeness sentiment truthfulness agreeableness
Measure null-space dimensionality:
python src/experiments/nullspace_dimensionality.py
Test null-space spanning (subspace equivalence):
python src/experiments/nullspace_spanning.py --trait formality --n_individual_checks 50 --n_subspace_samples 5
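To make the idea behind these tests concrete, here is a toy sketch in pure Python. It is not the repository's implementation. It assumes a purely linear readout `W` from a 3-dimensional hidden state to 2 logits (an illustrative simplification; real models are nonlinear), so any direction `n` with `W n = 0` lies in the null space of the readout. Steering with `v` or with `v + n` then gives identical logits at every strength `alpha`, which is the sense in which a steering vector cannot be identified from outputs alone. All names (`W`, `h`, `v`, `n`, `logits`, `steer`) are hypothetical.

```python
def logits(W, h):
    """Linear readout: one dot product per row of W."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def steer(h, v, alpha):
    """Add the steering vector v to the hidden state h with strength alpha."""
    return [x + alpha * y for x, y in zip(h, v)]


# Toy readout that ignores the third hidden dimension (assumption).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
h = [0.5, -1.0, 2.0]      # hidden state
v = [1.0, 2.0, 0.0]       # one steering vector
n = [0.0, 0.0, 3.0]       # null-space direction: W n = 0
v_alt = [a + b for a, b in zip(v, n)]  # a different vector, same effect

assert logits(W, n) == [0.0, 0.0]

results = []
for alpha in (0.0, 0.5, 1.0, 2.0):
    out_v = logits(W, steer(h, v, alpha))
    out_alt = logits(W, steer(h, v_alt, alpha))
    results.append(out_v == out_alt)
    print(alpha, out_v, out_alt)

print(all(results))
```

In this toy setting the null space is one-dimensional (3 hidden dimensions minus a rank-2 readout), so `v` is determined only up to multiples of `n`.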
├── config/
│   ├── prompts.json
│   ├── config.yml
│   └── style.yaml
├── src/
│   └── experiments/
│       ├── persona_vector_experiment.py
│       ├── test_orthogonal.py
│       ├── alpha_sweep.py
│       ├── multi_environment_validation.py
│       ├── logit_distance_equivalence_test.py
│       ├── test_vector_equivalence.py
│       ├── nullspace_dimensionality.py
│       └── nullspace_spanning.py
└── data/
If you use this repository in your research, please cite:
@article{venkatesh2026non,
title={On the Non-Identifiability of Steering Vectors in Large Language Models},
author={Venkatesh, Sohan and Mahendran Kurapath, Ashish},
journal={arXiv e-prints},
pages={arXiv--2602},
year={2026}
}