What Do SWE Agents Look For? An Empirical Study of Semantic and Symbolic Context Usage in Repository Level Code Generation
This repository contains the evaluation framework and source code for the paper "What Do SWE Agents Look For? An Empirical Study of Semantic and Symbolic Context Usage in Repository Level Code Generation". It provides tools and scripts for evaluating different context retrieval strategies in code completion tasks, using the RepoClassBench dataset and the OpenHands framework.
The evaluation framework tests various approaches for providing relevant code context to language models when implementing Java classes. It compares different retrieval strategies including:
- Vanilla: No additional context provided
- Semantic: Context retrieved using semantic embeddings
- Symbolic: Context retrieved using symbolic code analysis
- Combined: Hybrid approach combining semantic and symbolic retrieval
- Agentic: Context retrieved by the agent itself through tool calls
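To make the difference between the semantic, symbolic, and combined strategies concrete, here is a minimal, self-contained sketch of how the three rankings could be computed over candidate code snippets. It is not the repository's implementation: the toy 2-d vectors stand in for real embeddings, the regex that matches capitalised identifiers is a crude stand-in for proper symbolic analysis (for example parsing with a Java parser such as javalang), and reciprocal rank fusion is just one assumed way to merge two rankings. All function names are hypothetical.

```python
import math
import re


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_rank(query_vec: list[float], snippet_vecs: dict[str, list[float]]) -> list[str]:
    """Semantic: order snippet ids by embedding similarity to the query, best first."""
    return sorted(snippet_vecs, key=lambda sid: cosine(query_vec, snippet_vecs[sid]), reverse=True)


# Capitalised identifiers as a rough proxy for Java type names.
TYPE_NAME = re.compile(r"\b[A-Z][A-Za-z0-9_]*\b")


def symbolic_rank(target_source: str, snippets: dict[str, str]) -> list[str]:
    """Symbolic (regex stand-in): order snippets by type names shared with the target."""
    target_types = set(TYPE_NAME.findall(target_source))
    return sorted(
        snippets,
        key=lambda sid: len(target_types & set(TYPE_NAME.findall(snippets[sid]))),
        reverse=True,
    )


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Combined: sum 1 / (k + rank) over all rankings and sort by the fused score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, sid in enumerate(ranking, start=1):
            scores[sid] = scores.get(sid, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: two candidate files with made-up 2-d "embeddings".
snippets = {
    "Order.java": "public class Order { Customer customer; Money total; }",
    "Util.java": "public final class Util { static int clamp(int x) { return x; } }",
}
vectors = {"Order.java": [0.9, 0.1], "Util.java": [0.2, 0.8]}
target = "class Invoice { Order order; Money amount; }"

semantic = semantic_rank([1.0, 0.0], vectors)
symbolic = symbolic_rank(target, snippets)
combined = reciprocal_rank_fusion([semantic, symbolic])
print(combined)  # ['Order.java', 'Util.java']
```

Rank fusion is used here because it only needs the order of each list, so cosine scores and identifier-overlap counts never have to be put on a common scale.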
Prerequisites:
- OpenHands: Install and configure OpenHands in /home/azureuser (or update the paths in the scripts). For more information, see https://github.com/OpenHands/OpenHands/blob/main/Development.md
- RepoClassBench: The dataset should be present in the repoclassbench/ directory; install its requirements from repoclassbench/requirements.txt.
- API Keys: Create a credentials.json file in the parent directory with:
  { "EMBED_API_KEY": "your-embed-api-key", "LLM_API_KEY": "your-llm-api-key" }
- Python Dependencies:
  - pandas
  - numpy
  - tiktoken
  - requests
  - javalang
  - langchain
  - tqdm
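A minimal sketch of reading the credentials file described above and failing early if a key is absent. It assumes the scripts are launched from a directory whose parent holds credentials.json and that the file contains exactly the two keys shown; the helper name `load_credentials` is hypothetical, and the repository's scripts may locate and read the file differently.

```python
import json
from pathlib import Path
from typing import Optional

REQUIRED_KEYS = ("EMBED_API_KEY", "LLM_API_KEY")


def load_credentials(path: Optional[Path] = None) -> dict:
    """Read API keys from credentials.json and fail early if a key is missing or empty.

    By default the file is looked up in the parent of the current working
    directory (an assumption about where the scripts are launched from).
    """
    path = Path(path) if path is not None else Path.cwd().parent / "credentials.json"
    with open(path, encoding="utf-8") as fh:
        creds = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    if missing:
        raise KeyError(f"{path} is missing: {', '.join(missing)}")
    return creds
```

Checking the keys at start-up is cheaper than discovering a missing key mid-run, after embeddings have already been computed.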
The main evaluation script (eval_script.py) demonstrates how to run different evaluators.
Important: Before importing the evaluators, you need to set up the Python path correctly. The beginning of eval_script.py shows an example, which may need to be edited depending on your setup.
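As an illustration only, the path setup usually amounts to prepending the relevant checkout directories to `sys.path` before any evaluator import. The directory layout below (OpenHands cloned into /home/azureuser/OpenHands, repoclassbench/ next to the script) and the `OPENHANDS_ROOT` environment variable are assumptions, and `setup_paths` is a hypothetical helper; mirror whatever the top of eval_script.py actually does in your checkout.

```python
import os
import sys
from pathlib import Path


def setup_paths() -> list[str]:
    """Prepend assumed project directories to sys.path so later imports resolve.

    OPENHANDS_ROOT is a hypothetical override; its default assumes OpenHands
    was cloned into /home/azureuser/OpenHands.
    """
    here = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()
    candidates = [
        Path(os.environ.get("OPENHANDS_ROOT", "/home/azureuser/OpenHands")),
        here / "repoclassbench",
        here,
    ]
    added = []
    # Insert in reverse so that, among newly added entries, candidates[0] has top priority.
    for path in reversed(candidates):
        entry = str(path)
        if entry not in sys.path:
            sys.path.insert(0, entry)
            added.append(entry)
    return added


setup_paths()
# Only after this point import the evaluators used by eval_script.py.
```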
Tips:
- Caching: Use embedding caches to avoid recomputing embeddings for the same repository.
- Context Paths: Precompute context and save it to disk for faster repeated evaluations.
- Parallel Execution: Adjust num_workers based on available resources.
- Memory Management: The framework includes cleanup to prevent memory leaks with large repositories.
- Iterative Development: Start with a small task_list to validate the configuration before full runs.
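The caching, parallel-execution, and small-task-list tips can be combined roughly as follows. This is a hypothetical sketch, not the framework's code: `embed_fn` stands in for whatever embedding call the evaluators make, the cache is a plain JSON file keyed by a SHA-256 hash of repository name and text, and the lambda passed to `run_all` is a placeholder for evaluating one RepoClassBench task.

```python
import hashlib
import json
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


class EmbeddingCache:
    """JSON-file cache so the same repository text is not re-embedded across runs."""

    def __init__(self, path: Path, embed_fn):
        self.path = Path(path)
        self.embed_fn = embed_fn  # hypothetical: any callable str -> list[float]
        self.lock = threading.Lock()
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    @staticmethod
    def _key(repo: str, text: str) -> str:
        return hashlib.sha256(f"{repo}\0{text}".encode("utf-8")).hexdigest()

    def get(self, repo: str, text: str) -> list:
        key = self._key(repo, text)
        with self.lock:
            if key in self.data:
                return self.data[key]
        # Cache miss: embed outside the lock. Two threads missing on the same key
        # may both embed it; with a deterministic embedder that only wastes a call.
        vec = self.embed_fn(text)
        with self.lock:
            self.data[key] = vec
        return vec

    def save(self) -> None:
        with self.lock:
            self.path.write_text(json.dumps(self.data))


def run_all(task_list, run_task, num_workers: int = 4):
    """Run tasks concurrently; results are returned in task_list order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(run_task, task_list))


# Demo with a fake embedder and a deliberately small task list.
calls = []


def fake_embed(text: str) -> list:
    calls.append(text)
    return [float(len(text))]


cache = EmbeddingCache(Path(tempfile.mkdtemp()) / "embeddings.json", fake_embed)
results = run_all(["class A {}", "class B {}", "class A {}"],
                  lambda task: cache.get("demo-repo", task), num_workers=2)
cache.save()
```

Threads are a reasonable fit assuming most of the time is spent waiting on embedding and LLM API calls; CPU-heavy steps such as large-scale parsing would benefit more from processes.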