What Do SWE Agents Look For? An Empirical Study of Semantic and Symbolic Context Usage in Repository Level Code Generation

This repository contains the evaluation framework and source code for "What Do SWE Agents Look For? An Empirical Study of Semantic and Symbolic Context Usage in Repository Level Code Generation".

It provides tools and scripts for evaluating different context retrieval strategies on code completion tasks, using the RepoClassBench dataset and the OpenHands framework.

Overview

The evaluation framework tests various approaches for providing relevant code context to language models when implementing Java classes. It compares different retrieval strategies including:

- Vanilla: No additional context provided
- Semantic: Context retrieved using semantic embeddings
- Symbolic: Context retrieved using symbolic code analysis
- Combined: Hybrid approach combining semantic and symbolic retrieval
- Agentic: Agentic tool-based context retrieval
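To make the difference between the non-agentic strategies concrete, here is a toy sketch. It is not the repository's retrieval code. It assumes embeddings for the query and for each code chunk are already computed (as plain float lists). It approximates symbolic retrieval with simple identifier matching, whereas a real implementation would use a Java parser such as javalang. The ordering used by `combined` (symbolic hits first, then semantic) is also an illustrative assumption. All function names are hypothetical, and the agentic strategy is not shown.

```python
import math
import re


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Semantic retrieval (sketch): rank chunk ids by embedding similarity to the query."""
    ranked = sorted(chunk_vecs, key=lambda cid: cosine(query_vec, chunk_vecs[cid]), reverse=True)
    return ranked[:k]


def symbolic_matches(class_skeleton: str, declarations: dict[str, str]) -> list[str]:
    """Symbolic retrieval (toy stand-in): keep declarations whose name appears as an identifier."""
    identifiers = set(re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", class_skeleton))
    return [cid for cid, name in declarations.items() if name in identifiers]


def combined(semantic: list[str], symbolic: list[str]) -> list[str]:
    """Hybrid (assumed policy): symbolic hits first, then semantic hits, without duplicates."""
    return list(dict.fromkeys(symbolic + semantic))


if __name__ == "__main__":
    vecs = {"Parser.parse": [1.0, 0.0], "Lexer.next": [0.0, 1.0], "Util.log": [0.7, 0.7]}
    skeleton = "public class Foo { private Lexer lexer; void run() { Util.log(); } }"
    decls = {"Parser.parse": "Parser", "Lexer.next": "Lexer", "Util.log": "Util"}
    sem = semantic_top_k([1.0, 0.1], vecs, k=2)
    sym = symbolic_matches(skeleton, decls)
    print(sem, sym, combined(sem, sym))
```

The two signals are complementary. Embeddings can surface code that is related but never referenced by name, while symbol matching finds exactly the types and members the target class already mentions.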

Prerequisites

OpenHands: Install and configure OpenHands at /home/azureuser (or update the paths in the scripts). For more information, see https://github.com/OpenHands/OpenHands/blob/main/Development.md
RepoClassBench: The dataset should be present in the repoclassbench/ directory. Install its requirements from repoclassbench/requirements.txt.
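Before a long run, it can help to confirm that both prerequisites are where the scripts expect them. The following is a minimal sketch. It assumes OpenHands is checked out as /home/azureuser/OpenHands and that the dataset directory is repoclassbench/ relative to the working directory. Both paths and the `missing_prerequisites` helper are hypothetical, so adjust them to your setup.

```python
from pathlib import Path

# Hypothetical locations: adjust to match where you installed OpenHands and the dataset.
OPENHANDS_DIR = Path("/home/azureuser/OpenHands")
REPOCLASSBENCH_DIR = Path("repoclassbench")


def missing_prerequisites(openhands_dir: Path = OPENHANDS_DIR,
                          bench_dir: Path = REPOCLASSBENCH_DIR) -> list[str]:
    """Return a human-readable list of prerequisite paths that are not present."""
    problems = []
    if not openhands_dir.is_dir():
        problems.append(f"OpenHands not found at {openhands_dir}")
    if not bench_dir.is_dir():
        problems.append(f"RepoClassBench directory not found at {bench_dir}")
    elif not (bench_dir / "requirements.txt").is_file():
        problems.append(f"{bench_dir / 'requirements.txt'} is missing")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("Missing:", problem)
```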

API Keys: Create a credentials.json file in the parent directory with:

{
  "EMBED_API_KEY": "your-embed-api-key",
  "LLM_API_KEY": "your-llm-api-key"
}
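A minimal sketch of reading this file and failing early when a key is missing. It assumes "parent directory" means the parent of the directory you run the scripts from. The `load_credentials` helper is hypothetical and is not taken from the repository's code.

```python
import json
from pathlib import Path
from typing import Optional

REQUIRED_KEYS = ("EMBED_API_KEY", "LLM_API_KEY")


def load_credentials(path: Optional[Path] = None) -> dict:
    """Load API keys from credentials.json and raise if a required key is absent or empty.

    Assumption: by default the file lives in the parent of the current working directory.
    """
    path = path or Path.cwd().parent / "credentials.json"
    with open(path, encoding="utf-8") as f:
        creds = json.load(f)
    missing = [k for k in REQUIRED_KEYS if not creds.get(k)]
    if missing:
        raise KeyError(f"credentials.json is missing: {', '.join(missing)}")
    return creds
```

Keeping the file outside the repository directory makes it less likely that the keys are committed by accident.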

Python Dependencies:
- pandas
- numpy
- tiktoken
- requests
- javalang
- langchain
- tqdm
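One way to check which of these packages are still missing is sketched below. It uses only the standard library, and the helper name is hypothetical. For every package listed above, the import name matches the pip package name.

```python
import importlib.util

# Import names of the Python dependencies listed above.
REQUIRED_MODULES = ("pandas", "numpy", "tiktoken", "requests", "javalang", "langchain", "tqdm")


def missing_modules(modules=REQUIRED_MODULES) -> list:
    """Return the top-level modules that cannot be found in the current environment."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All Python dependencies are installed.")
```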

Quick Start

The main evaluation script (eval_script.py) demonstrates how to run different evaluators.

Important: Before importing the evaluators, set up the Python path correctly. The beginning of eval_script.py shows an example, which may need to be edited for your setup.
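The sketch below shows one way to do this path setup. It is not the code at the top of eval_script.py. It assumes OpenHands lives at /home/azureuser/OpenHands and that the script is run from the repository root. Both paths and the `prepend_to_sys_path` helper are hypothetical.

```python
import sys
from pathlib import Path

# Hypothetical paths: point these at your OpenHands checkout and at this repository.
OPENHANDS_ROOT = Path("/home/azureuser/OpenHands")
REPO_ROOT = Path.cwd()  # assumes the script is launched from the repository root


def prepend_to_sys_path(*paths: Path) -> None:
    """Put the given paths at the front of sys.path, in order, without duplicating entries."""
    for p in reversed(paths):
        s = str(p)
        if s in sys.path:
            sys.path.remove(s)
        sys.path.insert(0, s)


prepend_to_sys_path(REPO_ROOT, OPENHANDS_ROOT)
# Import the evaluator modules only after this point.
```

Placing the entries at the front of `sys.path` means these checkouts take precedence over any same-named packages installed in the environment.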

Best Practices

- Caching: Use embedding caches to avoid recomputing embeddings for the same repository
- Context Paths: Pre-compute context and save it to disk for faster repeated evaluations
- Parallel Execution: Adjust num_workers based on available resources
- Memory Management: The framework includes cleanup to prevent memory leaks with large repositories
- Iterative Development: Start with a small task_list to validate the configuration before full runs
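The caching and parallelism advice can be illustrated with the sketch below. It is not the framework's own implementation. `EmbeddingCache` and `run_tasks` are hypothetical names, and `embed` and `evaluate` stand in for whatever embedding call and per-task evaluation you use. The cache is keyed by a SHA-256 hash of the chunk text, so unchanged code is never re-embedded across runs. Threads are used on the assumption that tasks spend most of their time waiting on API calls.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


class EmbeddingCache:
    """Disk-backed map from a hash of a code chunk to its embedding (hypothetical helper)."""

    def __init__(self, path: Path):
        self.path = path
        self.data = json.loads(path.read_text()) if path.exists() else {}

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_or_compute(self, text: str, embed: Callable[[str], list]) -> list:
        key = self._key(text)
        if key not in self.data:
            self.data[key] = embed(text)  # call the embedding API only on a cache miss
        return self.data[key]

    def save(self) -> None:
        self.path.write_text(json.dumps(self.data))


def run_tasks(task_list: list, evaluate: Callable[[str], dict], num_workers: int = 4) -> list:
    """Evaluate tasks concurrently; results are returned in task_list order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(evaluate, task_list))
```

For a first validation pass, something like `run_tasks(task_list[:5], evaluate, num_workers=2)` exercises the full pipeline cheaply before a full run.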

Repository Contents

- repoclassbench/
- training_data_generation/
- LICENSE
- README.md
- agentic_evaluator.py
- combined_evaluator.py
- default_evaluator.py
- eval_script.py
- eval_tools.py
- filtration_prompt.txt
- openhands_init.sh
- semantic_evaluator.py
- symbolic_evaluator.py
