DynBench — Dynamic Question and Query Creation for Generating new Benchmarking Datasets

WSE-research/DynBenchContainer

A FastAPI application for transforming questions and SPARQL queries over Wikidata by replacing entities with semantically similar alternatives.



Features

  • Multi-language Support: Handles questions in English, German, French, Russian, and Ukrainian

  • Complexity Levels: Generates variations at different difficulty levels (easy, normal, hard, random)

  • LLM Integration: Uses an LLM to rephrase questions after entity replacement

  • PageRank Scoring: Ranks substitutes based on entity popularity

  • REST API: Exposes a /transform endpoint for batch processing

  • Knowledge Graph: Uses only the Wikidata knowledge graph to find semantically similar entities

  • MongoDB Caching: Caches SPARQL results for performance
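The entity replacement step described above can be illustrated with a minimal sketch. It is not DynBench's actual implementation: the helper names `find_entities` and `replace_entities` are hypothetical, the substitute ID is a placeholder, and it assumes entities appear in the query with the `wd:` prefix, as in the CLI example in this README.

```python
import re

# Matches Wikidata entity references such as wd:Q14452.
# Property references such as wdt:P17 are not matched.
ENTITY_PATTERN = re.compile(r"\bwd:(Q\d+)\b")


def find_entities(query: str) -> list[str]:
    """Return the distinct entity IDs in the query, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in ENTITY_PATTERN.finditer(query):
        seen.setdefault(match.group(1), None)
    return list(seen)


def replace_entities(query: str, substitutes: dict[str, str]) -> str:
    """Swap each entity that has a substitute; leave all others untouched."""

    def swap(match: re.Match) -> str:
        qid = match.group(1)
        return "wd:" + substitutes.get(qid, qid)

    return ENTITY_PATTERN.sub(swap, query)


query = "SELECT ?answer WHERE { wd:Q14452 wdt:P17 ?answer }"
# Q999999 is a placeholder ID for illustration, not a real substitute.
new_query = replace_entities(query, {"Q14452": "Q999999"})
print(find_entities(query))
print(new_query)
```

In the real tool the substitute would come from the Wikidata knowledge graph and the question text would be rephrased by the LLM; this sketch covers only the query rewrite.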

Environment Variables

Required in .env:

  • MongoDB credentials: MONGO_HOST, MONGO_USER, MONGO_PASS

  • LLM service credentials: LLM_URL, KEY (use an empty string for a local instance)

  • Wikidata SPARQL endpoint: WIKIDATA_ENDPOINT, WIKIDATA_AGENT (leave the agent string empty for the basic endpoint)

A sample environment file is provided in the root directory. Copy it to .env and fill in the missing values.
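A quick way to catch a missing setting before starting the service is to check the variable names listed above. The following is a minimal sketch, not part of DynBench: the helper `missing_variables` is hypothetical, and it assumes the values have already been loaded into the process environment. Empty strings count as set, because KEY and WIKIDATA_AGENT may legitimately be empty.

```python
import os
from collections.abc import Mapping

# Names taken from the list of required variables above.
REQUIRED_VARIABLES = (
    "MONGO_HOST",
    "MONGO_USER",
    "MONGO_PASS",
    "LLM_URL",
    "KEY",
    "WIKIDATA_ENDPOINT",
    "WIKIDATA_AGENT",
)


def missing_variables(env: Mapping[str, str]) -> list[str]:
    """Return the required names that are absent from env.

    Only presence is checked, so an empty string is accepted.
    """
    return [name for name in REQUIRED_VARIABLES if name not in env]


# Report what is missing from the current process environment.
print("Missing:", missing_variables(os.environ))
```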

Setup

DynBench uses PageRank scores for Wikidata URIs (available at https://danker.s3.amazonaws.com/index.html) and the NLTK library. Run setup.py once before using the tool to download the required resources. Note that the PageRank file is 726 MB in size.
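To show how PageRank scores could be used to order substitute candidates by popularity, here is a minimal sketch. It assumes a tab-separated layout of one `QID<TAB>score` pair per line, which may not match the downloaded file exactly. The function names and the sample scores are invented for illustration, and this is not DynBench's own ranking code.

```python
from collections.abc import Iterable


def parse_pagerank(lines: Iterable[str]) -> dict[str, float]:
    """Parse lines of 'QID<TAB>score' into a dict; skip blank or malformed lines."""
    scores: dict[str, float] = {}
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue
        qid, raw_score = parts
        try:
            scores[qid] = float(raw_score)
        except ValueError:
            continue
    return scores


def rank_by_popularity(candidates: Iterable[str], scores: dict[str, float]) -> list[str]:
    """Order candidates from highest to lowest score; unknown IDs score 0.0."""
    return sorted(candidates, key=lambda qid: scores.get(qid, 0.0), reverse=True)


# Made-up sample data; real scores come from the downloaded file.
sample = ["Q1\t10.5\n", "Q2\t200.0\n", "not a valid line\n", "Q3\t3.25\n"]
scores = parse_pagerank(sample)
ranking = rank_by_popularity(["Q1", "Q2", "Q3", "Q4"], scores)
print(ranking)
```

Because `parse_pagerank` accepts any iterable of strings, an open file handle can be passed directly, which avoids reading the whole file into a list first.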

Usage

Requirements for Python development

Requirements are defined in requirements.txt. Install them with:

pip install -r requirements.txt

CLI

Run the Python application

python3 dynbench.py \
  --query "SELECT ?answer WHERE { wd:Q14452 wdt:P17 ?answer }" \
  --question "Which country does the famous Easter island belong to?" \
  --language en \
  --complexity normal \
  --model "mistral-small:latest"

Start the uvicorn web service

uvicorn dynbench:app --reload --host 0.0.0.0 --port 8000

Docker

Build the Docker image

docker build -t dynbench .

Start the Docker container

docker run \
  --add-host=host.docker.internal:host-gateway \
  -e MONGO_HOST="mongodb://host.docker.internal:27017" \
  -e LLM_URL="http://host.docker.internal:11434/api/generate" \
  -p 8000:8000 \
  dynbench

The API is then available at http://localhost:8000.
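Once the service is running, a client could call the /transform endpoint along the following lines. This README does not document the request schema, so the HTTP method (POST), the JSON body, and the field names are all assumptions that simply mirror the CLI flags shown earlier. Because the endpoint is described as a batch endpoint, it may expect a different shape, such as a list of items. FastAPI applications serve interactive documentation at /docs by default, which is the place to confirm the real schema.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/transform"


def build_request(query: str, question: str, language: str = "en",
                  complexity: str = "normal") -> urllib.request.Request:
    """Build a JSON POST request; field names are assumed, copied from the CLI flags."""
    payload = {
        "query": query,
        "question": question,
        "language": language,
        "complexity": complexity,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "SELECT ?answer WHERE { wd:Q14452 wdt:P17 ?answer }",
    "Which country does the famous Easter island belong to?",
)

# Uncomment once the service is running and the schema is confirmed:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```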

Contribute

We are happy to receive your contributions. Please create a pull request or an issue. As this tool is published under the MIT license, feel free to fork it and use it in your own projects.

Disclaimer

This tool is provided "as is" and without any warranty, express or implied.
