(formerly known as ScandEval)
- Dan Saattrup Smart (@saattrupdan, dan.smart@alexandra.dk)
See the documentation for more information.
All datasets used in this project are generated using the scripts located in the src/scripts folder. To reproduce a dataset, run the corresponding script with the following command:
uv run src/scripts/<name-of-script>.py
Replace <name-of-script> with the name of the script you wish to execute, e.g.,
uv run src/scripts/create_allocine.py
A huge thank you to all the contributors who have helped make this project a success!
We welcome contributions to EuroEval! Whether you're fixing bugs, adding features, or contributing new datasets, your help makes this project better for everyone.
- General contributions: Check out our contribution guidelines for information on how to get started.
- Adding datasets: If you're interested in adding a new dataset to EuroEval, we have a dedicated guide with step-by-step instructions.
- Thanks to Google for sponsoring Gemini credits as part of their Google Cloud for Researchers Program.
- Thanks @Mikeriess for evaluating many of the larger models on the leaderboards.
- Thanks to OpenAI for sponsoring OpenAI credits as part of their Researcher Access Program.
- Thanks to UWV and KU Leuven for sponsoring the Azure OpenAI credits used to evaluate GPT-4-turbo in Dutch.
- Thanks to Miðeind for sponsoring the OpenAI credits used to evaluate GPT-4-turbo in Icelandic and Faroese.
- Thanks to CHC for sponsoring the OpenAI credits used to evaluate GPT-4-turbo in German.
If you want to cite the framework, feel free to use the following:
@article{smart2024encoder,
  title={Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks},
  author={Smart, Dan Saattrup and Enevoldsen, Kenneth and Schneider-Kamp, Peter},
  journal={arXiv preprint arXiv:2406.13469},
  year={2024}
}

@inproceedings{smart2023scandeval,
  author = {Smart, Dan Saattrup},
  booktitle = {Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)},
  month = may,
  pages = {185--201},
  title = {{ScandEval: A Benchmark for Scandinavian Natural Language Processing}},
  year = {2023}
}