EuroEval
The robust European language model benchmark

(formerly known as ScandEval)




Installation and usage

See the documentation for more information.
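As a minimal sketch of getting started, the framework is distributed on PyPI and ships a command-line interface. The package name `euroeval` and the `--model` flag below are assumptions based on the project name; consult the documentation for the exact interface:

```shell
# Install EuroEval from PyPI (package name assumed to be `euroeval`)
pip install euroeval

# Benchmark a model from the command line
# (flag name assumed; see the documentation for the full option list)
euroeval --model <model-id>
```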

Reproducing the evaluation datasets

All datasets used in this project are generated using the scripts located in the src/scripts folder. To reproduce a dataset, run the corresponding script with the following command

uv run src/scripts/<name-of-script>.py

Replace <name-of-script> with the specific script you wish to execute, e.g.,

uv run src/scripts/create_allocine.py

Contributors 🙏

A huge thank you to all the contributors who have helped make this project a success!

peter-sk, AJDERS, oliverkinch, versae, KennethEnevoldsen, viggo-gascou, mathiasesn, Alkarex, marksverdhei, Mikeriess, ThomasKluiters, BramVanroy, peregilk, Rijgersberg, duarteocarmo, slowwavesleep, mrkowalski, simonevanbruggen, tvosch, Touzen, caldaibis, SwekeR-463, N-essuno, harderj

Contribute to EuroEval

We welcome contributions to EuroEval! Whether you're fixing bugs, adding features, or contributing new datasets, your help makes this project better for everyone.

  • General contributions: Check out our contribution guidelines for information on how to get started.
  • Adding datasets: If you're interested in adding a new dataset to EuroEval, we have a dedicated guide with step-by-step instructions.

Special thanks

  • Thanks to Google for sponsoring Gemini credits as part of their Google Cloud for Researchers Program.
  • Thanks to @Mikeriess for evaluating many of the larger models on the leaderboards.
  • Thanks to OpenAI for sponsoring OpenAI credits as part of their Researcher Access Program.
  • Thanks to UWV and KU Leuven for sponsoring the Azure OpenAI credits used to evaluate GPT-4-turbo in Dutch.
  • Thanks to Miðeind for sponsoring the OpenAI credits used to evaluate GPT-4-turbo in Icelandic and Faroese.
  • Thanks to CHC for sponsoring the OpenAI credits used to evaluate GPT-4-turbo in German.

Citing EuroEval

If you use EuroEval in your research, please cite it as follows:

@article{smart2024encoder,
  title={Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks},
  author={Smart, Dan Saattrup and Enevoldsen, Kenneth and Schneider-Kamp, Peter},
  journal={arXiv preprint arXiv:2406.13469},
  year={2024}
}

@inproceedings{smart2023scandeval,
  author = {Smart, Dan Saattrup},
  booktitle = {Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)},
  month = may,
  pages = {185--201},
  title = {{ScandEval: A Benchmark for Scandinavian Natural Language Processing}},
  year = {2023}
}
