Potato Annotation Showcase

A curated collection of 361 example annotation task configurations for Potato, the lightweight annotation tool for NLP research. The collection covers all 22 Potato annotation types, 90 SemEval shared tasks (2013-2025), and benchmarks from ACL, EMNLP, NeurIPS, ICML, ICLR, CVPR, and other venues.

Text Annotation (111 tasks)

Subcategory	Tasks	Examples
Emotion & Sentiment	8	GoEmotions, SemEval Sentiment, Multirate Sentiment
Hate Speech & Moderation	6	HateXplain, Implicit Hate, Social Bias Frames, Toxic Spans
Named Entity Recognition	5	CoNLL-2003, WNUT-2017, Biomedical NER, Complex NER
Information Extraction	7	KG-BERT, Event Arguments, Dialogue Relations
Argumentation & Stance	5	Argument Quality, Stance Detection, Rumor Stance
Fact Verification	8	FActScore, FAVA, Scientific Claims, Propaganda
Commonsense & Ethics	5	Social Chemistry, Moral Stories, Commonsense Inference
Explainability	2	Rationale Annotation, NLI Explanation
Dialogue	2	SWBD-DAMSL Dialogue Acts, Conversation Quality
Political & Media	1	Political Discourse
Discourse	3	PDTB Discourse Trees, DISRPT, Timeline Relations
Coreference	4	OntoNotes, CorefUD, MAVEN-ERE, Legal Coreference
Cross-lingual	5	XNLI, Belebele, FLORES MT Quality, IndicNLP
Domain-specific	8	BioNLP, ChemProt, Clinical NER, Legal, Medical
Computational Social Science	7	OffensEval, Moral Foundations, Politeness, Media Frames
Relation Extraction	6	MultiTACRED, CrossRE, RadGraph, SciER
Entity Linking	2	AIDA-CoNLL, MedMentions
Code Annotation	1	CodeXGLUE Defect Detection
Tabular	1	Tabular Data Annotation
Reading Comprehension	1	SQuAD Extractive QA
Natural Language Inference	2	SNLI, MultiNLI
Question Answering	2	Natural Questions, TriviaQA
Information Retrieval	2	MS MARCO, TREC-DL
Semantic Similarity	1	STS Benchmark
Word Sense	1	SemEval-2007 WSD
Parsing	1	Universal Dependencies
Education	3	Essay Scoring, MathDial Tutoring, Student Essay Discourse
Financial	3	FinBERT, FLARE NER, Financial PhraseBank
Bias & Toxicity	—	See subcategories

Image Annotation (33 tasks)

Subcategory	Tasks	Examples
Classification	6	MS-COCO, ImageNet, Places365, CUB-200
Segmentation	3	Cityscapes, ADE20K, LIP Human Parsing
Visual QA	2	VQAv2, TextVQA
Visual Grounding	1	RefCOCO
Medical Imaging	3	CheXpert, MIMIC-CXR, Camelyon Pathology
Human Pose	1	ViTPose Keypoint Annotation
Generation Evaluation	1	T2I-CompBench
Autonomous Driving	2	KITTI, BDD100K
Aerial & Remote Sensing	3	BigEarthNet, xView, DOTA
Specialized Domains	6	MVTec-AD, DeepFashion, CelebA, iWildCam
Document Analysis	3+	DocLayNet, OmniDocBench, SA-1B

Video Annotation (37 tasks)

Subcategory	Tasks	Examples
Action Recognition	8	AVA, Charades, THUMOS14, EPIC-KITCHENS
Temporal Grounding	3	ActivityNet Captions, DiDeMo, Charades-STA
Video Summarization	4	TVSum, SumMe, YouTube Highlights, LSMDC
Boundary Detection	3	Scene/Shot Boundary, MovieScenes
Video QA	2	NExT-QA, MVBench
Scene Understanding	1	MovieNet Scene Classification
Instructional Video	2	HowTo100M, YouCook2
Other Video Tasks	14	Video-ChatGPT, Sign Language, Child Language, etc.

Audio Annotation (17 tasks)

Task	Description
librispeech-transcription	Audio quality + transcription (slider, audio_annotation)
speech-commands-recognition	Speech command labeling (audio_annotation)
covost-speech-translation	Speech translation evaluation
clotho-audio-captioning	Audio event captioning
audio-transcription	Speech transcription review
speaker-diarization	Speaker identification
emotion-recognition	Speech emotion classification
music-genre-classification	Music genre tagging
+ 9 more	DiSPLACE, DoReCo, EmoBox, VoiceMOS, etc.

Evaluation Tasks (23 tasks)

Task	Paper	Types
wildbench-llm-eval	WildBench (COLM 2024)	pairwise, likert, text
mt-bench-judge-consistency	MT-Bench (NeurIPS 2023)	pairwise, likert, radio
arena-hard-auto	Arena Hard (2024)	pairwise (scale), likert
rewardbench-reward-eval	RewardBench (ICML 2024)	pairwise, radio, multirate
mmlu-knowledge-eval	MMLU (ICLR 2021)	radio, text
humaneval-code-correctness	HumanEval (2021)	radio, text, number
gpqa-expert-qa	GPQA (ICLR 2024)	number, radio, text
big-bench-task-eval	BIG-Bench (TMLR 2023)	radio, text, number
helm-model-card-display	HELM (TMLR 2023)	pure_display, likert
chatbot-arena-pairwise-bws	Chatbot Arena (ICML 2024)	bws, pairwise
+ 13 more	AlpacaEval, DoNotAnswer, ESA-MT, IFEval, etc.

Preference Learning & RLHF (18 tasks)

Task	Paper	Types
dpo-preference-data	DPO (NeurIPS 2023)	pairwise, radio, text
ultrafeedback-multiaspect	UltraFeedback (ICML 2024)	multirate, likert, text
spin-self-play	SPIN (ICML 2024)	pairwise, radio
constitutional-ai-harmlessness	Constitutional AI (2022)	radio, likert, text
mmlu-pro-tiered-eval	MMLU-Pro (NeurIPS 2024)	tiered_annotation, radio
+ 13 more	HH-RLHF, SafeRLHF, BeaverTails, WebGPT, etc.

SemEval Shared Tasks (90 tasks)

Covers 90 SemEval shared tasks from 2013 to 2025. See SEMEVAL.md for details.

Year	Tasks	Highlights
2025	10	Multimodal idiomaticity, entity-aware MT, emotion detection
2024	9	Semantic relatedness, persuasion in memes, BRAINTEASER
2023	10	Visual WSD, clickbait spoiling, AfriSenti
2022	10	Patronizing language, idiomaticity, news similarity
2021	9	Lexical complexity, humor detection, MeasEval
2020	9	Commonsense validation, counterfactuals, code-mixed
2019	7	HatEval, hyperpartisan news, suggestion mining
2018	10	Emoji prediction, irony, cybersecurity NER
2017	5	Financial sentiment, humor, pun detection
2016	7	Stance detection, aspect sentiment, clinical TempEval
2013-2015	4	Drug interactions, ABSA, timeline ordering, clinical

Annotation Type Coverage

All 22 Potato annotation types are represented:

Type	Count	Example Tasks
radio	483	GoEmotions, SNLI, MMLU, most classification tasks
text	160	SQuAD, Natural Questions, code review, translations
likert	128	STS-B, essay scoring, MT quality, humor ratings
multiselect	126	GoEmotions, moral foundations, persuasion techniques
span	110	NER tasks, PICO extraction, SQuAD answer spans
video_annotation	46	Action recognition, temporal grounding, MVBench
pairwise	16	DPO, Arena Hard, WildBench, MT-Bench
span_link	9	Chemical-disease relations, structured sentiment
slider	8	STS-B similarity, essay scoring, word similarity
image_annotation	6	ViTPose, RefCOCO, Camelyon pathology
select	6	MS MARCO, WSD, Financial PhraseBank
number	5	GPQA confidence, HumanEval, NumEval, event counting
multirate	3	UltraFeedback, RewardBench, SemEval sentiment
audio_annotation	3	LibriSpeech, Speech Commands, CoVoST
tree_annotation	3	PDTB, UD parsing, RumourEval thread structure
video	2	Video-ChatGPT display
triage	2	CoNLL-2003 triage, triage template
tiered_annotation	1	MMLU-Pro tiered evaluation
bws	1	Chatbot Arena best-worst scaling
pure_display	1	HELM model card display
event_annotation	1	BioNLP gene regulation events
coreference	1	OntoNotes coreference resolution
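The per-type counts in this table could, in principle, be recomputed from the configurations themselves. The sketch below defines a hypothetical helper, count_annotation_types, which is not part of Potato. It assumes that each scheme in a config.yaml declares its type on a line of the form "annotation_type: <name>"; that key name is an assumption about the config layout, not something this README states. Because it counts scheme declarations rather than task folders, one type's total can exceed the number of tasks, which is consistent with the radio count being larger than 361.

```shell
# count_annotation_types: hypothetical helper, not part of Potato.
# Assumption: every annotation scheme in a config.yaml has a line like
#   annotation_type: radio      (optionally quoted, optionally after "- ")
# Commented-out lines containing the key would be counted as well.
count_annotation_types() {
    root="${1:-.}"
    find "$root" -type f -name config.yaml -exec grep -h 'annotation_type:' {} + \
        | sed -e 's/.*annotation_type:[[:space:]]*//' \
              -e "s/[\"']//g" \
              -e 's/[[:space:]#].*$//' \
        | sort | uniq -c | sort -rn
}
```

Running count_annotation_types from the repository root would print one line per type, with the number of declarations first and the most frequent type at the top.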

Structure

Each task folder contains:

metadata.json - Task metadata (title, description, tags, paper reference, citation)
config.yaml - Potato configuration file
sample-data.json - Example data for testing (8-12 items)
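The three-file layout above is easy to check mechanically. The following is a minimal sketch of such a check; check_task_dir is a hypothetical helper name, not something shipped with Potato or this repository, and it only tests that the files exist, not that their contents are valid.

```shell
# check_task_dir: hypothetical helper, not part of Potato.
# Prints one line per missing file and returns non-zero if any is missing.
check_task_dir() {
    dir="$1"
    missing=0
    for f in metadata.json config.yaml sample-data.json; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_task_dir text/emotion-sentiment/goemotions would print nothing and return 0 if all three files are present in that folder.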

Quick Start

# Clone this repository
git clone https://github.com/davidjurgens/potato-showcase.git

# Navigate to a task
cd potato-showcase/text/emotion-sentiment/goemotions

# Run with Potato
potato start config.yaml

Usage

1. Clone this repository
2. Browse categories to find a relevant task
3. Copy the task folder to your project
4. Customize the config.yaml for your needs
5. Run with: potato start config.yaml
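The copy step could be scripted roughly as follows. This is a sketch under stated assumptions: copy_task is a hypothetical helper, and the project paths in the example comments are placeholders rather than paths this repository prescribes.

```shell
# copy_task: hypothetical helper, not part of Potato.
# Copies the contents of a showcase task folder into a project folder,
# refusing to proceed if the source has no config.yaml.
copy_task() {
    src="$1"
    dest="$2"
    if [ ! -f "$src/config.yaml" ]; then
        echo "no config.yaml in $src" >&2
        return 1
    fi
    mkdir -p "$dest"
    cp -R "$src/." "$dest/"
}

# Example (destination path is illustrative):
#   copy_task potato-showcase/text/emotion-sentiment/goemotions my-project/goemotions
#   cd my-project/goemotions
#   potato start config.yaml
```

Copying into a separate folder leaves the cloned showcase untouched, so edits to config.yaml stay in your own project.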

Contributing

We welcome contributions! To add a new task:

1. Create a folder in the appropriate category
2. Add required files (metadata.json, config.yaml, sample-data.json)
3. Include paper reference and BibTeX citation if based on published work
4. Submit a pull request

License

MIT License - feel free to use these configurations in your projects.

Categories

Category	Description	Tasks
text/	Text-based NLP tasks (emotion, NER, IE, QA, parsing, etc.)	111
image/	Image annotation (classification, VQA, grounding, medical)	33
video/	Video annotation (action recognition, QA, summarization)	37
audio/	Audio annotation (transcription, commands, captioning)	17
evaluation/	AI output evaluation (LLM judging, code, benchmarks)	23
preference-learning/	RLHF, DPO, and preference annotation tasks	18
multimodal/	Cross-modal tasks (robotics, chart analysis, science QA)	9
agentic/	Agent evaluation tasks (web agents, code agents)	3
semeval/	SemEval shared tasks (2013-2025)	90
templates/	Generic reusable annotation templates	20
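The per-category totals can be cross-checked against a local clone. The sketch below assumes that every task folder contains exactly one config.yaml, as described under Structure; count_tasks is a hypothetical helper name, and its output may differ from the table if a folder deviates from that layout.

```shell
# count_tasks: hypothetical helper, not part of Potato.
# For each top-level directory under the given root, prints the directory
# name and the number of config.yaml files found beneath it, tab-separated.
count_tasks() {
    root="${1:-.}"
    for cat_dir in "$root"/*/; do
        [ -d "$cat_dir" ] || continue
        n=$(find "$cat_dir" -type f -name config.yaml | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$(basename "$cat_dir")" "$n"
    done
}
```

Called as count_tasks potato-showcase, it would print one line per category directory, which can be compared against the Tasks column above.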