Task Clusters

Sahara is designed to evaluate model performance across a diverse set of languages and tasks, reflecting Africa's rich linguistic landscape. The benchmark comprises 16 tasks organized into four task clusters, providing a robust framework for evaluation.


(1) Multiple-Choice, Comprehension and Reasoning (MCCR)

| Task Name | Identifier | #Languages | Score Metric |
|---|---|---|---|
| Context-based Question Answering | squad_qa | 1 | Macro F1 |
| General Knowledge | mmlu | 16 | Accuracy |
| Mathematical Word Problems | mgsm | 16 | Exact Match |
| Reading Comprehension | belebele | 25 | Accuracy |
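Mathematical Word Problems (mgsm) are scored by exact match on the final answer. A minimal sketch of how such scoring might look; the normalization steps here (trimming whitespace, trailing periods, thousands separators) are assumptions for illustration, not Sahara's exact procedure:

```python
def normalize(ans: str) -> str:
    # Trim whitespace, a trailing period, thousands separators, and a leading "$".
    return ans.strip().rstrip(".").replace(",", "").lstrip("$")

def exact_match(predictions, references) -> float:
    # Fraction of predictions identical to the reference after normalization.
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

print(exact_match(["18", " 7.", "42"], ["18", "7", "41"]))  # 2 of 3 match
```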


(2) Text Classification

| Task Name | Identifier | #Languages | Score Metric |
|---|---|---|---|
| Cross-Lingual Natural Language Inference | xlni | 16 | Accuracy |
| Language Identification | lid | 517 | Macro F1 |
| News Classification | news | 4 | Macro F1 |
| Sentiment Analysis | sentiment | 3 | Macro F1 |
| Topic Classification | topic | 2 | Macro F1 |
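Most classification tasks above are scored with macro-averaged F1, which weights every class equally regardless of frequency. A self-contained sketch of the metric (a real evaluation would likely call scikit-learn's `f1_score(average="macro")`, which this mirrors):

```python
from collections import Counter

def macro_f1(predictions, references) -> float:
    # Unweighted mean of per-class F1, so rare classes count as much as frequent ones.
    labels = set(references) | set(predictions)
    tp, fp, fn = Counter(), Counter(), Counter()
    for p, r in zip(predictions, references):
        if p == r:
            tp[p] += 1
        else:
            fp[p] += 1
            fn[r] += 1
    f1s = []
    for label in labels:
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```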


(3) Text Generation

| Task Name | Identifier | #Languages | Score Metric |
|---|---|---|---|
| Machine Translation - African to African | mt_xx2xx | 29 | spBleu-1K |
| Machine Translation - English to African | mt_eng2xx | 29 | spBleu-1K |
| Machine Translation - French to African | mt_fra2xx | 29 | spBleu-1K |
| Paraphrase | paraphrase | 4 | spBleu-1K |
| Summarization | summary | 10 | RougeL |
| Title Generation | title | 10 | spBleu-1K |
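Summarization is scored with ROUGE-L, the F-measure of the longest common subsequence between prediction and reference. A minimal pure-Python sketch; whitespace tokenization is a simplification here, since production implementations (e.g. the rouge-score package) add stemming and fuller tokenization:

```python
def lcs_length(a, b) -> int:
    # Classic dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(prediction: str, reference: str) -> float:
    # ROUGE-L F1: harmonic mean of LCS-based precision and recall over tokens.
    pred, ref = prediction.split(), reference.split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(pred), lcs / len(ref)
    return 2 * prec * rec / (prec + rec)
```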


(4) Tokens

| Task Name | Identifier | #Languages | Score Metric |
|---|---|---|---|
| NER | ner | 27 | Macro F1 |
| Phrase Chunking | phrase | 8 | Macro F1 |
| POS Tagging | pos | 1 | Macro F1 |