Task Clusters

Sahara is designed to evaluate model performance across a diverse set of languages and tasks, reflecting Africa's rich linguistic landscape. The benchmark comprises 16 tasks organized into four task clusters, providing a robust framework for evaluation.


(1) Multiple-Choice, Comprehension and Reasoning (MCCR)

| Task Name | Identifier | #Languages | Score Metric |
|---|---|---|---|
| Context-based Question Answering | squad_qa | 1 | Macro F1 |
| General Knowledge | mmlu | 16 | Accuracy |
| Mathematical Word Problems | mgsm | 16 | Exact Match |
| Reading Comprehension | belebele | 25 | Accuracy |
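Mathematical Word Problems (mgsm) are scored by exact match on the final answer. A minimal sketch of how such scoring might look; the normalization steps here (trimming whitespace, trailing periods, thousands separators) are assumptions for illustration, not Sahara's exact procedure:

```python
def normalize(ans: str) -> str:
    # Trim whitespace, a trailing period, thousands separators, and a leading "$".
    return ans.strip().rstrip(".").replace(",", "").lstrip("$")

def exact_match(predictions, references) -> float:
    # Fraction of predictions identical to the reference after normalization.
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

print(exact_match(["18", " 7.", "42"], ["18", "7", "41"]))  # 2 of 3 match
```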


(2) Text Classification

| Task Name | Identifier | #Languages | Score Metric |
|---|---|---|---|
| Cross-Lingual Natural Language Inference | xlni | 16 | Accuracy |
| Language Identification | lid | 517 | Macro F1 |
| News Classification | news | 4 | Macro F1 |
| Sentiment Analysis | sentiment | 3 | Macro F1 |
| Topic Classification | topic | 2 | Macro F1 |
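Most classification tasks above are scored with macro-averaged F1, which weights every class equally regardless of frequency. A self-contained sketch of the metric (a real evaluation would likely call scikit-learn's `f1_score(average="macro")`, which this mirrors):

```python
from collections import Counter

def macro_f1(predictions, references) -> float:
    # Unweighted mean of per-class F1, so rare classes count as much as frequent ones.
    labels = set(references) | set(predictions)
    tp, fp, fn = Counter(), Counter(), Counter()
    for p, r in zip(predictions, references):
        if p == r:
            tp[p] += 1
        else:
            fp[p] += 1
            fn[r] += 1
    f1s = []
    for label in labels:
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```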


(3) Text Generation

| Task Name | Identifier | #Languages | Score Metric |
|---|---|---|---|
| Machine Translation - African to African | mt_xx2xx | 29 | spBleu-1K |
| Machine Translation - English to African | mt_eng2xx | 29 | spBleu-1K |
| Machine Translation - French to African | mt_fra2xx | 29 | spBleu-1K |
| Paraphrase | paraphrase | 4 | spBleu-1K |
| Summarization | summary | 10 | RougeL |
| Title Generation | title | 10 | spBleu-1K |
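Summarization is scored with ROUGE-L, the F-measure of the longest common subsequence between prediction and reference. A minimal pure-Python sketch; whitespace tokenization is a simplification here, since production implementations (e.g. the rouge-score package) add stemming and fuller tokenization:

```python
def lcs_length(a, b) -> int:
    # Classic dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(prediction: str, reference: str) -> float:
    # ROUGE-L F1: harmonic mean of LCS-based precision and recall over tokens.
    pred, ref = prediction.split(), reference.split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(pred), lcs / len(ref)
    return 2 * prec * rec / (prec + rec)
```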


(4) Tokens

| Task Name | Identifier | #Languages | Score Metric |
|---|---|---|---|
| NER | ner | 27 | Macro F1 |
| Phrase Chunking | phrase | 8 | Macro F1 |
| POS Tagging | pos | 1 | Macro F1 |