Classification on Classification task dataset
[Figure: Tok-F1 and chrF chart. Top result: LatentQA, 31.3 Tok-F1. Date shown: May 25, 2026.]
Updated 8d ago
Evaluation Results
| Method | Configuration | Date | Tok-F1 | chrF |
|---|---|---|---|---|
| LatentQA | Donor=Llama-3.1-8B-Ins... | 2026.05 | 31.3 | 28.5 |
| AO | Donor=Llama-3.1-8B-Ins... | 2026.05 | 31.1 | 28.5 |
| UAV | Donor=Llama-3.1-8B-Ins... | 2026.05 | 30.9 | 28.4 |
| UAV | Donor=Llama-3.1-8B-Ins... | 2026.05 | 30.7 | 28.3 |
| UAV | Donor=Qwen3-4B-Instruc... | 2026.05 | 29.8 | 27.6 |
| UAV | Donor=Llama-3.1-8B-Ins... | 2026.05 | 29.8 | 27.3 |
| AO | Donor=Qwen3-4B-Instruc... | 2026.05 | 29.2 | 27.9 |
| UAV | Donor=Qwen3-4B-Instruc... | 2026.05 | 29.1 | 27.1 |
| LatentQA | Donor=Qwen3-4B-Instruc... | 2026.05 | 28.4 | 26.6 |
| PatchScope | Donor=Llama-3.1-8B-Ins... | 2026.05 | 15.6 | 19.2 |
| PatchScope | Donor=Qwen3-4B-Instruc... | 2026.05 | 12.6 | 19 |
| SelfIE | Donor=Llama-3.1-8B-Ins... | 2026.05 | 12.3 | 16.4 |
| SelfIE | Donor=Qwen3-4B-Instruc... | 2026.05 | 12.2 | 18.7 |
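
This page does not define Tok-F1. One common reading is token-overlap F1 between a generated answer and a reference answer, as used in SQuAD-style evaluation. The sketch below illustrates that reading only; the function name `token_f1`, the lowercasing, and the whitespace tokenization are assumptions, not the benchmark's confirmed procedure.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings.

    Assumed definition (not confirmed by this page): lowercase both
    strings, split on whitespace, and count shared tokens with multiplicity.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # Two empty strings count as a match; exactly one empty string is a miss.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Two of three tokens match on each side.
    print(token_f1("the cat sat", "the cat slept"))
```

The function returns a value between 0 and 1. The scores in the table appear to be on a 0 to 100 scale, so under this assumed definition a per-example average would be multiplied by 100 before comparison.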