Question Answering on expert-curated (test)
Top result: DoRA SFT with human-annotated supervision, 31.65 Token F1 (Apr 20, 2026)
Updated 1mo ago
Evaluation Results
| Method | Setting | Date | Token F1 | ROUGE-L | BLEU | BLEURT Score | BERTScore F1 |
|---|---|---|---|---|---|---|---|
| DoRA SFT with human-annotated supervision | Retrieval mechanism=GT... | 2026.04 | 31.65 | 29.28 | 6.78 | -0.4 | 75.88 |
| Llama3.1-8B-Instruct (base) | Retrieval mechanism=GT... | 2026.04 | 25.27 | 23.5 | 6.62 | -0.827 | 70.98 |
| GPT-4o | Retrieval mechanism=GT... | 2026.04 | 25.19 | 23.61 | 5.9 | -0.793 | 71.05 |
| DoRA SFT | Retrieval mechanism=GT... | 2026.04 | 22.96 | 23.07 | 2 | -0.565 | 72.26 |