Text Reconstruction on BBC News (test)
Top result: 99.67 BERTScore F1, Llama-3.2-3B-Instruct (SFT)
[Chart: BERTScore F1 (y-axis) by date; x-axis label: May 27, 2026]
Updated 5d ago
Evaluation Results
| Method | Settings | Date | BERTScore F1 |
|---|---|---|---|
| Llama-3.2-3B-Instruct (SFT) | rkeep=0.9, protocol=QL... | 2026.05 | 99.67 |
| Llama-3.2-3B-Instruct (ZS) | rkeep=0.9, protocol=Ze... | 2026.05 | 98.57 |
| Gemini 2.0 Flash | rkeep=0.9, protocol=ze... | 2026.05 | 98.39 |
| Gemini 2.0 Flash | rkeep=0.7, protocol=ze... | 2026.05 | 96.45 |
| Llama-3.2-3B-Instruct (SFT) | rkeep=0.7, protocol=QL... | 2026.05 | 96.08 |
| Llama-3.2-3B-Instruct (SFT) | rkeep=0.5, protocol=QL... | 2026.05 | 95.98 |
| Llama-3.2-3B-Instruct (ZS) | rkeep=0.7, protocol=Ze... | 2026.05 | 94.65 |
| Gemini 2.0 Flash | rkeep=0.5, protocol=ze... | 2026.05 | 93.14 |
| Llama-3.2-3B-Instruct (SFT) | rkeep=0.3, protocol=QL... | 2026.05 | 92.06 |
| Llama-3.2-3B-Instruct (ZS) | rkeep=0.5, protocol=Ze... | 2026.05 | 90.32 |
| Gemini 2.0 Flash | rkeep=0.3, protocol=ze... | 2026.05 | 89.48 |
| Llama-3.2-3B-Instruct (SFT) | rkeep=0.1, protocol=QL... | 2026.05 | 88.27 |
| Llama-3.2-3B-Instruct (ZS) | rkeep=0.3, protocol=Ze... | 2026.05 | 87.11 |
| Gemini 2.0 Flash | rkeep=0.1, protocol=ze... | 2026.05 | 85.27 |
| Llama-3.2-3B-Instruct (ZS) | rkeep=0.1, protocol=Ze... | 2026.05 | 84.36 |