Natural Language Processing on BigBench II
[Chart: Accuracy Degradation (%) by date. Highlighted entry: PromptCOS, -0.37, Sep 3, 2025. Metrics available: Accuracy Degradation (%), BERTScore.]
Updated 9d ago
Evaluation Results
Method     Model            Date     Accuracy Degradation (%)  BERTScore
PromptCOS  Deepseek-d-qwen  2025.09  -0.37                     62
PromptCOS  Gemma2-it        2025.09  0                         65
PC*        Gemma2-it        2025.09  0.03                      53
PromptCOS  TinyLlama-chat   2025.09  0.08                      68
PC*        TinyLlama-chat   2025.09  0.53                      71
PC*        Deepseek-d-qwen  2025.09  0.83                      81
PCG        TinyLlama-chat   2025.09  1.41                      45
PCG        Deepseek-d-qwen  2025.09  1.6                       52
PCG        Gemma2-it        2025.09  2.33                      46