Understandability on Understandability Experiment Paraphrase strategy
[Chart: Significance Count (out of 7) by date. Plotted point: Claude, 7, May 7, 2026. Other available metrics: Adjusted R², Spearman Correlation, Majority Fit.]
Evaluation Results

Method               Date     Significance Count (out of 7)  Adjusted R²  Spearman Correlation  Majority Fit
Claude               2026.05  7                              0.63         -                     -
GPT-4o-m             2026.05  6                              0.67         -                     -
Majority Vote (MUM)  2026.05  6                              -            -                     -
GPT-4o               2026.05  5                              0.67         -                     -
Llama                2026.05  5                              0.98         -                     -
Grok                 2026.05  5                              0.54         -                     -
Mistral              2026.05  3                              1            -                     -
Qwen                 2026.05  3                              1            -                     -