Human Correlation Analysis on HAMLETJudge (val)
[Chart: CP Correlation by method. Best result: HamletJudge, 0.792 (Jul 21, 2025). Selectable metrics: CP Correlation, NQ Correlation, IE Correlation, Average Correlation.]
Updated 1mo ago
Evaluation Results
| Method | Date | CP Correlation | NQ Correlation | IE Correlation | Average Correlation |
|---|---|---|---|---|---|
| HamletJudge | 2025.07 | 0.792 | 0.807 | 0.773 | 0.791 |
| Gemini-2.5-Pro | 2025.07 | 0.72 | 0.684 | 0.701 | 0.702 |
| Claude-4-Sonnet | 2025.07 | 0.698 | 0.783 | 0.804 | 0.762 |
| GPT-4.1 | 2025.07 | 0.675 | 0.593 | 0.622 | 0.63 |
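The reported Average Correlation values look consistent with an unweighted mean of the CP, NQ, and IE correlations. The page does not state how the average is computed, so treat that as an assumption. A minimal sketch that checks it against the results above (the function names and the rounding tolerance are illustrative, not part of the benchmark):

```python
# Scores copied from the results above: (CP, NQ, IE, reported Average).
SCORES = {
    "HamletJudge": (0.792, 0.807, 0.773, 0.791),
    "Gemini-2.5-Pro": (0.72, 0.684, 0.701, 0.702),
    "Claude-4-Sonnet": (0.698, 0.783, 0.804, 0.762),
    "GPT-4.1": (0.675, 0.593, 0.622, 0.63),
}


def mean_correlation(cp: float, nq: float, ie: float) -> float:
    """Unweighted mean of the three correlations (assumed, not documented)."""
    return (cp + nq + ie) / 3


def check_averages(table: dict, tol: float = 5e-4) -> dict:
    """Map each method to whether its reported average matches the mean.

    tol allows for the reported values being rounded to three decimals.
    """
    return {
        name: abs(mean_correlation(cp, nq, ie) - avg) <= tol
        for name, (cp, nq, ie, avg) in table.items()
    }


if __name__ == "__main__":
    for name, ok in check_averages(SCORES).items():
        print(f"{name}: {'consistent' if ok else 'mismatch'}")
```

If a row failed this check, it would suggest either a weighted average or a transcription error in the table.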