Pairwise Discrimination on Management and Economics Research Pitch Pairs: shared pairwise subset (test)
Chart: Distance 1 Accuracy by submission date (best: SFT GPT-4.1, 78.67, Mar 17, 2026). Other selectable metrics: Distance 2 Accuracy, Distance 3 Accuracy, Weighted Accuracy.
Evaluation Results

Method               Task              Date     Distance 1 Acc.  Distance 2 Acc.  Distance 3 Acc.  Weighted Acc.
SFT GPT-4.1          task=label-free   2026.03  78.67            89               92               84.33
GPT-5.2 High         task=label-free   2026.03  69.33            85               94               78.67
GPT-4.1 (baseline)   task=label-free   2026.03  69.33            79               90               76.00
Gemini 3.1 Pro       task=label-free   2026.03  68.67            86               86               77.33
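The page does not say how Weighted Accuracy is derived from the three per-distance scores. As an assumption only, a weighted mean with hypothetical weights 3:2:1 for distances 1, 2 and 3 (equivalent to pooling 150, 100 and 50 pairs) reproduces every reported Weighted Accuracy to within rounding. The sketch below checks that; `WEIGHTS` and `weighted_accuracy` are illustrative names, not part of the benchmark.

```python
# Scores copied from the table above, in percent:
# (Distance 1, Distance 2, Distance 3, reported Weighted Accuracy)
RESULTS = {
    "SFT GPT-4.1": (78.67, 89, 92, 84.33),
    "GPT-5.2 High": (69.33, 85, 94, 78.67),
    "GPT-4.1 (baseline)": (69.33, 79, 90, 76.00),
    "Gemini 3.1 Pro": (68.67, 86, 86, 77.33),
}

# Hypothetical weights (an assumption, not stated by the benchmark):
# equivalent to pooling 150, 100 and 50 pairs at distances 1, 2 and 3.
WEIGHTS = (3, 2, 1)


def weighted_accuracy(d1, d2, d3, weights=WEIGHTS):
    """Weighted mean of the three per-distance accuracies."""
    scores = (d1, d2, d3)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


if __name__ == "__main__":
    # Rank methods by reported Weighted Accuracy and compare with the recomputed value.
    ranked = sorted(RESULTS.items(), key=lambda kv: kv[1][3], reverse=True)
    for method, (d1, d2, d3, reported) in ranked:
        recomputed = weighted_accuracy(d1, d2, d3)
        print(f"{method:20s} reported={reported:6.2f} recomputed={recomputed:6.2f}")
```

Because the per-distance scores are themselves rounded to two decimals, the recomputed values can differ from the reported ones by up to about 0.005, so any comparison should use a small tolerance rather than exact equality.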