Pairwise Discrimination on Management and Economics Research Pitch Pairs shared pairwise subset (test)

Distance 1 Accuracy: 78.67

SFT GPT-4.1

Updated 4mo ago

Evaluation Results

Method	Links	Distance 1 Accuracy
SFT GPT-4.1 2026.03		78.67	89	92	84.33
GPT-5.2 High 2026.03		69.33	85	94	78.67
GPT-4.1 (baseline) 2026.03		69.33	79	90	76
Gemini 3.1 Pro 2026.03		68.67	86	86	77.33