Self-Preference Bias Analysis on AlpacaEval

74.1 PIR

LongCat-Flash-Chat

-2.23617.58237.457.218 Apr 24, 2026
Updated 1mo ago

Evaluation Results

Method  PIR                                  Links
        74.1  43.4   0.307  1,311  890       -  -  -  -
        59.8  41.8   0.181  1,305  864       -  -  -  -
        56.8  41.6   0.152  1,357  902       -  -  -  -
        56.6  34.1   0.226  1,349  916       -  -  -  -
        51    41.5   0.095  1,336  827       -  -  -  -
        50.4  41.5   0.09   1,396  948       -  -  -  -
        49.9  39.9   0.1    1,362  902       -  -  -  -
        47.6  35.2   0.124  1,228  554       -  -  -  -
        39.5  37.1   0.024  1,368  897       -  -  -  -
        37.3  33.7   0.035  1,371  922       -  -  -  -
        31.4  35.7  -0.043  1,283  846       -  -  -  -
        23.6  38.8  -0.152  1,098  714       -  -  -  -
        19.3  42.2  -0.229  1,155  773       -  -  -  -
        12.5  24.2  -0.117  1,106  753       -  -  -  -
        11.9  27.1  -0.151  1,331  898       -  -  -  -
        3     8.2   -0.052  1,183  766       -  -  -  -
        2     8     -0.06   1,050  710       -  -  -  -
        0.8   10.5  -0.097  1,247  845       -  -  -  -
        0.7   0.7   -0.001  1,004  667       -  -  -  -
        0.7   10.8  -0.102  1,296  848       -  -  -  -