General Language Evaluation on English lm-evaluation-harness

ARC Easy Acc (Norm): 0.819

OjaKV

[Chart: score over time. x-axis: Sep 25, 2025 to Jan 25, 2026; y-axis ticks: 0.505232, 0.586691, 0.66815, 0.749609]
Updated 1mo ago

Evaluation Results

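Method and Links cells are not preserved. Each row gives the entry date followed by 14 metric values; "-" marks a missing value. Column 1 is ARC Easy Acc (Norm).

Date    | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9     | 10     | 11    | 12     | 13     | 14
2025.09 | 0.819  | -      | 0.5179 | -      | -      | 0.5914 | -      | -      | -     | 0.7992 | -     | -      | 0.7395 | 0.6934
2025.09 | 0.819  | -      | 0.5188 | -      | -      | 0.5835 | -      | -      | -     | 0.7982 | -     | -      | 0.7395 | 0.6918
2025.09 | 0.8186 | -      | 0.5188 | -      | -      | 0.5904 | -      | -      | -     | 0.8003 | -     | -      | 0.7388 | 0.6934
2025.09 | 0.8165 | -      | 0.5154 | -      | -      | 0.5774 | -      | -      | -     | 0.7965 | -     | -      | 0.7356 | 0.6883
2025.09 | 0.8165 | -      | 0.5154 | -      | -      | 0.5774 | -      | -      | -     | 0.7965 | -     | -      | 0.7356 | 0.6883
2025.09 | 0.7938 | -      | 0.4812 | -      | -      | 0.5487 | -      | -      | -     | 0.7867 | -     | -      | 0.6985 | 0.6618
2025.09 | 0.7938 | -      | 0.4812 | -      | -      | 0.5487 | -      | -      | -     | 0.7867 | -     | -      | 0.6985 | 0.6618
2025.09 | 0.7386 | -      | 0.442  | -      | -      | 0.578  | -      | -      | -     | 0.7639 | -     | -      | 0.6638 | 0.6373
2025.09 | 0.7386 | -      | 0.4437 | -      | -      | 0.5751 | -      | -      | -     | 0.7579 | -     | -      | 0.663  | 0.6357
2025.09 | 0.7374 | -      | 0.4437 | -      | -      | 0.5706 | -      | -      | -     | 0.7639 | -     | -      | 0.663  | 0.6357
2025.09 | 0.713  | -      | 0.4138 | -      | -      | 0.5629 | -      | -      | -     | 0.7503 | -     | -      | 0.659  | 0.6198
2025.09 | 0.713  | -      | 0.4138 | -      | -      | 0.5629 | -      | -      | -     | 0.7503 | -     | -      | 0.659  | 0.6198
2025.09 | 0.7003 | -      | 0.3993 | -      | -      | 0.5499 | -      | -      | -     | 0.7454 | -     | -      | 0.6298 | 0.6049
2025.09 | 0.7003 | -      | 0.3993 | -      | -      | 0.5499 | -      | -      | -     | 0.7454 | -     | -      | 0.6298 | 0.6049
2026.01 | 0.5387 | 0.259  | 0.2764 | 0.0237 | 0.049  | 0.4247 | 0.2302 | 0.0169 | 0.31  | 0.6839 | 0.716 | 0.016  | 0.5375 | -
2026.01 | 0.5173 | 0.2585 | 0.267  | 0.018  | 0.0382 | 0.4204 | 0.2331 | 0.0166 | 0.296 | 0.6773 | 0.676 | 0.0238 | 0.5295 | -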