Chatbot Evaluation on WildBench

Overall Score: 71.64

o3-mini

Sep 25, 2025
Updated 15d ago

Evaluation Results

Method  Links
2025.09  71.64  69.04  72.44  74.37  65.81  73.21
2025.09  70.33  71.73  70.73  69.37  68.96  70.94
2025.09  67.57  68.63  67.95  64.68  66.78  69.53
2025.09  67.38  68.42  68.13  65.32  66.34  68.49
2025.09  65.45  66.72  65.94  63.59  63.08  67.36
2025.09  64.24  70.75  66.29  59.2   68.56  61.04