MTBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| LLM-as-a-Judge | MTbench (test) | StdDev 2.24 | 45 |
| General Capability | MTBench | MTBench Score 9.14 | 43 |
| Multi-turn Dialogue | MTBench101 | Score 9.03 | 33 |
| Pair-wise comparison | MTBench Human | Accuracy 88.9 | 16 |
| Trend Prediction | MTBench Weather (Long) | Past Accuracy 93.496 | 10 |
| Trend Prediction | MTBench Weather Short | Past Trend Prediction Score 93.877 | 10 |
| Trend Prediction | MTBench Finance Long | 3-way Accuracy 62.671 | 10 |
| Trend Prediction | MTBench Finance Short | 3-way Score 66.849 | 10 |
| Time Series Forecasting | MTBench Weather Long | MSE 11.823 | 10 |
| Time Series Forecasting | MTBench Weather Short | MSE 10.02 | 10 |
| Time Series Forecasting | MTBench Finance (Long) | MAPE 3.531 | 10 |
| Time Series Forecasting | MTBench Finance Short | MAPE 2.545 | 10 |
| Question Answering | MTBench Weather | Accuracy 71.7 | 9 |
| Question Answering | MTBench Finance | Accuracy 91.3 | 9 |
| Regression | MTBench Weather | MAE 3.523 | 9 |
| Regression | MTBench Finance | MAE 0.814 | 9 |
| Classification | MTBench Weather | Accuracy 55.7 | 9 |
| Classification | MTBench Finance | Accuracy 54.3 | 9 |
| Helpfulness Evaluation | MTBench | Helpfulness 9.35 | 8 |
| Temperature Forecasting | MTBench Temperature Forecasting (14-day) | MSE 5.026 | 8 |
| Temperature Forecasting | MTBench Temperature Forecasting 7-day | MSE 4.021 | 8 |
| Stock Indicator Forecasting | MTBench Stock Indicator Forecasting (30-day) | MACD Score 3.342 | 8 |
| Stock Indicator Forecasting | MTBench Stock Indicator Forecasting (7-day) | MACD 2.047 | 8 |
| Stock Price Forecasting | MTBench Stock Price Forecasting (30-day) | MAE 1.122 | 8 |
| Stock Price Forecasting | MTBench Stock Price Forecasting 7-day | MAE 0.788 | 8 |
Showing 25 of 30 rows