Independent Scoring on TSQueryBench
[Chart: Robustness: Linear Spike. Highlighted point: Qwen 3 8B, 0.96, Apr 2, 2026.]
Updated 2mo ago
Evaluation Results
| Method | Prompting strategy | Date | Robustness: Linear Spike | Accuracy | Robustness: Seasonal Drop | Robustness: Structural Break | Multi-Metric Consistency Score | Relative Extremum Score | Robustness: Mean Shift | Robustness: Volatility Shift |
|---|---|---|---|---|---|---|---|---|---|---|
| Qwen 3 8B | rub... | 2026.04 | 0.96 | - | 0.82 | 0.75 | 0.65 | 0.45 | 0.91 | 0.72 |
| Gemma 2 9B | rub... | 2026.04 | 0.58 | - | 0.03 | 0.68 | 0.3 | 0.33 | 0.33 | 0.45 |
| LLaMA 3.1 8B | rub... | 2026.04 | 0.35 | - | 0.27 | 0.22 | 0.38 | 0.16 | 0.27 | 0.35 |