Causal Reasoning on AITP 1.0 (test)
[Chart: BLEU over time. Plotted point: AITP, 0.0382 (Apr 11, 2026). Selectable metrics: BLEU, ROUGE-L, BERTScore, MoverScore, GPTEval.]
Updated 1mo ago
Evaluation Results
| Method       | Model Type     | Date    | BLEU   | ROUGE-L | BERTScore | MoverScore | GPTEval |
|--------------|----------------|---------|--------|---------|-----------|------------|---------|
| AITP         | Traffic-spe... | 2026.04 | 0.0382 | 0.18    | 0.636     | 0.6565     | 0.5485  |
| Gemma-3n-E4B | General MLL... | 2026.04 | 0.0334 | 0.1623  | 0.6277    | 0.5454     | 0.5879  |
| InternVL 3.5 | General MLL... | 2026.04 | 0.0315 | 0.1687  | 0.6215    | 0.6182     | 0.2862  |
| Kimi-VL-A3B  | General MLL... | 2026.04 | 0.0257 | 0.1406  | 0.6001    | 0.5465     | 0.4553  |
| Qwen3-VL     | General MLL... | 2026.04 | 0.0177 | 0.1198  | 0.5883    | 0.6291     | 0.4545  |