Traffic Reasoning on CrossPlatform-Android
[Chart: JSON Validity over time, with a data point for Zero-shot LLM on Apr 9, 2026. Selectable metrics: JSON Validity, Evidence ROUGE-L, Evidence BERTScore, Description ROUGE-L, Description BERTScore.]
Updated 9d ago
Evaluation Results
| Method | JSON Validity | Evidence ROUGE-L | Evidence BERTScore | Description ROUGE-L | Description BERTScore |
|---|---|---|---|---|---|
| Zero-shot LLM (tuning=none, input=fea...; 2026.04) | 1 | 0 | 0 | 0 | 0.8405 |
| Vanilla (encoder=frozen, auxili...; 2026.04) | 1 | 0.2107 | 0.8661 | 0.1283 | 0.8542 |
| mmTraffic (2026.04) | 1 | 0.5482 | 0.906 | 0.5605 | 0.9299 |