Traffic Reasoning on CrossPlatform iOS
Chart: JSON Validity over time (plotted point: Zero-shot LLM, 100, Apr 9, 2026). The chart can also show Evidence ROUGE-L, Evidence BERTScore, Description ROUGE-L, and Description BERTScore.
Updated 9d ago
Evaluation Results
| Method | Settings | Date | JSON Validity | Evidence ROUGE-L | Evidence BERTScore | Description ROUGE-L | Description BERTScore |
|---|---|---|---|---|---|---|---|
| Zero-shot LLM | tuning=none, input=fea... | 2026.04 | 100 | 19.62 | 85.09 | 12.68 | 85.03 |
| Vanilla | encoder=frozen, auxili... | 2026.04 | 100 | 22.18 | 85.91 | 12.55 | 85.35 |
| mmTraffic | | 2026.04 | 100 | 68.8 | 93.87 | 59.72 | 92.83 |
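The page does not say how these metrics are computed. Below is a rough sketch of the two text-matching metrics. It assumes two definitions: JSON Validity is the percentage of raw model outputs that parse as JSON, and ROUGE-L is the sentence-level longest-common-subsequence (LCS) F1, scaled to 0-100. The function names are hypothetical. Lowercasing and whitespace tokenization are simplifications; reference ROUGE implementations use their own tokenizers. BERTScore is omitted because it requires a pretrained encoder model.

```python
import json


def json_validity(outputs: list[str]) -> float:
    """Percentage of outputs that parse as JSON.

    Assumption: any valid JSON value counts, not only objects.
    """
    if not outputs:
        return 0.0
    valid = 0
    for text in outputs:
        try:
            json.loads(text)
            valid += 1
        except json.JSONDecodeError:
            pass  # unparseable output counts as invalid
    return 100.0 * valid / len(outputs)


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 on lowercased whitespace tokens, scaled to 0-100.

    Assumption: F1 (beta = 1) rather than a recall-weighted F-measure.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 100.0 * 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("the car turned left", "the car turned right at the light")` finds an LCS of 3 tokens. That gives precision 3/4 and recall 3/7, for an F1 of 600/11 ≈ 54.55.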