Traffic Reasoning on USTC-TFC 2016
[Chart: JSON Validity (%) over time; Zero-shot LLM, Apr 9, 2026. Selectable metrics: JSON Validity (%), Evidence ROUGE-L, Evidence BERTScore, Description ROUGE-L, Description BERTScore.]
Updated 9d ago
Evaluation Results
| Method | Settings | Date | JSON Validity (%) | Evidence ROUGE-L | Evidence BERTScore | Description ROUGE-L | Description BERTScore |
|---|---|---|---|---|---|---|---|
| Zero-shot LLM | tuning=none, input=fea... | 2026.04 | 100 | 13.86 | 83.77 | 13.65 | 85.36 |
| Vanilla | encoder=frozen, auxili... | 2026.04 | 100 | 63.83 | 92.72 | 54.47 | 91.63 |
| mmTraffic | | 2026.04 | 100 | 88.53 | 97.69 | 77.14 | 95.27 |
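
For readers unfamiliar with the reported metrics, here is a minimal sketch of how a JSON validity percentage and a ROUGE-L score could be computed. The page does not state the benchmark's exact definitions, so everything here is an assumption: the function names `json_validity` and `rouge_l_f1` are hypothetical, ROUGE-L is taken as the LCS-based F1 over lowercase whitespace tokens, and scores are scaled to 0-100. BERTScore is not covered because it requires a pretrained model.

```python
import json


def json_validity(outputs):
    """Percentage of model outputs that parse as JSON (assumed definition)."""
    if not outputs:
        return 0.0
    valid = 0
    for text in outputs:
        try:
            json.loads(text)
            valid += 1
        except (json.JSONDecodeError, TypeError):
            # Unparseable or non-string output counts as invalid.
            pass
    return 100.0 * valid / len(outputs)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 on a 0-100 scale, using whitespace tokenization (assumption)."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(cand_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 100.0 * 2 * precision * recall / (precision + recall)
```

A per-method score in the style of the table would then be the mean of `rouge_l_f1` over all test samples, computed separately for the evidence and description fields. Published implementations may differ in tokenization, stemming, and the precision/recall weighting, so values from this sketch are not expected to match the leaderboard exactly.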