Metric Sensitivity Analysis on Quilt-1M Logic Error
[Chart: Score and Delta (%) by method; highlighted point: BERTScore, Score 91, Mar 17, 2026]
Evaluation Results

Method        Focus        Date     Score  Delta (%)
BERTScore     Semantic     2026.03  91     1.1
PathGLS (Sg)  Visual-Text  2026.03  73     5.2
PathGLS (Sl)  Consistency  2026.03  67     26.4
RadGraph      Entity       2026.03  25     19.4
BLEU-4        Lexical      2026.03  13     18.8