BAMBOO

Benchmarks

Task Name                  | Dataset Name  | SOTA Result                       | Trend
AMR Similarity Consistency | BAMBOO (test) | Main - STS-B: 66.94               | 17
Question Answering         | Bamboo        | Accuracy: 50.4                    | 14
Long-context Reasoning     | BAMBOO 16k    | AltQA Score: 41.5                 | 13
Expected Calibration Error | Bamboo        | Expected Calibration Error: 34.01 | 10