
BAMBOO

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| AMR Similarity Consistency | BAMBOO (test) | Main - STS-B: 66.94 | 17 |
| Question Answering | Bamboo | Accuracy: 50.4 | 14 |
| Long-context Reasoning | BAMBOO 16k | AltQA Score: 41.5 | 13 |
| Expected Calibration Error | Bamboo | Expected Calibration Error: 34.01 | 10 |
| Story Point Estimation | bamboo | Spearman's Rho: 0.1768 | 3 |