
AmazonHistoryPrice

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Bargaining | AmazonHistoryPrice CI 1.0 (test) | Count 44 | 28 |
| Bargaining | AmazonHistoryPrice ALL 1.0 (test) | Count 930 | 28 |
| Bargaining | AmazonHistoryPrice (held-out test) | Reward 0.7664 | 16 |
| Bargaining | AmazonHistoryPrice MI 1.0 (test) | Count 835 | 14 |
| Negotiation | AmazonHistoryPrice evaluated against gpt-5.4-high-reasoning seller (test) | Reward 0.408 | 7 |
| Bargaining | AmazonHistoryPrice (test) | Deal Rate 46.8 | 4 |