
In-domain datasets

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| LLM Routing | In-domain datasets Cost First, alpha=0.8 | Accuracy: 93 | 11 |
| LLM Routing | In-domain datasets Balance, alpha=0.5 | Accuracy: 93 | 11 |
| LLM Routing | In-domain datasets Performance First, alpha=0.2 | Accuracy: 93 | 11 |
| Claim Verification | 9 In-Domain datasets (FEVER, ClaimDecomp, HoVer, FEVEROUS, WiCE, Ex-FEVER, PubHealth, PubMedClaim, FoolMeTwice) | FEVER Accuracy: 74.1 | 6 |