
Combined Datasets

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
--- | --- | --- | ---
Trajectory Reconstruction | Combined Datasets All | Manual Interventions: 1,202 | 2
Fact-checking Explanation Generation | Combined Datasets (Overall) | Helpfulness Score: 73 | 2