
TRAIL

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Error category prediction | TRAIL Planning and Reasoning categories (117 traces) | Micro F1: 49.7 | 6
Multi-agent recommendation | trail-benchmark | Top-1 Accuracy: 100 | 4
Single-agent tool selection | TRAIL | Top-1 Accuracy: 98.15 | 4