
T3

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Safety | T3 | T3 Score | 85.1 | 21 |
| Clustering | T3 | Clustering Accuracy (CA) | 58.63 | 12 |
| Clustering | T3 | ARI | 0.0338 | 12 |
| Workflow Execution | T3 Payment | TSR | 100 | 11 |
| Oracle Matching | T3 update events | Oracle Match (%) | 75 | 5 |
| Stitched image rectangling | T3 (test) | PSNR | 25.1 | 4 |
| Research Assistant | T3 Research 1.0 (test) | Task Completion Rate | 88 | 4 |
| Task T3 | T3 | Token Usage (Input + Output) | 2,156 | 4 |
| Predictive Modeling | T3 | Loss | 0.063 | 3 |