
Template T3

Benchmarks

Task Name     Dataset Name                          SOTA Result       Trend
Helpfulness   Template T3 GPT-4 evaluation (test)   Win Rate: 91.62   5
Harmlessness  Template T3 GPT-4 evaluation (test)   Win Rate: 87.5    5