Open-Ended Questions

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Alignment Evaluation | Open-ended questions | Win Rate: 68.9 | 6
Instruction following | Open-Ended Questions (Human Generated Questions) | Instruction Following Rate: 70.1 | 2
Instruction following | Open-Ended Questions GPT-4 Generated Questions | Instruction Following Rate: 96.9 | 2
Instruction following | Open-Ended Questions Generated Questions GPT-3.5-Turbo | Instruction Following Rate: 95.9 | 2