Open PL LLM Leaderboard

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Polish Instruction Following | Open PL LLM Leaderboard | Average Score: 69.84 | 45 |
| Large Language Model Evaluation | Open PL LLM Leaderboard instruction-tuned | Overall Average Score: 69.84 | 44 |
| Linguistic Implicatures Decoding | Open PL LLM Leaderboard Implicatures component base models | Average Score: 67.38 | 30 |