SOTA Non-Agentic Performance Evaluation benchmarks and papers with code | Wizwand
Non-Agentic Performance Evaluation
Benchmarks
| Dataset Name | SOTA Method | Metric | Results | Last Updated |
| --- | --- | --- | --- | --- |
| Fortress (test) | Llama 4 Maverick | Mean Score: 78.75 | 4 | 1mo ago |
| HarmBench (test) | Llama 4 Maverick | Mean Score: 73.92 | 4 | 1mo ago |
| Persuade (test) | Gemini 2.5 Pro | Mean Score: 53.2 | 4 | 1mo ago |