RPEval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Discriminative Reasoning | RPEval Multi-MICRO Explicit Memory | IA Score: 0.71 | 16 |
| Discriminative Reasoning | RPEval Multi-MACRO, Explicit Memory | IA Score: 0.4 | 16 |
| Discriminative Reasoning | RPEval Single, Explicit Memory | Ignorance Score: 0.02 | 16 |
| Discriminative Task | RPEval Implicit Memory, Multi-Preference | IA (Macro): 0.44 | 16 |
| Discriminative Task | RPEval Implicit Memory, Single-Preference | Ignorance Score: 0.02 | 16 |
| Personalized Response Generation | RPEVAL | Macro Accuracy: 24 | 4 |