PRISM

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Preference Prediction | PRISM (test) | Accuracy 66.62 | 51 |
| Personalization | PRISM | Personalization Win Rate 81.62 | 45 |
| Personalized Reward Modeling | PRISM Personalized | Accuracy 68.06 | 44 |
| Cultural Alignment | Prism | Rating 4.627 | 24 |
| LLM-as-a-judge | PRISM | Accuracy 59.38 | 20 |
| Preference Alignment | PRISM | Win-Rate (DPO) 74.5 | 20 |
| text-to-image generation | PRISM | Alignment Score 87.1 | 14 |
| LLM as a Judge | PRISM (test) | Accuracy 58.9 | 14 |
| Emotion and Micro-expression Analysis | PRISM | Macro-expression Accuracy 80.2 | 13 |
| Phone Recognition | PRiSM Multilingual Datasets | PFER (DRC) 16.8 | 12 |
| Phone Recognition | PRiSM Accented English Datasets | PFER (Timing) 8.3 | 12 |
| Personalized Reward Modeling | PRISM Overall | User-level Accuracy 65.3 | 11 |
| Personalized Reward Modeling | PRISM Unseen | User-level Accuracy 0.652 | 11 |
| Personalized Reward Modeling | PRISM Seen | User-level Accuracy 65.3 | 11 |
| Pluralistic Reward Model Learning | PRISM | Accuracy 59.6 | 10 |
| Preference Alignment Evaluation | PRISM (test) | BT Score (Mean) 0.331 | 10 |
| User Simulation Intrinsic Evaluation | PRISM | First-Turn Diversity 94.55 | 8 |
| Population property estimation | PRISM | Bias (MAE) 0.46 | 8 |
| Reward Modeling | PRISM Overall | Accuracy 61.8 | 7 |
| Reward Modeling | PRISM Unseen | Accuracy 61.6 | 7 |
| Reward Modeling | PRISM Seen | Accuracy 62.1 | 7 |
| Full-body motion estimation | PRISM (test) | PA-MPJPE 31.29 | 6 |
| Systems Optimization | PRISM | Final Score 26.26 | 5 |
| Model Selection Evaluation | PRISM | Actual Score (per type) 93.2 | 5 |
| Preference Alignment | PRISM 1.0 (test) | Borda Average 2.393 | 5 |
Showing 25 of 30 rows