
PRISM

Benchmarks

Task Name                              Dataset Name                  Metric                     SOTA Result  Trend
Preference Prediction                  PRISM (test)                  Accuracy                   66.62        51
Personalized Reward Modeling           PRISM Personalized            Accuracy                   68.06        44
Preference Alignment                   PRISM                         Win-Rate (DPO)             74.5         20
Text-to-Image Generation               PRISM                         Alignment Score            87.1         14
LLM as a Judge                         PRISM (test)                  Accuracy                   58.9         14
Emotion and Micro-expression Analysis  PRISM                         Macro-expression Accuracy  80.2         13
Personalized Reward Modeling           PRISM Overall                 User-level Accuracy        65.3         11
Personalized Reward Modeling           PRISM Unseen                  User-level Accuracy        0.652        11
Personalized Reward Modeling           PRISM Seen                    User-level Accuracy        65.3         11
Preference Alignment Evaluation        PRISM (test)                  BT Score (Mean)            0.331        10
Model Selection Evaluation             PRISM                         Actual Score (per type)    93.2         5
Preference Alignment                   PRISM 1.0 (test)              Borda Average              2.393        5
Preference Alignment                   PRISM normalized-step (test)  Borda Avg                  2.328        5
Preference Alignment                   PRISM 1.0 (full)              Borda Avg Score            2.459        5
Consensus Ranking                      PRISM Llama-3.2-1B            Exact Match                94           1