AITA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Annotator Modeling | AITA situation | Accuracy 68.1 | 19 |
| Verdict Prediction | AITA 1.0 (author split) | Accuracy 85.6 | 14 |
| Judgment Prediction | AITA verdict | Accuracy 85.9 | 14 |
| Sycophancy Evaluation | AITA | Sycophancy Score (S) PD-L 0.54 | 6 |
| Causal Effect Estimation | AITA comments | ΔATE 3.43 | 6 |
| Causal Estimation | AITA anger | ΔATE 154.61 | 6 |