Translation on Translation misgendering evaluation set into English zero-shot SynthBio v3 (test)
[Chart: Accuracy (Overall) over time. Highlighted result: PaLM 540B, 97.2, May 17, 2023. Selectable metrics: Accuracy (Overall), Accuracy ('he'), Accuracy ('she'), Accuracy (Language Worst Case), Accuracy (Eval Set Worst Case), Accuracy (Disaggregated Worst Case).]
Updated 4d ago
Evaluation Results
| Method | Links | Accuracy (Overall) | Accuracy ('he') | Accuracy ('she') | Accuracy (Language Worst Case) | Accuracy (Eval Set Worst Case) | Accuracy (Disaggregated Worst Case) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PaLM 540B (mode=zero-shot) | 2023.05 | 97.2 | 100 | 94.4 | 86.4 | 91.4 | 50 |
| PaLM 2 (L) (mode=zero-shot, checkp...) | 2023.05 | 97.2 | 99.9 | 94.5 | 87.3 | 89.5 | 58.7 |