Interactive Question Answering on IQA-EVAL MMLU-derived (TextDavinci)
[Chart: scores by method, with selectable metrics Helpfulness, Fluency, # Queries, and Accuracy. Highlighted point: Human, Helpfulness 4.6, Aug 24, 2024.]
Evaluation Results

| Method | Evaluator Backbone | Date | Helpfulness | Fluency | # Queries | Accuracy |
|---|---|---|---|---|---|---|
| Human | | 2024.08 | 4.6 | 4.35 | 1.78 | 69 |
| IQA-EVAL-GPT3.5 | GPT... | 2024.08 | 4.3 | 4.47 | 1.57 | 63 |
| IQA-EVAL-Claude | Claude | 2024.08 | 4.13 | 4.47 | 2.2 | 67 |
| IQA-EVAL-GPT3.5 | GPT... | 2024.08 | 3.93 | 3.97 | 2 | 53 |
| IQA-EVAL-GPT4 | GPT-4 | 2024.08 | 3.67 | 4.77 | 1.57 | 87 |
| Human | | 2024.08 | 3.52 | 3.22 | 2.66 | 48 |
| IQA-EVAL-Claude | Claude | 2024.08 | 3 | 3.23 | 2.07 | 57 |
| IQA-EVAL-GPT4 | GPT-4 | 2024.08 | 2.1 | 3.03 | 2.37 | 67 |