Human Ranking on Main Experiment
[Chart: Mean Rank by method. Highlighted entry: WAG, Mean Rank 1.37, Apr 10, 2026.]
Evaluation Results
Method  Evaluator  Date     Mean Rank
WAG     E1         2026.04  1.37
WAG     LLM        2026.04  1.41
WAG     Human Avg  2026.04  1.47
WAG     E2         2026.04  1.49
WAG     E3         2026.04  1.56
Rag     E3         2026.04  1.8
Rag     E2         2026.04  1.9
Rag     Human Avg  2026.04  1.9
Rag     LLM        2026.04  1.98
Rag     E1         2026.04  2.01
Base    E2         2026.04  2.33
Base    E3         2026.04  2.41
Base    Human Avg  2026.04  2.45
Base    E1         2026.04  2.6
Base    LLM        2026.04  2.61
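
The page does not say how the Human Avg rows are derived. One plausible reading, offered here only as an assumption, is that each is the arithmetic mean of the three individual evaluators (E1, E2, E3) for that method. The sketch below checks that reading against the listed values; the names `human_ranks` and `human_avg` are hypothetical and used only for illustration.

```python
# Mean ranks from the individual human evaluators, copied from the results above.
human_ranks = {
    "WAG": {"E1": 1.37, "E2": 1.49, "E3": 1.56},
    "Rag": {"E1": 2.01, "E2": 1.90, "E3": 1.80},
    "Base": {"E1": 2.60, "E2": 2.33, "E3": 2.41},
}

# Human Avg values as reported on the page.
reported_human_avg = {"WAG": 1.47, "Rag": 1.90, "Base": 2.45}


def human_avg(ranks):
    """Arithmetic mean over evaluators (assumed definition, not confirmed by the page)."""
    return sum(ranks.values()) / len(ranks)


for method, ranks in human_ranks.items():
    computed = human_avg(ranks)
    # Compare at the two-decimal precision the page reports.
    matches = abs(computed - reported_human_avg[method]) < 0.005
    print(f"{method}: computed={computed:.4f} reported={reported_human_avg[method]} match={matches}")
```

Under this assumption all three computed means agree with the reported Human Avg values at two-decimal precision, which supports the reading but does not confirm it.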