Functional Diversity Evaluation on Functional Diversity Taxonomy Study Category F
[Chart: Spearman Correlation over time. Highlighted entry: Gemini-2.5-Flash, 0.87, Sep 25, 2025. Available metrics: Spearman Correlation, Inter-Rater Agreement.]
Updated 1mo ago
Evaluation Results
| Method | Method details | Date | Spearman Correlation | Inter-Rater Agreement |
|---|---|---|---|---|
| Gemini-2.5-Flash | Taxonomy-guided=true | 2025.09 | 0.87 | - |
| Claude-4-Sonnet | Taxonomy-guided=true | 2025.09 | 0.87 | - |
| Embedding Diversity | | 2025.09 | 0.79 | - |
| GPT-4o | Taxonomy-guided=true | 2025.09 | 0.76 | - |
| Compression Diversity | | 2025.09 | 0.72 | - |
| Vocabulary Diversity | | 2025.09 | 0.69 | - |
| Novelty-Bench Functional Diversity | | 2025.09 | 0.65 | - |
| Human Inter-Rater Agreement | | 2025.09 | - | 0.85 |
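
The page does not say how the Spearman Correlation column is computed. As a rough illustration only, the sketch below assumes each method assigns a diversity score to a set of items and that those scores are compared by rank against human ratings of the same items. The function names `average_ranks` and `spearman`, and the sample numbers, are hypothetical and not taken from the benchmark.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical data: automated diversity scores and human ratings for 5 items.
metric_scores = [0.2, 0.5, 0.4, 0.9, 0.7]
human_ratings = [1, 2, 3, 5, 4]
rho = spearman(metric_scores, human_ratings)
print(f"Spearman correlation: {rho:.2f}")
```

Because only ranks matter, a method scores well under this measure when it orders items the way the human raters do, even if its raw scores are on a different scale.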