
Task-Dependent Evaluation of LLM Output Homogenization: A Taxonomy-Guided Framework

About

Large language models often generate homogeneous outputs, but whether this is problematic depends on the specific task. For objective math tasks, responses may vary in problem-solving strategy but should arrive at the same verifiable answer. For creative writing tasks, by contrast, we often expect variation in key narrative components (e.g., plot and setting) beyond mere vocabulary diversity. Prior work on homogenization rarely conceptualizes diversity in a task-dependent way. We address this gap with four contributions: (1) a task taxonomy with distinct notions of functional diversity, defined as whether a user would perceive two responses as meaningfully different for a given task; (2) a small user study validating that the taxonomy aligns with human perception of functional diversity; (3) a task-dependent sampling technique that increases diversity only where homogenization is undesired; (4) evidence challenging the perceived diversity-quality trade-off, showing that it may stem from mis-conceptualizing both diversity and quality in a task-agnostic way.
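To make the idea of task-dependent functional diversity concrete, here is a minimal sketch. It is not the authors' method: the function `count_functionally_diverse` and the two toy equivalence rules are hypothetical names introduced only for illustration. The sketch greedily groups responses into classes using a task-specific "same function" predicate and counts the classes, so the same set of responses can count as one functionally distinct response under one task's rule and as several under another's.

```python
from typing import Callable, List


def count_functionally_diverse(
    responses: List[str],
    same_function: Callable[[str, str], bool],
) -> int:
    """Greedily group responses and return the number of groups.

    `same_function` is a hypothetical, task-specific predicate that says
    whether a user would treat two responses as functionally the same.
    """
    representatives: List[str] = []
    for response in responses:
        # Start a new group only if no existing representative matches.
        if not any(same_function(response, rep) for rep in representatives):
            representatives.append(response)
    return len(representatives)


def same_final_answer(a: str, b: str) -> bool:
    # Toy rule for an objective math task (assumption): two responses are
    # equivalent when their last whitespace-separated token matches.
    return a.split()[-1] == b.split()[-1]


def same_text(a: str, b: str) -> bool:
    # Toy task-agnostic rule (assumption): equivalent only if the
    # normalized text is identical.
    return a.strip().lower() == b.strip().lower()


math_responses = [
    "By factoring, x = 3",
    "Using the quadratic formula, x = 3",
    "Guess and check gives x = 3",
]

print(count_functionally_diverse(math_responses, same_final_answer))
print(count_functionally_diverse(math_responses, same_text))
```

Under the answer-based rule the three responses collapse into a single group, while the text-based rule treats each as distinct. In practice the predicate would more plausibly be a human or model judgment than a string comparison; the string rules here only keep the example self-contained.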

Shomik Jain, Jack Lanchantin, Maximilian Nickel, Candace Ross, Karen Ullrich, Ashia Wilson, Jamelle Watson-Daniels • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Functionally Diverse Response Generation | Task Category A | Functionally Diverse Responses Count: 3.83 | 35 |
| Quality Assessment | Checklist | Metric A: 3.76 | 35 |
| Functionally Diverse Response Generation | Functional Diversity Prompt Set Categories A-H 1.0 (test) | Score Category A: 4.26 | 21 |
| Correlation with Human Majority Vote | Novelty-Bench Human Majority Vote | Spearman Correlation (All): 0.88 | 7 |
| Functional Diversity Evaluation | Functional Diversity Taxonomy Study All Categories | Spearman Correlation: 0.94 | 7 |
| Functional Diversity Evaluation | Functional Diversity Taxonomy Study Category A | Spearman Correlation: 1 | 7 |
| Functional Diversity Evaluation | Functional Diversity Taxonomy Study Category B | Spearman Correlation: 1 | 7 |
| Functional Diversity Evaluation | Functional Diversity Taxonomy Study Category C | Spearman Correlation: 1 | 7 |
| Functional Diversity Evaluation | Functional Diversity Taxonomy Study Category E | Spearman Correlation: 1 | 7 |
| Functional Diversity Evaluation | Functional Diversity Taxonomy Study Category F | Spearman Correlation: 0.87 | 7 |
Showing 10 of 19 rows
