Instruction Following on NoveltyBench
[Chart: Lexical Dominance. Highlighted result: Aligned, 40.1 (Nov 7, 2025). Selectable metrics: Lexical Dominance, Lexical Coverage, Semantic Coverage, Semantic Dominance, Overall Coverage, Overall Dominance.]
Updated 22h ago
Evaluation Results
| Method | Date | Lexical Dominance | Lexical Coverage | Semantic Coverage | Semantic Dominance | Overall Coverage | Overall Dominance |
|---|---|---|---|---|---|---|---|
| Aligned | 2025.11 | 40.1 | 27.3 | 12.8 | 17.2 | 20 | 28.6 |
| BACO best (selection_criteria=bes...) | 2025.11 | 31 | 49.5 | 45.2 | 48.8 | 47.4 | 39.9 |
| Base | 2025.11 | 9.8 | 14.2 | 14.2 | 13.1 | 14.2 | 11.4 |
| Prompting best (selection_criteria=bes...) | 2025.11 | 8 | - | - | 6.5 | - | 7.3 |
| Nudging | 2025.11 | 6.8 | 19.2 | 16.1 | 7.6 | 17.6 | 7.2 |
| Ensemble best (selection_criteria=bes...) | 2025.11 | 3.4 | - | - | 5.8 | - | 4.6 |
| Decoding | 2025.11 | 0.8 | - | - | 1 | - | 0.9 |