Semantic Parsing on GeoQuery
Chart: Template Accuracy. Highlighted result: GRPO-Composite, 77.9 (May 6, 2026).
Available metrics: Template Accuracy, Execution Accuracy, Length Accuracy
Updated 27d ago
Evaluation Results
| Method | Configuration | Date | Template Accuracy | Execution Accuracy | Length Accuracy |
|---|---|---|---|---|---|
| GRPO-Composite | Base Model=Qwen-2.5-7B... | 2026.05 | 77.9 | - | 33.76 |
| GRPO-Binary | Base Model=Qwen-2.5-7B... | 2026.05 | 75.27 | - | 32.14 |
| SFT | Base Model=Qwen-2.5-7B... | 2026.05 | 73.65 | - | 28.21 |
| GRPO-Binary | Base Model=Llama-3.1-8... | 2026.05 | 52.48 | - | 45.92 |
| GRPO-Composite | Base Model=Llama-3.1-8... | 2026.05 | 52.07 | - | 46.17 |
| SFT | Base Model=Llama-3.1-8... | 2026.05 | 50.36 | - | 43.21 |
| Previous Work | # ICL examples=32, # r... | 2023.05 | - | 86.1 | - |
| Standard Prompting | # ICL examples=32, # r... | 2023.05 | - | 96.8 | - |
| Grammar Prompting | # ICL examples=32, # r... | 2023.05 | - | 97.9 | - |
| Grammar Prompting (w. oracle grammar) | # ICL examples=32, # r... | 2023.05 | - | 98.6 | - |