QNP Abstraction Generation on Delivery Domain (evaluation)
[Chart: Coverage Rate over time. Labeled point: GPT, Coverage Rate 100, Feb 11, 2026.]
Evaluation Results
Method     Configuration               Links     Coverage Rate
GPT        Automated Debugging=true    2026.02   100
GPT        Automated Debugging=false   2026.02   70
Gemini     Automated Debugging=true    2026.02   50
Gemini     Automated Debugging=false   2026.02   50
DeepSeek   Automated Debugging=true    2026.02   50
DeepSeek   Automated Debugging=false   2026.02   0
Qwen       Automated Debugging=true    2026.02   0
Qwen       Automated Debugging=false   2026.02   0