David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design?
About
Large Language Model (LLM) inference demands massive compute and energy, making domain-specific tasks expensive and unsustainable. As foundation models keep scaling, we ask: is bigger always better for hardware design? Our work tests this by evaluating Small Language Models coupled with a curated agentic AI framework on NVIDIA's Comprehensive Verilog Design Problems (CVDP) benchmark. Results show that agentic workflows, through task decomposition, iterative feedback, and correction, not only unlock near-LLM performance at a fraction of the cost but also create learning opportunities for agents, paving the way for efficient, adaptive solutions in complex design tasks.
Shashwat Shankar, Subhranshu Pandey, Innocent Dengkhw Mochahari, Bhabesh Mali, Animesh Basak Chowdhury, Sukanta Bhattacharjee, Chandan Karfa • 2025
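The iterative feedback and correction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' framework: `refine_until_pass`, `stub_model`, and `stub_checker` are hypothetical names, the model and the checker are stand-in functions rather than a real language model or simulator, and task decomposition is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    code: str       # candidate Verilog produced in this round
    passed: bool    # whether the checker accepted it
    feedback: str   # checker message passed to the next round


def refine_until_pass(
    task: str,
    generate: Callable[[str, str], str],
    check: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> List[Attempt]:
    """Generate, check, and regenerate with feedback until success or budget end."""
    history: List[Attempt] = []
    feedback = ""
    for _ in range(max_rounds):
        code = generate(task, feedback)
        passed, feedback = check(code)
        history.append(Attempt(code, passed, feedback))
        if passed:
            break
    return history


def stub_model(task: str, feedback: str) -> str:
    # Hypothetical stand-in for a small language model: the first draft uses
    # OR instead of AND, and the draft is corrected once feedback arrives.
    op = "&" if feedback else "|"
    return f"module and2(input a, input b, output y); assign y = a {op} b; endmodule"


def stub_checker(code: str) -> Tuple[bool, str]:
    # Hypothetical stand-in for lint or simulation against a testbench.
    if "a & b" in code:
        return True, ""
    return False, "testbench: y should equal a AND b"


history = refine_until_pass("2-input AND gate", stub_model, stub_checker)
for round_no, attempt in enumerate(history, start=1):
    print(round_no, "PASS" if attempt.passed else "FAIL", attempt.feedback)
```

In this toy run the first draft fails and the second, generated with the checker's message in hand, passes. In a real setting the checker would be a linter or simulator and each extra round would cost another model call, which is the trade-off the abstract weighs against using a larger model.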
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| cid004: RTL – Code Modification | CVDP non-agentic 1.0 | Pass@1: 20 | 13 |
| cid002: RTL – Code Completion | CVDP non-agentic 1.0 | Pass@1: 24.47 | 10 |
| cid003: RTL – Natural Language Specification to Code | CVDP non-agentic 1.0 | Pass@1: 30.77 | 10 |
| cid007: RTL – Code Improvement (Linting / QoR) | CVDP non-agentic 1.0 | Pass@1: 51.25 | 10 |
| cid016: Design Verification – Debugging / Bug Fixing | CVDP non-agentic 1.0 | Pass@1: 22.86 | 10 |
| Code comprehension | CVDP (Comprehensive Verilog Design Problems) (test) | Total Passes: 61 | 8 |
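For readers unfamiliar with the Pass@1 metric in the table, the sketch below shows the unbiased pass@k estimator commonly used in code-generation evaluation: with n samples per problem, of which c pass, pass@k = 1 - C(n-c, k) / C(n, k). The page does not state how its Pass@1 figures were computed, so treating them as this estimator is an assumption, and the function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n passes,
    given that c of the n samples pass."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# For k = 1 the estimator reduces to the plain pass fraction c / n.
print(pass_at_k(n=5, c=1, k=1))
print(pass_at_k(n=10, c=3, k=1))
```

A benchmark-level score is then typically the mean of this per-problem value over all problems in a category.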