David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design?

About

Large Language Model (LLM) inference demands massive compute and energy, making domain-specific tasks expensive and unsustainable. As foundation models keep scaling, we ask: is bigger always better for hardware design? Our work tests this by evaluating Small Language Models coupled with a curated agentic AI framework on NVIDIA's Comprehensive Verilog Design Problems (CVDP) benchmark. Results show that agentic workflows, through task decomposition, iterative feedback, and correction, not only unlock near-LLM performance at a fraction of the cost but also create learning opportunities for agents, paving the way for efficient, adaptive solutions in complex design tasks.
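To make the phrase "task decomposition, iterative feedback, and correction" concrete, here is a minimal Python sketch of such a loop. It is an illustration under stated assumptions, not the authors' framework: `solve_with_feedback`, `toy_generate`, and `toy_check` are hypothetical names, the generator stands in for a small language model, and the checker stands in for a simulator or linter.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    code: str       # last candidate produced for the subtask
    passed: bool    # whether the checker accepted it
    feedback: str   # checker message for that candidate


def solve_with_feedback(
    subtasks: List[str],
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> List[Attempt]:
    """Run a generate/check/retry loop over already-decomposed subtasks."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    results: List[Attempt] = []
    for subtask in subtasks:
        feedback: Optional[str] = None
        for _ in range(max_rounds):
            code = generate(subtask, feedback)   # draft, or repair using feedback
            passed, message = check(code)        # e.g. lint or simulate
            attempt = Attempt(code, passed, message)
            if passed:
                break
            feedback = message                   # feed the error into the next round
        results.append(attempt)
    return results


# Hypothetical stand-ins so the sketch runs without a model or simulator.
def toy_generate(subtask: str, feedback: Optional[str]) -> str:
    # First draft contains a typo; any retry with feedback fixes it.
    if feedback is None:
        return f"module {subtask}; endmodul"
    return f"module {subtask}; endmodule"


def toy_check(code: str) -> Tuple[bool, str]:
    if code.endswith("endmodule"):
        return True, "ok"
    return False, "missing 'endmodule'"


attempts = solve_with_feedback(["adder", "counter"], toy_generate, toy_check)
```

In this toy run each subtask fails its first check and passes on the second round, which is the correction behavior the abstract describes in miniature.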

Shashwat Shankar, Subhranshu Pandey, Innocent Dengkhw Mochahari, Bhabesh Mali, Animesh Basak Chowdhury, Sukanta Bhattacharjee, Chandan Karfa • 2025

Related benchmarks

Task	Dataset	Result
cid004: RTL – Code Modification	CVDP non-agentic 1.0	Pass@1: 20	13
cid002: RTL – Code Completion	CVDP non-agentic 1.0	Pass@1: 24.47	10
cid003: RTL – Natural Language Specification to Code	CVDP non-agentic 1.0	Pass@1: 30.77	10
cid007: RTL – Code Improvement (Linting / QoR)	CVDP non-agentic 1.0	Pass@1: 51.25	10
cid016: Design Verification – Debugging / Bug Fixing	CVDP non-agentic 1.0	Pass@1: 22.86	10
Code comprehension	CVDP (Comprehensive Verilog Design Problems) (test)	Total Passes: 61	8
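
The Pass@1 results above are presumably percentages of problems solved by a single sampled solution. The page does not say how they were computed; the sketch below assumes the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples per problem and c the number that pass. The function names are illustrative.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples has no correct one.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems, as a percentage. results holds (n, c) pairs."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Made-up example data: three problems, five samples each.
example = [(5, 1), (5, 0), (5, 5)]
score = benchmark_pass_at_k(example, k=1)
```

For k = 1 the estimator reduces to c / n per problem, so the benchmark score is simply the average fraction of passing samples.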
