VLM-CAD: VLM-Optimized Collaborative Agent Design Workflow for Analog Circuit Sizing
About
Vision Language Models (VLMs) have demonstrated remarkable potential in multimodal reasoning, yet they inherently suffer from spatial blindness and logical hallucinations when interpreting densely structured engineering content such as analog circuit schematics. To address these challenges, we propose VLM-CAD, a Vision Language Model-Optimized Collaborative Agent Design Workflow for Analog Circuit Sizing, built for robust, step-by-step reasoning over multimodal evidence. VLM-CAD bridges the modality gap with a neuro-symbolic structural parsing module, Image2Net, which transforms raw pixels into explicit topological graphs and structured JSON representations, anchoring VLM interpretation in deterministic facts. To ensure the reliability required for engineering decisions, we further propose ExTuRBO, an Explainable Trust Region Bayesian Optimization method. ExTuRBO serves as an explainable grounding engine: it employs agent-generated semantic seeds to warm-start local searches and uses Automatic Relevance Determination to provide quantified evidence for the VLM's decisions. Experimental results on two complex circuit benchmarks demonstrate that VLM-CAD significantly enhances spatial reasoning accuracy while maintaining physics-based explainability. VLM-CAD consistently satisfies complex specification requirements and achieves low power consumption, with a total runtime under 66 minutes, marking a significant step toward robust, explainable multimodal reasoning in specialized technical domains.
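To make the Image2Net idea concrete, the sketch below shows how a structured JSON netlist can be turned into an explicit net-to-device adjacency map, i.e. a topological graph a VLM agent could reason over. This is a minimal illustration under assumed field names (`devices`, `nets`, `terminals`); it is not the paper's actual Image2Net schema or implementation.

```python
import json

# Hypothetical structured-JSON output of a schematic parser such as
# Image2Net. All field names and the tiny two-transistor circuit are
# illustrative assumptions, not the paper's real format.
netlist_json = json.dumps({
    "devices": [
        {"name": "M1", "type": "nmos",
         "terminals": {"g": "vin", "d": "n1", "s": "gnd"}},
        {"name": "M2", "type": "pmos",
         "terminals": {"g": "vbias", "d": "n1", "s": "vdd"}},
    ],
    "nets": ["vin", "vbias", "vdd", "gnd", "n1"],
})

def to_graph(netlist):
    """Build a net -> [(device, terminal)] adjacency map, making the
    circuit topology explicit instead of implicit in pixels."""
    graph = {net: [] for net in netlist["nets"]}
    for dev in netlist["devices"]:
        for term, net in dev["terminals"].items():
            graph[net].append((dev["name"], term))
    return graph

graph = to_graph(json.loads(netlist_json))
print(graph["n1"])  # devices whose terminals share net n1
```

Grounding the agent in such a deterministic graph means questions like "which devices share node n1?" are answered by lookup rather than by visual inference, which is where spatial blindness would otherwise creep in.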
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Amplifier Optimization | Amplifier with a complementary input and a class-AB output stage, 180nm PTM node 1.0 | -- | 3 |
| Amplifier Optimization | Amplifier with a complementary input and a class-AB output stage, 90nm PTM node 1.0 | -- | 3 |
| Circuit Optimization | Two-stage Miller operational amplifier, 45nm technology node | -- | 3 |