Physics-Based Benchmarking Metrics for Multimodal Synthetic Images
About
State-of-the-art measures such as BLEU, CIDEr, VQA score, SigLIP-2, and CLIPScore often fail to capture semantic or structural accuracy, especially in domain-specific or context-dependent scenarios. To address this, the paper proposes a Physics-Constrained Multimodal Data Evaluation (PCMDE) metric that combines large language models with reasoning, knowledge-based mapping, and vision-language models. The architecture comprises three main stages: (1) extraction of spatial and semantic features through object detection and vision-language models (VLMs); (2) Confidence-Weighted Component Fusion for adaptive component-level validation; and (3) physics-guided reasoning using large language models to enforce structural and relational constraints (e.g., alignment, position, consistency).
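The abstract does not give the fusion formula, so the following is only a minimal sketch of how stages (2) and (3) might fit together. It assumes that fusion is a confidence-weighted mean of per-component match scores and that each violated physical constraint scales the result down by a fixed fraction. The names `Component`, `fuse_components`, `apply_physics_penalty`, and `pcmde_like_score`, as well as the default `penalty` value, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One detected component (hypothetical structure, not from the paper)."""
    name: str
    match_score: float  # assumed VLM agreement with the prompt, in [0, 1]
    confidence: float   # assumed detector confidence, in [0, 1]


def fuse_components(components: list[Component]) -> float:
    """Confidence-weighted mean of component match scores (assumed form)."""
    total_conf = sum(c.confidence for c in components)
    if total_conf == 0:
        return 0.0  # nothing detected with any confidence
    weighted = sum(c.match_score * c.confidence for c in components)
    return weighted / total_conf


def apply_physics_penalty(fused: float, violations: int, penalty: float = 0.1) -> float:
    """Scale the fused score down per violated constraint (assumed rule)."""
    return max(0.0, fused * (1.0 - penalty * violations))


def pcmde_like_score(components: list[Component], violations: int) -> float:
    """Combine both steps into a score on a 0-100 scale."""
    return 100.0 * apply_physics_penalty(fuse_components(components), violations)
```

In this sketch a low-confidence detection contributes little to the fused score, which is one plausible reading of "adaptive component-level validation"; the actual PCMDE weighting may differ.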
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image-Text Alignment Scoring | Car synthetic 70 images (test) | Mean Score | 79 | 5 |
| Image-Text Alignment Scoring | Animal synthetic images 70 images (test) | Mean Score | 81.6 | 5 |
| Image-Text Scoring Variability Analysis | Car image dataset 70 synthetic (test) | Range | 45.6 | 5 |
| Image-Text Alignment Scoring | Aircraft synthetic images 70 images (test) | Mean Alignment Score | 74.8 | 5 |
| Image-Text Scoring Variability Analysis | Aircraft 70 synthetic images (test) | Range | 42.5 | 5 |
| Image-Text Scoring Variability Analysis | Animal image dataset 70 synthetic images (test) | Range | 42 | 5 |
| Physical Plausibility Assessment | PCMDE Human Agreement Aircraft (case-study) | PCMDE Human Matches Rate | 67 | 1 |
| Physical Plausibility Assessment | PCMDE Human Agreement Car (case-study) | PCMDE Matches Accuracy | 66 | 1 |
| Physical Plausibility Assessment | PCMDE Human Agreement (Animal case-study) | PCMDE–Human Matches Rate | 67 | 1 |
| Physical Plausibility Assessment | PCMDE Human Agreement (Overall Case-study) | PCMDE Human Matches Count | 200 | 1 |
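The page does not define the metrics in the table. The sketch below shows one plausible way such summary figures could be computed, assuming that "Range" means the maximum score minus the minimum score over a dataset and that a "Matches Rate" is the percentage of images on which the metric's verdict agrees with a human verdict. The function names and the boolean verdict encoding are hypothetical.

```python
from statistics import mean


def summarize_scores(scores: list[float]) -> dict[str, float]:
    """Mean and range (max minus min, an assumed definition) of per-image scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    return {"mean": mean(scores), "range": max(scores) - min(scores)}


def match_rate(metric_verdicts: list[bool], human_verdicts: list[bool]) -> float:
    """Percentage of images where metric and human verdicts agree (assumed definition)."""
    if not metric_verdicts or len(metric_verdicts) != len(human_verdicts):
        raise ValueError("verdict lists must be non-empty and of equal length")
    matches = sum(m == h for m, h in zip(metric_verdicts, human_verdicts))
    return 100.0 * matches / len(metric_verdicts)
```

For example, `summarize_scores` applied to the 70 per-image scores of one dataset would yield one "Mean Score" and one "Range" entry under these assumptions.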