
Physics-Based Benchmarking Metrics for Multimodal Synthetic Images

About

State-of-the-art measures such as BLEU, CIDEr, VQA score, SigLIP-2, and CLIPScore often fail to capture semantic or structural accuracy, especially in domain-specific or context-dependent scenarios. To overcome these limitations, this paper proposes a Physics-Constrained Multimodal Data Evaluation (PCMDE) metric that combines large language models with reasoning, knowledge-based mapping, and vision-language models (VLMs). The architecture comprises three main stages: (1) extraction of spatial and semantic information as multimodal features through object detection and VLMs; (2) Confidence-Weighted Component Fusion for adaptive component-level validation; and (3) physics-guided reasoning using large language models to enforce structural and relational constraints (e.g., alignment, position, consistency).
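The abstract does not give the fusion formula, so the following is only a minimal sketch of what a confidence-weighted fusion step followed by a physics-constraint penalty could look like. It assumes each detected component carries a validation score and a detector confidence, both in [0, 1], and that fusion is a confidence-weighted average. The names `ComponentScore`, `fuse_components`, `apply_constraint_penalty`, and the `penalty` parameter are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ComponentScore:
    """Hypothetical per-component record (not the paper's data model)."""
    name: str          # e.g. "wheel", "wing"
    score: float       # component-level validation score in [0, 1]
    confidence: float  # detector/VLM confidence in [0, 1]


def fuse_components(components: Sequence[ComponentScore]) -> float:
    """Confidence-weighted average of component scores (assumed form)."""
    total_conf = sum(c.confidence for c in components)
    if total_conf == 0:
        # No confident evidence at all: return the lowest score.
        return 0.0
    return sum(c.score * c.confidence for c in components) / total_conf


def apply_constraint_penalty(fused: float, violations: int,
                             penalty: float = 0.1) -> float:
    """Subtract an assumed fixed penalty per violated physical constraint,
    clamped so the result never drops below zero."""
    return max(0.0, fused - penalty * violations)


if __name__ == "__main__":
    parts = [
        ComponentScore("wheel", score=0.9, confidence=0.8),
        ComponentScore("door", score=0.6, confidence=0.4),
    ]
    fused = fuse_components(parts)
    print(apply_constraint_penalty(fused, violations=1))
```

Weighting by confidence means that a low-confidence detection contributes little to the fused score, which is one plausible reading of "adaptive component-level validation"; the paper itself should be consulted for the actual weighting scheme and for how the physics-guided reasoning stage adjusts the score.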

Kishor Datta Gupta, Marufa Kamal, Md. Mahfuzur Rahman, Fahad Rahman, Mohd Ariful Haque, Sunzida Siddique • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image-Text Alignment Scoring | Car synthetic 70 images (test) | Mean Score: 79 | 5 |
| Image-Text Alignment Scoring | Animal synthetic images 70 images (test) | Mean Score: 81.6 | 5 |
| Image-Text Scoring Variability Analysis | Car image dataset 70 synthetic (test) | Range: 45.6 | 5 |
| Image-Text Alignment Scoring | Aircraft synthetic images 70 images (test) | Mean Alignment Score: 74.8 | 5 |
| Image-Text Scoring Variability Analysis | Aircraft 70 synthetic images (test) | Range: 42.5 | 5 |
| Image-Text Scoring Variability Analysis | Animal image dataset 70 synthetic images (test) | Range: 42 | 5 |
| Physical Plausibility Assessment | PCMDE Human Agreement Aircraft (case-study) | PCMDE Human Matches Rate: 67 | 1 |
| Physical Plausibility Assessment | PCMDE Human Agreement Car (case-study) | PCMDE Matches Accuracy: 66 | 1 |
| Physical Plausibility Assessment | PCMDE Human Agreement (Animal case-study) | PCMDE–Human Matches Rate: 67 | 1 |
| Physical Plausibility Assessment | PCMDE Human Agreement (Overall Case-study) | PCMDE Human Matches Count: 200 | 1 |
