
Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation

About

The rise of diet-related chronic diseases, such as obesity and diabetes, emphasizes the need for accurate monitoring of food intake. While AI-driven dietary assessment has made strides in recent years, the ill-posed problem of recovering size (portion) information from monocular images, needed to answer "how much did you eat?", remains a pressing challenge. Some 3D reconstruction methods achieve impressive geometric fidelity but fail to recover the crucial real-world scale of the reconstructed object, limiting their use in precision nutrition. In this paper, we bridge the gap between 3D computer vision and digital health by proposing a method that recovers a true-to-scale 3D reconstruction from a monocular image. Our approach leverages rich visual features extracted from models trained on large-scale datasets to estimate the scale of the reconstructed object. This learned scale enables us to convert single-view 3D reconstructions into true-to-life, physically meaningful models. Extensive experiments and ablation studies on two publicly available datasets show that our method consistently outperforms existing techniques, achieving nearly a 30% reduction in mean absolute volume-estimation error and showcasing its potential to advance precision nutrition. Code: https://gitlab.com/viper-purdue/size-matters
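The paper's full pipeline is not detailed in this abstract, but the final step it describes, applying a learned metric scale to a canonical-size single-view reconstruction and reading off a physical volume, can be sketched as below. All names here (`apply_metric_scale`, `mesh_volume`, the scale factor of 4 cm per canonical unit) are illustrative assumptions, not the authors' API; the volume formula is the standard signed-tetrahedron sum for a closed, consistently oriented triangle mesh.

```python
import numpy as np

def apply_metric_scale(vertices, scale):
    """Rescale a unit-normalized reconstruction to metric size.

    vertices: (N, 3) array from a single-view reconstruction,
              normalized to an arbitrary canonical size.
    scale: scalar metric factor (e.g. predicted by a network from
           image features) mapping canonical units to centimetres.
    """
    return vertices * scale

def mesh_volume(vertices, faces):
    """Volume of a closed triangle mesh via the signed-tetrahedron sum.

    Assumes consistent (outward) face winding; 1 cm^3 == 1 mL.
    """
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    return abs(np.einsum("ij,ij->i", v0, np.cross(v1, v2)).sum()) / 6.0

# Toy example: a unit cube at canonical scale, rescaled by a
# hypothetical predicted factor of 4 cm per canonical unit.
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                  [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]], dtype=float)
faces = np.array([[0, 2, 1], [0, 3, 2],   # bottom
                  [4, 5, 6], [4, 6, 7],   # top
                  [0, 1, 5], [0, 5, 4],   # front
                  [2, 3, 7], [2, 7, 6],   # back
                  [0, 4, 7], [0, 7, 3],   # left
                  [1, 2, 6], [1, 6, 5]])  # right

metric_verts = apply_metric_scale(verts, 4.0)
print(mesh_volume(verts, faces))         # canonical volume: 1.0
print(mesh_volume(metric_verts, faces))  # metric volume: 64.0 mL
```

Because volume scales cubically with the linear scale factor, even small scale errors inflate volume error, which is why recovering the metric scale is the crux of portion estimation.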

Gautham Vinod, Bruce Coburn, Siddeshwar Raghavan, Jiangpeng He, Fengqing Zhu • 2026

Related benchmarks

Task                Dataset                  Metric     Result   Rank
Volume Estimation   MetaFood3D               MAE (mL)   59.09    8
Volume Estimation   OmniObject3D             MAE (mL)   70.49    7
Energy Estimation   MetaFood3D 1.0 (test)    MAE        163.7    5
Volume Estimation   MetaFood3D v1.0 (test)   MAE (mL)   61.24    5
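The benchmarks above report mean absolute error (MAE) over a test set, in millilitres for volume estimation. As a minimal sketch of the metric (the numbers below are toy values, not from the paper):

```python
def mean_absolute_error(pred, true):
    """MAE: average of the absolute per-sample errors."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

# Two toy predictions vs. ground-truth volumes in mL.
print(mean_absolute_error([250.0, 100.0], [300.0, 90.0]))  # 30.0
```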
