Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

About

Using vision-language models (VLMs) in web development presents a promising strategy to increase efficiency and enable no-code solutions: given a screenshot or a sketch of a UI, a VLM could generate the code to reproduce it, for instance in a language like HTML. Despite the advancements in VLMs for various tasks, the specific challenge of converting a screenshot into its corresponding HTML has been minimally explored. We posit that this is mainly due to the absence of a suitable, high-quality dataset. This work introduces WebSight, a synthetic dataset consisting of 2 million pairs of HTML code and corresponding screenshots. We fine-tune a foundational VLM on our dataset and show proficiency in converting webpage screenshots to functional HTML code. To accelerate research in this area, we open-source WebSight.

Hugo Laurençon, Léo Tronchon, Victor Sanh • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Screenshot-to-code | Design2Code | Block-Match | 55.9 | 20 |
| Widget Reconstruction | Widget2Code (test) | Margin Score | 32.99 | 13 |
| Design-to-code generation | Design2Code | SSIM | 75.1 | 7 |
| UI-to-Code | Design2Code (test) | CLIP Similarity | 0.812 | 6 |
