SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions

About

Instruction finetuning is a popular paradigm for aligning large language models (LLMs) with human intent. Despite its popularity, this idea is less explored for aligning existing foundation models with scientific disciplines, concepts and goals. In this work, we present SciTune, a tuning framework that improves the ability of LLMs to follow scientific multimodal instructions. To test our methodology, we use a human-generated scientific instruction tuning dataset and train a large multimodal model, LLaMA-SciTune, that connects a vision encoder and an LLM for science-focused visual and language understanding. Compared with models finetuned on machine-generated data only, LLaMA-SciTune surpasses human performance on average and in many sub-categories on the ScienceQA benchmark.

Sameera Horawalavithana, Sai Munikoti, Ian Stewart, Henry Kvinge • 2023

Related benchmarks

Task                                   Dataset                Metric                                 Result  Rank
Science Question Answering             ScienceQA (test)       Average Accuracy                       86.11   208
Multimodal Science Question Answering  ScienceQA v1.0 (test)  Accuracy (Natural Language Component)  89.3    31
Figure Captioning                      SciCap SciTune info    BLEU                                   6.4     2
