
Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction

About

Aspect sentiment quad prediction (ASQP) facilitates a detailed understanding of opinions expressed in a text by identifying the opinion term, aspect term, aspect category and sentiment polarity for each opinion. However, annotating a full set of training examples to fine-tune models for ASQP is a resource-intensive process. In this study, we explore the capabilities of large language models (LLMs) for zero- and few-shot learning on the ASQP task across five diverse datasets. We report F1 scores almost up to par with those obtained with state-of-the-art fine-tuned models and exceeding previously reported zero- and few-shot performance. In the 20-shot setting on the Rest16 restaurant domain dataset, LLMs achieved an F1 score of 51.54, compared to 60.39 by the best-performing fine-tuned method MVP. Additionally, we report the performance of LLMs in target aspect sentiment detection (TASD), where the F1 scores were close to fine-tuned models, achieving 68.93 on Rest16 in the 30-shot setting, compared to 72.76 with MVP. While human annotators remain essential for achieving optimal performance, LLMs can reduce the need for extensive manual annotation in ASQP tasks.
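The F1 scores reported above are typically computed as exact-match micro F1 over predicted quads. As a minimal sketch (not the paper's own evaluation code; the example quads and category labels are illustrative), this can look as follows:

```python
# Hedged sketch of exact-match micro F1 over ASQP quads.
# Each quad is (aspect term, aspect category, sentiment polarity, opinion term);
# a prediction counts as correct only if all four elements match a gold quad.

def quad_f1(gold, pred):
    """Micro-averaged F1 over exact quad matches across all sentences."""
    tp = sum(len(set(g) & set(p)) for g, p in zip(gold, pred))
    n_gold = sum(len(g) for g in gold)
    n_pred = sum(len(p) for p in pred)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# Illustrative example: the second prediction gets the polarity wrong,
# so only 1 of 2 quads is an exact match.
gold = [[("pizza", "food quality", "positive", "delicious")],
        [("service", "service general", "negative", "slow")]]
pred = [[("pizza", "food quality", "positive", "delicious")],
        [("service", "service general", "positive", "slow")]]
print(quad_f1(gold, pred))  # → 0.5
```

Because a single wrong element (here, the sentiment polarity) invalidates the whole quad, this metric is considerably stricter than element-wise scoring, which explains why absolute F1 scores on ASQP remain modest even for fine-tuned models.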

Nils Constantin Hellwig, Jakob Fehle, Udo Kruschwitz, Christian Wolff• 2025

Related benchmarks

| Task                             | Dataset    | Metric   | Result | Rank |
|----------------------------------|------------|----------|--------|------|
| Target Aspect Sentiment Detection | Rest16     | F1 Score | 68.53  | 31   |
| Aspect Sentiment Quad Prediction  | Rest15     | F1 Score | 41.74  | 21   |
| Aspect Sentiment Quad Prediction  | Rest16     | F1 Score | 51.1   | 21   |
| Target Aspect Sentiment Detection | Rest15     | F1 Score | 62.12  | 21   |
| Target Aspect Sentiment Detection | FlightABSA | F1 Score | 64.6   | 9    |
| Aspect Sentiment Quad Prediction  | FlightABSA | F1 Score | 48.37  | 9    |
| Target Aspect Sentiment Detection | Coursera   | F1 Score | 41.69  | 6    |
| Target Aspect Sentiment Detection | Hotels     | F1 Score | 56.51  | 6    |
