Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction
About
Aspect sentiment quad prediction (ASQP) enables a detailed understanding of the opinions expressed in a text by identifying the opinion term, aspect term, aspect category and sentiment polarity of each opinion. However, annotating a full set of training examples to fine-tune models for ASQP is resource-intensive. In this study, we explore the capabilities of large language models (LLMs) for zero- and few-shot learning on the ASQP task across five diverse datasets. We report F1 scores that approach those of state-of-the-art fine-tuned models and exceed previously reported zero- and few-shot performance. In the 20-shot setting on the Rest16 restaurant domain dataset, LLMs achieved an F1 score of 51.54, compared to 60.39 for MVP, the best-performing fine-tuned method. Additionally, we report the performance of LLMs on target aspect sentiment detection (TASD), where the F1 scores were close to those of fine-tuned models: 68.93 on Rest16 in the 30-shot setting, compared to 72.76 with MVP. While human annotators remain essential for achieving optimal performance, LLMs can reduce the need for extensive manual annotation in ASQP tasks.
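To make the task and its metric concrete, the following is a minimal sketch of how an ASQP quad might be represented and how a micro-averaged F1 score could be computed over predicted quads. It assumes exact-match scoring (a predicted quad counts as correct only if all four elements equal a gold quad), which is a common convention but is not confirmed by the abstract. The names `Quad` and `micro_f1`, the category label and the example sentence are hypothetical and serve only as illustration.

```python
from collections import Counter
from typing import NamedTuple


class Quad(NamedTuple):
    """One opinion: the four elements named in the abstract (hypothetical container)."""
    aspect_term: str
    aspect_category: str
    opinion_term: str
    polarity: str


def micro_f1(predictions, references):
    """Micro-averaged F1 over lists of quads, one list per sentence.

    Assumption: exact match on all four elements. Duplicates are handled
    with multiset intersection, so a quad predicted twice is credited
    at most as often as it appears in the gold annotation.
    """
    tp = n_pred = n_gold = 0
    for pred, gold in zip(predictions, references):
        pred_counts, gold_counts = Counter(pred), Counter(gold)
        tp += sum((pred_counts & gold_counts).values())
        n_pred += sum(pred_counts.values())
        n_gold += sum(gold_counts.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence: "The pizza was great but the service was slow."
gold = [[
    Quad("pizza", "food quality", "great", "positive"),
    Quad("service", "service general", "slow", "negative"),
]]
pred = [[
    Quad("pizza", "food quality", "great", "positive"),
    Quad("service", "service general", "slow", "neutral"),  # wrong polarity
]]

score = micro_f1(pred, gold)  # one of two quads matches: P = R = F1 = 0.5
```

The function returns a value between 0 and 1; the scores quoted in the abstract appear to be on a 0 to 100 scale, so they would correspond to this value multiplied by 100. For TASD, the same sketch would apply to triplets by dropping the opinion term.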
Related benchmarks
| Task | Dataset | F1 score | Rank |
|---|---|---|---|
| Aspect Sentiment Quad Prediction | Rest15 | 52.07 | 93 |
| Aspect Sentiment Quad Prediction | Rest16 | 58.79 | 93 |
| Target Aspect Sentiment Detection | Rest15 | 65.94 | 63 |
| Target Aspect Sentiment Detection | Rest16 | 72.15 | 42 |
| Target Aspect Sentiment Detection | FlightABSA | 68.11 | 32 |
| Target Aspect Sentiment Detection | Rest 2016 | 68.53 | 31 |
| Target Aspect Sentiment Detection | Coursera | 46.83 | 29 |
| Target Aspect Sentiment Detection | Hotels | 66.92 | 29 |
| Aspect Sentiment Quad Prediction | FlightABSA | 56.9 | 23 |
| Aspect Sentiment Quad Prediction | Coursera | 32.02 | 23 |