An Exploration-Analysis-Disambiguation Reasoning Framework for Word Sense Disambiguation with Low-Parameter LLMs

About

Word Sense Disambiguation (WSD) remains a key challenge in Natural Language Processing (NLP), especially for rare or domain-specific senses that are often misinterpreted. While modern high-parameter Large Language Models (LLMs) such as GPT-4-Turbo have shown state-of-the-art WSD performance, their computational and energy demands limit scalability. This study investigates whether low-parameter LLMs (<4B parameters) can achieve comparable results through fine-tuning strategies that emphasize reasoning-driven sense identification. Using the FEWS dataset augmented with semi-automated, rationale-rich annotations, we fine-tune eight small-scale open-source LLMs (e.g., Gemma and Qwen). Our results reveal that Chain-of-Thought (CoT)-based reasoning combined with neighbour-word analysis achieves performance comparable to GPT-4-Turbo in zero-shot settings. Importantly, the Gemma-3-4B and Qwen-3-4B models consistently outperform all medium-parameter baselines and state-of-the-art models on FEWS, with robust generalization to unseen senses. Furthermore, evaluation on the unseen "Fool Me If You Can" dataset confirms strong cross-domain adaptability without task-specific fine-tuning. This work demonstrates that, with carefully crafted reasoning-centric fine-tuning, low-parameter LLMs can deliver accurate WSD while substantially reducing computational and energy demands.
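The abstract names CoT reasoning combined with neighbour-word analysis but does not give a prompt format. The Python sketch below shows one way a prompt following the exploration, analysis and disambiguation stages could be assembled for a small LLM, and how a final sense choice could be read back. Everything here is a hypothetical illustration, not the paper's implementation: the `WSDInstance` class, the three-word neighbour window, the wording of each stage, and the `Answer: N` output convention are all assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class WSDInstance:
    # Hypothetical container: a tokenised sentence, the index of the
    # ambiguous target word, and candidate sense glosses (e.g. from FEWS).
    tokens: list
    target_index: int
    candidate_senses: list


def neighbour_words(tokens, target_index, window=3):
    """Return up to `window` tokens on each side of the target (assumed window size)."""
    left = tokens[max(0, target_index - window):target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    return left, right


def build_prompt(inst, window=3):
    """Assemble a hypothetical exploration -> analysis -> disambiguation prompt."""
    left, right = neighbour_words(inst.tokens, inst.target_index, window)
    target = inst.tokens[inst.target_index]
    senses = "\n".join(
        f"{i}. {gloss}" for i, gloss in enumerate(inst.candidate_senses, start=1)
    )
    # Mark the target inside its local window so the model can focus on neighbours.
    context = " ".join(left + [f"[{target}]"] + right)
    return (
        f"Sentence: {' '.join(inst.tokens)}\n"
        f"Target word: {target}\n"
        f"Neighbour words: {context}\n"
        f"Candidate senses:\n{senses}\n"
        "Exploration: describe what each candidate sense would mean here.\n"
        "Analysis: explain which neighbour words support or rule out each sense.\n"
        "Disambiguation: end with 'Answer: <sense number>'."
    )


def parse_answer(text, n_senses):
    """Return the 0-based sense index from the last 'Answer: N', or None if absent/invalid."""
    matches = re.findall(r"Answer:\s*(\d+)", text)
    if not matches:
        return None
    k = int(matches[-1])  # the last answer wins, since CoT text may mention numbers earlier
    return k - 1 if 1 <= k <= n_senses else None
```

For "He sat on the bank of the river" with `target_index=4`, the neighbour window is "sat on the [bank] of the river". Taking the last `Answer:` match is a design choice to avoid picking up sense numbers mentioned during the model's intermediate reasoning.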

Deshan Sumanathilaka, Nicholas Micallef, Julian Hough • 2026

Related benchmarks

Task	Dataset	Metric	Result
Word Sense Disambiguation	42D	F1 score	78.48	19
Word Sense Disambiguation	hardEN	F1 score	54.19	19
Word Sense Disambiguation	FEWS (test)	F1 score	76.52	19
Word Sense Disambiguation	FEWS	Noun WSD accuracy	81	12
Binary classification of sense ID	Fool Me If You Can (Set 4)	F1 score	85.2	10
Binary classification of sense ID	Fool Me If You Can (Set 1)	F1 score	97	10
Binary classification of sense ID	Fool Me If You Can (Set 2)	F1 score	97.2	10
Binary classification of sense ID	Fool Me If You Can (Set 3)	F1 score	84.7	10
