
Job Skill Extraction via LLM-Centric Multi-Module Framework

About

Span-level skill extraction from job advertisements underpins candidate-job matching and labor-market analytics, yet generative large language models (LLMs) often yield malformed spans, boundary drift, and hallucinations, especially with long-tail terms and cross-domain shift. We present SRICL, an LLM-centric framework that combines semantic retrieval (SR), in-context learning (ICL), and supervised fine-tuning (SFT) with a deterministic verifier. SR pulls in-domain annotated sentences and definitions from ESCO to form format-constrained prompts that stabilize boundaries and handle coordination. SFT aligns output behavior, while the verifier enforces pairing, non-overlap, and BIO legality with minimal retries. On six public span-labeled corpora of job-ad sentences across sectors and languages, SRICL achieves substantial STRICT-F1 improvements over GPT-3.5 prompting baselines and sharply reduces invalid tags and hallucinated spans, enabling dependable sentence-level deployment in low-resource, multi-domain settings.
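As an illustration of what a deterministic check for BIO legality and non-overlap might look like, here is a minimal Python sketch. It is not the SRICL verifier, whose rules are not given on this page. The tag names (`B-SKILL`, `I-SKILL`, `O`) and the helper names `is_legal_bio` and `bio_to_spans` are assumptions made for the example.

```python
from typing import List, Tuple


def is_legal_bio(tags: List[str]) -> bool:
    """Return True if every tag is O, B-X or I-X, and each I-X continues
    a preceding B-X or I-X of the same type X (hypothetical rule set)."""
    prev = "O"
    for tag in tags:
        if tag != "O" and not tag.startswith(("B-", "I-")):
            return False  # unknown tag format
        if tag.startswith("I-") and (prev == "O" or prev[2:] != tag[2:]):
            return False  # I- tag with no matching span to continue
        prev = tag
    return True


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert a legal BIO sequence into (start, end, label) spans with an
    exclusive end index. Spans from one tag sequence cannot overlap."""
    spans: List[Tuple[int, int, str]] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # Any tag other than I- closes the currently open span.
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, label))
            start = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


tags = ["O", "B-SKILL", "I-SKILL", "O", "B-SKILL"]
if is_legal_bio(tags):
    print(bio_to_spans(tags))
```

In a pipeline of this kind, an output that fails `is_legal_bio` would be rejected and the model queried again, which is one plausible reading of "minimal retries" in the abstract.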

Guojing Li, Zichuan Fu, Junyi Li, Faxue Liu, Wenxia Zhou, Yejing Wang, Jingtong Gao, Maolin Wang, Rungen Liu, Wenlin Zhang, Xiangyu Zhao ((1) City University of Hong Kong, (2) Renmin University of China) • 2026

Related benchmarks

Task                  Dataset       Result           Rank
Job-skill Extraction  SKILLSPAN     Precision 75.22  15
Job-skill Extraction  KOMPETENCER   Precision 63.51  15
Job-skill Extraction  SAYFULLINA    Precision 78.66  15
Job-skill Extraction  FIJO          Precision 73.13  15
Job-skill Extraction  GREEN         Precision 63.84  15
Job-skill Extraction  GNEHM         Precision 50.93  15
