
TaoSR1: The Thinking Model for E-commerce Relevance Search

About

Query-product relevance prediction is a core task in e-commerce search. BERT-based models excel at semantic matching but lack complex reasoning capabilities. While Large Language Models (LLMs) have been explored for this task, most approaches still use discriminative fine-tuning or distill them into smaller models for deployment. We propose a framework to directly deploy LLMs for this task, addressing key challenges: Chain-of-Thought (CoT) error accumulation, discriminative hallucination, and deployment feasibility. Our framework, TaoSR1, involves three stages: (1) Supervised Fine-Tuning (SFT) with CoT to instill reasoning; (2) offline sampling with a pass@N strategy and Direct Preference Optimization (DPO) to improve generation quality; and (3) difficulty-based dynamic sampling with Group Relative Policy Optimization (GRPO) to mitigate discriminative hallucination. Additionally, post-CoT processing and a cumulative probability-based partitioning method enable efficient online deployment. TaoSR1 significantly outperforms baselines on offline datasets and achieves substantial gains in online side-by-side human evaluations, introducing a novel paradigm for applying CoT reasoning to relevance classification.
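As a loose illustration of stages (2) and (3), a minimal sketch follows. None of this code is from the paper: the function names, data layout, and the zip-based pairing rule are assumptions. It shows the two mechanical cores the abstract names: turning a pass@N sample set into chosen/rejected pairs for DPO, and GRPO's group-relative advantage, which is just a per-group reward normalization with no learned value model.

```python
import statistics

def build_dpo_pairs(samples):
    """From N sampled generations for one query-product pair (each a dict
    with a 'cot' string and a 'correct' flag), pair each correct CoT with
    an incorrect one as (chosen, rejected). Pairing by zip order is a
    simplification for illustration, not the paper's exact strategy."""
    chosen = [s for s in samples if s["correct"]]
    rejected = [s for s in samples if not s["correct"]]
    return list(zip(chosen, rejected))

def grpo_advantages(rewards):
    """GRPO scores each sampled response relative to its own group:
    advantage = (reward - group mean) / group std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard a zero-variance group
    return [(r - mean) / std for r in rewards]
```

For example, a group of rewards `[1.0, 0.0, 1.0, 0.0]` normalizes to advantages `[1.0, -1.0, 1.0, -1.0]`: correct generations are pushed up and incorrect ones down, with the group itself serving as the baseline.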

Chenhe Dong, Shaowei Yao, Pengkun Jiao, Jianhui Yang, Yiming Jin, Zerui Huang, Xiaojiang Zhou, Dan Ou, Haihong Tang, Bo Zheng • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| E-commerce Relevance Classification | Taobao dataset (offline) | F1 Score (Class 1): 67.34 | 10 |
| Search Relevance | Manual Annotation Queries, Q&A (2,000 queries) | GSB: 16.62 | 1 |
| Search Relevance | Manual Annotation Queries, Alternative (2,000 queries) | GSB: 34.43 | 1 |
| Search Relevance | Manual Annotation Queries, Negative (2,000 queries) | GSB: 10.92 | 1 |
| Search Relevance | Manual Annotation Queries, Knowledge (2,000 queries) | GSB: 18.45 | 1 |
