LegalMidm: Use-Case-Driven Legal Domain Specialization for Korean Large Language Model

About

In recent years, the rapid proliferation of open-source large language models (LLMs) has spurred efforts to turn general-purpose models into domain specialists. However, many domain-specialized LLMs are developed using datasets and training protocols that are not aligned with the nuanced requirements of real-world applications. In the legal domain, where precision and reliability are essential, this misalignment limits their practical utility. In this study, we propose a systematic training framework grounded in the practical needs of the legal domain, with a focus on Korean law. We introduce LegalMidm, a Korean legal-domain LLM, and present a methodology for constructing high-quality, use-case-driven legal datasets and optimized training pipelines. Our approach emphasizes collaboration with legal professionals and rigorous data curation to ensure relevance and factual accuracy, and we demonstrate its effectiveness on key legal tasks.

Youngjoon Jang, Chanhee Park, Hyeonseok Moon, Young-kyoung Ham, Jiwon Moon, Jinhyeon Kim, JuKyung Jung, Heuiseok Lim • 2026

Related benchmarks

Task	Dataset	Metric	Score
General Knowledge Evaluation	HAERAE	Accuracy	70.3	13
Legal Machine Reading Comprehension	Legal Task - MRC (test)	ROUGE-L	57.5	5
Legal Multiple Choice Question Answering	Legal Task MC (test)	Accuracy	65	5
Legal Question Answering	Legal Task QA (test)	ROUGE-L	17.74	5
Legal Summarization	Legal Task Summary (test)	ROUGE-L	47.94	5
Legal Text Generation	Legal Task Complaint (test)	ROUGE-L	67.67	5
Legal Text Generation	Legal Task Petition (test)	ROUGE-L	14.46	5
General Knowledge Evaluation	KMMLU	Accuracy	44.75	5
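Most of the legal tasks above are scored with ROUGE-L, which compares a generated text to a reference via their longest common subsequence (LCS). The page does not state which ROUGE-L variant, tokenizer, or scaling was used, so the following is only a minimal sketch of sentence-level ROUGE-L F1 under stated assumptions: whitespace tokenization (for Korean this splits on eojeol, i.e. space-separated word units) and a balanced F-measure (β = 1). The scores in the table appear to be scaled to the 0 to 100 range; the function below returns a value between 0 and 1.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences.

    Standard dynamic programming, keeping only one previous row.
    """
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j]
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 (beta = 1).

    Assumption: tokens are obtained by splitting on whitespace; the
    benchmark may use a different tokenizer (e.g. morpheme-level for Korean).
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # "The court dismissed the appeal" with two different Korean dismissal terms
    print(rouge_l_f1("법원은 항소를 기각하였다", "법원은 항소를 각하하였다"))
```

In the usage example, two of the three space-separated units match in order, so precision and recall are both 2/3 and the F1 is 2/3. Because the score depends heavily on tokenization, figures computed this way are not directly comparable to the table unless the same tokenizer and scaling are used.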
