LEGAL-BERT: The Muppets straight out of Law School

About

BERT has achieved impressive performance in several NLP tasks. However, there has been limited investigation on its adaptation guidelines in specialised domains. Here we focus on the legal domain, where we explore several approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets. Our findings indicate that the previous guidelines for pre-training and fine-tuning, often blindly followed, do not always generalize well in the legal domain. Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains. These are: (a) use the original BERT out of the box, (b) adapt BERT by additional pre-training on domain-specific corpora, and (c) pre-train BERT from scratch on domain-specific corpora. We also propose a broader hyper-parameter search space when fine-tuning for downstream tasks and we release LEGAL-BERT, a family of BERT models intended to assist legal NLP research, computational law, and legal technology applications.
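As a rough illustration of how the three strategies and a broader fine-tuning search space combine into a set of experiments, the sketch below enumerates one configuration per strategy and hyper-parameter combination. The strategy keys, the `enumerate_runs` helper, and every value in `SEARCH_SPACE` are assumptions made for illustration; the abstract does not state the grid the authors used.

```python
from itertools import product

# The three adaptation strategies named in the abstract (keys are hypothetical labels).
STRATEGIES = {
    "out_of_the_box": "original BERT, fine-tuned directly",
    "further_pretrained": "BERT with additional pre-training on legal corpora",
    "from_scratch": "BERT pre-trained from scratch on legal corpora",
}

# Hypothetical search space: illustrative values only,
# not the grid reported by the authors.
SEARCH_SPACE = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 4e-5, 5e-5],
    "batch_size": [4, 8, 16, 32],
    "dropout": [0.1, 0.2],
}


def enumerate_runs(strategies, search_space):
    """Yield one config dict per (strategy, hyper-parameter combination)."""
    names = list(search_space)
    for strategy in strategies:
        for values in product(*(search_space[n] for n in names)):
            yield {"strategy": strategy, **dict(zip(names, values))}


runs = list(enumerate_runs(STRATEGIES, SEARCH_SPACE))
print(len(runs))  # 3 strategies * 5 * 4 * 2 = 120 configurations
```

Each yielded dictionary could then be handed to whatever fine-tuning routine is in use, with the best configuration per strategy chosen on a development set.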

Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos • 2020

Related benchmarks

Task	Dataset	Result
Clause Classification	Illegal Clauses	Macro F1 77	63
Clause Classification	Dark Clauses	Macro F1 75	23
Document Classification	EURLEX	Macro F1 24.4	21
Clause Classification	Gray Clauses	Macro F1 67	20
Legal Case Retrieval	COLIEE 2023	P@5 4.64	19
Legal Case Retrieval	COLIEE Top-5 2022	P@5 4.47	19
Case holding classification	CaseHOLD (test)	Mean macro F1 76.1	12
Deontic Classification	REGOBLIGATION (test)	F1 Score 84.6	12
Gap Detection	GAPBENCH (test)	F1 Score 71.3	12
Named Entity Recognition	REGOBLIGATION (test)	F1 Score 82.1	12

Showing 10 of 27 rows
