
FiNER: Financial Numeric Entity Recognition for XBRL Tagging

About

Publicly traded companies are required to submit periodic reports with eXtensible Business Reporting Language (XBRL) word-level tags. Manually tagging the reports is tedious and costly. We therefore introduce XBRL tagging as a new entity extraction task for the financial domain and release FiNER-139, a dataset of 1.1M sentences with gold XBRL tags. Unlike typical entity extraction datasets, FiNER-139 uses a much larger label set of 139 entity types. Most annotated tokens are numeric, with the correct tag per token depending mostly on context, rather than the token itself. We show that subword fragmentation of numeric expressions harms BERT's performance, allowing word-level BiLSTMs to perform better. To improve BERT's performance, we propose two simple and effective solutions that replace numeric expressions with pseudo-tokens reflecting original token shapes and numeric magnitudes. We also experiment with FIN-BERT, an existing BERT model for the financial domain, and release our own BERT (SEC-BERT), pre-trained on financial filings, which performs best. Through data and error analysis, we finally identify possible limitations to inspire future work on XBRL tagging.
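The pseudo-token idea in the abstract above can be illustrated with a minimal Python sketch. The abstract does not give the exact pseudo-token vocabulary, so the bracketed formats (`[XX.X]`, `[NUM]`), the regular expression for what counts as a numeric expression, and the function names `shape_token` and `num_token` are all assumptions made for illustration, not the authors' implementation.

```python
import re

# Assumed definition of a numeric expression: digits with optional
# thousands separators and an optional decimal part (e.g. "9,323.0").
NUMERIC = re.compile(r"\d[\d,]*(\.\d+)?")


def shape_token(token: str) -> str:
    """Replace a numeric token with a pseudo-token that keeps its shape.

    Every digit becomes "X"; separators are kept, so the number of
    digits (a rough proxy for magnitude) stays visible.
    """
    if NUMERIC.fullmatch(token) is None:
        return token  # non-numeric tokens pass through unchanged
    return "[" + re.sub(r"\d", "X", token) + "]"


def num_token(token: str) -> str:
    """Replace any numeric token with one generic pseudo-token (assumed name)."""
    return "[NUM]" if NUMERIC.fullmatch(token) else token


if __name__ == "__main__":
    words = "Revenue increased by 24.8 million to 9,323.0 million".split()
    print([shape_token(w) for w in words])
    print([num_token(w) for w in words])
```

Because each numeric expression maps to a single pseudo-token, a subword tokenizer whose vocabulary contains these pseudo-tokens would no longer split the number into fragments, which is the problem the abstract attributes to BERT.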

Lefteris Loukas, Manos Fergadiotis, Ilias Chalkidis, Eirini Spyropoulou, Prodromos Malakasiotis, Ion Androutsopoulos, Georgios Paliouras • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Named Entity Recognition | NER | -- | 40 |
| Numerical Question Answering | FinQA (test) | Execution Accuracy 66.02 | 33 |
| Sentiment Analysis | FOMC | -- | 26 |
| Financial Reasoning | FinQA | -- | 19 |
| XBRL tagging | FiNER-139 1.0 (dev) | μ-Precision 84.8 | 10 |
| XBRL tagging | FiNER-139 1.0 (test) | Micro Precision 81 | 10 |
| Financial Entity Recognition | FiNER | F1 Score 82.35 | 9 |
| Question Answering | FinQA | Prog Acc 53.18 | 9 |
| Classification | Headline | F1 Score 90.52 | 9 |
| Sentiment Analysis | Financial PhraseBank (FPB) | Accuracy 84.37 | 9 |
