MathWriting: A Dataset For Handwritten Mathematical Expression Recognition

About

Recognition of handwritten mathematical expressions makes it possible to convert scientific notes into digital form. It facilitates the sharing, searching, and preservation of scientific information. We introduce MathWriting, the largest online handwritten mathematical expression dataset to date. It consists of 230k human-written samples and an additional 400k synthetic ones. The dataset can also be used in its rendered form for offline HME recognition. One MathWriting sample consists of a formula written on a touch screen and a corresponding LaTeX expression. We also provide a normalized version of the LaTeX expression to simplify the recognition task and improve result quality. We provide baseline performance on the dataset for standard models like OCR and CTC Transformer as well as Vision-Language Models like PaLI. The dataset, together with an example Colab, is accessible on GitHub.

Philippe Gervais, Anastasiia Fadeeva, Andrii Maksai • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Handwritten Mathematical Expression Recognition | CROHME 2014 | -- | 47 |
| Handwritten Mathematical Expression Recognition | CROHME 2016 | Expression Rate: 58.79 | 40 |
| Handwritten Mathematical Expression Recognition | CROHME 2019 | ExpRate: 60.51 | 39 |
| Handwritten Mathematical Expression Recognition | CROHME 2023 (test) | Expression Rate: 58.3 | 11 |
| Mathematical Expression Recognition | MathWriting 1.0 (test) | CER: 5.49 | 9 |
| Mathematical Expression Recognition | MathWriting 1.0 (val) | CER: 4.52 | 9 |
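
For readers unfamiliar with the CER metric reported for the MathWriting splits, here is a minimal sketch of how a character error rate is commonly computed: the Levenshtein (edit) distance between the predicted and reference strings, divided by the reference length and expressed as a percentage. This is an assumption about the general metric, not a description of the benchmark's exact evaluation code. In particular, whether the benchmark compares raw characters or LaTeX tokens is not stated on this page. The function names `edit_distance` and `cer` are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Error rate in percent, relative to the reference length (assumed definition)."""
    if len(ref) == 0:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```

Because both functions accept any sequence, the same code works on strings (character level) or on lists of LaTeX tokens, for example `cer(["x", "^", "2"], ["x", "^", "3"])`.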