
## COMPETITION UPDATES

Jan 17, 2023 The CodaLab site for submission is ready at this link. Please register with the **submission account** in your registration email.

Feb 1, 2023 Test_A set images available in Google Drive, submission open.

Mar 10, 2023 Test_B set images available in Google Drive, submission open.

## CODE SUBMISSION

The participants who enter the Test_B set need to submit the following items.

- Inference code
- README file
- Checkpoint for the **Test_B** set
- Requirement.txt indicating the environment.

Please send the **file link** and your **team name** to M202172425@hust.edu.cn, before **23:59, March 13, 2023 (AoE).**

## FREQUENTLY ASKED QUESTIONS

### Pretrain Models

Using public or private pre-trained OCR models is prohibited. Using public or private pre-trained NLP models such as BERT is also prohibited. Only models pre-trained on ImageNet, such as ResNet-50, are acceptable.

### Code Submission

To ensure the fairness of the competition, participants who enter the Test_B set need to submit the inference code, checkpoint, and README file for reproduction. We will add a "Code Submission" section to the official website in the future.

### Synthetic Data

Using synthetic data built from our MLHME-38K dataset is prohibited, whether for pre-training, fine-tuning, or any other form of usage.

### Usage of External Corpus or Textual data

Use of data other than the provided dataset is prohibited.

### Deadline for Registration

March 1, 2023 (AoE). Team information can be updated after registration closes, but it must be finalized before March 9, 2023 (AoE).

## INTRODUCTION

Mathematical expressions play an important role in scientific documents and are indispensable for describing problems and theories in math, physics and many other fields. Thus, automatically recognizing handwritten mathematical expressions in images has been receiving increasing attention. Existing datasets (CROHME, HME100K) focus only on single-line mathematical expressions. Multi-line mathematical expressions also appear often in daily life and are important in the field of handwritten mathematical expression recognition. Moreover, the structure of multi-line mathematical expressions is more complicated, which makes this task more challenging. We hope that the dataset and task will greatly promote research in handwritten mathematical expression recognition.

Figure 1. Some multi-line mathematical expressions from our MLHME-38K dataset.

## DESCRIPTION OF THE PROBLEM

### Task: Recognition of Multi-line Handwritten Mathematical Expressions

The aim of this task is to recognize the handwritten mathematical expressions in images and output them in LaTeX format. The registration and submission information can be found here.

## DATASET

We present a large-scale handwritten mathematical expression recognition dataset named MLHME-38K, which focuses on Multi-Line Handwritten Mathematical Expressions. It includes 38,000 labeled images in total, of which 9,931 are multi-line mathematical expressions and 28,069 are single-line. All of these images were uploaded by users from real-world scenarios. Consequently, MLHME-38K is more authentic and realistic, with variations in color, blur, complicated backgrounds, twist, illumination, longer length, and complicated structure.

Every image in the dataset is annotated with a LaTeX string denoting the mathematical expression. Annotations for images are stored in a txt file with the following format:

| filename | LaTeX string |
| --- | --- |
| train_0.jpg | `\begin{matrix} 9 x - y = a + 3 \\ 2 x + y = 5 a \end{matrix}` |
| train_1.jpg | `\left\{ \begin{matrix} \\ x ^ { 2 } - 5 x y + 6 y ^ { 2 } = 0 \end{matrix} \right.` |
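As an illustration of reading such an annotation file, the following minimal Python sketch maps each filename to its LaTeX label. The page does not state the actual delimiter used in the txt file, so the sketch assumes the filename is separated from the LaTeX string by whitespace (for example a tab); the function name `parse_annotations` is hypothetical and not part of any official toolkit.

```python
from typing import Dict, Iterable


def parse_annotations(lines: Iterable[str]) -> Dict[str, str]:
    """Map each image filename to its LaTeX label.

    Assumption: one sample per line, with the filename separated from the
    LaTeX string by whitespace. The real delimiter is not documented here.
    """
    labels: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split only once so that spaces inside the LaTeX string survive.
        parts = line.split(maxsplit=1)
        filename = parts[0]
        latex = parts[1] if len(parts) > 1 else ""
        labels[filename] = latex
    return labels


# The two example rows shown above, joined with a tab (assumed delimiter).
sample = [
    "train_0.jpg\t" + r"\begin{matrix} 9 x - y = a + 3 \\ 2 x + y = 5 a \end{matrix}",
    "train_1.jpg\t" + r"\left\{ \begin{matrix} \\ x ^ { 2 } - 5 x y + 6 y ^ { 2 } = 0 \end{matrix} \right.",
]
labels = parse_annotations(sample)
print(len(labels))
```

To read a real file, the same function can be called on an open file handle, for example `parse_annotations(open(path, encoding="utf-8"))`, where `path` points to the annotation txt file.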

## EVALUATION

Participants are required to output the LaTeX string of the mathematical expression in each image. Because there are numerous ways to express one mathematical expression in LaTeX, the submitted LaTeX strings will be normalized before being compared with the ground truth. The following indicators are used to evaluate performance.

**On the leaderboard, results for both the test_A and test_B sets are ranked by expression recall. After the test_B submission deadline, the final ranking will be based on the combination of expression recall and character recall.**

1. Expression recall: the percentage of predicted LaTeX sequences that match the ground truth (ignoring spaces).

$S_{\text{recall}}=\frac{S_{\text{right}}}{S_{\text{sum}}}$

2. Character recall: $C_{\text{diff}}$ is the sum of edit distances over all images and $C_{\text{sum}}$ is the total number of characters in all labels.

$C_{\text{recall}}=1-\frac{C_{\text{diff}}}{C_{\text{sum}}}$

3. The combination of the **expression recall (higher priority)** and character recall:

$\text{Better}=\left\{\begin{array}{ll}S_{r1}, & \text{if } S_{r1} \geq S_{r2}+0.1\% \\ S_{r1}, & \text{if } \left|S_{r1}-S_{r2}\right|<0.1\% \text{ and } 0.9\left(S_{r1}-S_{r2}\right)+0.1\left(C_{r1}-C_{r2}\right)>0 \\ S_{r1}, & \text{if } S_{r1}>S_{r2} \text{ and } 0.9\left(S_{r1}-S_{r2}\right)+0.1\left(C_{r1}-C_{r2}\right)=0 \\ S_{r2}, & \text{otherwise}\end{array}\right.$

Here $S_{r1}$ and $C_{r1}$ denote the expression recall and the character recall of one team, while $S_{r2}$ and $C_{r2}$ denote the expression recall and the character recall of another team.
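The three indicators above can be sketched in Python as follows. This is not the official scoring script: it assumes that removing whitespace is the only normalization, that edit distance is computed per character on the whitespace-free strings, and that recalls are fractions in [0, 1], so that 0.1% becomes a margin of 0.001. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def strip_spaces(latex: str) -> str:
    """Remove all whitespace; the only normalization assumed in this sketch."""
    return "".join(latex.split())


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def _pairs(preds: Sequence[str], labels: Sequence[str]) -> List[Tuple[str, str]]:
    return [(strip_spaces(p), strip_spaces(t)) for p, t in zip(preds, labels)]


def expression_recall(preds: Sequence[str], labels: Sequence[str]) -> float:
    """S_recall = S_right / S_sum."""
    pairs = _pairs(preds, labels)
    s_right = sum(p == t for p, t in pairs)
    return s_right / len(pairs)


def character_recall(preds: Sequence[str], labels: Sequence[str]) -> float:
    """C_recall = 1 - C_diff / C_sum."""
    pairs = _pairs(preds, labels)
    c_diff = sum(edit_distance(p, t) for p, t in pairs)
    c_sum = sum(len(t) for _, t in pairs)
    return 1 - c_diff / c_sum


def team1_is_better(s1: float, c1: float, s2: float, c2: float,
                    margin: float = 0.001) -> bool:
    """Return True when team 1 outranks team 2 under the combined rule."""
    if s1 >= s2 + margin:
        return True
    combined = 0.9 * (s1 - s2) + 0.1 * (c1 - c2)
    if abs(s1 - s2) < margin and combined > 0:
        return True
    # Exact-equality branch; sensitive to floating-point rounding.
    if s1 > s2 and combined == 0:
        return True
    return False


preds = ["x ^ { 2 }", "a + b"]
labels = ["x^{2}", "a - b"]
print(expression_recall(preds, labels))  # 0.5
print(character_recall(preds, labels))   # 0.875
```

In the toy example, one of two expressions matches after spaces are removed, and the single wrong character out of eight label characters gives a character recall of 0.875. The comparison function also shows why character recall matters only for near ties: `team1_is_better(0.5005, 0.90, 0.5000, 0.95)` is `False`, because the expression-recall gap is below the margin and the weighted sum favors the second team.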

## SCHEDULE

| Milestone | Date (AoE) |
| --- | --- |
| Registration open, training images and labels available | January 10, 2023 |
| Test_A set images available, submission open | February 1, 2023 |
| Test_A set submission closes | March 9, 2023 |
| Test_B set images available, submission open | 00:00, March 10, 2023 |
| Test_B set submission due | 23:59, March 10, 2023 |
| Results announced online | March 16, 2023 |
| Competition paper due | March 26, 2023 |

## CONTACT INFORMATION FOR THE ORGANIZERS

**Chenyang Gao**, Huazhong University of Science and Technology, M202172425@hust.edu.cn

**Yuliang Liu**, Huazhong University of Science and Technology, ylliu@hust.edu.cn

**Shiyu Yao**, Tomorrow Advancing Life Education Group (TAL), yaoshiyu@tal.com

**Jinfeng Bai**, Tomorrow Advancing Life Education Group (TAL), jfbai.bit@gmail.com

**Xiang Bai**, Huazhong University of Science and Technology, xbai@hust.edu.cn

**Lianwen Jin**, South China University of Technology, eelwjin@scut.edu.cn

**Cheng-Lin Liu**, Institute of Automation, Chinese Academy of Sciences, liucl@nlpr.ia.ac.cn