A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages

Description

We contribute a Multidimensional Quality Metrics (MQM) dataset for Indian languages, created by taking the outputs of 7 popular MT systems and asking human annotators to judge translation quality following MQM-style guidelines. Using this annotated data, we report how 16 metrics of various types perform when evaluating en-xx translations for 5 Indian languages. We also provide an updated metric, Indic-COMET, which not only correlates more strongly with human judgement on Indian languages but is also more robust to perturbations.

Downloads
Resource name | Link
Dataset | IndicMT-Eval - Sheets

Details

Please find more details of this work in our paper (link coming soon).
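
As a rough illustration of what "correlation with human judgement" means in meta-evaluation, the sketch below compares a list of human scores against a list of metric scores using Pearson correlation and Kendall tau. These are two commonly used statistics; which ones the paper reports is not stated on this page, and the numbers below are made up for illustration rather than taken from the dataset.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total pairs; ties count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up illustrative scores, one entry per translated segment.
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_scores = [0.10, 0.25, 0.20, 0.60, 0.90]

print("Pearson:", pearson(human_scores, metric_scores))
print("Kendall tau:", kendall_tau(human_scores, metric_scores))
```

A metric that ranks segments the same way the annotators do gets a Kendall tau close to 1. In the toy data above, one pair of segments out of ten is ranked in the opposite order.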

MQM Dataset

The MQM-annotated dataset, collected with the help of language experts for the 5 Indian languages (Hindi, Tamil, Marathi, Malayalam, Gujarati), can be downloaded from here (link coming soon).

An example MQM annotation, containing the source, the reference, and the translated output with error spans marked by the annotator, looks like the following:

[Image: MQM-example]

More details regarding the instructions provided and the procedures followed for annotations are present in the paper.
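
To make the shape of such an annotation concrete, here is a minimal sketch of how one record could be held in memory: a source, a reference, a translation, and a list of error spans. The class names, field names, and the category and severity labels are all hypothetical and do not reflect the actual column layout of the released sheets. The example strings are English placeholders, not dataset entries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ErrorSpan:
    start: int      # character offset where the error begins (inclusive)
    end: int        # character offset where the error ends (exclusive)
    category: str   # hypothetical label, e.g. "accuracy" or "fluency"
    severity: str   # hypothetical label, e.g. "minor" or "major"


@dataclass
class MQMAnnotation:
    source: str
    reference: str
    translation: str
    errors: List[ErrorSpan] = field(default_factory=list)

    def marked_translation(self) -> str:
        """Return the translation with each error span wrapped in [brackets].

        Assumes the spans do not overlap.
        """
        out, cursor = [], 0
        for span in sorted(self.errors, key=lambda s: s.start):
            out.append(self.translation[cursor:span.start])
            out.append("[" + self.translation[span.start:span.end] + "]")
            cursor = span.end
        out.append(self.translation[cursor:])
        return "".join(out)


# Placeholder record for illustration only.
example = MQMAnnotation(
    source="placeholder source sentence",
    reference="the cat lay on the mat",
    translation="the cat sat on mat",
    errors=[ErrorSpan(start=8, end=11, category="accuracy", severity="minor")],
)
print(example.marked_translation())
```

Storing spans as character offsets keeps the translated text untouched, so the same record can be rendered with different markers or filtered by severity later.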

Setup

Load the data

The easiest way to access or view the data is to visit this link. More details are in the data folder.

cd data
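
If the data is available locally as a CSV export, it can be read with Python's standard `csv` module. The sketch below assumes such an export exists. The header names (`source`, `translation`, `score`) are hypothetical stand-ins, so adjust them to whatever the actual file uses. An in-memory string replaces the real file so the snippet runs on its own.

```python
import csv
import io


def read_rows(handle):
    """Read a CSV export into a list of dicts keyed by the header names."""
    return list(csv.DictReader(handle))


# Stand-in for an exported file; the header names are hypothetical.
sample = io.StringIO(
    "source,translation,score\n"
    "hello,placeholder output,4.5\n"
)

rows = read_rows(sample)
scores = [float(row["score"]) for row in rows]
print(len(rows), "row(s) loaded; scores:", scores)
```

For a real file, pass a handle opened with `open(path, newline="", encoding="utf-8")` to `read_rows`, where `path` points to wherever the export was saved.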

Citation

If you find IndicMTEval useful in your research or work, please consider citing our paper.

@article{DBLP:journals/corr/abs-2212-10180,
  author       = {Ananya B. Sai and
                  Tanay Dixit and
                  Vignesh Nagarajan and
                  Anoop Kunchukuttan and
                  Pratyush Kumar and
                  Mitesh M. Khapra and
                  Raj Dabre},
  title        = {IndicMT Eval: {A} Dataset to Meta-Evaluate Machine Translation metrics
                  for Indian Languages},
  journal      = {CoRR},
  volume       = {abs/2212.10180},
  year         = {2022}
}

@article{singh2024good,
  title={How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?},
  author={Singh, Anushka and Sai, Ananya B and Dabre, Raj and Puduppully, Ratish and Kunchukuttan, Anoop and Khapra, Mitesh M},
  journal={arXiv preprint arXiv:2406.03893},
  year={2024}
}