Arabic tokenizer used by Azure Translator
Dear all,
I am writing to ask about the type of tokenizer used for processing Arabic when translating and testing MT models via the Azure Custom Translator Console.
I trained an English-to-Arabic MT model via Azure Custom Translator and got a BLEU score of 58.82, then downloaded the test set to do some further testing with other MT engines. Once I had all the results, I recalculated the automatic scores in Python, and the BLEU score for the Microsoft custom engine I had trained via Azure turned out to be much lower (50 compared to 58.8). That got me thinking about the tokenizer used to process a non-Latin-script language like Arabic.

Could this difference in scores be due to the use of a different tokenizer? What kind of tokenizer are you using for Arabic? Are there different tokenizers for non-Latin scripts? I am also testing Hebrew, Hindi, Russian and Ukrainian, so I am wondering whether I should use a different, perhaps customized, tokenizer for these languages.
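To show why I suspect the tokenizer, here is a minimal sketch of the effect: the same hypothesis/reference pair scored with a simplified sentence-level BLEU (clipped n-gram precisions with add-one smoothing and a brevity penalty), once on whitespace tokens and once on character tokens. This is illustrative only; it is not the exact metric Azure reports, and the Arabic sentences are toy examples I made up.

```python
# Illustrative sketch: how tokenization changes a BLEU-style score.
# NOT the metric Azure Custom Translator uses -- a simplified BLEU
# with add-one smoothing so short toy sentences don't score zero.
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams over a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyp_tokens, ref_tokens, max_n=4):
    """Geometric mean of smoothed clipped n-gram precisions * brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ng, ref_ng = ngrams(hyp_tokens, n), ngrams(ref_tokens, n)
        overlap = sum((hyp_ng & ref_ng).values())   # clipped matches
        total = max(sum(hyp_ng.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # +1 smoothing
    bp = min(1.0, math.exp(1 - len(ref_tokens) / max(len(hyp_tokens), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "الكتاب على الطاولة"   # toy reference: "the book is on the table"
hyp = "الكتاب فوق الطاولة"   # toy hypothesis with one word changed

word_score = bleu(hyp.split(), ref.split())  # whitespace tokenization
char_score = bleu(list(hyp.replace(" ", "")),
                  list(ref.replace(" ", "")))  # character tokenization

print(f"word-level BLEU: {word_score:.3f}")
print(f"char-level BLEU: {char_score:.3f}")
```

The two tokenizations give clearly different scores for the same translation, which is the kind of gap I am seeing between the console score and my own recalculation.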
Thank you!
Arianna