Effective use of trained machine-translation systems requires thorough preliminary cleaning of the training corpora. These bilingual corpora shouldn’t contain untranslated segments, incorrect translations, extraneous characters, hypertext markup, and so on. Ideally, the corpus should also use uniform terminology, but that is much harder to achieve.
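The basic cleanup steps named above can be illustrated with a minimal Python sketch. It is not how Rigora works internally; the function names `clean_text` and `clean_corpus` are hypothetical, and the "identical source and target" test is only a rough heuristic for untranslated segments (product names and numbers can legitimately be identical).

```python
import html
import re
import unicodedata

# Naive pattern for markup tags; real corpora may need a proper parser.
TAG_RE = re.compile(r"<[^>]+>")


def clean_text(text: str) -> str:
    """Strip tags, decode HTML entities, blank out control characters,
    and collapse runs of whitespace."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    # Replace control characters (Unicode category Cc) with spaces.
    text = "".join(
        " " if unicodedata.category(ch) == "Cc" else ch for ch in text
    )
    return " ".join(text.split())


def clean_corpus(pairs):
    """Yield cleaned (source, target) pairs, skipping unusable segments."""
    for source, target in pairs:
        source, target = clean_text(source), clean_text(target)
        if not source or not target:
            continue  # one side is empty after cleaning
        if source.casefold() == target.casefold():
            continue  # heuristic: segment was probably left untranslated
        yield source, target


if __name__ == "__main__":
    sample = [
        ("<b>Save</b> file", "Datei&nbsp;speichern"),
        ("Cancel", "Cancel"),
        ("Open", "  "),
    ]
    for pair in clean_corpus(sample):
        print(pair)
```

Incorrect translations cannot be caught by rules this simple; they need linguistic checks or human review.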

Accumulated translation memories are most often used as the source of training corpora. As terminology changes over time, the so-called translation accumulator gradually loses its uniformity. For example, it may contain both outdated and updated translations of the same term. The translator’s main tool, the CAT system, lets you connect the current glossary and verify the terminology in suggestions from the legacy translation memory. However, checking and unifying terminology across a huge corpus requires automation. Accordingly, we’ve developed a solution based on our Rigora Studio software tool (http://www.rigora.studio/). It lets you perform a variety of quality checks on translations, including checks on large translation memories.

Usually, after Rigora automatically checks the translations against the glossary, you need to go through the log of potential term errors and manually correct the real ones. We’ve extended this functionality with the ability to export all segments except those with potential term errors, without manually checking each record in the error log. As a result, when a translation memory is checked against the glossary, the cleaned memory will be somewhat smaller than the original, because every flagged segment is dropped, including false positives. However, terminology consistency is ensured quickly and at low cost.
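The idea of exporting only the segments that pass the glossary check can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Rigora's actual algorithm: `Segment`, `has_term_issue`, and `export_consistent` are hypothetical names, and terms are matched by plain case-insensitive substring search, which ignores inflection and word boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    source: str
    target: str


def has_term_issue(segment: Segment, glossary: dict) -> bool:
    """Return True if a glossary source term occurs in the source text
    but its approved translation is missing from the target text."""
    src = segment.source.casefold()
    tgt = segment.target.casefold()
    return any(
        term.casefold() in src and approved.casefold() not in tgt
        for term, approved in glossary.items()
    )


def export_consistent(segments, glossary):
    """Split segments into (kept, dropped) without any manual review."""
    kept, dropped = [], []
    for segment in segments:
        (dropped if has_term_issue(segment, glossary) else kept).append(segment)
    return kept, dropped


if __name__ == "__main__":
    glossary = {"hard drive": "Festplatte"}
    memory = [
        Segment("Format the hard drive", "Formatieren Sie die Festplatte"),
        Segment("Replace the hard drive", "Ersetzen Sie die Harddisk"),
        Segment("Click OK", "Klicken Sie auf OK"),
    ]
    kept, dropped = export_consistent(memory, glossary)
    print(f"kept {len(kept)} segments, dropped {len(dropped)}")
```

Dropping every flagged segment trades a slightly smaller corpus for the time saved on manual review, which is the trade-off described above.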

Rigora already has approximately 650 different technical and linguistic checks, many of which can be customized by the user. Users can even program and combine rules themselves to create the checks they need. This flexible mechanism lets you use the power of regular expressions and implement almost any type of text cleanup.
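To give a feel for what combinable, regex-based rules might look like, here is a small Python sketch. It does not reproduce Rigora's rule syntax; `Rule`, `target_matches`, `any_of`, and `failed_rules` are hypothetical helpers, and the three example checks are deliberately simplistic.

```python
import re
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str
    check: Callable[[str, str], bool]  # True means the segment fails the rule


def target_matches(name: str, pattern: str) -> Rule:
    """Build a rule that fails when the regex is found in the target text."""
    rx = re.compile(pattern)
    return Rule(name, lambda source, target: rx.search(target) is not None)


def any_of(name: str, *rules: Rule) -> Rule:
    """Combine several rules into one that fails if any of them fails."""
    return Rule(name, lambda s, t: any(r.check(s, t) for r in rules))


def numbers_differ(source: str, target: str) -> bool:
    """Fail when the digit groups in source and target do not match."""
    return sorted(re.findall(r"\d+", source)) != sorted(re.findall(r"\d+", target))


formatting = any_of(
    "formatting",
    target_matches("double_space", r" {2,}"),
    target_matches("leftover_markup", r"</?\w+[^>]*>"),
)
RULES = [formatting, Rule("number_mismatch", numbers_differ)]


def failed_rules(source: str, target: str, rules=RULES):
    """Return the names of all rules that the segment fails."""
    return [rule.name for rule in rules if rule.check(source, target)]


if __name__ == "__main__":
    print(failed_rules("Wait 5 seconds", "Warten Sie  3 Sekunden"))
```

Because each rule is just a named predicate over a source/target pair, new checks can be added or combined without touching the code that applies them.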

Translation-memory cleanup in Rigora is part of a comprehensive enterprise translation automation solution from Logrus Global, based on neural machine-translation systems and other modern technologies. For more information, see: https://cloud.logrusglobal.com/.

22 March 2021

