Machine-learning technology, like nuclear energy, must be handled with care; the instructions still need to be written!
The greatest trick AI ever pulled was convincing the world it exists.
MIT Technology Review 1
December 8, 2020, by Serge Gladkoff
In the middle of the past century, humankind discovered nuclear power. People rushed to create a bomb and build nuclear power plants without real knowledge or understanding of the consequences, and the technology has since taught us some very hard lessons. Today, with artificial intelligence, we're in a similar situation. We're using AI before we have taken a deep look at what it is, what the consequences are, whether there's a price for using this technology, and what processes should be considered appropriate. So here we are, trying to take a realistic look at what we're dealing with and how to handle it.
An overview of ML algorithms and their fundamental properties
It is important to have a basic understanding of the ideas on which current state-of-the-art neural machine translation (NMT) engines are based.
It was in 2013 that Tomas Mikolov, a Czech researcher, and his colleagues published a "word embedding" algorithm that represents each word as a vector of a few hundred numbers, learned from the contexts in which the word occurs in a very large text corpus. (It became known as the word2vec algorithm.)
Surprisingly, the embeddings (vectors) created by the word2vec algorithm demonstrated certain "semantic" properties of the encoded words, as if they carried some information about their "meaning." In a classic example, the embedding of the word "man" relates to the embedding of the word "woman" much as the embedding of "uncle" relates to that of "aunt." More than that, this property carries over to the results of vector arithmetic on the embeddings: for example, the embedding of "queen" lies close to the vector obtained by subtracting the embedding of "man" from that of "king" and adding the embedding of "woman."
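The analogy arithmetic can be illustrated with a minimal sketch. The three-dimensional vectors below are invented by hand for illustration and are not real word2vec output (real embeddings have hundreds of dimensions and are learned from a corpus); the names `toy_embeddings`, `cosine` and `analogy` are likewise hypothetical.

```python
import math

# Hand-made toy vectors, NOT real word2vec output.
# The axes loosely stand for [royalty, maleness, femaleness].
toy_embeddings = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "uncle": [0.2, 0.8, 0.1],
    "aunt":  [0.2, 0.1, 0.8],
}


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, embeddings):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b and c."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


# "man" is to "king" as "woman" is to ...?
print(analogy("man", "king", "woman", toy_embeddings))
```

Note that the function only measures geometric closeness between vectors. Nothing in it inspects what a king or a queen is.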
For practical purposes of translation studies, it's worth noting that no semantic analysis of the words is done by the embeddings algorithm; it "only" statistically captures the "meaning" as reflected in the various usage examples of the training corpus. In a sense, it is a method that captures traces of the meaning through its usage but not the meaning itself. (Think of studying a bunny's tracks in the snow. You'll probably be able to understand certain behavioral traits but not what a bunny is or even what it looks like.)
What is actually captured by embeddings? This isn't a philosophical argument because it relates directly to the practicalities of machine-translation applications. For one thing, it captures certain aspects that correlate to "meaning" through word usage in a training corpus.
Modern state-of-the-art ML algorithms capture usage surprisingly well. Other sophisticated techniques have been developed to process the embeddings in the subsequent layers of the encoder, including the "multi-head attention" algorithm. The idea is to focus on the other words in the sentence in order to better encode the word at a specific position. For this, several "attention heads" (eight in the original Transformer) evaluate every word in the input sentence in relation to the word in question.
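What a single attention head computes can be sketched as scaled dot-product attention. This is a simplified illustration, not a production implementation: the token vectors are made up, and the learned projection matrices that a real Transformer applies to obtain queries, keys and values are omitted, so the same vectors are used for all three. Multi-head attention runs several such heads in parallel, each with its own learned projections. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(queries, keys, values):
    """Scaled dot-product attention for one head, on plain Python lists."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # Score the word in question against every word in the sentence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # The new encoding is a weighted mix of all value vectors.
        mixed = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three made-up 2-dimensional token vectors standing in for a 3-word sentence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention_head(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one word "attends" to every word in the sentence, including itself. The whole computation is dot products and weighted averages over vectors.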
These very clever (and other less spectacular) techniques resulted in a miracle of modernity: a state-of-the-art MT producing mostly smooth translation language which fascinates all lovers of technology.
It is, however, very important to note the following key aspects of ML technology:
- It takes an awful lot of content, electricity and computational resources to train models on large volumes of data. Even the comparatively small BERT model takes hours to days to train on large clusters of accelerators, and the GPT-3 model (with 175 billion parameters, more than 100 times as many as GPT-2) would take hundreds of years to train on a single Nvidia V100 GPU. GPT-3 was trained on a supercomputer and the largest corpus that could be found.
- With that in mind, the trained model is a fixed collection of word-usage traces taken from a specific corpus. Given the nature of language, it more capably handles the most commonly used words, but it's worse with rare words and their meanings, including terminology. 2
- The MT algorithm runs sentences through a pre-trained model to get translated sentences, which nowadays look smooth in many cases, provided that the source sentence is correct.
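The order of magnitude of the single-GPU training time mentioned in the list above can be sanity-checked with back-of-the-envelope arithmetic. Both inputs below are assumptions that do not come from this article: a total training compute of roughly 3.14e23 floating-point operations for the largest GPT-3 model, and an assumed sustained throughput of 28 teraFLOPS for one V100. Different assumptions give different results, but with these the estimate lands at a few hundred years.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def single_gpu_training_years(total_flops, gpu_flops_per_second):
    """Years needed to perform total_flops operations on one device."""
    return total_flops / gpu_flops_per_second / SECONDS_PER_YEAR


# Assumed inputs (not taken from this article):
ASSUMED_TRAINING_FLOPS = 3.14e23         # total operations to train the model
ASSUMED_V100_FLOPS_PER_SECOND = 28e12    # sustained throughput of one V100

years = single_gpu_training_years(
    ASSUMED_TRAINING_FLOPS, ASSUMED_V100_FLOPS_PER_SECOND
)
print(f"about {years:.0f} years on a single GPU")
```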
All this adds up to the miracle of mostly grammatically correct sentences. The temptation to use it is impossible to resist, but what is happening in the real world?
Getting down to the practicalities of deployment
The following are hard truths about the current state of MT:
- MT analyzes source text for translation only one sentence at a time. In doing so, it does NOT even look at previous or subsequent sentences, not to mention the broader context of paragraph, document and subject-matter area.
- MT works only with the linguistic form of the data as opposed to its meaning, and correct linguistic form doesn't guarantee meaning transfer. Statistically, a translated sentence conveys the correct meaning much less often than it is linguistically correct. Thus, the post-editor has to verify the meaning transfer of the entire MT output, reviewing all MT proposals. "It can't reason. It doesn't understand the language it generates . . . ." 3 Even the largest, most advanced model doesn't have a clue: "GPT-3 is unreliable and susceptible to basic errors that an average human would never commit."
- Due to computational limitations, the current technology doesn't promise significant improvement in quality. Further progress would require completely new approaches, and those are likely to hit the same computational wall. (GPT-3 was already trained on a supercomputer with all the linguistic data its developers could find.)
- NMT provides the statistically most frequent usage, and based on the nature of the algorithm, the MT output is always literal. (This is a hard truth for proponents of translation that's "good enough.")
The practicalities of the current NMT miracle mean that:
- NMT output reads better than the work of a very poor translator, which often fools people. They read NMT output and say, "Oh, how exciting! This is very good. I can understand it!" It's true, of course. What is less obvious, however, is the fact that the nice language mechanics of the MT output cover severe errors in meaning transfer, and those errors are more difficult to find than they would otherwise be.
- NMT can't follow terminology. By design, it's impossible. The Transformer 4 algorithm is a black box, so it can't be tweaked in applications. Two consecutive NMT sentences may use different synonyms for exactly the same term. The entire NMT output must be verified for terminology.
- Terminology still must be maintained separately because the importance of terminology doesn't correlate with the frequency of usage. (This is why all terminology extractors based on word frequencies are so bad.) A very important term may appear in the client's corpus only a few times, and it's likely to be absent from the public training corpus altogether.
- It isn't possible to customize the pre-trained model very deeply. An NMT model that's trained for a specific purpose can't be "over-trained" through retraining that tries to push it too far. The quality of MT output will quickly drop to the level of garbage.
- All NMT output is literal. It is, by definition of the algorithm, a word-for-word translation. Unfortunately, in the technical languages of different cultures, expressions often don't translate literally from one into another. Often such literal "translations" are false friends of the translator, be it a human or machine. Most such cases can be untangled only by analyzing the real meaning of what the source says as opposed to the form of the word used in a corpus, in which case literal translation must not be used.
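The terminology verification described in the list above can be partly automated. The sketch below is a deliberately naive illustration: the function name, the glossary and the English-German segments are hypothetical, and plain substring matching ignores inflection, compounding and word order, so a real check would need linguistic processing and human review on top of it.

```python
def find_terminology_issues(segments, glossary):
    """Flag segments whose source contains a glossary term but whose
    translation lacks the approved target term.

    segments: list of (source_text, target_text) pairs
    glossary: dict mapping source term -> approved target term
    Returns a list of (segment_index, source_term, approved_term) tuples.
    """
    issues = []
    for index, (source, target) in enumerate(segments):
        for source_term, approved in glossary.items():
            term_in_source = source_term.lower() in source.lower()
            approved_in_target = approved.lower() in target.lower()
            if term_in_source and not approved_in_target:
                issues.append((index, source_term, approved))
    return issues


# Hypothetical glossary and MT output for illustration only.
glossary = {"circuit breaker": "Leistungsschalter"}
segments = [
    ("Reset the circuit breaker.", "Setzen Sie den Leistungsschalter zurück."),
    ("Check the circuit breaker monthly.", "Prüfen Sie den Schutzschalter monatlich."),
]

for index, term, approved in find_terminology_issues(segments, glossary):
    print(f"segment {index}: expected '{approved}' for '{term}'")
```

A check like this only finds deviations from an existing glossary. Creating and maintaining that glossary remains human work.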
Many clients have embraced the NMT hype. They attempt to use NMT as a primary translation tool while relying on TM only as the provider of the initial training corpus. Others maintain the TM+MT paired approach by providing pre-translated content colored with locked yellow for 100% matches, yellow for imperfect matches better than 75%, and blue AT (automatic translation) for the new source sentences that don't have any matches from the translation database. In the "improved traditional approach," the translator isn't supposed to touch 100% matches. Instead, the translator is supposed to edit high fuzzy matches and AT (NMT output), which is offered for new sentences and low fuzzy matches.
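The routing logic of this paired approach can be written down in a few lines. This is a sketch under stated assumptions: the thresholds (100% and 75%) come from the paragraph above, while the function name and the labels are hypothetical, and the way a match percentage is computed differs from one CAT tool to another.

```python
def route_segment(match_percent):
    """Decide how a source segment is pre-translated, given its best TM match.

    Returns "locked" for 100% matches (the translator must not touch them),
    "fuzzy" for imperfect matches better than 75% (edited by the translator),
    and "AT" for everything else (NMT output offered for post-editing).
    """
    if not 0 <= match_percent <= 100:
        raise ValueError("match_percent must be between 0 and 100")
    if match_percent == 100:
        return "locked"
    if match_percent > 75:
        return "fuzzy"
    return "AT"


# Hypothetical match scores for four segments.
for score in (100, 92, 75, 0):
    print(score, route_segment(score))
```

The sketch makes one consequence visible: every segment at or below the fuzzy threshold reaches the translator as unverified NMT output.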
Such an approach seems to be reasonable. However, the consequences of NMT usage are now evident, based on the real-world scenarios where translators have to deal with post-editing and previously post-edited NMT output:
- As the saying goes, "Laziness was born before us." (It definitely predates humankind.) Moreover, "smooth language mechanics" makes it difficult to find the minor and major errors in meaning transfer that are hidden in literally every literal sentence of machine translation. Only highly skilled translators are good editors, and those who are less proficient are reluctant to improve the literal translation let alone recognize the need to read the source and then the translation to see whether the latter fits the purpose. Post-editing is incorrectly regarded as a low-skill activity, and unqualified post-editors miss many errors of various kinds. As a result, many incorrect translations make their way into the final text and then to the TM.
- MT, by the nature of its design, isn't able to follow terminology. Thus, in the editing of NMT it's always essential to make the terminology right.
- The need to review the entire corpus of AT translations while looking into the source makes the process very tiresome.
- It takes extra effort to correct literal translations, and it's even more difficult to discern whether a literal translation distorts or obscures the meaning.
The consequences of hype
What is the productivity increase of post-editing versus translation from scratch, specifically in the context of new sentences and low fuzzy matches? Is there any productivity increase at all, considering that in order to eliminate the deficiencies one would have to run the work through two people?
Perhaps it is faster to review and edit than to construct a translated sentence from scratch, but the speed does come at a price. We are now seeing TMs that have been polluted by past non-edited MTs and non-reviewed errors. Such a TM isn't a pretty sight. Actually, it looks like garbage.
This leads to a significant decrease in the quality of the company’s linguistic assets. Such polluted TMs aren't suitable for further MT training, and eventually the degenerated knowledge base won't be much better than stock translation.
Again, the consequences are severe:
- Quality decrease is inevitable without additional measures, and errors quickly make their way to delivery.
- With NMT it's necessary to review 100% of AT because NMT output can have (and does have!) errors IN EVERY SENTENCE. Instead of saving on 100% matches, we'll eventually have to review 100% of the NMT output for accuracy of meaning transfer.
- TM corpus degradation happens very quickly. When it occurs, the benefits of the TM approach cease to exist. (Refer again to the previous item.)
- The need for terminology work is increased, incurring a great expense for terminology creation, maintenance and verification.
- Quality MT requires custom training and maintenance, which constitute additional cost.
- The productivity increase isn't significant and is probably canceled completely by the costs listed in the items above.
The hype surrounding artificial intelligence obscures the fact that modern NMT isn't "intelligence," artificial or otherwise. The meaning isn't analyzed or processed. NMT is a series of linear transformations on linguistic usage within a trained corpus. The form is handled, but the meaning isn't. To make matters worse, when language professionals try to point out these very fundamental things, they're labeled as "change-resistant." Moreover, the hype has already made its way into the corner office, hitting the point of no return with potentially severe consequences.
We, in our own practical work, see how quickly corporate translation memories degrade in quality due to the errors in post-editing meaning transfer, terminology and language mechanics that have made their way into "approved" translation and then into TMs. Amid the hype overestimating the value of MT and undervaluing the work of professional translators, unrealistic cost savings have been built into budgets and it’s next to impossible to step back.
Alas, the clever and even miraculous but decidedly not "intelligent" MT will have to be pushed back into the more manageable and reliable reality. That, of course, is the reality in which humans are much more proficient and infinitely more intelligent than tricks of linear algebra that produce unverified minefields of dull, error-filled language.
We humans should value ourselves better!
The effect on professional translation work and service
Despite the fact that MT is now widely used, the aforementioned facts and consequences push researchers to further study post-editing as an activity and to more rationally consider what it entails.
Felix do Carmo and Joss Moorkens recently (in October 2020) published a paper in which they suggest a more realistic view of post-editing as translation with just another input. 5
They challenge the contemporary view of post-editing as a low-cost, low-skill revision task, arguing that the addition of machine translation to translation workflows requires even more specialization on the part of translators.
As our real-world experience shows, we need to really focus on meaning transfer, terminology and language mechanics and style to meet quality expectations. Certainly these are higher-order tasks than the swapping of words, because they're essential in order to improve the usability and acceptability of post-editing outcome.
Recent research as well as the experience of major translating organizations such as the European Union points in a direction where the use of NMT in the CAT-tool environment seems to be more efficient than full MT workflows with support from traditional post-editing.
It is also evident for real-world practitioners that the role of the professional translator has by no means been eliminated. On the contrary, it has become more complex but also more productive.
Deep, practical research is necessary in order to more specifically define the aspects of real-world MT usage. Just as scientists of the 1950s discovered that we need to handle radioactive material with care so as to avoid serious health issues, today with MT we face an even greater challenge. What we're talking about now affects the entirety of humankind: communication, science, meaning and knowledge or, in other words, everything that makes us intelligent creatures.
2. Judged by occurrence frequency, terminology consists of rare words and word combinations.
3. GPT-3 is Amazing – And Overhyped. Forbes: https://www.forbes.com/sites/robtoews/2020/07/19/gpt-3-is-amazingand-overhyped/?sh=e2be0c71b1cb
4. The Transformer, a deep-learning model introduced in 2017, is primarily used in the field of natural language processing (NLP). Like recurrent neural networks (RNNs), Transformers are designed to handle sequential data, such as natural language, for tasks such as translation and text summarization.
5. See "Translation Revision and Post-editing, Industry Practices and Cognitive Processes," Chapter 2, "Differentiating editing, post-editing and revision," https://www.taylorfrancis.com/books/e/9781003096962