Managing terminology properly in the translation process saves translation companies time and money, and improves quality and customer satisfaction. The most important, and also the most resource-demanding, task is identifying the terms that need to be carefully handled in a translation job, such as frequent expressions, proper nouns, technical terms and marketing slogans. Translation project managers must be able to quickly identify such terms in the client’s documents and decide which ones to pre-translate. The benefits of this process, known as term extraction, are widely acknowledged.

Yet term extraction is rarely undertaken in translation companies because it simply takes too much time. Some CAT systems now offer a tool to extract terms automatically. These term extraction tools all produce a certain amount of so-called “noise”, that is, irrelevant, unwanted terms. The output is useless unless the noise is removed… manually. In most cases, there is so much noise that the tool is deemed impractical.

Logrus Global, a technology-savvy LSP and localization technology developer, is very proud to announce its state-of-the-art terminology extraction tool: Prospector.

Prospector uses a combination of proprietary linguistic algorithms and semantic relevancy measures to identify terms effectively, and advanced stemming technology to convert plurals and inflections to the base form. The properly adjusted, semantically relevant terms are arranged in descending order of importance on separate sheets of an Excel file: new terms, acronyms, and proper nouns.
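To illustrate why converting plurals and inflections to a base form matters, here is a minimal Python sketch. It is not Prospector's algorithm, which is proprietary; the `base_form` and `group_candidates` helpers are hypothetical, and the plural-stripping rules are deliberately naive stand-ins for real stemming technology.

```python
from collections import Counter


def base_form(word: str) -> str:
    """Naive English plural stripping (illustrative only, not a real stemmer)."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"          # memories -> memory
    if w.endswith(("ches", "shes", "sses", "xes")):
        return w[:-2]                # classes -> class, boxes -> box
    if w.endswith("s") and not w.endswith(("ss", "us", "is")):
        return w[:-1]                # terms -> term
    return w


def group_candidates(candidates):
    """Merge candidate terms that differ only in case or in the plural of the last word."""
    counts = Counter()
    for phrase in candidates:
        words = phrase.split()
        if not words:
            continue                 # skip empty candidates
        # Lowercase every word, but only reduce the final (head) word to its base form.
        normalized = [w.lower() for w in words[:-1]] + [base_form(words[-1])]
        counts[" ".join(normalized)] += 1
    # Most frequent first; ties keep first-seen order.
    return counts.most_common()
```

Without this step, "translation memory" and "Translation memories" would be counted as two separate, weaker candidates instead of one stronger one.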

One distinguishing feature is that Prospector uses the Corpus of Contemporary American English (COCA) as a “reference corpus”, which improves term ranking. Maintained by Brigham Young University, COCA is one of the largest freely available corpora of English.
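The general idea behind a reference corpus can be sketched in a few lines of Python. This is an assumption about how such ranking typically works, not a description of Prospector's own method: a word that is much more frequent in the client's documents than in general English is more likely to be a term. The `rank_by_reference` function, its `min_count` parameter and the add-one smoothing are all illustrative choices.

```python
from collections import Counter


def rank_by_reference(domain_tokens, reference_tokens, min_count=2):
    """Rank candidates by how over-represented they are relative to a reference corpus."""
    domain = Counter(domain_tokens)
    reference = Counter(reference_tokens)
    n_dom = sum(domain.values())
    n_ref = sum(reference.values())

    scores = {}
    for term, count in domain.items():
        if count < min_count:
            continue                                   # ignore very rare candidates
        rel_dom = count / n_dom                        # relative frequency in the job
        rel_ref = (reference[term] + 1) / (n_ref + 1)  # add-one smoothing avoids division by zero
        scores[term] = rel_dom / rel_ref               # ratio > 1 means "more typical of the job"
    return sorted(scores, key=scores.get, reverse=True)
```

In this sketch a common word such as "the" scores low because it is just as frequent in general English, while a word that the reference corpus has rarely or never seen rises to the top of the list.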

Dr. Kara Warburton, a prominent terminologist experienced in term extraction, was invited to evaluate Prospector. She conducted a comprehensive study that included four other well-known term extraction tools. Each tool was used to extract terminology from a 1.3-million-word corpus, an actual translation job for a global NGO. The output of each tool was evaluated linguistically using key performance indicators, and finally compared to a glossary that had been manually prepared for this specific job by seasoned linguists. Overall, Prospector ranked the highest. Most importantly, Prospector produced the least noise. The study is available upon request.

The study also demonstrated that a good term extraction tool can effectively identify many important terms that are missed by linguists when they create glossaries manually. It can even identify inconsistencies in source content, which can subsequently be addressed before translation even starts.
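A comparison between a tool's output and a manually prepared glossary can be sketched as follows. The study's actual key performance indicators are not given here, so the `evaluate` function and its precision and recall measures are assumptions chosen for illustration. Note that candidates missing from the glossary are reported for human review rather than counted as noise, since some of them may be valid terms that the linguists overlooked.

```python
def evaluate(extracted, glossary):
    """Compare extracted candidates with a manually prepared glossary (case-insensitive)."""
    ext = {t.strip().lower() for t in extracted}
    gold = {t.strip().lower() for t in glossary}
    hits = ext & gold

    return {
        # Share of extracted candidates that also appear in the glossary.
        "precision": len(hits) / len(ext) if ext else 0.0,
        # Share of glossary terms that the tool found.
        "recall": len(hits) / len(gold) if gold else 0.0,
        # Candidates a reviewer should check: either noise or terms the glossary missed.
        "not_in_glossary": sorted(ext - gold),
    }
```

For example, if a tool returns three candidates and two of them are in a four-term glossary, precision is 2/3 and recall is 0.5, and the remaining candidate goes to a linguist for a decision.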

User feedback has also been very positive. One Prospector user commented: “about 4000 terms extracted in no time, many longer expressions extracted correctly, almost no junk, almost no incomplete entries - probably less than 1%!”

With Prospector, automatic term extraction that does not require extensive cleanup work is finally possible. Logrus Global will continue to develop this technology, which promises to deliver real productivity benefits for terminologists and translators.

16 October 2017

