Canadian linguistics company Termologic has issued a report on a competitive evaluation of market-leading term extraction tools.
A term extraction tool scans a set of documents (referred to as a corpus) and identifies "term candidates". Candidates that are rejected on review because they are not valid terms are referred to as "noise", and important terms that the tool fails to identify are called "silence". The less noise and silence a term extraction tool produces, the better it performs.
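As a rough illustration of how noise and silence can be quantified, the sketch below compares a tool's candidates against a reference list of valid terms. The function name, the sample lists, and the choice to express both measures as ratios are assumptions made for this example; they are not taken from the Termologic report.

```python
def noise_and_silence(candidates, gold_terms):
    """Return (noise_ratio, silence_ratio) for one extraction run.

    Assumed definitions for this sketch:
      noise_ratio   = share of candidates that are not valid terms
      silence_ratio = share of valid terms that the tool missed
    """
    # Compare case-insensitively so "Term Base" and "term base" match.
    extracted = {c.strip().lower() for c in candidates}
    gold = {g.strip().lower() for g in gold_terms}

    noise = extracted - gold      # proposed, but not valid terms
    silence = gold - extracted    # valid terms that were never proposed

    noise_ratio = len(noise) / len(extracted) if extracted else 0.0
    silence_ratio = len(silence) / len(gold) if gold else 0.0
    return noise_ratio, silence_ratio


# Hypothetical sample data, for illustration only.
candidates = ["term base", "translation memory", "the following", "click here"]
gold_terms = ["term base", "translation memory", "term extraction"]

noise_ratio, silence_ratio = noise_and_silence(candidates, gold_terms)
print(f"noise: {noise_ratio:.2f}, silence: {silence_ratio:.2f}")
```

In this made-up sample, two of the four candidates are not valid terms and one of the three reference terms is missed, so a lower value on either measure indicates a better run.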
There are three approaches to term extraction: statistical, grammatical, and hybrid. Tools that use the hybrid method tend to perform best.
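To make the distinction concrete, here is a toy sketch of a hybrid extractor: a statistical step keeps word sequences that recur in the text, and a crude grammatical step discards sequences that begin or end with a function word. It does not represent how any of the evaluated tools work. Real grammatical extraction normally relies on part-of-speech tagging; the small stopword list, the function names, and the frequency threshold here are all assumptions chosen to keep the example self-contained.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list standing in for
# real grammatical analysis such as part-of-speech tagging.
STOPWORDS = {"a", "an", "the", "of", "and", "is", "in", "to", "for", "that", "it", "are"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, max_len=3):
    """Yield every word sequence of length 1..max_len as a tuple."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def statistical_candidates(text, min_freq=2):
    """Statistical step: keep sequences that occur at least min_freq times."""
    counts = Counter(ngrams(tokenize(text)))
    return {gram: n for gram, n in counts.items() if n >= min_freq}


def looks_like_term(gram):
    """Crude grammatical step: reject sequences with a stopword at either edge."""
    return gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS


def hybrid_candidates(text, min_freq=2):
    """Hybrid: apply the statistical filter, then the grammatical filter."""
    frequent = statistical_candidates(text, min_freq)
    return {" ".join(g): n for g, n in frequent.items() if looks_like_term(g)}


# Hypothetical sample text, for illustration only.
sample = ("The term extraction tool scans the corpus. "
          "A term extraction tool is a tool for the corpus.")
for term, freq in sorted(hybrid_candidates(sample).items()):
    print(freq, term)
```

On this sample the statistical step alone keeps frequent but uninformative sequences such as "the corpus", which the grammatical step then filters out. That is the kind of noise reduction a hybrid method aims for.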
The five market-leading term extraction tools tested by Termologic are:
- Relevant Topic Finder (RTF) by Temis,
- Synchroterm by Terminotix,
- Termostat by the University of Montreal,
- Term Extract by MultiTrans,
- Prospector by Logrus Global.
All of them use the hybrid approach except Term Extract, which uses the statistical approach.
The following properties were evaluated: term length, correspondence to the gold standard glossary, quality of high-frequency terms, levels of noise, and acronym extraction. Of the five tools evaluated, Prospector performed best, with RTF a close second. One advantage of RTF is that it finds acronyms and matches them to their full forms, whereas Prospector finds acronyms but not their full forms. On the other hand, Prospector has slightly lower levels of noise.
Kara Warburton, CEO of Termologic, said: "From an availability perspective, Prospector has a clear advantage. It is available as a Web service from the Logrus Global web site. RTF, on the other hand, is internal to the Temis company and has not yet been offered as a standalone product".