By Mike Dillinger, PhD & Laurie Gerber,
Translation Optimization Partners
This time, we look at the "parallel universe" of government translation work and how machine translation and some variants are employed there. Many of the new developments reported in this series came from the AMTA (Association for Machine Translation in the Americas) conference held October 21-25, 2008 in Hawaii. That event was noteworthy among AMTA conferences for the excellent Government MT Users program track. Nick Bemish of the U.S. Defense Intelligence Agency organized the government program track and has agreed to do so again for the MT Summit conference, to be held this coming August 26-30 in Ottawa, Canada (summitxii.amtaweb.org). If you are interested in government uses of machine translation and you missed the conference in Hawaii, it will be worthwhile to put the Ottawa MT Summit on your calendar now!
In the first article in this series, I described the differing characteristics of translation for assimilation and dissemination. Whereas commercial translation is overwhelmingly for dissemination, government translation is overwhelmingly for assimilation – information gathering purposes. There is also significant translation for communication. Certainly many government agencies do dissemination, translating public service information aimed at non-English speakers in the U.S. and abroad, but it is the need to assimilate information and communicate on the ground that has put government-focused L-3 Communications on Common Sense Advisory’s "top 20" translation company list. These areas also drive the use of machine translation in the U.S. Government.
Parallel Universe
If you were a fan of the original Star Trek series, you may remember an episode in which viewers were introduced to a parallel universe in which the familiar characters’ personalities were the opposite of what we knew. The government translation world has a similar relationship to the commercial translation world: the focus is on languages of the developing world and languages of conflict rather than languages of commerce, so there are regular requirements for African, Middle Eastern, and Pacific region languages. Because the majority of translation is into English from languages that few Americans have learned (Pashto, Tigrinya), human translation is often done by linguists who are native speakers of the source language, rather than by translators who are native speakers of the target language.
While the commercial world embraced translation memory first and is only now getting comfortable with machine translation, the opposite is true in government translation. The characteristics of disseminating product documentation into multiple languages that made translation memory so effective in the commercial world are absent in an information assimilation task. The texts to be translated rarely contain sentence-level repetitions. Formatting, which is a significant part of the value of many texts being translated commercially, does not get the same attention, since it is often discarded so that translations can be searched and digested automatically. In addition, because of the volume of materials to be scanned, and the need to find "nuggets" of information within them, few government agencies have used their human translators to do full text translations. A government linguist’s job is often to analyze a foreign language text and provide an abstract or commentary, or perhaps to select just a few passages to translate verbatim. For this reason, and because of legal/security issues surrounding many of the texts translated, the government has not accumulated large bilingual corpora, in spite of the volume of "translation" work going on.
Historically, machine translation has found its primary market in government because of the characteristics of assimilation work. It is often necessary for analysts to evaluate materials of uncertain value. Only when the analyst can scan a rough translation do they know if any part of the information merits an authoritative human translation. In addition, analysts frequently come across documents or snippets of information in foreign languages of unknown urgency. Again, machine translation can help to clarify this and guide subsequent actions. In law enforcement and intelligence, the value of a text, and the justification for a polished translation, often lies in the presence of information about people, places, and organizations of interest. So machine translation may be combined in sequence with other text analytic tools. Information extraction software may identify and extract names and numbers from a text. Once these are extracted into a database, data mining software may be used to detect connections among the entities. In fully automated text analytics pipelines like this, sometimes no human ever looks at a full text translation.
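The sequence of rough translation, information extraction, and link detection can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any agency's or vendor's actual system: `rough_translate` and `extract_entities` are hypothetical stand-ins for commercial MT and information extraction software, and the toy lexicon, watch list, and sample documents are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for an MT engine: a toy word-for-word lexicon.
TOY_LEXICON = {"visto": "seen", "en": "in", "con": "with", "y": "and"}

def rough_translate(text):
    """Return a crude 'gist' translation; unknown words pass through unchanged."""
    return " ".join(TOY_LEXICON.get(word.lower(), word) for word in text.split())

# Hypothetical watch list standing in for information extraction software.
KNOWN_ENTITIES = {"Ramirez", "Ortega", "Lima", "Quito"}

def extract_entities(text):
    """Return the set of watch-list names mentioned in the text."""
    words = {word.strip(".,;:") for word in text.split()}
    return words & KNOWN_ENTITIES

def link_entities(documents):
    """Count how often each pair of entities appears in the same document."""
    pairs = Counter()
    for doc in documents:
        entities = sorted(extract_entities(rough_translate(doc)))
        pairs.update(combinations(entities, 2))
    return pairs

# Invented sample documents.
docs = [
    "Ramirez visto en Lima con Ortega.",
    "Ortega y Ramirez en Quito.",
]
links = link_entities(docs)
```

Here `links` records that Ortega and Ramirez co-occur in both sample documents, which is the kind of connection a data mining step might surface without anyone reading a full translation.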
When software vendors try to approach the US Government, there are mysterious security hurdles, and few clear sales targets. Aside from the highly specialized language technology components, translation and text processing workflow and collaborative systems used in the government are often developed and maintained by the familiar and trusted government contractors. If you consider yourself familiar with language tools and vendors but have only been to commercial conferences, you might indeed feel you have landed in a parallel universe at a government language tools and technology conference when you find a well-populated tradeshow with few or no familiar vendors or tools!
Software Solutions in Government Environments
This section introduces the most common and widespread applications that incorporate machine translation for U.S. Government use. Note that the developers of applications mentioned below typically do not develop their own machine translation software, but incorporate commercial translation software, most often from Apptek, Language Weaver, Sakhr and Systran.
Ad Hoc Translation
Many government agencies have internally developed and hosted enterprise machine translation services available for ad-hoc translation of individual documents or cut-and-paste texts. Typically these services aggregate MT engines from multiple vendors and government sources, making them accessible via a standard dashboard.
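One way to picture such an aggregation service is a thin routing layer in front of several engines. The sketch below is a guess at the general shape only; the class name `TranslationHub`, the vendor labels, and the placeholder engines are all hypothetical and do not correspond to any real product or API.

```python
from typing import Callable, Dict, Tuple

# A translation engine is modeled as a function from source text to English text.
Engine = Callable[[str], str]

class TranslationHub:
    """Routes ad hoc requests to whichever engine is registered for a language."""

    def __init__(self) -> None:
        self._engines: Dict[str, Tuple[str, Engine]] = {}

    def register(self, source_lang: str, vendor: str, engine: Engine) -> None:
        """Make an engine available for one source language."""
        self._engines[source_lang] = (vendor, engine)

    def translate(self, source_lang: str, text: str) -> Tuple[str, str]:
        """Return (vendor, translation); raises KeyError if no engine is registered."""
        vendor, engine = self._engines[source_lang]
        return vendor, engine(text)

# Placeholder engines; real ones would call vendor or government software.
hub = TranslationHub()
hub.register("ar", "vendor_a", lambda text: f"[ar->en] {text}")
hub.register("ps", "vendor_b", lambda text: f"[ps->en] {text}")

result = hub.translate("ps", "sample text")
```

The caller sees one interface regardless of which vendor's engine handles a given language, which is the point of a standard dashboard.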
DOCEX
Shorthand for DOCument EXploitation, DOCEX systems enable users to translate hardcopy documents. Generally speaking, the documents must be machine printed (not handwritten). DOCEX systems include a scanner, and a computer with OCR and machine translation software. Other text processing software, workflow management and archiving capabilities are often part of such systems. DOCEX systems may be designed for large scale "document conversion" at a permanent installation, but there are also portable versions that enable soldiers or law enforcement to quickly assess papers encountered in the field. The primary developers of DOCEX systems are CACI and Northrop Grumman.
Broadcast Monitoring
Broadcast monitoring systems enable digital exploration of television and radio broadcasts. Broadcast monitoring systems typically include receivers for satellite signals, video decoding processors, speech recognition, machine translation, information extraction (identification of names) and multilingual search software. In a relatively well-publicized example, the U.S. military’s CENTCOM Open Source Intelligence unit uses the broadcast monitoring system developed by BBN to create twice-daily reports on events and public opinion that emerge in television and web-based news sources in Arabic. Once the speech signal is isolated in the broadcast, it is automatically transcribed with speech recognition software to produce digitized Arabic text. The information extraction software identifies mentions of personal, place and organization names in the Arabic transcript. The entire text is then translated automatically into English in near real time. At CENTCOM and other places where such systems are used, broadcast monitoring provides a complete searchable archive of broadcasts being monitored. Rather than dedicating an Arabic-speaking analyst to watch every minute of all broadcasts that might be of interest in order to capture the one or two minutes per day that constitute important new information, English-speaking analysts can search and skim the transcripts, and then enlist the help of a linguist to assess the segments that may be of interest. Virage, now a division of Autonomy, offered the first broadcast monitoring systems and still has excellent products. Apptek recently developed some innovative and varied offerings along these lines.
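The searchable archive at the end of this process can be illustrated with a small inverted index over translated segments. This is a minimal sketch, assuming speech recognition and machine translation have already produced English text for each segment; the `Segment` and `BroadcastArchive` names, the channel labels, and the sample sentences are invented for the example and are not drawn from any of the products mentioned.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    channel: str
    start_seconds: int
    english_text: str  # machine translation of the recognized speech

class BroadcastArchive:
    """Keeps translated segments and a word index for keyword search."""

    def __init__(self):
        self._segments = []
        self._index = defaultdict(set)  # word -> positions in self._segments

    def add(self, segment):
        position = len(self._segments)
        self._segments.append(segment)
        for word in segment.english_text.lower().split():
            self._index[word.strip(".,;:!?")].add(position)

    def search(self, query):
        """Return segments containing every query word, in the order added."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self._index.get(w, set()) for w in words))
        return [self._segments[i] for i in sorted(hits)]

archive = BroadcastArchive()
archive.add(Segment("channel_1", 0, "The minister announced new elections."))
archive.add(Segment("channel_1", 95, "Weather report for the coast."))
archive.add(Segment("channel_2", 30, "Protests followed the elections announcement."))

matches = archive.search("elections")
```

An English-speaking analyst could run a query like this, skim the matching segments, and pass only the promising ones to a linguist along with their channel and time offset.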
Communication
The U.S. military has had to confront the well-known language challenges of operating in foreign countries, plus new cross-language communication challenges with the extensive international military coalition at work together in the Middle East.
Chat
Real-time Chat/Instant Messaging incorporating machine translation has been employed by the U.S. military coalition for several years to enable communication among coalition forces. Chat is used for operational communication, as well as informal fraternizing. The main systems have been built by Mitre Corporation from commercial IM and MT components under various names (Trans-Lingual Instant Messaging or "TrIM", Warfighter Chat, etc.).
Computer Assisted Interpretation
I credit Common Sense Advisory with coining the term Computer Assisted Interpretation, and it is an apt analogy. Computer Assisted Interpretation is typically embodied in a handheld device, and enables one-way translation. Like translation memory, computer assisted interpretation enables reuse of previously created authoritative translations. The Voxtec Phraselator is the most widely used system. Versions of the Phraselator are available preprogrammed with the phrases needed in a variety of situations from military checkpoints to medical intake. Phrases are designed to elicit an action or gesture (rather than a spoken response), so that the one-way translation is quite useful and interactive. In a face-to-face communication, the user utters a phrase or combination of phrases that they know to be among the material in the interpretation system. The device retrieves the translation and plays it aloud. This is especially important in communications where reading and writing are not practical, such as medical intake, when communicating in the dark, and when dealing with illiterate people. Another system is the Voice Response Translator by Integrated Wave Technologies, which allows users to say the name of a common "announcement" (for example, the "Miranda rights"). The entire announcement will be played in the desired language. Both are used by law enforcement as well as military.
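The retrieval step can be illustrated as a lookup against a fixed phrase inventory. The sketch below is not how the Phraselator or the Voice Response Translator is implemented; it assumes the spoken prompt has already been turned into text by a speech recognizer and uses approximate string matching from Python's standard `difflib` module as a stand-in. The phrases and audio file names are invented.

```python
import difflib

# Hypothetical phrase inventory: English prompt -> audio file holding the
# pre-recorded, human-verified translation. File names are invented.
PHRASES = {
    "show me your identification": "phrase_001.wav",
    "please step out of the vehicle": "phrase_002.wav",
    "point to where it hurts": "phrase_003.wav",
}

def find_recording(recognized_text, cutoff=0.8):
    """Return the recording for the closest stored phrase, or None if none is close enough."""
    matches = difflib.get_close_matches(
        recognized_text.lower().strip(), list(PHRASES), n=1, cutoff=cutoff
    )
    return PHRASES[matches[0]] if matches else None

recording = find_recording("Point to where it hurt")
```

Because only stored, vetted translations are ever played, an utterance outside the inventory returns nothing rather than an unreliable translation, which is the trade-off that distinguishes this approach from free-form speech translation.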
Speech-to-Speech Translation
The current generation of speech-to-speech translation systems is enabled by impressive leaps in speech recognition and machine translation technology, as well as user interface design. These systems are being used in the field primarily by the military. They allow free flowing conversation between any two speakers of the source and target languages. Reportedly they are being used for communication between the U.S. military and Iraqi security forces in Iraq. The most advanced systems that have been deployed were developed and evaluated in the context of the DARPA TRANSTAC program, which aimed at unrestricted communication between a native speaker of American English and a native speaker of Iraqi Arabic. BBN, IBM and SRI are noted developers of such systems.
Beyond Government Use
While production translation is extremely important in the current global business environment, you can see that there are a host of tools and technologies that enable translation in many more environments. I hope that this account of the alternate universe of government translation technologies will inspire some of you to explore commercial uses of some of these tools!
Author Bio
Laurie Gerber has worked in the field of machine translation for over 20 years, including system development, research, and business development. Laurie is also one half of Translation Optimization Partners, an independent consultancy that specializes in translation processes and technologies together with Mike Dillinger, a frequent collaborator and co-author of industry-related articles. Contact: gerbl [at] pacbell . net
Published - April 2009
ClientSide News Magazine - www.clientsidenews.com
Corporate Blog of Elite - Professional Translation Services serving ASEAN & East Asia
Monday, May 18, 2009