By Mike Dillinger, PhD & Laurie Gerber, Translation Optimization Partners
This is the final part of the first article in a new series on how to achieve successful deployments of machine translation in various use cases. Different types of source documents and different uses for the translations lead to varying approaches to automation. In the first part of this article, we talked about why it is so important to automate translation of knowledge bases.
Pioneering companies have shown that automating translation is the best way to make product knowledge bases available to global markets. Customers consistently rate machine translated and English technical information as equally useful. A typical installation for automatic translation weaves together stored human translations that you already paid for and machine-translated new sentences to get the best of both approaches.
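The weaving of stored human translations with machine translation described above can be pictured with a minimal sketch. It assumes an exact-match lookup in a translation memory held as a simple dictionary; the names `translate_document`, `translation_memory`, and `fake_mt` are hypothetical and stand in for whatever your translation memory and MT engine actually provide. Real installations typically also use fuzzy matching, which is not shown here.

```python
from typing import Callable, Dict, List


def translate_document(
    sentences: List[str],
    translation_memory: Dict[str, str],
    machine_translate: Callable[[str], str],
) -> List[str]:
    """Reuse a stored human translation when one exists; otherwise call MT."""
    output = []
    for sentence in sentences:
        stored = translation_memory.get(sentence.strip())
        if stored is not None:
            # A human translation you already paid for: reuse it as-is.
            output.append(stored)
        else:
            # A new sentence: fall back to the MT engine.
            output.append(machine_translate(sentence))
    return output


def fake_mt(sentence: str) -> str:
    """Hypothetical stand-in for a real MT engine call."""
    return "[MT] " + sentence


memory = {"Restart the server.": "Reinicie el servidor."}
result = translate_document(
    ["Restart the server.", "Check the log file."], memory, fake_mt
)
print(result)
```

The first sentence comes back from the stored translations and the second is routed to the MT stand-in, which is the division of labor the paragraph describes.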
Steps to Success
Set your expectations. The documents in knowledge bases have distinctive characteristics when compared to other product support documentation, starting with the fact that they are written by engineers. These engineers may be experts in a technical domain, but they haven’t ever been trained in technical writing and are often not native speakers of English.
High-speed, high-volume translation simply cannot be perfect, no matter what mix of humans and machines we use. This is why emphasis in evaluation has shifted to measuring translation "usefulness", rather than absolute linguistic quality. The effective benchmark is no longer whether expert linguists detect the presence or absence of errors. The new, more practical criterion is whether non-expert customers find a translation to be valuable, in spite of its linguistic imperfections. We see time and time again that they most certainly do. You’ll confirm this with your own customers when you do beta testing of your installation.
Set realistic expectations for automatic translation: there will be many errors, but customers will find the translations useful anyway.
Start small. Start with only one language and focus on a single part of your content. Success is easier to achieve when you start with a single "beachhead" language. Starting small has little to do with machine translation and much more to do with simplifying change management: work out the details on a small scale before approaching a bigger project.
In our consulting practice, we’ve seen two main ways of deciding where to start: focusing on customer needs or on internal processes. For the customer-needs approach, your decision is guided by questions like: Which community of customers suffers most from the lack of local-language materials? Which community costs you the most in support calls? In translation expenses? Which has the least web content already translated? The decision is guided by the most important customer support issues.
For the internal-process approach, your decision is guided by questions like: Which languages are we most familiar with? Which do we have the most translations for? Which languages are our staff strongest in? Which in-country groups collaborate best? The decision in this case is to build on your strengths.
Start small to build a robust, scalable process.
Choose an MT vendor. The International Association for Machine Translation sponsors a Compendium of Translation Software that is updated regularly. In it, you can find companies large and small that have developed a range of products for translating many languages. You will see companies such as Language Weaver, Systran, ProMT, AppTek, SDL, and many others. How can you choose between them?
Linguistic quality of the translations is the first thing that many clients want to look at. Remember that what you see during initial testing is not what you will offer to your customers, because the system will be adapted to your content before you go live. And even a careful linguistic analysis of translation output quality may not tell you much about whether the system can help you achieve your business goals. Evaluation of translation automation options is much more complex than having a translator check some sentences. You may want to hire a consultant to help with evaluation, while bringing your staff up to speed on the complexities of multilingual content.
For knowledge-base translation, scalability and performance are important issues to discuss with each vendor. Most vendors can meet your criteria for response time or throughput, but they may need very different hardware to do so.
You can narrow down or prioritize the list of vendors by using other criteria:
* Choose vendors who can translate the specific languages that you are interested in. If you want to translate into Turkish or Indonesian, you won’t have as many options as into Spanish or Chinese.
* Check that you have what the vendor needs. Some MT systems (from Language Weaver, for example) need a large collection of documents together with their translations. If you aren’t translating your documents by hand already, then you may not have enough data for this kind of system. Other MT systems (from Systran or ProMT, for example) can use this kind of data, but don’t require you to have it.
* Check how many other clients have used the product for knowledge base translation – to judge how much experience the vendor has with your specific use case. The best-known vendors have experience with dozens of different installations, so try to get information about the installations that are most similar to yours. Ask, too, for referrals to existing customers who can share their stories and help prepare you better for the road ahead. MT is changing rapidly, so you shouldn’t reject a product only because it’s new. But the way that these questions are addressed or dismissed will give some insight into how the vendor will respond to your issues.
* Think through how you will approach on-going improvements after your MT system is installed. If you want to actively engage in monitoring and improving translation quality, some MT vendors (Systran or ProMT, for example) offer a range of tools to help. Other MT vendors (Language Weaver, for example) will periodically gather your new human translations and use them to update the MT system for you, with some ability to correct errors on your own.
Of course, price and licensing terms will be important considerations. Be aware that each vendor calculates prices differently: they may take into account how many servers you need, how many language pairs (e.g., English>Spanish and Spanish>English count as one language pair), how many language directions (e.g., English>Spanish and Spanish>English count as two language directions), how many people will use the system, how many different use cases, additional tools you may need, the response times or throughput that you need, etc. Experience shows that the best approach is to make a detailed description of what you want to do and then ask for quotes.
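Because vendors may count either language pairs or language directions, it can help to work out both numbers before asking for quotes. The following is a small sketch of that arithmetic; the function name and the list of needed directions are hypothetical examples, not a vendor's pricing formula.

```python
from typing import Iterable, Tuple


def count_directions_and_pairs(
    directions: Iterable[Tuple[str, str]],
) -> Tuple[int, int]:
    """Return (number of language directions, number of language pairs).

    A direction is an ordered (source, target) tuple. A pair ignores order,
    so English>Spanish and Spanish>English collapse into one pair.
    """
    unique_directions = set(directions)
    unique_pairs = {frozenset(d) for d in unique_directions}
    return len(unique_directions), len(unique_pairs)


# Hypothetical requirement list for a quote request.
needed = [
    ("English", "Spanish"),
    ("Spanish", "English"),
    ("English", "Japanese"),
]
n_directions, n_pairs = count_directions_and_pairs(needed)
print(n_directions, "directions,", n_pairs, "pairs")
```

For this example list, the count is three directions but only two pairs, so the same requirement can look quite different depending on how a vendor prices it.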
Adapt the MT system to your specific needs before you go live. Whatever MT system you choose, you or the vendor (or both) will have to adapt it to your specific vocabulary and writing style. Just as human translators need extra training for new topics and new technical vocabulary, MT systems need to have the vocabulary in your documents to translate them well. Some vendors call this process of adapting the MT system to your specific needs training, others call it customization.
An MT system starts with a generic knowledge of generic English. Your knowledge base, on the other hand, has thousands of special words for your unique products as well as the jargon that your engineers and sales people have developed over many years. The goal is to bridge this linguistic gap between your organization’s writing and generic English.
Different vendors take different approaches to bridging this gap. Some MT systems ("statistical MT" – from Language Weaver, for example) take large amounts of your translated documents and feed them into tools that quickly build statistical models of your words and how they’re usually translated. If you don’t have a sizeable collection of translated documents, though, it’s difficult to build a good statistical MT system. All MT systems can make use of your existing terminology lists and glossaries with your special words and jargon. And many MT systems (from Systran or ProMT, for example) can extract dictionaries directly from your translated documents. Hybrid MT systems, which are just emerging in the market, combine dictionaries of this kind with statistical models to get the best of both techniques. Hybrid MT systems are more practical when you don’t have a sizeable collection of translated documents to start from.
Go live. Do this in stages, starting with an internal test by the main stakeholders. Then move into "beta" testing with a password-protected site for a handful of real product users. Be sure to have a disclaimer that openly announces that the document is an automated translation and may contain errors. (At the same time, you will want to promote the availability of the content in the user’s language as a new benefit.) Actively seek out their feedback to identify specific problems, and address the ones that they cite most frequently. At this stage, your users may mention that there are errors in the translation; try to get them to identify specific words and/or sentences.
In knowledge-base deployments, a small proportion of the content (<10%) is widely read and the vast majority of the content is rarely read. The current best practice is to establish a threshold of popularity or minimum hit rate that will trigger human translation of the few most-popular articles for a better overall customer experience.
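The popularity threshold can be sketched as a simple filter over article hit counts. This is only an illustration: the article IDs, the hit counts, and the threshold of 500 are made-up values, and the right threshold for your knowledge base is something you would set from your own traffic data.

```python
from typing import Dict, List


def select_for_human_translation(
    hit_counts: Dict[str, int], min_hits: int
) -> List[str]:
    """Return IDs of articles whose hit count meets the threshold,
    most popular first."""
    popular = [a for a, hits in hit_counts.items() if hits >= min_hits]
    return sorted(popular, key=lambda a: hit_counts[a], reverse=True)


# Hypothetical hit counts for four knowledge-base articles.
hits = {"kb-101": 5400, "kb-102": 12, "kb-103": 870, "kb-104": 3}
queue = select_for_human_translation(hits, min_hits=500)
print(queue)
```

Articles below the threshold stay machine-translated; only the few that customers read most are queued for a human translator.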
This is the time to do a reality check: offer a feedback box on each translated page. It is most helpful if you ask for the same feedback on your source-language pages for comparison. If the translated page is rated much lower than the original page, then the difference may signal a problem in translation.
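One way to run this comparison is to average the feedback scores for each source page and its translation and flag the pages where the translation trails by more than some margin. The sketch below assumes ratings on a 1 to 5 scale and a gap of one point as the trigger; both are illustrative assumptions, as are the page IDs and the function name.

```python
from statistics import mean
from typing import Dict, List


def flag_translation_problems(
    source_ratings: Dict[str, List[int]],
    translated_ratings: Dict[str, List[int]],
    max_gap: float = 1.0,
) -> List[str]:
    """Flag pages whose translated version is rated more than `max_gap`
    points below the source-language version."""
    flagged = []
    for page_id, src in source_ratings.items():
        tgt = translated_ratings.get(page_id)
        if not src or not tgt:
            continue  # No feedback on one side, so nothing to compare.
        if mean(src) - mean(tgt) > max_gap:
            flagged.append(page_id)
    return flagged


# Hypothetical feedback scores (1 = not useful, 5 = very useful).
source = {"kb-101": [4, 5, 4], "kb-102": [2, 2, 3]}
translated = {"kb-101": [2, 2, 1], "kb-102": [2, 3, 2]}
flagged_pages = flag_translation_problems(source, translated)
print(flagged_pages)
```

In this example, kb-102 is rated poorly in both languages, which points to a problem in the original article rather than in the translation, so only kb-101 is flagged.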
Keep improving quality. Inevitably, products and jargon will change and you will identify recurring errors. Translation quality management is an on-going activity with two main parts: managing quality of the original documents and managing the parts of the MT system.
We’ll leave discussion of document quality management for a future article. When engineers respond to emergent problems with knowledge-base articles, it is not practical to impose stringent authoring guidelines. But you can encourage them to work from a standard terminology list (terms that the customers know, which may be different from terms that the engineers use). This will make the source-language documents easier to understand, and will improve the translations, as well.
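A lightweight way to encourage use of a standard terminology list is to check drafts for known engineering jargon and suggest the customer-facing term. The sketch below is a hypothetical illustration: the jargon-to-term mapping and the sample sentence are invented, and a real checker would need to handle inflected forms and multi-word terms more carefully.

```python
import re
from typing import Dict, List, Tuple


def find_nonstandard_terms(
    text: str, preferred_terms: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return (jargon, preferred term) for each jargon word found in text.

    `preferred_terms` maps engineer jargon to the term customers know.
    Matching is case-insensitive and restricted to whole words.
    """
    findings = []
    for jargon, preferred in preferred_terms.items():
        pattern = r"\b" + re.escape(jargon) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            findings.append((jargon, preferred))
    return findings


# Hypothetical terminology list and draft sentence.
terms = {"FW": "firmware", "box": "appliance", "NIC": "network adapter"}
draft = "Reflash the FW and power-cycle the box."
suggestions = find_nonstandard_terms(draft, terms)
for jargon, preferred in suggestions:
    print(f"Consider replacing '{jargon}' with '{preferred}'")
```

A check like this advises rather than enforces, which fits the point that stringent authoring guidelines are not practical for engineers responding to emergent problems.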
For rule-based or hybrid MT systems, you will want to manage (or outsource management of) key components like the dictionary. As errors or changes arise, updating the dictionary will improve translation quality. For statistical MT systems, you will want to manage carefully any human translated content and "feed" it into the system. The more data you use, the better these systems get.
Repeat for another language. With the first language, you will work out the kinks in your process. Once you see how very appreciative the customers are for content in their own language, you can get to work on the next language. Now you know the drill, you know the tools, and you know what to look for. The next language will take you only 25% of the effort you put into deploying the first one.
Links
Will Burgett & Julie Chang (Intel). The Triple-Advantage Factor of MT: Cost, Time-to-Market, and FAUT. AMTA Waikiki, 2008.
Priscilla Knoble & Francis Tsang (Adobe). Hitting the Ground Running in New Markets: Do Your Global Business Processes Measure Up? LISA San Francisco, 2008.
Chris Wendt (Microsoft). Large-scale deployment of statistical machine translation: Example Microsoft. AMTA Waikiki, 2008.
Authors:
Mike Dillinger, PhD and Laurie Gerber are Translation Optimization Partners. We are an independent consultancy specializing in translation processes and technologies. Both Principals are leaders in translation automation and past Presidents of the Association for Machine Translation in the Americas, with 40 years’ experience in creating and managing technical content, developing translation technologies, and deploying translation processes. We develop solutions in government and commercial environments to meet the needs of translation clients and content users. Our offices are in Silicon Valley and San Diego. Contact us for further information:
Mike Dillinger mike [at] mikedillinger . com
Laurie Gerber gerbl [at] pacbell . net
Mike needs more places to grind this axe: Authors and authoring are often treated as an unimportant afterthought, in spite of the central role of high-quality content in brand management, marketing, sales, training, customer satisfaction, customer support, operational communications, and everything else.
Published - April 2009
ClientSide News Magazine - www.clientsidenews.com
Corporate Blog of Elite - Professional Translation Services serving ASEAN & East Asia
Friday, May 22, 2009