Wednesday, January 7, 2009

The Interpretive Model and Machine Translation

By Mathieu Guidere
Master’s in Arabic Language and Literature; Ph.D. in Translation Studies and Applied Linguistics from the University of Paris-Sorbonne
Lyon 2 University, France; Saint-Cyr Research Centre, France
mathieu.guidere@univ-lyon2.fr
http://perso.univ-lyon2.fr/~mguidere

For a long time, translation formed part of linguistic studies (see G. Mounin’s work). However, during the last few decades, it has been institutionally associated with the “Language Sciences”, a vast and very dynamic field in which interdisciplinarity plays a key role.

This association has led to the burgeoning of a translation science (traductology, or translation studies) within the field of Language Sciences, one which does not deal specifically with “translation” but with “translation operations and processes”, thus reflecting the change in perspective adopted to approach the object of study.

Our aim is to put forward an epistemological analytical grid of the field in question, i.e. the works related to the analytical study of translation and its automatic processing as a prelude to machine translation or computer-assisted translation. However, delimiting a field requires one or several perspectives in order to define its axes, issues, methods and aims.

Therefore, a broad outline of the theoretical conflict between the issues of meaning and translation will first be laid down. We will then explain how this conflict transcends logical formalization; the aim of devising a theory is to set translation pedagogy free from the “interpretive model”. Finally, these issues will be reexamined in order to reuse the data for natural language processing (machine translation and computer-assisted translation).

To set up the analytical grid, we will have recourse to three basic fields related to scientific methodology: the observation field, the hypothesizing field and the validation field (see Auroux’s works). The purpose here is not to compare the observed approaches or to express any value judgment concerning them, but to tackle them from the natural language processing perspective as a step prior to translation, because this perspective and its implementation are part of an “objective” process, meaning that they merely draw up assessments about specific data. In other words, we are first and foremost concerned with observation, well sustained by descriptions and validated data, so as to put these works into perspective and to draw up a specialization field in the light of the principles introduced below.

The Methodological Choices

The methodological choices concern the perspectives selected to analyze works on translation in this study. These choices put the discipline at the crossroads of theoretical linguistics and scientific empiricism, on the basis that the effects of a theory are commensurate with the resulting application. For this reason, the study object, translation, will be tackled in a descriptive way, i.e. as it is practiced and evolves professionally. But this object must be defined by new analytical protocols and imperatively has to move away from the prevailing interpretive models.

By taking these postulates into consideration, we will only describe attested works (the corpora of translated and published texts) and use what are regarded, in these works, as empirical elements which can be subject to a “corroborative” or validation test. But this aim does not rule out the possibility of making observations of a different type using information not contained in our corpora and works.

The data for analysis can be divided into three main categories. First of all, electronic texts associated with observations. Secondly, a computerized system of hypotheses and indications. Finally, validation applications relating hypotheses and linguistic data arising from observation.

Our work is essentially based on three tools: electronic texts grouped into machine-readable corpora, work tools for observing and classifying linguistic data and corroborative tools to validate the observation results.

The corpora used in this study had to meet the criteria set out by Sinclair. The observation of linguistic data should lead to the constitution of an object of study in accordance with a specific and sustained extraction protocol. Results arising from the observation must be “remarkable”, meaning that they should reveal high-frequency usages and occurrences in the reference corpus.

Consequently, our attention was turned to real, finished works (texts, sentences, expressions, terms) and not to practices relevant to language usage (speaking, writing, memorizing). The idea was that these speech practices cannot be subject to the rigorous imperatives of data examination and that only observed works allow the application of objective procedures. This by no means implies, however, that what is observed reveals nothing about what is happening in the speaker’s mind[1].

This separation between “data” and “practices” finds its counterpart in the field of computer science, in the separation between the “declarative” and the “procedural[2]”. For the moment, we have to decide which type of data must be observed and, specifically, which phrases and terms are potential subjects for a systematic study of translation.

Until now, our approach has been based on the empirically verified postulate that the corpus texts used for examining data consist of well-formed, consecutive phrases respecting specific constraints, thereby allowing us to distinguish a constructed discourse from an anarchic set of phrases lacking coherence and consistency.

This starting point is important because it puts a great deal of emphasis, in the observation and analysis, on the significance of textual linguistics in comparison with theoretical and general linguistics. This means that we are pursuing several objectives: first, distinguishing a text from a series of phrases with no logical or semantic link between them; secondly, tagging the content of the text from a typological point of view (technical, journalistic, etc.); and finally, classifying the information extracted according to a previously defined protocol and linguistic criteria.

To achieve these objectives, not only must an observation methodology be adopted but results should also be expressed in appropriate language. Therefore, in a text, we must learn to observe, on the one hand, phrases according to the three levels of analysis (morphological, semantic, syntactic) and, on the other, relationships between phrases according to the discourse type (argumentative model or textual anaphora).

Once the methodology has been adopted, some working hypotheses can be made with reference to three main axes: firstly, the type of formalism used; secondly, the linguistic extension or portability[3]; and finally, the aim or objectives of the analysis.

Concerning the first axis, the choice was made to make the results more explicit by setting up hypotheses in a form which could be computerized, i.e. likely to be represented by an algorithm and read by a machine. This is the distinctive feature of the “formalization”[4] that we wanted to tailor to the constraints of the translation process.

In this respect, there are two ways of “formalizing” linguistic data: one is totally independent of the computerized tool which later processes the data and uses explicit instructions in the form of standard rules; the other relies on the formal possibilities of machine-readable algorithms to represent the linguistic information. The two are largely complementary, however, and we should start with the first before tackling the second. In both cases, machine-readable linguistic formalisms are obtained at the end of the procedure.
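
To make these two routes concrete, here is a minimal sketch in Python; the rule and its content are purely illustrative, not an actual formalism from this study. The same correspondence is first written as an explicit, tool-independent rule, then expressed in a machine-readable form.

    import re

    # Route 1: a declarative rule, written independently of any processing tool.
    # It states that an English "Minister of X" phrase corresponds to an Arabic
    # genitive construction "wazir al-X" (illustrative content only).
    rule = {
        "id": "MINISTER_GENITIVE",
        "source_pattern": "Minister of <FIELD>",
        "target_pattern": "wazir al-<FIELD>",
    }

    # Route 2: the same rule expressed in a machine-readable formalism
    # (here a regular expression) that a program can apply to a source text.
    source_regex = re.compile(r"Minister of (?P<FIELD>\w+)")

    def apply_rule(sentence):
        """Return the target-side patterns predicted by the rule for a sentence."""
        return [rule["target_pattern"].replace("<FIELD>", m.group("FIELD").lower())
                for m in source_regex.finditer(sentence)]

    print(apply_rule("The Minister of Education met his Interior counterpart."))
    # ['wazir al-education'] -- the lexical transfer itself is left to a dictionary

In both cases, as stated above, the end product is a machine-readable linguistic formalism.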

Concerning the second axis, we decided to choose, as a starting point, a source text (ST) and, as a finishing point, a target text (TT) in order to examine, in a contrastive way, their interactions through a range of structures of varying complexity which needed to be described and extracted. Once the structure is applied to the ST, in accordance with a specific protocol, it is simply searched for and validated in the TT. Hence, this is a “source-oriented” point of view of the linguistic extension.

It should be mentioned at this point that translation studies distinguish two points of view in the practice and analysis of translations: the “source-oriented” point of view which favors the specificities and requirements peculiar to the source text (faithfulness, literality) and the “target-oriented” point of view which favors the target text (rewording, adaptation).

Concerning the third axis (the aim of the analysis), it should be noted that we already have the “inputs” and the “outputs”, i.e. we already know the results of the operation before even starting the formalization and implementation of the program because we are working on text corpora which have previously been translated and synchronized. The goal of this application is to show that the program runs in accordance with given specifications. In other words, the program implementation is mainly a validating procedure for the observation results[5].

In the light of these elements, it ought to be mentioned that in the field of machine translation (MT), the issue of “linguistic extension” is essential and requires that we dwell upon it. It can be stated as follows: in a linguistic system A, the information associated with a subgroup of translation units (sentences and expressions) shows a certain regularity and coherence likely to be systematized and computerized. The question is whether the properties of system A, while maintaining the same underlying coherence, can be extended to a system B in such a way that the source units of translation have adequate equivalents in the target language. If this is possible and the modifications to be introduced do not affect the internal coherence of system B, we may then say that system A and its subgroup of units are linguistically extensible, meaning that they are transferable by computerized translation.

Let us take an example likely to be described adequately by a grammar in spite of its complexity and semantic ambiguity. The sentence “The Minister of Education met his Interior counterpart” can easily be translated by a human into any language. However, to be translated by a machine, its linguistic properties must be extensible to the system which will receive it. In this particular case, for example, the possessive phrase (the genitive construction in Arabic) should be transferable and the ellipsis in the recurrent phrases (“the Minister of something”) should be acceptable in both cases without major modifications. Moreover, the issue of “predication” poses thorny problems concerning “portability” between two different linguistic systems like English and Arabic.

To avoid making problems of correspondence between languages insurmountable, very detailed linguistic indications must be provided to reach the next level of formalization as a prelude to computerization. A machine-readable system of equivalences is thus a set of linguistic formulae in which every formula specifies at least one pair of phrases (see the holistic perspective of translation).

As an example, let us take a set of expressions (SES) in a source text (TEX) so that every expression (EXP) can be associated with one or several indications (IND) similar for all expressions (EXP) of the set (SES) in the text (TEX). This gives the following formula: SES = {IND, EXP, TEX}.

On the basis of this formula, an equivalent formula, valid for the target text, can be obtained: SES1 = {IND1, EXP1, TEX1}. This formula is justified with regard to a set of expressions with relevant linguistic features in common in the target text without necessarily being equivalent to those of the source text on the structural level. There is no systematic projection of the properties of one system onto another. If there is projection, it must inevitably be done in accordance with a grammatical principle whose formulation is subject to the calculation (formal or algorithmic) which underlies all expressions in the text. In this way, a linguistic property can or cannot be projected, in the same way as a system can or cannot be portable, regarding the possibility or not of translating sequences from one language to another.
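
As a rough illustration, the formula can be read as a record structure: each expression (EXP) carries its indications (IND) within a text (TEX), and projection onto the target side is only allowed when a correspondence licenses it. The sketch below assumes a hypothetical table of licensed feature correspondences; all data is invented.

    # Each record ties an expression (EXP) to its indications (IND) within a text (TEX).
    ses = [
        {"EXP": "the rise in unemployment", "IND": ["NP", "definite"], "TEX": "press_text_1"},
        {"EXP": "worries officials", "IND": ["VP", "present"], "TEX": "press_text_1"},
    ]

    # Hypothetical table of licensed correspondences between source and target features.
    licensed = {
        "NP": "nominal_group",
        "VP": "verbal_sentence",
        "definite": "ma'rifa",
        "present": "mudari'",
    }

    def project(record):
        """Project a source record onto the target system, or report non-portability."""
        try:
            ind1 = [licensed[i] for i in record["IND"]]
        except KeyError as missing:
            return {"portable": False, "blocking_feature": str(missing)}
        # EXP1 is left open: it is searched for and validated in the target text.
        return {"EXP1": None, "IND1": ind1, "TEX1": record["TEX"] + "_target", "portable": True}

    for record in ses:
        print(project(record))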

By adopting this “formalist” point of view in translation, explicit criteria for the comparison of texts are laid down, each dissected and expressed in the form of adequate equations. According to this method of analyzing translation, there is no “equivalence” between languages but only “correspondence” of structures and linguistic features. As opposed to “equivalents” which can be analyzed according to the similarity criterion, “correspondents” are pairs of objects different on the form level but comparable on the function level.

The characterization of these “correspondents”, which allows for semantic imprecision, derives mainly from the choices made during the observation stage. Which comparison elements should be adopted? Of course, we exclude from our criteria any subjective consideration concerning the “beauty” or the “elegance” of the translation to be used for machine translation.

An Outline of the Adopted Approach

Our approach can be associated, from a theoretical point of view, with textual linguistics with significant recourse to the principle of contrastivity and formalization.

In the framework of this approach, the texts taken as study material are classified according to the sources which have produced and distributed them (for instance a newspaper or an official body) and according to their denotative field based on explicit semantic considerations (for instance, texts about law or health issues).

Once the field and the type of the text have been well defined, observations focus, on the one hand, on its segmentation and on the constituents of its syntax (the “chunks”), and on the other, on the links between those constituents from a morphological and semantic point of view.

Underlying calculations ensure the validation of this approach from a theoretical and practical point of view. Thus, the choice of textual units to be analyzed and formalized must be made according to specific concepts such as “recurrence”, “coverage” and “precision”. Statistics are used to detect the most frequent linguistic and translational usages of a structure in a study corpus and to build a description that highlights the most relevant elements.
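
As a rough sketch of how “recurrence” and “coverage” could be computed over a study corpus (the sentences and the candidate structure are invented; “precision” would further require a manual check of the hits):

    from collections import Counter
    import re

    corpus = [
        "The rise in unemployment in March worries officials.",
        "The rise in prices in April worries households.",
        "Rains are expected in the North of the country.",
    ]

    # Recurrence: how often a candidate structure ("the rise in X") occurs.
    pattern = re.compile(r"\bthe rise in (\w+)", re.IGNORECASE)
    hits = [m.group(0).lower() for s in corpus for m in pattern.finditer(s)]
    recurrence = Counter(hits)

    # Coverage: the proportion of corpus sentences in which the structure appears.
    coverage = sum(1 for s in corpus if pattern.search(s)) / len(corpus)

    print(recurrence)
    print("coverage =", round(coverage, 2))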

Hence, observation deals with what is immediately accessible in the phrases under study, while semantics is not tackled at this point. The use of training corpora and the induction of descriptions are at the heart of the textual approach. The main stages of analysis are the following (reasoning from particular facts to a general conclusion):

1) Segmentation and morphological analysis;

2) Disambiguation of morphological categories;

3) Local and textual syntactic analysis;

4) Analysis of functional syntactic relations.
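
The four stages above can be read as a processing pipeline. The sketch below only gives its skeleton; the tokenization, tags and chunking rule are deliberately naive placeholders standing in for real analyzers.

    def segment_and_analyze(text):
        """Stage 1: segmentation and (naive) morphological analysis."""
        return [{"form": tok, "candidates": ["NOUN", "VERB"]} for tok in text.split()]

    def disambiguate(tokens):
        """Stage 2: keep a single morphological category per token (placeholder rule)."""
        for tok in tokens:
            tok["pos"] = tok["candidates"][0]
        return tokens

    def chunk(tokens):
        """Stage 3: local and textual syntactic analysis (grouping into chunks)."""
        return [{"chunk": [t["form"] for t in tokens], "type": "S"}]

    def functional_relations(chunks):
        """Stage 4: analysis of functional syntactic relations (placeholder)."""
        return [{"chunk": c["chunk"], "function": "predication"} for c in chunks]

    def analyze(text):
        return functional_relations(chunk(disambiguate(segment_and_analyze(text))))

    print(analyze("The rise in unemployment worries officials"))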

The main difficulty of the analysis prior to translation is still the disambiguation of the original textual context. This difficulty is essentially related to the problem of sentence delimitation, needed in order to eliminate the potential syntactic relations for a given type of rules (i.e. the morphosyntactic rules or “chunking rules”). This problem becomes much more salient during machine analysis of texts because difficulties resulting from the ambiguities of morphosyntactic tagging combine with those of segmentation. With current formalisms, it is difficult to automatically reduce the generation of “intrusive analyses” which will inevitably be a problem during translation (see Chanod’s works).

Nevertheless, research into textual linguistics is opening the way to an inductive process of translation. It is becoming possible to formulate inductive generalizations like those of linguistic “correspondences” which are actually observed. However, to advance research, it is imperative to implement systematically corroborative tests able to measure the validity of adopted rules.

Limits of Interpretation in Machine Translation

One of the fundamental issues regarding the translation approach is still that of the principles allowing the interpretation of the meaning to be translated. The perspective adopted here for analyzing translations deems there to be a specific translation mechanism which intervenes in the interpretation of phrases, and deems the general principles associated with interpretation to be insufficient. However, this mechanism should be amended to take into consideration the linguistic markers (tense, mood, linking words, verbal and nominal lexicon) contributing to the interpretation of the phrases and discourse to be translated. We lay out here a general framework of formal representation, the theory of translational formalisms, and an interpretive translation model, the model of contextual deductions, to specifically examine the question of translational equivalences. We will demonstrate how this approach could be applied to natural language processing as a prelude to translation (CAT and MT).

In fact, a few years ago, new directions in linguistics and semiotics began redefining interpretation in translation and regarded it as an act of cognition passing through a comparative process of possible equivalences. The idea of setting the record straight about interpretation in translation meets the need to adjust practical observations to these new theoretical directions.

To establish the elements of the debate, we must start with Umberto Eco’s book Les Limites de l’interprétation. The author notes, in his introduction, that the interpreter’s initiative has been pushed so far that the problem today is to avoid falling into misinterpretation. And he later adds: “All in all, to say that a text has no end does not mean that every act of interpretation has a happy ending”. This is why the author strives to restore a certain dialectic between the rights of the reader-translator and the rights of the text to be translated.

Using the message “Dear friend, in this basket brought by my slave, there are thirty figs I send you as a gift”, Umberto Eco gives a range of significations and referents, but he asserts that we do not have the right to say that the message could mean just anything. It could mean many things, but it would be hazardous to suggest just any meaning. Asserting this fact means admitting that phrases have a literal meaning: “I know how heated the controversy is in this respect, but I still maintain that, within the limits of a given language, there is a literal meaning for the lexical items, the one dictionaries mention first”. Eco says we must set out to define a kind of swinging, an unstable balance, between the interpreter’s initiative and faithfulness to the text. The functioning of a text can be understood by taking into consideration the part played by the addressee in the process of its comprehension, realization and interpretation, as well as the way the text itself projects the participation of the reader.

The debate on interpretation in translation is based on two approaches: on the one hand, searching for what the author meant to say in the text[6]; on the other, searching for what the author says in the text, regardless of his intentions, either by relying on textual coherence or on the signification systems of the addressee. However, in all cases, one must use the literal meaning to develop a translation.

Translation criticism tries to explain the reasons why the text gives the former meaning or the latter. The number of versions a translator can come up with is potentially unlimited but, at the end of this process, each one of them should be tested against textual and linguistic coherence, thus rejecting precarious or approximate translations. Therefore, a text lends itself to numerous readings without allowing all possible translations. If we cannot tell which translation is the best for a text, we can, however, tell which are incorrect. Every act of translation is a difficult transaction between the translator’s competence and the type of competence a given text needs in order to be translated in a rigorous and coherent way. Between the unreachable intention of the author (what he meant to say) and the arguable intention of the reader-translator (his interpretation), there is the transparent meaning of the text, which refutes any inadequate or unacceptable translation.

It is difficult to determine what is wrong and what is authentic in a translation, because the definitions depend on the issue in question. Nevertheless, in all cases, a sufficient condition for an incorrect meaning is the assertion that phrases from the source text have many equivalents in the target text. Thus, a translation is not erroneous because of its internal properties but due to an alleged multiple identity between the source and the target.

Therefore, the sentence “All translators love foreign languages”, for example, does not have many parallel meanings but it accepts in practice several possible translations[7]. On the other hand, it is impossible to reasonably conclude that all these equivalences are identical, structurally speaking, and regardless of the subjective perception of individuals who have produced them.

These different translations are not merely different wordings of the same idea. Each structure stylistically expresses a different meaning. Consequently, we cannot say that a nominal sentence and a verbal sentence convey the same idea and express the same meaning, even if the words used are identical in the two structures. We know predication is not the same in both cases because the nominal sentence emphasizes the noun whereas the verbal sentence focuses on the process or the action. To declare that two structurally different translations are equivalent to a third, original structure is simply to ignore the specificities of linguistic structures in expressing nuances and subtleties of meaning.

To test the validity of these observations, “retro-translation” could be used as a discriminating criterion between translations. “Retro-translation” means retranslating into the source language, without consulting the original, the version already translated into the target language. Translating the translated version backwards and “blindly” in this way often allows us to notice that the equivalent structure was not the one taken as a starting point for the translation, demonstrating the inaccuracy of the aforementioned translation.
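
A sketch of how retro-translation could be turned into an automatic check, assuming hypothetical translate_to_target and translate_to_source callables (any engine or lookup table could stand in for them; the toy data below is invented):

    def retro_translation_check(source, translate_to_target, translate_to_source):
        """Back-translate 'blindly' and report whether the original structure is recovered.

        translate_to_target / translate_to_source are hypothetical callables standing
        in for whatever translation engine or lookup table is actually used.
        """
        target = translate_to_target(source)
        back = translate_to_source(target)          # no access to the original here
        recovered = back.strip().lower() == source.strip().lower()
        return {"target": target, "back": back, "structure_recovered": recovered}

    # Toy stand-ins: a tiny lookup table instead of a real engine.
    toy_ar = {"all translators love foreign languages":
              "yuhibbu jami'u al-mutarjimin al-lughat al-ajnabiyya"}
    toy_en = {v: "all translators love foreign languages" for v in toy_ar.values()}

    report = retro_translation_check(
        "All translators love foreign languages",
        lambda s: toy_ar[s.lower()],
        lambda s: toy_en[s],
    )
    print(report)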

The notion of “possible equivalence” is useful for a translation theory because it helps to decide which meaning interests the translator in his work and what he wants to convey through language. But we must be aware of the fact that, among possible translations, there are inevitable translations, improbable translations, and inadmissible translations.

In a sentence such as “All translators love foreign languages”, the translator must think of the best way of rendering it in the target language. He will first think in relation to the three levels of language: morphological, semantic, and syntactic. The inevitable translation will take these levels into consideration while being linguistically correct and culturally appropriate. The improbable translation will move away from literal accuracy, either over-translating the original or creating a certain stylistic effect. Finally, the inadmissible translation will give a semantically different version of the original while being linguistically accurate.

In this regard, a distinction should be made between “semantic translation” and “critical translation”. The first is the result of the technique adopted by the translator, when faced with the linear progression of a text, of giving a certain meaning in accordance with the lexicon of its phrases, whereas the second is a metalinguistic activity aimed at describing and explaining, on the formal level, why a given text gives a given translation, with the exception of all others, however sensible they are.

An exemplary translator is not only required to be precise and meticulous but also to pay great attention to the stylistic subtleties of both of his working languages, according to the principle that every wording has its own meaning and aim in the linguistic system using it (the “economy of language” principle). If the exemplary translator acts as such, he will produce a consensual translation without any subjective value judgment. Otherwise, he will be compelled to search in vain for possible meanings and potential ways of rendering them.

Some translators may wonder: “why be so rigorous if the meaning is understood and conveyed?” However, such translators, indulgent or careless depending on the case, will not be exemplary translators, because exemplary translators seek the exact meaning and the inevitable translation, the one likely to be taken up and modeled for natural language processing. But how can we achieve this goal when faced with so many readings and interpretations?

According to the semiotician Peirce, the interpretation of meaning is an action involving the cooperation of three subjects: the sign (e.g. the word “rose”), its object (the real, tangible flower) and its interpretant (the concept of the red flower). What is important in Peirce’s definition is that it does not require an interpreter or conscious subject. Hence, it should be remembered, in accordance with the analyses of Peirce and Eco, how important the distinction is between the meaning system (the sign system) and the process of communication (which requires the presence of an interpreter).

The meaning system is a series of elements with a combinatory rule governing the arrangement of elements among them (its syntax). The acceptable sequences of a syntactic system associated with another system can be transferable from one language to another (e.g. w+a+t+e+r = water = “drinkable transparent liquid” is transferable into any language in the world without recourse to human interpretation).

In a semiotic system, any content can become a new expression likely to be interpreted or translated by another expression in another language. Abduction is a form of inference which tries to accurately interpret the meaning of a phrase and to establish a rule using a word and its context. Recognizing a series of words as a coherent sequence (i.e. as a text) means finding a textual theme able to create a coherent connection between different data with no apparent link between them. The identification of a textual theme is an example of an abduction. Every translator makes abductions to choose between the numerous possible readings of a text. The criterion of the economy of language compels us to always choose the easiest option in the absence of any other selection tool.

For a Corpus-based Translation Methodology

The method adopted is the method of formal linguistics but the approach suggested here[8] is based on three postulates of corpus linguistics:

Firstly, all translation solutions already exist in translated texts.

Secondly, all translational equivalences are subject to analysis and formalization.

Thirdly, all formalizations are systematized and computerized.

This approach is primarily applied to specialized texts (with a controlled or closed vocabulary) and secondarily to general texts (with a connoted and polysemous vocabulary). The first category comprises the vast majority of literature translated nowadays, whereas literary (or poetic) language represents a tiny part of the discursive usage in textual corpora.

This approach is aimed at determining translations likely to be formalized. To identify the most relevant translation solutions, we have recourse to the calculation of occurrence frequency: the more frequent an equivalent is in translated texts, the more it is regarded as an inevitable solution; the less frequent it is, the more it is regarded as marginal.

We can mention, for example, for the sentence “All translators love foreign languages”, five different ways of translating it into Arabic. The most frequent wording would be considered as inevitable, regardless of its intrinsic quality because we deem its recurrence to be proof of its validity, not to mention its legitimacy. The goal is not really to evaluate the quality of translations but rather to take note of the translational usage.
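
A minimal sketch of this frequency criterion, assuming the target renderings of one source sentence have already been extracted from the aligned corpus (the variants are those of note [7], in simplified transliteration, and the counts are invented):

    from collections import Counter

    # Target renderings observed for one source sentence (illustrative counts).
    observed = (
        ["yuhibbu jami'u al-mutarjimin al-lughat al-ajnabiyya"] * 7
        + ["kull al-mutarjimin yuhibbun al-lughat al-ajnabiyya"] * 3
        + ["al-mutarjimun jami'uhum yuhibbun al-lughat al-ajnabiyya"] * 1
    )

    ranking = Counter(observed).most_common()
    inevitable = ranking[0][0]              # the most recurrent rendering
    marginal = [r for r, n in ranking if n == 1]

    print("inevitable:", inevitable)
    print("marginal:", marginal)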

Thus, corpus-based translation is based on three principles or main presuppositions:

1) The immanence principle: each pair of texts forms the same composite element of signification; the analysis examines both texts but only as translations of each other; it does not rely on external data such as dictionary information or grammars.

2) The composition principle: The only true meaning is through and in the relationship between the two texts, especially the correspondence relationship between translation units; the analysis of “bitexts” consists, therefore, of establishing the correspondence network between different elements, a network which will be the basis for the text translation.

3) The structuring principle: every translation respects a discursive logic and a grammar, i.e. a certain number of linguistic rules and basic structures. In a set of units named “translations” there are different levels of correspondence, each with their own grammar.
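
In line with the composition principle, a “bitext” can be modelled as a correspondence network between units of the two texts, with no recourse to dictionaries or grammars. A minimal sketch, with invented alignment pairs:

    # A bitext as a correspondence network: each source unit points to the
    # target unit(s) it corresponds to, without any external resource.
    bitext = {
        "s1": {"text": "The rise in unemployment in March worries officials.",
               "corresponds_to": ["t1"]},
        "s2": {"text": "Rains are expected in the North of the country.",
               "corresponds_to": ["t2"]},
    }
    target_units = {
        "t1": "izdiyad al-bitala fi maris yuqliq al-mas'ulin",
        "t2": "yutawaqqa'u an tumtira fi al-shamal",
    }

    def correspondents(source_id):
        """Return the target units linked to a given source unit."""
        return [target_units[t] for t in bitext[source_id]["corresponds_to"]]

    print(correspondents("s1"))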

Therefore, the global content of a translation can be analyzed on three different levels:

1) The translation level: in a translation, we study the changes which convey the meaning of the source text to the target text. At the end of a translation process, the analysis seeks to redraw the various stages, logically related to one another, which mark the transformation of a sentence into its equivalent. In each stage, we specify the links between the functions of some of the phrasal elements which determine the meaning and produce the transformations.

2) The discursive level: the analysis involves three operations: (a) identifying and classifying sequences i.e. significant elements in a text; (b) establishing equivalents to each element in the text in order to determine how this element was translated in the text; (c) finding why elements, in a given text, are translated in such and such a way.

3) The logico-semantic level: this is the most abstract level of analysis. It works on the postulate that logical and meaningful forms underlie the translation of any discourse. At this level, analysis means specifying the logic which governs the fundamental articulations of translation units. To do so, we must have recourse to formalization and to the representation of relations within and between sentences.

Thinking in translation studies currently seems caught between two correlative paradoxes. On the one hand, the pragmatism of the “interpretive model” obviously tends to over-reduce the method, sacrificing precision for the sake of communication and accuracy for the sake of rapidity. On the other, the opposition between the logical paradigm and the hermeneutic paradigm reduces translation pedagogy to a kind of sophisticated mnemonics without any real applicable or metalinguistic dimension.

In this tense field, our translation theory evolves between the two paradigms (the requirement for interpretation and the necessity for formalization). It questions interpretive practices in the process of translation. In fact, the interpretation issue seems today to be the linking point between text theories and translation theories. In our discipline, this issue is nowadays the main controversial element for establishing a new applicable translation methodology.

We suggest below a preliminary draft of the translation work which could be requested from novice translators.

1) Alignment and Criticism of translated corpora

Aligning corpora means matching every “translation unit” of the source corpus to an equivalent unit of the target corpus. In this case, the term “translation unit” covers long sequences like chapters or paragraphs as well as shorter sequences such as sentences, phrases or simply words.

The translation unit selected depends on the point of view chosen for the linguistic analysis and on the type of corpus used as a database. If the translated corpus is very faithful to the original, we will proceed with a close alignment of the two corpora, with the sentence, or even the word, as the basic unit, whereas if the corpus used is an adaptation rather than a literal translation, we will align larger units such as paragraphs or even chapters.
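
A sketch of this granularity choice, assuming the two corpora have already been split into units and that the decision “faithful translation or adaptation” has been made beforehand (all data invented):

    def align(source_units, target_units, faithful=True):
        """Pair units one-to-one when the translation is faithful; otherwise fall
        back to a coarser alignment (here, one block for the whole passage)."""
        if faithful and len(source_units) == len(target_units):
            return list(zip(source_units, target_units))
        return [(" ".join(source_units), " ".join(target_units))]

    src = ["The President of the Republic received his Syrian counterpart.",
           "The rise in unemployment in March worries officials."]
    tgt = ["istaqbala ra'is al-jumhuriyya nazirahu al-suri.",
           "izdiyad al-bitala fi maris yuqliq al-mas'ulin."]

    for pair in align(src, tgt, faithful=True):
        print(pair)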

It is obvious that the initial postulate, which allows an educational use of such corpora, is to establish correspondence between the content of examined units and their interconnections. So-called “free” translations must lead to well-sustained thinking on missing sequences, changes in the text order, content modification, meaning adaptation, etc. All these operations are common in everyday translation practice but their frequency varies according to the fields of study.

Furthermore, there are important structural differences between English and Arabic which prevent rigorous sequential processing. Due to the huge linguistic difference between the two systems, we often notice that the sentence order has been modified and sometimes omissions or additions occur between two texts which are nonetheless a translation of each other. These aspects must be examined from a stylistic point of view and, if possible, systematized.

All these observations lead us to consider parallel corpora not so much as a set of equivalent sequences but rather as corresponding text databases. At any level (text, paragraph, sentence, phrase or word), the examined corpus should be regarded as a lexical and translation database. In other words, we suggest submitting it to a search technique similar to the one used in information retrieval systems.

Thus, the main goal will be to highlight structural equivalences between the two languages, and, more pragmatically, to search for the closest T2 (the target text) unit to the “request” represented by a T1 (the source text) unit.
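
A sketch of this retrieval view: the T1 unit is treated as a query, its words are projected through a small glossary (the entries are invented), and the T2 unit with the greatest overlap is returned as the closest correspondent.

    # Illustrative glossary projecting query words into the target language.
    glossary = {"rise": "izdiyad", "unemployment": "al-bitala",
                "march": "maris", "officials": "al-mas'ulin"}

    t2_units = [
        "istaqbala ra'is al-jumhuriyya nazirahu al-suri",
        "izdiyad al-bitala fi maris yuqliq al-mas'ulin",
        "yutawaqqa'u an tumtira fi al-shamal",
    ]

    def closest_unit(t1_unit, candidates):
        """Return the T2 unit sharing the most projected words with the T1 'request'."""
        projected = {glossary[w] for w in t1_unit.lower().split() if w in glossary}
        def score(unit):
            return len(projected & set(unit.split()))
        return max(candidates, key=score)

    print(closest_unit("The rise in unemployment in March worries officials", t2_units))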

2) The Linguistic and Stylistic Analysis of the Corpus

The different levels of linguistic analysis serve as a basis to study translation examples:

- Firstly, morphological analysis identifies equivalent words or morphemes in the corpus.

- Secondly, syntactic analysis identifies corresponding phrases and structures in both texts.

- Finally, semantic analysis identifies the meaning of units and any potential ambiguities in each text.

The usefulness of such a corpus goes beyond the limited framework of translation. While the main goal is translation criticism, other useful applications may also be considered such as generating bilingual terminology lists, extracting examples for pedagogic purposes, enhancing current dictionaries or even for the induction of grammar rules.

The suggested approach allows us to optimize thinking in translation studies regarding bilingual texts.

The general idea of this approach is to associate equivalent “translation units” (words, sentences, syntactic structures) when the corpus sequences are identified.

The main goal of such an approach is to allow the pairing mechanism to be divided into two different parts:

1) Identifying the potentially associable “units” in the two corpora.

2) Calculating the probability of the suggested units by testing them against the bilingual corpus data.
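
A sketch of these two phases on word-level units, with invented aligned pairs: candidates are all source/target word pairs that co-occur in aligned sentences, and each pair is then scored by its relative co-occurrence frequency in the corpus.

    from collections import Counter

    aligned = [
        ("the rise in unemployment worries officials",
         "izdiyad al-bitala yuqliq al-mas'ulin"),
        ("the rise in prices worries households",
         "izdiyad al-as'ar yuqliq al-usar"),
    ]

    # Phase 1: identify potentially associable units (here, co-occurring word pairs).
    cooc = Counter((s, t) for src, tgt in aligned
                   for s in src.split() for t in tgt.split())
    src_freq = Counter(s for src, _ in aligned for s in src.split())

    # Phase 2: score each suggested pair against the bilingual corpus data.
    def pair_score(s, t):
        return cooc[(s, t)] / src_freq[s]

    print(pair_score("rise", "izdiyad"))   # 1.0: the two words always co-occur
    print(pair_score("rise", "al-usar"))   # 0.5: they co-occur in half the cases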

By dividing the procedure into two phases, relatively simple translation models can be put in place to identify units, making it possible to correlate the theoretical analysis with the actual translations observed in the corpus.

One of the possible ways of devising operational systems is to develop analysis methods based on the data stored in training corpora. But such methods, based on model training, depend on the amount of a priori available information.

In this respect, a distinction can be made between two types of situations:

Situation 1: A parallel corpus of analyzed and annotated translation units is available a priori, i.e. a corpus for which a syntactic scheme representing the structure of a unit has been selected for each unit, given its meaning.

This first situation, in which a significant amount of information is available to evaluate parameters of the equivalence model, will be referred to as a training situation and will or will not be used, depending on its occurrence frequency in the annotated corpus.

Situation 2: Relatively little information is available, meaning it is a raw corpus. In this case, hypotheses should be made on the basis of iterative re-estimation of corpus data. For example, all units starting with the sequence “except that” will be grouped in order to compare their translations.
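
A sketch of the grouping step for this raw-corpus situation, using a simple prefix filter over aligned units (the pairs are invented):

    aligned_units = [
        ("Except that the report was never published.", "ghayra anna al-taqrir lam yunshar"),
        ("Except that prices kept rising.", "ghayra anna al-as'ar wasalat irtifa'aha"),
        ("Rains are expected in the North.", "yutawaqqa'u an tumtira fi al-shamal"),
    ]

    def group_by_prefix(units, prefix):
        """Group aligned units whose source side starts with a given sequence,
        so that their translations can be compared side by side."""
        return [(src, tgt) for src, tgt in units
                if src.lower().startswith(prefix.lower())]

    for src, tgt in group_by_prefix(aligned_units, "except that"):
        print(src, "->", tgt)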

It is worth noting in this respect that one of the advantages of the statistical model, compared to more theoretical approaches in contrastive linguistics, is that it considerably reduces the number of possible approximate translations while evaluating the quality of the available corpora.

Hence, an examination of translation possibilities available in our corpus leads to the following observations concerning the nature of equivalences:

- cases of strong equivalence in which the number of words, their order and their meanings in the (bilingual) dictionary are the same.

Example:

P1: “The rise in unemployment in March worries officials”.

T1: “izdiyâd al-bitâla fî mâris yuqliq al-mas’ûlîn”

Literally: “(The) rise (in) the unemployment in March worries the officials”.

- cases of approximate equivalence in which the number of words and their meanings are the same but their order is different.

Example:

P1: “The President of the Republic received his Syrian counterpart”

T1: “istaqbala ra’îs al-jumhûriyya nazîrahu al-sûrî”

Literally: “received (the) President of the Republic his counterpart Syrian”.

- cases of weak equivalence in which the order and number of words are different but their meanings in the dictionary are the same.

Example:

P1: “Rains are expected in the North of the country”.

T1: “yutawaqqa‘u an tumtira fi al-shamâl”

Literally: “It is expected that it rains in the North”

In our bilingual corpus, this last case accounts for the majority of translation equivalences.
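
These three cases can be operationalized, very roughly, as tests on word count, word order and dictionary meaning. The sketch below uses a toy bilingual glossary; real data would of course require morphological normalization first.

    def classify_equivalence(src_words, tgt_words, glossary):
        """Classify a sentence pair as strong, approximate or weak equivalence.

        glossary maps each source word to its dictionary equivalent in the
        target language (toy resource, for illustration only).
        """
        expected = [glossary.get(w) for w in src_words if w in glossary]
        shared = [w for w in tgt_words if w in expected]
        if not shared:
            return "no dictionary correspondence"
        same_count = len(src_words) == len(tgt_words)
        same_order = expected == tgt_words
        if same_count and same_order:
            return "strong"       # same number, order and dictionary meanings
        if same_count:
            return "approximate"  # same number and meanings, different order
        return "weak"             # same meanings, different number and order

    glossary = {"president": "ra'is", "received": "istaqbala", "counterpart": "nazirahu"}
    print(classify_equivalence(["president", "received", "counterpart"],
                               ["istaqbala", "ra'is", "nazirahu"], glossary))
    # approximate: same words and dictionary meanings, but different order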

A decreasing alignment of the bilingual corpus is used to ensure the greatest possible reliability for the searching operation, from the largest translation units (chapters and paragraphs) to the smallest ones (sentences followed by phrases and words). Thus, the field of analysis is tightened by performing a “shrinking” alignment of the corpus units and by focusing the search on gradually smaller units.

Conclusion

From a methodological point of view, combining a linguistic approach with a stylistic approach makes it possible to fine-tune alignment and enhance translation criticism.

However, some aspects deserve particular attention in order to ensure training efficiency.

On the one hand, the type of data used, i.e. the bilingual parallel texts, may pose a problem if the quality of the corpus is poor or if its translation quality has not been subject to strict control.

On the other hand, the sharpness of criticism and the precision of extracted information concerning translation depend on the volume of available training data.

For all the aforementioned reasons, there will be a need for a long training period with a great amount of diverse textual data. Once this stage is completed, the mechanisms observed by the trainee in the corpus can be reactivated to infer different kinds of already tested translation solutions.

Indicative Bibliography

Chanod, J.-P., 1993, « Problèmes de robustesse en analyse syntaxique », in Actes de la conférence « Informatique et langue naturelle ». IRIN, Université de Nantes.

Cori, M., Marandin, J.-M., 2001, « La linguistique au contact de l’informatique : de la construction des grammaires aux grammaires de construction », Histoire, Epistémologie, Langage, 23 (1), pp. 49-79.

Eco, U., 1992, Les Limites de l’interprétation, Paris, Grasset.

Gazdar, G., & Mellish Ch., 1989, Natural language processing in LISP, an introduction to computational linguistics, Addison-Wesley.

Guidère, M., 2002, Manuel de traduction français-arabe, Paris, Ellipses.

Guidère, M., 2000, Publicité et traduction, Paris, L’Harmattan.

Guidère, M., 2001, “Toward Corpus-Based Machine Translation for Standard Arabic”, in Translation Journal, n°1, vol. 6, http://accurapid.com/journal/19mt.htm

Kamp H., & Reyle U., 1993, From discourse to logic, introduction to model theoretic semantics of natural language, formal logic and discourse representation theory, Dordrecht, Boston : Kluwer Academic Publishers.

Lederer, M., 1994, La traduction aujourd’hui : le modèle interprétatif, Paris, Hachette.

Mounin G., 1978, Linguistique et traduction, Bruxelles, Mardaga.

Peirce, Ch.-S., 1978, Ecrits sur le signe, Paris, Editions du Seuil.

Seleskovitch, D., Lederer, M., 2001 (4è éd.), Interpréter pour traduire, Paris, Didier Erudition.

Sinclair, J.M., Payne, J., Perez Hernandez, C. (eds.), 1996, Corpus to Corpus : a Study of Translation Equivalence, IJCL 9.3.

Tognini-Bonelli, E. 2001, Corpus Linguistics at Work, Amsterdam / Philadelphia, John Benjamins Publishing.

Wichmann, A., Fligelstone, S., Knowles, G. (eds.), 1997, Teaching and Language Corpora, London / New York, Longman.

[1] Regarding the psycholinguistic and cognitive aspects, we subscribe to the position of Kamp and Reyle (1993, p.10-11): “the only access which the theorist seems to have to the language of thought is via the languages we speak. Looking into people’s heads [...] is an option that is simply not available.”

[2] To justify this separation on the formal level, we refer to Cori and Marandin (2001, p.61-63).

[3] The concept of “linguistic extension or portability” concerns the possibility of using the same calculation, either formal or algorithmic, to process different languages or different aspects of the same language. It is interesting to mention in this respect that one of the main arguments for justifying the transformational generative option is related to the non-portability of grammars out of context to deal with a range of constructions.

[4] Formalization consists of using a pool of explicit linguistic data to infer a set of logic formulae through a calculation formalism implemented by software in order to obtain inferred equivalences.

[5] To illustrate this, the best example in our field would be the use of ATN (Augmented Transitions Networks). In fact, writing an ATN is practically writing a program, because ATN is a procedural formalism which favours an exploration of the input from left to right and a strategy from top to bottom (see Gazdar and Mellish 1989, p. 96).

[6] In the classical rhetoric tradition, this “vouloir-dire” can be divided into three intentions: intentio auctoris, the author’s intention; intentio operis, the work intention; and intentio lectoris, the reader’s intention.

[7] “kull / jamî‘ al-mutarjimîn yuhibbûn al-lughât al-ajnabiyya ; Al-mutarjimûn jamî‘uhum yuhibbûn al-lughât al-ajnabiyya ; Yuhibbu jamî‘u al-mutarjimîn al-lughât al-ajnabiyya ; etc.”

[8] By “method” we mean a set of scientific procedures implemented to explain translations. We call “approach” a research oriented according to a specific point of view.
