
Rule-based vs. SMT: Idiomatic expressions and collocations



17 June 2011

Abstract

The complexity of languages is reinforced by elements such as collocations and idiomatic expressions. Since the creation of Machine Translation (MT) systems, collocations and idioms have complicated the translation of data, owing both to their syntax and to their omnipresence. In this paper, we describe how two commercial MT systems handle them and what results their methods produce. The approaches examined are the rule-based approach (Systran) and the statistical one (Google), for the language pair French-English. The results show that adjacency, and the insertion of foreign elements into segments, influence the quality of the output, as do colorful and metaphorical elements.

INTRODUCTION

E. Wehrli and D. Anastasiou pointed out that current Machine Translation systems, both rule-based and statistical, have difficulty tackling idioms and collocations. The key to a proper translation is to identify these elements and extract them from the source text so that a correct output can be proposed. When systems fail to do so, the target text is often too literal and compositional (the meanings of the individual terms are simply summed to form a sentence) and sounds unnatural. As a result, bilingual resources are highly needed. Section 2 briefly explains what collocations and idioms are and what problems they pose. Section 3 runs our set of idiomatic expressions and collocations through a rule-based MT tool and a statistical one. Section 4 presents the conclusions.

COLLOCATIONS AND IDIOMS

Collocations and idioms are subclasses of multiword expressions, in a given syntactic relation. ...


is mistakenly translated "struck coffee". The system simply combined the meanings each word has separately to propose a translation.

A refusal to translate, so that a different meaning is conveyed: e.g. chercher des noises (pick a quarrel with), for which only the verb was translated. The system did not recognize the word "noises" and left it in its French form, which has another meaning in English (a sound, something audible), whereas in French it means "problem".

CONCLUSION

In this article, it has been shown how the rule-based system and the statistical one treat collocations. It appeared that, when translating continuous segments, Google's statistical approach is more efficient than the Systran rule-based approach. In fact, out of 38 sentences, Systran failed to find a non-literal translation for 28 of them, while Google managed to translate 16 of them. Most of the difficulties arose because the expressions were colorful or metaphorical, or because the words used were polysemous. Google's better outcomes result from the fact that it is corpus-based: it may encounter collocations in the large amount of text it requires for its translations. Systran is not well equipped to succeed with a grammatical approach alone. The association of words in a collocation gives the segment a meaning that cannot be expressed with other words: if the systems do not have the expression, collocation or idiom registered in their database, the probability that they will provide a non-literal translation is very remote. The systems will tend to stick to the source language in order to convey the text. The meanings of the individual words will simply be added together, and this is the major problem for MT systems. ...
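The point about registered expressions can be illustrated with a minimal sketch. It is not how Systran or Google work internally: the two toy lexicons, the function name `translate` and the longest-match strategy are all assumptions made for illustration. It only shows that, without an entry for the whole expression, the output is the sum of the word meanings, and that an unknown word such as "noises" is left untranslated.

```python
# Toy lexicons, invented for illustration only (not data from either system).
WORD_LEXICON = {
    "chercher": "look for",
    "des": "some",
    "donner": "give",
    "une": "a",
    "chance": "chance",
}

# Registered multiword expressions mapped to an idiomatic translation.
EXPRESSION_TABLE = {
    ("chercher", "des", "noises"): "pick a quarrel",
    ("donner", "une", "chance"): "give a chance",
}


def translate(tokens, expressions):
    """Translate left to right, preferring the longest registered expression."""
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest contiguous span starting at position i first.
        for length in range(len(tokens) - i, 1, -1):
            span = tuple(tokens[i:i + length])
            if span in expressions:
                output.append(expressions[span])
                i += length
                break
        else:
            # No expression matched: fall back to the single word and
            # leave unknown words in their source form.
            output.append(WORD_LEXICON.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(output)


source = ["chercher", "des", "noises"]
literal = translate(source, {})                  # no expressions registered
idiomatic = translate(source, EXPRESSION_TABLE)  # expression registered
print(literal)
print(idiomatic)
```

With an empty expression table the sketch falls back to word-by-word output and keeps "noises" as is; with the expression registered it returns the idiomatic rendering.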


Correct translation: Both countries should strike a fair balance.
Systran proposes: The two countries should find a right balance
Google proposes: The two countries should strike a Ø balance

Donner une chance = give a chance
L’objectif était de donner à toutes les entreprises de la région les mêmes chances.
Literal translation: The objective was to give to all the enterprises the same chances.
Correct translation: The objective was to give all the undertakings in the regions the same chances.
Systran proposes: The objective was to give to all the companies of the area the same chances.
Google proposes: The objective was to provide all area businesses the same opportunities

Constituer une menace = pose a threat
Cela constitue une dangereuse menace à la santé individuelle.
Literal translation: That constitutes a high grade threat to individual health
Correct translation: It poses a serious threat to individual health.
Systran proposes: That constitutes a dangerous threat with individual health.
Google proposes: This is a dangerous threat to individual health

Assurer la présidence = hold the presidency
Le premier ministre irlandais assure l’actuelle présidence tournante du Conseil européen.
Literal translation: The first minister Irish assures the actual rotative presidency of the European Council.
Correct translation: The Irish Prime Minister holds the current rotating presidency of the European Council.
Systran proposes: Irish the Prime Minister takes the current rotating presidency of the European Council.
Google proposes: The Irish Prime Minister assures the current rotative presidency of the European Council

NOTE: Removing one word from the source segment (actuelle) enables Google to translate correctly: The Irish Prime Minister holds the rotative presidency of the European Council. ...
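The effect described in the NOTE, where an inserted word breaks the adjacency of a collocation, can be sketched as a matching problem. The function `find_collocation`, its `max_gap` parameter and the shortened token lists are hypothetical, and the matcher compares surface forms only (no lemmatization). The sentence without "actuelle" is a reconstruction assumed for the example. This is a sketch of gap-tolerant matching in general, not a description of either system.

```python
def find_collocation(tokens, verb, noun, max_gap):
    """Return (verb_index, noun_index) if `noun` follows `verb` with at most
    `max_gap` intervening tokens, otherwise None."""
    for i, token in enumerate(tokens):
        if token != verb:
            continue
        # The window holds the max_gap + 1 tokens that follow the verb.
        window = tokens[i + 1:i + 2 + max_gap]
        if noun in window:
            return i, i + 1 + window.index(noun)
    return None


# Shortened, pre-tokenized versions of the example sentence (assumed forms).
with_insert = "le premier ministre irlandais assure l' actuelle présidence tournante".split()
without_insert = "le premier ministre irlandais assure la présidence tournante".split()

# Allowing only the determiner between verb and noun misses the collocation
# as soon as "actuelle" is inserted.
strict_with = find_collocation(with_insert, "assure", "présidence", max_gap=1)
strict_without = find_collocation(without_insert, "assure", "présidence", max_gap=1)

# Tolerating one more intervening token recovers it.
relaxed_with = find_collocation(with_insert, "assure", "présidence", max_gap=2)
print(strict_with, strict_without, relaxed_with)
```

A larger `max_gap` recovers more interrupted collocations but also risks pairing a verb with an unrelated noun further along the sentence, so the value is a trade-off rather than a fixed setting.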


This student written piece of work is one of many that can be found in our University Degree Argumentative or Persuasive Essays section.


