Faithfulness to the source language - fluency in the target language
The quality of the translated texts depends on the language pair involved: Spanish-French or Dutch-German is easier to handle than French-German. Because Google Translate relies on statistical matching rather than on a dictionary and grammar approach, oddities can occur: swapped terms, obvious errors, nonsensical sentences… Such a system requires abundant high-quality parallel corpora, and this data is notably expensive. "Solid base for a usable statistical machine translation system: bilingual text corpus of more than 1 million words + two monolingual corpora of each more than 1 billion words"
However, compared with the rule-based system Systran (introduced below), Google achieves better results than its competitor. For discontinuous segments, Google reaches a success rate of 44.4% (33% for Systran); for continuous segments, its success rate is 42.11%, compared to 33% for Systran.
Translating discontinuous segments turns out to be difficult because other elements can be inserted between the components of a collocation. Taking our example “to hold the presidency” = “assurer la présidence”, Google cannot propose a correct translation of the following sentence:
Le premier ministre Irlandais assure l’actuelle présidence tournante du Conseil européen.
= literally ‘the first minister Irish assures the actual rotative presidency of the European Council’
The Irish Prime Minister holds the current rotating presidency of the European Council.
Google proposes: The Irish Prime Minister assures the current rotative presidency of the European Council
The statistical system did not identify the collocation “assurer la présidence” in the source text. This may be because the collocation is absent from its bilingual corpora; as a result, it stuck to the French vocabulary and translated word for word.
Modifying the sentence a little bit leads to a different result:
Le premier ministre Irlandais assure la présidence tournante du Conseil Européen.
= literally “The first minister Irish assures the rotative presidency of the European Council”
The Irish Prime Minister holds the rotating presidency of the European Council.
When one of the adjectives placed between the base and the collocate is removed, Google manages to propose a correct output, because the distance between the verb and its object is reduced.
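The effect of distance can be made concrete with a small sketch. The Python snippet below is an illustration only: it assumes a naive tokenizer (splitting on non-word characters) and a hypothetical helper, gap_between, that counts the tokens separating the verb from its object in the two sentences above. It says nothing about how Google Translate works internally.

```python
import re


def tokenize(sentence):
    """Naive tokenizer (an assumption): lowercase, keep runs of word characters."""
    return re.findall(r"\w+", sentence.lower())


def gap_between(sentence, base, collocate):
    """Count the tokens between the first `base` and the next `collocate`.

    Returns None when the pair does not occur in that order.
    """
    tokens = tokenize(sentence)
    if base not in tokens:
        return None
    start = tokens.index(base)
    try:
        end = tokens.index(collocate, start + 1)
    except ValueError:
        return None
    return end - start - 1


original = ("Le premier ministre Irlandais assure l’actuelle présidence "
            "tournante du Conseil européen.")
shortened = ("Le premier ministre Irlandais assure la présidence "
             "tournante du Conseil Européen.")

print(gap_between(original, "assure", "présidence"))   # 2 ("l" and "actuelle")
print(gap_between(shortened, "assure", "présidence"))  # 1 ("la")
```

In this toy measure, removing the adjective halves the gap between the verb and its object, which matches the observation that the shorter sentence is translated correctly.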
SYSTRAN TRANSLATION TOOL
Unlike GOOGLE, the SYSTRAN MT system is a rule-based system driven by linguistic rules and subject-specific dictionaries.
The European Commission, the US Intelligence Community, global corporations and internet portals, among others, use it to translate documents from one of the source languages it supports into a target language. It offers over 35 language pairs and 20 vertical domains.
As presented earlier, GOOGLE shows better results than SYSTRAN in translating both discontinuous and continuous segments.
Within the continuous collocations and segments, there are some categories for which Systran’s results are not remarkably different from Google’s.
For the category Verb + noun, Systran’s error rate is 86.6% and Google’s is 80%.
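As a check on these figures, the percentages can be recomputed from the Verb + noun entries listed in the annexes. The counts used below (15 entries, of which 13 are not marked as correct for Systran and 12 for Google) are a tally of that annex, and the function name is hypothetical. The Systran figure comes out at 86.7% when rounded rather than truncated.

```python
def error_rate(errors, total):
    """Return the error rate as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * errors / total


# Counts tallied from the Verb + noun annex entries (an assumption of this sketch).
VERB_NOUN_TOTAL = 15
systran = error_rate(13, VERB_NOUN_TOTAL)
google = error_rate(12, VERB_NOUN_TOTAL)

print(f"Systran: {systran:.1f}%")  # Systran: 86.7%
print(f"Google: {google:.1f}%")    # Google: 80.0%
```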
The main errors, as for Google, are:
An inadequate translation, e.g. temps libre (spare time) is translated as “free time”.
The meaning is transmitted, but this is not the most natural way for native speakers to convey the idea.
A literal translation, e.g. café frappé (iced coffee) is mistakenly translated as “struck coffee”.
The system simply combined the meanings that each word has separately to propose a translation.
A refusal to translate, so that a different meaning is conveyed, e.g. chercher des noises (pick a quarrel with), for which only the verb was translated: the system did not recognize the word “noises” and left it in its French form. In English, that form has another meaning (a sound, something audible), while in French it means “problem”.
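The difference between a literal rendering and a collocation-aware one can be sketched with a toy lookup. The snippet below is not how Systran or Google work: it assumes a tiny hypothetical word lexicon and collocation table, and it ignores word order and grammar entirely. It only shows that, when the collocation is missing from the table, the output is the sum of the individual word meanings, and that an unknown word is left in its source form.

```python
# Hypothetical toy data, for illustration only.
# The empty string means "drop this word" (a crude way to skip the article).
WORD_LEXICON = {"café": "coffee", "frappé": "struck", "chercher": "seek", "des": ""}
COLLOCATIONS = {("café", "frappé"): "iced coffee"}


def translate(phrase, collocations):
    """Translate by longest collocation match, else word by word."""
    words = phrase.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest multi-word entry starting at position i first.
        for length in range(len(words) - i, 1, -1):
            key = tuple(words[i:i + length])
            if key in collocations:
                output.append(collocations[key])
                i += length
                break
        else:
            # No collocation found: fall back to the single word.
            # Unknown words are kept in their French form.
            output.append(WORD_LEXICON.get(words[i], words[i]))
            i += 1
    return " ".join(word for word in output if word)


print(translate("café frappé", COLLOCATIONS))  # iced coffee
print(translate("café frappé", {}))            # coffee struck
print(translate("chercher des noises", {}))    # seek noises
```

The second and third calls reproduce, in miniature, the literal translation and the untranslated word described above.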
CONCLUSION
In this article, we have shown how the rule-based system and the statistical one treat collocations. In translating continuous segments, Google’s statistical approach proved more efficient than Systran’s rule-based approach: out of 38 sentences, Systran failed to find a non-literal translation for 28, while Google managed to translate 16 correctly. Most of the difficulties arose because the expressions were colorful or metaphorical, or because the words used were polysemous. Google’s better outcomes result from the fact that it is corpus-based: the large amount of text it requires for its translations makes it likely to have encountered the collocations. Systran, with only a grammatical approach, is not well equipped to succeed.
The association of words in a collocation gives the segment a meaning that cannot be expressed with other words: if the systems do not have the expression, collocation or idiom registered in their database, the probability that they will provide a non-literal translation is very remote. The systems tend to stick to the source language in order to be able to convey the text. The meanings of the individual words are simply added together, and this is the major problem for MT systems.
Neither system provides adequate translations for all the discontinuous segments listed in the annexes. Out of 9 sentences, 4 contain a correctly interpreted segment (although other mistakes can be found in the sentence).
As with continuous segments, Google provides better results than Systran for discontinuous segments.
We can therefore conclude that the statistical approach is more efficient at translating idioms and collocations.
The main problems for machine translation software are:
the non-identification of collocations, which has dramatic consequences for the output text.
It is the main point on which attempts are being made “to devise accurate techniques for collocation extraction from corpora”
the frequency of appearance (one collocation per sentence on average)
the opacity of idioms and collocations (e.g. kick the bucket)
the possible discontinuity caused by the insertion of extraneous elements, which can change the order of the components
the polysemy of words
Research by Wehrli and Anastasiou showed that the performance of machine translation systems relies notably on the adjacency of words.
E. Wehrli stated that the insertion of three or more words between a verb and its object led to worse results for the three systems he worked with.
D. Anastasiou pointed out that both systems were unable to process discontinuous items and that the solution to this problem would be the addition of more entries to the idiom dictionaries and the enrichment of corpora with more “continuous and mainly discontinuous idioms, in order to set high standards to face the difficult task of automated idiom matching and translation”
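The role of adjacency can be illustrated with a window-limited matcher. This is a sketch under stated assumptions, not the method used by Wehrli, Anastasiou or either system: the collocation list and the max_gap parameter are hypothetical, tokens are matched on their surface form (no lemmatisation), and the example is the “rendre fou” sentence from the annexes.

```python
import re

# Hypothetical mini-lexicon of (base, collocate) surface forms.
COLLOCATIONS = [("rendre", "fou"), ("battu", "record")]


def find_collocations(sentence, max_gap):
    """Return the pairs whose collocate follows the base
    with at most `max_gap` tokens in between."""
    tokens = re.findall(r"\w+", sentence.lower())
    found = []
    for base, collocate in COLLOCATIONS:
        for i, token in enumerate(tokens):
            if token != base:
                continue
            # Window: up to max_gap intervening tokens, plus the collocate slot.
            window = tokens[i + 1:i + 2 + max_gap]
            if collocate in window:
                found.append((base, collocate))
                break
    return found


sentence = ("La solitude peut rendre n’importe quel être doté de "
            "sensations complètement fou")

print(find_collocations(sentence, max_gap=3))   # []
print(find_collocations(sentence, max_gap=30))  # [('rendre', 'fou')]
```

With a narrow window the pair is missed, because eight tokens separate “rendre” from “fou” under this tokenization. A window of 30 tokens, the upper bound reported by Goldman et al. in the references, recovers it, at the cost of more potential false matches in longer sentences.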
ANNEXES
CONTINUOUS COLLOCATIONS:
ADJECTIVE + NOUN/ADJECTIVE:
Petits boulots = Odd jobs
Systran: correct translation; Google: correct translation
Chinoiseries administratives = Red tape
Systran: administrative complications; Google: correct translation
Faux jeton = Hypocritical
Systran: false toke; Google: false token
NOUN + ADJECTIVE/NOUN:
Temps libre = Spare time
Systran: free time; Google: correct translation
Soupe populaire = Soup kitchen
Systran: correct translation; Google: correct translation
Odeur épouvantable = An appalling smell
Systran: terrible odor; Google: awful smell
Suspense insoutenable = Unbearable suspense
Systran: insupportable; Google: correct translation
Accueil chaleureux = Warm welcome/hearty welcome
Systran: correct translation; Google: correct translation
Ciel étoilé = Open sky
Systran: starry sky; Google: starry sky
Sourire radieux = Bright smile
Systran: to smile radiant; Google: radiant smile
Example: Tia Juana's bright smile lightens up the home.
Marrons chauds = Roast chestnuts
Systran: correct translation; Google: chestnuts
Example: Along with boiled or roasted chestnuts
Savant fou = A mad doctor/scientist
Systran: scientist gives; Google: correct translation
Café frappé = Iced coffee
Systran: struck coffee; Google: correct translation
Petit noir = Strong black coffee
Systran: small black; Google: small black
Voix suave = Suave voice/dulcet tones
Systran: correct translation; Google: sweet voice
Omelette baveuse = Moist/runny omelette
Systran: omelet slobbery; Google: correct translation
Soupe au lait = have a short fuse
Systran: soup with milk; Google: milk soup
Bâtonnets de poisson = Fish sticks
Systran: correct translation; Google: correct translation
Example: Fish sticks, French fries, potato chips are typical foods that contain trans fat.
VERB + NOUN:
Etre chocolat = be uncoath with something you expected
Systran: be chocolate; Google: be chocolate
Se lever du pied gauche = get out of the wrong side of bed
Systran: to rise left foot; Google: rise of the left foot
Donner un cours = teach a course
Systran: make a course; Google: lecture
Raconter des salades = Jive talk
Systran: to tell salads; Google: bullshit
Avoir le cerveau comme une passoire = To have a brain like a sieve
Systran: to have a braine like a strain; Google: to have a brain like a sieve
Passer son tour = Pass on one’s turn
Systran: pass it’s turn; Google: skip turn
Atteindre un compromis = Reach compromise
Systran: correct translation; Google: correct translation
Tenir la chandelle = play the gooseberry
Systran: hold the candle; Google: hold a candle
Tenir la jambe = to bend the ear of someone
Systran: hold the leg; Google: hold the leg
Avoir un violon d’Ingres = Have a hobby (horse)
Systran: having a hobby; Google: correct translation
Jeter les dés = Roll the dice
Systran: throw the dices; Google: throw the dice
Chercher des noises = Pick a quarrel with
Systran: to seek noises; Google: mess with
Rappeler quelque chose = Ring a bell
Systran: Does that point something out?; Google: Does it remember something?
Example: Does that ring a bell?
Prendre une pause = Take a break
Systran: take a pause; Google: correct translation
Epargner de l’argent = Save money
Systran: correct translation; Google: saving money
VERB + ADJECTIVE:
Être marron = Be stuck
Systran: be maroon; Google: be brown
Voyager léger = To travel light
Systran: correct translation; Google: correct translation
Example: Gareth travels lightly.
Faire une fausse couche = Have a miscarriage
Systran: to make miscarriage; Google: correct translation
OTHER:
Sur un petit nuage = On cloud nine
Systran: on a small cloud; Google: on a cloud
La prunelle de mes yeux = the apple of my eye
Systran: the pupil of my eyes; Google: correct translation
DISCONTINUOUS COLLOCATIONS
Rendre fou
= Drive mad
La solitude peut rendre n’importe quel être doté de sensations complètement fou
Literal translation :
The loneliness can make any being endowed with feelings completely insane.
Correct translation
Loneliness can drive any sentient being completely mad.
Systran proposal:
Loneliness can return any being equipped with feelings completely insane.
Google proposal:
Loneliness can be fitted to any sensations crazy
Battre un record
= Break a record
Les Italiens ont battu le record de longévité
Literal translation :
The Italians have beaten the record of longevity
Correct translation:
Italians broke the record for longevity
Systran and Google’s proposal (identical):
The Italians broke the record for longevity
Combler le fossé
= Bridge the gap
L’énorme challenge de combler le fossé.
Literal translation :
The enormous challenge of filling in the ditch
Correct translation:
The enormous challenge of bridging the gap
Systran proposal:
The enormous challenge to fill the ditch
Google’s proposal:
The enormous challenge of bridging the gap
Infliger des dégâts
= Wreak havoc
La famine ne peut continuer à infliger des dégâts si déplorables
Literal translation :
The famine can’t continue to inflict on damage so regrettable
Correct translation:
Famine can’t continue to wreak such deplorable havoc
Systran proposal:
The famine can’t continue to inflict so deplorable damage
Google proposal:
Famine can not continue to inflict appalling damage if
Conclure un accord
= conclude an agreement
Le conseil a conclu un accord de paix
Literal translation:
The council has concluded an agreement of peace
Correct translation :
The Council concluded a peace agreement
Systran proposes:
The Council concluded a peace agreement
Google proposes:
The board has concluded a peace agreement
Trouver l’équilibre/ un équilibre
= strike a balance
Les deux pays devraient trouver un juste équilibre
Literal translation :
The two countries should find a fair balance.
Correct translation :
Both countries should strike a fair balance.
Systran proposes:
The two countries should find a right balance
Google proposes:
The two countries should strike a Ø balance
Donner une chance
= give a chance
L’objectif était de donner à toutes les entreprises de la région les mêmes chances.
Literal translation :
The objective was to give to all the enterprises the same chances.
Correct translation:
The objective was to give all the undertakings in the regions the same chances.
Systran proposes:
The objective was to give to all the companies of the area the same chances.
Google proposes:
The objective was to provide all area businesses the same opportunities
Constituer une menace
= pose a threat
Cela constitue une dangereuse menace à la santé individuelle.
Literal translation :
That constitutes a high grade threat to individual health
Correct translation:
It poses a serious threat to individual health.
Systran proposes:
That constitutes a dangerous threat with individual health.
Google proposes:
This is a dangerous threat to individual health
Assurer la présidence
= hold the presidency
Le premier ministre Irlandais assure l’actuelle présidence tournante du Conseil européen.
Literal translation:
The first minister Irish assures the actual rotative presidency of the European Council.
Correct translation:
The Irish Prime Minister holds the current rotating presidency of the European Council.
Systran proposes:
Irish the Prime Minister takes the current rotating presidency of the European Council.
Google proposes:
The Irish Prime Minister assures the current rotative presidency of the European Council
NOTE:
Removing one word from the source segment (actuelle) enables Google to translate correctly:
The Irish Prime Minister holds the rotative presidency of the European Council.
REFERENCES
MT2, Lieve Macken, Why computer translation is hard
At least one collocation or idiom per sentence on average. Cf. Sinclair 1991; Howarth & Nesi 1996
Cf. Nakhimovsky and Leed 1979
Research by Goldman et al. (2001, 62) established that up to 30 words may separate the items of a collocation in a sentence (cf. Wehrli).
Och, Franz Josef (2005-09-12), "Statistical Machine Translation: Foundations and Recent Advances" (PDF), The Tenth Machine Translation Summit, Phuket, Thailand, retrieved 2010-12-19
Cf. Wehrli, Collocations in a Rule-Based MT System: A Case Study Evaluation of Their Translation Adequacy
Cf. Anastasiou, Identification of Idioms by Machine Translation