Preliminary remarks
An important source of ambiguity in natural language is the polysemy of lexical elements, and this is certainly one of the most intricate problems of semantic description. It is easy to observe that many words in many sentences can be understood quite differently, but it is notoriously difficult to give a systematic account of this phenomenon in a description of a language. Bloomfield was so aware of the problem that he despaired of any satisfactory treatment of semantics. I will restrict myself in this paper to some preliminary observations on the polysemy problem, in connection with the description of ambiguity.
Let me point out at the beginning that ‘polysemous’ and ‘ambiguous’ will not be regarded here as equivalent notions. In the field of lexical description also, terms like ‘polysemous’, ‘ambiguous’, and ‘having more than one interpretation’ are often used as if they were interchangeable. Thereby they become cover terms that indicate linguistically quite different kinds of ambiguity. However, as I have stated in the previous section, their origin and nature touch upon a number of central issues that must be addressed by any theory of semantics. It is, in any case, the task of a linguistic description to make the differences between kinds of ambiguity more explicit.
To achieve this goal, it is necessary first of all to maintain a basic distinction between the content of a sentence and its interpretation. By the content of a sentence I will understand: the inherent semantic structure of a sentence as a type, such as it is specified in a linguistic description. By the interpretation of a sentence I will understand the various ways in which one and the same sentence can be understood in each unique case of language use, or, in Katz and Fodor’s terms, the different ‘readings of a sentence’.
As a corollary to this distinction, I will distinguish between (i) the inherent meaning of a lexical element, i.e. its full specification in the lexicon; (ii) the possible further specification of its inherent meaning in the context of a particular sentence; and (iii) the possible further specification in the interpretation of a sentence in language use. Some lexical elements that can be understood in more than one way will accordingly be represented in the lexicon with distinct entries corresponding to their various senses. But the mere fact that a lexical element can, on different occasions, be understood in more than one way is not in itself a sufficient reason to represent it as having more than one distinct ‘sense’. To cut the question short, I will henceforth use the term ‘inherently polysemous’ only to refer to lexical elements for which more than one entry is given in the lexicon.
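The terminological decision above can be made concrete with a small data structure. The following is only a minimal sketch, assuming a lexicon that maps each word form to a list of entries; the class names, the method names, and the sample entries for bank and molecule are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LexicalEntry:
    """One lexicon entry: a single inherent meaning of a lexical element."""
    form: str
    gloss: str  # informal paraphrase of the inherent meaning (illustrative)


@dataclass
class Lexicon:
    """Hypothetical lexicon: each word form maps to its list of entries."""
    entries: dict[str, list[LexicalEntry]] = field(default_factory=dict)

    def add(self, form: str, gloss: str) -> None:
        self.entries.setdefault(form, []).append(LexicalEntry(form, gloss))

    def is_inherently_polysemous(self, form: str) -> bool:
        # 'Inherently polysemous' = more than one entry in the lexicon.
        return len(self.entries.get(form, [])) > 1


# Sample entries are assumptions, not claims about any actual lexicon.
lexicon = Lexicon()
lexicon.add("bank", "financial institution")
lexicon.add("bank", "sloping land beside a river")
lexicon.add("molecule", "smallest unit of a chemical compound")

print(lexicon.is_inherently_polysemous("bank"))      # two entries
print(lexicon.is_inherently_polysemous("molecule"))  # one entry
```

On this sketch, a word that is merely understood differently on different occasions, without a second entry, is not counted as inherently polysemous; any further specification in a sentence or in language use would have to be modelled outside the lexicon.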
There are three types of lexical ambiguity: polysemy, homonymy, and categorial ambiguity. Any practical natural language understanding system must be able to disambiguate words with multiple meanings, and the method used to do this must necessarily work with the methods of semantic interpretation and knowledge representation used in the system.
Polysemous words are those whose several meanings are related to one another. e.g.
Ambiguity describes the linguistic phenomenon whereby expressions are potentially understood in two or more ways; an ambiguous expression has more than one interpretation in its context.
One of the most pervasive phenomena in natural language is that of ambiguity. This is a problem which confronts language learners and natural language processing systems alike; by the same token, it confronts linguists compiling a lexicon for a language. The notion of context enforcing a certain reading of a word (i.e. selecting for a particular word sense) is central both to global dictionary entry design (this is the question of breaking a word into word senses) and local composition of individual sense definitions. However, current dictionaries reflect a particular ‘static’ approach to dealing with this problem: the numbers of, and distinctions between, senses within an entry are ‘frozen’ into the lexicon at compile time; furthermore, definitions hardly make any provisions for the notion that boundaries between word senses may (and do, as we show below) shift with context.
All natural languages have two types of ambiguity: syntactic and semantic. The syntactic ambiguities affect the shape of parse trees and are therefore called structural ambiguities. There are four major kinds:
1. Multiple parts of speech for a single word;
2. Different parse trees for the same sentence;
3. Unresolved referents for pronouns and definite noun phrases;
4. Unclear scopes of quantifiers and negation.
Semantic ambiguities, also called lexical ambiguities since they depend on the meanings of words, have been largely neglected in formal theories. Their origin and nature, however, touch upon a number of central issues that must be addressed by any theory of semantics. There are two major kinds of lexical ambiguities:
1. Homonymy, where two or more historically distinct words happen to acquire the same pronunciation and often the same spelling as well;
2. Polysemy, where a word has a number of closely related meanings.
Examples of homonymy include page in a book vs. page as an attendant or ball as a rounded object vs. ball as a dance. Polysemy is a more common kind of lexical ambiguity where the differences between senses tend to be small, subtle, and hard to distinguish. One example of polysemy is the word support with its multiple meanings that were discussed earlier. Another example is the verb yield in the following sentences: Two molecules of H2 and one molecule of O2 yield two molecules of H2O. Vehicles approaching from the entrance ramp must yield to oncoming traffic.
What distinguishes homonymy from polysemy is a clear break in the range of meanings. For polysemous words, different dictionaries usually list different numbers of meanings, with each meaning blurring into the next. For homonyms, however, dictionaries usually agree upon the number of distinct groups of meanings. The word ball as a rounded object, for example, is derived from an Old English word with similar meaning; the word ball as a dance was borrowed from French in the 17th century. The page in a book comes from the Latin pagina, and the page as an attendant comes from the Italian paggio. Each of these homonyms has polysemous variants, but there are no intermediate meanings of ball or page that blur the distinction between the homonyms. As these examples illustrate, homonyms arise from distinct word forms that accidentally come together, either because of borrowing (as with ball) or because of sound changes that lose distinctive features (as with the merger of pagina and paggio to form page).
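The criterion of a clear break can be sketched as a grouping of senses by historical origin. This is an illustrative simplification, not a procedure proposed in the text: the function names are hypothetical, the origins for ball and page follow the etymologies given above, and the single shared origin assigned to the two senses of yield is a placeholder assumption.

```python
from collections import defaultdict

# (form, gloss, origin) triples. Origins for 'ball' and 'page' follow the
# text; the origin label for 'yield' is a placeholder assumption.
SENSES = [
    ("ball", "rounded object", "Old English"),
    ("ball", "dance", "French"),
    ("page", "leaf of a book", "Latin pagina"),
    ("page", "attendant", "Italian paggio"),
    ("yield", "produce as a result", "single origin (assumed)"),
    ("yield", "give way to", "single origin (assumed)"),
]


def group_by_origin(senses):
    """Return {form: {origin: [glosses]}} for the given sense triples."""
    groups = defaultdict(lambda: defaultdict(list))
    for form, gloss, origin in senses:
        groups[form][origin].append(gloss)
    return groups


def is_homonymous(form, groups):
    """A form is treated as homonymous if its senses have several origins."""
    return len(groups.get(form, {})) > 1


groups = group_by_origin(SENSES)
for form in ("ball", "page", "yield"):
    kind = "homonymy" if is_homonymous(form, groups) else "polysemy (at most)"
    print(form, "->", kind)
```

The sketch captures only the etymological side of the distinction; the blurring of one polysemous meaning into the next has no counterpart in it.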
Unlike homonyms, which result from linguistic processes of borrowing and sound change, polysemous variants result from the complexities of mapping language to the world. As an example of polysemy, consider the term oil well. Most dictionaries give only one meaning for the term, and most MT systems would have no difficulty in translating it to another language. Yet one oil company found a serious ambiguity in its definition. In their geological database, an oil well was defined as any hole in the ground drilled or dug for the purpose of obtaining oil, whether or not the hole proved to be dry. In their financial database, however, an oil well was defined as a pipe connected to one or more holes in the ground that produce oil. The financial database therefore ignored all the dry holes and omitted details about individual holes that were grouped with others in a single ‘oil well’. The discrepancy was unimportant as long as the two databases were kept separate. But when management wanted to correlate rock formations with production, they found that they could not merge the two databases.
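The mismatch between the two definitions can be illustrated with a small sketch. The record layout, the function names, and the four sample holes are all hypothetical, invented here to show why the two senses of oil well pick out different sets of objects; nothing below reflects the company's actual databases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hole:
    hole_id: str
    produces_oil: bool
    pipe_id: Optional[str]  # pipe the hole is connected to, if any


# Hypothetical data, invented for illustration only.
HOLES = [
    Hole("h1", True, "p1"),
    Hole("h2", True, "p1"),   # shares a pipe with h1
    Hole("h3", False, None),  # dry hole
    Hole("h4", True, "p2"),
]


def geological_wells(holes):
    """Geological sense: every hole drilled for oil, dry or not."""
    return {h.hole_id for h in holes}


def financial_wells(holes):
    """Financial sense: each pipe connected to one or more producing holes."""
    return {h.pipe_id for h in holes
            if h.produces_oil and h.pipe_id is not None}


geo = geological_wells(HOLES)
fin = financial_wells(HOLES)
print(len(geo), "wells in the geological sense")
print(len(fin), "wells in the financial sense")
```

Under these assumptions the two sets differ in size and share no identifiers, since one is keyed by hole and the other by pipe; a merge would need an explicit hole-to-pipe mapping, and the dry hole would still have no financial counterpart.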