Computational Linguistics - How Important Is Semantics? Compared to What?

Semantics has to do with meaning, the nature of which has been the subject of much philosophical debate: what is it exactly and how can it be represented? This essay is concerned with semantics from the perspective of Computational Linguistics, which is essentially concerned with building or attempting to build computational models of natural language. These Natural Language Processing or Understanding systems have a wide range of practical applications as well as providing insight into human understanding and perception.

The prominance of semantics in the title can be taken as suggesting that it is indeed highly relevant to Computational Linguistics. However it is also implied, quite correctly, that semantics is not the only relevant area of interest. Natural Language systems usually have a number of different components and briefly outlining them here will help to put semantics in context.

Phonetics is concerned with the analysis of spoken language. It is generally considered to be a specialized area of research and many centres for Computational Linguistics deal mainly with written language.

Morphology involves the analysis of the composition and (and also meaning) of individual words. Often at this stage that syntactic categories are assigned to words, since interpretation of affixes may depend on the category of a word. For example, drinks could be either a plural noun or a first person singular verb.

Syntactic analysis imposes structure on a flat string of words according to the grammatical categories of words. The resulting structure is referred to as a parse. Ambiguity is a major problem for parsing and for Natural Language Processing in general, since it leads to multiple parses of a single sentence many of which can be later rejected

Semantics is generally concerned with assigning meaning to the structures created by the syntactic parse. If no meaning can be assigned to the structure it is nonsensical and can be rejected; for example: "Colourless green ideas sleep furiously" (Chomsky 1957) is syntactically well-formed but semantically ill-formed.

Pragmatics can crudely be defined as an inferential process which relates a sentence to the context in which it occurs, in order to understand the conveyed meaning as opposed to the truth-conditional meaning. Both semantics and pragmatics need recourse to knowledge about the world or the domain being modelled. The difficulties that this requirement poses will be discussed below.

Although the processes introduced above are often seen as constituting modules, which implies that they are discrete and self-contained, this perhaps reflects the computational requirements of a system rather than reality, since the boundaries are often fuzzy.

In order to measure the importance of semantics in comparison to the other linguistic components of an NPL system we can look at what a system which stops short of incorporating semantic processes is capable of in comparison to one which does actually exploit semantics. Secondly to demonstrate the limitations of semantics we can briefly consider how pragmatics could extend the applications of a system.

Syntax and parsing is the most mature field of study in Computational Linguistics. Parsing uses a grammar with rules and a parser which matches rules to the sentence to infer structure into it. There are a number of different strategies: top-down and bottom-up, either of which can be combined with techniques for choosing between alternate paths. The resulting structure provides a basis for compositional semantic analysis. Not all systems carry out syntactic parsing. Direct semantic parsing is however computationally expensive since the semantic component has to choose its own constituents and a significant amount of inferencing is invoked. Furthermore it is not always possible to extract meaning without refering to grammatical facts.

(1) The sun orbits the earth

In (1) syntactic facts produce the correct interpretation in which the sun revolves around the planet earth despite this seeming semantically anomolous. Syntactic processing can deal with important linguistic generalisations about word order, number and case agreement

Lexical and structural ambiguity and the problems they cause are linked to the issue of knowledge representation. Lexical ambiguities arise when alternate meanings can be assigned to a word and structural ambiguities arise when there is more than one structure which can be assigned to a sentence.

(2) The results are represented on the table

...

This is a preview of the whole essay

(1) The sun orbits the earth

(2) The results are represented on the table

-piece of furniture

-way of presenting written results

(3) She made the dress with her new sewing machine

She made the dress with a low neck-line

NP VP

V NP PP

she made the dress with her new sewing machine

*with a low neck-line

NP VP

V NP

NP PP

she made the dress *with her new sewing machine

with a low neck-line

A syntactic processor would have no means of determining which lexicon entry for table to select or for ruling out the incorrect structure assignments for the second example. Some form of semantic knowledge is required.

Sublanguage systems (in MT) may represent an area of NLP application which functions adequately without incorporating semantic knowledge. Sublanguages characteristically have a well defined, restricted grammar and vocabulary. Consequently words only have the relevant (unambiguous) reading for the domain in question entered in the dictionary and the grammatical structures are more predictable and limited. This overcomes the problem of ambiguity. Sublanguages which are resticted enough are hard to find and texts rarely remain completely within the confines of the sublanguage. If multiple or incorrect parses do arise there is no way of choosing between or eliminating them.

This brings us on to a discussion of semantics which as already hinted is vital to enable disambiguation. Semantics provides a way of selecting among competing syntactic analyses of sentences, since it has access to knowledge about what makes sense and consequently also what does not. It generally achieves this by providing a mechanism for filtering out inappropriate parses. This mechanism is often implemented in the form of semantic features made use of by selectional restrictions. Semantic features can be associated with every sense of a word in the lexicon and in addition to placing conditions on the features of other lexical items with which the word can combine they typically specify aspects of the meaning.

(4) ball (spherical object / social event)

(i) she danced at the ball

(ii) I caught the ball

(iii) I held a ball (I organised a social event / I held a spherical object)

The social event interpretation of ball could be given the feature spatio-temporal and at could be marked in the lexicon as requiring a complement with the spatio-temporal feature. This would ensure the correct mapping between ball and its social event sense in the above example. A similar process could be used to ensure the correct sense of ball would be chosen in (ii). Sentence (iii) is problematic since, even using semantic features, two semantic interpretations are possible. This is not an unusual situation; sentences frequently contain a number of ambiguous lexical items leading to an explosion of different possible sentence interpretations. Semantic features only take into account a very limited context.

A further problem arises with features when the meaning of words is extended, that is when words are used metaphorically; a pervasive phenomena in laguage.

(5) My car drinks petrol

If the verb drink is specified in the lexicon as requiring an animate subject the sentence will be rejected as nonsensical, when in fact it is not. Wilks (1975a) introduced the idea of preference semantics, whereby dispreferred readings can be allowed when the preferred readings are not present However, metaphors can be shown to be systematic and preference semantics does not exploit this fact.

Semantic features are a rather crude technique for resolving ambiguity and have significant problems, such as determining the correct set of features in the first place and the need to introduce increasingly finer grained features. Furthermore, very new word may require a new feature to be added to the lexicon. As the number of features grows the lexicon becomes unwieldy. Some more elegant way of representing meaning or knowledge is required.

Semantics and Knowledge are inextricably linked. The amount of knowledge needed to understand even a simple children's story is emense and the problems of knowledge representation are of central concern to Computational Linguistics and Artificial Intelligence. No uniform semantics representation language has emerged and consequently there are different competing classes of representation. Each has advantages and disadvantages and useful areas of application.

First order logic and equivalent expressions have become very popular as a way to represent natural languge meanings. Logic oriented programming languages such as Prolog reflect or are a contributing cause of this and first-order logic can also be used by systems for database retrieval. Opponents to logisism argue that logic is too neutral to represent natural language adequately and that many natural language meanings cannot be captured in logic. Nowhere is this more evident than in the area of quantification, where even with the help of the lambda operator representations become horifically complex and counter intuitive. We will return to this point at the end of the section.

A central concept to this approach is that of compositionality, a brainchild of the philosopher Frege, whereby the meaning of the whole is equivalent to the sum of the meanings of the parts. Montague developed this idea and demonstrated its usefulness for natural language analysis. Although his ideas were not directly suitable for implementation they have provided inspiration (c.f. new style categorial grammars). In compositional approaches syntactic and semantic processing are separate, but every syntactic processing rule has a corresponding semantic one (not all formalisms adhere to this). Given a syntactically parsed sentence the meanings of individual words are looked up in the lexicon and then recursively recombined to form the meaning of the whole sentence. Quantified expressions cannot be accounted for naturally in strictly compositional systems. The scope of the quantifier in the example can be interpreted in different ways and is thus ambiguous.

(6) Every man loves a woman

$x ["x | man(x) (r) love(x, y)]

"x [$x | man(x) (r) love(x, y)]

To generate the different meanings compositionally from the parse we need two separate parses. This requires extra effort and there is no syntactic motivation for doing so. Alternatively a non-compositional algorithm can generate alternate scopes from a single parse. Adding extra bits and pieces onto a formalism does not, however, constitute a very elegant solution.

This compositional approach was assumed in the earlier discussion of ambiguity, where multiple parses were allowed and a semantic filter was applied to rule out nonsensical analyses. This seems to represent an inefficient approach. However it is also possible to apply a semantic interpretation to each sysntactic constituent as it is formed, i.e. incrementally. On the one hand this allows semantics to prune constituents that are syntactically valid, but make no sense. On the other hand this approach is expensive when syntactic processing builds consituents that will be rejected later as being syntactically unacceptable, regardless of their semantic acceptability. For example, garden path sentences.

(7) The horse raced past the barn fell down

Arguments exist for producing parses of complete sentences before semantic interpretation. For example, in quantification, where large constituents need to serve as the basis of semantic actions. Heuristics can be used to create an intermediate approach, although usually the extremes are used.

Network-based systems constitute a further class of approaches to semantic representation. Semantic nets which support simple property inheritance represent knowledge as a graph whose nodes represent concepts and links between nodes represent relationships between concepts.

Conceptual dependency, another network-based notation, allows syntactic and semantic knowledge to be combined in a single interpretation. Primitive actions, which theoretically should be able to be combined to represent any event in the world, have slots which are instantiated differently for each intance of an action.

Network-based systems have been absorbed by more structured representations such as frames (Minsky) and partitioned semantic networks. They were unable to account for anything more than superficial prototype phenomena and could not account for expectations invoked by familiar situations. In addition there is evidence that they are equivalent to propositional logic since they do not support quantification. Propositional logic is weaker than first-order logic.

Semantic grammars combine syntactic and semantic knowledge into a single set of rules. They are used directly to produce a semantically oriented parse. The advantages of such grammars are that thet reduce the number of processing stages; ambiguities do not arise in the same way as they do in syntactic parses; and syntactic issues that do not affect the semantics can be ignored. Serious problems arise by their failure to capture syntactic generalizations which increases the number of rules and consequently the cost of the parsing process.

As in the case of syntactic systems, those which also incorporate semantic knowledge operate best in restricted domains. The semantic knowledge needed to solve the limitations of syntactic processing, i.e. ambiguity, has come to pose a significant problem in its own right. How do we representenough of it? Knowledge is always in a state of flux and needs to be updated continuously. There has, however, been reasonable success with systems operating restrcted and relatively static domains which enable the meaning of words and of phrases to be restricted. Encoding the semantics of words and phrases for a particular application domain represents a significant cost and knowledge acquisition procedures to reduce this cost would have a great impact on the applicability of the technology.

Although semantics broadens the applications of a system semantic meaning is relatively context independent and does not look beyond the confines of single sentences. To understand texts and dialogues we need to enter into the rhelm of pragmatics, which makes use of linguistic and nonlinguistic contexts. To do this a large amount of knowledge is required (once again). In the case of linguistic context the meaning of a sentence may depend on preceding sentences, as well as potentially influencing the meaning of following sentences. An example of this is determining the reference of pronouns. Pronominal and other referring expressions have received a lot of attention and are consequently relatively well understood.

(8) (i) Mike had measles. Jane caught it.

(ii) Mike dropped the vase. Luckily Jane caught it.

In the sentences above in order to be able to determine the correct sense of caught we need to know what it refers to. There are different computational approaches to this problem. Heuristics based on recency are easy to implement, but a pronoun often refers to an item which is not in the previous sentence. This gives rise to the need for a more thoughtful analysis. Grosz maintains that discourse is tree shaped rather than linear and a reference may be back to an object which is chronolgically distant, but close in underlying discourse structure. This is strong argument for getting the theory behind the computational implementation correct.

Nonlinguistic contexts provide knowledge about the person who produced the utterance, the goals of the communication and various other things which we use to understand utterances.

(9) Did you see Bob?

(i) The speaker wants to know if the addressee visually perceived Bob

(ii) Bob was wearing purple flares. The speaker knows the addressee saw Bob.

The speaker is commenting on this fact.

The above sentence could have interpretation (i) or (ii) according to alternate contexts.

In a complete system each stage has a part to play. Semantics broadens the applications of a system beyond those made possible by syntax alone, but a broader context than that considered by semantics needs to be taken into account for any serious understanding to take place. Semantics does have a central role to play in NLP systems, but the attendant problems of knowledge representation mean that this role is somewhat inhibited. A further point is that the nonlinear interaction between the different linguistic components of a system needs to be given more attention. The revised framework which would hopefully result from such research would provide a better context within which to re-examine the relative importance of semantics.

Bibliography

Bates, M. & Weischedel, R. 1993. Challenges in Natural Language Processing, Cambridge University Press

Gazdar & Mellish 1989. Natural Language Processing: an Introduction to Computational Linguistics

Hutchins, W. & Somers, H. 1992. An Introduction to Machine Translation, Academic Press

Levinson, S. 1983. Pragmatics, Cambridge University Press

Rich, E. & Knight, K. 1991. Artificial Intelligence

Whitelock, P., Wood, M., Somers, H., Johnson, R., Bennett, P. 1987. Linguistic Theory and Computer Applications, Academic Press