The operating mechanism...might act upon other things besides numbers, were objects found whose mutual fundamental relations could be expressed by those of the abstract science of operations, and which should be also susceptible of adaptations to the action of the operating notation and mechanism of the engine (Lovelace 1961)
The more recent translation of these notions into practice enabled the development of high-level programming languages, which are oriented towards the needs of different categories of task and free the programmer to deal with a problem in the most appropriate manner, rather than having to adapt it to the way the computer functions; that adaptation is taken care of by interpreters. Consequently, a number of different programming styles have developed, and there has been a perceptible shift from procedural towards more declarative styles. The declarative representation of information has the decided advantage of enabling the same knowledge to be used in different ways. This separation of data and processes is very important for NLP: it is a way of decomposing the task and increasing its tractability. Linguistic information is often declarative, for example the definition of objects and the relationships which hold between them, whereas procedural information, that is, the manipulation of these objects, corresponds to the more computational aspects of the system. The division also aids in distinguishing and correcting different types of system error, that is, errors in the algorithm versus errors in the data.
Lisp and Prolog are two very popular programming languages for NLP. Prolog is a language in the declarative vein:
The Prolog approach is rather to describe known facts and relationships about a problem than to prescribe the sequence of steps taken by a computer to solve the problem. (Clocksin & Mellish 1989)
A programmer is able to specify the rules or grammar as data, and the procedural elements are left up to the computer, since Prolog has a built-in control structure.
Lisp is not purely declarative and has more procedural traits than Prolog; in Lisp, operations on symbols and structures can also be expressed directly. When facts and rules are represented as data structures, interpreters must be implemented to use that information for a particular task.
It is also important that programming languages used in NLP support recursion, since this operation is vital. For example, an adjectival phrase can contain an unbounded number of adjectives:
AdjP → Adj AdjP
AdjP → Adj
The big, yellow, round ball
Prolog and Lisp are both languages which place no restrictions on procedures calling themselves.
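The recursion in these rules can be sketched directly as a self-calling procedure. Python is used here purely for illustration (the Prolog or Lisp equivalent would be analogous); the function name and the small adjective list are invented for the sketch:

```python
# A minimal recursive recognizer for the rules AdjP -> Adj AdjP | Adj.
# The adjective set is an illustrative stand-in for a real lexicon.
ADJECTIVES = {"big", "yellow", "round"}

def parse_adjp(words):
    """Return True if words is a non-empty sequence of adjectives.

    The procedure calls itself, mirroring the recursion in the grammar:
    AdjP -> Adj AdjP  (recursive case)
    AdjP -> Adj       (base case)
    """
    if not words or words[0] not in ADJECTIVES:
        return False
    if len(words) == 1:           # AdjP -> Adj
        return True
    return parse_adjp(words[1:])  # AdjP -> Adj AdjP

print(parse_adjp(["big", "yellow", "round"]))  # True
print(parse_adjp(["big", "ball"]))             # False
```

Because the rule set is recursive, the same short procedure accepts adjective sequences of any length without further rules.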
Modularity concerns the overall design of a system. It makes sense to divide a problem recursively into subproblems, which are simpler to program, since they separate out different aspects of a system. This in turn makes it easier to localize and rectify algorithmic errors. Developing and extending the system is also facilitated, since a programmer can determine the effects that changes are going to have. Modularity also implies flexibility and reusability: a particular module may be utilized in differing ways by separate parts of the system. Similarly, modules may be recombined and built up to solve different but related problems, and new modules can be added to meet new requirements.
The components of an NLP system can be viewed as linguistically motivated modules. Linguistic motivation prevents the modularity of the system from being compromised merely because some other decomposition is computationally convenient. If these components operate in sequence then they should dovetail by means of interfaces, and they may need to appeal to one another for assistance, since in reality language is not discrete.
Further aspects of system design include the incorporation of debugging tools to enable faults to be corrected and maintenance to be carried out. A popular method, increasingly employed for developing systems, is rapid prototyping: a trial system is quickly assembled and tested, revisions suggested by the testing are implemented, and the process is repeated. Incremental design of this type prevents whole systems from being implemented before faults are discovered.
Considerations of real-time operation are of particular significance to interactive NLP systems, which appeal to the user for a response or assistance. Since prototypes are built at a reduced scale, it is necessary to calculate the effects on speed and complexity of scaling a system up to the size demanded by commercial applications.
The concepts of size and scale lead us on to consider the different knowledge-bases that a successful system needs recourse to. Every system needs some kind of lexicon containing semantic knowledge, and may also have knowledge-bases containing real-world knowledge. Large lexicons require large amounts of storage space, and a sensible idea (employed by the Metal MT system) is to divide the lexicon into parts pertaining to different subdomains of language, e.g. legal jargon, so that specialized modules can be loaded as required. Some method is also needed to maintain consistency in a system's knowledge-base.
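A subdomain division of this kind might be sketched as follows; the module names, lexical entries and the simple override policy are all invented for illustration:

```python
# Sketch of a lexicon split into subdomain modules loaded on demand.
# Module names and entries are invented for illustration.
SUBDOMAIN_LEXICONS = {
    "core":  {"court": ["noun"], "try": ["verb"]},
    "legal": {"court": ["noun:tribunal"], "tort": ["noun"]},
}

class Lexicon:
    def __init__(self):
        self.entries = {}
        self.loaded = []

    def load(self, subdomain):
        """Merge a subdomain module into the active lexicon.

        Later modules override earlier ones, which is one very simple
        way of keeping the active knowledge-base consistent.
        """
        self.entries.update(SUBDOMAIN_LEXICONS[subdomain])
        self.loaded.append(subdomain)

    def lookup(self, word):
        return self.entries.get(word)

lex = Lexicon()
lex.load("core")
lex.load("legal")           # specialized senses now shadow the core ones
print(lex.lookup("court"))  # ['noun:tribunal']
```

Only the modules a given text requires need occupy memory, and the loading order makes explicit which subdomain's entries take precedence.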
Parsing is an important aspect of natural language analysis. Before a computer can realistically do anything with a sentence, the sentence needs to be given structure. A parser uses a predefined grammar to impose structure on a flat string of words; if no structure can be inferred, the sentence is ungrammatical. The resulting structure can be viewed as a scaffold supporting semantic analysis, which would otherwise have to choose its own constituents, a costly process. The grammar is an abstract, declarative representation of the well-formed combinations of words, and the parser is an explicit set of instructions for matching these rules against input sentences. Although the two components are semi-independent, the choice of representation for the grammar is still important. The commonest mode of representation is a set of production rules, such as the one below:
S → NP VP
"a sentence consists of a noun phrase followed by a verb phrase"
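Such rules can be held as plain data, with no commitment to how they will be used; the following Python encoding (the rule set and helper function are invented for illustration) is one small sketch of the declarative representation discussed earlier:

```python
# A small grammar represented purely as data: each rule is a pair
# (left-hand side, right-hand side).  Nothing here prescribes how the
# rules are to be used; a parser (or a generator) interprets them.
GRAMMAR = [
    ("S",  ["NP", "VP"]),  # "a sentence consists of an NP followed by a VP"
    ("NP", ["Det", "N"]),
    ("VP", ["V", "NP"]),
]

def expansions(symbol):
    """Return all right-hand sides recorded for a non-terminal."""
    return [rhs for lhs, rhs in GRAMMAR if lhs == symbol]

print(expansions("S"))  # [['NP', 'VP']]
```

Because the rules are data, the same structure could equally drive analysis or generation, which is exactly the flexibility the declarative representation buys.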
Since efficiency is an important computational criterion, it seems logical to attempt to find the most efficient parsing technique. There are a number of different techniques, each with advantages and disadvantages. Top-down and bottom-up parsing represent two ways of exploiting the grammar. Top-down parsing starts with the start symbol S and applies the rules from left to right until the terminal symbols (the words of the sentence) are reached. Bottom-up parsing starts with the sentence and applies the rules from right to left until the start symbol is reached. In choosing between the two strategies the branching factor is a major consideration, since the higher the branching factor, the lower the efficiency; large grammars increase the downwards branching factor. Another factor is whether partial information can be used to rule out certain paths at an early stage. It is also possible to combine the two approaches in a bottom-up parser with top-down filtering.
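A top-down strategy of this kind can be sketched as a tiny recursive-descent recognizer; the grammar, lexicon and function name below are invented for illustration, and the alternatives for each non-terminal are simply tried in turn:

```python
# A minimal top-down recognizer: expansion starts from S and works
# left to right down to the words.  Grammar and lexicon are illustrative.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {"the": "Det", "dog": "N", "cat": "N", "sees": "V"}

def match(symbols, words):
    """Can this list of symbols expand to exactly this list of words?"""
    if not symbols:
        return not words                    # both exhausted: success
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:                     # non-terminal: try each rule
        return any(match(rhs + rest, words) for rhs in GRAMMAR[head])
    # terminal category: consume one word if its category matches
    return bool(words) and LEXICON.get(words[0]) == head and match(rest, words[1:])

print(match(["S"], "the dog sees the cat".split()))  # True
```

The downwards branching factor is visible here as the number of right-hand sides tried per non-terminal: each extra rule for VP multiplies the paths the recognizer may explore.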
Apart from the direction in which a search is conducted, the order in which the different paths are explored can also vary, and the benefits of each strategy need to be weighed against its costs. Perhaps the best-known methods for producing parses of a sentence are breadth-first and depth-first search. In breadth-first search all possible paths are followed in parallel. The major disadvantage is that many constituents are built which must ultimately be discarded, which can make it very inefficient computationally. Depth-first search follows one path at a time, recording the information that may be needed to backtrack and choose an alternative path if the present one fails to lead to a successful parse. Much effort is wasted in recording information which will never be used, and the same constituent may be reanalyzed over and over again. Some form of dependency-directed backtracking, which would leave constituents alone if they are independent of the failure and its cause, could remedy this, although the implementation of the parser would then become more complex.
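The difference between the two search orders can be reduced to how the agenda of unexplored paths is managed: a stack (last in, first out) yields depth-first search, while a queue (first in, first out) yields breadth-first. A minimal sketch over an invented search tree:

```python
from collections import deque

# Depth-first and breadth-first search differ only in agenda discipline:
# pop from the same end as you push (stack) for depth-first, or from the
# opposite end (queue) for breadth-first.  The tree here is illustrative.
TREE = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E"],
        "C": [], "D": [], "E": []}

def search(start, depth_first=True):
    agenda = deque([start])
    visited = []
    while agenda:
        node = agenda.pop() if depth_first else agenda.popleft()
        visited.append(node)
        children = TREE[node]
        # reversed so depth-first still explores children left to right
        agenda.extend(reversed(children) if depth_first else children)
    return visited

print(search("S", depth_first=True))   # ['S', 'A', 'C', 'D', 'B', 'E']
print(search("S", depth_first=False))  # ['S', 'A', 'B', 'C', 'D', 'E']
```

The breadth-first order visits every node at one level before descending, which is where the cost of building constituents in parallel comes from; the depth-first order commits to one path until it bottoms out.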
The inherent ambiguity of natural language often means that to find all possible parses of a sentence an exhaustive and thus expensive search needs to be made. This expense can be reduced by accepting the first successful parse and only producing a second one if the initial parse is subsequently found to be incorrect.
A tree, although the simplest data structure for recording a parse, is not the only one. A chart is a particularly useful data structure for recording successfully parsed constituents so that they can be reused on alternative search paths; this makes backtracking unnecessary. Chart parsers are extremely flexible and, because of the neutrality of the representation, can be used to implement a wide range of search strategies. The ATN-based parsers which charts superseded lacked this flexibility and failed to separate data from processes; their procedural nature prevents the same grammar from being used for both analysis and generation. Despite the present emphasis placed on differentiating declarative and procedural representations, there is actually no clear distinction: they are simply part of a spectrum whose declarative extreme implies more flexibility than the procedural one.
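One very reduced instance of the chart idea is a CKY-style recognizer, which records, for every span of the input, the constituents already found there, so that no constituent is ever reanalyzed; the grammar (restricted to binary rules for this sketch) and lexicon are invented for illustration:

```python
# The core idea of a chart: record, for each span of the input, the
# constituents already found there, so alternative analyses reuse them
# instead of rebuilding.  Grammar (binary rules only) and lexicon are
# illustrative.
GRAMMAR = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
LEXICON = {"the": "Det", "dog": "N", "cat": "N", "sees": "V"}

def cky_parse(words):
    n = len(words)
    # chart[(i, j)] = set of categories covering words[i:j]
    chart = {(i, i + 1): {LEXICON[w]} for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            chart[(i, j)] = set()
            for k in range(i + 1, j):          # every split point
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        parent = GRAMMAR.get((left, right))
                        if parent:
                            chart[(i, j)].add(parent)
    return chart

chart = cky_parse("the dog sees the cat".split())
print("S" in chart[(0, 5)])  # True
```

Each span is filled exactly once, whatever the search order, which is why the chart removes the repeated reanalysis that plagues naive backtracking.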
As is evident from the above discussion, approaches to NLP are influenced and constrained by computational methods and architectural considerations. The history of machine translation highlights the dependence of NLP on the state of the art in computer science. That an NLP system is good in computational terms does not mean that it will satisfy other criteria, such as psychological plausibility. However, a new set of changes is at present taking place in computer science: increasing interest in parallel distributed processing may mean that these different criteria will come to be satisfied simultaneously.
Bibliography
Bates, M. & Weischedel, R. 1993. Challenges in Natural Language Processing. Cambridge University Press.
Gazdar, G. & Mellish, C. 1989. Natural Language Processing: An Introduction to Computational Linguistics.
Hutchins, W. & Somers, H. 1992. An Introduction to Machine Translation. Academic Press.
Rich, E. & Knight, K. 1991. Artificial Intelligence.