Const Name = Hello World
Error: The words ‘Hello World’ should be enclosed in quotes
Once you have lexically analysed every line in a program, you end up with a very long string of tokens. This is passed to the syntax analysis stage.
Syntax analysis
The syntax analyzer attempts to make sense of the program. There are two parts to this process:
- Syntax analysis
- Semantic analysis
Syntax Analsis
It is the process of determining whether the sequence of input characters, symbols, items or tokens form a valid sentence in the language. The examination of a stream of tokens and determining if it comprises a sentence in a given language is called parsing that stream. Parsing does not ascribe meaning to the stream of tokens but simply determines whether or not the sequence of tokens within the stream comprises a valid sentence.
Syntax error: a statement in the program violates a rule of the language, this could be a simple misspelling of a keyword, a missed punctuation mark, or a wrongly statement .
Examples of syntax error in a Pascal program:
Writteln(‘This statement contains 2 syntax error);
- Misspelt keyword(writeln ) and missing quote-mark (after error, there must be 1 ‘)
Whil#e (I<1) do
It is the ordering of the words that allows us to ascribe meaning to sentences.Every language has a set of syntactic rules, that defines those orderings of the language that are valid. These syntactic rules are sometimes called the grammar of the language.
Syntax analysis does the following:
- checks the program is grammatically correct.
- The splitting of complex statements into simpler ones
- Creation of a dictionary to hold details of variables (symbol table)
Symbol Table
The symbol table plays a central role in the compilation process. It will contain an entry for every keyword, identifier and assigned value in the program. It will also contain information about each of the entries, typically:
-
The kind of item(simple variable, structured variable such as array, procedure, keyword etc.)
-
The type of item(integer, real, char, etc)
-
The run-time addressi of the item or its value if it is a constant.
- Other information such as the bounds of an array, information about parameters passed to a procedure.
It is possible to have a statement that is syntactically correct but has no meaning.
This table contains an entry for every keyword, variable, constant and operator (called identifiers)
Generating the Symbol Table
Example Program
Dim Radius As Single
Dim Area As Single
Const Pi = 3.1415926536 As Single
Input Radius
Area = Pi * Radius * Radius
Using this table, the program can be tokenised as the lexical string:
1 3 5 4 2 6 3 6 3
Note that the syntax analyser fills in the columns "Kind of Item" and "Data Type". The lexical analyser only adds the "Item Name" and "Run Time Value".
The most common way of organising the symbol table is to use a hash table. The identifier is hashed to a memory location.
Semantic Analysis
It is concerned with the meaning or interpretation of words in the context in which they are used.
For example A:= B may be a correct PASCAL statement, but it is not possible to assign B to A if A is an integer variable and B is a character variable. Semantic analysis checks that the statements have some correct meaning.
Examples:
- Are the parameters and arguments types of a subprogram compatible?
- Do the number of parameters and arguments match?
- In an assignment statement, are the types of the LHS and RHS compatible?
In natual language (for example English) a sentence may also be syntactically correct but sematically meaningless. For example ‘The man eats an apple’ has meaning and obeys the rules of the language whereas ‘The apple eats the man’ obeys the rules of the language(the syntax) but not the semantics.
Syntactic correctness does not imply semantic correctness.
Code Generation
Source code is first passed through a lexical analysis program and then a syntax analysis program. It is then given to the code generation program to actually produce the object code.
A high level language is first LEXICALLY analysed. Then its SYNTAX is analysed. During syntax analysis, the SEMANTICS of the code is checked to see if it makes sense. If any errors are found in these stages the REPORT GENERATOR program springs into action and displays helpful (or not so helpful) error messages. The programmer uses these to aid debugging the program. This is known as translator diagnostics. When the program can be lexically and syntactically analysed without errors, the object code can be generated.
It may include an optimizer that takes the code and modifies is so that it executes more quickly and/or uses less memory.
The problem of examining a sequence of operations for inefficiencies and then transforming that sequence into a more efficient sequence is called the optimisation problem.
In optimisation of a compiled program, the goals are to reduce
- the time necessary to execute the resulting executable (time efficiency)
- the size of the resulting executable(space efficiency)
- both of these
Disadvantages
(a) Compilation time is increased
(b) Unwanted results may be produced
Output of the compilation process:
- A list of syntax errors
- A machine code version of the object program, which may either be in memory, on the file store(Hard disk) or both.
- Cross reference list of variables used
It is also possible to generate machine code that will execute on a different type of computer. This is known as cross-compilation. The result of code generation is a file of machine code, usually called an object file.
Another way of speeding up the re-compilation process is incremental compilation. The compiler maintains machine code for each unit and changes involve only modifications to the existing machine code rather than complete re-generation.
Interpreter
An interpreter executes each line of code as it comes to it in the source program. In order to do this it has the following parts:
- Lexical analysis
- Syntax analysis
- Execution
The lexical analysis and the syntax analysis phases are similar to the compiler but the interpreter does not generate any code.
Advantages of a compiler
- A compiled program will almost always run faster than an interpreted one, as no translation is taking place at the same time.
- The object program(the machine code generated by the compiler) may be saved on disk and run whenever required without being re-compiled or requiring the user to have the compiler
- Commercially produced software can be sold in the form of object code, thus preventing purchasers from listing the source code and making modifications to it.
Advantages of interpreter
- Interpreters are very convenient for program development since making modifications does not mean that the whole program has to be recompiled which can take a considerable time for a large program. Many interpreters will allow a program to run up to the point where an error occurs, let the programmer fix the error and then continue from that point.
- An interpreter is simpler to write and cheaper to buy.
- An interpreter generally occupies much less memory than a compiler.
Cross-compilers and cross-assemblers
Cross-compilers and cross-assemblers are translators which enable a program to be compiled or assembled on one machine and executed on different type of machine. This is sometimes useful if, for example, program development is being done on a mainframe computer for a program which will eventually run on a microcomputer.
Assemblers
A program written in an assembly language is much more readable and understandable than its equivalent in machine code. The problem arises, however, that it is no longer directly executable by the computer. An assembler is a computer program which carries out the necessary translation.
The assembler accepts an assembly language program (the source code) as its data input, processes it and produces as output the required machine code program (the object code).
Different types of Assembler
-
Resident Assembler: This is an assembler that runs only on the target computer. It will only translate the mnemonics into the specific machine code for the processor resident in the particular computer.
-
Cross assembler: Unlike the resident assembler the cross assembler is able to produce the object code that will run on a different machine.
-
Two-pass assembler: When source code is to be assembled this type of assembler will first go through the source code producing the object code, but will ignore forward references such as jumps (i.e. it does not know where to jump to until it has got that far in the assembly process!). This is called the first pass. On the second pass the assembler will go through and resolve all the jump addresses to produce the final object code.
-
One-pass assembler: Unlike the two-pass assembler this one only needs to go through the assembly process once.
-
Meta assembler: This is an assembler that can deal with many different instruction sets.
Debugger
A debugger is often supplied with a compiler or interpreter. A debugger helps the programmer to find logical errors in the program. The compiler or interpreter will establish whether the program has broken any of the rules of the language but is cannot check whether the program is performing the correct task. A debugger can offer the following features:
- Breakpoints-this stops the execution of the program at a predetermined point
- Single step- to run the program one line at a time
- Watches - to allow the programmer to inspect the contents of variables
- Trace – provides a history of the statements executed immediately before the program failed
- Store dump- provides details of the contents of the computer’s memory.
Linking
Programs that make up a system are normally compiled separately and each compilation generates an object file. In order to build a system it is necessary to combine several object files and a linkage editor program performs this task. The result is a machine code file that is often known as an executable file and contains all the required object files linked together.
It is also common practice to place regularly used programs in library files. A library is a file that contains a collection of object files. The linkage editor will manage these files and link them to other programs as necessary.
In order to link object files, the files have to be copied into memory. It is also necessary to copy an executable file into memory before it can be executed. When program code is copied into memory, it is said that the code is loaded into memory. The program that performs this task is called a loader. The loader is usually an integral part of the linkage editor.
Loader
A program is loaded into memory from a library by a program known as a loader. There are two types of loaders:
- Absolute loader
It loads the program into a single fixed area of memory. All address references in the program are fixed at translation time (when the program is assembled or compiled) and it will only work properly when loaded into one specific position in main memory.
- Relocating loader
It can load the program anywhere in main memory because the program has been translated in such a way that all addresses are relative to the start of the program. The start address of the program can be held in a special register called the base register.
There are two basic forms in which a relocatable object program can be prepared. For the first form, static relocation, once the object program has been loaded into main memory relocatability is lost and the process cannot be moved again. For the second form, dynamic relocation, relocatability is retained and a process may be moved to a different memory during its execution (essential in a multi-programming set-up where programs are constantly being swapped in and out of memory). This is made possible by not replacing any logical address references with physical addresses. The logical to physical mapping is done at run-time using base register addressing.
Typical applications of:
Machine code is rarely used for programming a computer, it is used for programming some computer-controlled devices, but even microwaves ovens or central heating systems would be programmed in assembly language.
Assembly language is used if a particular facility, needed by a program, is not available in a high-level language. It is also used if great speed is required, as might be needed for military equipment
Programs are compiled if speed of operation is important. Because the final code is compiled into machine code, this is a convenient way of selling your software, without giving your customers access to the actual high-level language code.
An interpreter is ideal in a learning and development environment, where instant feedback regarding errors is more important than speed or efficiency of execution of the program.
1. Explain why the size of the memory available is particularly relevant to the process of compilation. (4)
2. a) Explain the difference between the two translation techniques of interpretation and compilation. (2)
b) Give one advantage of the use of each of the two translation techniques. (2)
3. State the three stages of compilation and describe, briefly, the purpose of each. (6)
4. Explain, in detail, the stage of compilation known as lexical analysis. (6)
Answers
1 A. The computer must be able to simultaneously hold in memory:
-The compiler software/without which the process cannot be carried out.
-The source code/the code that needs to be compiled
-The object code/because the compilation process produces the program in machine code form.
-Working space/processing area to be used by the compiler during the process.
(4)
Notes: This question is trying to make the candidate think about the implications of the creation of a machine code version of the high-level language program. The significance of the size of the memory is not as significant as it used to be because the memory in a micro is now large enough for the problem not to arise in the main, but in the past the compilation of programs could cause trouble on a micro leading to the standard translation technique of interpretation.
2 A.a) -Interpretation involves the translation of an instruction and the execution of that instruction before the translation of the next.
-The compilation of a program involves creating the machine code version of the program before allowing any of it to be run. (2)
b) -Interpretation provides the programmer with better error diagnostics because the source code is always present and hence can be used to provide a reference whenever the error occurs.
-When a program is compiled no further translation is necessary no matter how many times the program is run, consequently there is nothing to slow down the execution of the program. (2)
Notes: The question is probably not in its best form as there are many responses that could justifiably be given to the difference between the two processes. A perfectly acceptable response here would be that interpretation does not create an object code while compilation does.
3 A. -Lexical analysis
-puts each statement into the form best suited to the syntax analyser.
-Syntax analysis
-Language statements are checked against the rules of the language.
-Code generation
-The machine code (object code) is produced. (6)
Notes: The number of marks for the question plays a big part in this answer. There are only six marks, three of which must be for stating the three stages. This means that there is only one mark each for saying what the purpose of each is. Do not launch into a long essay, you don’t have time in the constraints of the examination room, the examiner is simply looking for an outline description of what the stage does. Be careful about writing down all you know about each stage. There is a danger that the first thing you write down may be wrong. There is only one mark available and, if the answer is very long, the mark will be lost immediately. Also, don’t think that the marks can be carried across from another part. You may not know anything about code generation, but you do know a lot about lexical analysis – sorry, the marks cannot be transferred over in a question like this.
4 A. -Source program is used as the input
-Tokens are created from the individual characters and from…
-the special reserved words in the program.
-A token is a string of binary digits.
-Variable names are loaded into a look up table known as the symbol table
-Redundant characters (e.g. spaces) are removed
-Comments are removed
-Error diagnostics are issued. (6)
Notes: Compare this question with the last one. This one is asking for the details, so it becomes important to say as much as possible. The examiner may be a little more lenient about something in a student’s list that is wrong, but only a little! After all, the main thing about the stages of compilation is to know when each of them is appropriate, so don’t make too many errors.