Compilers and Interpreters

Functions & Purposes of translators _ James Leong Mook Seng

There are two types of translator for converting a high-level language such as Pascal into machine code; compilers and interpreters. These relieve programmers of the tedium and complexity of low-level programming and hence make programming considerably easier and less error prone.

Compiler
This type of translator takes the whole source code and compiles it into object code. The object code (sometimes called machine code) can then be run. Pascal is an example of a programming language that uses compilation.

Interpreter
This type of translator takes the source code and translates the first line of the program, and then executes it. It then does the second line, and the third line, until it gets to the end of the code. BASIC, LISP, Prolog and APL are examples of programming languages that uses interpretation.

Compilers plus Interpreters.

Some programs written in languages such as JAVA are both compiled and interpreted! A program is firstly compiled into an 'intermediate code'. It is then distributed to users who use a wide range of computers such as Mac or PCs. These computers then run their own 'interpreter' to convert the compiled code into a code they can use. Languages such as JAVA are said to be platform-independent, because any program written in that language can run on any machine. These types of languages are ideal for use on the Internet, where you don't know anything about the machine that will be running your code.

Compilation and interpretation compared

Compilation is much faster than interpretation. Once the compilation process has been completed, the object code will run much faster than the same interpreted code. And the program only has to compile the code once.
Object code does not need the compiler to actually run it, only to convert it from source code to object code. Therefore, the object code produced by a compiler can be distributed to other computers without the compiler. One could write a program and then sell the object code on CD to anyone who wants to run the application.
Object code is difficult to 'read' so that it can be distributed without actually revealing the code. This helps protect intellectual rights.
Source code can be compiled in sections to produce object code in sections. You can't do this with interpreters. If you have a problem with memory space, therefore, compilation may be more useful.
If there is an error in interpretation, the program will run successfully up to the point an error occurs. Then it will stop. All of the variables are available for inspection at that point. Because of this, debugging is easier and quicker using interpretation compared to compilation.
Interpreters are far simpler to write than compilers. This is why JAVA works - because it is a relatively easy job to write an interpreter for a particular computer or application.

Compiler

A compiler takes a computer program that is written by the programmer (the source program) and translates it into machine code (the target program). This process comprises three main steps, as illustrated below

Lexical Analysis

The main purpose of the lexical analysis is to convert the source program into a series of tokens that are passed on to the syntax analyzer. The functions that the lexical analyser might perform include the following:

Removing white space. White space is regarded as all the program code that is superfluous to the meaning of the programs and includes comments, spaces, tabs and new-line characters
Identify the individual words, operators, etc (known as syntactic units) in the program
Create a symbol table. The symbol table will contain details of each symbol used in the program. A symbol might be a function name, a variable name, etc. the symbol table will be used by the later stages of the compiler. It will contain the symbol name plus information about the item, e.g an integer, a procedure, etc

Some languages allow the use of include files. These are files of code that can be added to the program before compilation starts. The lexical analyzer has to load these files.
Each reserved word, operator, etc will be converted into a token to be passed to the syntax analyzer.
Some simple error checking is performed. For example:

If the language imposes a limit on the length of identifiers or string constants, the lexical analyzer will determine if the identifier is too long
An illegal identifier such as 2ndyeazr will be flagged as an error
An attempt to assign an illegal value to a constant such as

Const Name = Hello World

Error: The words ‘Hello World’ should be enclosed in quotes

Once you have lexically analysed every line in a program, you end up with a very long string of tokens. This is passed to the syntax analysis stage.

Syntax analysis

The syntax analyzer attempts to make sense of the program. There are two parts to this process:

Syntax analysis
Semantic analysis

Syntax Analsis

It is the process of determining whether the sequence of input characters, symbols, items or tokens form a valid sentence in the language. The examination of ...

This is a preview of the whole essay

Const Name = Hello World

Error: The words ‘Hello World’ should be enclosed in quotes

Once you have lexically analysed every line in a program, you end up with a very long string of tokens. This is passed to the syntax analysis stage.

Syntax analysis

The syntax analyzer attempts to make sense of the program. There are two parts to this process:

Syntax analysis
Semantic analysis

Syntax Analsis

It is the process of determining whether the sequence of input characters, symbols, items or tokens form a valid sentence in the language. The examination of a stream of tokens and determining if it comprises a sentence in a given language is called parsing that stream. Parsing does not ascribe meaning to the stream of tokens but simply determines whether or not the sequence of tokens within the stream comprises a valid sentence.

Syntax error: a statement in the program violates a rule of the language, this could be a simple misspelling of a keyword, a missed punctuation mark, or a wrongly statement .

Examples of syntax error in a Pascal program:

Writteln(‘This statement contains 2 syntax error);

Misspelt keyword(writeln ) and missing quote-mark (after error, there must be 1 ‘)

Whil#e (I<1) do

Misspelt keyword while

It is the ordering of the words that allows us to ascribe meaning to sentences.Every language has a set of syntactic rules, that defines those orderings of the language that are valid. These syntactic rules are sometimes called the grammar of the language.

Syntax analysis does the following:

checks the program is grammatically correct.
The splitting of complex statements into simpler ones
Creation of a dictionary to hold details of variables (symbol table)

Symbol Table

The symbol table plays a central role in the compilation process. It will contain an entry for every keyword, identifier and assigned value in the program. It will also contain information about each of the entries, typically:

The kind of item(simple variable, structured variable such as array, procedure, keyword etc.)
The type of item(integer, real, char, etc)
The run-time addressi of the item or its value if it is a constant.
Other information such as the bounds of an array, information about parameters passed to a procedure.

It is possible to have a statement that is syntactically correct but has no meaning.

This table contains an entry for every keyword, variable, constant and operator (called identifiers)

Generating the Symbol Table

Example Program

Dim Radius As Single
Dim Area As Single
Const Pi = 3.1415926536 As Single

Input Radius

Area = Pi * Radius * Radius

Using this table, the program can be tokenised as the lexical string:
1 3 5 4 2 6 3 6 3

Note that the syntax analyser fills in the columns "Kind of Item" and "Data Type". The lexical analyser only adds the "Item Name" and "Run Time Value".

The most common way of organising the symbol table is to use a hash table. The identifier is hashed to a memory location.

Semantic Analysis

It is concerned with the meaning or interpretation of words in the context in which they are used.

For example A:= B may be a correct PASCAL statement, but it is not possible to assign B to A if A is an integer variable and B is a character variable. Semantic analysis checks that the statements have some correct meaning.

Examples:

Are the parameters and arguments types of a subprogram compatible?
Do the number of parameters and arguments match?
In an assignment statement, are the types of the LHS and RHS compatible?

In natual language (for example English) a sentence may also be syntactically correct but sematically meaningless. For example ‘The man eats an apple’ has meaning and obeys the rules of the language whereas ‘The apple eats the man’ obeys the rules of the language(the syntax) but not the semantics.

Syntactic correctness does not imply semantic correctness.

Code Generation

Source code is first passed through a lexical analysis program and then a syntax analysis program. It is then given to the code generation program to actually produce the object code.

A high level language is first LEXICALLY analysed. Then its SYNTAX is analysed. During syntax analysis, the SEMANTICS of the code is checked to see if it makes sense. If any errors are found in these stages the REPORT GENERATOR program springs into action and displays helpful (or not so helpful) error messages. The programmer uses these to aid debugging the program. This is known as translator diagnostics. When the program can be lexically and syntactically analysed without errors, the object code can be generated.

It may include an optimizer that takes the code and modifies is so that it executes more quickly and/or uses less memory.

The problem of examining a sequence of operations for inefficiencies and then transforming that sequence into a more efficient sequence is called the optimisation problem.

In optimisation of a compiled program, the goals are to reduce

the time necessary to execute the resulting executable (time efficiency)
the size of the resulting executable(space efficiency)
both of these

Disadvantages

(a) Compilation time is increased

(b) Unwanted results may be produced

Output of the compilation process:

A list of syntax errors
A machine code version of the object program, which may either be in memory, on the file store(Hard disk) or both.
Cross reference list of variables used

It is also possible to generate machine code that will execute on a different type of computer. This is known as cross-compilation. The result of code generation is a file of machine code, usually called an object file.

Another way of speeding up the re-compilation process is incremental compilation. The compiler maintains machine code for each unit and changes involve only modifications to the existing machine code rather than complete re-generation.

Interpreter

An interpreter executes each line of code as it comes to it in the source program. In order to do this it has the following parts:

Lexical analysis
Syntax analysis
Execution

The lexical analysis and the syntax analysis phases are similar to the compiler but the interpreter does not generate any code.

Advantages of a compiler

A compiled program will almost always run faster than an interpreted one, as no translation is taking place at the same time.
The object program(the machine code generated by the compiler) may be saved on disk and run whenever required without being re-compiled or requiring the user to have the compiler
Commercially produced software can be sold in the form of object code, thus preventing purchasers from listing the source code and making modifications to it.

Advantages of interpreter

Interpreters are very convenient for program development since making modifications does not mean that the whole program has to be recompiled which can take a considerable time for a large program. Many interpreters will allow a program to run up to the point where an error occurs, let the programmer fix the error and then continue from that point.
An interpreter is simpler to write and cheaper to buy.
An interpreter generally occupies much less memory than a compiler.

Cross-compilers and cross-assemblers

Cross-compilers and cross-assemblers are translators which enable a program to be compiled or assembled on one machine and executed on different type of machine. This is sometimes useful if, for example, program development is being done on a mainframe computer for a program which will eventually run on a microcomputer.

Assemblers

A program written in an assembly language is much more readable and understandable than its equivalent in machine code. The problem arises, however, that it is no longer directly executable by the computer. An assembler is a computer program which carries out the necessary translation.

The assembler accepts an assembly language program (the source code) as its data input, processes it and produces as output the required machine code program (the object code).

Different types of Assembler

Resident Assembler: This is an assembler that runs only on the target computer. It will only translate the mnemonics into the specific machine code for the processor resident in the particular computer.

Cross assembler: Unlike the resident assembler the cross assembler is able to produce the object code that will run on a different machine.

Two-pass assembler: When source code is to be assembled this type of assembler will first go through the source code producing the object code, but will ignore forward references such as jumps (i.e. it does not know where to jump to until it has got that far in the assembly process!). This is called the first pass. On the second pass the assembler will go through and resolve all the jump addresses to produce the final object code.

One-pass assembler: Unlike the two-pass assembler this one only needs to go through the assembly process once.

Meta assembler: This is an assembler that can deal with many different instruction sets.

Debugger

A debugger is often supplied with a compiler or interpreter. A debugger helps the programmer to find logical errors in the program. The compiler or interpreter will establish whether the program has broken any of the rules of the language but is cannot check whether the program is performing the correct task. A debugger can offer the following features:

Breakpoints-this stops the execution of the program at a predetermined point
Single step- to run the program one line at a time
Watches - to allow the programmer to inspect the contents of variables
Trace – provides a history of the statements executed immediately before the program failed
Store dump- provides details of the contents of the computer’s memory.

Linking

Programs that make up a system are normally compiled separately and each compilation generates an object file. In order to build a system it is necessary to combine several object files and a linkage editor program performs this task. The result is a machine code file that is often known as an executable file and contains all the required object files linked together.

It is also common practice to place regularly used programs in library files. A library is a file that contains a collection of object files. The linkage editor will manage these files and link them to other programs as necessary.

In order to link object files, the files have to be copied into memory. It is also necessary to copy an executable file into memory before it can be executed. When program code is copied into memory, it is said that the code is loaded into memory. The program that performs this task is called a loader. The loader is usually an integral part of the linkage editor.

Loader

A program is loaded into memory from a library by a program known as a loader. There are two types of loaders:

Absolute loader

It loads the program into a single fixed area of memory. All address references in the program are fixed at translation time (when the program is assembled or compiled) and it will only work properly when loaded into one specific position in main memory.

Relocating loader

It can load the program anywhere in main memory because the program has been translated in such a way that all addresses are relative to the start of the program. The start address of the program can be held in a special register called the base register.

There are two basic forms in which a relocatable object program can be prepared. For the first form, static relocation, once the object program has been loaded into main memory relocatability is lost and the process cannot be moved again. For the second form, dynamic relocation, relocatability is retained and a process may be moved to a different memory during its execution (essential in a multi-programming set-up where programs are constantly being swapped in and out of memory). This is made possible by not replacing any logical address references with physical addresses. The logical to physical mapping is done at run-time using base register addressing.

Typical applications of:

Machine code is rarely used for programming a computer, it is used for programming some computer-controlled devices, but even microwaves ovens or central heating systems would be programmed in assembly language.

Assembly language is used if a particular facility, needed by a program, is not available in a high-level language. It is also used if great speed is required, as might be needed for military equipment

Programs are compiled if speed of operation is important. Because the final code is compiled into machine code, this is a convenient way of selling your software, without giving your customers access to the actual high-level language code.

An interpreter is ideal in a learning and development environment, where instant feedback regarding errors is more important than speed or efficiency of execution of the program.

1. Explain why the size of the memory available is particularly relevant to the process of compilation. (4)

2. a) Explain the difference between the two translation techniques of interpretation and compilation. (2)

b) Give one advantage of the use of each of the two translation techniques. (2)

3. State the three stages of compilation and describe, briefly, the purpose of each. (6)

4. Explain, in detail, the stage of compilation known as lexical analysis. (6)

Answers

1 A. The computer must be able to simultaneously hold in memory:

-The compiler software/without which the process cannot be carried out.

-The source code/the code that needs to be compiled

-The object code/because the compilation process produces the program in machine code form.

-Working space/processing area to be used by the compiler during the process.

(4)

Notes: This question is trying to make the candidate think about the implications of the creation of a machine code version of the high-level language program. The significance of the size of the memory is not as significant as it used to be because the memory in a micro is now large enough for the problem not to arise in the main, but in the past the compilation of programs could cause trouble on a micro leading to the standard translation technique of interpretation.

2 A.a) -Interpretation involves the translation of an instruction and the execution of that instruction before the translation of the next.

-The compilation of a program involves creating the machine code version of the program before allowing any of it to be run. (2)

b) -Interpretation provides the programmer with better error diagnostics because the source code is always present and hence can be used to provide a reference whenever the error occurs.

-When a program is compiled no further translation is necessary no matter how many times the program is run, consequently there is nothing to slow down the execution of the program. (2)

Notes: The question is probably not in its best form as there are many responses that could justifiably be given to the difference between the two processes. A perfectly acceptable response here would be that interpretation does not create an object code while compilation does.

3 A. -Lexical analysis

-puts each statement into the form best suited to the syntax analyser.

-Syntax analysis

-Language statements are checked against the rules of the language.

-Code generation

-The machine code (object code) is produced. (6)

Notes: The number of marks for the question plays a big part in this answer. There are only six marks, three of which must be for stating the three stages. This means that there is only one mark each for saying what the purpose of each is. Do not launch into a long essay, you don’t have time in the constraints of the examination room, the examiner is simply looking for an outline description of what the stage does. Be careful about writing down all you know about each stage. There is a danger that the first thing you write down may be wrong. There is only one mark available and, if the answer is very long, the mark will be lost immediately. Also, don’t think that the marks can be carried across from another part. You may not know anything about code generation, but you do know a lot about lexical analysis – sorry, the marks cannot be transferred over in a question like this.

4 A. -Source program is used as the input

-Tokens are created from the individual characters and from…

-the special reserved words in the program.

-A token is a string of binary digits.

-Variable names are loaded into a look up table known as the symbol table

-Redundant characters (e.g. spaces) are removed

-Comments are removed

-Error diagnostics are issued. (6)

Notes: Compare this question with the last one. This one is asking for the details, so it becomes important to say as much as possible. The examiner may be a little more lenient about something in a student’s list that is wrong, but only a little! After all, the main thing about the stages of compilation is to know when each of them is appropriate, so don’t make too many errors.

Teacher Reviews

Here's what a teacher thought of this essay

Ivor Borkin

22nd of Apr 2012

***** A great piece of work! With questions and answers too!

Did you find this review helpful? Join our team of reviewers and help other students learn

Compilers and Interpreters

Compilation and interpretation compared

Compiler

Lexical Analysis

Syntax analysis

Syntax Analsis

This is a preview of the whole essay

Syntax analysis

Syntax Analsis

Examples of syntax error in a Pascal program:

Symbol Table

Generating the Symbol Table

Semantic Analysis

Code Generation

Interpreter

Advantages of a compiler

Advantages of interpreter

Cross-compilers and cross-assemblers

Different types of Assembler

Debugger

Linking

Teacher Reviews

Here's what a teacher thought of this essay

Ivor Borkin

Document Details

Related Essays

Hardware and Software Requirements For an IT system

140 Computing A-Level Revision Questions

What is Java?

Operating system.