Data Modelling & Data Structures.
Data Modelling & Data
Structures
Unit6 Element 6.1
By Nicky Wilson
GNVQ Advanced IT
Action Plan
4/5/95 -
9/5/95
Received assignment which is to cover element 3.1 all PCs. My initial task is
to Gather all relevant information on the basic data structures for storage and
retrieval. I will research through lecture notes and the books BTEC
Information Technology, BTEC in Computing, File structures theory and
practice, as well as to search through the CD ROM Groliers Encyclopaedia.
Take notes on any relevant Information
9/5/95 -
1/5/95
Research information on the way that Basic data structures are analysed for
different applications. Research through above books and CD ROM's and
take relevant notes.
1/5/95
3/5/95
Find out about logical and physical file organisation, with regards to PC3
Element 6.1 of the log book. Take notes on relevant Information.
3/5/95 -
5/5/95
6/5/95
8/5/95
Research information using methods as above with regards to how the
physical file organisation is analysed in relation to different media, PC4.
Make notes
Research information to cover PC 5, which needs me to explain location and
access methods. Use literature as above
9/5/95
Word process first draft, and take to tutor for first review
After outcome of first review take tutors advice accordingly.
20/5/95
Check work to see if any important facts have been omitted, ask Tutor for a
second review. After outcome of second review finalise any missing facts.
Word process final draft, check the work for mistakes and hand in finished
report for 1/6/95
Nicky Wilson
GNVQ Advanced IT
Investigate data Structure for storage and Retrieval
Element 6.1
Introduction
The report will analyse basic data structures for different applications and physical file organisation in relation to
different media. The report will also explain basic data structure for storage and retrieval, logical and physical file
organisation and location and access methods.
A data structure is essentially a number of data items, also called elements or nodes with some relationship
linking them together. Each item consists of one or more named parts called fields occupying one or more
memory locations in the computer. For instance a list of numbers occupying consecutive memory locations in a
computer is a simple data structure.
Array: This is an ordering of the data elements so that the data is able to be extracted in a logical fashion, shown
below is a diagram showing an example of this
1 6 9 3
Dim x (3) Index value
7 4 4 1 Dim y (3) Index value
Dim x (3,3) Index value
9 2 6 7 Dim y (3,3,3) Index value
Stack: The stack is a data structure chacterized by the expression LIFO = Last in first out this means that most
recent item added to the stack is the first one which can be removed from the stack. A stack pointer is used to
keep track of the last item added to the stack, which is the current top of the stack. Stacks are frequently used for
data temporary storage. One common application of stacks is for storing return addresses (link values) for closed
routines.
TOP
SP
BOTTOM
A stack only has two operations PUSH: Add an item
POP: Remove the top item.
FULL & EMPTY: Stack pointer
It can define maximum values only one end used.
Queue: The data structure known as a queue has the same characteristics as the queues that we encounter in
everyday life. A queue in a data structure in which elements are added only at the rear of the list and removed
only from the front of the list. A queue structure is often given the name FIFO which stands for first In first out.
Data in what we call a queue is not moved along like people in a cinema queue, instead each datum stays in its
storage location until its turn comes, thereby reducing time spent in data movement. The use of pointers makes
this possible.
FRONT BACK JOIN HERE
Take items from the front, add items to the end.
List: Lists provide a flexible way of handling data items in order. Changes to the order can be achieved with
minimal data movement and little loss of storage space These can be ...
This is a preview of the whole essay
Data in what we call a queue is not moved along like people in a cinema queue, instead each datum stays in its
storage location until its turn comes, thereby reducing time spent in data movement. The use of pointers makes
this possible.
FRONT BACK JOIN HERE
Take items from the front, add items to the end.
List: Lists provide a flexible way of handling data items in order. Changes to the order can be achieved with
minimal data movement and little loss of storage space These can be ordered can contain N > 0 items, each data
is an element, 3, 4, 41, 62, 79, 8, 11 or FRED, JIM, ANDY, CHRIS, SID.
Tree: The tree structure is an Hierarchical structure, the term tree refers to a non linear data structure in which
nodes have two or more pointers to other nodes forming an hierarchical structure.
The top node is called the root node
The bottom node are called leaf (Terminal nodes) and the nodes are connected by branches. Shown below is an
example of a tree structure showing how a record in a employee file may have the structure shown below.
Works Number
Name County Sex Post Holidays
Status Nation`ty phone Street Town Age Service Dept Years Salary Entitmnt
Storage & Retrieval
For example in a banking organisation, the information that must be recorded could be information on a
customers checking or savings account, on loan applications, about employees of banking institutions etc. Due
to the four parts of information, each part is related to as a file, so the banking organisation must record the
information in four separate file shown below.
Checking Savings Loan Employee
Accounts Accounts Applications
File File File File
Records: Are a collection of related fields, an example to show this could be a record of an accounts file, which
contains four fields. Illustrated below is a diagram showing this.
ACCOUNT NAME ADDRESS BALANCE
9783 - 59 -812
JOE BLOGGS
BLOGGS AVENUE
000.89
Files: Logical is referred to as the external view of the file a logical file is nothing more than a collection of all
logical data.
Media Access: File storage media is of two main types, Serial access and direct access, below is a short
explanation of the two.
Serial Access media: This means that in order to access a particular record, it is necessary to read all records
which precede it in the relevant file. An example of this storage medium is in normal cassette tape. A difficulty
with this storage media is that there are no readily identifiable physical access areas on the medium which can be
addressed, it is non addressable. Thus to look for an individual record the software needs to examine each
record key field, in sequence from the start of the file until the required record is found.
Direct access media: This allows direct access to a particular record, for example floppy or Hard drives. They
have physical divisions which can be identified by computer software, as well as hardware, and can be
addressable so that particular locations can be referred to by name or code, to retrieve a record which is shared at
the location.
Basic data structures are analysed for different applications
Input / output Queuing and spooling
Computer and printer everytime you print work out in room 107 YCC you go into a queue, it stores the
information and prints it out in the order it went in. Queuing information uses first in first out.
If it was more advanced, for example you needed to have certain priorities for printing ( small files first) to make
the system more efficient you would need to use a list structure. Spooling is the other way round, putting things
together ready to go out. It would be possible to use a queue data structure.
Storage (tables, declarations, files, databases)
Table for example containing storage devices.
TABLE 0 1 2 3 4
0
1
2
3
Two dimensional, one column specifies, and one column specifies the row. Stored in a two dimensional array
structure.
Files are made up by a number of logical records.
Dimensional array Dimensional array
Field
Record
Record Record
Record
Record
Problem must contain the same type of information
Each box of array can only store the same type of Information
Retrieval: All structure storage and retrieval vary from structure to structure. It may use a tree, to extract
information from a tree the name given is traversing the tree or tree walking, for simplicity we will use binary trees.
The reason for this is that each node can only have two branches.
Left Subtree Node Right Subtree
A
B C
D E F G
Inorder Traversal: Traverse the left subtree, visit the node.
Traverse the right subtree. = DBEEAFCG
Preorder Traversal Start at Node A
Traverse the left Subtree. = ABDECFG
Post Order Traversal Traverse the left
Traverse the right
Return to the Root. (Node) = DEBFGCA.
Searching For searching list and array structures.
Compilation: is the process of translating a High level language into machine code (Basic, Pascal, FORTRAN)
There are 3 main steps
* Lexical
* Syntax Analysis Data structures is what we are interested in
* Code generation
Lexical analysis: This involves breaking the input to the compiler into chunks, also known as tokens
Syntax Analysis: This involves checking whether the input tokens form valid sentences when put together.
This process is known as parsing. The second process of syntax analysis involves determining the values of
arithmetic expressions.
Code Generation: The final stage of the compilation process, where the machine code is generated.
Methods of Syntax Analysis
Parse trees can be used to evaluate whether a statement has the correct syntax
< CAR REGISTRATION
< LETTER > < DIGIT > < DIGIT > < DIGIT > < LETTER > < LETTER > < LETTER > <
B 3 8 4 P H X
Evaluate Arithmetical expressions
Operands Operator
e.g. ( 3 + 5 ) * ( 9 - 7 )
It is far easier for the compiler to evaluate arithmetic expressions if they are presented in reverse (Polish fix)
notation.
Post fix notation is also known as Prefix notation because the operators precede the operands.
e.g. + 35 - 97 * Polish notation
( 3 + 5 ) * ( 9 - 7 )
*
+
3 5 9 7
Left hand side Polish Notation (Prefix Notation)
Reverse Polish Notation (Post fix notation) Right hand side 35 + 97 - *
Next data structure used is a Stack 35 + 97 - *
A stack is used to evaluate the reverse polish expression using the following rules
. Operands are placed on the stack
2. Operators are applied to the top two items on the stack and the result is placed on top of the stack.
look at each item in turn
3 3 Place on the stack
5 5 Place on the stack
3
+ 8
9 8 Operand goes on the stack
9
7 7 Operand goes on the stack
9
8
- 2 Operator applied to top item on stack
8
* 16 Apply to top two items on stack
8 * 2
Originally had ( 3 + 5 ) * ( 9 - 7 ) = 16
Logical file organisation is explained.
Logical file structure
Serial file: This is where the records are stored one after another with no regard to the order of the records.
Shown below is an example
Customer 27 Customer 6 Customer 33 Customer 49
Sequential access files These are the files where the records are stored one after another in a predetermined
order. This is usually around the key field, when files of data are created you need a means of access to a
particular record within those files. This is done by giving each record a key field by which the record can be
recognised or identified. Examples of key fields could be
* Customer number in a customer ledger record
* Stock code number in a stock record
* Employee clock number in a payroll record
Customer 10 Customer 26 Customer 34 Customer 47
Indexed sequential file: Records are stored in a sequence like sequential, the important difference is that an
index is provided to enable individual records to be located. Strictly speaking the records may not always be
stored in sequence but the index will always enable the sequence to be determined. Illustrated below is an
example of an indexed sequential file.
1 INDEX
2
3 1
.
. 10
.
10
11 20
12
.
.
.
20
21
22
Random access file structure This allows the ability to retrieve a record without having to read all the records
that appear before it in the file. it allows fast access to records it is ideally suited for Interactive systems
Physical file organisation is analysed to different media
Magnetic tape. Because of the physical characteristics of magnetic tape it is necessary when processing a file
that the tape unit starts to read the tape unit at the beginning of the tape. Magnetic tape is a low cost high
storage capacity device, its advantages are that it is very cheap. Files can be organised two ways serial and
sequentially. Shown below is a diagram showing how a file is arranged on tape both logically and physically.
Block or physical record
File I I
header R1 R2 R3 R4 B R5 R6 R7 B R9 R10 R11 R12 ...
label G G
Logical Records
Inter Block Gap
Magnetic Disk: Magnetic disk provides storage facilities far more flexible than magnetic tape. The surface of the
disk is divided into physical locations. It is a direct access medium. Magnetic disk supports the following file
organisation methods Serial, Sequential.
CD ROM Uses tracks to store the data on, the tracks are very close together . They have a mass storage
capacity, they can hold about 600Mb of information and are direct access medium. Latest CDs now allow you to
put information on and keep adding to it.
RAM Random access memory is Electrical memory, it is a temporary store for holding programs and data that has
either been put into the computer from either disk, typed at the keyboard or input from some other device. This
type of memory is called volatile memory that means that the contents of main memory can be destroyed, either
by been overwritten or when the machine is switched off. It is direct access and very fast access, it has a limited
capacity and is relatively expensive.
Location and access methods are explained
serial sequential order: The lowest value is at the top, and the highest at the bottom. You would start at the
beginning and work your way, the advantage of using this way is if for instance if you wanted to find number 29,
if by the time it gets to number 34 the value is not found, the search will be terminated immediately. If it wasn't
sequential you would have to go through the entire list.
4
13
26
34 If number 29 is not found by here, search will be stopped
97
102
Serial search: Using a serial search you would go through the files in each order, look through data items one at
a time, from the start of the data structure to the end. This can be a very inefficient type of search because all of
the data items must be examined unless the data is ordered. This is the only type of search that can be used with
unordered information.
Serial record search: This means that in order to identify and retrieve a particular record it is necessary to read
all the records which precede it in the relevant file, until the file you require is found
RECORD 1 RECORD 2 RECORD 3 RECORD 4
Evaluation
I am happy with the outcome of the assignment, I feel that I have covered the criteria and the range
that was required. The way I approached the assignment was as such, first of all I researched
Information from the books Information Technology by Roger Carter, BTEC Computer Studies,
Information Processing BTEC, A level BTEC and first degree computing.
The next process was to decide which way, was the best way to try and cover the PCs and ranges for
the unit were covered. Eventually I reached the conclusion that it would be easier for me to work
through the PCs in the order that they appear in the log book. Thus starting with PC1.
The other way I thought of approaching the assignment was to start by doing PC1 first but to try and
bring in other elements of the ranges in accordingly. The reason why I opted out of doing it this way
was because I thought that it would make it more difficult.
The way that I tried to checked the validity of the Information was by, trying to compare the
information that I had it with the different books and CD ROM's to see if it was correct. This way
proved hard. In my opinion it is hard to judge the validity of the information for this assignment,
because certain areas relating to this subject is hard to find a wide range of Information on.
I have not done the work as instructed on my action plan, I have had reviews by tutor earlier than
stated in my action plan, the reason for this is because I have other assignments that need completing.
If any criticism is to be applied to my work, I feel that I have not gone into depth with certain parts of
the assignment, but elaborated too much on other areas.
Bibliography
Books and CD ROM's Used
Computer Studies for BTEC (3rd Edition) Geoffrey Knott, Nick Waites, Paul Callaghan, John
Ellison. Business Education Publisher ltd. 1993
Information Processing for BTEC 2nd Edition Geoffrey Knott, Nick Waites,
Paul Callaghan, John Ellison. Business Education Publisher ltd. 1990.
A level, BTEC & first degree Computing by Nick Waites, Geoffrey Knott.
Business Education Publishers Limited 1992
Information Technology by Roger Carter, first published 1991, reprinted 1992.
Encarta encyclopaedia, Times, Guardian, Groliers Encyclopaedia (CD ROMS)
Lecture Notes.