

Issues and Applications of Grid Computing

A thesis submitted in partial fulfillment of the requirements for the degree of
Bachelor of Science (Computer Science)



ABSTRACT

The intention of this study is to analyze and explore the emerging field of grid technology. It delves into how the grid is being used to enhance the capabilities of existing distributed systems and data resources. The characteristics of virtual organizations and their participation in implementing a grid structure are observed. The issues surfacing in grid implementation, and their possible solutions, are discussed. Enhancements and modifications are proposed for existing frameworks for database integration with the grid. A basic grid structure for the Department of Computer Science, University of Karachi, has been planned out. The Globus Toolkit, used as grid middleware, is tested and run on available resources.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS


LIST OF FIGURES

Figure 1.1: Virtual Organizations

Figure 2.2: The Compact Muon Solenoid Experiment

Figure 2.3: An I-WAY Point of Presence (I-POP) Machine

Figure 3.4: The Layered Grid Architecture

Figure 3.5: The Layered Grid Architecture with respect to Services and APIs

Figure 3.6: The Layered Grid Architecture and its Relationship to the Internet Protocol Architecture

Figure 3.7: The Core Elements of the Open Grid Services Architecture (shaded)

Figure 3.8: Services Involved in the Example

Figure 3.9: The Three Layered Semantic Grid Architecture

Figure 3.10: Comparison of Peer-to-Peer and Grid Computing Styles

Figure 3.11: Middleware Peer (MP) Groups of Services at the edge of the Grid

Figure 4.12: Authentication, Authorization through Proxy

Figure 5.13: A Virtual Database System on the Grid

Figure 5.14: Separate Interaction with Databases on the Grid

Figure 5.15: Flowchart for Query Processing on the Grid

Figure 6.16: Proposed Structure for the University of Karachi Grid

Figure 6.17: Issuing Certificate to Grid User

LIST OF TABLES

Table 2.1: US-CMS Grid Resources [TGB2004]

Table 5.2: Example Accounting Policy


Chapter 1

FUNDAMENTALS OF GRID COMPUTING

The notion of linking people, computers, sensors and data with networks is decades old. However, the grid concept has gradually evolved, and now, for the first time, there is a coherent description of the hardware, software and applications required to create a functioning and persistent grid. Grid computing will prove to be one of the most significant developments of this age. This chapter introduces the concept of grid computing and deals with some major misconceptions related to the grid.

1.1 INTRODUCTION TO GRID COMPUTING

In the simplest of terms, grid computing is distributed computing taken to the next level. Whereas distributed computing is implemented at a large scale throughout the world, grid computing has been around for just a few years and is still in its development stages.

The goal is to create the illusion of a simple yet large and powerful self-managing virtual computer out of a large collection of connected systems, which may vary in the number of resources they share.

The grid has been called the ‘next generation Internet’ [GCM2003, TAG2001]. The Internet came into being when communication was established between heterogeneous locations. This communication includes file sharing, access to web sites, video conferencing, etc. Grid computing takes this communication one step further, to the level of resource sharing between individual systems.

Grid computing differs from conventional distributed computing in its focus on large-scale resource sharing, greater processing and computation capabilities, and inventive applications that exploit massive parallelism.

1.2 HISTORY OF THE GRID

The origins of the grid can be linked to parallel computing. Research on parallel computing in the 1980s focused on the development of algorithms, programs and architectures that supported simultaneity. During the same time, researchers from multiple disciplines began to come together to attack problems in science and technology that required large-scale computational resources. The challenges posed by these multidisciplinary problems, and the geographically dispersed collaborations formed around them, provided the coordination and distribution experience essential for creating the grid.

There are three generations of grid defined in [GCM2003]. The early grid projects linked supercomputing sites and provided computational resources to high-performance applications. Two projects in the first generation were FAFNER [FAF] and I-WAY [OIW]. FAFNER stands for Factoring via Network-Enabled Recursion. Contributors downloaded and built a daemon, which acted as a web client that used the HTTP protocol to GET values from, and POST computation results back to, a CGI (Common Gateway Interface) script on the web server. FAFNER was capable of running on any workstation with more than 4 MB of memory. The Information Wide Area Year (I-WAY) was a year-long experimental effort to link many high-performance computers and advanced visualization environments. It connected a dozen ATM testbeds, seventeen supercomputer centers, five virtual reality research sites and over sixty application groups. The I-WAY was successfully demonstrated at Supercomputing ’95. Even though both projects lacked scalability, FAFNER was the forerunner of projects like SETI@home [SET], and I-WAY of Globus [TGP] and Legion.

Second-generation systems focus on middleware to support large-scale data and computation. Middleware is generally considered to be the layer of software sandwiched between the operating system and applications, providing a variety of services to those applications [GCM2003]. In a grid environment, middleware serves to mask the heterogeneous nature of the resources. Second-generation technologies include Globus [TGP] and Legion.

The second generation provided the interoperability that was required for large-scale computation. However, as other aspects of the grid were explored, it became apparent that it was desirable to reuse existing components and information resources. Third-generation projects focus on service-oriented architecture and metadata. There is also a strong emphasis on automation, with properties such as dynamic configuration, recovery and optimized use of resources.

1.3 FUNDAMENTAL CONCEPTS

1.3.1 RESOURCE SHARING

A grid is a collection of machines that act as nodes in a network. Each node contributes resources to the grid as a whole. These resources may be utilized according to the restrictions applied by their owners.

The different types of resources that may be linked by a grid include:

1.3.1.1 Computation

Computation is one of the principal uses of the grid. Shared processing power aims to provide users with shorter computation times. This was also a prime factor in the emergence of grid technology, as many scientific problems require extremely high processing speeds in order to operate on data generated thousands of times per second.

Exploiting the resources of the grid for greater computation power can be done in three ways:

  1. Executing the application on a faster machine with a larger memory.
  2. Splitting an application’s task among many different nodes so that the job is accomplished in time inversely proportional to the number of nodes.
  3. Running a process that needs to be executed many times on multiple machines simultaneously.
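The second approach above can be sketched in a few lines of Python. This is an illustration only, not code from the thesis: the worker threads merely stand in for grid nodes (on a real grid, each chunk would be dispatched to a different machine), and all function names are invented.

```python
# Sketch of splitting one task -- summing a large range of integers --
# into chunks handled by separate workers, which stand in for grid nodes.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    """One worker ("node") sums its own slice [lo, hi)."""
    lo, hi = bounds
    return sum(range(lo, hi))

def split_and_sum(n, workers=4):
    """Divide summing 0..n-1 into `workers` chunks, run them concurrently,
    and combine the partial results."""
    step, chunks = n // workers, []
    for i in range(workers):
        hi = (i + 1) * step if i < workers - 1 else n
        chunks.append((i * step, hi))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

assert split_and_sum(1_000_000) == sum(range(1_000_000))
```

With truly independent chunks and real nodes, the elapsed time falls roughly in inverse proportion to the number of workers, which is exactly the second case listed above.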

Scalability is a measure of how efficiently the processors on a grid are used. If twice as many processors let an application complete a task in half the time, the application is said to be perfectly scalable. However, there are limits to scalability, because applications cannot be split indefinitely: some computations may depend on others, and some tasks may not run in parallel because of certain restrictions.
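The scalability limit described above is commonly quantified by Amdahl's law (not cited in this thesis, but it makes the point concrete): if some fraction of an application must run sequentially, adding processors yields diminishing returns. A minimal Python sketch:

```python
# Amdahl's law: predicted speedup on n processors when a fraction
# `serial_fraction` of the work cannot be parallelized.
def amdahl_speedup(n_processors, serial_fraction):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# A perfectly scalable job (no serial part) doubles its speedup
# when the processor count doubles ...
assert amdahl_speedup(2, 0.0) == 2.0
# ... but with even 5% serial work, 1000 processors give less than 20x.
assert amdahl_speedup(1000, 0.05) < 20.0
```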

1.3.1.2 Storage

A grid can also be used for data storage. A data grid is one that provides an integrated view of data storage [GCM2003]. Each node provides some quantity of storage, whether permanent or temporary. Some nodes may dedicate part of their secondary storage for use by other machines, while others use only volatile memory to store data temporarily while performing tasks for another machine.

Applications may be designed to execute in parallel while accessing data on only one node. This is usually done when the local memory of a single node is not enough to hold all the data.

Another technique is to use a unifying file system: an individual file or database may be stored across many different devices yet present a uniform view to users.
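As a toy illustration of this unifying view (the class and node names are invented here, not part of any real grid file system), a logical file can hide which storage node holds each of its chunks:

```python
# A logical file whose chunks live on different storage nodes, but whose
# readers see a single uniform byte stream.
class VirtualFile:
    def __init__(self):
        # each entry: (node_name, chunk_bytes); node names are illustrative
        self.chunks = []

    def append_chunk(self, node, data):
        self.chunks.append((node, data))

    def read(self):
        """Present the uniform view: concatenate chunks in order,
        hiding which node stores which part."""
        return b"".join(data for _node, data in self.chunks)

f = VirtualFile()
f.append_chunk("storage-node-a", b"grid ")
f.append_chunk("storage-node-b", b"data")
assert f.read() == b"grid data"
```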

1.3.1.3 Communication

A grid is functional because of its network capabilities [GCM2003]. Bandwidth is a critical factor in determining the speed and efficiency of data communication, especially in data-intensive applications.

Communication includes data and message exchanges within the grid as well as outside it. The Internet and other LAN or WAN sites may be accessed by any node in the grid. Such external communication would also depend on the network design and pathways maintained in the grid.

1.3.1.4 Software Usage

Due to licensing costs or installation restrictions, it may be feasible to install a piece of software on only one machine, yet access it from different machines by creating several instances. This too is an important use of grid technology.

1.3.2 VIRTUAL ORGANIZATIONS

Figure 1.1: Virtual Organizations

As the semantics of grid computing were defined, the term Virtual Organization was coined in [TAG2001] to refer to the participants in a grid environment. These are usually collections of resources together with rules governing how those resources are shared. They can be distributed across the globe and heterogeneous (personal computers, servers, mainframes, supercomputers, etc.).

In such settings there need to be explicit controls over authentication, authorization, resource access, resource discovery and so on. These systems are governed by protocols that aim to coordinate the resources in an orderly manner.

The types of virtual organizations vary from scientific and technical organizations aiming to utilize as much of the computing resources as they can in order to gain results, to enterprises accessing huge amounts of data for purposes such as data mining.

1.3.3 PARALLEL PROCESSING

Parallel processing [TSP2002] is the execution of a job on multiple processors concurrently in order to reduce the total execution time. This simultaneous processing is the very basis of grid computing: rather than using only the resources available at one particular site, the grid utilizes computing assets at geographically dispersed locations and carries out the task in parallel on more than one processor.

However, the study of distributed and parallel computing differs from that of grid computing. Whereas distributed computing research generally focuses on the problems caused by geographical separation between resources, grid research focuses on the integration and management of software so as to enable the sharing of resources.

1.4 BASIC BLOCKS OF THE GRID

1.4.1 NETWORKS

Networks form the vital link by which the resources on the grid communicate. Typical issues [GCM2003] in a network environment with respect to grid computing include carrying capacity and reliability.

The capacity of a network is measured in terms of bandwidth. High-capacity networking increases the grid's ability to support both parallel and distributed applications. In the US, grids are built on high-performance national networks whose backbones achieve roughly 10 Gb/s.
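A back-of-the-envelope calculation (not from the thesis) shows why such backbone capacity matters for the petabyte-scale datasets discussed in later chapters: even at an ideal, uncontended 10 Gb/s, moving a single terabyte takes minutes.

```python
# Ideal transfer time over a gigabit-per-second link, ignoring protocol
# overhead, latency and congestion.
def transfer_time_seconds(data_bytes, link_gbps):
    bits = data_bytes * 8
    return bits / (link_gbps * 1e9)

one_terabyte = 1e12  # bytes
# 8e12 bits / 1e10 bits-per-second = 800 seconds, roughly 13 minutes
assert transfer_time_seconds(one_terabyte, 10) == 800.0
```

A petabyte at the same rate would take a thousand times longer, which is why data-intensive grid applications are so sensitive to bandwidth.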

Reliability concerns the likelihood that a link in the network will fail in some way. For grid applications the reliability of the network is critical: the large amounts of data traveling over its links cannot afford to be dropped or lost along the way.

1.4.2 COMPUTATIONAL NODES ON THE GRID

Grid applications lean towards using resources for computational needs. Thus nodes that are themselves high-performance parallel machines or clusters are of great interest to grid researchers. Clusters belong to the Multiple Instruction Multiple Data (MIMD) category of computer systems; they consist of whole computers, each with its own dedicated memory, interacting via some network facility. Other grid resources include data storage devices and even fax machines, printers, etc.

1.5 MISCONCEPTIONS ABOUT THE GRID

Some basic misconceptions about the grid, which are clarified in [TAG2001], are:

The Grid is an alternative to the Internet

The grid is not an alternative to the Internet; rather, it uses and builds upon Internet capabilities to provide services and protocols for problems involving huge data and heavy computation. The Web largely consists of clients talking to servers individually, whereas in the grid, clients and servers work together and interchangeably to solve a problem.

The Grid is a source of free cycles

The grid does not provide an unlimited supply of computing power; rather, restrictions will be in place on more or less all resources that are shared. Resource owners employ policies and accounting mechanisms to restrict the use of their resources accordingly.
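Such an accounting mechanism can be sketched as follows. The policy fields here are invented purely for illustration; real grid middleware expresses and enforces owner policies in far more elaborate ways.

```python
# A resource owner's per-user quota on shared CPU time: work that would
# exceed the quota is refused rather than granted "free cycles".
class UsagePolicy:
    def __init__(self, max_cpu_hours):
        self.max_cpu_hours = max_cpu_hours
        self.used = {}  # user -> cpu-hours consumed so far

    def charge(self, user, cpu_hours):
        """Record usage; refuse requests that would exceed the quota."""
        if self.used.get(user, 0.0) + cpu_hours > self.max_cpu_hours:
            return False  # denied by the owner's policy
        self.used[user] = self.used.get(user, 0.0) + cpu_hours
        return True

policy = UsagePolicy(max_cpu_hours=10.0)
assert policy.charge("alice", 6.0) is True
assert policy.charge("alice", 6.0) is False  # would exceed the 10-hour cap
```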

The Grid requires new computing models

Programming in a grid environment means working in a parallel domain, so problems will be encountered that do not arise on sequential computers. However, the programming models remain the same, and current programming contexts can be used for grid programming.

The Grid makes high-performance computers obsolete

Even though the grid makes it possible to access and harness the computing power of many resources, the need for high-performance computers will continue to grow as more data-intensive, processing-heavy problems arise.


Chapter 2

GRID APPLICATIONS

The feasibility of any new technology becomes evident through the applications that exploit it. Grid applications include those from science and industry, from academia and laboratories and from large corporations. They address problems ranging from multiplayer gaming, fault diagnosis, and astronomy to real-time analysis. This chapter gives an overview of the type of applications suited to grid computing, with special emphasis on CERN’s LHC grid and the I-WAY project.

2.1 LIFE SCIENCE APPLICATIONS

Computational biology, bioinformatics, genomics, computational neuroscience and other fields make up the life sciences. These areas are turning to grid computing for accessing, collecting and mining large amounts of data, and many scientific tools have been developed that make use of the grid's resources, including supercomputers and clusters.

Examples of grid projects related to the life sciences include the Protein Data Bank [PDB], the myGrid project [MGD], the Biomedical Information Research Network (BIRN) [BIR] and MCell [MGM]. An ‘in silico’ experiment is a procedure that uses computer-based information repositories and computational analysis to test a hypothesis, derive a summary, search for patterns or demonstrate a known fact. The myGrid project is developing middleware to support in silico experiments in biology.

The BIRN project, started in September 2001, links instruments and federated databases. It is developing the hardware, software and protocols necessary to share and mine data for both basic and clinical research. The architecture designed to accomplish this goal is built around a flexible, large-scale grid model in which resources are tightly integrated by grid middleware technologies, including the Globus Toolkit [TGP]. MCell is a collaboration between computational biologists and computer scientists to deploy large-scale Monte Carlo simulations using grid technologies.

2.2 ENGINEERING ORIENTED APPLICATIONS

Large-scale science and engineering applications can be executed more efficiently by the use of the grid, which makes possible the concept of concurrent engineering.

An example of the deployment of grid infrastructure in the engineering sciences is NASA's Information Power Grid (IPG) [NIP] in the United States. It aims to revolutionize the way NASA executes large-scale science and engineering problems, providing computing and data management services that, on demand, locate and schedule the multicenter resources needed to address large-scale or widely distributed problems.

The NEESgrid is a grid-based system that supports a broad range of activities for improving the performance of buildings and other structures subjected to the effects of earthquakes. NEESgrid integrates a range of earthquake engineering test apparatus into the grid infrastructure. The George E. Brown Network for Earthquake Engineering and Simulation (NEES) program was created in 1999 and has made major investments in earthquake engineering test facilities such as shake tables, reaction walls and wave tanks. All of these must be network accessible in order to support broad community access to these expensive instruments, so an infrastructure was needed to integrate test equipment, simulation, data repositories and collaboration tools. This led to NEESgrid, a deployment of grid technologies that builds on standard grid infrastructure, specifically the Globus Toolkit [TGP], augmenting it with specialized earthquake engineering tools and services [MKP].

2.3 PHYSICAL SCIENCE APPLICATIONS

CERN’s particle accelerator provides an example of grid computing utilization in the physical sciences. CERN [CLH] is the European Organization for Nuclear Research; its Large Hadron Collider [LHC] will produce huge amounts of data per second, which need to be analyzed. Various countries, including Pakistan, are playing a vital part in this experiment by devoting computing resources at various research centers to the analysis of this data.

Grid computing is also making its presence felt in astronomy; virtual observatories are one example of how the grid can be used in this field.

Data Intensive Applications

All the application areas mentioned above may also use the grid as a data-intensive application tool to collect, store and analyze data. The grid will thus be used not only for its storage capacity but also for gaining knowledge about the stored data through techniques such as data mining. An example of a data-oriented application is the Distributed Aircraft Maintenance Environment (DAME) [DAM], an industrial application being developed in the United Kingdom in which grid technology is used to handle the gigabytes of in-flight data gathered by aircraft and to integrate maintenance, manufacturer and analysis centers. It addresses performance issues such as large-scale data management with real-time demands.

Commercial Applications

The grid is not limited to scientific experiments; it is also being used for commercial purposes, including enterprise computing, storage-on-demand and information-on-demand. A generalization of this is the concept of Application Service Providers (ASPs). Grid technologies are also being used in innovative ways in areas such as inventory control, enterprise computing and gaming (examples include The Butterfly Grid [BGM] and the Everquest multiplayer gaming environment [EMG]).

The growing collaboration between scientific and commercial sectors in promoting the grid will provide mutual benefits. Not only will there be revolutionary scientific advances but also a new generation of successful commercial products.

2.4 THE COMPACT MUON SOLENOID EXPERIMENT

CMS (Compact Muon Solenoid) [CLH] is a high-energy physics detector planned for the Large Hadron Collider (LHC) [LHC] at CERN near Geneva, Switzerland. CMS is currently under construction and is expected to be completed in 2007, at which time it will begin to record data from the highest-energy proton-proton collisions ever produced, known as events. Data from these collisions will help in solving many fundamental scientific questions, such as the search for the Higgs particle and the origin of mass in the universe, and will help in recreating the environment present at the origin of the universe. The data will contain information from millions of elements within the detector itself, which will be used to reconstruct the actual collisions. It is expected that CMS will produce up to several petabytes of data per year [TGB2004]. Although the CMS detector will not be operational until after 2007, hundreds of physicists around the world are taking part in compute-intensive simulation studies of the detector, which will help in its design. It is expected that data from this experiment will be analyzed by more than 2000 physicists at more than 150 universities and laboratories in 34 countries.

Grid technology has shown great promise in effectively managing large-scale problems such as this. Scientists and institutions from all over the world are participating in the CMS collaborations. The participating sites are typically organized as cluster farms with server nodes and worker nodes.

Figure 2.2: The Compact Muon Solenoid Experiment

The participants in the US-CMS Grid include the California Institute of Technology, the Fermi National Accelerator Laboratory, the University of California, San Diego, the University of Florida, and the University of Wisconsin, Madison. For a period of time, a group from CERN also joined the US-CMS Grid effort.

Table 2.1: US-CMS Grid Resources [TGB2004]

The US-CMS Grid is based on the GriPhyN Virtual Data Toolkit, which is in turn based on the Globus Toolkit [TGP] and the Condor High-Throughput Computing System, including the Condor-G job submission interface to the Globus Toolkit. MOP (Monte Carlo Production) is a grid adapter developed for CMS by the Particle Physics Data Grid (PPDG). It sits between the job creation step and the grid middleware in the Virtual Data Toolkit and adds the necessary subtasks to each job so that it can run on the grid without modification, effectively making jobs grid-aware. MOP represents each generated job as a directed acyclic graph (DAG).
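The DAG representation can be illustrated with Python's standard `graphlib` module; the subtask names below are invented for the example and are not taken from MOP.

```python
# A grid job as a directed acyclic graph: each subtask lists the
# subtasks that must finish before it may start. A topological order
# of the DAG gives a valid execution order for the whole job.
from graphlib import TopologicalSorter

job_dag = {
    "stage_input":    set(),
    "run_simulation": {"stage_input"},
    "collect_output": {"run_simulation"},
    "cleanup":        {"collect_output"},
}

order = list(TopologicalSorter(job_dag).static_order())
assert order == ["stage_input", "run_simulation", "collect_output", "cleanup"]
```

Encoding dependencies this way is what lets a submission system such as Condor-G dispatch independent subtasks in parallel while still honoring the ordering constraints.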

On 8 July 2003, in Islamabad, the PAEC (Pakistan Atomic Energy Commission) [PAE] signed a protocol with CERN and joined its collaborators. In order to achieve the goals set by CERN, six major centers have been set up in Pakistan: PAEC-1, PAEC-2, PAEC-3, NCP, COMSATS and NUST.

2.5 I-WAY

The first modern grid is generally considered to be the Information Wide Area Year (I-WAY), developed as an experimental demonstration project for Supercomputing ’95. It was a year-long effort to link existing US national ATM (Asynchronous Transfer Mode) testbeds in order to interconnect supercomputer centers, virtual reality research locations and application development sites [OIW]. It connected seventeen sites within North America and was used by over sixty application groups. The goal of the I-WAY project was to enable applications to use more than one supercomputer and virtual reality device. Developing software infrastructure for the I-WAY provided invaluable experience for the first generation of modern grid researchers and projects.

The major part of the experiment was to develop a uniform software environment across the geographically distributed and diverse computational resources. To meet this requirement, a management and application programming environment called I-Soft was developed. The I-Soft system was designed to run on dedicated I-WAY point of presence (I-POP) machines deployed at each participating site. These machines provided a uniform environment for deploying management software and also simplified security solutions by serving as a neutral zone under the joint control of the I-WAY developers and local authorities. I-Soft provided a variety of services, including scheduling, security, parallel programming support and a distributed file system. These services allowed a user to log on to any I-POP machine and then schedule computations on heterogeneous collections of resources without being aware of where those resources were located.

An I-POP is a dedicated workstation accessible via the Internet and operating inside a site's firewall. An ATM interface allows it to monitor and manage the site's ATM switch. A site-specific implementation of a simple management interface allows I-WAY management systems to communicate with other machines at the site to allocate and access resources. Development, maintenance and auditing costs were reduced by making all I-POP computers the same type; in the I-WAY experiment, Sun SPARCstations were used. With distributed and heterogeneous resources, it was infeasible to replace the schedulers already in place with a single I-WAY scheduler. Instead, an independent entity was needed to negotiate the scheduling of resources with the local schedulers. This entity was called the Computational Resource Broker (CRB), and in the limited I-WAY network one CRB was sufficient [GCM2003].
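The broker idea can be sketched as follows. The interfaces here are invented for illustration and are much simpler than the real CRB; the point they capture is that the broker never replaces a local scheduler, it only asks each site's own scheduler for a slot.

```python
# A Computational Resource Broker negotiating with local schedulers:
# each site keeps control over its own resources, and the broker books
# the first slot a site is willing to offer.
class LocalScheduler:
    def __init__(self, site, free_nodes):
        self.site = site
        self.free_nodes = free_nodes

    def offer(self, nodes_needed):
        """The site decides whether to grant the request."""
        return self.free_nodes >= nodes_needed

class ResourceBroker:
    def __init__(self, schedulers):
        self.schedulers = schedulers

    def schedule(self, nodes_needed):
        for s in self.schedulers:
            if s.offer(nodes_needed):
                s.free_nodes -= nodes_needed
                return s.site
        return None  # no site could satisfy the request

broker = ResourceBroker([LocalScheduler("site-a", 4),
                         LocalScheduler("site-b", 64)])
assert broker.schedule(16) == "site-b"  # site-a's scheduler declined
```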

Figure 2.3: An I-WAY Point of Presence (I-POP) Machine

Security was handled by dividing the authentication problem into two parts: authentication to the I-POP environment and authentication to the local sites. Authentication to the I-POPs was handled by a telnet client modified to use Kerberos authentication and encryption. The scheduler software served as an ‘authentication proxy’, performing subsequent authentication to other I-WAY resources on the user's behalf. Most sites used a privileged (root) rsh from the local I-POP to an associated resource. The rsh command executes commands over the network and does not require the user to enter a password [HLE2001]. This method was adopted because of time constraints, and was acceptable only because the local site administered the local I-POP and the rsh request was sent to a local resource over a secure local network.

The I-WAY project provided an opportunity to deploy and study solutions to problems in a grid-like environment, such as those related to resource naming and allocation, authentication, coordination and integrity management. However, because of the relatively moderate number of users (a few hundred) and participating sites (around 20), the issue of scalability was, to a large extent, ignored [GCM2003]. A more sophisticated resource description language and scheduling framework were also required. Regarding I-WAY security, root rsh is an unacceptable long-term solution. A more fundamental limitation was that each user had to have an account at every site to which access was required, which was not scalable. There also need to be formal representations of conditions of use, as well as mechanisms for representing transitive relationships. Another difficulty was that while resource database entries were generated automatically by the scheduler, the information contained in these entries (e.g. network interfaces) had to be provided manually by the I-Soft team. The discovery, entry and maintenance of this information proved to be a significant source of overhead, particularly in an environment in which network status was changing rapidly. Clearly, this information should be discovered automatically; for example, a tool should use dedicated ATM links when available but automatically fall back on the shared Internet when they are not. A final limitation of the I-WAY project was its lack of distributed file system support.


Chapter 3

THE GRID ARCHITECTURE

The grid is an emerging technology. Standards for its various operations are still being defined. In order to understand and contribute to the grid revolution, there is a need to understand its architecture and services. Section 3.1 of this chapter deals with the Layered Grid Architecture whose comparison with the Internet Model is given in section 3.2. Sections 3.3 to 3.6 deal with the Service Oriented Architecture defined by the OGSA, while some types of grids are described in the last section.

3.1 THE LAYERED GRID ARCHITECTURE

The ...
