

Issues and Applications of Grid Computing

A thesis submitted in partial fulfillment of the requirements for the degree of
Bachelor of Science (Computer Science)



ABSTRACT

The intention of this study is to analyze and explore the emerging field of grid technology. It delves into how the grid is being used to enhance the capabilities of existing distributed systems and data resources. The characteristics of virtual organizations and their participation in implementing a grid structure are observed. The issues surfacing in grid implementation, and their possible solutions, are discussed. Enhancements and modifications are proposed for existing frameworks for database integration with the grid. A basic grid structure for the Department of Computer Science, University of Karachi, has been planned out. The Globus Toolkit, used as grid middleware, is tested and run on available resources.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS


LIST OF FIGURES

Figure 1.1: Virtual Organizations

Figure 2.2: The Compact Muon Solenoid Experiment

Figure 2.3: An I-WAY Point of Presence (I-POP) Machine

Figure 3.4: The Layered Grid Architecture

Figure 3.5: The Layered Grid Architecture with respect to Services and APIs

Figure 3.6: The Layered Grid Architecture and its Relationship to the Internet Protocol Architecture

Figure 3.7: The Core Elements of the Open Grid Services Architecture (shaded)

Figure 3.8: Services Involved in the Example

Figure 3.9: The Three Layered Semantic Grid Architecture

Figure 3.10: Comparison of Peer-to-Peer and Grid Computing Styles

Figure 3.11: Middleware Peer (MP) Groups of Services at the edge of the Grid

Figure 4.12: Authentication, Authorization through Proxy

Figure 5.13: A Virtual Database System on the Grid

Figure 5.14: Separate Interaction with Databases on the Grid

Figure 5.15: Flowchart for Query Processing on the Grid

Figure 6.16: Proposed Structure for the University of Karachi Grid

Figure 6.17: Issuing Certificate to Grid User

LIST OF TABLES

Table 2.1: US-CMS Grid Resources [TGB2004]

Table 5.2: Example Accounting Policy


Chapter 1

FUNDAMENTALS OF GRID COMPUTING

The notion of linking people, computers, sensors and data with networks is decades old. However, the grid concept has gradually evolved, and now, for the first time, there is a coherent description of the hardware, software and applications required to create a functioning and persistent grid. Grid computing will prove to be one of the most significant developments of this age. This chapter introduces the concept of grid computing and deals with some major misconceptions related to the grid.

1.1 INTRODUCTION TO GRID COMPUTING

In the simplest of terms, grid computing is distributed computing taken to the next level. Whereas distributed computing is implemented at a large scale throughout the world, grid computing has been around for just a few years and is still in its development stages.

The goal is to create the illusion of a simple yet large and powerful self-managing virtual computer out of a large collection of connected systems, which may vary in the number of resources they share.

The grid has been called the ‘next generation Internet’ [GCM2003, TAG2001]. The Internet came into being when communication was established between heterogeneous locations. This communication includes file sharing, access to web sites, video conferencing, etc. Grid computing takes this communication one step further, to the level of resource sharing between individual systems.

Grid computing differs from conventional distributed computing in its focus on large-scale resource sharing, greater processing and computation capabilities, and inventive applications that exploit massive parallelism.

1.2 HISTORY OF THE GRID

The origins of the grid can be linked to parallel computing. Research on parallel computing in the 1980s focused on the development of algorithms, programs and architectures that supported simultaneity. During the same time, researchers from multiple disciplines began to come together to attack problems in science and technology that required large-scale computational resources. The challenges posed by these multidisciplinary problems, and the geographically dispersed collaborations formed around them, provided the coordination and distribution experience essential for creating the grid.

There are three generations of grid defined in [GCM2003]. The early grid projects linked supercomputing sites and provided computational resources to high-performance applications. Two projects in the first generation were FAFNER [FAF] and I-WAY [OIW]. FAFNER stands for Factoring via Network-Enabled Recursion. Contributors downloaded and built a daemon, which acted as a web client that used the HTTP protocol to GET values from, and POST computation results back to, a CGI (Common Gateway Interface) script on the web server. FAFNER was capable of running on any workstation with more than 4 MB of memory. The Information Wide Area Year (I-WAY) was a year-long experimental effort to link many high-performance computers and advanced visualization environments. It connected a dozen ATM testbeds, seventeen supercomputer centers, five virtual reality research sites and over sixty application groups. The I-WAY was successfully demonstrated at Supercomputing ’95. Even though both projects lacked scalability, FAFNER was the forerunner of projects like SETI@home [SET], and I-WAY of Globus [TGP] and Legion.

Second-generation systems focus on middleware to support large-scale data and computation. Middleware is generally considered to be the layer of software sandwiched between the operating system and applications, providing a variety of services to those applications [GCM2003]. In a grid environment, middleware serves to mask the heterogeneous nature of the resources. Second-generation technologies include Globus [TGP] and Legion.

The second generation provided the interoperability that was required for large-scale computation. However, as other aspects of the grid were explored, it became apparent that it was desirable to reuse existing components and information resources. Third-generation projects focus on service-oriented architecture and metadata. There is also a strong emphasis on automation, with properties such as dynamic configuration, recovery and optimized use of resources.

1.3 FUNDAMENTAL CONCEPTS

1.3.1 RESOURCE SHARING

A grid is a collection of machines that act as nodes in a network. Each node contributes resources to the grid as a whole. These resources may be utilized according to the restrictions applied by their owners.

The different types of resources that may be linked by a grid include:

1.3.1.1 Computation

Computation is one of the principal uses of the grid. Shared processing power aims to provide users with shorter computation times. This was also a prime factor in the emergence of grid technology, as many scientific problems require extremely high processing speeds in order to operate on data generated thousands of times per second.

Exploiting the resources of the grid for greater computation power can be done in three ways:

  1. Executing the application on a faster machine with a larger memory.
  2. Splitting an application’s task among many different nodes so that the job is accomplished in time inversely proportional to the number of nodes.
  3. Running a process that needs to be executed many times on multiple machines simultaneously.
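The second approach above can be sketched in a few lines of Python. This is an illustration only, not code from the thesis: the worker threads merely stand in for grid nodes (on a real grid, each chunk would be dispatched to a different machine), and all function names are invented.

```python
# Sketch of splitting one task -- summing a large range of integers --
# into chunks handled by separate workers, which stand in for grid nodes.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    """One worker ("node") sums its own slice [lo, hi)."""
    lo, hi = bounds
    return sum(range(lo, hi))

def split_and_sum(n, workers=4):
    """Divide summing 0..n-1 into `workers` chunks, run them concurrently,
    and combine the partial results."""
    step, chunks = n // workers, []
    for i in range(workers):
        hi = (i + 1) * step if i < workers - 1 else n
        chunks.append((i * step, hi))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

assert split_and_sum(1_000_000) == sum(range(1_000_000))
```

With truly independent chunks and real nodes, the elapsed time falls roughly in inverse proportion to the number of workers, which is exactly the second case listed above.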

Scalability is a measure of how efficiently the processors on a grid are used. If twice as many processors let an application complete a task in half the time, the application is said to be perfectly scalable. However, there are limits to scalability, because applications cannot be split indefinitely: some computations may depend on others, and some tasks may not run in parallel because of certain restrictions.
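The scalability limit described above is commonly quantified by Amdahl's law (not cited in this thesis, but it makes the point concrete): if some fraction of an application must run sequentially, adding processors yields diminishing returns. A minimal Python sketch:

```python
# Amdahl's law: predicted speedup on n processors when a fraction
# `serial_fraction` of the work cannot be parallelized.
def amdahl_speedup(n_processors, serial_fraction):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# A perfectly scalable job (no serial part) doubles its speedup
# when the processor count doubles ...
assert amdahl_speedup(2, 0.0) == 2.0
# ... but with even 5% serial work, 1000 processors give less than 20x.
assert amdahl_speedup(1000, 0.05) < 20.0
```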

1.3.1.2 Storage

A grid can also be used for data storage. A data grid is one that provides an integrated view of data storage [GCM2003]. Each node provides some quantity of storage, whether permanent or temporary. Some nodes may dedicate part of their secondary storage for use by other machines, while others use only volatile memory to store data temporarily while performing tasks for another machine.

Applications may be designed to execute in parallel while accessing data on only one node. This is usually done when the local memory of a single node is not enough to hold all the data.

Another technique is to use a unifying file system: an individual file or database may be stored across many different devices yet present a uniform view to users.
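As a toy illustration of this unifying view (the class and node names are invented here, not part of any real grid file system), a logical file can hide which storage node holds each of its chunks:

```python
# A logical file whose chunks live on different storage nodes, but whose
# readers see a single uniform byte stream.
class VirtualFile:
    def __init__(self):
        # each entry: (node_name, chunk_bytes); node names are illustrative
        self.chunks = []

    def append_chunk(self, node, data):
        self.chunks.append((node, data))

    def read(self):
        """Present the uniform view: concatenate chunks in order,
        hiding which node stores which part."""
        return b"".join(data for _node, data in self.chunks)

f = VirtualFile()
f.append_chunk("storage-node-a", b"grid ")
f.append_chunk("storage-node-b", b"data")
assert f.read() == b"grid data"
```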

1.3.1.3 Communication

A grid is functional because of its network capabilities [GCM2003]. Bandwidth is a critical factor in determining the speed and efficiency of data communication, especially in data-intensive applications.

Communication includes data and message exchanges within the grid as well as outside it. The Internet and other LAN or WAN sites may be accessed by any node in the grid. Such external communication would also depend on the network design and pathways maintained in the grid.

1.3.1.4 Software Usage

Due to licensing costs or installation restrictions, it may be feasible to install a piece of software on only one machine, yet access it from different machines by creating several instances. This too is an important use of grid technology.

1.3.2 VIRTUAL ORGANIZATIONS

Figure 1.1: Virtual Organizations

As the semantics of grid computing were defined, the term Virtual Organization was coined in [TAG2001] to refer to the participants in a grid environment. These are usually collections of resources together with rules governing how those resources are shared. They can be distributed across the globe and heterogeneous (personal computers, servers, mainframes, supercomputers, etc.).

In such settings there need to be explicit controls over authentication, authorization, resource access, resource discovery and so on. These systems are governed by protocols that aim to coordinate the resources in an orderly manner.

The types of virtual organizations vary from scientific and technical organizations aiming to utilize as much of the computing resources as they can in order to gain results, to enterprises accessing huge amounts of data for purposes such as data mining.

1.3.3 PARALLEL PROCESSING

Parallel processing [TSP2002] is the execution of a job on multiple processors concurrently in order to reduce the total execution time. This simultaneous processing is the very basis of grid computing: rather than using only the resources available at one particular site, the grid utilizes computing assets at geographically dispersed locations and carries out the task in parallel on more than one processor.

However, the study of distributed and parallel computing differs from that of grid computing. Whereas distributed computing research generally focuses on the problems caused by geographical separation between resources, grid research focuses on the integration and management of software so as to enable the sharing of resources.

1.4 BASIC BLOCKS OF THE GRID

1.4.1 NETWORKS

Networks form the vital link by which the resources on the grid communicate. Typical issues [GCM2003] in a network environment with respect to grid computing include carrying capacity and reliability.

The capacity of a network is measured in terms of bandwidth. High-capacity networking increases the grid's ability to support both parallel and distributed applications. In the US, grids are built on high-performance national networks whose backbones achieve roughly 10 Gb/s.
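A back-of-the-envelope calculation (not from the thesis) shows why such backbone capacity matters for the petabyte-scale datasets discussed in later chapters: even at an ideal, uncontended 10 Gb/s, moving a single terabyte takes minutes.

```python
# Ideal transfer time over a gigabit-per-second link, ignoring protocol
# overhead, latency and congestion.
def transfer_time_seconds(data_bytes, link_gbps):
    bits = data_bytes * 8
    return bits / (link_gbps * 1e9)

one_terabyte = 1e12  # bytes
# 8e12 bits / 1e10 bits-per-second = 800 seconds, roughly 13 minutes
assert transfer_time_seconds(one_terabyte, 10) == 800.0
```

A petabyte at the same rate would take a thousand times longer, which is why data-intensive grid applications are so sensitive to bandwidth.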

Reliability concerns the likelihood that a link in the network will fail in some way. For grid applications the reliability of the network is critical: the large amounts of data traveling over its links cannot afford to be dropped or lost along the way.

1.4.2 COMPUTATIONAL NODES ON THE GRID

Grid applications lean towards using resources for computational needs. Thus nodes that are themselves high-performance parallel machines or clusters are of great interest to grid researchers. Clusters belong to the Multiple Instruction Multiple Data (MIMD) category of computer systems; they consist of whole computers, each with its own dedicated memory, interacting via some network facility. Other grid resources include data storage devices and even fax machines, printers, etc.

1.5 MISCONCEPTIONS ABOUT THE GRID

Some basic misconceptions about the grid, which are clarified in [TAG2001], are:

The Grid is an alternative to the Internet

The grid is not an alternative to the Internet; rather, it uses and builds upon Internet capabilities to provide services and protocols for problems involving huge data and heavy computation. The Web largely consists of clients talking to servers individually, whereas in the grid, clients and servers work together and interchangeably to solve a problem.

The Grid is a source of free cycles

The grid does not provide an unlimited supply of computing power; rather, restrictions will be in place on more or less all resources that are shared. Resource owners employ policies and accounting mechanisms to restrict the use of their resources accordingly.
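Such an accounting mechanism can be sketched as follows. The policy fields here are invented purely for illustration; real grid middleware expresses and enforces owner policies in far more elaborate ways.

```python
# A resource owner's per-user quota on shared CPU time: work that would
# exceed the quota is refused rather than granted "free cycles".
class UsagePolicy:
    def __init__(self, max_cpu_hours):
        self.max_cpu_hours = max_cpu_hours
        self.used = {}  # user -> cpu-hours consumed so far

    def charge(self, user, cpu_hours):
        """Record usage; refuse requests that would exceed the quota."""
        if self.used.get(user, 0.0) + cpu_hours > self.max_cpu_hours:
            return False  # denied by the owner's policy
        self.used[user] = self.used.get(user, 0.0) + cpu_hours
        return True

policy = UsagePolicy(max_cpu_hours=10.0)
assert policy.charge("alice", 6.0) is True
assert policy.charge("alice", 6.0) is False  # would exceed the 10-hour cap
```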

The Grid requires new computing models

Programming in a grid environment means working in a parallel domain, so problems will be encountered that do not arise on sequential computers. However, the programming models remain the same, and current programming contexts can be used for grid programming.

The Grid makes high-performance computers obsolete

Even though the grid makes it possible to access and harness the computing power of many resources, the need for high-performance computers will continue to grow as more data-intensive, processing-heavy problems arise.


Chapter 2

GRID APPLICATIONS

The feasibility of any new technology becomes evident through the applications that exploit it. Grid applications include those from science and industry, from academia and laboratories and from large corporations. They address problems ranging from multiplayer gaming, fault diagnosis, and astronomy to real-time analysis. This chapter gives an overview of the type of applications suited to grid computing, with special emphasis on CERN’s LHC grid and the I-WAY project.

2.1 LIFE SCIENCE APPLICATIONS

Computational biology, bioinformatics, genomics, computational neuroscience and other fields make up the life sciences. These areas are turning to grid computing for accessing, collecting and mining large amounts of data, and many scientific tools have been developed that make use of the grid's resources, including supercomputers and clusters.

Examples of grid projects related to the life sciences include the Protein Data Bank [PDB], the myGrid project [MGD], the Biomedical Information Research Network (BIRN) [BIR] and MCell [MGM]. An ‘in silico’ experiment is a procedure that uses computer-based information repositories and computational analysis to test a hypothesis, derive a summary, search for patterns or demonstrate a known fact. The myGrid project is developing middleware to support in silico experiments in biology.

The BIRN project, started in September 2001, links instruments and federated databases. It is developing the hardware, software and protocols necessary to share and mine data for both basic and clinical research. The architecture designed to accomplish this goal is built around a flexible, large-scale grid model in which resources are tightly integrated by grid middleware technologies, including the Globus Toolkit [TGP]. MCell is a collaboration between computational biologists and computer scientists to deploy large-scale Monte Carlo simulations using grid technologies.

2.2 ENGINEERING ORIENTED APPLICATIONS

Large-scale science and engineering applications can be executed more efficiently by the use of the grid, which makes possible the concept of concurrent engineering.

An example of the deployment of grid infrastructure in the engineering sciences is NASA's Information Power Grid (IPG) [NIP] in the United States. It aims to revolutionize the way NASA executes large-scale science and engineering problems, providing computing and data management services that, on demand, locate and schedule the multicenter resources needed to address large-scale or widely distributed problems.

The NEESgrid is a grid-based system that supports a broad range of activities for improving the performance of buildings and other structures subjected to the effects of earthquakes. NEESgrid integrates a range of earthquake engineering test apparatus into the grid infrastructure. The George E. Brown Network for Earthquake Engineering and Simulation (NEES) program was created in 1999 and has made major investments in earthquake engineering test facilities such as shake tables, reaction walls and wave tanks. All of these must be network accessible in order to support broad community access to these expensive instruments, so an infrastructure was needed to integrate test equipment, simulation, data repositories and collaboration tools. This led to NEESgrid, a deployment of grid technologies that builds on standard grid infrastructure, specifically the Globus Toolkit [TGP], augmenting it with specialized earthquake engineering tools and services [MKP].

2.3 PHYSICAL SCIENCE APPLICATIONS

CERN’s particle accelerator provides an example of grid computing utilization in the physical sciences. CERN [CLH] is the European Organization for Nuclear Research; its Large Hadron Collider [LHC] will produce huge amounts of data per second, which need to be analyzed. Various countries, including Pakistan, are playing a vital part in this experiment by devoting computing resources at various research centers to the analysis of this data.

Grid computing is also making its presence felt in astronomy; virtual observatories are one example of how the grid can be used in this field.

Data Intensive Applications

All the application areas mentioned above may also use the grid as a data-intensive application tool to collect, store and analyze data. The grid will thus be used not only for its storage capacity but also for gaining knowledge about the stored data through techniques such as data mining. An example of a data-oriented application is the Distributed Aircraft Maintenance Environment (DAME) [DAM], an industrial application being developed in the United Kingdom in which grid technology is used to handle the gigabytes of in-flight data gathered by aircraft and to integrate maintenance, manufacturer and analysis centers. It addresses performance issues such as large-scale data management with real-time demands.

Commercial Applications

The grid is not limited to scientific experiments; it is also being used for commercial purposes, including enterprise computing, storage-on-demand and information-on-demand. A generalization of this is the concept of Application Service Providers (ASPs). Grid technologies are also being used in innovative ways in areas such as inventory control, enterprise computing and gaming (examples include The Butterfly Grid [BGM] and the Everquest multiplayer gaming environment [EMG]).

The growing collaboration between scientific and commercial sectors in promoting the grid will provide mutual benefits. Not only will there be revolutionary scientific advances but also a new generation of successful commercial products.

2.4 THE COMPACT MUON SOLENOID EXPERIMENT

CMS (Compact Muon Solenoid) [CLH] is a high-energy physics detector planned for the Large Hadron Collider (LHC) [LHC] at CERN near Geneva, Switzerland. CMS is currently under construction and is expected to be completed in 2007, at which time it will begin to record data from the highest-energy proton-proton collisions ever produced, known as events. Data from these collisions will help in solving many fundamental scientific questions, such as the search for the Higgs particle and the origin of mass in the universe, and will help in recreating the environment present at the origin of the universe. The data will contain information from millions of elements within the detector itself, which will be used to reconstruct the actual collisions. It is expected that CMS will produce up to several petabytes of data per year [TGB2004]. Although the CMS detector will not be operational until after 2007, hundreds of physicists around the world are taking part in compute-intensive simulation studies of the detector, which will help in its design. It is expected that data from this experiment will be analyzed by more than 2000 physicists at more than 150 universities and laboratories in 34 countries.

Grid technology has shown great promise in effectively managing large-scale problems such as this. Scientists and institutions from all over the world are participating in the CMS collaborations. The participating sites are typically organized as cluster farms with server nodes and worker nodes.

Figure 2.2: The Compact Muon Solenoid Experiment

The participants in the US-CMS Grid include the California Institute of Technology, the Fermi National Accelerator Laboratory, the University of California, San Diego, the University of Florida, and the University of Wisconsin, Madison. For a period of time, a group from CERN also joined the US-CMS Grid effort.

Table 2.1: US-CMS Grid Resources [TGB2004]

The US-CMS Grid is based on the GriPhyN Virtual Data Toolkit, which is in turn based on the Globus Toolkit [TGP] and the Condor High-Throughput Computing System, including the Condor-G job submission interface to the Globus Toolkit. MOP (Monte Carlo Production) is a grid adapter developed for CMS by the Particle Physics Data Grid (PPDG). It sits between the job creation step and the grid middleware in the Virtual Data Toolkit and adds the necessary subtasks to each job so that it can run on the grid without modification, effectively making jobs grid-aware. MOP represents each generated job as a directed acyclic graph (DAG).
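The DAG representation can be illustrated with Python's standard `graphlib` module; the subtask names below are invented for the example and are not taken from MOP.

```python
# A grid job as a directed acyclic graph: each subtask lists the
# subtasks that must finish before it may start. A topological order
# of the DAG gives a valid execution order for the whole job.
from graphlib import TopologicalSorter

job_dag = {
    "stage_input":    set(),
    "run_simulation": {"stage_input"},
    "collect_output": {"run_simulation"},
    "cleanup":        {"collect_output"},
}

order = list(TopologicalSorter(job_dag).static_order())
assert order == ["stage_input", "run_simulation", "collect_output", "cleanup"]
```

Encoding dependencies this way is what lets a submission system such as Condor-G dispatch independent subtasks in parallel while still honoring the ordering constraints.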

On 8 July 2003, in Islamabad, the PAEC (Pakistan Atomic Energy Commission) [PAE] signed a protocol with CERN and joined its collaborators. In order to achieve the goals set by CERN, six major centers have been set up in Pakistan: PAEC-1, PAEC-2, PAEC-3, NCP, COMSATS and NUST.

2.5 I-WAY

The first modern grid is generally considered to be the Information Wide Area Year (I-WAY), developed as an experimental demonstration project for Supercomputing ’95. It was a year-long effort to link existing US national ATM (Asynchronous Transfer Mode) testbeds in order to interconnect supercomputer centers, virtual reality research locations and application development sites [OIW]. It connected seventeen sites within North America and was used by over sixty application groups. The goal of the I-WAY project was to enable applications to use more than one supercomputer and virtual reality device. Developing software infrastructure for the I-WAY provided invaluable experience for the first generation of modern grid researchers and projects.

The major part of the experiment was to develop a uniform software environment across the geographically distributed and diverse computational resources. To meet this requirement, a management and application programming environment called I-Soft was developed. The I-Soft system was designed to run on dedicated I-WAY point of presence (I-POP) machines deployed at each participating site. These machines provided a uniform environment for deploying management software and also simplified security solutions by serving as a neutral zone under the joint control of the I-WAY developers and local authorities. I-Soft provided a variety of services, including scheduling, security, parallel programming support and a distributed file system. These services allowed a user to log on to any I-POP machine and then schedule computations on heterogeneous collections of resources without being aware of where those resources were located.

An I-POP is a dedicated workstation accessible via the Internet and operating inside a site's firewall. An ATM interface allows it to monitor and manage the site's ATM switch. A site-specific implementation of a simple management interface allows I-WAY management systems to communicate with other machines at the site to allocate and access resources. Development, maintenance and auditing costs were reduced by making all I-POP computers the same type; in the I-WAY experiment, Sun SPARCstations were used. With distributed and heterogeneous resources, it was infeasible to replace the schedulers already in place with a single I-WAY scheduler. Instead, an independent entity was needed to negotiate the scheduling of resources with the local schedulers. This entity was called the Computational Resource Broker (CRB), and in the limited I-WAY network one CRB was sufficient [GCM2003].
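The broker idea can be sketched as follows. The interfaces here are invented for illustration and are much simpler than the real CRB; the point they capture is that the broker never replaces a local scheduler, it only asks each site's own scheduler for a slot.

```python
# A Computational Resource Broker negotiating with local schedulers:
# each site keeps control over its own resources, and the broker books
# the first slot a site is willing to offer.
class LocalScheduler:
    def __init__(self, site, free_nodes):
        self.site = site
        self.free_nodes = free_nodes

    def offer(self, nodes_needed):
        """The site decides whether to grant the request."""
        return self.free_nodes >= nodes_needed

class ResourceBroker:
    def __init__(self, schedulers):
        self.schedulers = schedulers

    def schedule(self, nodes_needed):
        for s in self.schedulers:
            if s.offer(nodes_needed):
                s.free_nodes -= nodes_needed
                return s.site
        return None  # no site could satisfy the request

broker = ResourceBroker([LocalScheduler("site-a", 4),
                         LocalScheduler("site-b", 64)])
assert broker.schedule(16) == "site-b"  # site-a's scheduler declined
```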

Figure 2.3: An I-WAY Point of Presence (I-POP) Machine

Security was handled by dividing the authentication problem into two parts: authentication to the I-POP environment and authentication to the local sites. Authentication to the I-POPs was handled by a telnet client modified to use Kerberos authentication and encryption. The scheduler software served as an ‘authentication proxy’, performing subsequent authentication to other I-WAY resources on the user's behalf. Most sites used a privileged (root) rsh from the local I-POP to an associated resource. The rsh command executes commands over the network and does not require the user to enter a password [HLE2001]. This method was adopted because of time constraints, and was acceptable only because the local site administered the local I-POP and the rsh request was sent to a local resource over a secure local network.

The I-WAY project provided an opportunity to deploy and study solutions to problems in a grid-like environment, such as those related to resource naming and allocation, authentication, coordination and integrity management. However, because of the relatively moderate number of users (a few hundred) and participating sites (around 20), the issue of scalability was, to a large extent, ignored [GCM2003]. A more sophisticated resource description language and scheduling framework were also required. Regarding I-WAY security, root rsh is an unacceptable long-term solution. A more fundamental limitation was that each user had to have an account at every site to which access was required, which was not scalable. There also need to be formal representations of conditions of use, as well as mechanisms for representing transitive relationships. Another difficulty was that while resource database entries were generated automatically by the scheduler, the information contained in these entries (e.g. network interfaces) had to be provided manually by the I-Soft team. The discovery, entry and maintenance of this information proved to be a significant source of overhead, particularly in an environment in which network status was changing rapidly. Clearly, this information should be discovered automatically; for example, a tool should use dedicated ATM links when available but automatically fall back on the shared Internet when they are not. A final limitation of the I-WAY project was its lack of distributed file system support.


Chapter 3

THE GRID ARCHITECTURE

The grid is an emerging technology. Standards for its various operations are still being defined. In order to understand and contribute to the grid revolution, there is a need to understand its architecture and services. Section 3.1 of this chapter deals with the Layered Grid Architecture whose comparison with the Internet Model is given in section 3.2. Sections 3.3 to 3.6 deal with the Service Oriented Architecture defined by the OGSA, while some types of grids are described in the last section.

3.1 THE LAYERED GRID ARCHITECTURE

The ...
