
Data Management: Past, Present, and Future



Jim Gray
Microsoft Research
June 1996
Technical Report MSR-TR-96-18

Microsoft Research, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052

This paper appeared in IEEE Computer 29(10): 38-46 (1996).

IEEE: (c) 1996 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Jim Gray, Microsoft Research, 301 Howard St., San Francisco, CA 94105, 415-778-8222, Gray@Microsoft.com

Abstract: Soon most information will be available at your fingertips, anytime, anywhere. Rapid advances in storage, communications, and processing allow us to move all information into Cyberspace. Software to define, search, and visualize online information is also a key to creating and accessing online information. This article traces the evolution of data management systems and outlines current trends. Data management systems began by automating traditional tasks: recording transactions in business, science, and commerce. This data consisted primarily of numbers and character strings. Today these systems provide the infrastructure for much of our society, allowing fast, reliable, secure, and automatic access to data distributed throughout the world. Increasingly these systems automatically design and manage access to the data. The next steps are to automate access to richer forms of data: images, sound, video, maps, and other media. A second major challenge is automatically summarizing and abstracting data in anticipation of user requests. These multi-media databases and tools to access them will be a cornerstone of our move to Cyberspace.

1. Introduction And Overview

Figure 1: The six generations of data management, evolving from manual methods, through several stages of automated data management.

Computers can now store all forms of information: records, documents, images, sound recordings, videos, scientific data, and many new data formats. ...


Rather than implicitly storing the relationship between flights and trips, a relational system explicitly stores each flight-trip pair as a record in the database. This is the "Segment" table in Figure 2.d. To find all segments reserved for customer Jones going to San Francisco, one would write the SQL query:

    Select Flight#
    From   City, Flight, Segment, Trip, Customer
    Where  Flight.to = "SF"
      AND  Flight.flight# = Segment.flight#
      AND  Segment.trip# = trip.trip#
      AND  trip.customer# = customer.customer#
      AND  customer.name = "Jones"

The English equivalent of this SQL query is: "Find the flight numbers for flights to San Francisco which are a segment of a trip booked by any customer named 'Jones'. Combine the City, Flight, Segment, Trip, and Customer tables to find this flight."

This program may seem complex, but it is vastly simpler than the corresponding navigational program. Given this non-procedural query, the relational database system automatically finds the best way to match up records in the City, Flight, Segment, Trip, and Customer tables. The query does not depend on which relationships are defined. It will continue to work even after the database is logically reorganized. Consequently, it has much better data independence than a navigational query based on the network data model. In addition to improving data independence, relational programs are often five or ten times simpler than the corresponding navigational programs.

Inspired by Codd's ideas, researchers in academe and industry experimented throughout the 1970's with this new approach to structuring and accessing databases, which promised dramatically easier data modeling and application programming. The many relational prototypes developed during this period converged on a common model and language. Work at IBM Research led by Ted Codd, Raymond Boyce, and Don Chamberlin and work at UC Berkeley led by Michael Stonebraker gave rise to a language called SQL. This language was first standardized in 1985. There have been two major additions to the standard since then [5], [6]. ...
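The flight query quoted above can be tried out with a minimal sketch using Python's standard sqlite3 module. Everything about the schema here is an assumption made for illustration, since Figure 2.d is not reproduced in this extract: the column names (flight_no, trip_no, customer_no, origin, dest) stand in for the paper's Flight#, trip#, customer#, and "to", and the sample rows are invented. The City table is left out because the quoted Where clause never constrains it.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical airline schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE trip     (trip_no INTEGER PRIMARY KEY, customer_no INTEGER);
        CREATE TABLE flight   (flight_no INTEGER PRIMARY KEY, origin TEXT, dest TEXT);
        -- One row per flight-trip pair, as the text describes for "Segment".
        CREATE TABLE segment  (trip_no INTEGER, flight_no INTEGER);

        -- Invented sample data, for illustration only.
        INSERT INTO customer VALUES (1, 'Jones'), (2, 'Smith');
        INSERT INTO trip     VALUES (10, 1), (11, 2);
        INSERT INTO flight   VALUES (100, 'NY', 'SF'), (101, 'SF', 'LA'), (102, 'LA', 'SF');
        INSERT INTO segment  VALUES (10, 100), (10, 101), (11, 102);
        """
    )
    return conn


def flights_for(conn, customer_name, destination):
    """Return flight numbers to `destination` on trips booked by `customer_name`."""
    rows = conn.execute(
        """
        SELECT flight.flight_no
        FROM flight, segment, trip, customer
        WHERE flight.dest = ?
          AND flight.flight_no = segment.flight_no
          AND segment.trip_no = trip.trip_no
          AND trip.customer_no = customer.customer_no
          AND customer.name = ?
        ORDER BY flight.flight_no
        """,
        (destination, customer_name),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(flights_for(db, "Jones", "SF"))
```

The query states only which records match, not how to find them. Adding an index on segment, for example, changes how the engine may execute the join but leaves the query text and its result unchanged, which is a small-scale illustration of the data independence the text describes.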


What are the rights and responsibilities of people operating in Cyberspace? Our grandchildren will probably still be wrestling with these societal issues 50 years hence.

The technical challenges are more tractable. There is broad consensus within the database community on the main challenges and a research agenda to attack those problems. Every five years, the database community does a self-assessment that outlines this agenda. The most recent self-assessment, called the Lagunita II report [8], emphasizes the following challenges:

* Defining the data models for new types (e.g., spatial, temporal, image, ...) and integrating them with the traditional database systems.
* Scaling databases in size (to petabytes), space (distributed), and diversity (heterogeneous).
* Automatically discovering data trends, patterns, and anomalies (data mining, data analysis).
* Integrating (combining) data from multiple sources.
* Scripting and managing the flow of work (process) and data in human organizations.
* Automating database design and administration.

These are challenging problems. Solving them will open up new applications for computers both for organizations and for individuals. These systems will allow us to access and analyze all information from anywhere at any time. This easy access to information will transform the way we do science, the way we manage our businesses, the way we learn, and the way we play. It will both enrich and empower us and future generations.

Perhaps the most challenging problem is understanding the data. There is little question that most data will be online - both because it is inexpensive to store the data in computers and because it is convenient to store it in computers. Organizing these huge data archives so that people can easily find the information they need is the real challenge we face. Finding patterns, trends, anomalies, and relevant information from a large database is one of the most exciting new areas of data management [7]. Indeed, my hope is that computers will be able to condense and summarize information for us so that we will be spared the drudgery of searching through irrelevant data for the nuggets we seek. The solution to this will require contributions from many disciplines. ...
