Data Management: Past, Present, and Future


Jim Gray

Microsoft Research

June 1996

Technical Report

MSR-TR-96-18

Microsoft Research

Microsoft Corporation

One Microsoft Way

Redmond, WA  98052

This paper appeared in  IEEE Computer 29(10): 38-46 (1996)

IEEE: © 1996 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.


Data Management: Past, Present, and Future

Jim Gray,

Microsoft Research,

301 Howard St.

 San Francisco, CA 94105,

415-778-8222

[email protected]

Abstract: Soon most information will be available at your fingertips, anytime, anywhere.  Rapid advances in storage, communications, and processing allow us to move all information into Cyberspace.  Software to define, search, and visualize online information is also key to creating and accessing it.  This article traces the evolution of data management systems and outlines current trends. Data management systems began by automating traditional tasks: recording transactions in business, science, and commerce. This data consisted primarily of numbers and character strings. Today these systems provide the infrastructure for much of our society, allowing fast, reliable, secure, and automatic access to data distributed throughout the world.  Increasingly, these systems automatically design and manage access to the data. The next steps are to automate access to richer forms of data: images, sound, video, maps, and other media.  A second major challenge is automatically summarizing and abstracting data in anticipation of user requests. These multi-media databases and the tools to access them will be a cornerstone of our move to Cyberspace.

1. Introduction And Overview

Figure 1: The six generations of data management, evolving from manual methods, through several stages of automated data management.

Computers can now store all forms of information: records, documents, images, sound recordings, videos, scientific data, and many new data formats.  We have made great strides in capturing, storing, managing, analyzing, and visualizing this data.  These tasks are generically called data management.  This paper sketches the evolution of data management systems, describing the six generations of data managers shown in Figure 1, and then outlines current trends.

Data management systems typically store huge quantities of data representing the historical records of an organization.  These databases grow by accretion.  It is important that the old data and applications continue to work as new data and applications are added.  The systems are in constant change.  Indeed, most of the larger database systems in operation today were designed several decades ago and have evolved with technology.  A historical perspective helps to understand current systems.

There have been six distinct phases in data management.  Initially, data was manually processed.  The next step used punched-card equipment and electro-mechanical machines to sort and tabulate millions of records. The third phase stored data on magnetic tape and used stored program computers to perform batch processing on sequential files.  The fourth phase introduced the concept of a database schema and online navigational access to the data.  The fifth step automated access to relational databases and added distributed and client-server processing.  We are now in the early stages of sixth generation systems that store richer data types, notably documents, images, voice, and video data.  These sixth generation systems are the storage engines for the emerging Internet and Intranets.

2. Historical perspective: The Six Generations of Data Management

2.0. Zeroth Generation: Record Managers 4000 BC - 1900

Record keeping has a long history: the first known writing describes the royal assets and taxes in Sumeria.  The next six thousand years saw a technological evolution from clay tablets to papyrus to parchment and then to paper.  There were many innovations in data representation: phonetic alphabets, novels, ledgers, libraries, paper, and the printing press.  These were great advances, but the information processing in this era was manual.  (Note to editor: it would be nice to have a photo of a Sumerian tablet or a Hollerith machine here. U. Penn has a good collection of photos of Sumerian tablets.)

2.1. First Generation: Record Managers 1900 - 1955

The first practical automated information processing began circa 1800 with the Jacquard loom, which produced fabric from patterns represented by punched cards.  Player pianos later used similar technology.  In 1890, Hollerith used punched-card technology to perform the US census. His system had a record for each household. Each data record was represented as binary patterns on a punched card.  Machines tabulated counts for blocks, census tracts, Congressional Districts, and States.  Hollerith formed a company to produce equipment that recorded data on cards and then sorted and tabulated the cards [1].  Hollerith’s business eventually became International Business Machines.  This small company, IBM, prospered as it supplied unit-record equipment for business and government between 1915 and 1960.

By 1955, many companies had entire floors dedicated to storing punched cards, much as the Sumerian archives had stored clay tablets.  Other floors contained banks of card punches, sorters, and tabulators.  These machines were programmed by rewiring control panels (patch-boards) that managed some accumulator registers, and that selectively reproduced cards onto other cards or onto paper.  Large companies were processing and generating millions of records each night.  This would have been impossible with manual techniques.  Still, it was clearly time for a new technology to replace punched cards and electro-mechanical computers.

2.2. Second Generation: Programmed Unit Record Equipment 1955-1970

Stored-program electronic computers had been developed in the 1940s and early 1950s for scientific and numerical calculations.  At about the same time, Univac had developed a magnetic tape that could store as much information as ten thousand cards, giving huge improvements in space, time, convenience, and reliability.  The 1951 delivery of the UNIVAC I to the Census Bureau echoed the development of punched-card equipment.  These new computers could process hundreds of records per second, and they could fit in a fraction of the space occupied by the unit-record equipment.

Software was a key component of this new technology.  It made these computers relatively easy to program and use.  It was much easier to sort, analyze, and process the data with languages like COBOL and RPG.  Indeed, standard packages began to emerge for common business applications like general-ledger, payroll, inventory control, subscription management, banking, and document libraries.

The response to these new technologies was predictable. Large businesses recorded even more information, and demanded faster and faster equipment.  As prices declined, even medium-sized businesses began to capture transactions on cards and use a computer to process the cards against a tape-based master file.  


The software of the day provided a file-oriented record processing model.  Typical programs sequentially read several input files and produced new files as output.  COBOL and several other programming languages were designed to make it easy to define these record-oriented sequential tasks.  Operating systems provided the file abstraction to store these records, a job control language to run the jobs, and a job scheduler to manage the workflow.

Batch transaction processing systems captured transactions on cards or tape and collected them in a batch for later processing. Once a day these transaction batches were sorted. The sorted transactions were then merged with the old master file stored on tape to produce a new master file.
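To make this old-master/new-master cycle concrete, the sketch below shows the idea in Python.  It is only an illustration of the sequential, file-oriented processing model described above; the file names, the comma-separated record layout, and the assumption that both inputs are already sorted on a fixed-width account-number key are hypothetical, not drawn from any particular system of the era.

# A minimal sketch of a batch master-file update.  File names and record
# layout are hypothetical: one record per line, "account,amount", with both
# input files sorted on a fixed-width account-number key.

def read_record(f):
    line = f.readline()
    if not line:
        return None, 0.0
    key, amount = line.rstrip("\n").split(",")
    return key, float(amount)

with open("old_master.txt") as master, \
     open("sorted_transactions.txt") as trans, \
     open("new_master.txt", "w") as new_master:
    m_key, m_bal = read_record(master)
    t_key, t_amt = read_record(trans)
    while m_key is not None:
        # Discard transactions whose account precedes the current master
        # record (accounts missing from the master file).
        while t_key is not None and t_key < m_key:
            t_key, t_amt = read_record(trans)
        # Apply every transaction for the current account.
        while t_key is not None and t_key == m_key:
            m_bal += t_amt
            t_key, t_amt = read_record(trans)
        new_master.write(f"{m_key},{m_bal:.2f}\n")
        m_key, m_bal = read_record(master)

Each input is read exactly once, front to back, and a complete new master is written, which is why tape, with no random access at all, was an adequate storage medium for this style of processing; the retired master and the transaction batch also served as a simple form of backup.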
