Information Management for microarray experimental data



Supawan Prompramote

Yi-Ping Phoebe Chen

Frederic Maire

Centre for Information Technology Innovation

Faculty of Information Technology

Queensland University of Technology

Abstract: Today, proficiency in generating microarray data is fast outpacing the capacity for storing and analysing this data. Although some microarray databases exist, each has its own storage structure and implementation. In addition, these databases might use different terminologies to describe the same domain or concepts. As a result, the sharing of data with other laboratories and the combination of experimental results are limited. Here, we propose an integrated information management architecture for microarray experimental data. Unlike past work on database interoperation in the bioinformatics community, this database design takes into account the important issues in microarray data integration, including the lack of a common shared microarray ontology and the dynamic data representation of the microarray data sources. Copyright © 2003 IFAC.

Keywords: Database interoperation, Schema matching, Greedy algorithm, Microarray.

1   INTRODUCTION

The functioning of a living organism depends on thousands of genes and their products (RNA and proteins). Even though most cells in a human body contain the same genes, not all of these genes are used in each cell. Genes are expressed only when they are needed. Many genes specify features unique to each type of cell; for example, liver cells express genes for enzymes that detoxify poisons. To find how each cell achieves such uniqueness, scientists need a way to identify which genes each type of cell expresses (Muhlrad, 2001).

Traditionally, a molecular biology experiment studies one gene at a time, which limits the view of gene function as a whole. The advent of DNA array technology during the last few years allows researchers to examine the interactions among thousands of genes simultaneously and to determine which genes are expressed in a particular cell type. There are two major applications of microarray technology: identification of sequence (gene/gene mutation) and determination of the expression level (abundance) of genes. These applications will lead to new insight into fundamental biological problems such as gene discovery, gene regulation, disease diagnosis, as well as drug discovery and toxicology (Altruis, 2002; Muhlrad, 2001).

An experiment typically requires tens or hundreds of microarrays, and a single microarray generates between 100,000 and a million pieces of data (Murphy, 2002). Organizing the huge volume of data produced by microarray techniques is one of the biggest challenges that scientists and bioinformaticians have yet faced. In designing a microarray database, the large amount of data is not the only major difficulty: other characteristics such as complexity, dynamic data representation, and lack of standard nomenclature each cause additional problems.

There are a limited number of efficient, publicly available tools for storing microarray data. Existing public DNA microarray databases each have their own storage structure and implementation, with differences in hardware platforms, DBMS, data models and data languages. In addition, these databases are created by different developers, who unavoidably might use different definitions and terms to describe the same domain or concept (because of the lack of a common shared microarray ontology). Conversely, developers might use the same definition or term with different meanings. As a result, the sharing of data with other laboratories and the combining of experimental results are limited (Bergamaschi, et al., 1998).
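
The two kinds of naming conflict described here (different terms for one concept, and one term for different concepts) can be illustrated with a minimal sketch. The source names, field names and concept labels below are hypothetical and chosen only for illustration; the sketch also assumes that each field has already been annotated with the concept it denotes, which is precisely what is hard to obtain without a shared ontology.

```python
from collections import defaultdict

# Hypothetical field-to-concept annotations for two imaginary sources.
# Source names, field names and concept labels are illustrative assumptions.
SOURCE_FIELDS = {
    "lab_a": {"spot_intensity": "signal_intensity", "sample": "biological_sample"},
    "lab_b": {"signal": "signal_intensity", "sample": "array_element"},
}


def find_naming_conflicts(source_fields):
    """Return (synonyms, homonyms) found across all sources.

    synonyms: concept -> sorted list of distinct field names used for it
    homonyms: field name -> sorted list of distinct concepts it denotes
    """
    names_by_concept = defaultdict(set)
    concepts_by_name = defaultdict(set)
    for fields in source_fields.values():
        for name, concept in fields.items():
            names_by_concept[concept].add(name)
            concepts_by_name[name].add(concept)
    # Keep only the entries that actually exhibit a conflict.
    synonyms = {c: sorted(n) for c, n in names_by_concept.items() if len(n) > 1}
    homonyms = {n: sorted(c) for n, c in concepts_by_name.items() if len(c) > 1}
    return synonyms, homonyms


synonyms, homonyms = find_naming_conflicts(SOURCE_FIELDS)
print("Synonyms:", synonyms)
print("Homonyms:", homonyms)
```

In this toy setting, "spot_intensity" and "signal" are reported as synonyms for one concept, while "sample" is reported as a homonym because the two sources use it for different concepts.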

Fortunately, many of these issues have already been addressed by research in fields outside the life sciences, particularly in the realm of commercial business. One successful strategy that has been applied to resolve these issues is database integration. Here, we take advantage of the efficient and powerful database interoperation approaches that have been developed over the past decade for business applications, and we tailor them to the needs of microarray research. We believe that, by looking to sources outside the biological sciences and taking advantage of existing methods and resources, it is possible to establish a microarray data management system that allows users to interact with a set of heterogeneous databases as seamlessly as they interact with each individual database. The word “interact” in this paper denotes general browsing, seeking information about particular objects, and performing complex queries. This will not be easy; it is more difficult for microarray databases than for business sources because of the unique characteristics of microarray experimental data.
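
As a rough illustration of what interacting with heterogeneous databases "as seamlessly as with each individual database" could look like, the sketch below uses a wrapper/mediator arrangement, one common pattern in database interoperation. It is not necessarily the architecture proposed in this paper. The class names, field names and records are all hypothetical, and the field mappings are assumed to be given rather than discovered.

```python
class SourceWrapper:
    """Translates one source's local field names into a shared schema."""

    def __init__(self, name, records, field_map):
        self.name = name
        self.records = records      # list of dicts keyed by local field names
        self.field_map = field_map  # local field name -> shared field name

    def fetch(self):
        """Yield every record re-keyed with the shared field names."""
        for rec in self.records:
            row = {self.field_map[k]: v for k, v in rec.items() if k in self.field_map}
            row["source"] = self.name  # remember where the row came from
            yield row


class Mediator:
    """Answers one query against all wrapped sources at once."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, **criteria):
        """Return rows from all sources whose shared fields equal the criteria."""
        return [
            row
            for wrapper in self.wrappers
            for row in wrapper.fetch()
            if all(row.get(k) == v for k, v in criteria.items())
        ]


# Two imaginary sources that store the same kind of data under different names.
lab_a = SourceWrapper(
    "lab_a",
    [{"gene_id": "G1", "expr": 2.5}, {"gene_id": "G2", "expr": 0.8}],
    {"gene_id": "gene", "expr": "expression"},
)
lab_b = SourceWrapper(
    "lab_b",
    [{"orf": "G1", "level": 3.1}],
    {"orf": "gene", "level": "expression"},
)

mediator = Mediator([lab_a, lab_b])
for row in mediator.query(gene="G1"):
    print(row)
```

The user poses a single query in the shared vocabulary (`gene="G1"`) and receives matching rows from both sources, without needing to know that one source calls the field `gene_id` and the other `orf`.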


In this paper, we propose to investigate suitable methodologies and tools in the area of data management and analysis in order to develop well-defined storage for DNA microarray data. This storage will allow microarray experimental results to be shared between laboratories by linking related data from different public microarray sources and integrating them to provide users with a consistent view of the data, and it will resolve important issues in microarray data integration such as:

  • A lack of a common shared microarray-ontology, which leads to naming conflicts when different names are employed to represent the same information.
  • Since microarray technology is in ...
