Information Management for Microarray Experimental Data

Supawan Prompramote

Yi-Ping Phoebe Chen

Frederic Maire

Centre for Information Technology Innovation

Faculty of Information Technology

Queensland University of Technology

Abstract: Today, the ability to generate microarray data is fast outstripping the capacity to store and analyse it. Although some microarray databases exist, each has its own storage structure and implementation. In addition, these databases may use different terminologies to describe the same domain or concepts. As a result, sharing data with other laboratories and combining results from other experiments can be difficult. Here, we propose an integrated information management architecture for microarray experimental data. Unlike past work on database interoperation in the bioinformatics community, this design takes into account the key issues in microarray data integration, including the lack of a commonly shared microarray ontology and the dynamic data representation of microarray data sources. Copyright © 2003 IFAC.

The functioning of a living organism involves thousands of genes and their products (RNA and proteins), which together create the mystery of life. Even though most cells in the human body contain the same genes, not all of these genes are used in every cell; some genes are expressed only when they are needed. Many genes specify features unique to each type of cell; for example, liver cells express genes for enzymes that detoxify poisons. To find out how each cell achieves such uniqueness, scientists need a way to identify which genes each type of cell expresses (MuhIrad, 2001).

Traditionally, molecular biology experiments examine one gene at a time, which limits their ability to provide a whole picture of gene function.

Fig. 1. Architecture of the Information Management for Microarray Experimental Data

The Mediator class is a module for resolving problems that arise when a new microarray data source is added to the system. It consists of two parts: the Mediator Interface and the Transformation call.

The Transformation call is central to incorporating a new data source. The database administrator (DBA) must describe the data source, map source attributes to the corresponding global schema attributes, and convert between different representations of the same characteristic. Once the data transformation has been performed, a Mediator Interface is created for the new microarray data source.
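The mediator workflow described above can be illustrated with a minimal sketch. It assumes that the DBA's description of a source reduces to an attribute map (source attribute to global schema attribute) plus per-attribute conversion functions; the class names `TransformationCall`, `MediatorInterface`, and `Mediator` echo the text, but their fields and methods, the source names `lab_a` and `lab_b`, and the temperature attribute are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class TransformationCall:
    """Hypothetical DBA-supplied description of one data source."""
    source_name: str
    attribute_map: Dict[str, str]  # source attribute -> global schema attribute
    converters: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)

    def transform(self, record: Record) -> Record:
        out: Record = {}
        for src_attr, value in record.items():
            global_attr = self.attribute_map.get(src_attr)
            if global_attr is None:
                continue  # attribute not mapped into the global schema
            convert = self.converters.get(global_attr, lambda v: v)
            out[global_attr] = convert(value)  # unify representation
        return out


class MediatorInterface:
    """Hypothetical per-source interface that returns global-schema records."""

    def __init__(self, transformation: TransformationCall, records: List[Record]):
        self.transformation = transformation
        self._records = records

    def fetch(self) -> List[Record]:
        return [self.transformation.transform(r) for r in self._records]


class Mediator:
    def __init__(self) -> None:
        self.interfaces: Dict[str, MediatorInterface] = {}

    def add_source(self, transformation: TransformationCall,
                   records: List[Record]) -> MediatorInterface:
        # After the transformation is defined, create an interface for the new source.
        iface = MediatorInterface(transformation, records)
        self.interfaces[transformation.source_name] = iface
        return iface

    def query_all(self) -> List[Record]:
        rows: List[Record] = []
        for iface in self.interfaces.values():
            rows.extend(iface.fetch())
        return rows


# Usage: two hypothetical sources with different names and units for the same characteristics.
mediator = Mediator()
mediator.add_source(
    TransformationCall(
        source_name="lab_a",
        attribute_map={"sample_id": "Sample", "hyb_temp_f": "HybridizationTempC"},
        converters={"HybridizationTempC": lambda f: round((f - 32) * 5 / 9, 1)},
    ),
    records=[{"sample_id": "S1", "hyb_temp_f": 107.6}],
)
mediator.add_source(
    TransformationCall(
        source_name="lab_b",
        attribute_map={"SampleName": "Sample", "HybTempC": "HybridizationTempC"},
    ),
    records=[{"SampleName": "S2", "HybTempC": 42.0}],
)
rows = mediator.query_all()
print(rows)
```

Keeping the mapping and conversion in a per-source object means that adding a source only requires a new transformation description; the global schema and the other interfaces stay unchanged.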

1. Example of schema integration

Consider the representations shown in Fig. 2(a) and (b). Both include Sample, Experimental sample, Treatment, and Researcher, although these are sometimes called by different names. The first also contains Strains, while the second includes Labels, Hybridization condition, Control Gene, and Experimental Control Gene. Overlaying these concepts yields the composite representation shown in Fig. 2(c). While this is a reasonable representation of the concept, problems may arise in practice because of implicit relationships between attributes from different data sources. This type of issue is common in both business and scientific domains. The important distinction is that in business there is a single correct value, whereas in scientific domains this is not always the case. Here, we use intelligent techniques for extracting and integrating heterogeneous information to resolve these problems; the details of these techniques are explained below.
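The overlay step can be sketched as a union of concepts after resolving naming differences. The concept names come from the description of Fig. 2(a) and (b); the alternative names `Extract` and `Investigator` in the second schema, and the `SYNONYMS` table that resolves them, are hypothetical, since the text does not say which names differ.

```python
from typing import Dict, List, Set

# Concepts from Fig. 2(a) and (b); "Extract" and "Investigator" are hypothetical aliases.
schema_a = ["Sample", "Experimental sample", "Treatment", "Researcher", "Strains"]
schema_b = ["Sample", "Extract", "Treatment", "Investigator", "Labels",
            "Hybridization condition", "Control Gene", "Experimental Control Gene"]

# Hypothetical synonym table: local name -> canonical concept name.
SYNONYMS: Dict[str, str] = {
    "Extract": "Experimental sample",
    "Investigator": "Researcher",
}


def canonical(name: str) -> str:
    return SYNONYMS.get(name, name)


def overlay(*schemas: List[str]) -> List[str]:
    """Union of concepts after synonym resolution, preserving first-seen order."""
    composite: List[str] = []
    seen: Set[str] = set()
    for schema in schemas:
        for name in schema:
            concept = canonical(name)
            if concept not in seen:
                seen.add(concept)
                composite.append(concept)
    return composite


composite = overlay(schema_a, schema_b)
print(composite)
```

This overlay resolves only naming differences. It does not capture the implicit relationships between attributes, nor the possibility of several valid values for one scientific characteristic, which is why the text calls for further techniques.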


The matching approach of our schema integration system proceeds in the following phases.

Evaluation of schema class affinity. This step evaluates the level of affinity between schema classes in preparation for subsequent integration.
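The text does not specify how affinity is measured, so the following is only an illustrative sketch under stated assumptions. It combines a name affinity (1 if the class names agree after a hypothetical synonym lookup, else 0) with a structural affinity (Jaccard similarity of the normalised attribute sets), weighted equally, and keeps pairs above a hypothetical threshold of 0.6 as candidates for integration. The weights, threshold, synonym table, and example classes are all assumptions, not the authors' measure.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical synonym table (an assumption, not taken from the paper).
SYNONYMS: Dict[str, str] = {"investigator": "researcher", "extract": "experimental sample"}


def norm(term: str) -> str:
    t = term.strip().lower()
    return SYNONYMS.get(t, t)


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def class_affinity(name_a: str, attrs_a: Set[str],
                   name_b: str, attrs_b: Set[str], w_name: float = 0.5) -> float:
    """Weighted sum of name affinity and structural (attribute-set) affinity."""
    name_aff = 1.0 if norm(name_a) == norm(name_b) else 0.0
    struct_aff = jaccard({norm(x) for x in attrs_a}, {norm(x) for x in attrs_b})
    return w_name * name_aff + (1 - w_name) * struct_aff


def candidate_pairs(schema_a: Dict[str, Set[str]], schema_b: Dict[str, Set[str]],
                    threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """Class pairs whose affinity reaches the threshold, highest first."""
    pairs = []
    for ca, attrs_a in schema_a.items():
        for cb, attrs_b in schema_b.items():
            score = class_affinity(ca, attrs_a, cb, attrs_b)
            if score >= threshold:
                pairs.append((ca, cb, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical example classes and attributes.
schema_a = {"Sample": {"sample_id", "organism", "strain"}, "Researcher": {"name", "email"}}
schema_b = {"Sample": {"sample_id", "organism", "label"}, "Investigator": {"name", "email", "lab"}}
pairs = candidate_pairs(schema_a, schema_b)
print(pairs)
```

Here Researcher/Investigator scores about 0.83 (equal names after synonym lookup, 2 of 3 attributes shared) and Sample/Sample scores 0.75, while the cross pairs score 0 and are discarded.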

