Information Management for microarray experimental data


Supawan Prompramote

Yi-Ping Phoebe Chen

Frederic Maire

Centre for Information Technology Innovation

Faculty of Information Technology

Queensland University of Technology

Abstract: Today, proficiency in generating microarray data is fast outpacing the capacity for storing and analysing this data. Although some microarray databases exist, each has its own storage structure and implementation. In addition, these databases may use different terminologies to describe the same domain or concepts. As a result, sharing data with other laboratories and combining it with other experimental results is limited. Here, we propose an integrated information management architecture for microarray experimental data. Unlike past work on database interoperation in the bioinformatics community, this database design takes into account the important issues in microarray data integration, including the lack of a common shared microarray ontology and the dynamic data representation of microarray data sources. Copyright © 2003 IFAC.

Keywords:    Database interoperation, Schema matching, Greedy algorithm,



The functions of a living organism depend on thousands of genes and their products (RNA and proteins). Even though most cells in a human body contain the same genes, not all of these genes are used in each cell. Some genes are expressed only when they are needed. Many genes specify features unique to each type of cell; for example, liver cells express genes for enzymes that detoxify poisons. To find how each cell achieves such uniqueness, scientists need a way to identify which genes each type of cell expresses (Muhlrad, 2001).

Traditionally, a molecular biology experiment studies one gene at a time, which limits the ability to obtain the whole picture of gene function.


Fig.1. Architecture of the Information Management for Microarray Experimental

The Mediator class is a dedicated module for resolving problems that arise when a new microarray data source is added to the system. It consists of two parts: the Mediator interface and the Transformation call.

The Transformation call plays an important part in incorporating a new data source. The DBA is required to describe the data source, to map source attributes to the corresponding global schema attributes, and to convert between different representations of the same characteristic. Once the data transformation has been performed, a Mediator interface is created for the new microarray data source.
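The three tasks given to the DBA (describing the source, mapping attributes, and converting representations) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `SourceMapping` class, the attribute names, and the Fahrenheit-to-Celsius converter are hypothetical and do not come from the system described in this paper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class SourceMapping:
    """Hypothetical description of one microarray data source."""
    name: str
    # Source attribute name -> global schema attribute name.
    attribute_map: Dict[str, str]
    # Global attribute name -> function converting the source representation.
    converters: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)

    def transform(self, record: Dict[str, Any]) -> Dict[str, Any]:
        """Rename mapped attributes and convert their values; drop unmapped ones."""
        result: Dict[str, Any] = {}
        for source_attr, value in record.items():
            global_attr = self.attribute_map.get(source_attr)
            if global_attr is None:
                continue  # attribute has no counterpart in the global schema
            convert = self.converters.get(global_attr, lambda v: v)
            result[global_attr] = convert(value)
        return result


# Illustrative source: stores temperature in Fahrenheit, global schema uses Celsius.
lab_a = SourceMapping(
    name="lab_a",
    attribute_map={
        "exp_sample": "experimental_sample",
        "temp_f": "hybridization_temp_c",
    },
    converters={"hybridization_temp_c": lambda f: round((f - 32) * 5 / 9, 1)},
)

transformed = lab_a.transform({"exp_sample": "S-17", "temp_f": 107.6, "notes": "n/a"})
print(transformed)
```

Keeping the mapping as data rather than code means a new source could, under these assumptions, be registered by supplying one more `SourceMapping` without touching the global schema.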

  1. Example of schema integration

Consider the representations shown in Fig. 2(a) and (b). Both include Sample, Experimental sample, Treatment, and Researcher, although these are occasionally called by different names. The first also contains Strains, while the second includes Labels, Hybridization condition, Control Gene, and Experimental Control Gene. If these concepts are overlaid, the resulting composite representation is the one shown in Fig. 2(c). While this is a reasonable representation of the concepts, problems may arise in practice because of implicit relationships between attributes from different data sources. This type of issue is common in both business and scientific domains. The important distinction is that, while in business there is a single correct value, this is not always the case in scientific domains. Here, we use intelligent techniques for the extraction and integration of heterogeneous information to resolve these problems. The details of these techniques are explained below.
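The overlay of the two representations can be sketched as a set union after reconciling names. The concept names are those listed in the text; the alternative names `Specimen` and `Investigator` and the synonym table are hypothetical, since the text states only that some concepts are "called by different names" without saying which.

```python
# Concepts of the two representations, using the names given in the text.
# "Specimen" and "Investigator" are assumed alternative names for illustration.
schema_a = {"Sample", "Experimental sample", "Treatment", "Researcher", "Strains"}
schema_b = {
    "Specimen", "Experimental sample", "Treatment", "Investigator",
    "Labels", "Hybridization condition", "Control Gene",
    "Experimental Control Gene",
}

# Hypothetical synonym table mapping a source-specific name to a shared name.
synonyms = {"Specimen": "Sample", "Investigator": "Researcher"}


def normalize(schema, synonyms):
    """Replace each concept name with its shared name, if one is defined."""
    return {synonyms.get(concept, concept) for concept in schema}


def overlay(first, second, synonyms):
    """Overlay two representations and report shared and source-specific concepts."""
    a, b = normalize(first, synonyms), normalize(second, synonyms)
    return {
        "shared": a & b,
        "only_first": a - b,
        "only_second": b - a,
        "composite": a | b,
    }


result = overlay(schema_a, schema_b, synonyms)
print(sorted(result["composite"]))
```

A plain union like this captures only the names. It does not detect the implicit relationships between attributes that the paragraph above warns about, which is why a matching step is still needed.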


The matching approach of our schema integration system proceeds in the following phases.

Evaluation of schema class affinity. This step evaluates the level of affinity between schema classes as a basis for subsequent integration.
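As a rough illustration of this phase, the sketch below scores affinity between classes and then pairs them greedily. The affinity measure is not given in this extract, so the Jaccard overlap of attribute names used here is an assumption, as are the 0.5 threshold, the class names, and the attribute sets.

```python
from itertools import product


def affinity(attrs_a, attrs_b):
    """Assumed affinity measure: Jaccard overlap of two attribute-name sets."""
    a, b = set(attrs_a), set(attrs_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def greedy_match(classes_a, classes_b, threshold=0.5):
    """Pair classes in descending order of affinity, using each class at most once."""
    scored = sorted(
        (
            (affinity(attrs_a, attrs_b), name_a, name_b)
            for (name_a, attrs_a), (name_b, attrs_b)
            in product(classes_a.items(), classes_b.items())
        ),
        key=lambda item: item[0],
        reverse=True,
    )
    used_a, used_b, pairs = set(), set(), []
    for score, name_a, name_b in scored:
        if score < threshold:
            break  # remaining candidates are all below the threshold
        if name_a in used_a or name_b in used_b:
            continue
        pairs.append((name_a, name_b, score))
        used_a.add(name_a)
        used_b.add(name_b)
    return pairs


# Hypothetical classes from two sources, each with its attribute names.
classes_a = {"Sample": {"id", "organism", "tissue"}, "Researcher": {"name", "lab"}}
classes_b = {
    "Specimen": {"id", "organism", "tissue", "age"},
    "Investigator": {"name", "lab", "email"},
}

pairs = greedy_match(classes_a, classes_b)
for name_a, name_b, score in pairs:
    print(f"{name_a} <-> {name_b}: {score:.2f}")
```

A greedy pass is simple and fast but does not guarantee the pairing with the highest total affinity. Pairs that fall below the threshold are left unmatched and would need manual review.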


Computer Applications in the Biosciences, 9(1): 49-57.

Benson, D. A., M. Boguski, D. J. Lipman, and J. Ostell (1994). GenBank. Nucleic Acids Research, 22: 3441-3444.

Chen, A., and V. Markowitz (1995). An overview of the object protocol model (OPM) and the OPM data management tools. Inform. Syst., 20(5).

Garcia-Molina, H., J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom (1995). Integrating and accessing heterogeneous information sources in TSIMMIS. Proc. AAAI Symp. Information Gathering, Stanford, CA, pp. 61-64.

Parberry, I. (1995). Problems on Algorithms, chapter 9. Englewood Cliffs, NJ: Prentice Hall.

Kemp, G., and P. Gray (1996). Using the Functional Data Model to Integrate Distributed Biological Data Sources. In P. Svensson and J. French, editors, Proc. SSDBM: 176-185. IEEE Press.

Overton, G. C., S. B. Davidson, and P. Buneman (1997). Database transformations for biological applications. In DOE HGP Contractor-Grantee Workshop VI, Santa Fe, NM.

Shin, D. G., et al. (1997). Graphical ad hoc query interfaces for Federated Genome database, Computer Sc. & Eng. U of Connecticut. In Storrs CT DOE HGP Contractor-Grantee Workshop VI, Santa Fe, NM.

Bergamaschi, S., S. Castano, S. De Capitani di Vimercati, S. Montanari, and M. Vincini (1998). An Intelligent Approach to Information Integration. In International Conference on Formal Ontology in Information Systems (FOIS'98), Trento, Italy.

Paton, N.W., R. Stevens, P. Baker, C. A. Goble, S. Bechhofer, and A. Brass (1999). Query Processing in the TAMBIS Bioinformatics Source Integration System. In Proc. SSDBM: 138-147. IEEE Press.

Brazma, A., A. Robinson, G. Cameron, and M. Ashburner (2000). One-stop shop for microarray data. Nature, 403: 699-700.

Critchlow, T., K. Fidelis, M. Ganesh, R. Musick, and T. Slezak (2000). IEEE Transactions on Information Technology in Biomedicine, 4(1): 52-57.

Muhlrad, P. (2001). DNA microarray technology to identify genes controlling spermatogenesis. Available from http://www.mcb.arizona.edu/wardlab/microarray.html, accessed on 27-August-2002.

Altruis Biomedical Network (2002). The Web's Premier Site For DNA Arrays, Available from http://www.dna-arrays.com, accessed on 27-August-2002.

Murphy, D. (2002). Gene expression studies using microarrays: principles, problems, and prospects. Advances in Physiology Education, 26(4).
