The origins, applications and current research trends in Cheminformatics.

Sandeep Sanghera

MSc Cheminformatics

Cheminformatics Applications

Helen Cooke

Deadline: 8th December 2003

The origins, applications and current research trends in Cheminformatics

1.0 Introduction

The aim of this investigation is to examine the origins of Cheminformatics, to show how the literature of the subject has evolved since 1996, and to convey the current areas of research activity.

The terms “cheminformatics,” “chemoinformatics,” “chemi-informatics,” and “chemical information science” are all used to describe a range of computer techniques and applications for solving chemistry problems. The first definition of Chemoinformatics was given by Frank Brown (1998): “The use of information technology and management has become a critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization.”

2.0 Roots of Cheminformatics

In 1990 a project was set up to grow alongside the Internet and its Network Information Retrieval (NIR) tools and services. The primary goals of this project were:

To present information and data that chemists, rather than workers in other related fields, could use and share,
To ensure that the information reached the usual interested parties, as well as their companies and contacts, and
To collaborate in acquiring knowledge and delivering scientific topics of interest to both present and future generations of chemists.

The project took six years to develop and led to the launch of a pilot Cheminformatics WWW Server in 1996. This server was designed specifically for chemists; it also held information on related fields, but to a lesser extent. Information retrieval on the server used a chemical nomenclature and organisation that users could easily recognise, offering a new, highly specialized scientific environment for information retrieval in Cyberspace.

In short, Cheminformatics applies information technology to help chemists investigate new problems and organize and analyze scientific data in order to develop novel compounds, materials and processes. The data can be accessed from printed sources or in electronic form, through databases or the web.

Database searching, programming in C++ or Perl, drug design using molecular modelling packages and other computer-based techniques are now taught to undergraduates and graduates, and many companies are looking for employees with knowledge of Cheminformatics.

2.1 Is it chemo/chem–informatics?

Table showing the results of a literature search for *informatics and related terms, March 2000.

Table showing the results of the same search, 31 July 2003.

These searches suggest that Cheminformatics is the most widely used name for the subject, so it is used throughout the rest of this review.

2.2 Cheminformatics and its applications

As stated earlier, Cheminformatics has become a crucial part of the drug discovery process.

In the drug industry, Cheminformatics uses computers to analyze the interactions between a drug and its receptor site. Computers can also simulate the molecule in 3D to check how well it fits the receptor site. Once candidate molecules have been designed, libraries of compounds are screened for activity using High Throughput Screening (HTS). The hits are then evaluated for binding, potency, selectivity and functional activity. In Cheminformatics there are really only two primary questions:

what to test next and
what to make next.

The main processes involved in drug discovery are lead identification, where a lead is a compound with activity in the low micromolar range, and lead optimization, the process of transforming a lead into a drug candidate.

Currently Cheminformatics is seeking to optimise the potency, efficacy and selectivity of drugs, and so give better HTS results.

The methods and tools that Cheminformatics can offer for drug discovery are:

Structure/Activity Relationships
Genetic Algorithms
Statistical Tools (e.g., recursive partitioning)
Data Analysis Tools
Visualization (seeing the molecule in 3D)
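
Of the statistical tools listed, recursive partitioning is one that is often applied to HTS data: the screened compounds are split repeatedly on descriptor thresholds so that each subset becomes as pure as possible in actives or inactives, which helps decide what to test or make next. The Python sketch below is a minimal illustration under stated assumptions: Gini impurity is used as the split criterion (one common choice), and the two descriptors (a logP-like value and a hydrogen-bond donor count) and the activity labels in `hypothetical_hts` are invented for the example, not real screening data.

```python
from typing import Dict, List, Optional, Tuple

# A screened compound: (descriptor values, active?). All values below are hypothetical.
Record = Tuple[Tuple[float, ...], bool]

hypothetical_hts: List[Record] = [
    ((1.0, 0.0), False), ((1.5, 1.0), False), ((2.0, 0.0), False),
    ((3.5, 1.0), True), ((4.0, 2.0), True), ((4.5, 1.0), True),
]


def gini(records: List[Record]) -> float:
    """Gini impurity of the active/inactive labels in a set of records."""
    if not records:
        return 0.0
    p = sum(1 for _, active in records if active) / len(records)
    return 2 * p * (1 - p)


def best_split(records: List[Record]) -> Optional[Tuple[int, float, float]]:
    """Return (descriptor index, threshold, weighted impurity) of the best binary split."""
    best = None
    for f in range(len(records[0][0])):
        values = sorted({x[f] for x, _ in records})
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for r in records if r[0][f] <= t]
            right = [r for r in records if r[0][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(records)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def build_tree(records: List[Record], depth: int = 0, max_depth: int = 3) -> Dict:
    """Recursively partition the records; each leaf stores its fraction of actives."""
    frac_active = sum(1 for _, active in records if active) / len(records)
    split = best_split(records) if depth < max_depth else None
    # Stop when no split exists or when splitting would not reduce impurity.
    if split is None or split[2] >= gini(records):
        return {"leaf": True, "frac_active": frac_active}
    f, t, _ = split
    return {
        "leaf": False, "feature": f, "threshold": t,
        "left": build_tree([r for r in records if r[0][f] <= t], depth + 1, max_depth),
        "right": build_tree([r for r in records if r[0][f] > t], depth + 1, max_depth),
    }


def predict(tree: Dict, x: Tuple[float, ...]) -> float:
    """Walk down the splits to a leaf and return its fraction of actives."""
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["frac_active"]


tree = build_tree(hypothetical_hts)
```

On this toy data the tree splits the first descriptor at 2.75, which separates the three hypothetical actives from the three inactives; real screening data would need far more compounds and descriptors, and some form of validation.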

2.3 The use of Computers

Cheminformatics uses computers that are able to store and give access to vast amounts of information. Chemical information publishers saw what computer technology could do for the retrieval of data. In the 1960s computers were used initially to store all of the printed sources electronically. The technology now provides search functionality not only for text but also for structure diagrams (via substructure searches) and reactions, across most of the chemical literature at the present time.

The use of computers in Cheminformatics led to the development of increasingly sophisticated chemical information retrieval systems, such as the Chemical Abstracts Registry File (http://), which has been searchable since the 1970s, and the Cambridge Structural Database (). The first desktop systems were produced by MDL Information Systems.

2.3.2 Cheminformatics on the internet

Using a general search engine such as Google (www.google.co.uk) to find chemical information is not very reliable: a search may achieve high recall but low precision. However, there are sites that give access to the chemical information that professionals refer to. The internet has fundamentally changed the way we access and use chemical information, and internal networks (intranets) have become the heavily used infrastructure for corporate and commercial databases. WWW servers such as www.acs.org can be used to obtain bibliographic data on different aspects of chemistry.
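
The recall and precision trade-off can be made concrete. Recall is the fraction of all relevant documents that a search retrieves, and precision is the fraction of retrieved documents that are relevant. The short Python sketch below computes both; the page identifiers and counts are hypothetical, chosen only to mimic a broad web search that finds most of the relevant pages but buries them among many irrelevant ones.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a search, given document identifiers."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents that were actually found
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical web search: 200 pages returned, 10 of them relevant,
# out of 12 relevant pages that exist in total.
web_results = [f"page{i}" for i in range(200)]
relevant_pages = [f"page{i}" for i in range(10)] + ["page500", "page501"]

precision, recall = precision_recall(web_results, relevant_pages)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.05 recall=0.83
```

A curated or refereed source aims for the opposite profile: fewer results, most of which are relevant.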

2.4 Types of chemical information Cheminformatics covers

Cheminformatics mainly deals with primary and secondary sources of literature. The primary sources are journals, patents, etc. The secondary sources, i.e. databases, are where these primary sources are indexed.

The literature contains bibliographic data as well as chemical and physical property data.

Most of the data an organisation stores will be generated in-house; other information is available from public sources on the internet. However, there remains a pressing need for high-precision data that is organised in an appropriate manner.

For some subjects, this data may be hard to find or not generally available in the public domain. For others, the problem may be that the data is not well indexed or qualified, so the required information cannot be extracted.

Commercial software vendors and academic and government institutions therefore put considerable effort into systematically gathering, indexing and publishing databases of useful chemical information.

The databases mainly used in academia are CrossFire Beilstein (), with coverage of the literature back to 1771, and Chemical Abstracts Service’s SciFinder, with literature from 1907. Web of Knowledge is also used, along with WWW servers, such as those of the RSC and ACS, that give access to their own literature. These services became available in the mid-1990s.

3.0 A case study showing CrossFire Beilstein development from 1998 to 2003

To see whether there are any trends in the Cheminformatics literature, a case study was conducted of how CrossFire Beilstein has grown over the last seven years. The results are given below.

3.1 Citations per publication year analysis

Table showing the rise of citations recorded in CrossFire Beilstein.

Graph showing that the literature in CrossFire Beilstein is increasing gradually.

This shows that, as more and more literature is produced every year, more Cheminformatics is needed to sort and analyse it.

4.0 Growth of Cheminformatics Data

As the amount of Cheminformatics data continues to grow, so does the desire for a "magic bullet": some software or methodology that would make the valuable data visible. This is one of the main areas of growth within Cheminformatics.

Current high-throughput discovery methods overwhelm conventional Cheminformatics systems. In response, new approaches to storing and representing chemical structures and chemical libraries, and new methodologies for accessing and analyzing these structures, are now available from the desktop. They rely on data mining and analysis methods that were not available before 2001. This could be the “magic bullet” needed to reach the holy grail of Cheminformatics data.

4.1 Databases and Management

Databases have been created to store chemical structure data (usually in the form of 2D chemical sketches) and related properties, so that this data can be retrieved and used effectively. This requires the creation of "chemically intelligent" tools that can, for example, search all of the structures in a database for those which contain a particular chemical fragment sketched in by the user.
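
A substructure search of this kind can be viewed as a graph problem: atoms are nodes labelled by element, bonds are edges, and a structure contains the query fragment if the fragment's atoms can be mapped onto distinct atoms of the structure so that every query bond corresponds to a bond in the structure. The Python sketch below does this by simple backtracking. It is a simplified illustration, not how any particular commercial system works: hydrogens are left implicit, bond orders and aromaticity are ignored, and the three-molecule `database` is hypothetical. Production systems typically also apply a fast screening step before this atom-by-atom comparison.

```python
from typing import Dict, List, Set, Tuple


class Molecule:
    """Hydrogen-suppressed molecular graph: element symbols plus undirected bonds (orders ignored)."""

    def __init__(self, atoms: List[str], bonds: List[Tuple[int, int]]):
        self.atoms = atoms
        self.neighbours: List[Set[int]] = [set() for _ in atoms]
        for a, b in bonds:
            self.neighbours[a].add(b)
            self.neighbours[b].add(a)


def has_substructure(target: Molecule, query: Molecule) -> bool:
    """True if every query atom maps to a distinct target atom with matching element and bonds."""
    mapping: Dict[int, int] = {}  # query atom index -> target atom index

    def extend(q: int) -> bool:
        if q == len(query.atoms):
            return True  # all query atoms mapped consistently
        used = set(mapping.values())
        for t, element in enumerate(target.atoms):
            if t in used or element != query.atoms[q]:
                continue
            # Each bond from q to an already-mapped query atom must exist in the target.
            if all(mapping[n] in target.neighbours[t] for n in query.neighbours[q] if n in mapping):
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]  # backtrack and try the next candidate atom
        return False

    return extend(0)


def substructure_search(database: Dict[str, Molecule], query: Molecule) -> List[str]:
    """Names of all database entries that contain the query fragment."""
    return [name for name, mol in database.items() if has_substructure(mol, query)]


# Hypothetical database of heavy-atom skeletons.
database = {
    "ethanol": Molecule(["C", "C", "O"], [(0, 1), (1, 2)]),
    "acetic acid": Molecule(["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)]),
    "propane": Molecule(["C", "C", "C"], [(0, 1), (1, 2)]),
}
o_c_o = Molecule(["O", "C", "O"], [(0, 1), (1, 2)])  # a carbon bonded to two oxygens
c_o = Molecule(["C", "O"], [(0, 1)])

print(substructure_search(database, o_c_o))  # ['acetic acid']
print(substructure_search(database, c_o))    # ['ethanol', 'acetic acid']
```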

Researchers may wish to store the data and information they create themselves, to access publicly available data sources, or to use proprietary data from within their organization. The need to mix and match these different types of data requires systems that are as open as possible. In fact, research organizations are increasingly realizing that chemical data should not be isolated from other data types; it should be integrated into the overall corporate data architecture.

Effective Cheminformatics systems not only provide access to data and information; they also help individual researchers and organizations to use it.

Tools are created to analyze chemical data and extract the information required to help researchers make good decisions. These solutions help researchers to solve specific problems in their R&D and production processes.

Cheminformatics is also concerned with the flow of data and information through these processes. Workflow tools are created to ensure that the relevant data is gathered, organized and indexed, that it is updated and annotated as the process proceeds, and that it is made available to the people who need it at each stage of the process.

As well as providing tailored decision support and workflow solutions, Cheminformatics technology can be embedded in standard desktop post-processing tools to extend the benefits. A simple example is enabling Microsoft's Excel spreadsheet system to handle chemical sketches as well as numbers and text.

5.0 The current research of Cheminformatics

Using SciFinder to look for “Cheminformatics” gave 76 hits at the time of writing.

This table shows the number of hits (for “Cheminformatics”) per publication year. Note that the last three years show a large number of hits referring to Cheminformatics.

[2001]

Cheminformatics: a tool for decision-makers in drug discovery. Olsson, Thomas; Oprea, Tudor I. Medicinal Chemistry, AstraZeneca R and D, Moelndal, Swed. Current Opinion in Drug Discovery & Development (2001), 4(3), 308-313.

This paper looks at developments in virtual library enumeration and analysis, and describes new methodologies for investigating chemical similarity globally across the literature.
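
Chemical similarity is commonly measured by comparing binary structural fingerprints, in which each bit records the presence of some structural feature, using the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number set in either. The Python sketch below ranks a small library against a query in this way. It illustrates the general technique only and is not taken from the paper above; the fingerprints are given directly as hypothetical sets of "on" bit positions, and how a real system assigns bits to structural features is not modelled.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # positions of the bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared on-bits divided by on-bits present in either fingerprint."""
    union = len(a | b)
    # Convention assumed here: two empty fingerprints are treated as dissimilar.
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query: Fingerprint, library: Dict[str, Fingerprint],
                       threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Library members with similarity >= threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)


# Hypothetical fingerprints for a query compound and a three-compound library.
query = frozenset({1, 4, 7, 9})
library = {
    "cpd_A": frozenset({1, 4, 7, 9}),
    "cpd_B": frozenset({1, 4, 8}),
    "cpd_C": frozenset({2, 3}),
}

for name, score in rank_by_similarity(query, library, threshold=0.3):
    print(name, round(score, 2))  # cpd_A 1.0, then cpd_B 0.4
```

Here cpd_B shares two of the five bits set across the pair, giving 0.4, while cpd_C shares none and falls below the threshold.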

[2002]

Data pipelining for dynamic data integration in cheminformatics. Hahn, Mathew. SciTegic, Inc., San Diego, CA, USA. Abstracts of Papers, 223rd ACS National Meeting, Orlando, FL, United States, April 7-11, 2002 (2002),

This paper discusses data pipelining, a new high-throughput approach to drug discovery informatics that allows users to perform complex data integration and data mining projects.
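
The idea of data pipelining can be sketched with Python generators: each component reads a stream of records, processes them one at a time and passes them on, so data flows through the chain without being collected in bulk between steps. This is only an illustration of the general concept, not SciTegic's implementation. The component names, the assay records, the 10 µM cut-off (taken here as one reading of "low micromolar") and the 500 g/mol size limit are all assumptions made for the example.

```python
import math
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Component = Callable[[Iterable[Record]], Iterator[Record]]


def read_records(rows: Iterable[Record]) -> Iterator[Record]:
    """Source component: copy each input row so later steps do not mutate the input."""
    for row in rows:
        yield dict(row)


def keep_if(field: str, test: Callable[[object], bool]) -> Component:
    """Filter component: pass on only records whose field satisfies the test."""
    def component(records: Iterable[Record]) -> Iterator[Record]:
        for r in records:
            if test(r[field]):
                yield r
    return component


def add_field(name: str, compute: Callable[[Record], object]) -> Component:
    """Annotation component: add a calculated value to each record."""
    def component(records: Iterable[Record]) -> Iterator[Record]:
        for r in records:
            r[name] = compute(r)
            yield r
    return component


def run_pipeline(source: Iterable[Record], *components: Component) -> Iterator[Record]:
    """Chain the components so each record streams through them in order."""
    stream: Iterable[Record] = source
    for component in components:
        stream = component(stream)
    return iter(stream)


# Hypothetical assay results: IC50 in micromolar, molecular weight in g/mol.
assay_rows: List[Record] = [
    {"id": "cpd1", "ic50_uM": 0.8, "mw": 350.0},
    {"id": "cpd2", "ic50_uM": 25.0, "mw": 410.0},
    {"id": "cpd3", "ic50_uM": 3.0, "mw": 620.0},
]

hits = list(run_pipeline(
    read_records(assay_rows),
    keep_if("ic50_uM", lambda v: v <= 10.0),   # assumed "low micromolar" cut-off
    keep_if("mw", lambda v: v <= 500.0),       # assumed size limit
    add_field("pIC50", lambda r: 6 - math.log10(r["ic50_uM"])),  # -log10 of IC50 in mol/L
))
print([(h["id"], round(h["pIC50"], 2)) for h in hits])  # [('cpd1', 6.1)]
```

New steps (a further filter, a database lookup, a report writer) can be added by writing another component with the same signature, without changing the others.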

[2003]

Data mining, virtual screening, and cheminformatics in integrated drug discovery. Manly, Charles J. Discovery Technologies, Neurogen Corporation, Branford, CT, USA. Abstracts of Papers, 225th ACS National Meeting, New Orleans, LA, United States, March 23-27, 2003 (2003),

This paper gives an account of the fall in the number of clinically safe drugs, which it attributes to the inefficiency of current HTS. In response, targeted and directed high-throughput synthesis and screening efforts have been developed.

While these papers all relate to drug discovery, a paper has also been written describing recent developments in Cheminformatics education. This shows that Cheminformatics is being researched not only by industry but by academia as well.

Among the latest developments in Cheminformatics applications, full-text electronic journals are now fully accessible, web interfaces for chemical information applications are being developed, and enhanced 3D structure and reaction searching is being used to optimise drug fit.

Chemical literature from before the 1960s is also being archived.

Web-based databases from the Physical Sciences Information Gateway, PSIgate (), have been produced to give access to information resources on the internet. Unlike Google, with its problem of high recall and low precision, PSIgate's information is refereed (high precision) and so is of high quality.

5.1 The Future of Cheminformatics

A critical issue facing companies involved in scientific research and development is the management of their corporate data and access to the scientific literature at large.

Cheminformatics will continue to process this literature in order to create faster routes to drug discovery and, in effect, more clinically effective drugs.

The gap between Bioinformatics and Cheminformatics has to be narrowed so that their data can be compared. However, Bioinformatics groups have focused their work on genomic and proteomic data, while Cheminformatics groups have focused on HTS and chemical data, with little or no interaction between them. The challenge is to provide the tools that allow these researchers to bridge this gap.

Owing to recent advances in Cheminformatics applications, the future looks bright: data is now being used for research in areas of chemistry where it was not used before, and the principles of Cheminformatics are being used to organise and analyse this new research.

Conclusion

The appearance of the term Cheminformatics over the last six years can be attributed to the fact that the methods used in Cheminformatics are all vitally important for drug discovery.

Cheminformatics transforms data into information and information into knowledge so that better decisions are made faster in the area of drug lead identification and optimization. Thus, the transformation of data into information, and of information into knowledge, for the storage and retrieval of chemical information is nearing completion.

Thanks to advances in computers, the data can be organised and viewed in the way the user requires. This is done with post-processing tools such as Microsoft Access, which can sort the retrieved data. 3D simulations can be used to design optimal fits between drugs and receptor sites.

Growing inventories of data, and techniques such as high-throughput screening and combinatorial catalysis, are making the management of chemical information a vital and challenging business. Effective Cheminformatics architecture ensures that scientists can maximize the value of the data and information available to them.

Alongside High Throughput Screening, other methods such as data pipelining and data mining are now available. These methods increase the efficiency of the drug discovery process and so deliver more leads.

While gathering, storing and registering data transforms it into information, it is the accessibility, manipulation and mining of chemical information that translates it into knowledge for smarter drug development. This allows faster decisions, which means less wasted work and greater cost efficiency: lower costs for companies and, in turn, higher profits.

References

Chemoinformatics: what is it and how does it impact drug discovery? Brown, Frank K. Annual Reports in Medicinal Chemistry (1998), 33, 375-384.

What undergraduates need to know about cheminformatics. Wiggins, Gary D. Chemistry Library, Indiana University, Bloomington, IN, USA. March 23-27, 2003 (2003).

Chemoinformatics, cheminformatics, chemical informatics: What is it? Wiggins, Gary D.; Shreve, Wendie. Chemistry Library, Indiana University, Bloomington, IN, USA. March 23-27, 2003 (2003).

Cheminformatics: a tool for decision-makers in drug discovery. Olsson, Thomas; Oprea, Tudor I. AstraZeneca R&D, S-43183 Molndal, Sweden. Current Opinion in Drug Discovery & Development (2001), 4(3), 308-313.

Bridging cheminformatics and bioinformatics by using protein structures. Chan, Ah Wing E.; Laskowski, Roman A.; Thornton, Janet M. Molecular Design, Inpharmatica, London, UK. (2001), 221st ISSN: 0065-7727.

Data mining, virtual screening, and cheminformatics in integrated drug discovery. Manly, Charles J. Discovery Technologies, Neurogen Corporation, Branford, CT, USA. Abstracts of Papers, 225th ACS National Meeting, New Orleans, LA, United States, March 23-27, 2003 (2003), COMP-344. Publisher: American Chemical Society, Washington, D.C.

Use of Recursion Forests in the Sequential Screening Process: Consensus Selection by Multiple Recursion Trees. van Rhee, A. Michiel. ICAGEN, Inc., Research Triangle Park, NC, USA. Journal of Chemical Information and Computer Sciences (2003), 43(3), 941-948.

Cheminformatics and the Internet. Guner, Osman F.; Casher, Omer; Shah, Ajay V.; Hempill, Chris. Molecular Simulations Inc., San Diego, CA, USA. Book of Abstracts, 218th ACS National Meeting, New Orleans, Aug. 22-26 (1999).

Centre for Molecular and Biomolecular Informatics (2003). Toernooiveld 1, P.O. Box 9010, 6500 GL Nijmegen, +31 (0)24-3653391, http://www.cmbi.kun.nl/news/main.html

High-throughput data analysis. Rogers, David. SciTegic, Inc, San Diego, CA, USA. Abstracts of Papers, 224th ACS National Meeting, Boston, MA, United States, August 18-22, 2002 (2002).

Informatics challenges in chemical data storage, retrieval and mining are being met with the development of new cheminformatics technologies and tools. Cohen, Janet; Diller, Dave; Gund, Peter. Pharmacopeia, Inc, Cranbury, NJ, USA. Abstracts of Papers - American Chemical Society (2001).

Schofield, Helen; Wiggins, Gary. Chemistry Department, UMIST, Sackville Street, Manchester, UK M60 1QD; Chemistry Library, Indiana University, 800 E. Kirkwood. Drug Discovery Today (September 2001), 6(18).

Web-based technology for cheminformatics. McDaniel, Joe R. Cheminformatics, Oxford Molecular Group, Inc., Hunt Valley, MD, USA. Book of Abstracts, 218th ACS National Meeting, New Orleans, Aug. 22-26 (1999).

Cheminformatics - a career for the future. Hebert, Jonathan. MDL Information Systems, Inc., San Leandro, CA, USA. Book of Abstracts, 217th ACS National Meeting, Anaheim, Calif., March 21-25 (1999).

Mind the gap: Bridging the gulf between bioinformatics and cheminformatics. Langton, William; Higgins, Mike. Tripos, Inc, St. Louis, MO, USA. Abstracts of Papers - American Chemical Society (2001).
