The Human Proteome Project - “Genes were easy”

Genetics Essay:

The Human Proteome Project - “Genes were easy”

Proteomics is an emerging area of the post genomic era that uses a plethora of techniques to resolve, quantitate, rapidly survey the identity of proteins, and annotate, as well as to identify their interacting partners. The proteome is to proteins what the genome is to genes, and scientists are now moving towards creating a complete set of data cataloguing and describing every protein expressed by the human genome. In this essay I will talk about the history of the HPP, propose a hypothetical situation where I will describe how I would go about annotating a protein by just starting with the gene and contrast this to modern day methods which can be exploited by a large laboratory and finally talk a little about the problems that need to be overcome by the HPP.

The natural progression from the completion of the genome was to study what the genome coded for, proteins. The reasons why there have been no human transcriptome project is because that is the basis for DNA microarrays and has never been labelled as that. However the beginnings of cataloguing and annotating proteins started way before the human genome project was conceived. The first step was in a proposal for the Molecular Anatomy Program in Oak Ridge in 1960. This project intended on creating an inventory of cells at the molecular level as technology, either existing of envisioned would allow. This lead to the development of techniques such as fractionation of proteins, HPLC, and finally high-resolution 2D Electrophoresis in 1975.

In 1980 in America there was an attempt to launch the Human Protein Index Project (HPIP) as a national objective but support was lost when the supporting senators were not elected. However in 1983 proponents of the HPIP suggested that a dual effort involving the complete sequencing of the human genome and a parallel protein project. Of these two, The Human Genome Project was the first to succeed, in part because the basic technology was already available. But now proteomics has taken centre stage. The first conference of the Human Proteome Project has just taken place (April 2-4 2001) gathering companies and research institutes form all over the world to try and organise a collaborative effort to complete the human proteome.

I will now run through the techniques used in order to separate, identify and annotate a protein. I will refer to the basic techniques, which pioneered the study of proteomics and then talk about the cutting edge technology used today.

The situation is as follows, I have been given a gene from the human genome and am asked to find and annotate the protein produced by it. By annotation I mean determine it’s amino acid sequence and subsequent structure. The first step in this process is locating in what type of cell is this protein expressed. Obviously I cannot simply probe the DNA of different type of cells, as that would create nothing but positives. So I would take a range of cell types, extract their mRNA and perform a Northern Blot initially using a radiolabelled probe created from the known sequence. This will give a good indication that the cell is expressing the protein of interest and a qualitative impression of how much.

Separation and identification of the protein

Now that the cells containing the protein have been identified, the protein must be separated and purified from all the other cellular components before it can be analysed.

The method initially used is differential centrifugation where the contents of the cell are separated according to their weight. The first step is to homogenate a culture of the cells so that their contents are free to separate. This can be done via lysis osmotic shock, or careful sonication. Once the cells have been broken up into smaller pieces they can then be loaded into a centrifuge and span down. The speed and duration of the centrifuge must be carefully planned as if it is too slow then the particles will not separate and if it is too fast ...

This is a preview of the whole essay

The typical values for the various centrifugation steps referred to in the figure are

Low Speed: 1000G for 10 Minutes Medium Speed: 20,000G for 20 Minutes: High Speed: 80,000G for 1 hour Very High Speed: 150,000G 3 Hours.

The protein will now be in the supernatant. The next necessary step of purification is to separate the proteins (and other smaller molecules) from one another so that the specific protein can be identified.

The classic assay which separates proteins on a charge:mass ratio is called Western Blotting or Immunoblotting. It is the experiment which kicked off the whole field of proteomics. The process is really quite simple and occurs in two sequential steps, separation by charge and then by mass.

Proteins each have a unique overall electric charge based on the number of acidic or basic amino acids the polypeptide contains. In the first step, the supernatant is fully denatured by high concentrations (8M) of urea and then layered on a glass tube filled with polyacrylamide that is saturated with a solution of ampholytes, a mixture of polyanionic and polycationic molecules. When placed in an electric field, the ampholytes will separate and form a continuous gradient based on their net charge. The most highly polyanionic ampholytes will collect at one end of the tube and the most polycationic ampholytes at the other. This gradient of ampholyte establishes a pH gradient. Charged proteins will then migrate through the gradient until they reach their pI, or isoelectric point, the pH at which the net charge of the protein is zero. This is called isoelectric focusing (IEF) and the resolution is only one charge unit.

Separation by mass comes about through placing the IEF gel lengthwise across one end of a gel saturated with SDS. When a charge is put across the gel the proteins migrate into the SDS gel. The SDS (a powerful negatively charged detergent) bind to the proteins and causes them to unfold resulting in them having an almost identical charge:mass ratios. Therefore the only thing separating the proteins now it their mass. Smaller proteins are able to migrate faster because they can fit through the small pores in the matrix, larger proteins are retarded. See diagram:

Now that the proteins have been separated according to their charge:mass ratio in order to identify the protein of interest a method of visualising the protein must be employed. The SDS gel is placed on to a nitrocellulose membrane and a charge is run through the whole gel, which causes the proteins to migrate directly down and deposit, and bind, onto the membrane. The gel is discarded and the membrane is washed with an inert protein such as serum albumin to prevent any more proteins binding to the membrane. Then the membrane is incubated with an antibody specific to the protein of interest (see later for creation of the antibody). Only the desired protein on the membrane will bind to the antibody when it is all washed off. The membrane is then washed in a second antibody, which is covalently attached to a marker and is specific to the primary antibody. The type of marker that can be used could be a radio labelled one or alkaline phosphatase, which catalyses a chromogenic reaction. Detection is via autoradiography, or by adding a substrate that the enzyme breaks down to a deep purple precipitate. See Diagram:

Extracting the protein from the nitrocellulose membrane is difficult and so further analysis of the protein is difficult after that although you do know the charge and the mass of the protein which means you can go back to a culture and extract the protein via different methods, such as rate-zonal centrifugation. However labs that can afford High Pressure Liquid Chromatography (HPLC) to extract the necessary protein use a more reliable and efficient method for the extraction of a protein. There are three types of column you can use for this procedure and they separate the proteins involved in different ways, based on its mass, charge or binding affinity;

The best one to use in this case is antibody-affinity chromatography because the antibody for the protein has already been created. This allows for a purified sample that is ready to be used for sequence and structural analysis. Using the other columns on the now purified sample will give some indication on its gross structure and size. A more modern method involves using Mass Spectrometry to deduce the mass and charge of proteins and polypeptides, but this does not result in a usable sample at the end.

Determining the primary structure of the protein

Even though the DNA sequence of the gene is known we cannot be sure that all the exons are expressed in the gene product. The cause of this is differential splicing and can be used to create hundreds of different proteins from one single gene. So although it is possible to know the possible amino acid combination of the protein the actual amino acid composition of the polypeptide is unknown.

The classic method for determining the amino acid sequence involves Edman Degradation. In this procedure the amino group at the N-terminus of a polypeptide is labelled and it’s amino acid then cleaved from the polypeptide and identified by HPLC. The polypeptide is left one residue shorter, with a new amino acid at the N-terminus. The cycle is repeated on an ever-shortening polypeptide until all the residues have been identified. See Diagram:

However since 1985 recombinant DNA technology has allowed scientists to analyse the mRNA sequence on the protein and that allows for a much more precise and faster reading of the primary structure. The mRNA of interest would be found by performing a northern blot, probing for the mRNA using a probe based on the first few nucleotides of the gene, then the mRNA could be extracted from the gel. Once extracted a cDNA copy of the mRNA could be made using reverse transcriptase and free nucleotides and then the sequence could be bulked up via Polymerase Chain Reaction (PCR). Now that a large enough sample has been obtained it can then be sequenced, usually by the chain termination method. Once the sequence has been completed then it is simply a matter of identifying the codons and relating them to the associated amino acid.

Determining the conformation of the protein

The secondary tertiary and quaternary structures of proteins are elucidated by techniques that are not readily available to the average scientist and requires very expensive and complex machinery. There are currently 4 major types of assay used.

X-ray Crystallography

This has been used since the 1950’s to determine the 3D structure of proteins. Firstly a very pure sample is obtained and then through heating and controlled cooling a crystal of the protein is formed. This step takes many attempts as the conditions required for the protein to form are very difficult to ascertain. If you are lucky enough to obtain a crystal then the structure is found by passing X-rays through the protein. Because the wavelength of x-rays are short (0.1-0.2nm) then are able to resolve down to the atomic level. Atoms in the protein crystal scatter the X-rays, which produce a diffraction pattern of discrete spots when they are intercepted by photographic film. These patterns are extremely complex; as many as 25,000 diffraction spots can be obtained from a small protein. Elaborate calculations and modifications of the protein (such as binding of heavy metals)must be made to interpret the diffraction pattern and solve the structure of the protein. This is as complex as trying to reconstruct the pebble from the ripples it made when it struck the pond. Reconstruction is done via computers. See Diagram:

Cryoelectron Microscopy.

Due to the fact that finding the exact conditions required for the crystallisation of a protein is a very time consuming and laborious process, low-resolution views can be obtained via electron microscopy. The protein sample is frozen in liquid helium and then subjected to a low dose of electrons to determine the structure. Like X-ray crystallography the image is recorded onto film and then a sophisticated computer is used to reconstruct the protein. With recent advances in electron microscopy the models now obtained from this technique equal that of ones from x-ray crystallography.

NMR Spectroscopy

The structure of small proteins weighing up to about 200 AA residues long can be studied using Nuclear Magnetic Resonance NMR technology. In this technique, a concentrated protein solution is placed in a magnetic field and the effects of different radio frequencies on the resonances of different atoms are measured. However, the behaviour of any atom is influenced by neighbouring atoms in adjacent residues; closely spaced residues are more perturbed than distant ones. From the magnitude of the effect the distances between residues can then be calculated; these distances are used to generate a model of the 3D structure of the protein.

Computational Folding

The cutting edge of this field is the use of computers is to take the mRNA sequence and then applying the basic laws of physics and chemistry build the protein from the atoms up. This is incredibly time consuming and requires an immense amount of computing power. IBM research group is building the most advanced computer in the world. They have spent $60 million on the project called Blue Gene, the progeny of Deep Blue the supercomputer that beat Russian Grand Master Gary Kasparov three years ago. It will be 1000 time more powerful and have the power of about 2m of today’s desktop machines. It should tackle about one quadrillion instructions per second easily making it the fastest computer on the planet. However with all that power it will still take the IBM machine one year to produce the answer for just one protein.

Problems facing the Human Proteome Project

Even with all these sophisticated techniques at our disposal the project is still a massive undertaking by the scientific community. The problems they currently face are of two types, technical and conceptual.

The technical problems are that none of the techniques can yet detect and identify all the proteins in a cell. Also even with advent of automation performing many of the mentioned tasks with extremely high throughput rates the assays are not yet sensitive enough and don’t go very far in obtaining proteins as they would be expressed in vivo. Finally there is no assay that tells you all the information about a protein, rather ones that tell you very finite amounts about certain aspects of the protein.

The conceptual problems are concerning the design of the experiment to which proteomic technology can be applied, and more importantly the challenge of processing the information. Even though we only have basic level analysis of the proteins the amount of information is massive and organising that data is beginning to be a real problem. One company however has spent the last ten years collecting data about every single protein that has had something published about it and put it into a database. This can be found at www.proteome.com . There will be a time in the future where scientific research can be performed just by analysing data on the internet, without the need for getting your hands dirty, although there will always be a call for it. I think Oscar Wilde put it best when he said:

“It is such a shame that nowadays there is so little useless information”

To summarise the Human Proteome Project is inconceivably large and work will continue on it for many years to come, it is not a linear program like the HGP as full annotation includes describing how proteins interact with each other and function as a whole. However large multinational companies will perform the bulk of the project rather than government funded labs, like the HGP, as the technology is so expensive. But then we come to another problem with who has the rights over the data obtained but that debate is for another time.

Bibliography

Alberts (1994) Molecular Biology of the Cell 3rd Edition

Lodish (1995) Molecular Cell Biology 3rd Edition

Purves (1998) Life The Science of Biology 5th Edition

Stryer (1995) Biochemistry 4th edition

Watson (1994) Recombinant DNA 2nd Edition

Internet links obtained from BioOnline and Proteome.com.

- -

Teacher Reviews

Here's what a teacher thought of this essay

Rebecca Lewis

1st of Jan 2012

A far reaching and comprehensive discussion of the proteome project, that still has the potential to be improved in several ways. In places there is far too much detail: In several parts the student has discussed in depth techniques that are not actually relevant to answering the question. This has made the work unnecessarily long. There is also confusion in a number of places over some terminology. When writing a technological essay, a real effort should be made to define all terms clearly and as soon as they are discussed. The student also regularly slips into a colloquial "chatty" voice, which is not appropriate for a formal piece of writing such as this. At my (Russell Group) university, this piece of work would receive 3/5.

Did you find this review helpful? Join our team of reviewers and help other students learn