You are given a cloned gene from Homo sapiens and are asked to identify structurally similar genes (potential orthologues) in a. Arabidopsis. b. Bombyx mori (silk moth). How will you determine the expression pattern of the gene?

Chris Holland 4/2/01

Jesus College

Before analysis of related genes can begin it is important to extract as much information about our cloned gene as possible, as this will aid us in our search for potential orthologues. Firstly, I am assuming that the cloned gene is derived from a DNA library and is contained within a vector as part of a recombinant DNA molecule. Knowing the type of library from which it originated (either cDNA or chromosomal) is helpful for analysis of the sequence, but I will examine that later.

The first step is to determine the DNA sequence of this cloned gene. Assuming that I have been given only one colony of the bacteria containing the cloned gene, good experimental technique requires increasing the number of bacteria, and so the number of copies of the gene, before starting analysis. I would do this by inoculating a liquid culture (LB broth, for example) with the colony and incubating it until I have a sufficient quantity of cells.

In order to start analysis I must first extract the plasmid vector carrying the cloned gene from the bacteria. Initially I will have to break open the cells to gain access to the plasmids, which is done by the addition of lysozyme and EDTA. I would then separate the components of the cell by size in order to get rid of cellular debris. Because the plasmids are much smaller than the rest of the cell contents, after centrifugation they will stay in the supernatant whilst the rest will form a pellet at the bottom. See diagram.

The solution now contains damaged plasmids and small fragments of DNA as well as intact plasmids, and these will affect the quality of our sequencing unless properly removed. This separation is based on conformation. I would use a CsCl density gradient to separate my plasmids from the rest because, although it takes longer to perform, the yield is much higher and the quality of the sample is better. CsCl density gradients work by spinning the sample in an ultracentrifuge. The caesium and chloride ions in solution are forced downwards, but this is counterbalanced by diffusion. What you are left with is a gradient of CsCl with differing densities at different points, see diagram.

The sample, which is now in the gradient, separates according to the buoyant densities of its individual components. If you also add EtBr to the solution before spinning, it binds more extensively to linear and nicked DNA than to supercoiled plasmids, which lowers their density further and allows you to separate them from your desired plasmids. Extraction of the supercoiled plasmid DNA from the EtBr is also displayed in the diagram.

What we have now is a sample of relatively pure plasmid DNA containing our cloned gene. We cannot simply begin sequencing now; we must first cut the cloned gene out of the plasmid. I would do this by digesting the plasmid with the same restriction enzymes that were used to insert the gene into the plasmid in the first place. Assuming that the approximate length of the gene is known, I would separate the insert from the vector DNA by running the digest on an agarose gel alongside a ladder, staining with EtBr to see the DNA bands under UV, and cutting out the piece of agarose containing the cloned gene band. I would then recover that DNA into solution and remove the EtBr.
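The cutting step can be pictured with a small simulation. This is only a minimal sketch in Python: it assumes, purely for illustration, that the gene was cloned with EcoRI (which cuts G^AATTC), treats the construct as linear rather than circular, and uses a made-up sequence. None of these details are given in the question.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut a linear DNA sequence at every occurrence of `site`.

    cut_offset is the number of bases of the site left on the upstream
    fragment (1 for EcoRI, which cuts G^AATTC). Fragments are returned
    in their original order.
    """
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Hypothetical construct: vector, site, insert, site, vector.
construct = "AAAA" + "GAATTC" + "CCCCGGGG" + "GAATTC" + "TTTT"
for fragment in digest(construct):
    # The fragment lengths are what the gel separates.
    print(len(fragment), fragment)
```

The middle fragment is the insert with its sticky ends; on a real gel it would be identified by comparing its length against the ladder.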

Now that I have a sample containing only my cloned gene I can begin to sequence it. I would choose the chain termination method of sequencing developed by Sanger and Coulson in the 1970s. This method is based on creating complementary strands of DNA that differ in length because of the incorporation of modified bases, which terminate DNA chain synthesis; the strands are radiolabelled or have a fluorescent dye attached. Firstly the DNA must be obtained in single-stranded form, either by alkaline denaturation of double-stranded DNA or, more usually, by inserting it into a vector known as the M13mp vector, whose phage particles contain single-stranded DNA. The insertion occurs at the polylinker region, a section of DNA with many restriction sites. Then a primer must be annealed (next to the polylinker site) to form a short double-stranded region for the Klenow fragment of DNA polymerase, or a similar enzyme such as Sequenase, to work on. This primer is known as the Universal Sequencing Primer, as it will initiate the synthesis of a second strand for any piece of DNA that has been inserted into the vector.

Adding the enzyme plus each of the 4 deoxynucleotides starts the second strand synthesis reaction. In addition a single modified nucleotide is also included in the reaction mixture in smaller amounts. This is a dideoxynucleotide which can be incorporated onto the growing polynucleotide just as efficiently as the normal nucleotide, but stops further strand synthesis. This is because the dideoxynucleotide lacks the hydroxyl group at the 3’ position of the sugar component. This is needed for the next nucleotide to bind.

If dideoxyATP is added to the reaction mix, then termination occurs at positions opposite thymidines in the template. But this termination doesn’t always occur at the first T, because unmodified dATP is also present and may be incorporated instead. The proportion of dideoxyATP in the mix is low enough that a strand may be extended a long way before termination, or hardly at all. The result is a family of strands, all of differing lengths, but each ending in dideoxyATP. This process is then repeated with each of the other three dideoxynucleotides as the terminator.

The next step is to separate the components of each family so the lengths of each strand can be determined. This can be achieved by gel electrophoresis, although the conditions have to be carefully controlled, as it is necessary to separate strands that differ in length by just one nucleotide. I would use a polyacrylamide gel, only 0.5 mm thick, containing urea, which keeps the newly synthesised strand separated from its template. The gel is also run at a high voltage, heating it to about 60 °C, to make sure that the DNA doesn’t reassociate. Each band in the gel contains only a small amount of DNA, so the strands are labelled at the synthesis step, either with a radioactive isotope (32P) or with a fluorescent dye.

The gel can now be read from an autoradiograph, or the samples can be run on an automated DNA sequencer, which carries out the reading unaided and relies on lasers to detect the fluorescent dyes. The DNA sequence can then easily be deduced from the relative migration of the strands and the family in which they occur, see diagram.
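The logic of reading a sequence from the four families of terminated strands can be sketched as a toy simulation. The Python below is an illustration under stated assumptions, not a model of a real reaction: the template is made up, the probability of a dideoxynucleotide being incorporated at a matching position (`p_stop`) and the number of molecules are arbitrary values, and the function names are my own.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template_3to5):
    """Strand synthesised on the template, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def terminated_lengths(strand, dd_base, n_molecules=5000, p_stop=0.1, seed=0):
    """Simulate one reaction containing a single dideoxynucleotide.

    Each molecule is extended base by base; wherever dd_base is needed,
    the dideoxy form is incorporated with probability p_stop and ends
    the chain. Returns the set of chain lengths that were produced.
    """
    rng = random.Random(seed)
    lengths = set()
    for _ in range(n_molecules):
        for position, base in enumerate(strand, start=1):
            if base == dd_base and rng.random() < p_stop:
                lengths.add(position)
                break
    return lengths


def read_gel(lanes, strand_length):
    """Read the bands from shortest to longest across the four lanes."""
    calls = []
    for length in range(1, strand_length + 1):
        hits = [base for base, lengths in lanes.items() if length in lengths]
        calls.append(hits[0] if len(hits) == 1 else "N")
    return "".join(calls)


template = "TACCGATTGCAA"  # hypothetical template, written 3'->5'
strand = new_strand(template)
lanes = {base: terminated_lengths(strand, base) for base in "ACGT"}
print(read_gel(lanes, len(strand)))
```

Because termination is only occasional, every position of a given base ends up represented in its lane, which is why reading the bands in order of length recovers the sequence of the new strand.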

Note that this method will only reliably produce results for sequences up to about 400 bp long. Longer sequences must first be cut into smaller fragments before insertion into M13mp, and the gene is then reconstructed by finding overlaps between the resulting sequences and joining them.
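Joining overlapping reads back into one sequence can be sketched as a greedy merge. This is a minimal illustration in Python, assuming error-free reads that all come from the same strand; the example gene and the minimum overlap of 3 bases are arbitrary choices, and real assembly software is far more careful about sequencing errors and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(reads, min_len=3):
    """Greedy assembly: repeatedly merge the pair with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # nothing overlaps any more; the contigs stay separate
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)]
        reads.append(merged)
    return reads


# Hypothetical 25-base "gene" sequenced as three overlapping reads.
gene = "ATGGCGTACGTTAGCCTAGGATCCA"
reads = [gene[0:12], gene[8:20], gene[16:25]]
print(assemble(reads))
```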

Finally we have a DNA sequence for our gene in question. Going back to my opening paragraph, I stated that it is important to determine what type of library the gene came from. If it was from a cDNA library then the sequence we have now is derived directly from the mRNA transcript and contains only the exon sequence. But if it is from a chromosomal library then the gene may be embedded in flanking DNA and it will also contain introns. Luckily we can now put the sequence into a computer, which will predict and remove the introns and intergenic DNA for us using algorithms that search for the characteristic sequences found at intron boundaries, such as the GT and AG at the two splice sites. This leaves you with a probable mRNA sequence to use.

This information can also be used to deduce the amino acid sequence of the product, and if you have access to a supercomputer the folding of the polypeptide and the final structure of the protein may be predicted. This, combined with expression of the gene in the bacterial colony earlier (possible only if it came from a cDNA library), will allow for a much more sensitive search for the orthologues in the silk moth.
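Deducing the amino acid sequence from the coding sequence is a simple lookup in the standard genetic code. The Python sketch below assumes an intron-free sequence containing only A, C, G and T (or U), and starts translating at the first ATG it finds; the example sequence is made up.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna):
    """Translate from the first ATG up to (not including) the first stop codon."""
    seq = cdna.upper().replace("U", "T")
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical sequence: ATG GCC AAA TGG TAA with two flanking bases each side.
print(translate("GGATGGCCAAATGGTAAGG"))
```

Searching with the protein sequence rather than the DNA is what makes the later comparison more sensitive, since many DNA changes leave the amino acid unchanged.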

Arabidopsis thaliana

Searching for related genes in Arabidopsis couldn’t be easier. Over the past 20 years or so efforts have been made to sequence the entire genome. That effort was completed last autumn, and scientists are now able to compare their DNA sequences with the Arabidopsis genome, looking for similarities. All you have to do to search for an orthologue is to go to a website that offers a search of the genome and use the BLAST program. Then simply type in your sequence and hit search; any matches within the database are displayed graphically and then in a more informative text output giving the percentage identity of the match and the gene it is matched to. Here is an example of a match I performed earlier, just using some randomly typed-in DNA sequence.
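The percentage figure reported for a match can be illustrated with a toy calculation. The Python below is not how BLAST works internally (BLAST uses a seeded heuristic search with gaps and statistical scoring); it is only a sketch that slides a query along a subject sequence without gaps and reports the best percentage identity. Both sequences are made up.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_ungapped_hit(query, subject):
    """Slide the query along the subject; return (offset, % identity) of the best window."""
    best = (0, 0.0)
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        identity = percent_identity(query, window)
        if identity > best[1]:
            best = (offset, identity)
    return best


query = "ACGTACGT"           # hypothetical fragment of the human gene
subject = "TTTTACGAACGTGGG"  # hypothetical stretch of plant sequence
print(best_ungapped_hit(query, subject))
```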

Bombyx mori (silk moth)

The search for potential orthologues is more difficult in the silk moth because, unlike Arabidopsis, its genome is not yet fully sequenced. You can still search the partially completed database at http://210.145.41.132/cgi-bin/lib_proc_MX . However, when I tried the same random sequence as above it returned no matches at all, and only 18 thousand sequences had been determined so far.

Because I cannot simply BLAST the sequence I have to search for any orthologues manually. This can be done by one of two methods, depending on whether or not the gene is expressed at a high enough level to detect in the silk moth. To determine whether the gene is expressed I would take samples of several tissues from the moth (muscle, neuronal) and then, using the protein derived from the human gene as a reference, check for that protein by doing a biochemical analysis of the gene products. This would only be a quick search, as the chances are that the gene in question is not expressed, or is expressed so rarely that I will not be able to detect it.

If, however, I find that the gene in question does have an orthologue that is expressed, I can move on to determining its DNA sequence from the mRNA transcript and studying the similarities. Firstly I would have to isolate all the mRNA from the cells. This would be done by lysis and centrifugation as described before. Then, to separate the mRNA from the tRNA and rRNA, I would use a method that discriminates between them based on the fact that most eukaryotic mRNAs carry a poly (A) tail. I would pass the RNA over a column consisting of an inert material, probably agarose, to which oligonucleotides consisting entirely of dT residues have been attached. The poly (A) tails would hybridise to this oligo(dT), causing the mRNA to stick to the column while the rest of the RNA runs through. After thorough washing to remove all traces of the unbound RNA, the column would be washed through with a buffer of low ionic strength, causing the hybrids to dissociate so that the purified mRNA is washed out. See Diagram.

The next step is to take this mRNA, which hopefully includes a transcript of the gene I am searching for, and convert it into DNA. This is done by incubating the mRNA with oligo(dT), which anneals to the poly (A) tails. This provides primed 3’ ends for the enzyme reverse transcriptase. The result is a collection of RNA-DNA hybrids. To replace the RNA so that you are left with only double stranded DNA, one method is to use an enzyme that nicks the RNA chain (RNase H) and then add DNA polymerase, which will gradually replace the RNA nucleotides with DNA ones. This method is often used in order to obtain the missing 5’ end of a partial cDNA clone. Another method would be to remove the RNA using alkali, after which the hairpin loop at the 3’ end of the first strand will act as a primer for DNA polymerase to synthesise the complementary strand. The hairpin loop would then be cut open using S1 nuclease. See Diagram:
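The selection of poly (A) RNA and the copying of mRNA into first-strand cDNA can be written out as a very small sketch. The Python below is only an illustration: the minimum tail length of 10 and the example RNAs are assumptions, and the function names are my own.

```python
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def has_poly_a_tail(rna, min_tail=10):
    """True if the RNA ends in at least min_tail A residues (would bind oligo(dT))."""
    return rna.endswith("A" * min_tail)


def first_strand_cdna(mrna):
    """First-strand cDNA written 5'->3': the reverse complement of the mRNA."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))


# Hypothetical total RNA: one tailed mRNA and one untailed RNA.
total_rna = ["AUGGCC" + "A" * 12, "GGCUAGCUAGGCUU"]
messengers = [rna for rna in total_rna if has_poly_a_tail(rna)]
for mrna in messengers:
    # The cDNA begins with the oligo(dT) primer region (a run of T).
    print(first_strand_cdna(mrna))
```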

I would then use PCR to amplify the target gene sequence, using primers for both directions based on the sequence already obtained from the human gene. Although the match will not be exact, if I use a low enough annealing temperature and a long enough primer sequence I should be able to obtain some product. My primer sequence would be about 12 bases long to ensure that the number of false positives is kept to a minimum. I could write a whole essay on PCR and its methods, but the basic idea is to amplify a region of DNA bounded by primers annealed to opposite strands of the denatured DNA and pointing towards each other. The addition of a DNA polymerase (Taq polymerase, due to its operation at 74 °C) and free nucleotides creates a strand of DNA complementary to the template. Heat then separates the strands and the process can repeat. See diagram
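Two of the numbers behind this step can be estimated quickly. The Python sketch below uses the Wallace rule (2 °C for each A or T, 4 °C for each G or C), which is only a rough guide for short primers, and an idealised doubling of product in every cycle, which real reactions do not sustain. The primer sequence is hypothetical.

```python
def wallace_tm(primer):
    """Rough melting temperature in degrees C by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def ideal_copies(cycles, starting_molecules=1):
    """Copies after a number of cycles if every cycle doubles the product."""
    return starting_molecules * 2 ** cycles


primer = "ATGCATGCATGC"  # hypothetical 12-base primer
print(wallace_tm(primer))  # estimated melting temperature
print(ideal_copies(30))    # upper bound on copies after 30 cycles
```

A 12-base primer with half G and C melts at only about 36 °C by this estimate, which is why a low annealing temperature would be needed for it to bind at all.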

However, because single stranded DNA is best for the Sanger method of DNA sequencing, I would slightly modify the PCR technique described above. I would use a limiting amount of one of the primers. This primer would be used up quickly and no more strands could then be made from it, whilst the primer in abundance can still anneal to the strands already made from the limiting primer and so goes on producing many more copies of one strand. This is known as asymmetric PCR. Then it is a simple case of sequencing the DNA as described before and comparing the sequences.
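The effect of the limiting primer can be seen in a toy bookkeeping model. The Python below is an idealised sketch, assuming perfect efficiency in every cycle; the strand names "top" and "bottom" and the primer numbers are my own illustrative choices.

```python
def asymmetric_pcr(cycles, limiting_primers, top=1, bottom=1):
    """Count strands after idealised asymmetric PCR.

    Top strands are made with the abundant primer by copying bottom
    strands; bottom strands are made with the limiting primer by copying
    top strands. Each new bottom strand uses up one limiting primer.
    """
    for _ in range(cycles):
        new_top = bottom                         # abundant primer never runs out
        new_bottom = min(top, limiting_primers)  # capped by the primers left
        limiting_primers -= new_bottom
        top += new_top
        bottom += new_bottom
    return top, bottom


for cycles in range(1, 6):
    print(cycles, asymmetric_pcr(cycles, limiting_primers=6))
```

In this run the two strands double together until the six limiting primers are gone (after the third cycle); from then on only the top strand accumulates, by a fixed amount per cycle, giving the excess of single stranded product.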

More likely than not, the gene is not expressed by the cell, or its product is too hard to isolate, in which case the much harder task of creating a chromosomal library must be undertaken in order to isolate the gene. A chromosomal library is created by taking the genomic DNA of the organism and cutting it into small fragments using restriction endonucleases. The small fragments are then inserted into a vector, e.g. a bacterial plasmid cut with the same restriction enzymes, and recombinant DNA is made using DNA ligase. These plasmids are then introduced into bacteria that have been made competent. The bacteria are then plated out onto a series of agar plates. Hopefully the entire genome will be contained within the library of fragments you have just created.
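How many clones are needed before the whole genome is "hopefully" covered can be estimated with the standard Clarke and Carbon formula, N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size and P is the desired probability that any given sequence is present. The Python sketch below uses assumed round numbers: a genome of 5 x 10^8 bp and 10 kb inserts are illustrative values, not figures taken from this essay.

```python
import math


def clones_needed(genome_bp, insert_bp, probability=0.99):
    """Clarke and Carbon estimate of library size: N = ln(1 - P) / ln(1 - f)."""
    fraction = insert_bp / genome_bp  # share of the genome held by one clone
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


# Assumed values for illustration only.
print(clones_needed(genome_bp=5e8, insert_bp=1e4, probability=0.99))
```

With these assumed values the answer is roughly 230,000 colonies, which gives a sense of how many plates would have to be screened.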

Once I have the entire silk moth genome present in this chromosomal library, the gene I am looking for should be present in one of the colonies of bacteria. In order to find the gene of interest I would create a radiolabelled probe (a complementary sequence of DNA based on the human gene) and, using a replica of the culture plates, look for colonies to which it hybridises. See Diagram.
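Because the probe is based on the human gene, it is unlikely to match the moth sequence perfectly, so hybridisation has to tolerate some mismatches. The Python sketch below illustrates the idea by looking for places where a probe could base-pair with a target strand with at most a chosen number of mismatches. The sequences and the mismatch limit are made up, and real hybridisation depends on temperature and salt conditions rather than a simple count.

```python
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT_TABLE)[::-1]


def probe_sites(target, probe, max_mismatches=2):
    """Positions on the target strand where the probe could anneal.

    Returns (position, mismatches) pairs for every window of the target
    that differs from the probe's reverse complement at no more than
    max_mismatches positions.
    """
    wanted = reverse_complement(probe)
    hits = []
    for i in range(len(target) - len(wanted) + 1):
        window = target[i:i + len(wanted)]
        mismatches = sum(a != b for a, b in zip(window, wanted))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


probe = "GGATCCAT"           # hypothetical probe based on the human gene
clone = "TTTTATGGTTCCAAAA"   # hypothetical moth insert with one changed base
print(probe_sites(clone, probe))
```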

The major problem with chromosomal DNA libraries is that, although I would get a few colonies that lit up, they could be either false positives or recombinant DNA that does not include the entire gene because of the cutting process. However, there should be one positive colony containing the gene, which can be extracted using the same process as described for extracting the human gene from the plasmid. Then, once the gene has been isolated, asymmetric PCR and chain termination sequencing can be used to analyse the gene for homology.

Checking The Expression Patterns of Proteins

There are many different methods for looking at the expression pattern of a protein. The nature of the protein must be determined via its DNA sequence for some of the techniques to work, whereas others rely on looking at the expression of protein precursors such as the mRNA transcripts.

The best way of approaching this assay is to ask some basic questions. In order to determine an expression pattern we must look at where, when and for how long the protein is expressed. Where can be determined by assaying cells from a multitude of tissue types in all the organisms concerned. The chance that the orthologues carry out the same kind of job in the other organisms is high, so if you find the gene being expressed in one kind of tissue at a particular time, then by looking at related tissues in the other organisms you stand a higher chance of finding it.

When the gene is expressed is best determined by assaying cells of different tissue types at different stages in the organism’s development. Also looking at the DNA sequence for specific activator or repressor sites may give some clues as to when the gene is expressed.

How long the gene is expressed for can be determined by using the following methods of detecting gene expression and assaying the level of expression over time.

Studying the transcript of the cloned gene can lead to information about its expression. Most methods of transcript analysis involve hybridisation between the RNA transcript and a fragment of DNA containing the relevant gene. This can be analysed using two different methods: by electron microscopy, where the hybrid is looked for directly, or by the use of radiolabelled probes, as shown before, to see if the gene is expressed.

Studying the translation product of the cloned gene can give us information about how large the product is and how to look for it using other techniques. Hybrid-release translation (HRT) is used to create gene products using cell-free translation systems. The gene products are translated from the mRNA in vitro and are usually labelled through a particular amino acid. They can then be run on a gel alongside a protein ladder so that the size of the gene product can be determined. With this method it would therefore be possible to find out the absorbance of the product at a particular wavelength. The cells could then be lysed, and the proteins purified and run through an HPLC column attached to a UV detector (or a mass spectrometer, if the exact mass could be calculated) to see how much of the particular protein was being expressed. You could also use a GC, but I think the protein would be too big for the column and would clog it up.
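Estimating the size of the product from a ladder relies on the roughly linear relationship between the logarithm of molecular size and the distance migrated on the gel. The Python sketch below fits that line by least squares and reads off an unknown band. The ladder values are invented for illustration, and the relationship only holds over a limited size range.

```python
import math


def fit_ladder(ladder):
    """Fit log10(size) = a + b * distance to (distance, size) ladder bands."""
    xs = [distance for distance, _ in ladder]
    ys = [math.log10(size) for _, size in ladder]
    n = len(ladder)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


def estimate_size(distance, a, b):
    """Size of a band from the distance it migrated."""
    return 10 ** (a + b * distance)


# Hypothetical ladder: (distance migrated in mm, size in kDa).
ladder = [(10, 100.0), (20, 50.0), (30, 25.0), (40, 12.5)]
a, b = fit_ladder(ladder)
print(round(estimate_size(25, a, b), 1))
```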

The only other way I can think of to determine the gene expression pattern in the different organisms is to use antibodies. These would be raised by challenging a rabbit with the protein in question. The rabbit would be left for several weeks to produce antibodies in an immune response to the foreign protein. The antibodies could then be extracted from the blood and have an enzyme marker attached, e.g. horseradish peroxidase. Now, when the cells are plated out and exposed to the antibody, the antibody will bind wherever the protein in question is being expressed. If the cells are then exposed to luminol, the cells expressing the gene light up because of chemiluminescence catalysed by the modified antibody.

All the above methods are ways of detecting the gene products and hence the expression patterns in the organisms in question. I have not mentioned the use of deletion sequences to determine the promoters and activators of the genes because in eukaryotes disrupting these sequences usually does fatal damage to the embryo, much more so than in prokaryotes.
