Adding the enzyme together with each of the four deoxynucleotides starts the second-strand synthesis reaction. In addition, a single modified nucleotide is included in the reaction mixture in smaller amounts. This is a dideoxynucleotide, which can be incorporated into the growing polynucleotide in the same way as the normal nucleotide but prevents any further strand synthesis. This is because the dideoxynucleotide lacks the hydroxyl group at the 3’ position of the sugar, and this hydroxyl group is needed to form the phosphodiester bond with the next nucleotide.
If dideoxyATP (ddATP) is added to the reaction mix, termination occurs at positions opposite thymidines in the template. However, termination does not always occur at the first T, because normal dATP is also present and may be incorporated instead. The ratio of ddATP to dATP is chosen so that a strand may be extended a long way before termination or may stop almost immediately. The result is a family of strands of differing lengths, each ending in ddA. The process is then repeated in separate reactions using each of the other three dideoxynucleotides.
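To illustrate how ddATP produces a whole family of strands rather than stopping at the first T, here is a small Python simulation. It is only a sketch under simplifying assumptions: the template string is written in the order the polymerase copies it, `dd_fraction` is a made-up parameter standing for the chance that the dideoxy form is added wherever an A is needed, and strands that run off the end without terminating are ignored.

```python
import random
from collections import Counter

# Watson-Crick pairing: template base -> base added to the new strand
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def termination_family(template, dd_base, dd_fraction, n_strands, seed=0):
    """Count the lengths of new strands that were stopped by a dideoxy base.

    dd_fraction is an assumed probability that, where dd_base is needed,
    the dideoxy form is incorporated instead of the normal nucleotide."""
    rng = random.Random(seed)
    lengths = Counter()
    for _ in range(n_strands):
        for position, template_base in enumerate(template, start=1):
            if PAIR[template_base] == dd_base and rng.random() < dd_fraction:
                lengths[position] += 1  # no 3'-OH, so synthesis stops here
                break
        # strands that never pick up a dideoxy base run off the end and are ignored
    return lengths

# Hypothetical template with T at positions 3, 6, 7 and 11
family = termination_family("GCTAGTTACGTAC", "A", dd_fraction=0.3, n_strands=1000)
print(sorted(family))
```

Every length in the family corresponds to a T in the template, but several different T positions are represented, which is the point of keeping the dideoxynucleotide at a low concentration.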
The next step is to separate the members of each family so that the length of each strand can be determined. This is done by gel electrophoresis, but the conditions must be carefully controlled because strands that differ in length by just one nucleotide have to be separated. I would use a polyacrylamide gel only 0.5 mm thick containing urea, a denaturant that keeps the newly synthesised strand separate from its template. The gel is also run at a high voltage, which heats it to 60 °C and further ensures that the DNA strands do not reassociate. Each band contains only a small amount of DNA, so the strands are labelled during the synthesis step, either with a radioactive isotope (32P) or with a fluorescent dye.
The gel can then be read from an autoradiograph, or the reactions can be run on an automated DNA sequencer, which carries out the whole process unaided and uses a laser to detect the fluorescent dyes. The DNA sequence is then easily deduced from the relative migration of the strands and the family in which each one occurs, see diagram.
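Reading the gel can be sketched in a few lines of Python. This is an idealised model, assuming every possible stopping point appears as a band and that the template is written in the order the polymerase copies it; the function names are my own. Bands are read from the shortest strand (which migrates furthest) upwards, and the lane each band is in gives the base at that position of the new strand.

```python
# Watson-Crick pairing: template base -> base added opposite it
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_fragments(template):
    """Lengths of the terminated strands in each of the four reactions (lanes)."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, template_base in enumerate(template, start=1):
        lanes[PAIR[template_base]].append(position)
    return lanes

def read_gel(lanes):
    """Read the bands from the shortest strand (furthest from the well) upwards."""
    bands = sorted((length, lane) for lane, lengths in lanes.items() for length in lengths)
    return "".join(lane for _, lane in bands)

lanes = sanger_fragments("TACGGATC")  # hypothetical template
print(read_gel(lanes))  # ATGCCTAG, the new strand, complementary to the template
```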
Note that this method only reliably produces results for sequences up to about 400 bp long. Longer sequences must first be cut into smaller fragments before insertion into M13mp vectors, and the gene must then be reconstructed by joining the overlapping sequences obtained.
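The joining of overlapping sequences can be illustrated with a simple greedy merge. This sketch assumes the reads are already in order along the gene and that each pair overlaps exactly; the reads and the 3-base minimum overlap are hypothetical, and real assembly needs much longer overlaps, since a 3-base match could easily occur by chance.

```python
def merge_pair(left, right, min_overlap):
    """Join two reads on the longest end of `left` that matches the start of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError("reads do not overlap enough to be joined")

def assemble(reads, min_overlap=3):
    """Join reads that are assumed to be already in order along the gene."""
    contig = reads[0]
    for read in reads[1:]:
        contig = merge_pair(contig, read, min_overlap)
    return contig

reads = ["ATGGCGTACC", "ACCTTAGGC", "GGCATTGA"]  # hypothetical overlapping reads
print(assemble(reads))  # ATGGCGTACCTTAGGCATTGA
```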
Finally we have a DNA sequence for the gene in question. However, as I stated in the opening paragraph, it is important to determine which type of library the gene came from. If it came from a cDNA library, the sequence was copied directly from the mRNA transcript and contains only exon sequence. If it came from a chromosomal (genomic) library, the gene may be embedded in surrounding DNA and will also contain introns. Fortunately, the sequence can now be analysed by computer: gene-prediction programs use algorithms that search for characteristic sequences, such as the splice sites at intron boundaries (most introns begin with GT and end with AG), to identify the introns and intergenic DNA and remove them. This leaves a probable mRNA sequence to work with.
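The final removal step can be sketched in Python, assuming the intron positions have already been predicted; finding them is the hard part, which this sketch does not attempt. The example gene is hypothetical, and the only check made is that each intron has the usual GT...AG boundaries.

```python
def splice(genomic, introns):
    """Remove introns from a genomic sequence.

    `introns` is a list of (start, end) positions, 0-based with `end` excluded.
    They are assumed to come from a gene-prediction step; this function only
    checks the usual GT...AG boundaries before cutting each intron out."""
    exons = []
    previous_end = 0
    for start, end in sorted(introns):
        intron = genomic[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GT...AG boundaries")
        exons.append(genomic[previous_end:start])
        previous_end = end
    exons.append(genomic[previous_end:])
    return "".join(exons)

# Hypothetical gene: exon ATGGCC, intron GTAAGTCTTTAG, exon TTCTGA
genomic = "ATGGCCGTAAGTCTTTAGTTCTGA"
print(splice(genomic, [(6, 18)]))  # ATGGCCTTCTGA
```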
This sequence can also be used to deduce the amino acid sequence of the product, and with access to a supercomputer the folding of the polypeptide and the final structure of the protein can be predicted. Combined with any expression of the gene in the colony obtained earlier (if it came from a cDNA library), this will allow a much more sensitive search for orthologues in the silk moth.
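Deducing the amino acid sequence uses the standard genetic code, read three bases at a time. Here is a minimal Python sketch; it assumes the sequence starts at the first base of the start codon and simply stops at the first stop codon (shown as `*`). The example sequence is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons listed in TCAG order (first base varies slowest)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(coding_sequence):
    """Translate codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(coding_sequence) - 2, 3):
        amino_acid = CODON_TABLE[coding_sequence[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCCTTCTGA"))  # MAF (Met-Ala-Phe, then the TGA stop codon)
```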
Arabidopsis thaliana
Searching for related genes in Arabidopsis could not be easier. Over the past 20 years or so, efforts have been made to sequence its entire genome. That effort was completed last autumn, and scientists can now compare their DNA sequences with the Arabidopsis genome, looking for similarities. To search for an orthologue, all you have to do is go to a website, e.g. , and use the BLAST program. Simply type in your sequence and run the search; any matches in the database are displayed graphically, followed by a more informative text output giving the percentage identity of each match and the gene it matches. Here is an example of a match I performed earlier using a randomly typed DNA sequence.
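The first step BLAST takes, finding short exact "words" shared by the query and a database sequence before extending them, can be illustrated in Python. This is a heavily simplified sketch, not BLAST itself: it uses an assumed word size of 4, picks the alignment diagonal with the most word hits, and reports ungapped percentage identity, with no scoring matrix, gaps or statistics. Both sequences are hypothetical.

```python
from collections import Counter

def seed_hits(query, subject, word_size=4):
    """Pairs (i, j) where the query word starting at i occurs exactly at j in the subject."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    return [(i, j)
            for i in range(len(query) - word_size + 1)
            for j in index.get(query[i:i + word_size], [])]

def percent_identity(query, subject, diagonal):
    """Ungapped % identity with query base k laid against subject base k + diagonal."""
    pairs = [(query[k], subject[k + diagonal])
             for k in range(len(query)) if 0 <= k + diagonal < len(subject)]
    return 100 * sum(a == b for a, b in pairs) / len(pairs)

subject = "TTTTATGGCGTACCTTAGG"  # hypothetical database entry
query = "ATGGCATACCTTAGG"        # hypothetical query with one mismatch
hits = seed_hits(query, subject)
best_diagonal = Counter(j - i for i, j in hits).most_common(1)[0][0]
print(best_diagonal, round(percent_identity(query, subject, best_diagonal), 1))  # 4 93.3
```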
Bombyx mori (silk moth)
The search for potential orthologues is more difficult in the silk moth because, unlike Arabidopsis, its genome has not yet been fully sequenced. However, the partially completed database can still be searched at http://210.145.41.132/cgi-bin/lib_proc_MX. When I tried the same random sequence there, it returned no matches at all, and only 18 thousand sequences had been determined so far.
Because I cannot simply BLAST the sequence, I have to search for orthologues manually. This can be done in two ways, depending on whether or not the gene is expressed at a detectable level in the silk moth. To find out whether the gene is expressed, I would take samples of several tissues from the moth (e.g. muscle and neuronal tissue) and, using the protein predicted from the human gene as a reference, check for that protein by biochemical analysis of the gene products. This would only be a quick search, as the chances are that the gene is either not expressed or expressed so rarely that I will not be able to detect it.
If, however, I do find that the gene has an expressed orthologue, I can move on to determining its DNA sequence from the mRNA transcript and studying the similarities. First I would isolate the RNA from the cells, by lysis and centrifugation as described before. To separate the mRNA from the tRNA and rRNA, I would use a method that exploits the fact that most eukaryotic mRNAs carry a poly(A) tail. I would pass the RNA over a column of an inert material, probably agarose, to which oligonucleotides consisting entirely of dT residues have been attached. The poly(A) tails hybridise to these oligo(dT) molecules, so the mRNA sticks to the column while the rest of the RNA runs through. After thorough washing to remove all traces of the unbound RNA, the column is washed with a buffer of low ionic strength, which causes the mRNA hybrids to dissociate, and the purified mRNA is eluted. See Diagram.
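The logic of oligo(dT) selection can be shown as a toy Python model: molecules with a long enough run of A residues at the 3’ end bind, and everything else flows through. The minimum tail length of 15 and all three RNA sequences are assumptions made for illustration; real binding depends on hybridisation conditions, not a sharp cut-off.

```python
def poly_a_tail_length(rna):
    """Number of consecutive A residues at the 3' end of an RNA sequence."""
    return len(rna) - len(rna.rstrip("A"))

def oligo_dt_column(rna_molecules, min_tail=15):
    """Split RNA into what binds the oligo(dT) column and what runs through.

    min_tail is an assumed minimum tail length for stable binding."""
    bound = [rna for rna in rna_molecules if poly_a_tail_length(rna) >= min_tail]
    flow_through = [rna for rna in rna_molecules if poly_a_tail_length(rna) < min_tail]
    return bound, flow_through

mrna = "AUGGCCUUCUGA" + "A" * 20  # hypothetical mRNA with a poly(A) tail
trna_end = "GGCUCGUAGCCA"         # hypothetical tRNA-like 3' end (ends in CCA)
rrna = "GGCUACGUCCUAGCU"          # hypothetical rRNA fragment
bound, flow_through = oligo_dt_column([mrna, trna_end, rrna])
print(len(bound), len(flow_through))
```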
The next step is to take this mRNA, which hopefully includes a transcript of the gene I am searching for, and convert it into DNA. This is done by incubating the mRNA with an oligo(dT) primer, which anneals to the poly(A) tails and provides a primed 3’ end from which the enzyme reverse transcriptase copies the mRNA. The result is a collection of RNA-DNA hybrids. One way to remove the RNA and leave only double-stranded DNA is to use an enzyme (ribonuclease H) that nicks the RNA strand, and then add DNA polymerase, which gradually replaces the RNA nucleotides with DNA ones. This method is often used in order to obtain the missing 5’ end of a partial cDNA clone. Another method is to remove the RNA with alkali; the 3’ end of the single-stranded cDNA then folds back to form a hairpin loop, which acts as a primer for DNA polymerase to synthesise the complementary strand. The hairpin loop is then cut open with S1 nuclease. See Diagram:
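The base pairing behind cDNA synthesis can be made concrete with a short Python sketch. It only models the sequences, not the enzymes: reverse transcriptase makes a complementary, antiparallel DNA copy (with T in place of U), and the second DNA strand is complementary to that, so it ends up with the same sequence as the mRNA written as DNA. The mRNA is hypothetical.

```python
# Base added opposite each RNA base by reverse transcriptase (DNA has T, not U)
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna):
    """cDNA copied from an mRNA, written 5'->3'.

    The copy is antiparallel, so the mRNA is read backwards; the
    oligo(dT)-primed end (TTT...) therefore comes first."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))

def second_strand(cdna):
    """The complementary DNA strand made by DNA polymerase, written 5'->3'."""
    return "".join(DNA_PAIR[base] for base in reversed(cdna))

mrna = "AUGGCCUUCUGAAAAA"  # hypothetical mRNA with a short poly(A) tail
cdna = first_strand_cdna(mrna)
print(cdna)                 # TTTTTCAGAAGGCCAT
print(second_strand(cdna))  # ATGGCCTTCTGAAAAA, the mRNA sequence as DNA
```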
I would then use PCR to amplify the target gene sequence, using primers for both directions based on the sequence already obtained from the human gene. Although the match will not be exact, I should be able to obtain some product if I use a low enough annealing temperature and long enough primers. My primers would be about 20 bases long, because much shorter primers (such as 12 bases) are likely to anneal at many chance sites and increase the number of false positives. I could write a whole essay on PCR and its methods, but the basic idea is to amplify a region of DNA bounded by two primers that anneal to opposite strands of the denatured DNA. Adding a DNA polymerase (Taq polymerase, which is heat-stable and works best at about 74 °C) and free nucleotides creates a new strand of DNA complementary to each template. Heating then separates the strands, and the cycle repeats. See diagram
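Two pieces of arithmetic lie behind this: how often a primer would be expected to match purely by chance, and how fast PCR amplifies its target. This Python sketch assumes a random sequence with all four bases equally common, and the 500 Mb target length is an assumed figure for illustration, not a measured silk moth value. On that assumption a 12-base primer is expected to match by chance about 30 times, while a 20-base primer is expected to match by chance far less than once.

```python
def expected_chance_sites(primer_length, target_length):
    """Expected number of exact chance matches to a primer in a random sequence.

    Assumes one strand of target_length bases with all four bases equally
    common, so any particular n-base sequence turns up once per 4**n bases."""
    return target_length / 4 ** primer_length

def pcr_copies(start_copies, cycles, efficiency=1.0):
    """Target copies after a number of cycles; efficiency=1.0 is perfect doubling."""
    return start_copies * (1 + efficiency) ** cycles

TARGET_LENGTH = 500_000_000  # assumed amount of sequence, for illustration only
for primer_length in (12, 20):
    print(primer_length, expected_chance_sites(primer_length, TARGET_LENGTH))
print(pcr_copies(1, 30))
```

Each extra base in the primer divides the expected number of chance sites by four, which is why longer primers give fewer false positives.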
However, because single-stranded DNA is best for the Sanger method of DNA sequencing, I would slightly modify the PCR technique described above by using a limiting amount of one of the primers. This primer is used up after a few cycles, so no more double-stranded product can be made. The primer in excess, however, can still anneal to its template strand in every cycle, so single-stranded copies of one strand continue to accumulate. This is known as asymmetric PCR. It is then simply a case of sequencing the DNA as described before and comparing the sequences.
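The switch from exponential to linear amplification in asymmetric PCR can be shown with a toy Python model. It assumes perfect efficiency, treats primer amounts as hypothetical numbers of molecules, and ignores any partial cycle in which the limiting primer runs short. With the default numbers, the double-stranded product stops at 512 molecules after nine cycles, after which 512 new single strands are added every cycle.

```python
def asymmetric_pcr(cycles, template_copies=1, limiting_primer=1_000, excess_primer=100_000):
    """Toy model of asymmetric PCR with perfect efficiency.

    While both primers last, every double-stranded molecule is copied into two,
    using one of each primer. Once the limiting primer cannot supply a full
    cycle (partial cycles are ignored, a simplification), only the excess
    primer extends, making one new single strand per molecule per cycle."""
    double_stranded = template_copies
    single_stranded = 0
    for _ in range(cycles):
        if limiting_primer >= double_stranded and excess_primer >= double_stranded:
            limiting_primer -= double_stranded
            excess_primer -= double_stranded
            double_stranded *= 2      # exponential phase
        else:
            new_strands = min(double_stranded, excess_primer)
            excess_primer -= new_strands
            single_stranded += new_strands  # linear phase: single strands only
    return double_stranded, single_stranded

print(asymmetric_pcr(30))  # (512, 10752)
```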
More likely, the gene will not be expressed by the cell, or its product will be too hard to isolate. In that case the much harder task of creating a chromosomal library must be undertaken in order to isolate the gene. A chromosomal library is created by taking the genomic DNA of the organism and cutting it into small fragments using restriction endonucleases. The fragments are then inserted into a vector, e.g. a bacterial plasmid cut with the same restriction enzyme, and the recombinant DNA molecules are sealed using DNA ligase. These plasmids are then introduced into bacteria that have been made competent, and the bacteria are plated out onto a series of agar plates. Hopefully the entire genome will be represented within the library of fragments just created.
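Whether the whole genome is likely to be represented can be estimated with the Clarke-Carbon formula, N = ln(1 − P) / ln(1 − f), where f is the fraction of the genome carried by one insert and P is the desired probability that any given sequence is in the library. The Python sketch below uses an assumed 500 Mb genome and assumed 5 kb plasmid inserts purely for illustration; on those figures roughly 460,000 clones would be needed for a 99% chance of including the gene.

```python
import math

def clones_needed(genome_size, insert_size, probability=0.99):
    """Clarke-Carbon estimate of how many clones a library needs.

    N = ln(1 - P) / ln(1 - f), where f = insert_size / genome_size."""
    fraction = insert_size / genome_size
    return math.ceil(math.log(1 - probability) / math.log1p(-fraction))

# Assumed values, for illustration only: 500 Mb genome, 5 kb plasmid inserts
print(clones_needed(500_000_000, 5_000))
```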
Once the entire silk moth genome is present in this chromosomal library, the gene I am looking for should be present in one of the bacterial colonies. To find it, I would make a radiolabelled probe (a DNA sequence complementary to the human gene) and use it to screen a replica of the culture plates. See Diagram.
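The idea of screening colonies with a probe can be sketched in Python. This toy model only counts a colony as positive if its insert contains a perfectly complementary stretch on either strand; real hybridisation tolerates some mismatches, which is what allows a human probe to pick up a silk moth gene at low stringency. The colony names, inserts and probe are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    return "".join(PAIR[base] for base in reversed(seq))

def positive_colonies(inserts, probe):
    """Colonies whose insert (either strand) contains a stretch the probe pairs with."""
    target = reverse_complement(probe)  # the sequence the probe base-pairs with
    return [colony for colony, insert in inserts.items()
            if target in insert or probe in insert]

inserts = {                      # hypothetical library inserts, one strand shown
    "colony_1": "GGATCCTTAA",
    "colony_2": "CCATGGCCTTCTGAGG",
    "colony_3": "TCAGAAGGCCATGG",
}
print(positive_colonies(inserts, "GAAGGCC"))  # ['colony_2', 'colony_3']
```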
The major problem with chromosomal libraries is that, although a few colonies will light up, some may be false positives and others may contain recombinant DNA that does not include the entire gene because it was split during the cutting process. However, at least one colony should contain the complete gene, and the insert can be extracted from its plasmid using the same process as described for the human gene. Once the gene has been isolated, asymmetric PCR and DNA chain-termination sequencing can be used to analyse it for homology.
Checking The Expression Patterns of Proteins
There are many different methods for looking at the expression pattern of a protein. Some techniques require the nature of the protein to be determined from its DNA sequence first, whereas others rely on detecting earlier products of the gene, such as the mRNA transcripts.
The best way to approach this assay is to ask some basic questions. To determine an expression pattern we must find out where, when and for how long the protein is expressed. The question of where can be answered by assaying cells from many tissue types in each of the organisms concerned. Orthologues are quite likely to carry out the same kind of job in the other organisms, so if the gene is expressed in one kind of tissue at a particular time, looking at the related tissues in the other organisms gives a better chance of finding it.
When the gene is expressed is best determined by assaying cells of different tissue types at different stages of the organism’s development. Looking at the DNA sequence for specific activator or repressor sites may also give some clues as to when the gene is expressed.
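Scanning a sequence for a regulatory site amounts to searching for a consensus motif. As a stand-in, this Python sketch uses one commonly quoted consensus for the TATA box, TATAWAW (W meaning A or T); the TATA box is a core promoter element rather than an activator or repressor site, and a real search would need the consensus for whichever site is of interest. The upstream sequence is hypothetical.

```python
import re

# TATA box consensus TATAWAW, where W means A or T (IUPAC code).
# The lookahead (?=...) lets overlapping matches be found.
TATA_BOX = re.compile(r"(?=(TATA[AT]A[AT]))")

def find_tata_boxes(upstream):
    """Return 0-based start positions of TATA-box-like motifs in the sequence."""
    return [match.start() for match in TATA_BOX.finditer(upstream)]

print(find_tata_boxes("GCGCTATAAAAGGCGC"))  # [4]
```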
How long the gene is expressed for can be determined by using the following methods to detect gene expression and assaying the level of expression over time.
Studying the transcript of the cloned gene can give information about its expression. Most methods of transcript analysis involve hybridisation between the RNA transcript and a DNA fragment containing the relevant gene. The hybrids can be analysed in two ways: by electron microscopy, where the hybrid itself is looked for, or by using radiolabelled probes, as shown before, to see whether the gene is expressed.
Studying the translation product of the cloned gene can tell us how large the product is and how to look for it using other techniques. Hybrid-release translation (HRT) is used to make gene products in a cell-free translation system: the mRNA that hybridises to the cloned gene is released and translated in vitro, and the products are usually labelled by including a radioactively labelled amino acid. These products can then be run on a gel alongside a protein size ladder to determine the size of the gene product. Once the product can be made in this way, it would also be possible to find its absorbance at a particular wavelength. Cells could then be lysed and their proteins purified and run through an HPLC column attached to a UV detector (or a mass spectrometer, if the exact mass could be calculated) to see how much of the particular protein is being expressed. Gas chromatography (GC) would not be suitable, because proteins are far too large and non-volatile to pass through a GC column.
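Reading a product’s size off the ladder can be sketched in Python. On a denaturing (SDS) protein gel, which I am assuming here, the logarithm of molecular mass is roughly a straight-line function of migration distance over the useful range, so a line is fitted through the ladder bands and the product’s band is read off it. The ladder sizes and distances are hypothetical.

```python
import math

def fit_ladder(distances_mm, sizes_kda):
    """Least-squares straight line through (migration distance, log10 size)."""
    logs = [math.log10(size) for size in sizes_kda]
    n = len(distances_mm)
    mean_x = sum(distances_mm) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_mm, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_size(distance_mm, slope, intercept):
    """Read a band's size (kDa) off the fitted ladder line."""
    return 10 ** (slope * distance_mm + intercept)

# Hypothetical ladder: 200, 100, 50 and 25 kDa bands at 10, 20, 30 and 40 mm
slope, intercept = fit_ladder([10, 20, 30, 40], [200, 100, 50, 25])
print(round(estimate_size(25, slope, intercept), 1))  # size of a product band at 25 mm
```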
The only other way I can think of to determine the expression pattern in the different organisms is to use antibodies. These would be raised by challenging a rabbit with the protein in question. The rabbit would be left for several weeks, usually with booster injections, to produce antibodies in an immune response to the foreign protein. The antibodies could then be purified from its blood and coupled to an enzyme marker such as horseradish peroxidase (HRP). When the cells are plated out and exposed to the antibody, it will bind to the protein wherever the protein is being expressed. If the cells are then exposed to luminol (with hydrogen peroxide), the HRP catalyses a chemiluminescent reaction, so the cells expressing the gene light up.
All the above methods detect gene products and hence reveal the expression patterns in the organisms in question. I have not mentioned the use of deletions to identify the promoters and activators of the genes because, in eukaryotes, disrupting these sequences usually does fatal damage to the embryo, much more so than in prokaryotes.