There is a limit to the number of morphological characters that can be studied before the characters become too specific. Studies have shown that increasing the number of characters improves the accuracy of tree construction [2], but there is a point beyond which additional morphological characters add little value (because of their specificity), and the quality of the data becomes more important than the quantity. This limitation is compounded by the number of characters that are subject to homoplasy (similarity due to parallel or convergent evolution), which is especially prevalent among land plants. For example, the Gnetales were originally thought to be related to the angiosperms because of their net-like venation, the vessels in their wood and their possession of a structure interpreted as a precursor to the flower. This view was rejected when it was discovered that vessels arose independently several times in plants and that the precursors to flowers were the Amentiferae. Morphology is therefore a low-resolution and highly contestable basis for the construction of phylogenetic trees.
DNA sequencing provided a high-resolution, reliable method for constructing phylogenies. It has many advantages over morphology and avoids many of the problems inherent to it. Character conceptualisation is more straightforward for molecular data than for morphological data [2]. The characters available for classification are numerous, because they can be any nucleotide sequence that shows a steady rate of evolution. Because of the chemical nature of the analysis, the characters studied are well defined, which makes the genetic (rather than phenetic) analysis objective. There is no ambiguity that the unit of comparison is the nucleotide, or that adenine, thymine, guanine and cytosine represent different states of the same character. Subjectivity in the data is minimal, as aligned sequences have a defined length and base order on which to base comparisons. 5S rRNA was originally used to construct phylogenetic trees, but the data proved inconclusive: the molecule is only about 120 bases long and most of its sites were uninformative. This led scientists to study the larger ribosomal subunits, which provided the higher resolution needed, but, as with morphology, the best results were obtained from analyses of multiple characters. The introduction of multigene analysis has greatly increased the detail of the trees produced. Most DNA sequence analyses now include four or more genes, taken from different plant genomes (e.g. plastid and nuclear). One example is the use of the plastid genes rbcL, atpB and rps4 together with nuclear small-subunit ribosomal DNA to elucidate the basal elements of plant phylogeny [3].
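Multigene analysis of this kind amounts to joining several separately aligned genes into one combined matrix. The Python sketch below shows one minimal way to do this, assuming each gene has already been aligned on its own; the function name `concatenate`, the use of '?' for missing data and the toy sequences are illustrative assumptions, not the procedure used in [3].

```python
def concatenate(
    gene_alignments: dict[str, dict[str, str]],
) -> tuple[dict[str, str], dict[str, range]]:
    """Join separately aligned genes into one combined matrix (a "supermatrix").

    Taxa absent from a gene are padded with '?' (missing data). Returns the
    combined sequences and the column range occupied by each gene.
    """
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    combined = {taxon: "" for taxon in taxa}
    partitions: dict[str, range] = {}
    start = 0
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to a single length")
        length = lengths.pop()
        for taxon in taxa:
            # Missing taxa get a run of '?' so every row stays the same length.
            combined[taxon] += aln.get(taxon, "?" * length)
        partitions[gene] = range(start, start + length)
        start += length
    return combined, partitions


# Invented toy sequences, for illustration only (not real rbcL or 18S data).
toy_genes = {
    "rbcL": {"taxon_1": "ACGTAC", "taxon_2": "ACGTTC", "taxon_3": "ATGTTC"},
    "18S": {"taxon_1": "GGCA", "taxon_3": "GGTA"},  # taxon_2 not sequenced
}

combined_matrix, parts = concatenate(toy_genes)
for taxon, seq in combined_matrix.items():
    print(taxon, seq)
print({gene: (r.start, r.stop) for gene, r in parts.items()})
```

Recording where each gene sits in the combined matrix lets each gene (or genome) later be analysed with its own settings, which is one reason genes from different genomes can be combined in a single analysis.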
Homoplasy is largely avoided in sequence analysis because, in the majority of eukaryotes, DNA is inherited linearly from parent to offspring. Characters must therefore be related to that lineage of plants, and there is very little statistical probability that an exactly comparable character could have arisen independently in another lineage: even for 5S rRNA the chance of an exact match is 1 in 4^120, and even if an exact match is not required the odds remain astronomically small. Another advantage of the heritable nature of DNA is that specific markers exist which are common to all plants. Because these characters do not rely on being expressed in the phenotype, they allow analysis across families and of loss of function (the gene will still be present but inactivated).
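The scale of these odds can be checked with a short calculation. This Python sketch assumes, as a simplification not stated in the text, that every site is independent and that each of the four bases is equally likely; real sequences are not random, so the numbers indicate scale only. The helper `p_within` is hypothetical and counts the sequences lying within a given number of substitutions of one target sequence.

```python
from fractions import Fraction
from math import comb

SEQ_LENGTH = 120  # approximate length of 5S rRNA, as quoted in the text
N_BASES = 4       # A, C, G, T


def p_within(max_mismatches: int, length: int = SEQ_LENGTH) -> Fraction:
    """Probability that a random sequence lies within `max_mismatches`
    substitutions of one fixed target sequence.

    Assumes every site is independent and all four bases are equally likely.
    """
    # Sequences with exactly i mismatches: choose the i differing sites,
    # then one of the three alternative bases at each of them.
    matching = sum(
        comb(length, i) * (N_BASES - 1) ** i for i in range(max_mismatches + 1)
    )
    return Fraction(matching, N_BASES ** length)


print(f"4^{SEQ_LENGTH} = {N_BASES ** SEQ_LENGTH:.3e}")
print(f"P(exact match)       = {float(p_within(0)):.3e}")
print(f"P(>= 90% identity)   = {float(p_within(12)):.3e}")
```

Under these assumptions the first line prints `4^120 = 1.767e+72`. Relaxing the requirement to 90% identity (up to 12 mismatches) still leaves a vanishingly small probability, which is the point made above about near matches.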
There are, however, problems with the collection of DNA sequence data. The main problem is that, at present, DNA can only be obtained from extant species; the implications of this are discussed later. Obtaining ancient DNA from fossilised tissue is very difficult and prone to experimental contamination. The first ancient DNA sequence obtained from a plant fossil, 15-20 million years old, was reported in 1990 (Golenberg et al.). Although sequences had been identified, the microscopic amounts of DNA were subject to contamination by PCR products and the results were disputed. Smith [4] proposed the following guidelines for the acceptance of ancient DNA data:
- Amplification products should make sense.
- Other associated biomolecules should be well preserved.
- Results should be replicated by another independent lab.
So far, molecular studies of subfossil material have served only to confirm the placement of extinct taxa and have not influenced the relationships of extant species. This is an area where morphological analysis has the advantage, as it can provide much denser taxon sampling than sequence analysis and therefore much more information about the past. This is especially useful where a group has undergone a reduction in the number of extant species.
DNA sequence analysis also has certain advantages over morphology in the processing of the data. To draw any phylogenetic tree, the relationships between the different taxa must be assessed. If any complex analysis is to be undertaken, morphological characters have to be transformed from descriptions into numerical values that can be processed on a computer. Scotland [2] notes that there are currently nine different strategies for coding observations into discrete numerical codes for morphological cladistic analysis. This process is fraught with error, and even if the most accurate method is chosen, converting a subjective, continuous variable into a discrete one cannot be entirely accurate. DNA sequencing provides a direct numerical code for the analysis of characters based on the nucleotides (A, C, G, T = 0, 1, 2, 3). The only possible error in this conversion is human error, and current levels of automation largely remove even that.
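The coding step can be made concrete with a short Python sketch using the A, C, G, T = 0, 1, 2, 3 mapping given above. It assumes the sequences are already aligned; the function names and toy sequences are illustrative and not taken from any particular software package.

```python
# Mapping quoted in the text: A, C, G, T -> 0, 1, 2, 3
NUCLEOTIDE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(sequence: str) -> list[int]:
    """Convert an aligned DNA sequence into discrete numerical character states."""
    try:
        return [NUCLEOTIDE_CODE[base] for base in sequence.upper()]
    except KeyError as err:
        # Gaps ('-') and ambiguity codes (e.g. 'N', 'R') are not handled here.
        raise ValueError(f"unsupported symbol {err.args[0]!r}") from None


def character_matrix(alignment: dict[str, str]) -> dict[str, list[int]]:
    """Encode every taxon in an alignment; all sequences must be the same length."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return {taxon: encode(seq) for taxon, seq in alignment.items()}


# Invented toy alignment, for illustration only (not real data).
toy_alignment = {
    "taxon_1": "ACGTAC",
    "taxon_2": "ACGTTC",
    "taxon_3": "ATGTTC",
}

for taxon, states in character_matrix(toy_alignment).items():
    print(taxon, states)
```

Each column of the resulting matrix is one character and each integer one character state. The sketch deliberately rejects gaps and ambiguity codes such as N, which real alignments do contain and which need their own coding decisions.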
Once the data have been converted into a format ready for analysis, the algorithms used to construct the trees are themselves subject to scrutiny. Similar methods of analysis, such as maximum likelihood, maximum parsimony and bootstrapping, are used for both morphological characters and DNA sequences. The only advantage DNA sequence data have here is that, because all the characters share the same numerical basis, the algorithms can be simpler, trees can be constructed faster and the input data are more reliable. This does not mean the process is perfect for DNA: the massive amount of data generated exceeds the capacity of current analytical methods, and new algorithms are needed for interpreting and analysing character changes [5]. Pryer et al. [3] showed that discrepancies can arise from the use of different algorithms. When maximum likelihood was used, gymnosperms were resolved as monophyletic and Gnetum as sister to Pinus. However, maximum parsimony (which prefers the tree requiring the fewest state changes) resolved Gnetum as basal among seed plants, with all other gymnosperms monophyletic and sister to angiosperms. This may appear pedantic, but knowledge of relationships at this level of detail is vital for the accurate construction of a phylogenetic tree. The output also depends on the weight given to each character during analysis, which introduces subjectivity into sequence analysis. Without downweighting the third codon position of rbcL (because of apparent saturation), the analysis of bryophyte phylogeny would have been inconclusive [6].
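The parsimony criterion and the effect of character weighting can be illustrated with a small Python sketch. It uses Fitch's algorithm, a standard way of counting the minimum number of state changes a site requires on a fixed tree, and multiplies each site's count by a weight. The four-taxon data, the two candidate topologies and the weight of 0.5 for third codon positions are invented for illustration; real analyses, including those cited above, search over many topologies with dedicated software rather than scoring two fixed trees.

```python
from typing import Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a pair.
Tree = Union[str, tuple]


def fitch_site(tree: Tree, states: dict[str, int]) -> tuple[set[int], int]:
    """Fitch's small-parsimony pass for a single site.

    Returns the candidate state set at this node and the minimum number of
    state changes required in the subtree below it.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: keep both possibilities and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def weighted_parsimony_score(
    tree: Tree, matrix: dict[str, list[int]], weights: list[float]
) -> float:
    """Sum over sites of (minimum number of changes x site weight)."""
    n_sites = len(next(iter(matrix.values())))
    total = 0.0
    for site in range(n_sites):
        column = {taxon: states[site] for taxon, states in matrix.items()}
        _, changes = fitch_site(tree, column)
        total += weights[site] * changes
    return total


# Invented toy data: four taxa, six sites (two codons), coded A,C,G,T = 0,1,2,3.
toy_matrix = {
    "t1": [0, 2, 0, 3, 0, 2],
    "t2": [0, 2, 1, 3, 0, 3],
    "t3": [1, 2, 0, 1, 0, 2],
    "t4": [1, 2, 1, 1, 0, 3],
}
tree_a = (("t1", "t2"), ("t3", "t4"))
tree_b = (("t1", "t3"), ("t2", "t4"))

equal_weights = [1.0] * 6
# Hypothetical downweighting of the third codon positions (third and sixth sites).
third_downweighted = [1.0, 1.0, 0.5, 1.0, 1.0, 0.5]

for name, tree in (("tree_a", tree_a), ("tree_b", tree_b)):
    print(
        name,
        weighted_parsimony_score(tree, toy_matrix, equal_weights),
        weighted_parsimony_score(tree, toy_matrix, third_downweighted),
    )
```

With equal weights both trees need six changes, so the data cannot choose between them. Halving the weight of the two third-position sites gives scores of 4.0 and 5.0, and the first tree is preferred: a miniature version of how downweighting a saturated codon position can turn an inconclusive result into a resolved one.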
DNA sequence data are very good at providing information about the composition of a phylogenetic tree, but they are weaker at determining how that information is arranged on the tree, both in its rooting and in time. The problem of rooting is best described through an example [1]:
DNA sequence data place the conifers, angiosperms and Gnetales at certain distances from each other based on the divergence of their genes. However, the data do not indicate where the root (the point of divergence) lies, so two different phylogenies can be constructed from the same data.
To assess the position of the root accurately, an outgroup is needed: a taxon related to all three groups but which diverged from them long ago, providing a “template” against which changes in sequence can be compared; the rooting that requires the fewest changes is preferred. Hanseh, from whom this diagram is taken, used the liverwort Marchantia, a distant relative that diverged around 450 MYA. The problems with outgroups are that they are all subject to long-branch attraction artefacts, which reduce reliability, and that the choice of outgroup is subjective.
Because DNA sequencing can only reliably be used on extant species, tree construction runs into another obstacle: accurate assessment of divergence times and rooting is nearly impossible. The molecular clock provides some guidance, but it assumes a constant rate of nucleotide substitution, an assumption that is often made and very rarely tested [5], and it cannot account for the effects of rapid evolution and extinction. This is a major source of inaccuracy in land plant phylogenies based on DNA sequence data.
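The strict molecular clock and its calibration can be written in a few lines. Under a constant rate r (substitutions per site per million years), two lineages that split t million years ago accumulate a distance of about d = 2rt, so t = d / 2r, and r itself must be calibrated from a node whose age is fixed independently, for instance by a dated fossil. This Python sketch assumes distances are already corrected for multiple substitutions at the same site; the function names and all the numbers in it are hypothetical.

```python
def calibrate_rate(distance: float, calibration_age_myr: float) -> float:
    """Per-lineage substitution rate (substitutions/site/Myr) from a node of
    independently known age, e.g. one dated by a fossil.

    Under a strict clock both descendant lineages change after the split,
    so the pairwise distance is d = 2 * r * t.
    """
    return distance / (2.0 * calibration_age_myr)


def divergence_time(distance: float, rate: float) -> float:
    """Strict-clock estimate of the time since two lineages split, t = d / 2r (Myr)."""
    return distance / (2.0 * rate)


def expected_distance(rate_a: float, rate_b: float, time_myr: float) -> float:
    """Distance accumulated when each lineage evolves at its own rate."""
    return (rate_a + rate_b) * time_myr


# Hypothetical numbers, for illustration only.
rate = calibrate_rate(distance=0.20, calibration_age_myr=100.0)
print(f"calibrated rate: {rate:.4f} substitutions/site/Myr")

# A pair of lineages that really do evolve at the calibrated rate.
print(f"estimate: {divergence_time(0.06, rate):.1f} Myr")

# The same 30 Myr split, but one lineage evolves three times faster.
uneven = expected_distance(rate, 3 * rate, 30.0)
print(f"estimate with rate variation: {divergence_time(uneven, rate):.1f} Myr")
```

In the last case the true 30 Myr split is estimated at 60 Myr, the kind of error that arises when a constant rate is assumed but not tested.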
Fossil data based on morphological analysis answer many of these problems. Fossils provide direct evidence of rapid divergence and mass extinction, allowing absolute temporal calibration of the molecular clock. They also provide information that can “fill in the gaps” in the branching of the tree where DNA sequence data fail, for example when outgroup comparison is unsuccessful, resolving conflicts at the nodes. This allows groups to be rooted correctly and reduces the long-branch attraction artefacts that arise when rooting must be based on extant species alone. The ability to analyse extinct species provides key information about the topology of the tree and yields larger data sets across more taxa, leading to a more detailed tree.
Conclusions
DNA sequence data have not removed the need for morphological assessment of the relationships between land plants, but they have provided a new route of analysis that gives much higher resolution and statistically robust data from which new hypotheses can be drawn. It is reassuring that molecular analysis has not upset the overall topology of the tree; in fact, it has reaffirmed conclusions about groups and families drawn from morphological characters. Although morphology is limited in its robustness, it provides the key framework on which all DNA sequence analysis is hung, and the molecular data serve to augment and refine the work of 300 years of morphological study.
Bibliography
[1] Donoghue, M.J. & J.A. Doyle. 2000. Seed plant phylogeny: demise of the anthophyte hypothesis? Current Biology 10:R106-R109.
[2] Scotland, Olmstead & Bennett. (unpublished). Role of morphology.
[3] Pryer, K.M. et al. 2001. Horsetails and ferns are a monophyletic group and the closest relatives to seed plants. Nature 409:618-622.
[4] Smith, A.B. 1998. What does morphology contribute to systematics in a molecular world? Molecular Phylogenetics and Evolution 9:437-447.
[5] Soltis, P.S. & D.E. Soltis. 2001. Molecular systematics: assembling and using the tree of life. Taxon 50(3):663-678.
[6] Nickrent, D.L. et al. 2000. Multigene phylogeny of land plants with special reference to bryophytes and the earliest land plants. Molecular Biology and Evolution 17(12):1885-1895.
[7] Doyle, J.A. & M.J. Donoghue. 1987. The importance of fossils in elucidating seed plant phylogeny and macroevolution. Review of Palaeobotany and Palynology 50:63-95.
[8] Soltis, P.S. et al. 1998. Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402:402-409.