The Phylogenetic Handbook: A Practical Approach to DNA and Protein Phylogeny
Posted on: Tuesday, 31 January 2006, 06:00 CST
By Morrison, David A
The Phylogenetic Handbook: A Practical Approach to DNA and Protein Phylogeny. Marco Salemi and Anne-Mieke Vandamme (editors). 2003. Cambridge University Press, Cambridge, UK. 406 pp. ISBN 0-521-80390-X. US$75 (hardcover).
Data analysis is a tricky business for biologists, compared to the business of collecting data. For example, in laboratory work there are usually "gold standards" that have been developed, and these can be written down as an explicit protocol for anyone and everyone to follow. Even if you do not understand the protocol, you can still follow its instructions, secure in the knowledge that if you are even half-competent, then the results should be reliable. Thus, data can be collected to a high standard, in the sense that it is both accurate and precise, and this can be done reliably, in the sense that it is repeatable.
However, this happy scenario does not apply to data analysis. Once the data are reduced to numbers and letters, the scene changes dramatically. First, people become biologists because they want to cuddle koalas, not commune with computers. Therefore, there is usually less enthusiasm for analyzing data than there is for collecting it, and so less time gets spent doing it. Second, and much more important, there are no "gold standards" for data analyses. This is not for lack of trying, as many readers of this journal will attest, but rather it is from the nature of the beast. Data analysis simply isn't amenable to the development of laboratory-style protocols (the compendium of Baxevanis et al., 2002-2005, notwithstanding).
There are two basic problems with data analysis. In the first place, there are competing philosophies as to how data analysis is best approached. For example, there are frequentist, permutation, likelihood, and Bayesian analyses available for almost any data-analysis problem that you care to name, and this smorgasbord of choices won't necessarily all lead to the same conclusion. In the second place, your data may not meet the assumptions of the analysis that you have chosen, and the assumption violations may be enough to lead you astray when you are interpreting the results. Consequently, data analysis is often a trial-and-error affair, as various possibilities for analysis are explored and their outcomes evaluated. Describing this approach in a published paper requires more than just a passing reference to "so-and-so's method," as can be done when describing an established protocol.
This leads to a problem if you are trying to produce an introductory book about data analysis. Comprehensive books can go into as much detail as the author feels is appropriate, so that for phylogenetic analysis Felsenstein (2004) can provide an overview of (almost) all of the available data-analytic approaches, and Semple and Steel (2003) can derive the mathematics from first principles. However, few novices in the field of phylogenetic analysis are likely to start by reading these two books, and gene jockeys will get a bit of a shock from the lack of protocols in either book. That leaves a niche in the book-publishing world, as far as phylogenetic data analysis is concerned.
This niche is not easy to fill. Too much in the way of protocols will lead to poor data analyses, because there are no gold standards; and too much in the way of in-depth explanation will lead to confusion on the part of someone whose expertise is biology, not mathematics. Nevertheless, various attempts have been made to provide suitable books. At one end of the market are the "bioinformatics" books, which treat phylogenetic analysis as being one aspect of the more general field pertaining to analysis of molecular data. These include the books by Westhead et al. (2002), Claverie and Notredame (2003), and Lesk (2005). As far as phylogenetic analysis is concerned, these books vary from naive to downright misleading, although Claverie and Notredame (2003) manage neatly to walk the tightrope between information and the gushing writing style favored by the Dummies series of books. However, these books cannot do phylogenetic data analysis any justice at all; and I suspect that it is reliance on these types of books that led, in a manuscript that I recently refereed, to the claim that one part of a phylogenetic tree was rooted while another part of the same diagram was unrooted. It is enough to make you weep.
At the other end of the market is the book by Hall (2004), now in its second edition, which was alone until the current volume put together by Marco Salemi and Anne-Mieke Vandamme. These should be a distinct step up from the bioinformatics books, because (almost) the whole book can be devoted to phylogenetic analysis. Somewhere in between are books like those of Page and Holmes (1998) and Nei and Kumar (2000), which cover more than phylogenetic analysis but still do good service to the topic.
The current book is essentially a compendium of the written notes for attendees at the annual workshop on virus evolution and molecular epidemiology, run at the Katholieke Universiteit Leuven. So, it is a far more practically oriented book than the others listed above, aimed squarely at the novice user of phylogenetic analysis, particularly those people dealing with molecular data. As such, it is as much a workbook as a textbook, with each topic divided into both a Theory section and a Practice section (not always by the same authors). There is also a website, with links to the data sets and computer programs used.
Most of the chapters are written by acknowledged experts for each topic. The advantage of this approach is that an expert has an overview of the field that allows them to see which pieces of knowledge are essential and thus form the core of an understanding of the topic. The disadvantage is that there is no necessary reason why an expert should be a clear and careful communicator to the uninitiated. It is a fine balance, and unfortunately editors often have little control over which type of chapter their experts will produce. In this particular book we get both types, but on the whole the authors do a good job of communicating without misrepresentation. However, there are unfortunate lapses. To list one particularly egregious example, the authors are very inconsistent about the terminology of rooted versus unrooted trees, so that the words "cluster" and "clade" are used with several meanings. At one stage we are told that "a group of taxa that belong to the same branch have a monophyletic origin and is called a cluster," and so presumably a cluster can only be recognized on a rooted tree and is synonymous with a "clade"; and yet, later on, "clusters" are recognized on unrooted trees, and sometimes unrooted trees are referred to as having "clades." This sort of elementary error is unacceptable in a book of this nature: one should call a clade a clade.
Some parts of the book will seem familiar to knowledgeable readers, who have indeed seen them before, but mostly the chapters are not simply a re-hash of previous work but are original artefacts in their own right. Some of the Practice sections just provide practical advice for using the programs, while others provide more structured exercises. Both approaches are useful, but it does mean that the book is less coherent than it would be if it were an "authored" book rather than a "compiled" book.
The chapters are predominantly built around a particular computer program, which then defines the topic to be covered, rather than the other way around. Thus, the chapters tend to be written by the same people who wrote the particular computer program involved. This can potentially give a rather biased view of each topic, but in the Theory section of each chapter the authors generally provide a broader perspective. However, the depth and breadth of the coverage varies greatly between chapters, as does the length of the reference list, which provides the entrée into the literature. Only three of the chapters cite references later than 2000, which indicates that the manuscripts long predate the publication date of the book. This affects a few of the chapters, where things have progressed since they were written (e.g., AIC might now be preferred to LRT when choosing among nucleotide models, and iterative sequence alignment would receive more emphasis), and the choice of topics might now be different (e.g., some discussion of Bayesian analysis, as well as the analysis of combined data sets).
As far as topics are concerned, the expected subject matter is there. For example, there are chapters on nucleotide-substitution models (by Korbinian Strimmer and Arndt von Haeseler) and how to choose among them in practice (by David Posada). There are chapters on how to build trees via parsimony (by David Swofford and Jack Sullivan), distances (by Yves Van de Peer), and maximum likelihood (by Arndt von Haeseler and Korbinian Strimmer). They are all well written, although not all of them are for the mathematically faint-of-heart. There are also chapters on sequence databases (by Guy Bottu and Marc Van Ranst) and data exploration (by Xuhua Xia and Zheng Xie), and two chapters on protein-coding sequences (one by Fred Opperdoes and one by Yoshiyuki Suzuki and Takashi Gojobori). The chapter on data exploration is particularly welcome, as it emphasizes the important point with which I started this review.
Phylogenetic analysis is more than just tree-building, and the book does a good job of emphasizing this idea. It contains a chapter on phylogenetic networks (by Vincent Moulton), which generalize the tree model to allow reticulations, as well as a chapter on detecting recombination (by Mika Salminen), which is one of the major causes of reticulate phylogenies. There is also a chapter on population genetics (by Mary Kuhner), which discusses the use of a phylogenetic perspective when estimating population parameters (because a coalescent tree is only a phylogenetic tree in reverse).
The issue of sequence alignment might be the most thorny one in all of phylogenetics, because it involves the fundamental problem of assigning homologies among the observed characters (and states). However, you would never know this from reading scientific papers, where the sequences are more often than not merely fed into a black box and the output uncritically accepted. In this regard, it is worthy of note that the creators of far and away the most popular black box, CLUSTAL, have always been among the most vocal at pointing out the limitations of their particular box. Indeed, they have also been active contributors to various attempts to provide better boxes. Des Higgins has been the most prominent of these people, and he summarizes some of this work in his chapter here. Unfortunately, sequence alignment is still seen as basically being a problem in mathematical optimization of some form of similarity measure. This may be a useful perspective for sequence comparison (e.g., database searches, molecular modeling, and prediction), which is the field where all of the active work on sequence alignment has arisen, but I am not convinced that it is the best approach to creating an evolutionary alignment. Homology ≠ similarity (although more than one author in this volume does not seem to realize this), and yet homology = similarity is the principle on which most sequence-alignment programs are based. For some reason, molecular phylogeneticists are pheneticists when assessing primary homology (sequence alignment) and then become cladists when assessing secondary homology (tree-building). The question is: which is Dr. Jekyll and which is Mr. Hyde?
All in all, this book is a good contribution to the niche that it tries to occupy. It is wide-ranging enough to be a worthwhile introduction to phylogenetic analysis, without getting lost in too much detail. It betrays its origins as a workshop companion, but that does not have to be a weakness, because the book can stand on its own. Nevertheless, some hands-on practical guidance by a tutor of some sort would definitely be a useful complement to the book. There are quite a few typographical errors, sometimes in the tables and figures, with bits of the figures missing or misaligned (the separate color section repeats some of these figures correctly). A second edition could presumably address these issues, along with the "cluster" and "homology" problems.
REFERENCES
Baxevanis, A. D., D. B. Davison, R. D. M. Page, G. A. Petsko, L. D. Stein, and G. D. Stormo (eds). 2002-2005. Current protocols in bioinformatics. John Wiley & Sons, Hoboken, New Jersey.
Claverie, J.-M., and C. Notredame. 2003. Bioinformatics for dummies. Wiley Publishing, Hoboken, New Jersey.
Felsenstein, J. 2004. Inferring phylogenies. Sinauer Associates, Sunderland, Massachusetts.
Hall, B. G. 2004. Phylogenetic trees made easy: A how-to manual, 2nd edition. Sinauer Associates, Sunderland, Massachusetts.
Lesk, A. M. 2005. Introduction to bioinformatics, 2nd edition. Oxford University Press, Oxford, UK.
Nei, M., and S. Kumar. 2000. Molecular evolution and phylogenetics. Oxford University Press, Oxford, UK.
Page, R. D. M., and E. C. Holmes. 1998. Molecular evolution: A phylogenetic approach. Blackwell Science, Oxford, UK.
Semple, C., and M. Steel. 2003. Phylogenetics. Oxford University Press, Oxford, UK.
Westhead, D. R., J. H. Parish, and R. M. Twyman. 2002. Instant notes: Bioinformatics. BIOS Scientific Publishers, Oxford, UK.
David A. Morrison, Department of Parasitology (SWEPAR), National Veterinary Institute and Swedish University of Agricultural Sciences, 751 89 Uppsala, Sweden; E-mail: David.Morrison@bvf.slu.se
Copyright Society of Systematic Biologists Dec 2005
Source: Systematic Biology