2014 — 2017 |
Steel, Mike; McMahon, Michelle; Zwickl, Derrick; Sanderson, Michael; Stamatakis, Alexandros |
N/A |
Terraces, Large Phylogenetic Trees, and Trait Evolution
Scientists are using massive data sets of DNA sequences and new computing technologies to place all 2 million biological species into a grand synthesis, the evolutionary tree of life. Evolutionary trees permit predictions about the traits that organisms possess based on traits present in closely related species. This has important practical benefits in areas like medicine, conservation biology, and crop improvement. Although building small trees from DNA from just a few genes is straightforward, many obstacles remain to scaling up this effort to all life. One of the most serious is missing data. Under some conditions, even a few gene sequences missing for a handful of species can produce evolutionary trees that lack detailed resolution. This grant is a collaboration among biologists, mathematicians and computer scientists to understand the circumstances in which missing data cause problems, and to develop methods to overcome them. The research will generate mathematically provable results and new software to help biologists build more reliable large evolutionary trees. These products will be tested with real-world data sampled from flowering plants, one of the most diverse branches of the tree of life, with over 250,000 species. The high-quality evolutionary trees generated by this project will meet a societal need by providing clear evolutionary knowledge to inform conservation planning and prioritization. The research team will organize a workshop for graduate students around the country to train them in the use of these new tools, and will develop educational materials for high school students involving computer visualization of the tree of life.
The construction of very large phylogenetic trees from sequence data mined from databases can be challenging because of the recently discovered problem of terraces--potentially vast regions in "tree space" in which all trees have precisely the same optimality score due to missing data. This research focuses first on developing a better conceptual understanding of four problematic impacts of terraces on large tree construction: (i) increased ambiguity, (ii) biased confidence assessments obtained from bootstrapping or Bayesian posterior probabilities, (iii) impediments to tree search algorithms, and (iv) downstream effects on comparative inferences that rely on trees. Next, the research will develop analytical methods and software implementations to overcome these problems. Finally, it will test these new methods by applying them to 27 large-scale phylogenies newly constructed within flowering plants, each with 1000+ species, examining trait evolution, as an exemplar of downstream comparative inference, in a subset of these.
|
0.915 |