1994 — 2000 |
Warnow, Tandy |
NSF Young Investigator: Computational Problems in Evolutionary Tree Construction @ University of Texas At Austin |
0.915 |
1994 — 2001 |
Warnow, Tandy Roos, David Ewens, Warren Searls, David |
Statistical and Computational Methods For Data Management and Analysis in Molecular Genetics @ University of Pennsylvania
9413215 Ewens This award provides support for an interdisciplinary training program in computational aspects of modern genetics. Topical areas include algorithm development, molecular evolution, sequence alignment, statistical inference, and other aspects of biological informatics. These areas are central to one of the most multidisciplinary approaches to modern biology. The faculty group includes 19 investigators from the departments of computer and information science in the school of engineering, the departments of mathematics and biology in the school of arts and sciences, and the departments of genetics and human genetics in the school of medicine. The group includes junior and senior investigators, several of international reputation. The training program will center around 8 new or existing core courses, additional new elective courses, a journal club, and seminars. Students are expected to enter with backgrounds in computer science, mathematics, or biology. Training will emphasize molecular biology and genetics, computer science, statistics, and probability, but will be tailored to the needs of individual students, and an optional industrial internship program is planned. This award is also being supported by the Computational Biology Activity and by the Database Activity in the Biological Sciences, both in the Division of Biological Instrumentation and Resources.
|
0.951 |
2000 — 2003 |
Warnow, Tandy Vin, Harrick (co-PI) Burger, Doug Keckler, Stephen (co-PI) Dhillon, Inderjit (co-PI) |
CISE Research Instrumentation @ University of Texas At Austin
EIA-9985991 Doug Burger University of Texas-Austin
CISE Research Instrumentation
The Department of Computer Science at the University of Texas will purchase a cluster of high performance, multigranular workstations, dedicated to support research in computer and information science and engineering. The equipment will initially be used to support four research projects in the department.
The first project will use the cluster for detailed evaluations of future high-performance microprocessor designs. The second project will use the cluster to improve algorithms for reconstructing large evolutionary trees, with applications in evolutionary biology, pharmaceutical design, and linguistics. The third project will use the cluster to improve I/O-intensive data mining simulations using numerical techniques. The fourth project will develop software support for scientific cluster-based computing, including support for effective heterogeneous job scheduling, low-overhead application fault-tolerance, and run-time dynamic load balancing.
The cluster will consist of heterogeneous machines, with varied quantities of memory, disk, and processors per machine. We will support several types of jobs submitted through a common interface: uniprocessor jobs, shared-memory parallel jobs, PVM message-passing jobs, and distributed-shared memory jobs. Our long-term goal is to build a centralized but scalable resource that can meet the needs of numerous scientific workloads concurrently.
|
0.915 |
2001 — 2009 |
Warnow, Tandy Bull, James (co-PI) Jansen, Robert (co-PI) Hillis, David (co-PI) Linder, C. Randal |
Collaborative Research: ITR/AP: Reconstructing Complex Evolutionary Histories @ University of Texas At Austin
EIA-0121680 Warnow, Tandy J University of Texas at Austin
Collaborative Research: ITR/AP: Reconstructing Complex Evolutionary Histories
Reconstruction of the evolutionary history of a group of organisms has changed the face of biology and is being used increasingly in drug discovery, epidemiology, and genetic engineering. Unfortunately, such reconstructions typically involve solving difficult optimization problems, so that even moderately large datasets can require months to years of computation. In addition, almost all evolutionary reconstructions presently assume that the historical pattern is one of strict divergence that can be represented by a binary tree. This assumption is frequently violated, especially in plants, which often hybridize readily and thus produce networks of relationships.
This project brings together computer scientists and biologists from two institutions to develop new models and algorithms to address these two problems. Successful completion of this project will have an enormous impact by providing tools for reconstructing phylogenies of large datasets, and the first tools for inferring network models of evolution appropriate to hybridizing speciation. Such network models will alter how biologists think about speciation, while the development of methods for large-scale analyses will strongly benefit medical and pharmaceutical practice. Information technology will be advanced in fundamental ways as well, as the project will demonstrate how algorithm design and high-performance algorithm engineering can jointly solve very difficult discrete optimization problems.
|
0.915 |
2001 — 2007 |
Warnow, Tandy Jansen, Robert Raubeson, Linda |
Comparative Chloroplast Genomics: Integrating Computational Methods, Molecular Evolution, and Phylogeny @ University of Texas At Austin
0120709 Jansen, Raubeson, and Warnow A Biocomplexity grant has been awarded to an interdisciplinary team of researchers from the University of Texas, Central Washington University, the University of New Mexico, the DOE Joint Genome Institute, and Penn State University to undertake comparative evolutionary analyses of complete chloroplast genomes from more than 50 representative land plants. The team of four biologists (Jansen, Raubeson, Boore and dePamphilis) and five computer scientists (Warnow, Moret, Bader, Sankoff and Miller) will address a number of important issues in three areas at the intersection of biology and computer science: phylogeny of land plants, chloroplast genome evolution, and computational genomics. Fifty-five complete genomic sequences will be generated (greatly augmenting the 10 or so now known), new computational approaches for examining relationships using genomic data will be designed and implemented, and bioinformatic tools and resources for genomics will be developed. Then, the data and approaches will be used to study the relationships of plants and the patterns and processes of mutation as they affect the chloroplast genome. These results will be made available to both the scientific and lay communities. In addition, students in the fields of computational biology, bioinformatics, phylogenetic analysis, and genomics will be trained. Understanding relationships among organisms is an essential prerequisite for all areas of biological science, including such diverse fields as ecology, evolution, forensics, medicine, and molecular biology. Land plants, the focus of this study, include over 300,000 species and form the basis of terrestrial ecosystems. The phylogenetic history of this important group of organisms, only imperfectly understood, will be clarified by this research. This project also will make major contributions to our understanding of the mutational mechanisms and evolutionary processes acting within the chloroplast genome.
This genome contains genes essential to plant function; studying its evolution should provide basic information of fundamental importance to plant scientists. Finally, this project will have important implications for computational biology, one of the fastest growing fields of science today. This includes the development and testing of new algorithms in comparative genomics, such as those for analyzing gene-order changes, that will increase the scope of theoretical computational biology. All software developed by the team will be made freely available.
|
0.915 |
2001 — 2008 |
Jansen, Robert (co-PI) Hillis, David Warnow, Tandy Gutell, Robin Linder, C. Randal |
IGERT: Computational Phylogenetics and Applications to Biology @ University of Texas At Austin
Phylogenetics, the study of the relationships among genes, individuals, populations, and species, forms the basis for all of comparative biology. This IGERT grant will support a comprehensive, interdisciplinary graduate training program in Computational Phylogenetics and Applications to Biology. The program involves 27 faculty participants from the computational and biological sciences at the University of Texas at Austin, and it will support 12 graduate trainees each year for five years. Two major research areas will be emphasized: computational phylogenetics and applied phylogenetics. Phylogenies provide a fundamental framework for all of biology, and present the computational scientist with many technical challenges. Computational phylogenetics is concerned with the computational aspects of phylogenetic inference, and applied phylogenetics uses estimated phylogenies to address a wide diversity of biological questions. The training program will involve a series of new and existing courses and seminars, a summer training program for students from underrepresented areas of science, co-advisement of each graduate student by one computational and one biological faculty participant, placement of students into well-established research groups in biology and computer science, participation in spring recruitment conferences and fall phylogenetics retreats, and opportunities for internships in the bioinformatics industry, national laboratories, and non-government organizations.
The goals of this project are: (i) design and implement an interdisciplinary training curriculum for graduate students across computational and biological sciences that prepares students to understand and contribute to both sides of computational biology; (ii) stimulate interdisciplinary graduate research and interdisciplinary interactions in general between computational scientists and biological scientists that will lead to development and testing of novel approaches to unsolved problems in phylogenetics and their application to problems in biology; (iii) prepare trainees for their careers beyond graduate school and help them achieve visibility in the larger research community; and (iv) evaluate and improve the program in computational and applied phylogenetics to ensure its success beyond the proposed IGERT project. This program will create a unique collaborative environment for graduate students and faculty from the computational and biological sciences.
IGERT is an NSF-wide program intended to meet the challenges of educating Ph.D. scientists and engineers with the multidisciplinary backgrounds and the technical, professional, and personal skills needed for the career demands of the future. The program is intended to catalyze a cultural change in graduate education by establishing new, innovative models for graduate education and training in a fertile environment for collaborative research that transcends traditional disciplinary boundaries. In the fourth year of the program, awards are being made to twenty-two institutions for programs that collectively span all areas of science and engineering supported by NSF. The intellectual foci of this specific award reside in the Directorates for Biological Sciences; Computer and Information Science and Engineering; and Education and Human Resources.
|
0.915 |
2001 — 2005 |
Amenta, Annamaria (co-PI) Warnow, Tandy Hillis, David (co-PI) |
ITR/AP: Collaborative Research: Exploring the Tree of Life @ University of Texas At Austin
0121682 and 0121651 Amenta, Hillis, and St. John Defining and understanding the evolutionary relationships among species is fundamental to contemporary biology and the application of the comparative method in the life sciences. The results of such evolutionary research can be represented by a branching sequence of relatedness among species known as a phylogeny. Because of the geometric resemblance of a phylogeny to the branches of a tree, a phylogeny can be thought of as a tree of life. The proposed collaborative research by biologists and computer scientists at University of Texas-Austin and at CUNY-Lehman College in New York will provide specialized visualization and data mining tools to facilitate creation of a "Tree of Life" for all living organisms on the earth. This includes the development and refinement of algorithms to visualize and analyze multiple complex data sets for large numbers of species. More specifically, this project will: (1) integrate biological data through visualization and clustering techniques developed by computer scientists, and (2) apply these tools to taxa which comprise very large numbers of species with topologically complex and varied tree structures. The interdisciplinary team of biologists and computer scientists will integrate their newly developed software with existing computational tools in systematic biology, and make them freely available to and easily used by the scientific community. The project involves substantive efforts to provide undergraduates and students from under-represented groups with the opportunity to collaborate with scientists throughout the academic year and summer.
|
0.915 |
2003 — 2009 |
Warnow, Tandy |
ITR: Collaborative Research, Algorithms For Inferring Reticulate Evolution in Historical Linguistics @ University of Texas At Austin
With National Science Foundation support, Dr. Tandy Warnow and Dr. Donald Ringe will conduct three years of linguistic research aimed at recovering the evolutionary history of families of languages. Traditional models and methods assume that languages evolve in a bifurcating manner, in which case trees are appropriate graphical models of language evolution. Earlier work by the project leaders Warnow and Ringe (also funded by NSF) produced accurate computational methods for inferring evolution when the evolution of the languages is tree-like and led to advances in the understanding of how the Indo-European family evolved. However, a full resolution of the evolution of Indo-European (and of other language families) requires methods and models that can reconstruct evolutionary histories that are not treelike.
The project combines research in computer science algorithms, statistical inference and modeling, and historical linguistics to produce effective means for reconstructing evolutionary histories of language families. It also draws on related work in molecular evolution. The project thus impacts several fields, and it will bring novel techniques and research methods to the field of historical linguistics. The open source software tools developed during this project will impact other fields as well, and in particular will help population geneticists and anthropologists resolve questions related to human origins and migrations. The project will include workshops in historical linguistics and will thus provide training in this multidisciplinary area to a broad group of researchers.
|
0.915 |
2007 — 2014 |
Warnow, Tandy Pingali, Keshav (co-PI) Linder, C. Randal |
Collaborative Research: Large-Scale Simultaneous Multiple Alignment and Phylogeny Estimation @ University of Texas At Austin
In this project, a team of investigators will develop new algorithms and software to simultaneously align DNA sequences and reconstruct phylogenetic trees. This methods- and theory-oriented project addresses an important problem in phylogenetic reconstruction: relatively poor performance of existing tools in the face of insertions, deletions, and duplications in large datasets. This project will develop a simultaneous approach to DNA sequence alignment and phylogenetic analysis that will allow researchers to overcome these problems. Specific goals for the project will be to develop a portal and open-source software for simultaneous alignment and phylogenetic analysis, develop new simulators to model DNA sequence evolution, establish a working group on alignment methods with the Assembling the Tree of Life (AToL) community, and develop training programs in alignment and phylogeny estimation with outreach activities to minority institutions. The project includes many members of the Cyberinfrastructure for Phylogenetic Research (CIPRES) project and will provide significant new analytic capabilities for that data resource.
By making simultaneous alignment and phylogenetic analysis feasible for very large datasets, this project will provide software tools that will serve a broad community of researchers conducting phylogenetic analyses of DNA sequence data. These tools will enable consideration of DNA regions for phylogenetic analysis that cannot be aligned using existing tools. An open-source portal interface will open multiple sequence alignment and tree-building to a broader range of users, and engagement of existing AToL users will provide input and evaluation early in the software development process.
|
0.915 |
2011 — 2016 |
Pingali, Keshav (co-PI) Warnow, Tandy Linder, C. Randal |
Collaborative Research: Novel Methodologies For Genome-Scale Evolutionary Analysis of Multi-Locus Data @ University of Texas At Austin
Rice University, the University of Michigan, and the University of Texas at Austin are awarded collaborative grants to develop and implement algorithms and software tools for the analysis of gene genealogies and inference of species phylogenies from them. A gene genealogy, also known as gene tree, models how genes replicate and get transmitted from one generation to the next during evolution. A species phylogeny models how species arise and diverge. A species phylogeny is traditionally inferred by a three-step process: (1) a genomic region from the set of species under study is sequenced; (2) a "gene tree" is inferred for the genomic region; and, (3) the gene tree is declared to be the species tree. However, recent evolutionary genomic analyses of various groups of organisms have demonstrated that different genomic regions may have evolutionary histories that disagree with each other as well as with that of the species. Further, evolutionary processes such as horizontal gene transfer result in network-like, rather than tree-like, species phylogenies. This joint project will develop accurate computational methods for determining the causes of gene tree discordance, and inferring species phylogenies (trees as well as networks) from gene trees despite their discordance. Special emphasis will be put on the efficiency of the methods so that they allow for analysis of genome-scale data sets. All methods will be implemented and extensively tested for performance.
All methods developed will be made publicly available in software packages that we have been developing in the respective groups. The material will be integrated into courses that the PIs regularly teach at their respective institutions. Last but not least, the project will culminate with a two-day workshop, open to students and post-doctoral fellows from around the country, with presentations by the investigators on the methodologies developed, as well as hands-on tutorials on using the tools in analyzing data.
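A toy illustration of why discordant gene trees need not defeat species tree estimation: for three taxa under the multispecies coalescent (incomplete lineage sorting only, no gene flow), the most probable rooted gene tree topology is the one matching the species tree, so a majority vote over gene trees is statistically consistent in that narrow setting. The sketch below assumes gene trees are already estimated and encodes each by its cherry (the pair of sister taxa); the function name and the data are hypothetical, and this is not the method developed in this project, which also targets networks arising from processes such as horizontal gene transfer.

```python
from collections import Counter


def species_triplet_from_gene_trees(gene_trees):
    """Return the most frequent rooted topology among three-taxon gene trees.

    Each gene tree is encoded by its cherry, the pair of sister taxa:
    ("A", "B") stands for ((A,B),C). Ties go to the cherry seen first,
    because Counter.most_common keeps first-encountered order among equal counts.
    """
    counts = Counter(frozenset(cherry) for cherry in gene_trees)
    cherry, _count = counts.most_common(1)[0]
    return cherry


# Hypothetical gene trees from five loci: three agree, two are discordant.
genes = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "B"), ("A", "C")]
print(sorted(species_triplet_from_gene_trees(genes)))  # ['A', 'B']
```

Declaring any single gene tree to be the species tree (the traditional three-step process) would return the wrong answer here for two of the five loci.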
|
0.915 |
2015 — 2018 |
Warnow, Tandy |
ABI Innovation: New Methods For Multiple Sequence Alignment With Improved Accuracy and Scalability @ University of Illinois At Urbana-Champaign
Multiple sequence alignment (MSA) is one of the most basic bioinformatics steps, in which a set of molecular sequences (i.e., DNA, RNA, or amino acid sequences) is arranged inside a matrix to identify corresponding positions. MSA calculation is a fundamental first step in many biological analyses. Because of its broad applicability and importance, many MSA methods have been developed and are in wide use today. Unfortunately, many real world biological datasets have features (large size and fragmentary sequences, for example) that make accurate MSA calculation very difficult. Because poorly estimated alignments result in errors in downstream biological analyses, new MSA techniques are needed that can produce accurate alignments on difficult datasets. This project will develop MSA methods that have greatly improved accuracy and that can analyze the large and heterogeneous sequence datasets being assembled in different biology projects nationally. The project also has a substantial outreach component to women's colleges and minority serving institutions, and summer software schools to train biologists in the use of the project software.
Multiple sequence alignment (MSA) and phylogeny estimation are two very basic bioinformatics problems, which sit at the intersection of machine learning, statistical estimation, and evolutionary and structural biology. MSA has particular importance in constructing evolutionary trees, understanding the function and structure of proteins, detecting interactions between proteins, and even genome assembly. Large-scale MSA and phylogeny estimation also require high performance computing and parallel algorithms, in order to provide adequate scalability. The team will develop new machine learning techniques to greatly improve MSA methods, and hence also phylogeny estimation, since it depends on accurate multiple sequence alignments. The core of this project is algorithm development, utilizing a variety of machine learning techniques (including Hidden Markov Models), statistical estimation methods (especially Bayesian MCMC and maximum likelihood), and novel algorithmic strategies, all focused on improving scalability and accuracy. More information about the project can be found at: http://tandy.cs.illinois.edu/MSAproject.html
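To make the matrix picture concrete, here is a minimal sketch, assuming an alignment stored as equal-length gapped strings with '-' as the gap character (the function name `aligned_positions`, the row names `s1` to `s3`, and the sequences are hypothetical). Each column of the matrix records which residue of each input sequence it holds, which is exactly the set of "corresponding positions" an MSA asserts. It does not compute an alignment; producing the gapped rows is the hard problem this project addresses.

```python
def aligned_positions(alignment):
    """Map each alignment column to the residue index it holds in each row.

    `alignment` maps a sequence name to its gapped row ('-' marks a gap).
    A column's entry is None for rows that have a gap in that column.
    """
    lengths = {len(row) for row in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all rows of an MSA must have the same length")
    (ncols,) = lengths
    next_index = {name: 0 for name in alignment}  # next ungapped residue per row
    columns = []
    for j in range(ncols):
        column = {}
        for name, row in alignment.items():
            if row[j] == "-":
                column[name] = None
            else:
                column[name] = next_index[name]
                next_index[name] += 1
        columns.append(column)
    return columns


msa = {
    "s1": "ACG-T",
    "s2": "A-GCT",
    "s3": "ACGCT",
}
cols = aligned_positions(msa)
print(cols[2])  # {'s1': 2, 's2': 1, 's3': 2}
```

Column 2 says that the third residue of `s1`, the second residue of `s2`, and the third residue of `s3` are homologous, even though they sit at different offsets in the unaligned sequences.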
|
0.915 |
2015 — 2019 |
Chekuri, Chandra (co-PI) Warnow, Tandy |
AitF: Full: Collaborative Research: Graph-Theoretic Algorithms to Improve Phylogenomic Analyses @ University of Illinois At Urbana-Champaign
Understanding the history of life on earth (how species evolved from their common ancestor) is a major goal of biological research. These evolutionary trees are very hard to construct with high accuracy, because nearly all of the most accurate approaches require solving computationally hard optimization problems. Furthermore, research has shown that the evolutionary tree for a single gene can be different from the evolutionary tree for the species, and current methods do not provide adequate accuracy on genome-scale data. As a result, large evolutionary trees, covering big portions of "The Tree of Life", are very difficult to compute with high accuracy. This project will develop methods that can enable highly accurate species tree estimation. The key approach is the development of novel divide-and-conquer strategies, whereby a dataset is divided into overlapping subsets, species trees are constructed on the subsets, and then the subset species trees are merged together into a tree on the full dataset. These approaches will be combined with powerful statistical estimation methods, to potentially transform the capability of evolutionary biologists to analyze their data. This project will also provide open source software for the new methods that are developed, and provide training in the use of the software to biologists at national meetings. The project will also contribute to interdisciplinary training for two doctoral students, one at Illinois and one at Berkeley, and course materials for computational biology will be made available online.
Understanding evolution, and how it has operated on species and on genes, is a major part of biological data analysis. Statistical estimation approaches often provide the best accuracy, but cannot scale to dataset sizes that are required for modern biology. In addition, species tree estimation is challenged by the heterogeneity of evolutionary trees across the genome, and no current methods are able to provide highly accurate species trees for genome-scale data. These challenges make it essential that new methods be developed in order to make highly accurate large-scale evolutionary tree estimation possible under these complex evolutionary scenarios. This project will develop novel algorithmic strategies to address three key problems: supertree estimation, species tree estimation in the presence of gene tree heterogeneity, and scaling statistical methods to large datasets. In addition to developing graph-theoretic algorithms, the project team will establish mathematical guarantees for these methods using chordal graph theory and probabilistic analysis, under stochastic models of gene and sequence evolution.
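The divide step of the divide-and-conquer strategy described in this abstract can be pictured with a deliberately simple sketch: cut an ordered list of taxa into windows of at most `max_size` taxa, with consecutive windows sharing `overlap` taxa so that a later merge step has common taxa to align the subset trees on. The function and its parameters are hypothetical; decompositions used in practice are typically guided by an estimated tree rather than by list order, and the merge step (supertree construction) is not shown.

```python
def overlapping_subsets(taxa, max_size, overlap):
    """Cover `taxa` with windows of at most `max_size` taxa.

    Consecutive windows share exactly `overlap` taxa; the last window is
    truncated if the list runs out. Illustrative only: order-based, not
    tree-guided.
    """
    if not 0 <= overlap < max_size:
        raise ValueError("need 0 <= overlap < max_size")
    step = max_size - overlap
    subsets = []
    start = 0
    while True:
        subsets.append(taxa[start:start + max_size])
        if start + max_size >= len(taxa):  # this window reached the end
            break
        start += step
    return subsets


parts = overlapping_subsets(list("ABCDEFGHIJ"), max_size=4, overlap=1)
print(parts)
# [['A', 'B', 'C', 'D'], ['D', 'E', 'F', 'G'], ['G', 'H', 'I', 'J']]
```

In a full pipeline, a tree would be estimated on each subset and the shared taxa (here D and G) would constrain how the subset trees are joined.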
|
0.915 |
2015 — 2019 |
Gropp, William (co-PI) Warnow, Tandy |
III: AF: Medium: Collaborative Research: Scalable and Highly Accurate Methods For Metagenomics @ University of Illinois At Urbana-Champaign
Metagenomic studies of microbial communities can generate millions to billions of sequencing reads. The assignment of accurate taxonomic labels to these sequences is a critical component in many analyses, but is complicated by the fact that the majority of the organisms found in environmental or host-associated communities cannot be easily cultured in a laboratory. Even among the organisms that can be cultured, relatively few have been sequenced, even partially. Thus, many commonly encountered organisms are largely absent from existing databases of known genomes and genes. Providing taxonomic labels to metagenomic sequences, thus, requires extrapolating the knowledge contained in sequence databases to previously unseen DNA strings. Simple similarity-based approaches (e.g., picking the best database hit as the best guess at the taxonomic label) have been shown to be insufficiently accurate, leading to the development of more sophisticated methods. Further developments are necessary to handle the characteristics of emerging sequencing technologies, such as high error rates with large numbers of insertions and deletions. To date, metagenomic taxon identification methods have been evaluated with respect to their ability to estimate the distribution of bacterial taxa (species, genera, families, etc.) within a metagenomic sample. Yet, different scientific and clinical settings may require specific types of analyses, and this one type of evaluation may not be the most appropriate for all settings. For example, in a clinical setting the most important question may be to detect whether a specific pathogen is present, while in a scientific setting the most interesting question may be to determine whether an observed read comes from a previously unseen species. New evaluation strategies must be developed that target the specific needs of each application domain.
All the methods developed in the project will be made into open-source software that is freely available to the scientific public. Researchers will provide training activities each year with funds available to students and postdocs from around the country, and an outreach program to minority serving institutions and women's colleges. A summer REU program will also be provided at the University of Maryland, College Park.
First, the team will develop a new framework for integrating the formal definition of biological use-cases with evaluation datasets and metrics in order to ensure the software being developed adequately addresses the needs of the end-users. Second, they will develop new approaches for marker-based taxon identification and abundance profiling that can leverage multiple sources of information (e.g., multiple markers) as well as handle the high error rates of third-generation sequencing technologies. These approaches will build upon experience developing TIPP, a taxonomic profiling package recently published by the team, which outperforms the leading metagenomic taxonomic profiling software, in particular for novel sequences, or for longer, high-error sequences. Finally, they plan to develop high-performance computing implementations of these methods in order to enable rapid analysis of samples. Speed of analysis is particularly important in clinical settings where medical treatments may depend on the rate at which the method can return an analysis. Speed is also important in non-medical applications where faster analyses enable researchers to perform deeper or broader analyses of microbial communities.
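The "best database hit" baseline mentioned in this abstract can be sketched in a few lines. As an assumption for illustration, similarity here is the Jaccard index of k-mer sets, a toy stand-in for alignment-based search tools such as BLAST; the function names, reference sequences, and taxon labels are all made up, and this is not TIPP. Note that the function always returns some reference label, even for a read from an organism absent from the database, which is one reason such approaches struggle with novel sequences.

```python
def kmers(seq, k):
    """Return the set of length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_hit_label(read, references, k=4):
    """Label a read with the taxon of its most similar reference sequence.

    Similarity is the Jaccard index of k-mer sets (a toy stand-in for
    alignment-based search). Ties go to the first reference in `references`.
    """
    read_kmers = kmers(read, k)

    def jaccard(ref_seq):
        ref_kmers = kmers(ref_seq, k)
        union = read_kmers | ref_kmers
        return len(read_kmers & ref_kmers) / len(union) if union else 0.0

    best_label, _score = max(
        ((label, jaccard(seq)) for label, seq in references.items()),
        key=lambda pair: pair[1],
    )
    return best_label


# Hypothetical reference "database" of two taxa.
refs = {
    "taxon_A": "ACGTACGTGGCCAATT",
    "taxon_B": "TTGGCCAAGGTTCCAA",
}
print(best_hit_label("ACGTACGTGG", refs))  # taxon_A
```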
|
0.915 |
2020 — 2023 |
Warnow, Tandy Peng, Jian |
IIBR Informatics: Advancing Bioinformatics Methods Using Ensembles of Profile Hidden Markov Models @ University of Illinois At Urbana-Champaign
Many steps in biological research pipelines involve the use of machine learning models, and these have become standard tools for many basic problems. Elaborations on basic machine learning models ("ensembles" of machine learning models) can provide improvements in accuracy compared to standard usage, for various biological questions. However, the design of these ensembles has been fairly ad hoc, and their use can be computationally intensive, which reduces their appeal in practice. This project will advance this technology by developing statistically rigorous techniques for building ensembles of machine learning models, with the goal of improving accuracy. The project will also develop methods that use these ensembles for new biological problems, including protein structure and function prediction. Broader impacts include software schools, engagement with under-represented groups, and open-source software. Profile Hidden Markov Models (i.e., profile HMMs) are probabilistic graphical models that are in wide use in bioinformatics. Research over the last decade has shown that ensembles of profile HMMs (e-HMMs) can provide greater accuracy than a single profile HMM for many applications in bioinformatics, including phylogenetic placement, multiple sequence alignment, and taxonomic identification of metagenomic reads. This project will advance the use of e-HMMs by developing statistically rigorous techniques for building e-HMMs with the goal of improving accuracy and improving understanding of e-HMMs, and will also develop methods that use e-HMMs for protein structure and function prediction. Broader impacts include software schools, engagement with under-represented groups, and open-source software. Project software and papers are available at http://tandy.cs.illinois.edu/eHMMproject.html.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
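The ensemble idea in this abstract (score a query against several models, each built from a different subset of the reference data, and keep the best-scoring one) can be sketched under a heavy simplification. As an assumption, each "profile" below has match states only, with no insert or delete states, so it is effectively a position-specific scoring matrix rather than a full profile HMM of the kind built by tools such as HMMER; the function names, the DNA alphabet, the pseudocount, and the two reference groups are all hypothetical.

```python
import math


def build_profile(aligned_seqs, alphabet="ACGT", pseudocount=1.0):
    """Per-column residue probabilities from equal-length, gap-free sequences.

    Match states only: a simplified stand-in for a profile HMM.
    Residues outside `alphabet` raise KeyError.
    """
    ncols = len(aligned_seqs[0])
    profile = []
    for j in range(ncols):
        counts = {a: pseudocount for a in alphabet}
        for seq in aligned_seqs:
            counts[seq[j]] += 1
        total = sum(counts.values())
        profile.append({a: c / total for a, c in counts.items()})
    return profile


def log_likelihood(profile, query):
    """Log-probability of an ungapped query under the column distributions."""
    if len(query) != len(profile):
        raise ValueError("query length must match profile length")
    return sum(math.log(col[ch]) for col, ch in zip(profile, query))


def best_model(ensemble, query):
    """Return the name of the ensemble member that scores the query highest."""
    return max(ensemble, key=lambda name: log_likelihood(ensemble[name], query))


# Hypothetical ensemble: one profile per subset of a reference alignment.
ensemble = {
    "group1": build_profile(["ACGT", "ACGA", "ACGT"]),
    "group2": build_profile(["TTGA", "TTCA", "TAGA"]),
}
print(best_model(ensemble, "ACGT"))  # group1
```

Because each member is fit to a narrower, more homogeneous subset, a query close to one subset can score much higher under that member than under a single model built from all the sequences at once.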
|
0.915 |