2006 — 2008 |
Hahn, Matthew |
N/A |
Comparative Genomics of Gene Family Evolution
In his recent NSF Postdoctoral Research Fellowship, the PI developed a stochastic birth and death model for gene family evolution. Using the genomes of five yeast species, the PI has shown that the model can be efficiently applied to multi-species genome comparisons. In the current project, the PI will use the model to identify large-scale patterns in genome evolution and make stronger inferences regarding the role of natural selection in gene family expansion or contraction. The goal of this project is to extend the statistical methodology available for making inferences using the birth and death model. The project will include both improvements to the computational and statistical methods used to study gene family evolution and the analysis of an ever-growing number of genomes. The project will result in the production of a freestanding software package that the investigator will make available to the research community. The PI only recently moved to his new position as assistant professor and has not yet begun training students.
|
0.915 |
2006 — 2010 |
Hahn, Matthew |
N/A |
Statistical and Computational Methods For the Study of Gene Families
Indiana University is awarded a grant to develop a statistical framework that would allow for inferences regarding gene family evolution among species. In order to take full advantage of the data being produced by various genome sequencing projects, this project aims to extend the statistical and computational tools necessary for biological researchers to study gene families. There are three main goals of this proposal: 1) Development of improved statistical tools. This work will enable more refined estimates of gene duplication and deletion parameters between species, and will provide new ways in which to study gene families within single genomes. The inclusion of methods for detecting and incorporating whole genome duplications will greatly extend statistical inferences. 2) Creation of easy-to-use software. A free software package will be implemented that can be used by researchers studying whole genomes or individual gene families. Statistical tools created in this project will be quickly disseminated to the community via this package. 3) Providing annotated gene families for Drosophila. The sequencing of 12 Drosophila species will be a boon to comparative genomics studies. By working with FlyBase to provide a well-annotated set of gene families from across these species, we will present new ways for biologists to connect this information with functional and comparative genomic data. The products of this research will provide a broad statistical and computational framework for all future studies of gene families. The research will also provide a diverse training environment for undergraduates, graduate students, and postdoctoral researchers in molecular evolution, statistics, and bioinformatics. The participation of under-represented groups and women via multiple scholarship programs will ensure that this specific research priority is achieved. 
The research will also be used in the development of classes and programs for understanding the relationship between biodiversity and genetic variation, and in graduate education for bioinformatics students.
|
0.915 |
2009 — 2014 |
Hahn, Matthew |
N/A |
CAREER: Computational and Statistical Genomics of Gene Families
(This award is funded through the American Recovery and Reinvestment Act of 2009: Public Law 111-5).
This is a CAREER award to support the research of Dr. Matthew Hahn in the Department of Biology and School of Informatics at Indiana University. Dr. Hahn is a third-year, tenure-track Assistant Professor. Genome sequencing projects have revealed large and frequent changes between species in the size of gene families. These changes have been shown to be responsible for morphological, physiological, and behavioral differences between species, and to contribute to much of the genetic and genomic diversity we observe in nature. To further understand the importance of these changes, researchers must be able to understand the mechanisms and modes by which gene families evolve. Despite the growing body of data on gene families, until recently we lacked a statistical framework that would allow for inferences regarding gene family evolution among species. In earlier work from his dissertation research, Dr. Hahn proposed such a framework for studying gene family evolution, and showed that it could be used for hypothesis testing, inference of ancestral states, and estimation of gene duplication and deletion rates. This project will develop novel statistical and computational methods for studying gene families, and examine the biological mechanisms underlying gene family evolution. This work is enabling more refined estimates of gene duplication and loss rates, and will provide new ways for detecting and studying whole genome duplications. Methods for studying gene families from low-coverage genomes will also be developed. Gene duplication can distribute paralogous genes across the genome. Locations of individual genes will allow study of both within-genome and between-genome dynamics of gene families. Lineages differ in their rates of gene turnover, which raises the question of how these differences come about. This research is identifying the biological factors determining observed rate variation among lineages and among individual gene families. Dr. Hahn is developing new computational models and free software, which will be available at http://www.bio.indiana.edu/~hahnlab/.
This research will contribute to many fields, from studies of gene and genome duplication to studies of gene regulation, transposable elements, genetic robustness, and RNA interference. As a part of his CAREER project, the PI is integrating knowledge from these diverse fields at high school, undergraduate, and graduate levels to inform biological reasoning and to create new lines of scientific inquiry. Further, the PI will prepare and implement a curriculum for students at a local technology-focused high school. This curriculum will integrate computers into the biology classroom by introducing the basic principles of programming alongside the basic principles of biology.
|
0.915 |
2011 — 2015 |
Hahn, Matthew; Stewart, Craig; Lynch, Michael (co-PI); Barnett, William (co-PI); Fox, Geoffrey (co-PI) |
N/A |
ABI Development: National Center for Genome Analysis Support
Intellectual Merit: This award to Indiana University (IU) is to establish the National Center for Genome Analysis Support (NCGAS) in partnership with the Texas Advanced Computing Center (TACC). The NCGAS will be an innovative service center (core facility) that supports the national community of NSF-funded researchers who use genome assembly software, particularly software suitable for assembly of data from next-generation sequencers; large-scale phylogenetic software; and other genome analysis software requiring large amounts of memory. This center will be a general source of software support and services that will be provided on the Mason large memory computer cluster at IU, on the TACC Gordon system, and on the San Diego Supercomputer Center Dash system. The NCGAS will provide services such as use of cluster-based genome analysis software, storage of submitted data sets, and a repository of open source genome analysis software. Services will particularly support analyses of next-generation sequencer output for de novo assembly, metagenomic projects, and resequencing.
Broader Impacts: The NCGAS will develop innovative solutions to current needs in genome assembly and analysis. It will establish a core of experts and software tools to support research on a variety of nationally funded cyberinfrastructure systems, and will add to the suite of available systems a large memory cluster ideal for this work. By developing a community of investigators and technologists and exploring new modalities of provisioning computational resources, such as "on demand" computing, this project aspires to become a sustainable model for the ongoing, and increasing, need for sequence analysis. The NCGAS website provides up-to-date information at http://pti.iu.edu/ncgas/.
|
0.915 |
2011 — 2017 |
Moyle, Leonie; Hahn, Matthew; Haak, David |
N/A |
Dimensions: Integrating Dimensions of Solanum Biodiversity: Leveraging Comparative and Experimental Transcriptomics to Understand Functional Responses to Environmental Change
Responses to environmental variation and change can be facilitated or constrained by the genetic basis of adaptations that mediate organism-environment interactions. This research aims to understand and interpret the genetics of functional adaptive biodiversity by integrating analyses at multiple levels of functional trait variation (DNA sequence, gene expression, phenotypes) with data on the environmental, ecological, and evolutionary history of an entire clade of species: Solanum section Lycopersicum (the wild tomatoes). The project focuses on two traits critical to plant-environment interactions: leaf ecophysiology (which influences plant responses to water, light, and carbon dioxide), and constitutive and induced defense responses (which influence the interaction between plants and their natural predators). Both traits differ within and among Solanum species, likely as the result of rapid adaptive change. Using 'next generation' sequencing technologies to quantify DNA sequence and gene expression differences among genotypes and environmental conditions, the project has three components: 1) Comparative transcriptomics (i.e., gene expression profiling), which will provide an evolutionary genomics framework for understanding genetic variation in the group, and identify candidate genes for important ecological transitions among species; 2) Experimental transcriptomics of gene expression responses to both benign (unstressed; noninduced) and stressed (drought-stressed; induced defense) conditions, which will identify molecular responses to ecologically relevant environments, and evaluate the genetic constraints on current and future evolutionary responses critical to organism-environment interactions; and 3) Integration of results from this study with existing data to generate a core set of loci underpinning functional responses to abiotic and biotic environmental variation, and to develop an integrated understanding of natural adaptive trait variation across an entire group of species.
Global environmental change is expected to fundamentally alter patterns of biodiversity, but predicting the direction and magnitude of this change is extremely difficult. This research aims to understand how current biodiversity is shaped by, and reacts to, environmental variation. The sequence and trait data uncovered will contribute to understanding how plants are able to respond to and cope with stress imposed by both their physical environment and by their predators. The project will also contribute to human resources by training researchers in a broad set of skills at the interface of experimental genetics, genomics, and bioinformatics. This research focuses on the wild tomatoes, a diverse group of Andean species within the economically important genus Solanum. Uncovering the genetic basis of diversification is particularly pertinent in the Andean biological hotspot, where the impacts of land-use and climate change threaten a cradle of biodiversity that holds an estimated 12% of global flowering plant diversity. In addition, by examining the wild relatives of several important crop species, including tomato, potato, and pepper, this research has the potential to identify valuable natural variation that confers plant tolerance to critical environmental stresses.
|
0.915 |
2012 — 2015 |
Hahn, Matthew |
N/A |
EAGER: Genome Construction in Non-Model Organisms Using Recombinant Populations
Whole genome sequence information has transformed research in traditional model organisms, and now next-generation sequencing technology is at a turning point in its application to non-model organisms. Researchers working in data-rich, but sequence-poor, organismal systems are eager to apply these technologies to their own species. However, the challenges involved in generating useful, high quality, whole genome sequences in these naive systems are substantial. Non-model systems can have many biological features that directly hamper genome assembly, and therefore suffer from limited capacity to benefit from genomic tools. This research aims to develop a new approach to genome assembly, one that combines next-generation sequencing with recombinant populations made using genetic crosses between individuals. If the full potential of the method can be demonstrated, it will bring the power of genome-enabled science within reach of a large range of non-model organisms.
A tremendous amount of biological knowledge lies hidden within the genetic make-up of every organism on earth. The secrets of these organisms potentially hold the keys to understanding the origins of biodiversity and the answers to many other biological puzzles. This research will provide new tools that can be used to unlock the genetic information contained within any species. The methods developed will therefore be of great benefit to a large number of researchers.
|
0.915 |
2016 — 2019 |
Hahn, Matthew; Henschel, Robert |
N/A |
ABI Development: CAFE for Very Large Comparative Genomic Datasets
Genome sequencing projects have revealed frequent gains and losses of genes between species. These changes have been shown to be responsible for morphological, physiological, and behavioral differences, and to contribute to the diversity observed in nature. Advances in sequencing technology are making new genome data available at faster rates than ever before. As the number of species with sequenced genomes grows, so will the number of researchers wanting to take advantage of these valuable resources. They will come from a wide range of biological fields, and have an equally wide range of experience with computational tools. CAFE (Computational Analysis of gene Family Evolution) is a software package that allows researchers to better understand rates of gene gain and loss. This project will result in a version of CAFE that adds to the national infrastructure by enabling new biological discoveries to the benefit of scientists working in many fields. CAFE will be a useful tool in science education, and will also improve and accelerate biological research that can be expected to have multiple societal benefits, including understanding the genetic basis for important biological phenotypes. A vigorous outreach and information dissemination plan will ensure that researchers and faculty engaged in research education are aware of CAFE and able to use it effectively, and will promote the development of a technology-savvy 21st century biology research community.
Studies of gene families are essential to a number of research areas, including gene regulation, human disease, and evolutionary genomics. CAFE enables these and other studies into cutting-edge areas by providing a likelihood method for analyzing gene gain and loss over a phylogeny. This method has been shown to work well with the error-prone genome assemblies currently available for most organisms, as well as when analyzing dozens of genomes at a time. This project will extend these capabilities to hundreds or thousands of genomes. To accomplish this goal, several of the maximum likelihood methodologies implemented by CAFE will be re-designed. These changes will include allowing rate variation among gene families, optimizing likelihood calculations on trees, and improving specification of several probability distributions used by these calculations. The quality of the code will be enhanced through best practices in software engineering and the development of better, faster, and more scalable supercomputer versions of the software. All software will be available at www.indiana.edu/~hahnlab/.
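To make the idea of a likelihood method for gene gain and loss over a phylogeny concrete, here is a minimal sketch. It is not CAFE's code and does not reflect its internals. It assumes a linear birth-death process with a single equal per-gene gain and loss rate, for which the transition probability has a standard closed form, and it combines those probabilities over a tree with the usual pruning recursion. The tree encoding, the function names, the example counts, and the `max_size` truncation are all hypothetical choices made for illustration.

```python
from math import comb


def transition_prob(start, end, rate, time):
    """P(family size == end after `time` | size == start).

    Assumes a linear birth-death process with equal per-gene birth and
    death rate `rate` (an assumption of this sketch).
    """
    if start == 0:
        # Size 0 is absorbing: a lost family cannot regain genes.
        return 1.0 if end == 0 else 0.0
    alpha = rate * time / (1.0 + rate * time)
    total = 0.0
    for j in range(min(start, end) + 1):
        total += (comb(start, j)
                  * comb(start + end - j - 1, start - 1)
                  * alpha ** (start + end - 2 * j)
                  * (1.0 - 2.0 * alpha) ** j)
    return total


def partial_likelihoods(node, counts, rate, max_size):
    """Return a list L where L[s] = P(leaf counts below node | node size s).

    Hypothetical tree encoding: a leaf is a species name (str); an internal
    node is a list of (child, branch_length) pairs. Family sizes are
    truncated to 0..max_size.
    """
    sizes = range(max_size + 1)
    if isinstance(node, str):
        # At a leaf the observed count is the only possible state.
        return [1.0 if s == counts[node] else 0.0 for s in sizes]
    result = [1.0] * (max_size + 1)
    for child, branch_length in node:
        child_l = partial_likelihoods(child, counts, rate, max_size)
        for s in sizes:
            # Sum over every size the child could have at the branch end.
            result[s] *= sum(
                transition_prob(s, c, rate, branch_length) * child_l[c]
                for c in sizes
            )
    return result


# Illustrative data only: the tree ((A:1, B:1):1, C:2) with made-up counts.
tree = [([("A", 1.0), ("B", 1.0)], 1.0), ("C", 2.0)]
counts = {"A": 3, "B": 4, "C": 2}
per_root_size = partial_likelihoods(tree, counts, rate=0.05, max_size=15)
best_root = max(range(len(per_root_size)), key=per_root_size.__getitem__)
print(best_root, per_root_size[best_root])
```

Each call recomputes the transition probabilities for every pair of sizes on every branch, so the cost grows with the square of `max_size` times the number of branches. Caching those probabilities per branch length is one simple way to cut the repeated work when many gene families share the same tree.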
|
0.915 |