2004 — 2008 |
Li, Kai; Funkhouser, Thomas (co-PI); Rusinkiewicz, Szymon (co-PI); Troyanskaya, Olga |
N/A. Activity Code Description: No activity code was retrieved. |
Ngs: Software Tools For New-Generation, Display-Centric Applications
The goal of this research project is to develop new software tools and applications for scalable display systems. The primary focus is on methods that coordinate multiple displays, multiple users, and multiple applications to enable true display-centric computing. For coordinating multiple displays, the project will develop dynamic feedback to build adaptive layered multi-resolution display systems and to study how to achieve integrated, continuous calibration capable of delivering high-quality information display. For coordinating multiple users, software tools that manage information display intelligently and securely for seamless exchange of visual information will be developed. For coordinating multiple applications, the project will study how to design an adaptive infrastructure that enables multiple applications to share a scalable display efficiently.
|
0.915 |
2005 — 2009 |
Charikar, Moses (co-PI); Li, Kai; Cook, Perry (co-PI); Troyanskaya, Olga |
N/A. Activity Code Description: No activity code was retrieved. |
Csr-Pdos-Content-Searchable Storage For Feature-Rich Data
Storage capacity and data volume have been doubling every 18 months during the past two decades. A key challenge in building next-generation storage systems is to manage massive amounts of feature-rich (non-text) data, which has dominated the increasing volume of digital information. Comparing noisy, feature-rich data requires fast similarity match instead of exact match, and thus exploring such data requires similarity search instead of exact search. Current file systems are designed for named text files; they do not have mechanisms to manage feature-rich data. To date, there is no practical storage system with the ability to do similarity search for noisy, high-dimensional data and there is no index engine design for efficient similarity search. This research addresses this problem by studying how to design and implement a content-addressable and -searchable storage (CASS) system to manage and explore diverse feature-rich data. The system includes a built-in similarity search engine for general-purpose, noisy, high-dimensional metadata using compact data structures and novel indexing methods. The research will also develop segmentation methods and feature extraction methods for audio, image and genomic data, and develop similarity search benchmarks to evaluate the CASS system.
This research will advance knowledge and understanding in the area of storage system designs such as data structures, mechanisms, and APIs for managing, searching and exploring noisy, high-dimensional feature-rich data. The research will accelerate the development of next-generation storage systems which will revolutionize how to access, search, explore and manage massive amounts of feature-rich data in many disciplines.
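The distinction the abstract draws between exact match and similarity match can be illustrated with a minimal sketch. This is not the CASS design: it is a brute-force cosine-similarity scan over made-up feature vectors, and the names `cosine_similarity`, `similarity_search`, and the `clip_*` items are hypothetical. A real system of the kind described would presumably replace the linear scan with compact sketches and an index.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def similarity_search(query, items, k=2):
    """Return the names of the k stored items most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical feature vectors (for example, extracted from audio clips).
items = {
    "clip_a": [1.0, 0.0, 0.2],
    "clip_b": [0.9, 0.1, 0.25],
    "clip_c": [0.0, 1.0, 0.0],
}

# A noisy observation close to clip_a and clip_b, but equal to neither.
noisy_query = [0.95, 0.05, 0.2]

# Exact search finds nothing because no stored vector equals the query.
exact_hits = [name for name, vec in items.items() if vec == noisy_query]

# Similarity search still recovers the nearby items.
similar_hits = similarity_search(noisy_query, items, k=2)
```

Here exact search comes back empty while the similarity scan still ranks the two nearby items ahead of the unrelated one, which is the behavior noisy feature-rich data calls for.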
|
0.915 |
2005 — 2018 |
Troyanskaya, Olga G |
R01. Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Integration and Visualization of Diverse Biological Data
DESCRIPTION (provided by applicant): The onset of most human disease involves multiple, molecular-level changes to the complex system of interacting genes and pathways that function differently in specific cell-lineage, pathway and treatment contexts. While this system has been probed by thousands of functional genomics and quantitative genetic studies, careful extraction of signals relevant to these specific contexts is a challenging problem. General integration of these heterogeneous data was an important first step in detecting signals that can be used to build networks to generate experimentally testable hypotheses. However, only by dealing with the fact that disease happens at the intersection of multiple contexts and by integrating functional genomics with quantitative genetics will we be able to move toward a molecular-level understanding of human pathophysiology, which will pave the way to new therapy and drug development. The long-term goal of this project is to enable such discoveries through the development of innovative bioinformatics frameworks for integrative analysis of diverse functional genomic data. In the previous funding periods, we developed accurate data integration and visualization methodologies for most common model organisms and human, created methods for tissue-specific data analysis, and applied these methods to gain novel insights into important biological processes. We further enabled experimental biological discovery by implementing these methods in publicly accessible interactive systems that are popular with experimental biologists.
Leveraging our prior work, we now will directly address the challenge of enabling data-driven study of molecular mechanisms underlying human disease by developing novel semi-supervised and multi-task machine learning approaches and implementing them in a real-time integration system capable of predicting genome-scale functional and mechanism-specific networks focused on any biological context of interest. This will allow any biomedical researcher to quickly make data-driven hypotheses about function, interactions, and regulation of genes involved in hypertension in the kidney glomerulus or to predict new regulatory interactions relevant to Parkinson's disease that affect the ubiquitination pathway in substantia nigra. Furthermore, we will develop methods for disease gene discovery that leverage these highly specific networks for functional analysis of quantitative genetics data. Our deliverable will be a general, robust, user-friendly, and automatically updated system for user-driven functional genomic data integration and functional analysis of quantitative genetics data. Throughout this work, we (with our close experimental and clinical collaborators) will also apply our methods to chronic kidney disease, cardiovascular disease/hypertension, and autism spectrum disorders both as case studies for the iterative improvement of our methods and to contribute directly to a better understanding of these diseases.
|
1 |
2005 — 2009 |
Schapire, Robert (co-PI); Troyanskaya, Olga |
N/A. Activity Code Description: No activity code was retrieved. |
Sei (Bio): Integrated Analysis of Heterogeneous Genomic Data For Accurate Prediction of Gene Function and Interactions Between Proteins
ABSTRACT
The objective of the proposed research is to develop a general and robust machine learning system for integrated analysis of high-throughput biological data for the purpose of prediction of gene function and protein-protein interactions. Achieving this goal requires addressing multiple challenges that include data heterogeneity, variable data quality, high noise levels in data, and a paucity of training samples. These challenges have prevented the successful application of traditional machine learning methods to diverse biological data. The research team will leverage diverse bioinformatics, machine learning, and biology expertise of the co-PIs and collaborators to develop accurate and effective approaches optimized for integrated analysis of genomic data. For prediction of protein-protein interactions, this investigation will focus on Bayesian approaches based on successful preliminary research. For gene function prediction, the focus will be on developing novel machine learning methods. These learning methods will use heterogeneous biological data as well as protein-protein interactions predicted by the system. The proposed research will lead to development of a general bioinformatics system that will utilize diverse large-scale biological data, including gene expression microarrays, physical and genetic interactions datasets, sequence and literature data, to produce an accurate map of protein-protein interactions and predictions of function for each of the proteins. This system will address the critical need in genomics to extract accurate biological information from disparate high-throughput data sources, enabling the first step in accurate and comprehensive study of cellular processes on a whole-genome level. 
Additionally, the proposed analysis will provide genomics researchers with quantitative rankings of the relative reliability of high-throughput experimental technologies, thereby providing biologists with data on which high-throughput technologies are more accurate than others. A significant advantage of this plan is that the research team will work closely with biologists to evaluate the predictions and feed the information back into the investigation to further improve the system and the quality of the resulting predictions.
The proposed system will provide predictions that will drive biological experimentation, enabling genome-wide annotation of unknown genes. The system will be publicly available to genomics researchers through its integration with the Saccharomyces Genome Database, a model organism database for yeast, and also via distribution of this integrated framework to other model databases. The interdisciplinary approach of this proposal will further the impact of advanced computer science on biology and will precipitate further interactions between the two fields, both through research and through interdisciplinary education.
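The abstract mentions Bayesian approaches for combining heterogeneous evidence about protein-protein interactions. As a rough illustration only, the sketch below uses a naive Bayes combination in odds form, which assumes the evidence sources are conditionally independent. This is an assumption made for the example and not necessarily the project's model; the function name `posterior_interaction`, the prior, and the likelihood ratios are all hypothetical values chosen for illustration.

```python
def posterior_interaction(prior, likelihood_ratios):
    """Combine evidence sources with naive Bayes, working in odds form.

    prior: prior probability that a protein pair interacts (0 < prior < 1).
    likelihood_ratios: for each evidence source,
        P(evidence | interacting) / P(evidence | not interacting).
    Assumes the sources are conditionally independent given the true state.
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical numbers: a more reliable technology gets a larger ratio.
prior = 0.01
evidence = {
    "coexpression": 5.0,
    "yeast_two_hybrid": 20.0,
    "colocalization": 2.0,
}

probability = posterior_interaction(prior, evidence.values())
```

In this framing, the per-source likelihood ratio is one simple way to express the relative reliability of a high-throughput technology: sources that discriminate true interactions better shift the posterior more.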
|
0.915 |
2006 — 2013 |
Troyanskaya, Olga |
N/A. Activity Code Description: No activity code was retrieved. |
Career: An Integrated Approach to the Study of Biological Process Specific Networks
Princeton University is awarded a grant from the Faculty Early Career Development program (CAREER) to develop an integrated computational and experimental approach for modeling biological pathways and networks. This technology will consist of three integrated components: a computational component for generalizable, efficient, and accurate integration of diverse genomic data, an analytical component for network/pathway modeling based on the integrated data, and an experimental component for validation and feedback. The integrated analysis of diverse genomic data and experimental verification will allow iterative refinement of computational methods and lead to highly accurate network-level pathway models that can serve as a scaffold for mechanistic models of complex biological processes. The key contribution of this work is in the tight integration of computational modeling and experimental testing to create a combined approach that uses iterative refinement of predictions to improve both the models and the algorithms. The success of this integrated approach will lead to more accurate and complete models of biological processes and pathways than those created by purely computational methods, and yet it will be substantially faster than study of the same processes by experimentation alone. The interdisciplinary nature of this proposal will further the impact of advanced computer science on biology and will precipitate further interactions between the two fields, both through research and through interdisciplinary education. In concert with this research program, two graduate courses in bioinformatics will be developed. The PI will also continue to participate in development and teaching of a cross-disciplinary genomics curriculum for undergraduates in collaboration with biology, physics, and chemistry faculty at the Lewis-Sigler Institute for Integrative Genomics. 
Both undergraduate and graduate curricular materials developed at Princeton will be made available via the Internet. In addition, a systems biology symposium at Princeton University will be organized to catalyze collaboration among computational and experimental researchers and to introduce more students to systems biology.
|
0.915 |
2011 — 2013 |
Troyanskaya, Olga G |
R01. Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Context-Sensitive Search of Human Expression Compendia
DESCRIPTION (provided by applicant): Gene expression experiments are an abundant and robust source of functional genomics data, with thousands of microarray and a growing number of high-throughput RNA sequencing studies publicly available, most interrogating clinical and biological systems relevant to disease. They hold the promise of data-driven characterization of gene function and regulation, including in specific tissues, cell lines, and disease states, and can advance the understanding and modeling of regulatory changes that form the basis of human disease. However, these data remain largely underutilized, as biology researchers do not have effective tools to explore and analyze the entire data collection to generate novel hypotheses and direct experiments. The situation is similar to that of the Internet before search engines: a biology researcher has to know a priori which datasets pertain to the biological question she is asking, reflect the tissue/cell-lineage specific signals of interest to her, and accurately measure the expression of genes related to her pathways of interest. There is a clear need for methods that will enable biology researchers to use their domain-specific knowledge to direct their exploration of public human expression data, enabling them to generate hypotheses and direct experiments addressing challenging biomedical questions. Such a system should provide users with the ability to effectively explore automatically identified datasets relevant to their biological question of interest, leverage metazoan complexity including cell lineage and disease specific signals, and allow researchers to securely include their unpublished data in the analysis. To address these challenges, this proposal describes a "Google-style" public search engine for large collections of gene expression data built using novel search algorithms and leveraging cloud-computing technologies.
This system implements a novel query-based context-sensitive algorithm for search of large expression compendia that exploits the complexity of metazoan organisms, including cell-lineage complexity and disease aspects inherent to human expression studies. Furthermore, the challenge of heterogeneity in human samples will be addressed by developing novel hierarchical learning methods to predict cell-lineage or tissue-specific gene expression based on the compendium and to identify these signals in each dataset. This will enable users to explore tissue-specific expression and also will be integrated with the search algorithm to improve search accuracy. Proposed algorithms, search engine, and user interface will be extensively evaluated in close collaboration with biology researchers, and top predictions will be tested experimentally. These methods will be implemented in a user-friendly public search system that will leverage cloud computing to provide robust interactive query response and will enable biology researchers to explore both published data collections and their own pre-publication datasets in a context-specific, integrated, and secure manner. PUBLIC HEALTH RELEVANCE: We will develop a "Google-style" search engine for massive collections of human gene expression data. Our system will enable researchers to use their domain knowledge to explore the entirety of public human expression data to generate hypotheses and direct experiments addressing a diverse range of challenging biomedical questions. Public availability of our system will advance genome-level understanding of human biology and facilitate development of novel drugs, therapies, and personalized medical treatments.
|
1 |
2012 — 2016 |
Troyanskaya, Olga G |
U54. Activity Code Description: To support any part of the full range of research and development from very basic to clinical; may involve ancillary supportive activities such as protracted patient care necessary to the primary research or R&D effort. The spectrum of activities comprises a multidisciplinary attack on a specific disease entity or biomedical problem area. These differ from program project in that they are usually developed in response to an announcement of the programmatic needs of an Institute or Division and subsequently receive continuous attention from its staff. Centers may also serve as regional or national resources for special research purposes, with funding component staff helping to identify appropriate priority needs. |
Core F & G: Systems/Modeling & Computation @ University of Pennsylvania
A central challenge for PENTACON in identifying molecular-level causes of NSAID-specificity with respect to efficacy vs. adverse effects is the integrative computational modeling of the activity of the diverse cellular components that are perturbed upon NSAID administration. It is in these two cores that we integrate our approaches to systems biology and modeling. In the Systems, Modeling and Computation Cores we describe how PENTACON will produce an outcome that is greater than the sum of its component parts. We will use network-based discrete and dynamic computational models to integratively model behavior and functional conservation of diverse biomolecules, accounting for cell-lineage and environmental effects in the context of genomic and environmental variation. We are aware of the levels of complexity in this system that we may not be directly modeling. Indeed, some elements may even violate our assumptions, for example splice variants and discrete impacts of the epigenome. We address these challenges through (1) a combination of diverse approaches, including exploratory studies (such as for microbiome data) and (2) directed modeling efforts and close iteration of modeling and experimental verification with quantitative functional outputs that translate directly to humans. We will also adopt a flexible approach, updating the modeling approaches in response to information that emerges from our analyses.
|
0.954 |
2017 — 2021 |
Hacohen, Nir (co-PI); Hodgin, Jeffrey Benton (co-PI); Kretzler, Matthias; Troyanskaya, Olga G |
UG3. Activity Code Description: As part of a bi-phasic approach to funding exploratory and/or developmental research, the UG3 provides support for the first phase of the award. This activity code is used in lieu of the UH2 activity code when larger budgets and/or project periods are required to establish feasibility for the project. UH3. Activity Code Description: The UH3 award is to provide a second phase for the support for innovative exploratory and development research activities initiated under the UH2 mechanism. Although only UH2 awardees are generally eligible to apply for UH3 support, specific program initiatives may establish eligibility criteria under which applications could be accepted from applicants demonstrating progress equivalent to that expected under UH2. |
Precision Medicine Through Interrogation of Rna in the Kidney (Premiere)
ABSTRACT The kidney has developed a complex, three-dimensional architecture to serve its key functions, including excretion of waste substances, maintenance of the internal balance for fluid and salt, blood pressure control, and hormonal function. Understanding the roles of the individual renal cell types in these processes in health and disease is critical to develop novel targeted therapies. Extensive studies led by this investigative group and others have started to identify molecular disease mechanisms in renal biopsy tissues and helped to develop novel disease markers and therapies. However, up to now these studies were limited to biopsy tissue homogenates, making it difficult to discern the specific pathways activated in individual cell types. The PREcision Medicine through IntErrogation of Rna in the kidnEy (PREMIERE) Network will bring together investigators from three leading biomedical research institutions with diverse, complementary expertise. Our team has an established track record in working jointly to develop state-of-the-art approaches in molecular analysis of renal disease. We will set out to mine our existing compendium of thousands of gene expression profiles from renal biopsy tissues to extract single cell signatures using advanced data mining tools. In parallel we will develop technologies in our laboratories towards single cell analysis of renal tissues and scale them down so that they can work on single cells extracted from small renal biopsies. These single cell profiles will be linked to the disease states of the patients. Key signatures associated with specific cells and diseases will be extracted and localized in the three-dimensional context of the kidney using specific RNA staining techniques in the first phase of the project. In the second phase, the analytical strategies will be scaled up so that single cell profiles from specific groups of patients can be obtained in a robust and reproducible manner.
To this end the PREMIERE investigators will work closely with the tissue procurement sites and the Central Hub of the KPMP, using their 20 years of experience in team science, so that at the end of the first KPMP funding cycle novel, cell-type-specific treatment targets are identified, fueling the therapeutic pipelines of the future.
|
0.954 |
2019 — 2021 |
Troyanskaya, Olga G |
R01. Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Integration and Visualization of Diverse Biological Data
PROJECT SUMMARY The onset of most human disease involves numerous molecular-level changes to the complex system of interacting genes and pathways that function differently in specific cell-lineage, pathway, and treatment contexts. This system is probed by thousands of functional genomics and quantitative genetic studies, and integrative analysis of these data can generate testable hypotheses identifying causal genetic variants and linking them to network-level changes in cells and to disease phenotypes. This can enable deeper molecular-level understanding of pathophysiology, paving the way to genome-based precision medicine. The long-term goal of this project is to enable such discoveries through integrative analysis of high-throughput biological data in a disease context. In the previous funding periods, we developed accurate data integration methods, created algorithms for the prediction of disease genes through context-specific and mechanistic network models and analysis of quantitative genetics data, and gained novel insights into important biological processes and diseases. We further enabled experimental biological discovery by building public interactive systems capable of real-time user-driven integration that are popular among experimental biologists. We now propose to connect these gene-level functional network approaches with the underlying genomic variation by deciphering how genomic variants lead to specific transcriptional and posttranscriptional effects.
We propose to develop ab initio sequence-level models capable of predicting biochemical effects of any genomic variant (including rare or never observed) on chromatin state and RNA regulation, then link these effects with gene-level regulatory consequences (including tissue-specific transcription and RNA splicing), and finally put genomic sequence directly into the network context via a statistical approach for detecting genes and network neighborhoods with a significantly elevated mutational burden in disease. Our key deliverable will be a user-friendly, interactive web-based framework enabling systems-level variant impact analysis in a network context and an open source library for computational scientists. In addition to systematic analysis across contexts and diseases, we will collaborate with experimentalists to apply our methods to Alzheimer's, autism spectrum disorders, chronic kidney disease, immune diseases, and congenital heart defects as case studies for the iterative improvement of our methods and to directly contribute to better understanding of these diseases.
|
1 |