2003 — 2007 |
Batzoglou, Serafim |
N/A |
ITR: Algorithms for Folding and Detection of RNA Genes
ABSTRACT
EF-0312459
Batzoglou, Serafim
This proposal considers the problem of computational folding and detection of noncoding RNA genes in DNA sequences. RNA genes are among the most important biological features in DNA, but at the same time they are very challenging to detect experimentally or computationally. There is a significant body of work in algorithms and tools for folding and detecting RNA genes. Their practical application is limited, however, because of high computational demands. At the same time, recent advances in DNA sequencing, such as the completion of the entire human, mouse, and rat genomes, suggest that sequence comparison is a promising direction. Unfortunately, comparison-based algorithms are considerably more time-consuming. The goal of this research is to develop a framework, and a collection of individual algorithms and tools, for efficient folding and detection of noncoding RNA genes. New algorithms that are significantly more efficient than existing ones will be developed by exploring the structural properties of RNA genes and excluding a large set of "unlikely structures" from the search space of possible configurations. The added efficiency will allow development of more accurate algorithms, practical application at a whole-genome scale on the human genome, and development of algorithms based upon sequence comparison. This research will help educate students at the undergraduate and graduate levels. Undergraduate students will take part in the project through the CURIS undergraduate research program at the Computer Science department. The project is located at the Clark Center, which is part of the BioX initiative, whose goal is to foster cooperation across disciplines such as biology and computer science and to educate a new generation of students who will be "bilingual" in the computational and biological sciences. Software will be disseminated.
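As a point of reference for the folding problem described above, the sketch below shows a textbook Nussinov-style dynamic program that counts the maximum number of nested base pairs in an RNA sequence. It is not the method developed under this award, which the abstract does not specify. The function name `nussinov_max_pairs` and the `min_loop` parameter are illustrative assumptions; the minimum hairpin-loop length is only a very simple instance of excluding implausible structures from the search space.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in `seq`.

    Textbook Nussinov-style recurrence, O(n^3) time. `min_loop` is the
    minimum number of unpaired bases enclosed by any pair (an assumed,
    commonly used constraint, not something taken from the abstract).
    """
    # Watson-Crick pairs plus the G-U wobble pair.
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is left unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some k, leaving a loop of
            # at least `min_loop` bases between them.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0
```

For example, `nussinov_max_pairs("GGGAAAUCC")` counts the three stacked pairs of a small hairpin. Practical folding tools score structures with thermodynamic or probabilistic models instead of simply counting pairs.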
|
1 |
2004 — 2010 |
Batzoglou, Serafim |
N/A |
CAREER: Methods for Comparative Genomics
Proposal # DBI-BDI: 0347952 PI: Batzoglou
Abstract
Stanford University has been awarded a CAREER grant to evaluate emerging methods of genome comparison, to improve existing methods, and to develop new ones when appropriate. The concentration will be on developing comprehensive reference maps, or alignments, of whole genomes to locate biologically functional and evolutionarily constrained elements, and to trace their evolutionary history. The results will be included in collaborations with multiple whole-genome sequencing efforts. There will also be a significant course development component, and several undergraduate and graduate students will be included in the effort.
|
1 |
2005 — 2007 |
Batzoglou, Serafim |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Annotation of Constrained Elements in the Human Genome
DESCRIPTION (provided by applicant): Computational identification and characterization of constrained elements in the human genome is one of the major goals of the next phase of the human genome project (Collins et al. 2003). To generate the requisite comparative sequence data, the sequencing centers will generate whole-genome sequence from a number of mammals chosen primarily for their large diversity in terms of neutral substitutions. Currently existing, in production, or scheduled for production are at least 8 mammalian genomes at high coverage and 8 mammalian genomes at 2x coverage (www.genome.gov/12511858). This proposal focuses on the detection and annotation of constrained elements on the basis of global sequence alignments and high-resolution estimates of evolutionary rates. In short, we propose to generate the data that provided the justification for making a large investment in comparative sequencing of diverse mammals.
|
1 |
2006 — 2011 |
Levitt, Michael; Fedkiw, Ronald (co-PI); Pande, Vijay (co-PI); Sidow, Arend; Batzoglou, Serafim |
N/A |
MRI: Acquisition of a Hybrid Shared-Memory / Massively-Parallel Commodity Cluster for Cost-Effective Super-Computing at Stanford
This project, acquiring a 96-node/1536-core Opteron cluster with Infiniband interconnect and 10TB storage, facilitates a rich diversity of research at the interface of computer science and biology. The research to be enabled has many applications with a remarkable range of scale, from the sub-molecular to the organismal. The work is motivated by a common desire to push novel computational approaches to the limit that most significant problems can be tackled with available computational resources (both in terms of algorithmic advances and in terms of solving the largest). The project represents a broad range of methods, from physics-based simulation, to genomics and proteomics, to biostatistics, to joint experimental/ computational methodology. The enabled research can be grouped in four areas:
- Simulation and modeling of macromolecular structures
- Analysis of sequence and genomic/proteomic datasets
- Modeling of very large datasets
- Fundamental computer science
Besides enabling research, the instrument is an advance in commodity computing, combining the low cost of Linux clusters with the power of shared-memory machines. The result is a supercomputer with a very low total cost of ownership. Highlights include:
- Molecular dynamics simulation based on quantum mechanically derived force fields: to understand the hydrophobic effect that drives protein folding and to get closer to the goal of accurately modeling protein folding
- Modeling of structure water in ribosome: to understand protein structure and function in the cellular milieu
- Integration of genetic networks using genome sequence and experimental data: to appropriately combine disparate information into a single unifying framework based on common gene function and evolutionary descent
- Whole genomic alignments and inference of evolutionary constraints: to predict impact of population genetic variation on the function of the organism, at the interface of population genetics and evolutionary theory
- Simulation of human motion based on accurate, yet tractable, models of the neuromusculoskeletal system and simulation of blood flow in aorta: areas known for applied value
- Development of algorithms for fluid dynamics, solid mechanics, graphics, segmentation, computer vision: areas of computer science with a strong mathematical component, as well as applied aspects such as movie animations
|
1 |
2007 — 2012 |
Batzoglou, Serafim |
N/A |
Protein Interaction Networks: Integration and Alignment
Stanford University is awarded a grant to computationally predict, compare, and analyze functional associations between proteins in sequenced microbes. Protein interaction networks have emerged as a canonical way to represent functional associations between proteins in a cell. In such a network, proteins are nodes, and edges connect proteins that physically interact or, more broadly, participate in the same biological process. For the molecular biologist, protein interaction networks are an invaluable resource, as they can aid in functionally annotating hypothetical proteins that are connected to proteins of known function, identifying new multiprotein complexes, and suggesting promising target proteins for experimentation. This project will computationally predict interaction networks that integrate all available functional genomics data sources, compare predicted networks across microbes to extract functional modules, and generate concrete experimental recommendations. It will develop novel methodology for inferring transcriptional cascades on a global level and aligning these cascades to find conserved regulons. In addition, it will create tools for computer-guided experimental validation in Caulobacter crescentus, which will guide efforts in two experimental laboratories. The result will be an experimentally tested means of predicting protein interactions (and hence biology) directly from genome sequence. The team will develop a powerful and user-friendly web interface where protein interaction networks from all sequenced microbes can be browsed and compared across species. The resulting data will be downloadable in convenient formats, and the software will be available under the GNU General Public License (GPL). The team will directly involve several undergraduate students in the research through the CURIS undergraduate research program at the Computer Science department.
Research will be conducted at the Clark Center, a new building that is part of the BioX initiative at Stanford University. The BioX initiative's goal is to foster cooperation across disciplines such as biology and computer science, and to educate a new generation of students that will be bilingual in the languages of both the biological and computational sciences.
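The network representation described in this abstract (proteins as nodes, functional associations as edges) and its use for annotating hypothetical proteins can be illustrated with a minimal sketch. The majority-vote rule below is a generic guilt-by-association heuristic chosen for illustration, not the project's prediction method; the function names, protein identifiers, and function labels are all hypothetical.

```python
from collections import Counter, defaultdict


def build_network(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def suggest_function(adj, known, protein):
    """Suggest a function for `protein` by majority vote among its
    neighbors with a known function. Returns None if no neighbor is
    annotated. Ties are broken arbitrarily in this sketch."""
    neighbors = adj.get(protein, ())
    votes = Counter(known[n] for n in neighbors if n in known)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy network: P1 and P5 are unannotated proteins.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
known = {"P2": "cell division", "P3": "cell division", "P4": "transport"}
network = build_network(edges)
```

Here `suggest_function(network, known, "P1")` picks the label shared by two of the three annotated neighbors. A real pipeline would weight edges by the confidence of the supporting evidence instead of counting them equally.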
|
1 |
2009 — 2010 |
Barron, Annelise Emily; Batzoglou, Serafim; Quake, Stephen R (co-PI); Shaqfeh, Eric S (co-PI) |
RC2 Activity Code Description: To support high impact ideas that may lay the foundation for new fields of investigation; accelerate breakthroughs; stimulate early and applied research on cutting-edge technologies; foster new approaches to improve the interactions among multi- and interdisciplinary research teams; or, advance the research enterprise in a way that could stimulate future growth and investments and advance public health and health care delivery. This activity code could support either a specific research question or propose the creation of a unique infrastructure/resource designed to accelerate scientific progress in the future. |
A Universal Front End to Improve Assembly Outcomes For Next-Gen Sequencing and Re
DESCRIPTION (provided by applicant): DNA sequencing is currently in the midst of disruptive technological shifts, with 454, Illumina, and SOLiD providing us with enormous throughput increases and large reductions in cost per base. Massively parallel technologies deliver a few Gbp of sequence per week as short fragments, or reads. New applications of sequencing only recently considered impractical are enabled: personal genome sequencing, "metagenomics" analysis of 'soups' containing several to hundreds of unique organisms, and finally, de novo sequencing of novel genomes of complex organisms. No matter how the sequencing is done, reads must be assembled computationally if they are to be useful. Given the read length and read quality limitations of new instruments and the massive volume of data generated, the computational assembly problem is becoming critical, with the cost of computational infrastructure and personnel exceeding reagent and instrument-related costs. Moreover, the results of assembly are currently far from ideal; for example, much of the human genome remains invisible due to a high percentage of repeats. We propose to develop a new "front end" to next-gen sequencers for DNA preparation, the "Read-Cloud Method", which can reduce the computational cost of genome assembly by 2-3 orders of magnitude, produce more complete and accurate genomes, and make metagenomics tractable. We propose a hierarchical sequencing approach, without any need for bacterial cloning. We will achieve this by handling single DNA molecules, tiled across the genome with high redundancy, on microfluidic devices.
We will design, prototype, and thoroughly test technology to (i) shear genomic DNA into 200-kbp fragments with narrow size distributions; (ii) randomly amplify each individual 200-kbp DNA in isolation, within a porous gel microcontainer that will be formed around the dsDNA molecule within a microdevice; (iii) digest micro-encapsulated DNA into small fragments of tunable size; (iv) bar-code the progeny of each 200-kbp DNA with a 12mer oligonucleotide, to identify each read as associated with a particular 200-kbp DNA. A planar microfluidic device will be fabricated to allow one unique bar-code sequence to be blunt-end-ligated to both DNA termini. Bar-coded DNA is pooled, and next-gen sequencing is done. The result is a highly reducible data set. The method and algorithm are applicable universally to next-generation platforms. The PIs (Batzoglou, Barron, Shaqfeh, Quake) will collaborate to make an efficient approach to hierarchical sequencing in microfluidic devices. PUBLIC HEALTH RELEVANCE: Project Narrative Gene sequencing is important to medicine. Our DNA sequencing method has the potential for reducing computational cost by orders of magnitude while making the assembled genomes significantly more complete and accurate. The key to this step is using microfluidic handling technologies to subdivide genomic DNA into 200-kbp fragments, which are then amplified in isolation from each other and uniquely labeled to form a highly reducible dataset for genomic assembly.
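The "highly reducible" property described above comes from the bar codes: reads that share a 12mer bar code derive from the same 200-kbp fragment, so assembly can proceed per fragment instead of over the whole read set. The sketch below illustrates only that grouping step. It assumes, purely for illustration, that the bar code appears as the first 12 bases of each read; the abstract does not state the read layout, and the function name `group_reads_by_barcode` is hypothetical.

```python
from collections import defaultdict

BARCODE_LEN = 12  # 12mer bar code, as stated in the abstract


def group_reads_by_barcode(reads, barcode_len=BARCODE_LEN):
    """Partition reads into per-fragment groups ("read clouds").

    Assumption for this sketch: each read starts with its bar code,
    followed by the genomic insert. Reads too short to contain both
    are skipped.
    """
    clouds = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_len:
            continue  # no insert sequence left after the bar code
        barcode, insert = read[:barcode_len], read[barcode_len:]
        clouds[barcode].append(insert)
    return dict(clouds)
```

Each resulting group could then be handed to an assembler independently, which is what makes the overall problem smaller. Handling sequencing errors inside the bar code itself is left out of this sketch.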
|
1 |
2015 — 2016 |
Batzoglou, Serafim; Sidow, Arend; West, Robert B (co-PI) |
R01 |
Genomic Evolution of Breast Cancer
DESCRIPTION (provided by applicant): Every cell present in our bodies is related to every other cell by mitotic division, and the history of each of our somas is a bifurcating cell lineage tree whose root is the zygote. We will use this concept to establish lineage relationships among neoplastic lesions and carcinomas from each of 100 HER2-positive breast cancer cases. We will accomplish this by sequencing the genomes of several distinct tissue samples (normal, neoplastic, carcinoma) from each case and by performing expression profiling. The somatic variation we will identify (single nucleotide variants, structural variants, and aneuploidies) will be used to build lineage trees that serve as roadmaps to determine when during evolution genomic driver events (HER2 overexpression and/or amplification, aneuploidies, and mutations in key cancer genes) and gene expression changes occurred. Several additional neoplastic samples that are too small for whole-genome sequencing will be identified and typed by targeted PCR and sequencing for 192 of the identified somatic mutations from each case. These additional samples will substantially broaden the phylogenetic tree and facilitate finer resolution as to which types of mutations and other genomic changes happen first during neoplastic evolution. They will also allow us to determine if there are mutations that recur within the same case. Remarkably, in our previous work we have shown, on the basis of such tree analyses, that H1047R in PIK3CA has arisen multiple times within several patients. We will identify additional such mutations, if they exist. Our proposed work is distinct from other studies of tumor evolution, which have so far focused exclusively on within-tumor subclone evolution or metastatic changes, and which cannot order the earliest driver changes. Our study will distinguish drivers of the initial proliferative phenotype from those that cause a full-blown carcinoma.
This can only be done by comparative analyses of early neoplasias with normal tissue and with carcinomas. We note that this concept is well-established in species phylogenetics and evolution, where past events are routinely inferred by comparison among extant species, and which have broadly facilitated insight into both gene function and evolutionary mechanisms. Just like evolving species, cells in our somas are governed by inheritance, change, and divergence, and our understanding of the origins and evolution of neoplasias towards tumors will benefit from a phylogenetic and evolutionary perspective. Our proposed study is one of the first that is dedicated to examining cancer in the light of evolution. We believe that a fuller understanding of mutational mechanisms, the order of driver changes during progression, and the role of hypermutable sites, will be essential for improving diagnostics, prediction, and drug development of this fundamentally evolutionary disease.
|
1 |