2005 — 2009 |
Bafna, Vineet |
N/A |
Novel Algorithms For ncRNA Discovery and RNA Structure Prediction @ University of California-San Diego
University of California - San Diego is awarded a grant to develop novel algorithms for discovering non-coding RNAs. Non-coding RNAs are rapidly regaining importance as molecules of interest. Most ncRNAs, with few exceptions, have been discovered through experiments, and effective, general computational tools for ncRNA discovery remain an unmet need. This award focuses on a comparative approach, typified by the following problem: given a query ncRNA sequence and a sequence database, find all sequences in the database that match the query in sequence and secondary structure. A novel part of the proposal is the development of computational filters that rapidly eliminate much of the database while retaining the true homologs, which are then evaluated using more expensive functions. Similar to BLAST for DNA/protein database search, filter-based comparative ncRNA search has the potential to greatly accelerate the discovery of novel ncRNAs. This proposal is the first systematic study of RNA filters. Likewise, the idea of constructing alignments constrained by conserved seed structures is novel in the context of RNA and has yielded exciting preliminary results. The tools developed in the proposal will be free for all academic, research, and non-commercial purposes, and should have an impact on the larger community. The tools developed here will be an integral part of the curriculum, and will allow students to explore other functional RNAs.
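The filter-then-evaluate idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual filters: the cheap filter here is a shared k-mer count on primary sequence (it ignores secondary structure), the expensive step is a stand-in similarity score from Python's `difflib`, and the names `kmers`, `filtered_search`, `min_shared`, and `threshold` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filtered_search(
    query: str,
    database: Dict[str, str],
    k: int,
    min_shared: int,
    expensive_score: Callable[[str, str], float],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Score only the database entries that pass a cheap k-mer filter."""
    query_kmers = kmers(query, k)
    hits = []
    for name, seq in database.items():
        # Cheap filter (assumption): require a minimum number of shared k-mers.
        if len(query_kmers & kmers(seq, k)) < min_shared:
            continue
        # Expensive evaluation runs only on the survivors of the filter.
        score = expensive_score(query, seq)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: -hit[1])


def stand_in_score(a: str, b: str) -> float:
    """Placeholder for a costly sequence/structure alignment score."""
    return SequenceMatcher(None, a, b).ratio()


database = {
    "a": "GGGAAACUUCGGUUUCCC",
    "b": "AUAUAUAUAUAUAUAUAU",
    "c": "GGGAAACUUCGGUUUCCA",
}
results = filtered_search("GGGAAACUUCGGUUUCCC", database, 4, 3, stand_in_score, 0.9)
print([name for name, _ in results])
```

In this toy database, entry "b" shares no 4-mers with the query, so it is discarded before the stand-in scoring function is ever called; a real RNA filter would also need to account for conserved base-pairing.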
|
1 |
2008 — 2012 |
Bafna, Vineet |
N/A |
III-CXT-Small: Algorithmic Strategies For Genotype-Phenotype Correlations @ University of California-San Diego
In the wake of new sequencing and genotyping technologies, whole genome studies are now being undertaken to understand the genetic basis of phenotypes. Many of the principles underlying the measurement of genotype-phenotype relationships, as well as computing related population genetic parameters, are relatively well understood. However, the upcoming technologies dramatically change the scale and scope of these studies, which already encompass tens of thousands of individuals over a genome-wide region. The analysis of this data requires novel algorithmic and statistical techniques.
This project focuses on a subset of the problems that could arise in a typical whole-genome based association study. These include:
(a) Phasing of genotypes into haplotypes using overlapping sequence data, and the application of this algorithm to phasing individual human sequences; the availability of high coverage long sequence data will make this approach the method of choice for phasing in the near future.
(b) Fast filtering for pairs of loci that interactively influence a phenotype and its application to multiple-locus testing of common disease phenotypes. The proposed work reduces the computational bottleneck in multiple locus testing.
(c) Detection of regions under balancing selection. Available tests are focused on detection of regions under positive selection. The proposed research looks for evidence of balancing selection in the genome, with specific attention on genes associated with bipolar disorder.
(d) Reconstruction of regulatory pathways using associations between genetic variation and gene-expression.
All software from this research is freely available as source-code, or as web-tools for academic, research and non-commercial purposes in accordance with University policy.
Further information on the project may be found at the project web site: http://bix.ucsd.edu/algen
|
1 |
2009 — 2012 |
Bafna, Vineet |
R01 (Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.) |
Algorithmic Strategies For Detecting Structural Variation in Genomes @ University of California San Diego
DESCRIPTION (provided by applicant): Fine-scale nucleotide changes, along with genetic recombination, are often cited as the major source of human genetic variation [1, 13, 14]. Less is known about larger scale (>10kb) genomic structural variations. As genomic technologies improve, we are detecting structural variation in ever-increasing numbers, including genomic inversions [24, 48, 71, 65, 31]; insertion/deletion polymorphisms [12, 26, 42]; and copy number polymorphisms [28, 59, 60]. These large variations can completely disrupt coding and regulatory sites and copy number of genes, and thereby have a huge impact on human phenotypes and disease susceptibility [23, 61]. Deleterious effects have indeed been observed in cancer and other diseases [70, 43]. Our understanding of the scale and impact of these variations can be enhanced by improving computational tools for mining the data from these technologies. Here, I propose the development of algorithms and computational tools to improve detection and resolution (location of breakpoints) of structural variation. Specifically, I will develop algorithms for (a) experimental design of sequencing projects for detecting and resolving structural variations; (b) fine-mapping of breakpoints using end sequence profiling, to detect gene-disruption and gene-fusions; (c) reconstructing tumor genome architectures; (d) detection of targeted genomic variations in a heterogeneous mix of normal versus mutated cells via multiplex PCR; and (e) detection of balanced structural variation in genotype data. The tools will be designed using techniques from statistical machine learning and combinatorial algorithms. Validation will be performed using known structural variations, simulation studies, and extensive experimental collaborations with technology developers and early technology adopters. All of the data and software will be freely available for academic and non-commercial uses.
PUBLIC HEALTH RELEVANCE: The proposed computational tools will be used to detect structural variations in human populations as a starting point for understanding their role in normal evolution and disease, specifically cancer. The architecture of tumor genomes will help reveal genes that are disrupted and differentially expressed in tumor cells. The targeted detection of genomic lesions in a heterogeneous mix of mutated and wildtype cells will find application as an early diagnostic for cancer. Thus, our computational methods will have an immediate and long-term effect on human health.
|
1 |
2009 — 2013 |
Gaasterland, Theresa (co-PI); Macagno, Eduardo; Bafna, Vineet |
N/A |
Bioinformatics Tools For the Analysis of the Spatiotemporal Organization of Protein Expression in Neural Functional Units @ University of California-San Diego
The University of California at San Diego has received a grant to develop computational tools to support the study of the development and repair of the nervous system at the molecular level. Knowledge of the tissue distribution of essential molecules is necessary for understanding how biological systems function, how they grow, and how they repair themselves following trauma or disease. In this project, a multidisciplinary team of investigators will design, test and implement new tools to analyze data obtained by means of the recently developed technique of mass spectrometry imaging applied to the mapping of peptides and proteins in biological tissues. Application of these new methods will yield detailed maps of the temporal and spatial distributions of thousands of individual molecules and the capacity to examine patterns of expression as well as correlations in expression within ensembles of molecules. These new methods will be developed and tested first in simple model organisms, to characterize and compare the molecular components in the embryonic, adult and regenerating nervous system. Later, they will be applied in studies of mammalian nervous system slices in order to answer, among other questions, how stem cells are intercalated into and how they mature in adult nervous systems, during normal replacement or artificial replacement following cell loss due to disease or aging. All computational and bioinformatic tools developed in the course of this project will be made available openly to other scientists. The project will train a group of scientists at multiple levels, from undergraduates to postdoctoral fellows, in this exciting new area of basic and applied research. Additional information may be found at http://genomes.ucsd.edu/leechmaster/.
|
1 |
2009 — 2013 |
Smith, Laurie (co-PI); Bafna, Vineet; Briggs, Steven |
N/A |
Discovery, Revision and Validation of Maize Genes by Proteogenomics @ University of California-San Diego
PI: Steven P. Briggs (University of California - San Diego)
CoPI: Vineet Bafna and Laurie G. Smith (University of California - San Diego)
Collaborators: W. Joan Chen (San Diego State University), Laura J. Olsen (University of Michigan - Ann Arbor), Steven Rodermel and Patrick Schnable (Iowa State University), and Frank Hochholdinger (Universität Tübingen, Tübingen, Germany)
The most fundamental goal of genome science is to discover all of the protein-coding genes, and then to discern the abundance, location, and exact chemical composition of every protein made during the life cycle of an organism; this is called the proteome. A complete and accurately annotated proteome provides the foundation for studies of systems biology and molecular evolution, as well as for hypothesis-driven research. Recent progress in proteogenomics (using proteomic information to annotate the genome) has established it as a data-driven method that complements nucleotide (DNA and RNA)-based annotation strategies. Genome-wide, quantitative proteomics also makes possible the creation of a protein atlas that reveals the anatomical distribution of the proteome and protein sub-cellular locations. This project has two research aims. Aim 1 is to create an Atlas of Maize Proteins. The atlas includes the identity and relative amount of 40,000-50,000 proteins in each of 37 different tissues and stages of maize development. The atlas also includes the protein composition of the plasma membrane, chloroplast, mitochondrion, and peroxisome along with information about the protein changes caused by abiotic and biotic stress. Aim 2 provides proteogenomic discovery, revision, and confirmation of 40,000-50,000 maize gene models, including the identification of exons, the definition of translation start sites and exon borders, and the determination of the correct exon reading frames. This project enhances genome-enabled maize research and breeding by increasing the completeness and accuracy of maize genome annotation. Furthermore, investigations of maize physiology, development, cell functions, and breeding benefit from knowledge of the anatomical and sub-cellular distribution of maize proteins provided by the Atlas of Maize Proteins. 
This project will provide interdisciplinary educational and outreach opportunities for post-docs, graduate students, undergraduates, high school students and San Diego State University researchers, with an emphasis on involvement of under-represented minorities. All project participants in San Diego including post-docs, graduate students and undergraduates are receiving unique, interdisciplinary training made possible by the collaboration this project involves between investigators with expertise in mass spectrometry, bioinformatics, maize developmental and cell biology, and plant responses to stress. High school students are participating in the research via a module developed for BioBridge, a UC San Diego outreach program that brings hands-on learning activities into San Diego public schools. Researchers at San Diego State University will receive training and education in proteomics and bioinformatics through workshops. Access to the biological materials used in the project is provided by the Germplasm Resources Information Network (GRIN, http://www.ars-grin.gov/). Access to the project results, including data and software, is provided by websites and publications by the investigators (http://briggs.ucsd.edu/; http://www-cse.ucsd.edu/~vbafna/). The long-term repository for project data is Tranche (https://proteomecommons.org/index.jsp) and Gramene (http://www.gramene.org).
|
1 |
2011 — 2015 |
Bafna, Vineet; Tesler, Glenn |
N/A |
AF: Small: Algorithms For Genetics: Epistatic Interactions, Haplotype Assembly, and Selection Signatures @ University of California-San Diego
Variation in our DNA (often inherited) can have important functional consequences, including susceptibility to diseases. However, much of this variation is due to random drift and may have no functional consequence. Identifying the small subset of variations that are functionally important is key to a deeper understanding of the genetic basis of diseases and other phenotypes, and is the mainstay of statistical genetics and other fields. However, rapidly falling costs of genome sequencing imply that genomes of entire populations will be completely sequenced. The availability of tremendous amounts of genetic data and the complexity of relations between genotypes and phenotypes change the nature of inference problems from statistical to computational, and demand the use of algorithmic (combinatorial and machine learning) techniques. In this proposal, the PIs propose specific goals in three broad areas, which involve the use of algorithmic techniques in solving problems in genetics.
1. Epistatic interactions and geometric embedding: Epistatic interactions where two distant loci interact to jointly mediate the phenotype often confound analyses. However, with millions of loci, testing all pairs for interactions is computationally intractable. The PIs propose to develop fast algorithms for this problem. The approach depends upon the development of a metric embedding that maps the genotypes at a locus to a point in a high dimensional Euclidean metric, such that interacting pairs have small Euclidean distances. This metric embedding is novel, and allows the use of geometric algorithms for fast detection of epistasis.
2. Haplotype assembly: Haplotyping refers to the separation of the maternal and paternal chromosomes. Successful resolution has great impact in improving the efficacy of genetic association, and in understanding the genetic history of the population. The PIs propose the use of modern strobe-sequencing technologies and single genome amplification to dramatically expand the length of achievable haplotypes. One of the formulated problems maps naturally to connectivity in a new class of random graphs.
3. Pooled selection: The PIs propose the identification of regions under genetic selection, using next generation sequencing data. Specifically, the proposed tests work on pooled DNA, and partially sampled DNA, and employ a combination of techniques from population genetics and combinatorial optimization.
Broader Impact and Intellectual Merit: The great promise of genomics is that our complete sequence will be an integral part of our medical record, and the major health prognostics will be informed by variation. However, the early research in correlating genotypes and phenotypes is stymied by a lack of analysis tools. The problems addressed here are central to the domain and will clearly add to the toolkit of geneticists and biologists. The research also contributes directly to the CISE-CCF mission of developing novel algorithms for Computational Biology, as the proposed problems are uniquely at the intersection of algorithms and genetics, and open new avenues of research in Computer Science.
Dissemination and outreach will continue through the length of the project, contributing to the broader impact of this research. It will include invited and contributed presentations, publications, classroom projects, and collaborations. Software will be freely available as source-code, or web-tools, for academic, research and non-commercial purposes, adding to the infrastructure of genetic analysis tools.
|
1 |
2013 — 2017 |
Bafna, Vineet |
N/A |
III: Small: Algorithms For Decoding Complex Patterns of Genomic Variation @ University of California-San Diego
Genomes evolve and diversify through different mechanisms, including small point mutations, and large structural variations (SV). As entire populations of individuals get sequenced, we observe a complex mosaic of patterns. Some of these are characteristic of a selective constraint such as tolerance to lack of oxygen (for highlander populations), or lactose tolerance. In one aim of the proposal, the investigators develop computational techniques for identifying characteristic genetic patterns to identify genes that are adapting to these selective constraints. The other aim is to reconstruct regions with complex variation patterns such as the Killer cell Immunoglobulin-like Receptor (KIR) region. KIR diversity plays a significant role in mediating immune response, helping with an understanding of diseases including rheumatoid arthritis, control of HIV disease progression as well as the success rate of cell replacement therapy for certain leukemias (blood cancer). The investigators will use a mix of techniques from combinatorial algorithms, machine learning, and population genetics to decode the genetic patterns. The proposal has broader impact in the field as part of a larger effort to develop efficient computational tools for genetic analysis, a critical problem in the modern era of inexpensive sequencing. The tools and technologies described here will have a direct impact on understanding the genetic diversity of populations, and towards a personalized approach to healthcare.
The proposal seeks to decipher the observed genetic variation across populations using two thrusts. In one thrust, it looks to haplotype genomic structural variation, and discover the genomic architecture of complex immunological regions like KIR and HLA. In a second thrust, the investigators analyze patterns of variation that are indicative of selective constraints. For selection signatures, the investigators will provide a better understanding of currently available tests using the scaled site frequency spectrum, and use an algorithmic approach to identify a better discriminator. For the rearranged genomic regions, the investigators will use optimization algorithms to adjust read coverage in highly repetitive regions. The proposal has broader impact in the field as part of a larger effort to develop efficient computational tools for genetic analysis, a critical problem in the modern era of inexpensive sequencing. The tools and technologies described here will also have a direct impact on understanding the genetic diversity of under-represented populations, and towards a personalized approach to healthcare. The proposed research is tightly connected to undergraduate and graduate education, as all research here will be directly incorporated in interdisciplinary classes. The PI has a strong track record mentoring women and other under-represented students in Computer Science.
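For readers unfamiliar with the scaled site frequency spectrum mentioned above, a minimal sketch follows. It assumes haplotypes are given as 0/1 lists (1 = derived allele), which is a hypothetical input format, and it is not the investigators' test. The scaling multiplies the count of sites with derived-allele count i by i; under the standard neutral model the expected unscaled count is proportional to 1/i, so the scaled spectrum is flat in expectation and departures from flatness are what selection tests look for.

```python
from typing import List


def site_frequency_spectrum(haplotypes: List[List[int]]) -> List[int]:
    """Return sfs where sfs[i - 1] counts sites with derived-allele count i.

    Only segregating sites (0 < count < n) are counted, so the result
    has length n - 1 for n sampled haplotypes.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    sfs = [0] * (n - 1)
    for site in range(num_sites):
        derived_count = sum(hap[site] for hap in haplotypes)
        if 0 < derived_count < n:
            sfs[derived_count - 1] += 1
    return sfs


def scaled_sfs(sfs: List[int]) -> List[int]:
    """Multiply each entry by its derived-allele count i."""
    return [(i + 1) * count for i, count in enumerate(sfs)]


# Four haplotypes, four sites (toy data, not from the project).
haplotypes = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
sfs = site_frequency_spectrum(haplotypes)
print(sfs, scaled_sfs(sfs))
```

In the toy data the third site is monomorphic and is skipped; the remaining sites have derived counts 2, 1, and 3.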
|
1 |
2014 — 2018 |
Bafna, Vineet; Bandeira, Nuno Filipe Cabrita; Pevzner, Pavel A |
P41 |
Center For Computational Mass Spectrometry @ University of California San Diego
DESCRIPTION: Mass spectrometry is based on fragmenting biological molecules into smaller pieces, and using the fragment masses as a fingerprint for identifying and quantifying bio-molecules. It is the dominant technology for studying active molecules in healthy and diseased tissue, and identifying protein targets and natural products for novel therapeutics. When the initial proposal for the Center for Computational Mass Spectrometry (CCMS) was submitted in 2007, the lack of adequate computational tools for analyzing mass spectrometry data was the key bottleneck. With great success in enabling applications of new experimental techniques such as FTMS, ETD, HCD, top-down mass spectrometry, and many others, the mandate of CCMS continues to be the development of next generation computational technologies and to apply them to open experimental. In this proposal, we will capitalize on our recent results in diverse subfields of computational proteomics and will further branch into previously unexplored MS applications. We will focus specifically on bridging proteomics and genomics technologies using 6 technology research and development platforms. Specifically, we will (a) apply a proteogenomics approach for the discovery of aberrant cancer genes and analyzing antibody repertoires; (b) sequence natural antibiotics; (c) collate spectral data through spectral archives and networks; (d) develop universal tools for peptide identification; (e) develop tools for top-down proteomics; and (f) analyze multiplexed spectra. The technology platforms are driven by a multitude of collaborative biomedical studies where the use of CCMS-developed tools is essential for their success.
These studies include (a) unraveling the combinatorial histone code in human diseases; (b) a proteogenomics approach to studies of oral microbiome and polybacterial infections; (c) detecting inter-species chemical interactions; (d) developing a systems approach towards the therapeutic modulation of the acetylome; (e) developing tools for monoclonal and polyclonal antibody sequencing; (f) development of breast cancer vaccines; (g) clinical cancer proteogenomics; (h) discovery of lantibiotics; (i) discovering proteomic biomarkers for drug toxicity in cancer patients; and (j) identifying protein-protein interactions and post-translational modifications in cataractous lens. These projects require three-way collaborative efforts on a wide range of topics involving biomedical scientists, mass spectrometrists, and computational scientists from various institutions. CCMS will also train students and practicing scientists from all over the world in computational proteomics, and educate the proteomics community about modern computational mass spectrometry to encourage its wide adoption.
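To make the "fragment masses as a fingerprint" idea concrete, here is a small sketch that computes the theoretical singly charged b- and y-ion m/z values for an unmodified peptide. It is an illustration only, not a CCMS tool: the function name `fragment_ions` is hypothetical, the masses are standard monoisotopic residue masses rounded to five decimals, and modifications, higher charge states, and other ion types are ignored.

```python
from typing import List, Tuple

# Monoisotopic amino acid residue masses in daltons (rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, added to C-terminal (y) fragments
PROTON = 1.00728   # charge carrier for singly charged ions


def fragment_ions(peptide: str) -> Tuple[List[float], List[float]]:
    """Return (b_ions, y_ions) m/z lists for charge +1.

    b_ions[i] covers the first i + 1 residues; y_ions[i] covers the
    last i + 1 residues. The full-length peptide is not included.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    prefix = 0.0
    for mass in masses[:-1]:
        prefix += mass
        b_ions.append(prefix + PROTON)
    suffix = 0.0
    for mass in reversed(masses[1:]):
        suffix += mass
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("GAS")
print([round(m, 4) for m in b_ions], [round(m, 4) for m in y_ions])
```

A database search tool compares such theoretical fragment lists against the peaks observed in a spectrum; leucine and isoleucine have identical masses, which is one reason the fingerprint is not always unique.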
|
1 |
2014 — 2018 |
Bafna, Vineet |
P41 |
Technology Research and Development Project 1: A Proteogenomics Approach For the Discovery of Aberrant Cancer Genes and Analyzing Antibody Repertoires @ University of California, San Diego
Project Summary: Mass spectrometry is based on fragmenting biological molecules into smaller pieces, and using the fragment masses as a fingerprint for identifying and quantifying bio-molecules. It is the dominant technology for studying active molecules in healthy and diseased tissue, and identifying protein targets and natural products for novel therapeutics. When the initial proposal for the Center for Computational Mass Spectrometry (CCMS) was submitted in 2007, the lack of adequate computational tools for analyzing mass spectrometry data was the key bottleneck. With great success in enabling applications of new experimental techniques such as FTMS, ETD, HCD, top-down mass spectrometry, and many others, the mandate of CCMS continues to be the development of next generation computational technologies and to apply them to open experimental. In this proposal, we will capitalize on our recent results in diverse subfields of computational proteomics and will further branch into previously unexplored MS applications. We will focus specifically on bridging proteomics and genomics technologies using 6 technology research and development platforms. Specifically, we will (a) apply a proteogenomics approach for the discovery of aberrant cancer genes and analyzing antibody repertoires; (b) sequence natural antibiotics; (c) collate spectral data through spectral archives and networks; (d) develop universal tools for peptide identification; (e) develop tools for top-down proteomics; and (f) analyze multiplexed spectra. The technology platforms are driven by a multitude of collaborative biomedical studies where the use of CCMS-developed tools is essential for their success.
These studies include (a) unraveling the combinatorial histone code in human diseases; (b) a proteogenomics approach to studies of oral microbiome and polybacterial infections; (c) detecting inter-species chemical interactions; (d) developing a systems approach towards the therapeutic modulation of the acetylome; (e) developing tools for monoclonal and polyclonal antibody sequencing; (f) development of breast cancer vaccines; (g) clinical cancer proteogenomics; (h) discovery of lantibiotics; (i) discovering proteomic biomarkers for drug toxicity in cancer patients; and (j) identifying protein-protein interactions and post-translational modifications in cataractous lens. These projects require three-way collaborative efforts on a wide range of topics involving biomedical scientists, mass spectrometrists, and computational scientists from various institutions. CCMS will also train students and practicing scientists from all over the world in computational proteomics, and educate the proteomics community about modern computational mass spectrometry to encourage its wide adoption.
|
1 |
2015 — 2019 |
Bafna, Vineet |
N/A |
Collaborative Research: ABI Innovation: Computational Population-Genetic Analysis For Detection of Soft Selective Sweeps @ University of California-San Diego
The molecular process of adaptation (the rise in frequency of genetic variants that enable organisms to succeed in their environments) is a central process in evolutionary biology. Surmounting significant challenges such as the ability of infectious agents to evolve resistance to drugs and the ability of crop pests to defeat a diverse array of increasingly powerful insecticides requires an understanding of the nature of adaptation. Recent advances have demonstrated that adaptation often occurs via "soft selective sweeps," in which an adaptive genetic variant originates multiple times or has become favored only after it has been present at a substantial frequency in the population. This project contributes to advancing knowledge of the fundamental evolutionary process of adaptation by developing new computational tools to detect and study the occurrence of adaptation by soft selective sweeps. Through the interactions of a multidisciplinary team spanning evolutionary biology and bioinformatics, the project integrates advances in evolutionary simulation with modern and efficient computational methods in order to produce progress on understanding adaptation, while simultaneously developing efficient computational tools applicable in the modern "big-data" era of inexpensive sequencing. In addition, its joint mentorship efforts from evolutionary and bioinformatics perspectives promote interdisciplinary training of graduate students and postdoctoral scientists.
The project has four objectives: (1) To design new tests for detecting selection in the case in which soft selective sweeps occur from standing genetic variation; (2) To identify haplotypes that carry a beneficial allele in genomic regions known to be experiencing positive selection; (3) To enhance new methods of analysis of natural selection to make them robust to confounding demographic scenarios; (4) To apply new selection methods in a series of data sets from multiple species, including humans, Drosophila, and Plasmodium malaria parasites. The project will use algorithmic techniques from combinatorial optimization and machine learning, and it will exploit ideas from population genetics and coalescent theory. It breaks ground on several fronts, providing a deeper understanding of the patterns in site-frequency spectra and haplotype data as a basis for selection signatures, and assisting in the design of subtyping studies for complex regions of the genome. As it becomes increasingly possible to sequence whole genomes of multiple individuals within a population, the intellectual challenge of designing tools for detecting selection to accommodate new phenomena such as soft sweeps coincides with the computational challenge of incorporating genomic data sets into selection studies. These challenges are addressed by the project, whose results will be available at http://proteomics.ucsd.edu/vbafna/research-2/nsf1458059/.
|
1 |
2016 — 2021 |
Bafna, Vineet |
R01 (Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.) |
Computational Methods For Detecting Patterns of Complex Genomic Variation @ University of California San Diego
DESCRIPTION (provided by applicant): Genomes evolve and diversify through different mechanisms, including small point mutations, but also larger, structural variations (SV). SVs can be mediated by simple repeats and microhomology based recombination (termed 'progressive SVs' in this proposal). However, progressive SV mechanisms cannot explain all forms of large genomic variation; sometimes, more 'complex mechanisms' are needed; examples include Breakage Fusion Bridge (BFB) and Chromothripsis. Moreover, there is little understanding of the genetic mechanisms of genome instability that lead to complex SV formation. It is suspected that random viral genome insertions into the genome can on occasion disrupt key genes, causing genome instability and hyper-variability. To address these problems, the proposal will design and implement computational methods to (a) reconstruct and validate episomal structures of viral genome insertions; (b) determine if genomic sequence sampled from tumor genomes has a signature of complex variation; and (c) phase and sub-type regions with complex SV including KIR and HLA. As clinical/translational applications of genomics come to the forefront, the impact of complex SVs on the phenotype of an individual becomes increasingly important. Understanding the computational signatures of BFB and Chromothripsis will help sub-type and characterize cancers. The knowledge of KIR/HLA sub-type will be correlated with immune related phenotypes, and the reconstruction of viral episomes will help clarify the etiology of virus mediated cancers. Thus, the proposed set of computational tools will directly impact the translational/medical aspect of genomics.
|
1 |
2016 — 2021 |
Bafna, Vineet; Gaasterland, Theresa (co-PI); Subramaniam, Shankar |
T32 (Activity Code Description: To enable institutions to make National Research Service Awards to individuals selected by them for predoctoral and postdoctoral research training in specified shortage areas.) |
Graduate Training Program in Bioinformatics @ University of California, San Diego
Graduate Training Program in Bioinformatics Program Abstract Biology is increasingly becoming an information-driven science. To harness the opportunities of the post- genomic era in furthering health sciences research and improving health care, there is an enormous demand for biologists who are trained in mathematics and computer science and can think quantitatively. However, current disciplinary graduate training programs are not designed to accommodate these rapid changes in the biological research perspective. This need serves as the motivation for the development of specialized graduate training programs that will train students at the interface between biology, engineering and computer science. To address this need, UCSD established an interdisciplinary Graduate Program in Bioinformatics in 2001 under the directorship of Dr. Shankar Subramaniam. In 2008, it was renamed Graduate Program in Bioinformatics and Systems Biology and reorganized. The current program directors Drs. Trey Ideker, Euegene Yeo, and Theresa Gaasterland work closely with the Training Grant co-PIs Dr. Subramaniam, Vineet Bafna, and Theresa Gaasterland, and with an active steering committee containing representative faculty from all five participating UCSD schools and academic divisions. The primary objectives of this (renewal) application of the Training Grant by the three co-PIs are to continue and expand this premier Graduate Program, and support the highest quality students in their truly interdisciplinary training which blends biomedicine, computer science and engineering. The Program will continue to evolve the curriculum (including online offerings) and develop and offer electives that will prepare students for the challenges of big data and computational biomedical research. 
The program will continue its mode of training that begins with a set of research rotations in laboratories of faculty members, and continues through doctoral research work under the supervision of a PhD advisor and co-advisor who provide complementary interdisciplinary expertise. The Program will also continue a recently established weekly Colloquium, the student Journal Club, and annual retreat. In the course of their training, program students have contributed important discoveries and impactful advances in health sciences research. Alumni of the program are placed in leading positions in academia and industry. Given the extraordinary number and quality of applicants, the capacity and eagerness of the Program faculty to train the Program's students, and the institutional support for the Program, this application seeks to increase the number of trainee slots to 12. Following Training Grant support of graduate students during their course work education and initial research training, all graduate students will be supported by their thesis advisors for the duration of their PhD studies. The Proposal outlines our past success in training students and discusses significant novel strategies for enhancing the training program. The recent inception of the UCSD Halicioglu Data Sciences Institute provides great synergy for our Graduate Program and offers the trainees an opportunity to become leaders in the emerging areas of biomedicine that are increasingly becoming data science disciplines.
|
1 |
2018 — 2021 |
Bafna, Vineet Mir Arabbaygi, Siavash |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Iii: Small: New Algorithms For Genome Skimming and Its Applications @ University of California-San Diego
Anthropogenic pressure and other natural causes have resulted in severe disruption of global ecosystems in recent years, including loss of biodiversity and invasion of non-native plants and animals. A particular problem is that it is not enough to simply determine the population of various species; it is also important to determine whether there exists enough genetic diversity within a species to ensure its survival. It is therefore necessary to estimate the genetic biodiversity of various areas in order to decide where and which plants and animals are in most need of protection, and to predict the outcome of proposed interventions. However, current algorithms for computing biodiversity, which are based on computing the genetic "distance" between samples of organisms, are too computationally intensive and slow to be applied at large scale. This project will overcome the problem by developing new, highly efficient algorithms for computing biodiversity. As a result, this work will provide tools needed to improve our knowledge of ecosystems and make better decisions for managing plant and animal natural resources.
In place of the currently popular technique of isolating and sequencing specific phylogenetically informative regions, the PIs propose low-pass whole genome sequencing (genome skims) and alignment-free methods for barcoding. To enable this approach, the PIs will develop algorithms and tools to identify all genome-skims in a given library, use them for phylogenetic reconstruction, and use meta-barcoding and genome-skims as a mechanism for examining populations of organisms. The proposed activities will allow the estimation of genomic bio-diversity for a fraction of the current costs of labor and genome sequencing. The proposal uses a number of innovative and novel algorithmic and statistical techniques and describes the first systematic study of the feasibility of computing the genomic distance using only a small, random fraction of the genome. The project will advance the field by providing a simple and inexpensive protocol for measuring biodiversity with higher sensitivity than is currently achievable.
The investigators have a strong history of prior research in related fields and bring complementary expertise in evolution and phylogenetic reconstruction and in computational population genomics. There are three aims: given genome-skims of two organisms, estimate the Hamming distance and use that to search a given library; use genome-skims for phylogenetic reconstruction; and given a meta-barcoding query (genome-skims of a mix of organisms), identify the constituent organisms and their relative abundance in the sample.
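To make the first aim concrete, here is a minimal sketch of an alignment-free distance estimate between two sequences based on shared k-mers. It is an illustration under simplifying assumptions, not the PIs' method: it assumes complete coverage, no sequencing error, and independent substitutions at rate d, so that a k-mer survives unchanged with probability (1 - d)^k and the Jaccard index J of the two k-mer sets satisfies (1 - d)^k = 2J / (1 + J). Real genome skims are low-coverage and error-prone and would need corrections that are not modeled here; reverse-complement (canonical) k-mers are also ignored. The function names are hypothetical.

```python
def kmer_set(seq, k):
    """Return the set of length-k substrings of seq made only of A, C, G, T."""
    seq = seq.upper()
    alphabet = set("ACGT")
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= alphabet  # skip k-mers with N or other symbols
    }


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_to_distance(j, k):
    """Convert a k-mer Jaccard index into a per-base distance estimate.

    Assumes full coverage, no sequencing error, and independent
    substitutions, so that (1 - d)^k = 2J / (1 + J).
    """
    if j <= 0.0:
        return 1.0  # no shared k-mers: distance is saturated
    return 1.0 - (2.0 * j / (1.0 + j)) ** (1.0 / k)


def estimate_distance(seq_a, seq_b, k=21):
    """Estimate the per-base distance between two sequences from k-mer sets."""
    return jaccard_to_distance(jaccard(kmer_set(seq_a, k), kmer_set(seq_b, k)), k)
```

Calling `estimate_distance` on two identical sequences gives 0.0, and on two sequences with no shared k-mers gives 1.0. The choice of k trades specificity against sensitivity to mutations; the default of 21 here is only a placeholder.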
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |
2018 — 2021 |
Bafna, Vineet Bansal, Vikas (co-PI) [⬀] Gymrek, Melissa |
R01Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Refining Mendelian Disease Analysis Via Detection of Clinically Relevant Repeat Variants @ University of California, San Diego
Project Summary Next-generation sequencing (NGS) has the potential to profile all clinically relevant genetic variants simultaneously in a single genetic test. However, clinical variant discovery pipelines have mostly focused on coding single nucleotide variants (SNVs), regulatory SNVs and small indels. This proposal aims to make repeat analysis a standard component of existing pipelines, focusing in particular on short tandem repeats (STRs), variable number tandem repeats (VNTRs), and low-copy repeats or segmental duplications. Together, these repeats account for 8% of the human genome, but are implicated in a disproportionately large number of Mendelian diseases. The proposed methods are primarily aimed at Illumina sequencing, which forms the vast majority of current Mendelian sequencing pipelines, but also include alternative technologies such as Pacific Biosciences and 10X Genomics. The first aim develops algorithms for discovery of repeat variants currently inaccessible from NGS. In the second aim, the PIs propose to generate gold-standard validation data for Mendelian repeats using multiple technologies. In the third aim, the PIs will integrate the proposed methods into existing NGS pipelines for clinical variant discovery, and also apply them to large existing data-sets to obtain genotype frequencies of large control populations. The project serves an unmet need by augmenting Mendelian variant pipelines to include highly relevant disease variants.
|
1 |
2021 |
Bafna, Vineet |
P01Activity Code Description: For the support of a broadly based, multidisciplinary, often long-term research program which has a specific major objective or a basic theme. A program project generally involves the organized efforts of relatively large groups, members of which are conducting research projects designed to elucidate the various aspects or components of this objective. Each research project is usually under the leadership of an established investigator. The grant can provide support for certain basic resources used by these groups in the program, including clinical components, the sharing of which facilitates the total research effort. A program project is directed toward a range of problems having a central research focus, in contrast to the usually narrower thrust of the traditional research project. Each project supported through this mechanism should contribute or be directly related to the common theme of the total research effort. These scientifically meritorious projects should demonstrate an essential element of unity and interdependence, i.e., a system of research activities and projects directed toward a well-defined research program goal. |
Core C- Bioinformatics Core @ University of California, San Diego
PROJECT SUMMARY - Core C: Bioinformatics Bioinformatics is the application of statistics and computer science to the field of molecular biology. It has emerged as a field unto itself, as the datasets that are generated by modern biomedical researchers easily exceed what can be directly visualized. The vast amount of data increases the chance of false-negative and false-positive results, and argues for robust statistical models and reproducible workflows. Core C will work with the data generated from massively parallel sequencing from human, frog and mouse in Projects I, II and III and Core B to extract variants that have potential to cause meningomyelocele or influence neural tube phenotypes. The PIs of the Projects and Cores have worked together extensively in the past, and have an established track record of productivity in the area of next generation sequencing (NGS) data analysis. Dr. Bafna has worked broadly in bioinformatics and genomics in the development of computational methodologies employing novel algorithms and statistical techniques for NGS datasets. We envision that the DNA sequencing derived from Project I in the form of whole genome or whole exome sequencing from patients and their parents will be delivered to Core C for prioritization of potentially pathogenic, risk-associated variants. RNA sequencing, single cell sequencing and epigenetic sequencing data generated from Core B, as well as imported from Projects I, II and III, will be delivered to Core C for extraction of expression changes, which will be delivered to each of the Projects for segregation analysis and further validation.
The Bioinformatics Core will provide these analysis pipelines to identify and annotate variants, and to develop innovative network analyses, RNAseq, Methylseq and single cell analysis to discover novel genetic mechanisms of MM based on Protein-Protein Interaction (PPI) and gene co-expression networks, to interpret large datasets from current genetic and genomic technologies, and to apply these in the different components of this Program Project. Although our primary goal is to provide service using existing computational methods, we expect that Core B will also develop novel computational methods as required by the Projects and Cores, as we have done to develop our current WGS analysis pipeline. Methods development will be geared towards fundamental unsolved problems underlying the above four key functions, such as algorithms for correlating variants to phenotypes, further improvements in methods for computing epistatic interactions, detection of short tandem repeats and mobile elements from WGS, advanced methods for integration of genotypes with pathways, use of next-generation sequencing (NGS) in analysis of gene association, and discovery of genetic variants that influence protein expression or function.
|
1 |
2021 |
Bafna, Vineet |
U24Activity Code Description: To support research projects contributing to improvement of the capability of resources to serve biomedical research. |
Software and Algorithms For Elucidating the Structure, Function, and Evolution of Extrachromosomal Dna @ University of California, San Diego
Project Summary Somatic copy number amplification (SCNA) of tumor promoting oncogenes, and focal copy number amplifications specifically, are a major driver of cancer pathogenicity. Recent results have revealed that focal oncogene amplification is mediated to a large extent by extrachromosomal DNA (ecDNA), i.e., large (1.3 Mb on average), highly amplified, oncogene-containing circular molecules that occur in nearly 25% of cancers across all sub-types, but rarely in normal cells. Unresolved questions regarding the formation, evolution, heterogeneity, and pathogenicity of ecDNA are becoming central to uncovering vulnerabilities that can be targeted for diagnostics and therapy. The proposed project will enhance and disseminate "Software and algorithms for elucidating the structure, function, and evolution of extrachromosomal DNA." Specifically, we will (1) develop CAPER (a Community Accessible Pipeline for EcDNA Reconstruction) by leveraging the GenePattern ecosystem to provide an easy point-and-click interface to running the CPU-, memory-, and storage-heavy software; (2) design and implement novel algorithmic improvements to the CAPER workflow, including support for long-reads and integration of Omics data; and (3) enable the broad adoption of CAPER through strategic collaborations, outreach and education.
|
1 |