2005 — 2008 |
Eskin, Eleazar |
N/A |
Collaborative Research: SEIII: Estimating Haplotype Frequencies @ University of California-Los Angeles
The etiology of complex diseases involves multiple genes and environmental factors. Since each individual gene locus is only a small part of the whole picture, association studies based on correlating variation at one or a few gene loci to disease outcomes may miss significant larger-scale associations. An attractive alternative that may be more revealing is to base association studies on correlations between disease outcomes and haplotypes across selected genomic regions. A prerequisite for association studies, whether they are based on a few loci or on larger-scale haplotypes, is an accurate method for estimating haplotype frequencies in a given population. The differences between the haplotype frequencies in a healthy population and in a population of affected individuals may be subtle, so accurate haplotype frequency estimates are essential for disease association studies. Estimating haplotype frequencies is a non-trivial task because current sequencing methods may produce noisy or incomplete data and typically yield genotypes, whose resolution into pairs of haplotypes is ambiguous. Existing methods for haplotype frequency estimation are mainly heuristic in nature, and they are only suitable for large samples of unrelated individuals from a homogeneous population over short genomic regions. Any deviation from these conditions may result in inaccurate estimates. The main goal of this project is to develop efficient and accurate tools for haplotype frequency estimation under different conditions, and to integrate these methods with novel tools for disease association studies.
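To make the ambiguity concrete: a genotype records only how many copies of each allele an individual carries at each SNP, so an individual heterozygous at k sites is compatible with 2^(k-1) distinct haplotype pairs. A common baseline for resolving this is expectation-maximization (EM) under Hardy-Weinberg equilibrium. The Python sketch below is that textbook EM baseline, not the method developed in this project; the genotype coding (0, 1 or 2 copies of allele 1) and the function names are assumptions for illustration. It enumerates every compatible pair, so its cost grows exponentially with the number of heterozygous sites, which is one reason such estimators are limited to short regions.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with a genotype.

    Each genotype entry counts copies of allele 1 at one SNP (0, 1 or 2);
    haplotypes are tuples of 0/1 alleles.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for idx, b in zip(het, bits):
            h1[idx], h2[idx] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, max_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased genotypes by EM, assuming HWE."""
    candidates = [compatible_pairs(g) for g in genotypes]
    haplotypes = {h for pairs in candidates for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: HWE probability of each compatible pair,
            # p_a**2 when both haplotypes are identical and 2 * p_a * p_b otherwise.
            weights = [freq[a] * freq[b] * (1.0 if a == b else 2.0) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts over the 2N chromosomes in the sample.
        new_freq = {h: counts[h] / (2.0 * len(genotypes)) for h in haplotypes}
        converged = max(abs(new_freq[h] - freq[h]) for h in haplotypes) < tol
        freq = new_freq
        if converged:
            break
    return freq


if __name__ == "__main__":
    # Hypothetical two-SNP sample; [1, 1] is the only ambiguous genotype.
    sample = [[0, 0], [2, 2], [1, 1], [0, 0], [2, 2]]
    for hap, f in sorted(em_haplotype_frequencies(sample).items()):
        print(hap, round(f, 3))
```

In the usage example the double heterozygote is resolved almost entirely as the pair 00/11, because those haplotypes are already supported by the unambiguous homozygotes; with many individuals and markers, this pooling of evidence across the sample is what makes the estimates informative.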
In particular, the following activities are proposed: develop accurate, efficient and robust methods for haplotype frequency estimation over short and long genomic regions; extend these methods to handle small sample sizes and deviations from Hardy-Weinberg equilibrium due to population substructure, and incorporate pedigree information into the haplotype frequency estimator; integrate the resulting tools with a systematic tool for disease association studies that automatically searches for candidate loci using multiple calls to the haplotype frequency estimator; and launch a web server that will allow geneticists to upload their data and run the programs developed in the project on the fly.
The direct effect of the project would be to reduce the sample size needed for association studies, thus making more studies possible under the same budget constraints. This in turn will lead to a better understanding of complex diseases, which may speed up the search for diagnosis and treatment tools. The mathematical models introduced in this project may shed light on haplotype structure and on evolution. Furthermore, the project will address optimization problems and statistical learning problems that may be of use beyond the scope of genetics. The diverse tasks of this project include algorithm design and implementation, software integration and biological modeling. Thus, there is a wide range of activities that are suitable for students of all levels. This will give students an exciting exposure to multidisciplinary research involving computer science, statistics, genetics and mathematics. The methods developed in this project will be integrated in bioinformatics courses at UCSD, and the material will be publicly available as PowerPoint presentations on the web. The software developed in this project will be integrated with the existing publicly available web server HAP.
|
1 |
2005 — 2009 |
Eskin, Eleazar |
P41 |
Grid Enabled Hap Analysis @ University of California San Diego |
0.975 |
2006 — 2010 |
Eskin, Eleazar |
K25 |
Discovering the Genetic Basis of Hypertension @ University of California Los Angeles
DESCRIPTION (provided by applicant): With the completion of the human genome project, much of the progress in understanding the genetic basis of disease relies on computational analysis of genomic data. Some of the most useful data for this analysis are human variation data, which describe the variation, across a population of individuals, in genes associated with a disease. Understanding the relation between variation and disease is a fundamental challenge, which can shed light on the genetic basis and mechanisms of human disease. This challenge spans three research fields: genetics, bioinformatics and medicine. Understanding the genetic basis of disease involves two steps. First, we must determine the functional variants in each gene locus that is linked to the disease, and the effect of these functional variants on the regulation and products of the gene. Second, we must understand how these intermediate phenotypes affect disease outcomes. Using this information, we can identify subtypes of the disease that may differ in their drug response. In this proposal we outline our approach to this problem and propose to build tools for modeling the function of variation in a gene locus, correlating the intermediate phenotypes to disease outcomes, and identifying subtypes of the disease based on genetic variants. The core of our approach involves haplotype analysis, and we leverage previously developed tools for this analysis. We demonstrate initial results over the Chromogranin A locus. The disease focus of this proposal is hypertension, and the tools will be applied to the large amount of data collected at UCSD through the pharmacogenomics project. This proposal contains an extensive training plan for Eleazar Eskin, including courses at UCSD, so that he can obtain the necessary background in hypertension and genetics for the project. Daniel O'Connor and Nicholas Schork will mentor Eleazar throughout the project.
This training and mentoring will position Eleazar to have a larger impact on medicine through his research. This research is relevant to public health because it attempts to understand the relation between an individual's genetic variation and disease outcomes. Identifying the variants that are implicated in complex disease is the first step toward the ultimate goal of tailoring treatments to an individual's genetics.
|
1 |
2007 — 2011 |
Eskin, Eleazar |
N/A |
Collaborative Research: Design and Analysis of Compressed Sensing DNA Microarrays @ University of California-Los Angeles
The diverse functions performed by a living cell during its life cycle are controlled and regulated through complicated gene- and protein-interaction networks. Any pattern of irregular behavior of genes in the network can lead to cell malfunction, cell death, or the emergence of diseases such as cancer. It is therefore crucially important to recognize erroneous gene interaction patterns and compare them to those in healthy cells. For this type of study, one of the most frequently used bioengineering systems is the well-known DNA microarray. DNA microarrays consist of grids of spots containing unique genetic identifiers for each of the tested genes, and they generate snapshots of gene activity through selective DNA sequence annealing. Microarrays have also found many other applications in molecular biology, most notably the detection of hostile microbial agents in food, water, and air. One of the main drawbacks of current microarray designs is that they are severely underutilized for whole-genome studies; similarly, for biosensing applications, technological limitations prevent existing microarray systems from simultaneously identifying a large number of microorganisms and their strains.
The investigators study novel array architectures, termed compressed sensing DNA microarrays. The research involves finding DNA probes that serve as group identifiers for classes of microorganisms; designing sparse sensing matrices for DNA group identifiers; developing compressed sensing reconstruction algorithms capable of handling saturation effects that arise at high agent concentrations; characterizing the fundamental trade-offs between distortion and sensor dimension for non-linear arrays; and analyzing the complexity of integrating compressed sensing microarrays into existing biosensor networks.
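The reconstruction step can be illustrated with a standard compressed sensing recovery. If y = A x, where each row of A is a spot, each column is an organism class, and x holds agent concentrations with only k nonzero entries, then x can often be recovered from far fewer spots than classes. The Python sketch below uses orthogonal matching pursuit (OMP), a generic greedy recovery algorithm, on a dense random Gaussian matrix with a linear, noise-free model. It does not model the saturation effects or the sparse DNA-probe designs the project targets, and the dimensions and names are arbitrary assumptions.

```python
import numpy as np


def omp(A, y, sparsity):
    """Orthogonal matching pursuit: greedily recover a `sparsity`-sparse x with y ~= A @ x."""
    residual = np.asarray(y, dtype=float)
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Choose the column (organism class) most correlated with the unexplained signal.
        scores = np.abs(A.T @ residual)
        scores[support] = -np.inf  # never pick the same column twice
        support.append(int(np.argmax(scores)))
        # Re-fit all selected columns jointly by least squares, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat


# Hypothetical design: 60 spots probing 100 organism classes, 3 of them present.
rng = np.random.default_rng(0)
n_spots, n_classes = 60, 100
A = rng.standard_normal((n_spots, n_classes)) / np.sqrt(n_spots)
x_true = np.zeros(n_classes)
x_true[[5, 42, 77]] = [3.0, 1.5, 2.0]
y = A @ x_true
x_hat = omp(A, y, sparsity=3)
```

The least-squares refit after each selection is what distinguishes OMP from plain matching pursuit: once the correct support is found, the coefficients are exact in this noise-free setting. Real arrays would need a noise model and, as the abstract notes, a nonlinear one near saturation.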
|
1 |
2009 — 2013 |
Eskin, Eleazar |
N/A |
III: Small: Inference of Causal Regulatory Relationships From Genetic Studies @ University of California-Los Angeles
Inference of biological networks from high-throughput genomic data is a central problem in bioinformatics, and many different types of methods have been proposed and applied to a wide diversity of datasets. Several recent studies have collected both genetic variation and gene expression information from a set of genetically distinct strains of an organism; such data have several properties that are advantageous for inferring causal regulatory relationships between genes. A principled way of representing causal relationships is with graphical causal models, and a rich theory has been developed for inferring such models from observational data and interventions. However, this theory assumes full knowledge of the joint distribution, which is equivalent to having very large samples, and so it is only guaranteed to work asymptotically. In this proposal, the team will extend causal inference methods in several directions, motivated by applications to genomic datasets that include genetic variation and have relatively small samples. In particular, they will apply their new methods to detecting the presence and absence of causal relationships between yeast genes. While the focus of this proposal is on applying the developed techniques to a specific problem in bioinformatics, the causal inference issues it addresses are the general issues faced when applying causal inference to finite samples, so many of the approaches will be applicable to a wide range of problems. The resulting methods will be made available to the scientific community through publicly available software.
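Why genetically distinct strains help can be sketched with the conditional-independence reasoning behind graphical causal models. If a genotype g influences gene A, which in turn regulates gene B (g → A → B), then g and B are associated marginally but become independent once A is conditioned on. The Python sketch below checks this with partial correlations and Fisher's z-test, the kind of test used by constraint-based structure-learning algorithms such as PC. It assumes linear Gaussian relationships and simulated haploid strains (genotype coded 0/1); the data and function names are hypothetical, and this is an illustration rather than the finite-sample methods proposed here.

```python
import math

import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z (plus an intercept) out of both."""
    Z = np.column_stack([np.ones(len(z)), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])


def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for a (partial) correlation r via Fisher's z-transform."""
    z = math.atanh(r) * math.sqrt(n - n_cond - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical strains: a haploid locus g drives gene A, which drives gene B.
rng = np.random.default_rng(0)
n = 2000
g = rng.binomial(1, 0.5, size=n).astype(float)
a = g + rng.standard_normal(n)
b = a + rng.standard_normal(n)

r_gb = float(np.corrcoef(g, b)[0, 1])  # marginal association of g with B
r_gb_given_a = partial_corr(g, b, a)  # association after conditioning on A
p_marginal = fisher_z_pvalue(r_gb, n, 0)
p_conditional = fisher_z_pvalue(r_gb_given_a, n, 1)
```

A strong marginal association together with a near-zero partial correlation is consistent with A mediating the effect of g on B. With small samples, however, the conditional test can miss weak dependencies, which is the finite-sample issue the abstract highlights.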
The project involves the training of graduate and undergraduate students. The collaborative nature of the project will expose the students to the medical and genetics worlds and, at the same time, improve their abilities to design and implement solutions to complex algorithmic and statistical problems. The research will be converted into course materials for the interdisciplinary course Computational Genetics, which is taken by undergraduate and graduate students as well as students from the medical school.
|
1 |
2011 — 2017 |
Sahai, Amit (co-PI); Eskin, Eleazar; Ostrovsky, Rafail (co-PI) |
N/A |
III: Medium: Private Identification of Relatives and Private GWAS: First Steps in the New Field of Cryptogenomics @ University of California-Los Angeles
The field of human genetics has undergone a revolution in the past 10 years with the advent of high-throughput genomic technologies that can measure human variation at low cost. The flagship application of these technologies has been the genome-wide association study (GWAS), in which genetic variation information is collected from hundreds of thousands of individuals, some of whom have a specific disease and some of whom are healthy. Identifying correlations between genetic variants and disease status has led to the identification of hundreds of new genes involved in dozens of human diseases. All applications of these technologies, including GWAS, require individuals to "share" their genetic data. In today's typical GWAS, thousands of individuals must consent to have their genetic information collected and incorporated into a database that also contains information on their disease status. Unfortunately, an individual's genetic data is extremely sensitive, as it is considered medical information. In this proposal, the team addresses the natural tension between privacy and the application of personal genomics technologies by capitalizing on recent breakthroughs in cryptography. They present a novel technological approach that keeps one's genetic data private while still taking full advantage of genetic information. The approach builds on techniques recently developed in an area broadly referred to as secure computing, which addresses the problem of allowing a collection of individuals to compute an output that depends on all of their inputs without revealing their individual inputs to each other. The core of this proposal focuses on applying secure computing to two specific problems in personal genomics. The first is the identification of relatives from genetic variation information while preserving the privacy of genetic material. The second is the identification of disease-causing variants without sacrificing individual patients' genetic privacy.
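The secure computing idea can be illustrated with additive secret sharing, one of the simplest building blocks of secure multiparty computation: each party splits its private number into random shares that sum to it modulo a large prime, so any single share on its own reveals nothing about the input. The Python sketch below uses it to compute a pooled count, for example the total number of risk alleles across several hypothetical study sites, without any site revealing its own count. It assumes honest-but-curious parties, simulates all of them in one process, and is not the protocol developed in this project; the modulus and function names are illustrative choices.

```python
import secrets

# Arithmetic is done modulo a large prime (2**61 - 1); true totals must stay below it.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n_parties random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Sum private inputs; each party only ever publishes a sum of random-looking shares."""
    n = len(private_values)
    # Row i holds party i's shares; party i sends share j to party j.
    dealt = [share(v, n) for v in private_values]
    # Party j adds up the shares it received and publishes only that subtotal.
    subtotals = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(subtotals) % MODULUS


if __name__ == "__main__":
    # Hypothetical risk-allele counts held privately by three study sites.
    print(secure_sum([12, 7, 30]))
```

Sums are only the simplest case: association statistics and genotype comparisons for relative identification need richer secure-computation techniques, which is where the cryptographic research described above comes in.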
The development of the techniques presented in this proposal will have a profound impact on personal genomics, and on the field of genetics in general, for several reasons. First, easing privacy fears will remove a major barrier to participation in personal genomics, likely increasing the public's use of recent advances in genetic and genomic technologies. This increased use will accelerate the medical benefits of these technologies. Second, the current thinking is that it is impossible to protect privacy in personal genomics; the results of this project will surprise many in the field, leading to a rethinking of how to handle privacy in genetic studies. Finally, this research direction will likely lead to new problems and research directions for the cryptography research community and foster new collaborations between genetics researchers, cryptographers and mathematicians.
This project also contributes to training the next generation of interdisciplinary scientists. The investigators all teach advanced undergraduate courses in both genetics and cryptography and it is likely that the topics developed in this proposal will be included in the curriculum of the courses. In addition, the graduate students involved in this proposal will obtain interdisciplinary training in both genetics and computer science theory.
|
1 |
2013 — 2017 |
Eskin, Eleazar |
N/A |
BSF: 2012304: Methods For Preprocessing Population Sequence Data @ University of California-Los Angeles
This project is funded as part of the United States-Israel Collaboration in Computer Science (USICCS) program. Through this program, NSF and the United States - Israel Binational Science Foundation (BSF) jointly support collaborations among US-based researchers and Israel-based researchers.
In recent years, many genetic studies have been performed, revealing many new associations between human genetic variation and complex diseases. These studies, referred to as genome-wide association studies, are limited to common genetic variants because the technology used to collect the genetic variation could only capture common variants. There is evidence suggesting that rare variants have an important role in disease architectures. Recently, sequencing technologies have been introduced that are capable of collecting both common and rare genetic variation. Sequencing technologies generate enormous amounts of data, raising new computational challenges. In this project, the PIs will develop methods for addressing these computational challenges, including the design of efficient algorithms and the modeling of the sequencing process. In addition, the researchers will develop methods for incorporating rare variants into the analysis of genetic studies. The immediate broader impact of our project is the availability of these tools for general use by geneticists, leading to an improved understanding of disease genetics. In particular, the PIs will apply their methods to studies of non-Hodgkin's lymphoma, bipolar disorder, dyslipidemia, neurodegenerative dementia, and Tourette syndrome, which will directly improve our understanding of these conditions.
Computational methods for the analysis of sequencing data exist; however, they are limited to the analysis of a single sample. In this project the PIs will design efficient computational methods for the analysis of sequence data across a population. For population samples, the tremendous size of the data requires algorithms that are highly efficient in both memory and runtime. Specifically, the PIs propose to design algorithms for the compression of sequencing data, for the search of regions identical by descent across multiple samples, and for high-resolution haplotype inference from sequence data. The PIs will explicitly model rare variants and the sequencing process, and use machine learning techniques and convex optimization to estimate the model parameters efficiently. These methods will allow for a fine-scale analysis of population data, resulting in improved understanding of complex diseases and human history. The collaborative nature of the project will expose the students involved to the medical and genetics worlds, both in Israel and in the US, and will improve their abilities to design and implement solutions to complex algorithmic problems. The methods developed in this project will be part of the teaching material of courses at UCLA and Tel-Aviv, and these materials will be made publicly available.
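A region identical by descent typically shows up in phased data as a long run of consecutive SNPs at which two haplotypes carry the same alleles; long runs of identity by state are the usual evidence for IBD. The Python sketch below finds such runs for a single pair of haplotypes with a linear scan. It is a naive illustration under stated assumptions: length is measured in SNPs rather than genetic distance, genotyping errors are not tolerated, and the function name and string encoding are hypothetical. Applying it to every pair of samples costs time quadratic in the number of samples, one reason population-scale data call for more efficient algorithms.

```python
def shared_segments(hap_a, hap_b, min_length):
    """Return (start, end) SNP index ranges, end exclusive, of maximal runs where two
    phased haplotypes carry identical alleles for at least `min_length` consecutive SNPs."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same SNPs")
    segments = []
    start = None  # start of the current run of matching alleles, if any
    for i, (x, y) in enumerate(zip(hap_a, hap_b)):
        if x == y:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(hap_a) - start >= min_length:
        segments.append((start, len(hap_a)))
    return segments


if __name__ == "__main__":
    # Hypothetical haplotypes encoded as strings of 0/1 alleles.
    print(shared_segments("0110101110", "0110001110", min_length=3))
```

In the usage example the single mismatch at SNP 4 splits the shared sequence into two runs; raising `min_length` keeps only the longer one, which mirrors how length thresholds separate likely IBD segments from chance matches.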
|
1 |
2013 — 2015 |
Eskin, Eleazar |
R01 (Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.) |
Correcting For Population Structure in Gene-by-Environment Interaction Studies @ University of California Los Angeles
DESCRIPTION (provided by applicant): Over the past few years, genome-wide association studies (GWASs) have identified numerous genes associated with common human diseases. In these studies, genetic variation in thousands of individuals is collected and correlated with the disease status of these individuals. A challenging aspect of GWAS is that the collected individuals are related to each other to differing degrees. This can lead to spurious associations: genes that appear to be associated with the disease but whose association is in fact an artifact of the relatedness between individuals. Several methods have been proposed to address this problem and are implemented in publicly available software packages. Environmental factors often interact with genetic variation to increase risk of disease. Identifying these interactions, referred to as gene-by-environment (GxE) interactions, is now a major focus of research in both human and model organism studies. Discovering GxE interactions can provide insight into disease pathways, an understanding of the effect of environmental factors in disease, better risk prediction, and personalized therapies. Model organisms such as the mouse are ideal systems for studying GxE interactions because environmental exposures can be carefully controlled. Unfortunately, for the same reasons that relatedness can cause spurious associations in association studies, relatedness can cause spurious gene-by-environment interactions. In this proposal we propose to develop methodology that corrects for relatedness in studies that search for gene-by-environment interactions. The result of our project will be a set of methods that can detect gene-by-environment interactions consistently even when the individuals in the study are related. These methods can then be widely used by researchers conducting studies to discover gene-by-environment interactions.
We will apply our methods to the Minnesota Center for Twin and Family Research (MCTFR) data to investigate how gene-environment interplay influences the development of substance abuse (SA), and to mouse genetic studies investigating the genetic factors that influence response to a high-fat diet and susceptibility to heart failure. We will make our methods available to the research community through publicly available software packages and webserver resources.
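Relatedness is commonly handled with a linear mixed model in which a kinship matrix K describes the covariance of phenotypes among relatives. The Python sketch below tests the GxE term in y = b0 + b_g g + b_e e + b_ge (g × e) + u + ε, with u ~ N(0, σ_g² K) and ε ~ N(0, σ_e² I), by generalized least squares. It assumes the kinship matrix and both variance components are known (in practice they are estimated, for example by REML), simulates hypothetical families, and illustrates the general mixed-model correction rather than the specific methods this proposal develops.

```python
import numpy as np


def gxe_wald_test(y, g, e, kinship, sigma_g2, sigma_e2):
    """GLS estimate and Wald z-score of the G x E coefficient in a linear mixed model
    with known variance components: Cov(y) = sigma_g2 * kinship + sigma_e2 * I."""
    n = len(y)
    X = np.column_stack([np.ones(n), g, e, g * e])
    V = sigma_g2 * kinship + sigma_e2 * np.eye(n)
    # Whiten by the Cholesky factor L (V = L L^T) so that ordinary least squares becomes GLS.
    L = np.linalg.cholesky(V)
    W = np.linalg.solve(L, np.column_stack([X, y]))
    Xw, yw = W[:, :-1], W[:, -1]
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    cov_beta = np.linalg.inv(Xw.T @ Xw)
    return beta[3], beta[3] / np.sqrt(cov_beta[3, 3])


# Hypothetical data: 300 families of 4 relatives (kinship 0.5 within a family).
rng = np.random.default_rng(1)
n_fam, fam_size = 300, 4
block = 0.5 * np.ones((fam_size, fam_size)) + 0.5 * np.eye(fam_size)
K = np.kron(np.eye(n_fam), block)
n = n_fam * fam_size
g = rng.binomial(2, 0.3, size=n).astype(float)  # genotype dosage
e = rng.binomial(1, 0.5, size=n).astype(float)  # binary exposure, e.g. diet
u = np.linalg.cholesky(K) @ rng.standard_normal(n)  # polygenic effect shared by relatives
y = 1.0 + 0.2 * g + 0.3 * e + 1.0 * g * e + u + rng.standard_normal(n)
beta_ge, z_ge = gxe_wald_test(y, g, e, K, sigma_g2=1.0, sigma_e2=1.0)
```

Whitening turns the correlated-error problem into an ordinary least-squares one, so the standard error of the interaction reflects the family structure. In this toy simulation genotypes are drawn independently of family membership, so it only checks that the planted effect is recovered; spurious GxE signals arise when genotypes, exposures and the polygenic background are shared among relatives.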
|
1 |
2013 — 2017 |
Eskin, Eleazar |
N/A |
III: Small: Causal and Statistical Inference in the Presence of Confounding Factors @ University of California-Los Angeles
Technical: The presence of unmeasured confounding factors can result in incorrect statistical and causal inferences if the confounding factors are correlated with the observed data. This phenomenon has been well documented in at least two important applications. One application is identifying genetic variation involved in disease from populations of related individuals. A second application is identifying genes active in a disease when comparing diseased and healthy samples. In this proposal we propose a new approach to correcting for unobserved confounders that takes advantage of insights into how confounders affect high-dimensional data. These insights motivate a formal definition of a specific type of confounder, which we term a 'low-rank confounder.' Formalizing this definition allows us to motivate methods for correcting for the effects of these types of confounders even when the confounders are not observed. Our proposal will develop a theory of how confounders affect data and under what conditions unobserved confounders can be corrected. The proposed theory is related to recent developments in understanding sparsity, which has been well studied in electrical engineering, computer science and statistics. The proposed methods will lead to improved results in applications where such confounders are present.
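The intuition behind a low-rank confounder can be shown with a small simulation: a hidden factor that shifts many features at once adds an approximately rank-k term to the samples-by-features matrix, and the leading singular vectors can capture it even though the factor itself is never observed. The Python sketch below removes the top k singular components, a standard PCA-style baseline rather than the theory or methods proposed here. It assumes the confounder dominates the leading components and will also remove any true signal that happens to be low-rank; the data and function names are hypothetical.

```python
import numpy as np


def remove_low_rank(Y, k):
    """Center each feature, then subtract the top-k singular components of the
    samples x features matrix Y (a PCA-style correction for a rank-k confounder)."""
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    return Yc - (U[:, :k] * s[:k]) @ Vt[:k]


def mean_r2_with(z, M):
    """Average squared correlation between vector z and each column of M."""
    zs = (z - z.mean()) / z.std()
    Ms = (M - M.mean(axis=0)) / M.std(axis=0)
    return float(np.mean((zs @ Ms / len(z)) ** 2))


# Hypothetical expression data: 200 samples x 1000 genes driven by one hidden factor.
rng = np.random.default_rng(0)
z = rng.standard_normal(200)  # hidden confounder; never passed to remove_low_rank
w = rng.standard_normal(1000)  # its effect on each gene
Y = 3.0 * np.outer(z, w) + rng.standard_normal((200, 1000))

r2_before = mean_r2_with(z, Y)
r2_after = mean_r2_with(z, remove_low_rank(Y, k=1))
```

Here `r2_before` is large and `r2_after` is close to zero, even though the correction never sees `z`. Choosing k, and telling a confounder apart from genuine low-rank biology, are exactly the questions a formal theory of such confounders has to answer.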
Non-technical: Inference of knowledge from high-dimensional data is a fundamental problem affecting virtually all areas of science, including physics, astronomy, chemistry, computer science, social science and many areas of biology. Many of these problems are driven by newly available large sources of data and by advances in measurement or data collection technologies. A major challenge is the presence of unknown (and unmeasured) confounding factors. Confounding factors are variables that are often not observed in the data, but are correlated with various features of the data. Unfortunately, confounding factors can cause incorrect inferences. This phenomenon has been well documented in at least two important applications: one application is identifying genetic variation involved in disease from populations of related individuals, and a second application is identifying genes active in a disease when comparing diseased and healthy samples. There are traditional approaches to perform inference if the confounders are observed in the data. However, dealing with unobserved confounders is more difficult. This project will develop and study a new approach to correct for unobserved confounders, taking advantage of insights into how confounders affect high-dimensional data. The project has broad impact due to its utility in a wide range of scientific questions, through the interdisciplinary research opportunities provided to undergraduate and graduate students, and through the distribution of software and data.
|
1 |
2013 — 2017 |
Pearl, Judea (co-PI); Eskin, Eleazar |
N/A |
III: Medium: Meta-Analysis Reinterpreted Using Causal Graphs @ University of California-Los Angeles
Statistical conclusions from research studies may often be misleading for a variety of reasons, including small sample sizes or confounding factors that are unknown to the investigators. One way to reduce the possibility of misleading conclusions is to combine the results of multiple research studies using a technique referred to as "meta-analysis." Meta-analysis is one of the most widely used techniques to infer knowledge from data in science. The idea behind meta-analysis is that the combined statistical conclusions from multiple research studies reflect the information in all of the studies and are more likely to be accurate. The conclusions from meta-analyses are considered "better" or "more likely to generalize" than conclusions from single studies. However, this notion is not well formalized, and formalizing it is a goal of this project. In addition, existing meta-analysis methods do not take into account any knowledge of the similarities and differences between the studies. Taking advantage of these similarities and differences can improve the effectiveness of meta-analysis.
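For reference, the conventional approach is the fixed-effects, inverse-variance-weighted meta-analysis: each study contributes its effect estimate weighted by the inverse of its squared standard error, and Cochran's Q measures how much the studies disagree. The Python sketch below implements these standard formulas. It deliberately uses no knowledge of how the studies differ, which is the limitation described above; the function name and example numbers are illustrative.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Inverse-variance weighted fixed-effects meta-analysis.

    Returns (pooled effect, its standard error, two-sided p-value, Cochran's Q).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    # Cochran's Q: weighted squared deviations of study effects from the pooled effect.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    return pooled, pooled_se, p_value, q


if __name__ == "__main__":
    # Hypothetical per-study effect sizes (e.g. log odds ratios) for one variant.
    print(fixed_effect_meta([0.2, 0.3, 0.25], [0.1, 0.12, 0.08]))
```

Because every study is pooled with the same model, a large Q can signal heterogeneity but cannot say which differences between studies caused it; explicitly representing those differences is what a selection-graph approach aims to add.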
This project takes advantage of recent developments in the area of "causal inference," which is the study of inferring cause-and-effect relationships from data. These inferences use a type of graph called a causal graph, which graphically represents cause-and-effect relationships. This project develops an alternative framework for meta-analysis based on a novel type of causal graph, a selection graph. A selection graph formally represents the similarities and differences between the studies. This project provides a unifying framework and powerful methodology for meta-analysis. The methods developed in this project are applied to genetic studies, where meta-analyses have discovered thousands of variants involved in common human disease in the past few years.
Causal graphs have had a major impact on the way causality is taught and understood in cognitive science, statistics, and the health and social sciences. The proposed research promises to have similar impacts by transforming the approach to meta-analysis, one of the workhorses of statistical inference in the physical, life and social sciences. The resulting techniques will be used to perform meta-analyses of genetic studies, which can lead to the discovery of variation involved in disease. The results of the project, including publications, software, data sets, and course materials, will be made freely available through the project web site: http://zarlab.cs.ucla.edu/causal-meta-analysis/.
|
1 |
2014 — 2017 |
Reinman, Glenn (co-PI); Eskin, Eleazar; Cong, Jason; Bui, Alex; Chang, Mau-Chung (co-PI) |
N/A |
Accelerator-Rich Architectures With Applications to Healthcare @ University of California-Los Angeles
Many healthcare applications present significant computational challenges. For example, the computational demand of personalized cancer treatment is prohibitively high for general-purpose computing technologies: tumor heterogeneity requires great sequencing depths, structural aberrations are difficult to detect with today's methods, and the tumor can evolve, i.e., the same tumor might be assayed a great many times during the course of treatment. The goal of this project is to apply the domain-specific customized computing techniques developed by the Center for Domain-Specific Computing (CDSC) at UCLA to greatly accelerate computation for some key healthcare applications.
The CDSC, established in 2009 with the support of the NSF, looks beyond parallelization and focuses on domain-specific customization as the next disruptive technology for power-performance efficiency improvement. In the past four years, CDSC has demonstrated significant improvements in performance and energy efficiency through innovations in customizable heterogeneous computing technologies. The current proposal under the NSF Innovation Transition program leverages the research results from CDSC and focuses on key research problems and solutions to make domain-specific customizable computing feasible and practical for innovation transition to industry. Specifically, the project will develop accelerator-rich architectures, along with unified adaptive runtime systems, for personalized cancer treatment and medical image processing, and will enable deployment on several energy-efficient programmable platforms capable of handling huge volumes of state-of-the-art real-time patient data.
The center will continue its already successful outreach program, through a partnership with the UCLA Center for Excellence in Engineering and Diversity, to involve a highly diverse group of high school and undergraduate students in summer research. The success of our project will enable significant advances in medical imaging analysis and personalized cancer treatment, which will greatly improve healthcare quality while reducing cost. The participation of the industrial partner in this InTrans project will greatly facilitate the innovation transition of research results from this project to industry for energy-efficient computing.
|
1 |
2015 — 2019 |
Eskin, Eleazar |
R25 (Activity Code Description: For support to develop and/or implement a program as it relates to a category in one or more of the areas of education, information, training, technical assistance, coordination, or evaluation.) |
Mathematical and Computational Approaches in High-Throughput Genomics Training @ University of California Los Angeles
DESCRIPTION (provided by applicant): Recent advances in high-throughput genomic technology have profoundly transformed biomedical research and provided tremendous opportunities for mathematical and computational research to contribute to it. The next generation of biomedical scientists will have significant strength in both the biological sciences and the mathematical and computational sciences. However, training such interdisciplinary researchers is very difficult, because most graduate programs and other training programs focus heavily on one of these disciplines. We propose to partner with the UCLA Institute for Pure and Applied Mathematics to offer an educational program targeted at fostering interdisciplinary research. This training program will consist of a one-week didactic portion covering current research at the intersection of biological and computational research. The remainder of the program will be a three-week research practicum designed to provide training in research and to give participants access to leading researchers as mentors.
|
1 |
2015 — 2018 |
Eskin, Eleazar |
P30 (Activity Code Description: To support shared resources and facilities for categorical research by a number of investigators from different disciplines who provide a multidisciplinary approach to a joint research effort or from the same discipline who focus on a common research problem. The core grant is integrated with the center's component projects or program projects, though funded independently from them. This support, by providing more accessible resources, is expected to assure a greater productivity than from the separate projects and program projects.) |
Informatics Center For Neurogenetics and Neurogenomics - Computing Core @ University of California Los Angeles
PROJECT SUMMARY: This application is to renew an NINDS P30 Institutional Center Core, the NINDS Informatics Center for Neurogenetics and Neurogenomics (ICNN). The ICNN, founded in 2009, facilitates research in genetics and genomics for members of the large and highly interactive neuroscience community at UCLA. Over the past funding cycle, ICNN has 1) accelerated the progress of ten projects from eight NINDS-funded investigators specified in the initial application; 2) through an ARRA supplement and a sales and service agreement, extended its scope and support to additional neuroscientists at UCLA, so that it is now involved in one third of all NINDS R01s awarded to UCLA neuroscientists, essentially all of those that encompass genomic studies; 3) contributed to 29 publications (with 8 more submitted) and multiple successful grant applications, including mentoring on training awards; and 4) expanded its faculty and analysis portfolio to emphasize collaborations between leaders in computational biology and neurogenetics and the implementation of innovative methods for data analysis. In summary, in its initial award period ICNN laid the foundation for a long-term, self-sustaining entity supporting genomic neuroscience research at UCLA and facilitated a number of projects and training grants. With an expanded faculty and the solid computational infrastructure put in place in the previous funding cycle, ICNN proposes in this renewal to continue supporting NINDS grants (15 from 9 named users), developing innovative genetics and genomics approaches, and fostering neurogenetics and neurogenomics for the UCLA neuroscience community.
|
1 |
2015 |
Eskin, Eleazar |
R01 (Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.) |
Accounting For Effect Size Differences Between Sexes in Genome-Wide Association Studies Using Mixed Models and Meta-Analysis @ University of California Los Angeles
Over the past few years, genome-wide association studies (GWASs) have identified numerous genes associated with common human diseases. In these studies, genetic variation is measured in thousands of individuals and correlated with their disease status. Environmental factors often interact with genetic variation to increase disease risk. Identifying these interactions, referred to as gene-by-environment (GxE) interactions, is now a major focus of research in both human and model organism studies. Discovering GxE interactions can provide insight into disease pathways, an understanding of the effect of environmental factors on disease, better risk prediction, and personalized therapies. Model organisms such as the mouse are ideal systems for studying GxE interactions because environmental exposures can be carefully controlled. We propose to develop methods that determine whether gene-by-environment interactions are present and quantify the total amount of these interactions. The result of our project will be a set of methods that can be widely used by researchers seeking to discover gene-by-environment interactions. We will apply our methods to data from the Minnesota Center for Twin and Family Research (MCTFR) to investigate how gene-environment interplay influences the development of substance abuse (SA), and to mouse genetic studies investigating the genetic factors that influence response to a high-fat diet and susceptibility to heart failure. We will make our methods available to the research community through publicly available software packages and web server resources.
|
1 |
2016 — 2020 |
Eskin, Eleazar; Freimer, Nelson B. |
R25 |
Undergraduate Research Experience in Neuropsychiatric Genomics @ University of California Los Angeles
DESCRIPTION (provided by applicant): This proposal is to develop a UCLA summer undergraduate research experience (R25) in Next Generation Sequencing (NGS) analysis. NGS has revolutionized the field of genomics and is increasingly permeating all aspects of biosciences research and medicine. This development has created a huge demand for data analysis skills among researchers and professionals. As biosciences graduate programs and biomedical professional schools rapidly adapt their curricula toward a greater emphasis on quantitative analysis skills, it has become apparent that the pool of applicants with an understanding of, and hands-on experience in, the quantitative analysis of 'omic datasets remains exceedingly shallow. Indeed, without prior research experience in computational biosciences, it is difficult to evaluate applicants' aptitude for quantitative analysis. In this post-genome era of biosciences research, a lack of quantitative analysis skills will hamper graduate students and biomedical researchers throughout their careers. The proposed research education plan will address the need for an applicant pool with the hands-on experience in genomics analysis necessary to succeed in either graduate or professional school. Specifically, this new R25 program will leverage existing infrastructure that provides such experience through workshops in NGS analysis: the Bruins in Genomics (BIG-SUMMER) program of the UCLA Quantitative and Computational Biology Institute (QCB). The new R25 program will enable UCLA to provide undergraduates, recruited nationally, with a specialized summer experience focused on neuropsychiatric genomics (BIG-NPG). BIG-NPG will combine the NGS workshops of the QCB with individual mentored research projects and journal clubs overseen by faculty in the UCLA Center for Neurobehavioral Genetics (CNG).
These faculty are investigating the genetic and genomic basis of neuropsychiatric disorders through several collaborative NIMH grants. These projects will offer undergraduate students the opportunity to conduct analyses on exceptional datasets covering a wide range of NGS applications (including whole-genome sequencing, transcriptome sequencing, and epigenome sequencing). Participating in the program will give students a potentially career-defining research experience in the genomics of neuropsychiatric disease.
|
1 |
2017 — 2021 |
Eskin, Eleazar; Halperin, Eran; Sul, Jae Hoon |
N/A |
III: Medium: Detecting Low Dimensional Structures in Genomic Data @ University of California-Los Angeles
New sequencing technologies have made genomics a big data science. These data are complex and span many variables. Extracting biological information from genomic sequence therefore often requires reducing this complexity. A number of computational approaches exist for doing so, but they often introduce errors because their assumptions about the data do not hold. This project will develop novel approaches tailored to the type of genomic data collected. One of these data types represents the DNA sequence itself (genotypes), and the other captures chemical modifications of the DNA (methylation) that influence how genes are expressed. The new methods will identify important differences in these two data types more accurately by correctly modeling their unique properties in a statistical framework. Methods developed during this project could have a broad impact on genomics by helping researchers discover the genetic basis of complex diseases. The broader impacts of this project include gaining deeper insight into the genetic basis of complex diseases, distributing the novel methods through public web servers and software tools for academic research and educational purposes, and training undergraduate students, graduate students, and postdoctoral scholars. In particular, this project will provide training through an intensive summer program that recruits minorities traditionally underrepresented in STEM fields.
Discovering low-dimensional structure in high-dimensional genomic data is an important step in genomic studies because this structure can reveal unknown confounding factors in the data, as well as other important properties such as the ethnicity of individuals. Although several dimensionality reduction methods are widely used in genomics, they may not produce an accurate low-dimensional structure from genomic data because the assumptions of their underlying statistical models are often violated in these data. This project proposes to develop dimensionality reduction methods tailored to genomic data, especially methylation and genotype data. These methods will incorporate unique properties of genomic data, such as the discrete nature and correlation structure of genotype data and the differing methylation patterns across cell types and tissues. This project will also analyze the asymptotic behavior of the novel methods using random matrix theory. Three strategies will be used to validate the methods. First, for all of the genomics applications, datasets with gold-standard information are available. Second, simulated data based on current practices in the genomics community will be used to evaluate the genomics applications. For example, it is standard in the community to simulate the genetics of admixed individuals by combining the genotypes of individuals of known ancestry from a reference dataset such as the 1000 Genomes Project. Third, the team will evaluate the general algorithms on data simulated from various generative models, both to confirm that the algorithms show the expected asymptotic behavior and to examine how they perform when their assumptions are violated. The methods will contribute both to statistics, by improving current dimensionality reduction methods, and to genomics, by releasing software tools.
|
1 |
2017 — 2021 |
Eskin, Eleazar; Jentsch, J. David; Smith, Desmond James |
U01 (Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.) |
Genetic Pathways For Impulsivity and Drug Reinforcement: DNA and Transcriptome Variation in Mice @ University of California Los Angeles
PROJECT DESCRIPTION: This project will chart, at high resolution, the genetic pathways that give rise to defined clinical phenotypes related to cocaine addiction. We will use the power of the hybrid mouse diversity panel (HMDP), combined with high-quality behavioral phenotyping and massive-scale RNA sequencing, to trace the interconnections from DNA to RNA to clinical trait and provide layered information on the mechanisms of cocaine abuse. The HMDP consists of 100 inbred and recombinant inbred strains and has a wide array of meiotic breakpoints. The panel has been densely genotyped with more than 200,000 single nucleotide polymorphisms (SNPs), allowing very fine mapping of quantitative trait loci (QTLs). Further, the HMDP is genetically stable and renewable and can be assayed for multiple phenotypes, yielding cumulative biological information. We will expand our previous HMDP-based studies of cocaine abuse-related traits by adding massive-scale RNA sequencing (RNA-Seq) in order to detail the genetic control of the pathways to addiction. Our approach offers the distinct advantage of mapping loci for known pathways as well as for those involving variant and exotic RNA species. The following aims are proposed: (1) We will quantify intravenous self-administration of cocaine in HMDP mice. This phenotype is regarded as one of the most faithful animal models of cocaine abuse and will allow evaluation of QTLs that regulate addiction-related phenotypes. (2) We will perform RNA-Seq on two brain regions that play a key role in the control of drug abuse: the medial prefrontal cortex (mPFC) and the shell region of the nucleus accumbens (NAc). (3) We will analyze the combined datasets with powerful statistical tools to understand the genetic regulation of drug abuse-related phenotypes. These studies will map genetic networks and inter-tissue regulatory pathways for cocaine addiction and suggest new, highly specific therapeutic strategies.
|
1 |
2019 — 2022 |
Eskin, Eleazar; Halperin, Eran; Flint, Jonathan |
N/A |
III: Small: Replication Studies For High Dimensional Data: Insights Into Confounding and Heterogeneity @ University of California-Los Angeles
In order for the scientific community to reach consensus on a scientific finding, the finding must be replicated in multiple studies by different groups. Unfortunately, not all scientific studies successfully replicate. When a scientific finding is reported, the strength of its statistical evidence is typically quantified by a p-value. In principle, p-values should indicate how often a study should replicate. Over the past decade, researchers have shown that scientific studies replicate at a much lower rate than the reported p-values predict. This has led to a vigorous discussion of the causes of replication failures, as well as to the development of guidelines for study design to improve the replication rate. In this project, the research team will show that when studies collect large amounts of data, these data can be used to identify differences between the studies and to gain insight into why studies do or do not replicate. This information can be used to improve the individual studies and increase the replication rate of the resulting findings. As replication is a fundamental tool in scientific discovery, new approaches to analyzing replication studies will have an impact in many areas of science. The team has a long-standing interest in involving undergraduate students in its research as well as in working to broaden the diversity of participants.
In this project, the replicability of high-dimensional studies is considered. In a high-dimensional study, rather than a single p-value, typically thousands or even millions of p-values are reported. Genomic studies are a motivating example: genomic data are inherently high dimensional, so a p-value is computed for each genomic feature, such as a gene expression level or a genetic variant. Typically, only a small subset of these p-values is considered significant (after accounting for multiple testing). When a replication study is performed, the features of interest are those that were significant in the original study. The key idea behind this project is that information is available on all reported features, even those that are not significant. By analyzing them, insights can be obtained about the studies, and these insights can improve both the replication rate and the analysis of each individual study. The framework can be leveraged to address the following problems: (1) reducing the effect of confounders in each replicate, improving power and reducing false positives; (2) accounting for ascertainment biases in the reported results; and (3) interpreting the differences between replicates or studies to gain insight into the underlying causes of those differences. The approach will be evaluated using five genomic datasets.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |
2020 — 2021 |
Eskin, Eleazar; Ophoff, Roel A |
R25 |
Undergraduate Research Experiences in Neurogenetics and Neurogenomics @ University of California Los Angeles
The proposed program will provide a summer research experience in next generation sequencing (NGS) analysis for neurogenetics and neurogenomics to 10 undergraduate students. NGS has transformed the field of genomics and is increasingly permeating all aspects of biosciences research and medicine. Using NGS to understand the brain and neurological disease is one of the major frontiers in biomedical science and will drive advances in the field in the near future and for decades to come. There will be a great need for neuroscience researchers with strong data analysis skills relevant to genomics. This need is evident in the increasing emphasis on quantitative analysis skills in biosciences graduate programs as well as in biomedical professional schools. To address this need, we will establish a program called "Bruins in Genomics in Neurogenetics and Neurogenomics (BIGNGG)". It will leverage the infrastructure of an existing summer program, "Bruins in Genomics (BIGSummer)", supported by the Institute for Quantitative and Computational Biosciences (QCB) at UCLA and the QCB Collaboratory. BIG Summer combines practical workshop tutorials focused on NGS data analysis skills with a hands-on research experience in genomics. The proposed R25 program will allow 10 additional students to participate in BIG and, specifically, to learn about the growing, cutting-edge research field of neurogenetics and neurogenomics. The students will be quantitative or biosciences majors entering their junior or senior year. We expect that for many participants this research experience will serve as a launching point for continuing in biomedical research, in either a quantitative biosciences Ph.D. program (such as a Bioinformatics Ph.D. program) or a Neuroscience Ph.D. program.
|
1 |
2020 |
Eskin, Eleazar; Halperin, Eran |
R25 |
Computational Genomics Summer Institute and Mentoring Network @ University of California Los Angeles
Abstract: The goal of the program is to provide training in computational methods development in genomics and related areas to researchers at the graduate student or postdoctoral level, and to expose participants to faculty from multiple institutions in order to create opportunities for career transitions (graduate student to postdoc and postdoc to faculty). While many short courses provide training in how to use computational methods to analyze data, very few provide training in how to develop such methods; our program will fill this need. With more and more types of data becoming available in large quantities, the ability to develop new methods and to adapt existing ones in innovative ways is a critical skill for a successful researcher. Our program is aimed at trainees whose career focus is computational methods development, but it will also be useful for researchers who want to add a methods development component to their research program. The program will consist of a one-month summer program at the University of California, Los Angeles, integrated with a year-round mentoring program that builds on relationships established during the one-month program. An outreach program is also integrated into the program.
|
1 |
2021 — 2025 |
Pearl, Judea (co-PI); Eskin, Eleazar; Sankararaman, Sriram (co-PI) |
N/A |
III: Medium: Causal Inference in Biobanks: Leveraging Genetics to Infer Causal Relationships Using Electronic Health Records @ University of California-Los Angeles
The past several years have witnessed major efforts to collect genetic data from patients in large health systems to enable research aimed at improving patient health. These data can potentially identify risk factors that cause disease and improve treatment. However, the observational nature of these datasets makes such inferences challenging, in part because it is difficult to distinguish correlation from causation, which can obscure true relationships. This project will use and extend recently developed techniques in causal inference to identify causal relationships within the medical data and overcome this difficulty. Advancing this research is critical for improving the outlook for individuals who suffer from today’s most prevalent common, complex disorders, and it will also provide general insights into the analysis of observational data. The project leverages efforts at UCLA to broaden participation in computing and will incorporate graduate and undergraduate students from diverse backgrounds.
We propose to leverage modern techniques for causal inference, coupled with the unique characteristics of genetic data collected in biobanks, to solve three key problems in biomedicine and epidemiology: identifying risk factors for disease, predicting likely responders to a potential treatment, and identifying latent disease subtypes. The advance in causal inference most directly relevant to these problems is the development of the theory of causal graphs as a unifying framework for representing and reasoning about causal effects. We will use these graphs to test and estimate causal relationships between relevant exposures measured in the biobank and diseases (for example, LDL cholesterol and heart attack). Crucially, we will leverage the available genetic data as causal anchors (instrumental variables) that enable the estimation of causal effects even in the presence of confounders, extending the technique of Mendelian randomization that is widely used in epidemiology.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |