1990 — 1992 |
Roeder, Kathryn |
N/A |
Mathematical Sciences: Mixture Analysis
The research in mathematical statistics is divided into three parts: testing for overdispersion in discrete distributions, testing that a sample comes from a normal distribution rather than a finite mixture of normal distributions, and semi-parametric estimation of a mixed normal density. A new test will be constructed for the presence of a mixture. Also, a new test and diagnostic for the number of components in a mixture will be derived. This will be achieved by exploiting the geometry of ratios of mixed and unmixed densities. A further goal is to estimate the mixed normal density using an approach in which a smoothing parameter is selected via cross-validation.
|
0.97 |
1992 — 1994 |
Pollard, David Hartigan, John [⬀] Chang, Joseph Roeder, Kathryn |
N/A |
Mathematical Sciences Computing Research Environments
The Department of Statistics at Yale University will purchase workstations and peripherals that will be dedicated to support research in the mathematical sciences. The research to be supported lies in: 1. Product partition models for evaluating and selecting classifications; 2. Theory of empirical processes, with applications to econometrics and semiparametric models; 3. Mixture models, the development of graphical diagnostics and formal tests for identifying the presence of mixing in generalized linear models, and for determining the numbers of components of finite mixtures; 4. Probabilistic models for molecular evolution, including alternatives to the traditional branching tree.
|
0.97 |
1992 — 1998 |
Roeder, Kathryn |
N/A |
Mathematical Sciences: NSF Young Investigator @ Carnegie-Mellon University |
1 |
1992 — 1996 |
Roeder, Kathryn |
N/A |
Mathematical Sciences: Semiparametric Mixture Models
This research on mixture models will develop graphical diagnostics and formal tests to identify the presence of mixing in generalized linear models and to determine the number of mixing components for finite mixtures. Methods will also be developed to incorporate full information for case-control studies with errors in variables. The theory will draw on ideas and techniques from nonparametric maximum likelihood, empirical processes, asymptotics, convex geometry, total positivity and from simulation. Mixture distributions arise when homogeneous populations are combined in such a way that the origins of individuals are lost. A fundamental problem involves determining whether or not this combining has occurred and if so, how many different originating populations there were. This research will investigate graphical techniques and will use mathematical theory to develop practical statistical methods to answer these kinds of questions in many scientific applications.
|
0.97 |
1994 — 1996 |
Pollard, David Hartigan, John [⬀] Barron, Andrew (co-PI) [⬀] Chang, Joseph Roeder, Kathryn |
N/A |
Mathematical Sciences Computing Research Environment
9406617 HARTIGAN The Department of Statistics at Yale University will purchase a SPARCstation 10 (Model 512MP) server, two X-terminals, a laser printer, and peripherals. This equipment will be dedicated to supporting research in the mathematical sciences, including (i) neural network computations; (ii) probabilistic models for constructing and evaluating classifications, and the role of classification in data collection and justification of probability distributions; (iii) exact finite sample confidence bands for nonparametric functionals; (iv) symbolic computation to guide asymptotic theory; (v) information in semiparametric mixtures and hierarchical mixture models.
|
0.97 |
1998 — 2001 |
Kass, Robert [⬀] Roeder, Kathryn Wasserman, Larry (co-PI) [⬀] |
N/A |
Bayesian Inference and Mixture Models @ Carnegie-Mellon University
9803433 Robert E. Kass
This research is focused on reference Bayesian methods (Bayesian inference with prior distributions chosen by some formal rule), mixture models, Bayes factors, and causal inference, with an emphasis on hierarchical models (including classical mixed models and their generalizations). Both parametric and nonparametric or semiparametric models are studied. Many of the results are obtained by asymptotic methods, but "exact" computation (typically via simulation) also plays a substantial role.
Elaboration of simple statistical models has been a major theme in the discipline in the latter part of this century. Previously, models have involved a small number of parameters, the values of which have been determined from observed data. With increased computing power, more complicated statistical models involving many more parameters have become central to much current statistical activity. Yet, despite recent progress, fundamental issues remain. This research is motivated in part by problems in statistical genetics, cognitive neuroscience, and the study of criminal behavior.
|
1 |
1999 — 2004 |
Greenberg, James Eddy, William [⬀] Eddy, William (co-PI) [⬀] Kass, Robert (co-PI) [⬀] Lehoczky, John (co-PI) [⬀] Williams, William (co-PI) [⬀] Roeder, Kathryn Shreve, Steven (co-PI) [⬀] Junker, Brian (co-PI) [⬀] |
N/A |
VIGRE: Vertical and Horizontal Integration of Research and Education in Statistics and Mathematical Sciences At Carnegie Mellon @ Carnegie-Mellon University
9819900 Eddy
At Carnegie Mellon University, the Department of Statistics and the Department of Mathematical Sciences will build on their complementary strengths to develop a joint, vertically-integrated program of education and research. Responding to national needs, the program will (i) train postdoctoral fellows for careers emphasizing research in settings that require versatility, (ii) aim to recruit and retain U.S. graduate students, avoiding excessive time to complete Ph.D.s while providing students with a high probability of success after graduation, and (iii) help increase the numbers of U.S. undergraduates, including women and minorities, who pursue advanced degrees in mathematical and statistical sciences. The program emphasizes cross-disciplinary research and understanding the needs of learners in a context of disciplinary advancement. Many of the activities grow from two previously-funded Group Infrastructure Grants to our respective departments, and from a very successful Undergraduate Summer Research Institute in Applied Mathematics. For instance, we plan to use the graduate support model from the infrastructure grant to Mathematical Sciences, we will expand the operation of the Summer Institute to include students from Statistics, and we will adapt for Mathematical Sciences some of the postdoctoral mentoring procedures that have worked well in Statistics.
Our evaluation of this training program will assess the following: involvement of undergraduates in meaningful research experiences; its success in producing acceptable average time-to-degree for VIGRE graduate trainees; its effectiveness in expanding the mathematical horizons and career opportunities of students at both the undergraduate and the graduate levels, with particular focus on the graduate program; its effectiveness at the postdoctoral level in preparing VIGRE postdoctoral fellows for careers as professional mathematical scientists; its effectiveness in developing the communications skills of VIGRE participants; the effectiveness of the mentoring of undergraduate students, graduate trainees, and postdoctoral fellows; overall effectiveness of the research teams and other efforts to integrate research and education; the effectiveness of the partnership-in-training between the Departments of Statistics and Mathematical Sciences; and the degree of involvement of women and minorities.
Funding for this award is provided by the Division of Mathematical Sciences and the MPS Directorate's Office of Multidisciplinary Activities.
|
1 |
2000 — 2005 |
Nagin, Daniel [⬀] Roeder, Kathryn Tremblay, Richard |
N/A |
Analyzing Developmental Trajectories With Mixture Models: a Second Generation of Models and Software @ Carnegie-Mellon University
A developmental trajectory describes the course of a behavior over age or time. This project builds upon prior NSF funded research (SES-9511412) that developed a semi-parametric, group-based approach for identifying distinctive groups of individual trajectories within the population of interest, and also produced "canned" SAS-based software for estimation of the trajectory models. This project will extend the research in several important ways. First, the univariate trajectory models and software will be extended to include a random effect component. The software also will be expanded to incorporate a previously developed non-parametric trajectory model. Second, a next generation of joint trajectory models will be developed. Work to date has focused on developing group-based models appropriate for analyzing the developmental course of a single behavior. A key goal of this research is to generalize univariate trajectory models to allow the joint estimation of trajectories of two distinct but related behaviors, for example, physical aggression in childhood and violent delinquency in adolescence. Here again the capacity for joint trajectory estimation will be incorporated into the already available software package. Third, alternative measures for evaluating model goodness of fit will be explored. Such measures are intended to complement the Bayesian Information Criterion (BIC) as a basis for model selection.
This methodological research program is motivated by substantive problems from criminology, developmental psychology, and psychiatry, such as co-morbidity and heterotypic continuity. Co-morbidity refers to the contemporaneous occurrence of two or more undesirable conditions such as physical aggression and hyperactivity during childhood. Heterotypic continuity refers to the inter-temporal manifestation of a latent individual trait in different but analogous behavioral forms. For example, a latent propensity for violence may reveal itself as kicking and biting siblings during early childhood, gang fighting during adolescence, and spouse abuse during adulthood. The form and target of the aggression is different but the constant is physical violence. Because of the changing form of the manifestation, use of the same measurement scale at different stages of life is inappropriate for capturing such a tendency. The themes of co-morbidity and heterotypic continuity will be explored in analyses of major prospective longitudinal data sets from around the world.
|
1 |
2000 — 2001 |
Eddy, William (co-PI) [⬀] Schervish, Mark (co-PI) [⬀] Roeder, Kathryn Genovese, Christopher (co-PI) [⬀] |
N/A |
Scientific Computing Research Environments For the Mathematical Sciences @ Carnegie-Mellon University
The Department of Statistics at Carnegie Mellon University will purchase a cluster of 32 dual-processor computers, which will be used for several research projects, including in particular:
1. Computational Astrostatistics by Larry Wasserman and Chris Genovese
2. Statistical Genetics and Evolutionary Simulations by Kathryn Roeder, Bernie Devlin and Larry Wasserman
3. Data Analytic Approach to Seismic Imaging by William F. Eddy, Mark Schervish and Pantelis Vlachos
4. Parallelized Spatio-temporal Analyses of Functional Magnetic Resonance Data by Chris Genovese, William F. Eddy and Nicole Lazar
|
1 |
2001 — 2004 |
Kass, Robert (co-PI) [⬀] Roeder, Kathryn Wasserman, Larry [⬀] Genovese, Christopher (co-PI) [⬀] |
N/A |
Complex Statistical Models: Theory and Methodology For Scientific Applications @ Carnegie-Mellon University
Larry Wasserman, Christopher Genovese, Robert E. Kass and Kathryn Roeder
This project is aimed at developing statistical theory and methodology for highly complex, possibly infinite dimensional models. Although the methodology and theory will be quite general, we will conduct the research in the context of three scientific collaborations. The first is "Characterizing Large-Scale Structure in the Universe," a joint project with astrophysicists and computer scientists. The main statistical challenges are nonparametric density estimation and clustering, subject to highly non-linear constraints. The second project is "Locating Disease Genes with Genomic Control." We aim to locate regions of the genome with more genetic similarity among cases (subjects with disease) than controls. These regions are candidates for containing disease genes. Finding these regions in a statistically rigorous fashion requires testing a vast number of hypotheses. We will extend and develop recent techniques for multiple hypothesis testing. The third project is "Modeling Neuron Firing Patterns." The goal is to construct and fit models for neuron firing patterns, called spike trains. The data consist of simultaneous voltage recordings of numerous neurons which have been subjected to time-varying stimuli. The data are correlated over time and a major effort is to develop a class of models, called inhomogeneous Markov interval (IMI) process models, which can adequately represent the data.
Statistical methods for simple statistical models with a small number of parameters are well established. These models often do not provide an adequate representation of the phenomenon under investigation. Currently, scientists are deluged with huge volumes of high quality data. These data afford scientists the opportunity to use very complex models that more faithfully reflect reality. The researchers involved in this proposal are developing methodology and theory for analyzing data from these complex models. The methods are very general but they are being developed for applications in Astrophysics, Genetics and Neuroscience.
|
1 |
2003 — 2009 |
Kass, Robert (co-PI) [⬀] Roeder, Kathryn Junker, Brian [⬀] |
N/A |
VIGRE in Statistics At Carnegie Mellon @ Carnegie-Mellon University
At Carnegie Mellon University, the VIGRE program of the Department of Statistics will involve all trainees in supervised, cross-disciplinary research, where they will learn how to translate a research question into well-posed statistical problems, solve these problems, and translate the results back into a product that is accessible to the relevant scientific community. This skill is also central to learning basic statistics and forms a conceptual link between research and education, facilitating their integration. At the graduate level, experience in the process includes a year-long project, typically with a faculty member in another domain, while a Statistics faculty member serves as advisor; the program also provides a series of steps to improve communication skills and teaching effectiveness, and mentoring in the area of professional growth. The graduate curriculum will be modified to make it more effective in building cross-disciplinary skills. Undergraduates will have several new courses available and will be involved in a capstone research project, and we will run a summer program, emphasizing minority students. Postdoctoral fellows will be involved in research projects, and will co-teach courses with senior faculty. They will also participate in structured mentoring sessions.
Undergraduate, graduate, and postdoctoral trainees will be integrated in research teams. The Carnegie Mellon VIGRE program in Statistics will (i) train postdoctoral fellows for careers emphasizing research in settings that require versatility, (ii) recruit and retain U.S. graduate students, avoiding excessive time to complete Ph.D.s while providing students with a high probability of success after graduation, and (iii) help increase the numbers of U.S. undergraduates, including women and minorities, with advanced training in statistical science. While maintaining a strong disciplinary foundation for statistical practice, the program emphasizes cross-disciplinary research and understanding the needs of statistical novices.
|
1 |
2010 — 2012 |
Eddy, William (co-PI) [⬀] Kass, Robert [⬀] Roeder, Kathryn Wasserman, Larry (co-PI) [⬀] Genovese, Christopher (co-PI) [⬀] |
N/A |
EMSW21-RTG: Statistics and Machine Learning For Scientific Inference @ Carnegie-Mellon University
Statistics curricula have required excessive up-front investment in statistical theory, which many quantitatively-capable students in "big science" fields initially perceive to be unnecessary. A training program at Carnegie Mellon will expose students to cross-disciplinary research early, showing them the scientific importance of ideas from statistics and machine learning, and the intellectual depth of the subject. Graduate students will receive instruction and mentored feedback on cross-disciplinary interaction, communication skills, and teaching. Postdoctoral fellows will become productive researchers who understand the diverse roles and responsibilities they will face as faculty or members of a research laboratory.
The statistical needs of the scientific establishment are huge, and growing rapidly, making the current rate of workforce production dangerously inadequate. The Department of Statistics at Carnegie Mellon University will train undergraduates, graduate students, and postdoctoral fellows in an integrated program that emphasizes the application of statistical and machine learning methods in scientific research. The program will build on existing connections with computational neuroscience, computational biology, and astrophysics. Carnegie Mellon will recruit students from a broad spectrum of quantitative disciplines, with emphasis on computer science. Carnegie Mellon already has an unusually large undergraduate statistics program. New efforts will strengthen the training of these students, and attract additional highly capable students to be part of the pipeline entering the mathematical sciences.
|
1 |
2011 — 2017 |
Kass, Robert [⬀] Eddy, William (co-PI) [⬀] Roeder, Kathryn Wasserman, Larry (co-PI) [⬀] Genovese, Christopher (co-PI) [⬀] |
N/A |
EMSW21-RTG: Statistics and Machine Learning For Scientific Inference @ Carnegie-Mellon University
Statistics curricula have required excessive up-front investment in statistical theory, which many quantitatively-capable students in "big science" fields initially perceive to be unnecessary. A research training program at Carnegie Mellon exposes students to cross-disciplinary research early, showing them the scientific importance of ideas from statistics and machine learning, and the intellectual depth of the subject. Graduate students receive instruction and mentored feedback on cross-disciplinary interaction, communication skills, and teaching. Postdoctoral fellows become productive researchers who understand the diverse roles and responsibilities they will face as faculty or members of a research laboratory.
The statistical needs of the scientific establishment are huge, and growing rapidly, making the current rate of workforce production dangerously inadequate. The research training program in the Department of Statistics at Carnegie Mellon University trains undergraduates, graduate students, and postdoctoral fellows in an integrated environment that emphasizes the application of statistical and machine learning methods in scientific research. The program builds on existing connections with computational neuroscience, computational biology, and astrophysics. Carnegie Mellon is recruiting students from a broad spectrum of quantitative disciplines, with emphasis on computer science. Carnegie Mellon already has an unusually large undergraduate statistics program. New efforts will strengthen the training of these students, and attract additional highly capable students to be part of the pipeline entering the mathematical sciences.
|
1 |
2016 — 2018 |
Roeder, Kathryn M |
R01 (Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.) |
2/3 Multidimensional Investigation of the Etiology of Autism Spectrum Disorder @ Carnegie-Mellon University
DESCRIPTION (provided by applicant): Autism Spectrum Disorder (ASD) is characterized by impairments in social communication and restricted or repetitive behavior or interests. The application of genomic technologies has led to the identification of many of the genes underlying ASD, presenting the opportunity to assess the insight these risk genes can give into the etiology of ASD. In this proposal we aim to: 1) Generate a list of ASD-associated genes; 2) Identify points of convergence between these genes in biological data (e.g. gene regulation and expression); and 3) Validate these points of convergence in model systems. Since ASD is a human neurodevelopmental disorder we will prioritize biological data that is collected longitudinally across development from human brain tissue. In our prior work we have demonstrated that de novo mutations, specifically copy number variants (CNVs) and loss of function (LoF) point mutations, are strongly associated with ASD. Furthermore, these mutations cluster at ASD risk genes and loci in cases but not in controls. By comparing the distribution of these mutations between cases and controls we can identify the points of mutational clustering that represent ASD risk loci (e.g. CNVs at the 500kbp 16p11.2 locus and LoFs at the gene CHD8). We have developed a statistical framework to assess this clustering as well as incorporating evidence from inherited variants and case-control data. This framework is called the Transmitted and De novo Associated Test (TADA). In Aim 1 we will develop this test further to incorporate all the available CNV, exome, genome, and targeted sequencing data into a single ASD gene list, ranked by the degree of ASD association. Previously we used the top nine ASD risk genes as seeds for gene co-expression networks and assessed the validity of these networks by their ability to incorporate 120 independent ASD risk genes.
By limiting the co-expression input data to narrow windows of development and specific brain regions we could identify the spatiotemporal networks with the greatest enrichment, for example pre-frontal cortex in mid-fetal development. In Aim 2, we propose a similar approach, but using the DAWN (Detecting Association With Networks) method developed by our group. DAWN uses the narrow windows of co-expression data as before, but is able to incorporate evidence from other datasets such as gene regulation and protein-protein interaction (PPI). By seeding the DAWN networks with the highest confidence genes we will assess the spatiotemporal networks that best predict other ASD genes. ASD shows a significant sex bias, implicating an interaction between ASD etiology and sexually dimorphic factors. Building on our work of identifying sexually dimorphic transcripts in the developing human brain we will test their enrichment within specific networks identified by DAWN. To validate the ASD-associated networks, in Aim 3 we will identify the gene that best represents each network and assess if disrupting it also disrupts the other genes within the network. We will disrupt each gene using CRISPR/Cas9 in both mice and human-derived iPSCs and assess the genes disrupted using RNA-Seq.
|
0.958 |
2020 — 2021 |
Roeder, Kathryn M |
R01 (Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.) |
Computational Methods to Integrate and Interpret the Transcriptome From Single Cell and Tissue Level Data @ Carnegie-Mellon University
In the past decade, substantial progress has been made in discovery of genetic variants and genes associated with risk for psychiatric disorders. Altered gene expression in the brain, particularly at the cell-type-specific level, is believed to be a driving factor in conferring risk through these genetic variants. To link altered transcription to psychopathology, an immense amount of transcriptomic data is being accumulated, including single-cell and tissue level transcriptomes. Some of these samples cover critical developmental periods. An outstanding challenge is how to integrate single cell and tissue level transcriptomic data and how genetic variation alters transcription in specific cells to produce psychopathology. In this high dimensional -omics setting, we need powerful statistical and machine learning tools to produce integrative analyses and mesh those results with large psychiatric genetic datasets to achieve new insights. We propose to use our expertise in high dimensional statistical inference to tackle this challenge. We go beyond machine learning models that specialize in prediction, focusing instead on providing interpretable statistical inferences. We identify gene communities, defined in terms of cell type and spatiotemporal window, driving risk. With vast amounts of data comes great risk of spurious inferences based on non-rigorous analyses. On the other hand, reliable but naïve tools can sacrifice power by not fully integrating all available information. Our overall objective is to produce analytic tools that yield reliable and powerful inferences relating cell-type-specific gene expression with genetic risk factors. With these analytical tools made available to the research community, our longer-term goal is to hasten discoveries in the field and thus build the foundation from which therapeutic targets for psychiatric disorders emerge.
Our objectives will be accomplished with the following specific aims: 1) statistically rigorous methods to select cell-type markers and to estimate cell-type-specific (CTS) expression, which will facilitate downstream analyses, including CTS eQTLs from tissue; 2) modeling dynamic gene communities throughout development of cell lineages or tissue and relating them to community-based-score statistics to gain insight into the impact of genetic risk factors on psychiatric disorders; and 3) novel methods for estimating gene co-expression networks from single cell RNA-seq. This contribution is significant because it will make many transcriptomic resources more valuable and enable downstream analyses, such as detection of CTS eQTLs in larger sample sets with higher power. Dynamic network analysis tools enhance our ability to identify gene communities that vary over developmental epochs and this variation facilitates inferences that relate cell type and developmental period with risk factors. The research proposed is innovative, in our opinion, because it uses novel statistical methods for integrative analysis of data from multiple sources, and cutting-edge results to represent high dimensional data in a meaningful way that lends itself to clustering and network analysis.
|
0.958 |