1993 — 1998
Wasserman, Larry
Mathematical Sciences: NSF Young Investigator @ Carnegie-Mellon University
This research, supported by the National Science Foundation Young Investigator award, will address theoretical and methodological issues in Bayesian statistical inference. The first topic concerns information-theoretic diagnostics to assess and construct prior distributions. The second topic involves using function space derivatives to perform sensitivity analysis. The National Science Foundation Young Investigator award recognizes outstanding young faculty. This award recognizes the recipient's strong potential for continued professional growth as a research mathematician and for significant development as a teacher and academic leader.
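As a loose, hypothetical illustration of the kind of prior sensitivity analysis described above (a simple epsilon-contamination check on a Beta-Binomial model, not the project's information-theoretic diagnostics or functional derivatives), a minimal sketch:

```python
# Hypothetical sketch of prior sensitivity via epsilon-contamination; the project's
# actual diagnostics are not reproduced here.
from math import comb
from scipy.special import betaln
import numpy as np

def log_marginal(a, b, k, n):
    """Log marginal likelihood of k successes in n trials under a Beta(a, b) prior."""
    return np.log(comb(n, k)) + betaln(a + k, b + n - k) - betaln(a, b)

def posterior_mean(a, b, k, n):
    """Posterior mean of the success probability under a Beta(a, b) prior."""
    return (a + k) / (a + b + n)

k, n = 7, 20                             # hypothetical data
base, contam = (2.0, 2.0), (1.0, 9.0)    # baseline prior and contaminating prior

m_base = np.exp(log_marginal(*base, k, n))
m_cont = np.exp(log_marginal(*contam, k, n))

for eps in (0.0, 0.1, 0.5):
    # Posterior under the mixture prior (1-eps)*Beta(2,2) + eps*Beta(1,9) is a mixture
    # of the two component posteriors, weighted by their marginal likelihoods.
    w = eps * m_cont / ((1 - eps) * m_base + eps * m_cont)
    mean = (1 - w) * posterior_mean(*base, k, n) + w * posterior_mean(*contam, k, n)
    print(f"eps={eps:.1f}  posterior mean = {mean:.3f}")
```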
1993 — 1999
Tierney, Luke-Jon (co-PI); Kass, Robert (co-PI); Kadane, Joseph; Wasserman, Larry
Mathematical Sciences: Bayesian Inference and Computing @ Carnegie-Mellon University
Our research is oriented toward implementation of Bayesian inference. There has been increasing interest recently in the Bayesian approach to statistics, in part because advances in computational ability have made it feasible in many settings, and in part because Bayesian analysis of data can make use of information from additional sources. Our work will build on our previous research in Bayesian statistics, part of which has been funded by NSF. Our main concerns are: (1) review and assessment of methods for choosing prior probability distributions by formal rules, and further development of methods for assessing sensitivity to the choices; (2) investigation of approximate and exact computational methods for Bayesian hypothesis testing; (3) modification and enhancement of numerical integration techniques and Monte Carlo simulation of posterior distributions; also, improvement of statistical computing environments, including use of animation and three-dimensional rendering for visualization of uncertainty in higher dimensions; (4) further work on the foundations of subjective probability; and (5) several other topics related to our previous work on elicitation of priors and asymptotic approximations.
When analyzing data, it is important to combine all sources of information effectively. Bayesian statistical methods are tailored to this purpose. Our research focuses on finding practical ways to implement Bayesian methods and on investigating the theoretical basis for these methods. We are concerned with the development of computational and graphical techniques that make Bayesian inference feasible in complicated problems. These include simulation, animation, and the construction of statistical computing environments. We will also investigate theoretical issues that support Bayesian techniques. These issues include the foundations of subjective probability and the development of mathematical approximations.
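The Monte Carlo simulation of posterior distributions mentioned in item (3) can be illustrated with a minimal random-walk Metropolis sampler; the model, data, and tuning constants below are hypothetical and not drawn from the project.

```python
# Minimal random-walk Metropolis sampler for a posterior; purely illustrative,
# not one of the numerical-integration or simulation methods studied in the project.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=50)      # hypothetical data

def log_posterior(mu):
    # N(0, 10^2) prior on mu, Gaussian likelihood with known unit variance.
    log_prior = -0.5 * (mu / 10.0) ** 2
    log_lik = -0.5 * np.sum((data - mu) ** 2)
    return log_prior + log_lik

mu, draws = 0.0, []
for _ in range(5000):
    prop = mu + rng.normal(scale=0.5)               # random-walk proposal
    if np.log(rng.uniform()) < log_posterior(prop) - log_posterior(mu):
        mu = prop                                   # accept the proposal
    draws.append(mu)

print("posterior mean ~", np.mean(draws[1000:]))    # discard burn-in
```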
1995 — 1996
Wasserman, Larry
Mathematical Sciences: Second International Workshop On Bayesian Robustness; May, 1995; Rimini, Italy @ Carnegie-Mellon University
The purpose of this conference is to bring together experts in the area of robust Bayesian inference to discuss applications, recent methodological developments and theory. Of particular interest are practical methods for implementing the theory, such as those that exploit recent advances in statistical computing. Among the topics to be discussed at the workshop are: robust Bayesian methods in biostatistics, local sensitivity, hierarchical models, dynamic graphics, Bayes factors, density estimation and time series. The term ``Bayesian statistics'' refers to a class of data analysis techniques based on probability theory. These techniques are very general and have been successfully used in such diverse areas as quality control, engineering, psychology, medicine, environmental sciences and astrophysics. But applications of Bayesian methods to science and technology have been hindered since Bayesian analysis often requires the use of certain simplifying assumptions. ``Bayesian robustness'' is concerned with developing techniques for understanding how scientific conclusions depend on the assumptions that are used by statisticians when analyzing the data. Bayesian robustness is crucial for the successful development of practical Bayesian data analysis methods. It is a relatively new, quickly growing area and hence it is difficult for researchers -- especially young researchers -- to meet and stay in touch with the latest ideas. The objective of the workshop is to draw together researchers to discuss the latest advances in this field.
LEVEL OF EFFORT STATEMENT: The recommended level of support for this conference, though slightly lower than what we had hoped for, is still substantial and, together with the funds provided by the Italian government, we expect to produce a high quality workshop. Indeed, preparations for the conference are in progress and many top researchers have already indicated their intention to participate. Furthermore, the Institute of Mathematical Statistics has indicated that it is likely they will publish the proceedings in their Lecture Note series. We will make every effort to ensure a successful conference.
1998 — 1999
Mitchell, Tom (co-PI); Faloutsos, Christos (co-PI); Wasserman, Larry; Thrun, Sebastian
Workshop On Automated Learning and Discovery @ Carnegie-Mellon University
This award provides partial support for a cluster of eight workshops centered around automated learning and decision making based on data. The meeting is held at Carnegie Mellon University on June 11-13, 1998. It covers scientific research at the intersection of statistics, computer science, artificial intelligence, databases, social sciences and language technologies. The aim of this meeting is to initiate a dialogue between these disciplines. By doing so, it seeks to attain two types of impacts. First, it aims to generate synergy between previously separated fields, to lead to new, cross-disciplinary research collaborations and new, unified research approaches. Second, it attempts to provide guidance to the scientific community by characterizing the state-of-the-art, pointing out possible overlaps, making people aware of research results in other fields, and identifying some of the most promising cross-disciplinary research directions. One of the results of this meeting will be a written report, which will be made available to NSF and the scientific community at large, by posting it on the Web and by submitting it to a widely accessible magazine or journal.
1998 — 2001
Kass, Robert; Roeder, Kathryn (co-PI); Wasserman, Larry
Bayesian Inference and Mixture Models @ Carnegie-Mellon University
9803433 Robert E. Kass
This research is focused on reference Bayesian methods (Bayesian inference with prior distributions chosen by some formal rule), mixture models, Bayes factors, and causal inference, with an emphasis on hierarchical models (including classical mixed models and their generalizations). Both parametric and nonparametric or semiparametric models are studied. Many of the results are obtained by asymptotic methods, but ``exact'' computation (typically via simulation) also plays a substantial role.
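One classical asymptotic route to the Bayes factors mentioned above is the Schwarz (BIC) approximation. The toy comparison of two nested Gaussian-mean models below only shows the mechanics and is not the methodology developed in the project.

```python
# Hypothetical sketch: approximating a Bayes factor with the Schwarz (BIC) criterion,
# one of the classical asymptotic approximations in this area.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=100)   # hypothetical data, unit variance known

n = len(x)
# Model 0: mu = 0 (no free parameters); Model 1: mu unknown (one free parameter).
loglik0 = -0.5 * np.sum(x ** 2)
loglik1 = -0.5 * np.sum((x - x.mean()) ** 2)

bic0 = -2 * loglik0 + 0 * np.log(n)
bic1 = -2 * loglik1 + 1 * np.log(n)

# exp((BIC0 - BIC1) / 2) approximates the Bayes factor of model 1 against model 0.
print("approximate Bayes factor BF_10 ~", np.exp((bic0 - bic1) / 2))
```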
Elaboration of simple statistical models has been a major theme in the discipline in the latter part of this century. Previously, models have involved a small number of parameters, the values of which have been determined from observed data. With increased computing power, more complicated statistical models involving many more parameters have become central to much current statistical activity. Yet, despite recent progress, fundamental issues remain. This research is motivated in part by problems in statistical genetics, cognitive neuroscience, and the study of criminal behavior.
1998 — 2001
Faloutsos, Christos (co-PI); Spirtes, Peter (co-PI); Wasserman, Larry; Moore, Andrew; Nichol, Robert (co-PI)
KDI: New Algorithms, Architectures and Science For Data Mining of Massive Astrophysics Sky Surveys @ Carnegie-Mellon University
Moore 9873442
There are many massive databases in industry and science, and this is particularly true for Astrophysics. There are many kinds of questions that physicists and other users wish to ask the databases in real time, e.g., 'find outliers'; 'find clusters'; 'find patterns'; 'classify the data records into N predetermined classes.' Wide-ranging statistics and machine learning algorithms similarly need to query databases, sometimes millions of times for a single inference. With millions or billions of records (such as the new generation of astrophysics sky surveys) this can be intractable using current algorithms. This project aims to make repeated statistical querying of huge datasets computationally feasible by transforming massive databases into condensed representations that permit the rapid answering of such questions. To achieve these goals, the investigator and his colleagues explore ways in which tools from statistics (such as Bayesian networks), databases (such as kd-trees/R-trees), and Artificial Intelligence (such as AD-trees and rule-finders) can help, how they scale up, and how they can be combined. The investigators intend to help automate the process of scientific discovery for astrophysical data sources in which there is too much information for any unaided human to have a chance of spotting patterns, regularities, or anomalies. Government and industry in the U.S. have invested heavily in ingenious new ways to gather information in all branches of science and industry, from cell biology to the flows of capital in international commerce. Scientists and analysts who have worked so hard to gather orders of magnitude more data than they had ten years ago are now faced with an equally daunting task: exploiting it fully. It is ironic that in fields such as astrophysics there is now so much data that no human has enough time to even see a tiny fraction of it. The job of discovering new relationships, anomalies, and even causation must now be at least partly turned over to computers. The investigators comprise a team of statisticians, computer scientists, and astronomers who have each already made progress in this direction. This team develops new algorithms to squeeze as much information as possible from trillion-byte astrophysics databases such as the Sloan Digital Sky Survey. They also make sure that the resulting technology is deployed elsewhere in science and industry.
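To make the role of tree structures such as kd-trees concrete, this sketch counts neighbors within a fixed radius for many query points using SciPy's cKDTree; the data are synthetic and the code is not part of the project's software.

```python
# Hypothetical sketch of kd-tree-accelerated range counting, the kind of repeated
# statistical query over large point sets described in the abstract.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)
points = rng.uniform(size=(100_000, 2))        # synthetic "sky" positions

tree = cKDTree(points)                         # build once, query many times
queries = rng.uniform(size=(1_000, 2))

# Count neighbors within radius 0.01 of each query point without an O(N) scan per query.
neighbor_lists = tree.query_ball_point(queries, r=0.01)
counts = [len(idx) for idx in neighbor_lists]
print("mean neighbor count:", np.mean(counts))
```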
2001 — 2004
Kass, Robert (co-PI); Roeder, Kathryn (co-PI); Wasserman, Larry; Genovese, Christopher (co-PI)
Complex Statistical Models: Theory and Methodology For Scientific Applications @ Carnegie-Mellon University
Complex Statistical Models: Theory and Methodology for Scientific Applications
Larry Wasserman, Christopher Genovese, Robert E. Kass and Kathryn Roeder
ABSTRACT
This project is aimed at developing statistical theory and methodology for highly complex, possibly infinite dimensional models. Although the methodology and theory will be quite general, we will conduct the research in the context of three scientific collaborations. The first is ``Characterizing Large-Scale Structure in the Universe,'' a joint project with astrophysicists and computer scientists. The main statistical challenges are nonparametric density estimation and clustering, subject to highly non-linear constraints. The second project is ``Locating Disease Genes with Genomic Control.'' We aim to locate regions of the genome with more genetic similarity among cases (subjects with disease) than controls. These regions are candidates for containing disease genes. Finding these regions in a statistically rigorous fashion requires testing a vast number of hypotheses. We will extend and develop recent techniques for multiple hypothesis testing. The third project is ``Modeling Neuron Firing Patterns.'' The goal is to construct and fit models for neuron firing patterns, called spike trains. The data consist of simultaneous voltage recordings of numerous neurons which have been subjected to time-varying stimuli. The data are correlated over time and a major effort is to develop a class of models, called inhomogeneous Markov interval (IMI) process models, which can adequately represent the data.
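A standard baseline for the multiple-testing task described above is the Benjamini-Hochberg false discovery rate procedure; the sketch below applies it to simulated p-values and is only an illustration, not the specific techniques the project extends.

```python
# Hypothetical sketch: Benjamini-Hochberg FDR control over many hypothesis tests,
# a standard baseline for large-scale multiple testing.
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of rejected hypotheses at FDR level alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True                  # reject the k smallest p-values
    return reject

rng = np.random.default_rng(3)
# 950 null p-values (uniform) plus 50 signals with small p-values.
pvals = np.concatenate([rng.uniform(size=950), rng.beta(0.1, 10.0, size=50)])
print("rejections:", benjamini_hochberg(pvals).sum())
```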
Statistical methods for simple statistical models with a small number of parameters are well established. These models often do not provide an adequate representation of the phenomenon under investigation. Currently, scientists are deluged with huge volumes of high quality data. These data afford scientists the opportunity to use very complex models that more faithfully reflect reality. The researchers involved in this proposal are developing methodology and theory for analyzing data from these complex models. The methods are very general but they are being developed for applications in Astrophysics, Genetics and Neuroscience.
2001 — 2007
Wasserman, Larry; Connolly, Andrew; Moore, Andrew; Nichol, Robert (co-PI); Schneider, Jeff; Miller, Christopher (co-PI)
ITR/IM: Statistical Data Mining For Cosmology @ Carnegie-Mellon University
Scientists are now confronted with many very large high-quality data sets. The potential scientific benefits of these data are offset by the laborious process of analyzing them to answer questions and test theories. This project will develop new data mining algorithms in pursuit of the goal of computer assisted discovery. Two key issues in achieving this are computational efficiency and autonomy. If scientists are to focus their energy on understanding, answers must arrive in minutes rather than days, hence the need for efficiency. Autonomy is important both from the data mining and the statistical perspective. Detailed searches for relationships, models, and parameters are too large for humans to undertake manually. New statistical methods will have to autonomously and quickly select models, test their significance, and report the results to search algorithms looking for new discoveries.
The National Virtual Observatory (NVO) currently under construction is a model of the future of science. The NVO will assemble petabytes of data from many multi-wavelength sky surveys into a single repository. The new methods to be developed will be implemented in the domain of cosmology, but they will be applicable to all other sciences.
The members of this project are computer scientists, physicists and statisticians who have a track record of collaborating closely. Working together they have produced: new algorithmic theory, new statistical theory, and publicly fielded software packages resulting from the theory, while developing new courseware and training students.
This proposal involves research and education in the following areas:
Nonparametric data analysis. Nonparametric statistical models enable powerful analysis techniques that make minimal assumptions, which is critical for scientific accuracy.
Automated discovery. Statistical models can be used directly for discovery. Individual objects are compared to models to identify anomalies and data generated models are compared to theoretical models to refute or confirm hypotheses.
Computational methods for fast analysis. The project will build on past successes of getting orders of magnitude speedups on operations such as Expectation Maximization based clustering and n-point correlations to make the new methods fast.
Automated simulation parameter searching. Using all of the above methods, a system will be developed that starts with a parameterized simulation and some observational data. The system will search the space of parameters, testing the resulting simulation against the real data using nonparametric methods to determine the best settings.
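A minimal sketch of the parameter-search loop in the last item above: run a toy one-parameter "simulation" over a grid and score each run against observed data with a nonparametric (Kolmogorov-Smirnov) distance. The simulator and data are purely hypothetical.

```python
# Hypothetical sketch of searching a simulation's parameter space and comparing
# each simulated sample to observed data with a nonparametric distance.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)
observed = rng.normal(loc=0.7, scale=1.0, size=2000)   # stand-in for survey data

def simulate(theta, n=2000):
    """Toy simulator: the unknown parameter is just a location shift."""
    return rng.normal(loc=theta, scale=1.0, size=n)

grid = np.linspace(0.0, 1.5, 16)
scores = [ks_2samp(observed, simulate(t)).statistic for t in grid]
best = grid[int(np.argmin(scores))]
print("best-fitting parameter ~", best)
```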
2004 — 2007
Wasserman, Larry; Connolly, Andrew; Genovese, Christopher; Miller, Christopher (co-PI); Mcintyre, Julie (co-PI)
Nonparametrical Statistical Methods For Astrophysical and Cosmological Data @ Carnegie-Mellon University
AST-0434343 Genovese
Recent technological advances have enabled astronomers and cosmologists to collect data of unprecedented quality and quantity. These large data sets can reveal more complex and subtle effects than ever before, but they also demand new statistical approaches. This project consists of two intertwined components: (a) development of new nonparametric statistical methods that address recurrent problems in the analysis of astrophysical and cosmological data and (b) application of the new methods to help answer significant astrophysical and cosmological questions. Specifically, this research will improve inference for the Cosmic Microwave Background spectrum by constructing uniform confidence sets in nonparametric regression, characterize the influence of local environment on galaxy evolution by developing new methods for nonparametric errors-in-variables problems, and estimate the matter density from magnitude limited galaxy surveys by producing accurate density estimators for doubly truncated data.
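As a simple stand-in for the nonparametric regression inference described above, the sketch below computes a Nadaraya-Watson estimate with a crude pointwise normal-approximation band; the project's uniform confidence sets are more refined, and all data and constants here are synthetic.

```python
# Hypothetical sketch: Nadaraya-Watson regression with a rough pointwise band,
# not the uniform confidence sets developed in the project.
import numpy as np

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 1, size=300))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=300)   # hypothetical data

def nw(x0, x, y, h=0.05, sigma=0.3):
    """Kernel-weighted local average at x0 with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    w /= w.sum()
    est = np.sum(w * y)
    se = sigma * np.sqrt(np.sum(w ** 2))      # plug-in noise level, illustration only
    return est, se

for x0 in (0.25, 0.5, 0.75):
    est, se = nw(x0, x, y)
    print(f"f({x0}) ~ {est:.2f} +/- {1.96 * se:.2f}")
```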
The research provides interdisciplinary training for postdoctoral fellows and graduate students, and strengthens an interdisciplinary infrastructure between the mathematical and physical sciences.
2006 — 2010
Lafferty, John; Wasserman, Larry; Lee, Ann
MSPA-MCS: Nonparametric Learning in High Dimensions @ Carnegie-Mellon University
Prop ID: DMS-0625879 PI: Lafferty, John D. Institution: Carnegie-Mellon University Title: MSPA-MCS: Nonparametric Learning in High Dimensions
Abstract:
The research in this proposal lies at the boundary of statistics and machine learning, with the underlying theme of nonparametric inference for high-dimensional data. Nonparametric inference refers to statistical methods that learn from data without imposing strong assumptions. The project will develop the mathematical foundations of learning sparse functions in high-dimensional data, and will also develop scalable, practical algorithms that address the statistical and computational curses of dimensionality. The project will rigorously develop the idea that it is possible to overcome these curses if, hidden in the high-dimensional problem, there is low-dimensional structure. The focus of the project will be on five technical aims: (1) develop practical methods for high-dimensional nonparametric regression; (2) develop theory for learning when the dimension increases with sample size; (3) develop theory that incorporates computational costs into statistical risk; (4) develop methods for sparse, highly structured models; (5) develop methods for data with a low intrinsic dimensionality. These aims target the advancement of both statistical theory and computer science, and the interdisciplinary team for the project includes a statistician (Wasserman), a computer scientist (Lafferty), and a physicist who is now in a statistics department (Lee).
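A hedged illustration of aim (1): recovering a sparse signal in a regression with many more variables than observations, using the lasso as a linear stand-in for the sparse nonparametric methods the project targets. Data and settings are synthetic.

```python
# Hypothetical sketch: sparse recovery in high dimensions with the lasso, a linear
# stand-in for the sparse nonparametric regression methods discussed above.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
n, d = 100, 500                                  # fewer samples than dimensions
X = rng.normal(size=(n, d))
beta = np.zeros(d)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]           # only 5 relevant coordinates
y = X @ beta + rng.normal(scale=0.5, size=n)

fit = Lasso(alpha=0.1).fit(X, y)
print("estimated support:", np.flatnonzero(fit.coef_)[:10])
```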
2008 — 2011
Wasserman, Larry; Genovese, Christopher
Statistical Theory For Astrophysical Problems @ Carnegie-Mellon University
Nonparametric inference has become an essential tool for studying the cosmos. This project consists of two intertwined components: (a) development of new theoretical tools and nonparametric methodologies that are inspired by problems in astrophysics but apply more broadly, and (b) application of these tools to two important astrophysical problems, which will frame the need for and guide the development of new statistical theory. Specifically, the investigators will focus on inference for the dark energy equation of state and on identifying filamentary structures from point process data such as that produced by galaxy surveys. The first problem gives rise to a challenging nonlinear inverse problem and demands a nonparametric approach, given what little is known about the dark energy equation of state. The investigators will develop new theory for nonlinear inverse problems that allow for accurate estimates and sharp confidence statements about the unknown function. These techniques will then be applied to Type Ia supernova data, possibly combined with other data sources, to make inferences about dark energy. The second problem gives rise to challenging spatial and inference problems. Current theory in the statistical literature applies to a single filament only, and techniques in the astronomical literature are not supported by theory. The investigators on this project will close that gap, developing theory for defining, identifying, and making inferences about the filamentary structures. The investigators will test this technique and apply it to galaxy survey data.
One of the most important problems in cosmology is understanding dark energy. The relationship between observable quantities and dark energy produces a challenging nonlinear inverse problem. With very little strong a priori information about the nature of dark energy, parametric approaches to the problem are limited and suboptimal. And with the promise of much larger data sets in the near future, there will be need and opportunity to extract fine-scale features of the dark energy equation of state. The investigators will develop new theory of inference for such problems, with a focus on estimation under shape constraints, sharp hypothesis testing, and accurate confidence sets. The goal is a substantial improvement in accuracy over the current best techniques. In particular, the investigators will focus on the problem of understanding dark energy and on identifying filamentary structures in the distribution of matter. The former is one of the central problems in modern cosmology and demands state-of-the-art statistical techniques to get the most from the data. The investigators will develop new statistical theory and methodologies that substantially improve the precision with which features of dark energy can be estimated from supernova data and other data sources. The latter problem is central to understanding the distribution of matter in the universe. Current statistical theory only applies to a limited version of the problem, and current astronomical methodologies do not have strong theoretical support. The investigators will close that gap and develop a method and corresponding theory that can handle realistic versions of the problem and give optimal or near-optimal performance.
2010 — 2012
Eddy, William (co-PI); Kass, Robert; Roeder, Kathryn (co-PI); Wasserman, Larry; Genovese, Christopher (co-PI)
EMSW21-RTG: Statistics and Machine Learning For Scientific Inference @ Carnegie-Mellon University
Statistics curricula have required excessive up-front investment in statistical theory, which many quantitatively-capable students in ``big science'' fields initially perceive to be unnecessary. A training program at Carnegie Mellon will expose students to cross-disciplinary research early, showing them the scientific importance of ideas from statistics and machine learning, and the intellectual depth of the subject. Graduate students will receive instruction and mentored feedback on cross-disciplinary interaction, communication skills, and teaching. Postdoctoral fellows will become productive researchers who understand the diverse roles and responsibilities they will face as faculty or members of a research laboratory.
The statistical needs of the scientific establishment are huge, and growing rapidly, making the current rate of workforce production dangerously inadequate. The Department of Statistics at Carnegie Mellon University will train undergraduates, graduate students, and postdoctoral fellows in an integrated program that emphasizes the application of statistical and machine learning methods in scientific research. The program will build on existing connections with computational neuroscience, computational biology, and astrophysics. Carnegie Mellon will recruit students from a broad spectrum of quantitative disciplines, with emphasis on computer science. Carnegie Mellon already has an unusually large undergraduate statistics program. New efforts will strengthen the training of these students, and attract additional highly capable students to be part of the pipeline entering the mathematical sciences.
2011 — 2013
Lafferty, John (co-PI); Wasserman, Larry; Liu, Han
III: Small: Nonparametric Structure Learning For Complex Scientific Datasets @ Johns Hopkins University
The project brings together an interdisciplinary team of researchers from Johns Hopkins University, Carnegie Mellon University, and the University of Chicago to develop methods, theory and algorithms for discovering hidden structure from complex scientific datasets, without making strong a priori assumptions. The outcomes include practical models and provably correct algorithms that can help scientists to conduct sophisticated data analysis. The application areas include genomics, cognitive neuroscience, climate science, astrophysics, and language processing.
The project has five aims: (i) Nonparametric structure learning in high dimensions: In a standard structure learning problem, observations of a random vector X are available and the goal is to estimate the structure of the distribution of X. When the dimension is large, nonparametric structure learning becomes challenging. The project develops new methods and establishes theoretical guarantees for this problem; (ii) Nonparametric conditional structure learning: In many applications, it is of interest to estimate the structure of a high-dimensional random vector X conditional on another random vector Z. Nonparametric methods for estimating the structure of X given Z are being developed, building on recent approaches to graph-valued and manifold-valued regression developed by the investigators; (iii) Regularization parameter selection: Most structure learning algorithms have at least one tuning parameter that controls the bias-variance tradeoff. Classical methods for selecting tuning parameters are not suitable for complex nonparametric structure learning problems. The project explores stability-based approaches for regularization selection; (iv) Parallel and online nonparametric learning: Handling large-scale data is a bottleneck of many nonparametric methods. The project develops parallel and online techniques to extend nonparametric learning algorithms to large scale problems; (v) Minimax theory for nonparametric structure learning problems: Minimax theory characterizes the performance limits for learning algorithms. Few theoretical results are known for complex, high-dimensional nonparametric structure learning. The project develops new minimax theory in this setting. The results of this project will be disseminated through publications in scientific journals and major conferences, and free dissemination of software that implements the nonparametric structure learning algorithms resulting from this research.
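To ground aim (i), the sketch below fits a sparse Gaussian graphical model with the graphical lasso, a parametric baseline rather than the nonparametric structure-learning methods the project develops; the three-variable chain data are synthetic.

```python
# Hypothetical sketch: sparse Gaussian graphical model estimation with the graphical
# lasso, a parametric baseline for structure learning.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(7)
# Synthetic 3-variable chain X0 -> X1 -> X2, so X0 and X2 are independent given X1.
x0 = rng.normal(size=1000)
x1 = 0.8 * x0 + rng.normal(scale=0.6, size=1000)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=1000)
X = np.column_stack([x0, x1, x2])

model = GraphicalLasso(alpha=0.05).fit(X)
# Near-zero entries of the estimated precision matrix correspond to missing edges.
print(np.round(model.precision_, 2))
```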
The broader impacts of the project include: Creation of powerful data analysis techniques and software for a wide range of scientists and engineers to analyze and understand more complex scientific data; Increased collaboration and interdisciplinary interactions between researchers at multiple institutions (Johns Hopkins University, Carnegie Mellon University, and the University of Chicago); and Broad dissemination of the results of this research in different scientific communities. Additional information about the project can be found at: http://www.cs.jhu.edu/~hanliu/nsf116730.html.
2011 — 2013
Wasserman, Larry; Genovese, Christopher (co-PI); Lee, Ann; Schafer, Chad; Wood-Vasey, William
Nonparametric Inference For Complex Physical Models @ Carnegie-Mellon University
Recent years have seen rapid growth in the depth, richness, and scope of scientific data, a trend that is likely to accelerate. At the same time, simulation and analytical models have sharpened to unprecedented detail the understanding of the processes that generate these data. But what has advanced more slowly is the methodology to efficiently combine the information from rich, massive data sets with the detailed, and often nonlinear, constraints of theory and simulations. This project will bridge that gap. The investigators develop, implement, and disseminate new statistical methods that can fully exploit the available data by adhering to the constraints imposed by current theoretical understanding. The central idea in the work is constructing sparse, possibly nonlinear, representations of both the data and the distributions for the data predicted by theory. These representations can then be transformed onto a common space to allow sharp inferences that respect the inherent geometry of the model. The methodology developed in this project will apply to a wide range of scientific problems. The investigators focus, however, on a critical challenge in astronomy: using observations of Type Ia supernovae to improve constraints on cosmological theories explaining the nature of dark energy, a significant, yet little-understood, component of the Universe.
Crucial scientific fields have enjoyed huge advances in the ability both to gather high-quality data and to understand the physical systems that generated these data. Nevertheless, the full societal and scientific value of this progress will only be realized with new, advanced statistical methods of analyzing the massive amounts of available data. The investigators develop statistical methods for combining theoretical modelling and observational evidence into improved understanding of these physical processes. The analysis of these data will require not only new methods, but also the use of high-performance computing resources. There is a particular need for these tools in cosmology and astronomy, and this project will bring together statisticians and astronomers to combine expertise, but this research is motivated by problems that are present in other fields, such as the climate sciences.
2011 — 2017
Kass, Robert; Eddy, William (co-PI); Roeder, Kathryn (co-PI); Wasserman, Larry; Genovese, Christopher (co-PI)
EMSW21-RTG: Statistics and Machine Learning For Scientific Inference @ Carnegie-Mellon University
Statistics curricula have required excessive up-front investment in statistical theory, which many quantitatively-capable students in ``big science'' fields initially perceive to be unnecessary. A research training program at Carnegie Mellon exposes students to cross-disciplinary research early, showing them the scientific importance of ideas from statistics and machine learning, and the intellectual depth of the subject. Graduate students receive instruction and mentored feedback on cross-disciplinary interaction, communication skills, and teaching. Postdoctoral fellows become productive researchers who understand the diverse roles and responsibilities they will face as faculty or members of a research laboratory.
The statistical needs of the scientific establishment are huge, and growing rapidly, making the current rate of workforce production dangerously inadequate. The research training program in the Department of Statistics at Carnegie Mellon University trains undergraduates, graduate students, and postdoctoral fellows in an integrated environment that emphasizes the application of statistical and machine learning methods in scientific research. The program builds on existing connections with computational neuroscience, computational biology, and astrophysics. Carnegie Mellon is recruiting students from a broad spectrum of quantitative disciplines, with emphasis on computer science. Carnegie Mellon already has an unusually large undergraduate statistics program. New efforts will strengthen the training of these students, and attract additional highly capable students to be part of the pipeline entering the mathematical sciences.
2012 — 2016
Verdinelli, Isabella; Wasserman, Larry; Genovese, Christopher (co-PI)
Estimating Low Dimensional Structure in Point Clouds @ Carnegie-Mellon University
This project will develop computationally efficient estimation methods with accompanying theory for the problem of identifying low-dimensional structure in point-cloud data, both low and high dimensional. A canonical example is a noisy sample from a manifold. The investigators will develop minimax lower bounds for the estimation problem and construct estimators that achieve these lower bounds. They will then implement these methods in a practically useful form and apply them to several important scientific problems.
Datasets sometimes contain hidden, low-dimensional structure such as clusters, filaments and low dimensional surfaces. The goal of this project is to develop rigorously justified, computationally efficient methods for extracting such structure from data. The developed methods will be applied to a diverse set of problems in astrophysics, seismology, biology, and neuroscience. The project will advance knowledge in several fields including computational geometry, machine learning, and statistics.
2012 — 2017
Maloney, Craig; Wasserman, Larry
Collaborative Research: a Data-Driven Statistical Approach to Aging and Elasticity in Colloidal Glasses @ Carnegie-Mellon University
The research objective of this grant is to understand the fundamental process of aging of glassy materials from a statistical and microscopic point of view. Aging of glassy materials is a long-standing problem: a glass is an out-of-equilibrium material, so its mechanical and electrical properties change over time, but the microscopic reasons for this are still unknown. Understanding aging has implications for extending the lifetime of products made from glassy materials (both "regular" glass and plastics). This project involves simulations of glassy systems and will employ novel statistical techniques to reconstruct local elastic properties of the material. We will learn how these properties change during the aging process. Complementary experiments will be conducted on colloidal glasses, a well-characterized model system which exhibits aging. Due to the limited amount of data the experiments can obtain - and more fundamentally, the limited data one can obtain before a system ages into a new state - we will need to extend statistical techniques from bioinformatics and econometrics to be able to extract the relevant information from our data.
The developed statistical techniques will be useful in a variety of applications beyond aging or materials science. Of equal importance is understanding the fundamental mechanisms of aging in glassy systems, which may have a long-term impact on how glassy materials are produced. The students involved in this project will gain interdisciplinary experience, as the project merges physics and engineering, statistics and soft materials, statistical mechanics theory and microscopy experiments. Outreach efforts will target Pittsburgh area high schools with under-represented populations and will target Atlanta area K-12 students via "Squishy Physics" field trips.
2015 — 2018
Wasserman, Larry; Genovese, Christopher (co-PI); Verdinelli, Isabella
Estimating Low Dimensional, High-Density Structure @ Carnegie-Mellon University
Data in high dimensional spaces are now very common. This project will develop methods for analyzing these high dimensional data. Such data may contain hidden structures. For example, clusters (which are small regions with a large number of points) can be stretched out like a string forming a structure called a filament. Scientists in a variety of fields need to locate these objects. It is challenging since the data are often very noisy. This project will develop rigorously justified and computationally efficient methods for extracting such structures. The methods will be applied to a diverse set of problems in astrophysics, seismology, biology, and neuroscience. The project will advance knowledge in several fields including computational geometry, astronomy, machine learning, and statistics.
Finding hidden structure is useful for scientific discovery and dimension reduction. Much of the current theory on nonlinear dimension reduction assumes that the hidden structure is a smooth manifold, which is very restrictive. The data might be concentrated near a low dimensional but very complicated set, such as a union of intersecting manifolds. Existing algorithms, such as the Subspace Constrained Mean Shift, exhibit erratic behavior near intersections. This project will develop improved algorithms for these cases. At the same time, contemporary theory breaks down in these cases and this project will develop new theory to address the aforementioned problem. A complete method (which will be called singular clusters) will be developed for decomposing point clouds of varying dimensions into subsets.
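A minimal, simplified sketch of an SCMS-style update for a two-dimensional point cloud: an ordinary mean-shift step projected onto the local direction of strongest negative curvature of a kernel density estimate. It abbreviates the published algorithm, and the bandwidth, data, and starting point are illustrative.

```python
# Hypothetical, simplified Subspace-Constrained-Mean-Shift-style update; not the
# project's algorithm, only an illustration of projecting mean-shift steps.
import numpy as np

def kernel_weights_and_hessian(x, data, h):
    """Gaussian-kernel weights at x and the (unnormalized) KDE Hessian."""
    diff = data - x                                      # (n, 2)
    w = np.exp(-0.5 * np.sum(diff ** 2, axis=1) / h ** 2)
    outer = diff[:, :, None] * diff[:, None, :]          # (n, 2, 2)
    hess = (w[:, None, None] * (outer / h ** 2 - np.eye(2))).sum(axis=0)
    return w, hess

def scms_step(x, data, h=0.3):
    """Mean-shift step projected onto the eigenvector with the smallest eigenvalue."""
    w, hess = kernel_weights_and_hessian(x, data, h)
    shift = (w[:, None] * data).sum(axis=0) / w.sum() - x   # ordinary mean-shift step
    _, vecs = np.linalg.eigh(hess)
    v = vecs[:, [0]]                                     # strongest negative curvature
    return x + (v @ v.T) @ shift

rng = np.random.default_rng(8)
t = rng.uniform(-1, 1, size=500)
data = np.column_stack([t, t ** 2]) + rng.normal(scale=0.1, size=(500, 2))  # noisy filament

x = np.array([0.5, 0.6])
for _ in range(30):
    x = scms_step(x, data)
print("point projected toward the ridge:", np.round(x, 3))
```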
2015 — 2018
Ho, Shirley; Di Matteo, Tiziana; Mandelbaum, Rachel (co-PI); Wasserman, Larry; Genovese, Christopher (co-PI)
Cosmic Web Reconstruction: a Unique Opportunity to Study the Cosmic Structures of the Universe @ Carnegie-Mellon University
Understanding how matter is distributed in the Universe is key to developing accurate models of how it evolved. The investigators will use mapping of filamentary structures in large samples of galaxies as a new indicator of large scale structure that can then be compared to other data. They seek to trace the cosmic web from these data. Broader impacts of the work include training of a graduate student, and engagement of the broader community through public lectures at the Allegheny Observatory, online educational games, and existing programs for middle school students.
The research will cross correlate the new data with other cosmological observables like Baryon Acoustic Oscillations and study intrinsic galaxy shapes with filaments. The group has developed a method for identifying filaments in purely photometric data and, with this award, will develop a pre-existing prototype into a reconstruction tool. They will then apply the tool to simulated data.
2017 — 2020
Wasserman, Larry; Rinaldo, Alessandro (co-PI); Balakrishnan, Sivaraman
High-Dimensional Clustering: Theory and Methods @ Carnegie-Mellon University
The past two decades have witnessed an explosion in the scale and complexity of data sets that arise in science and engineering. Broadly, clustering methods, which discover latent structure in data, are our primary tool for navigating, exploring and visualizing massive datasets. These methods have been widely and successfully applied in phylogeny, medicine, psychiatry, archaeology and anthropology, phytosociology, economics and several other fields. Despite their ubiquity, the widespread scientific adoption of clustering methods has been hindered by the lack of flexible clustering methods for high-dimensional datasets and by the dearth of meaningful inferential guarantees in clustering problems. Accordingly, the goal of this research is to develop new and effective methods for clustering complex data-sets, and to further develop an inferential grounding -- which will in turn lead to actionable conclusions -- for these methods. This research will lead to the development of new clustering methods, as well as to a deeper understanding of the fundamental limitations of methods aimed at uncovering latent structure in data.
The research component of this project consists of four aims designed to address related aspects of this high-level goal: (a) analyze and develop new clustering methods for high-dimensional datasets, with a particular focus on practically useful methods like mixture-model based clustering and minimum volume clustering; (b) develop novel methods for inference in the context of clustering, motivated by scientific applications where it is important not only to cluster the data but also to clearly characterize the sampling variability of the discovered clusters; (c) develop fundamental lower bounds for high-dimensional clustering; (d) develop novel methods for clustering functional data with inferential guarantees. These research components are closely coupled with concrete educational initiatives, including the development and broad dissemination of publicly-available software for high-dimensional clustering; tutorials and workshops at Machine Learning conferences; and fostering further interactions between the Departments of Statistics and Machine Learning at Carnegie Mellon.
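A small, hedged example of the mixture-model clustering in aim (a): fit Gaussian mixtures of several sizes and pick the number of clusters by BIC. This is a standard baseline, not the new methodology proposed here; the data are synthetic.

```python
# Hypothetical sketch: Gaussian-mixture clustering with the number of clusters
# chosen by BIC, a standard baseline for mixture-model based clustering.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 2)) for c in (-3, 0, 3)])

bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 7)}
best_k = min(bics, key=bics.get)           # smallest BIC wins
print("selected number of clusters:", best_k)

labels = GaussianMixture(n_components=best_k, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(labels))
```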
2021 — 2024
Wasserman, Larry; Balakrishnan, Sivaraman; Neykov, Matey
Foundations of High-Dimensional and Nonparametric Hypothesis Testing @ Carnegie-Mellon University
Statistical inferential tools are the main export from the discipline of statistics to the empirical sciences, serving as the primary lens through which natural scientists interpret observations and quantify the uncertainty of their conclusions. However, in the analysis of modern large datasets the most common inferential tools available to us are fraught with pitfalls, often requiring various technical conditions to be checked before their valid application. This in turn has led to misuse of the inferential tools and subsequent misinterpretation of results. This research project will aim to address this issue by developing and analyzing new user-friendly methodologies for statistical inference in complex settings. The methods we develop will be broadly applicable to a wide variety of challenging inferential problems in the physical and biological sciences, will eliminate the need to verify technical conditions, and will ultimately be robust in their application. The principal and co-principal investigators will be involved in advising and mentoring graduate students, in curricular and course development, and in integrating the project with a research group on Statistical Methods in the Physical Sciences (STAMPS).
This project will advance our understanding of high-dimensional and non-parametric inference along three frontiers. Firstly, we aim to develop statistical inferential tools for irregular models, which are valid under weak conditions. Our particular focus will be on mixture models, and on methods which use sample-splitting to avoid strong regularity conditions. Secondly, we will show that our methods achieve these strong guarantees at a surprisingly small statistical price. To rigorously quantify the statistical price paid for avoiding strong regularity conditions we will use minimax theory. However, standard minimax theory, in many cases, does not adequately capture the difficulty of statistical inference since the difficulty of inference can vary significantly across the parameter space. A more refined theory -- called local minimax theory -- leads to a more accurate picture, and we will study our methods via this lens. Finally, we will address the problem of conditional independence (CI) testing. Despite its central role in regression diagnostics, and in the study of probabilistic graphical models, the task of CI testing and its intrinsic difficulty is poorly understood. We will address two fundamental aspects of CI testing, by studying methods to appropriately calibrate CI tests, and by developing and analyzing powerful new CI tests.
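One concrete instance of the sample-splitting idea sketched above is a split likelihood-ratio test: estimate the model on one half of the data, evaluate the likelihood ratio on the other half, and reject when it exceeds 1/alpha, which is valid by Markov's inequality without the usual regularity conditions. The Gaussian-mean toy problem below only illustrates the mechanics and is not presented as the project's finished methodology.

```python
# Hypothetical sketch of a sample-splitting (split likelihood-ratio) test of
# H0: mu = 0 for Gaussian data with unit variance.
import numpy as np

rng = np.random.default_rng(10)
x = rng.normal(loc=0.4, scale=1.0, size=200)     # hypothetical data
alpha = 0.05

half = len(x) // 2
x_fit, x_eval = x[:half], x[half:]               # split the sample

mu_hat = x_fit.mean()                            # estimate on one half
# Log likelihood ratio of mu_hat against mu = 0, evaluated on the other half.
log_T = np.sum(-0.5 * (x_eval - mu_hat) ** 2 + 0.5 * x_eval ** 2)

# Under H0, E[T] <= 1, so P(T > 1/alpha) <= alpha by Markov's inequality.
print("reject H0:", log_T > np.log(1 / alpha))
```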
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.