2004 — 2010 | Dasgupta, Sanjoy
CAREER: Algorithms For Unsupervised Learning @ University of California-San Diego
The goal of this research is to develop algorithms with rigorous performance guarantees for core machine learning tasks. Although the most common guarantee in the current literature is that of local optimality in the solution space, this project aims to use stronger performance criteria, such as quantitative bounds on the ratio by which the cost of the learned solution exceeds that of the global optimum, both to guide the development of new algorithms and to compare existing ones. This project will focus on two canonical unsupervised learning tasks: hierarchical clustering and learning the structure of directed probabilistic (Bayesian) nets. Both models are already in widespread use for analyzing massive data sets; better algorithms will increase their effectiveness and reliability, and will involve technical tools that are likely to be of broader use for other machine learning and statistical tasks. The results of this research project will be integrated into a new course that focuses on algorithmic aspects of machine learning; the resulting educational materials will be made available to the academic community.
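As a concrete illustration of the kind of guarantee described above, the sketch below runs a textbook greedy heuristic (farthest-first traversal for the k-center objective, which provably comes within a factor of 2 of the optimum) and reports the ratio between its cost and the brute-force global optimum on a toy dataset. This is a generic, assumed example of an approximation-ratio bound, not the hierarchical clustering or Bayesian network algorithms the project itself develops.

```python
# A minimal, generic sketch of an approximation-ratio guarantee: a greedy
# heuristic (farthest-first traversal for the k-center objective, provably
# within a factor of 2 of optimal) is compared against the brute-force
# global optimum on a toy dataset. Not the project's own algorithms.
import math
from itertools import combinations

points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (10, 0)]  # toy data
k = 2

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def cost(centers):
    # k-center cost: the largest distance from any point to its nearest center.
    return max(min(dist(p, c) for c in centers) for p in points)

def farthest_first(k):
    # Greedily add the point farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

greedy = cost(farthest_first(k))
optimum = min(cost(c) for c in combinations(points, k))
print(f"greedy cost {greedy:.2f}, optimum {optimum:.2f}, ratio {greedy / optimum:.2f}")
```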
2007 — 2012 | Dasgupta, Sanjoy
RI: Foundations of Active Learning @ University of California-San Diego
Proposal 0713540, "RI: Foundations of Active Learning". PI: Sanjoy Dasgupta, University of California-San Diego.
ABSTRACT
The goal of this project is to characterize several important problems in active learning from a theoretical perspective. Active learning is a kind of machine learning, a key aspect of Robust Intelligence. A central aim of machine learning is to develop techniques that construct models of data in order to help make predictions in future situations. The past decades have seen huge advances in machine learning that uses labeled data. However, labels are often difficult to obtain. Active learning addresses situations in which the data are unlabeled, and any labels must be explicitly requested and paid for. The aim of active learning is to learn a good classifier with as few labels as possible. Despite its practical importance, active learning is a comparatively underdeveloped area in machine learning.
This project will rigorously investigate the potential of intelligent querying, and develop practical, label-efficient learning algorithms. It will bring together a diversity of student talent, from theoreticians to domain experts in biology and vision applications. The resulting algorithms will be made widely available, and have the potential to increase the applicability of machine learning to the many large-scale problems in which difficulty of labeling is a critical bottleneck.
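To make the setting concrete, here is a minimal sketch of pool-based active learning with uncertainty sampling, a standard baseline strategy; it is not necessarily the querying scheme this project develops. The synthetic data and scikit-learn classifier are assumptions made purely for illustration.

```python
# Minimal sketch of pool-based active learning via uncertainty sampling
# (a standard baseline, not necessarily this project's querying scheme).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)            # labels held by an "oracle"

labeled = [int(np.argmax(y)), int(np.argmin(y))]   # tiny seed set, one per class
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(20):                                # each round buys one label
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(probs - 0.5)))]  # most uncertain point
    labeled.append(query)                          # request (pay for) its label
    pool.remove(query)

clf = LogisticRegression().fit(X[labeled], y[labeled])
print("labels used:", len(labeled), " accuracy on all data:", clf.score(X, y))
```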
2008 — 2012 | Dasgupta, Sanjoy; Freund, Yoav
RI-Small: Learning From Data of Low Intrinsic Dimension @ University of California-San Diego
This project studies machine learning from data that appears high-dimensional but, in fact, has low intrinsic dimension (e.g., the data lies on a low-dimensional manifold). Physical constraints in many applications produce exactly such a situation. The project is developing machine learning systems that use resources (e.g., computational time and space) that scale with the intrinsic rather than the extrinsic dimension. The idea of data lying on a manifold is appealing and suggestive, and has inspired much recent, exciting work in machine learning. Often the aim is to embed such data into a lower-dimensional space, after which standard methods can be applied with fewer resources. The PIs have developed a precise notion of intrinsic dimension that captures the manifold intuition while being broad enough to be both statistically sensible and empirically verifiable. This quantity is then treated as a fundamental parameter in terms of which a variety of new nonparametric methods can be assessed. The first of these is a simple variant of the k-d tree that is provably adaptive to intrinsic dimension. The PIs also consider schemes for nonparametric classification and regression, for manifold learning, and for embedding. These new algorithms and ideas will be applied to fundamental challenges in a variety of domains, including sensor networks, computer vision, protein structure prediction, and robotic control.
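For intuition, the sketch below shows a single split of a k-d-tree-like cell along a random direction rather than a coordinate axis, applied to data that lives in R^50 but has intrinsic dimension about 2. Trees built from such splits are one family of structures that can adapt to low intrinsic dimension; this toy version is an assumption for illustration, not the PIs' exact data structure.

```python
# Toy illustration: data embedded in R^50 with intrinsic dimension ~2, and a
# k-d-tree-like split along a random direction rather than a coordinate axis.
# Illustrative only; not the PIs' exact data structure.
import numpy as np

rng = np.random.default_rng(1)

# Points near a 2-D plane embedded in a 50-dimensional ambient space.
basis = rng.normal(size=(2, 50))
X = rng.normal(size=(1000, 2)) @ basis + 0.01 * rng.normal(size=(1000, 50))

def random_direction_split(points):
    """Split a cell at the median of projections onto a random unit direction."""
    direction = rng.normal(size=points.shape[1])
    direction /= np.linalg.norm(direction)
    proj = points @ direction
    cut = np.median(proj)
    return points[proj <= cut], points[proj > cut]

left, right = random_direction_split(X)
print("cell sizes after one split:", left.shape[0], right.shape[0])
```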
2009 — 2013 | Griswold, William; Krueger, Ingolf (co-PI); Dasgupta, Sanjoy; Rosing, Tajana (co-PI); Shacham, Hovav (co-PI)
CPS: Medium: CitiSense - Adaptive Services For Community-Driven Behavioral and Environmental Monitoring to Induce Change @ University of California-San Diego
This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
The objective of this research project is to achieve fundamental advances in software technology that will enable building cyber-physical systems to allow citizens to see the environmental and health impacts of their daily activities through a citizen-driven, body-worn, mobile-phone-based commodity sensing platform. The approach is to create aspect-oriented extensions to a publish-subscribe architecture, called Open Rich Services (ORS), to provide a highly extensible and adaptive infrastructure. As one example, ORS will enable highly adaptive power management that adapts not only to current device conditions but also to the nature of the data, the data's application, and the presence and status of other sensors in the area. In this way, ORS will enable additional research advances in power management, algorithms, security, and privacy during the project. A test-bed called CitiSense will be built, enabling in-the-world user and system studies for evaluating the approach and providing a glimpse of a future enhanced by cyber-physical systems.
The research in this proposal will lead to fundamental advances in modularity techniques for composable adaptive systems, adaptive power management, cryptographic methods for open systems, interaction design for the mobile context, and statistical inference under multiple sources of noise.
The scientific and engineering advances achieved through this proposal will advance our national capability to develop cyber-physical systems operating under decentralized control and severe resource constraints. The students trained under this project will become part of a new generation of researchers and practitioners prepared to advance the state of cyber-physical systems for the coming decades.
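As a rough sketch of the extensible publish-subscribe pattern described above, the toy code below routes sensor messages through pluggable interceptors that can adapt handling (here, a hypothetical battery-aware filter). All names and the filtering policy are invented for illustration; this is not the actual ORS or CitiSense design.

```python
# Toy publish-subscribe bus with pluggable interceptors, illustrating how
# cross-cutting behavior (e.g., a hypothetical battery-aware filter) can be
# layered onto message handling. Not the actual ORS/CitiSense design.
from typing import Callable, Dict, List, Optional

class Bus:
    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable[[dict], None]]] = {}
        self.interceptors: List[Callable[[str, dict], Optional[dict]]] = []

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.handlers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, msg: dict) -> None:
        for intercept in self.interceptors:      # extensible, cross-cutting hooks
            result = intercept(topic, msg)
            if result is None:                   # an interceptor may drop the message
                return
            msg = result
        for handler in self.handlers.get(topic, []):
            handler(msg)

def battery_aware(topic: str, msg: dict) -> Optional[dict]:
    # Hypothetical policy: drop low-priority readings when the battery is low.
    if msg.get("battery", 1.0) < 0.2 and msg.get("priority") == "low":
        return None
    return msg

bus = Bus()
bus.interceptors.append(battery_aware)
bus.subscribe("air_quality", lambda m: print("stored:", m))
bus.publish("air_quality", {"co_ppm": 3.1, "battery": 0.9, "priority": "low"})
bus.publish("air_quality", {"co_ppm": 3.2, "battery": 0.1, "priority": "low"})  # dropped
```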
2012 — 2017 | Dasgupta, Sanjoy; Freund, Yoav; Chaudhuri, Kamalika (co-PI)
RI: Medium: Quantifying and Utilizing Confidence in Machine Learning @ University of California-San Diego
This project defines meaningful notions of confidence in prediction, designs procedures for computing such notions, and applies these procedures to core machine learning tasks such as active learning, crowd-sourced learning, and tracking. In many applications it is helpful to have classifiers that output, together with each prediction, a rating of the confidence that the prediction is in fact correct. The existing literature either provides ad-hoc ways of computing such ratings, which typically lack a rigorous mathematical footing, or provides mathematically consistent methods (in the Bayesian framework) for computing confidence ratings under very strong assumptions that are unlikely to hold in practice. The research team investigates methods of computing measures of confidence that are mathematically rigorous while making minimal assumptions about the way the data are generated, and uses these measures to further develop solutions to core machine learning tasks.
Defining and computing mathematically sound measures of confidence lies at the heart of machine learning, pattern recognition, and uncertainty in AI. Confidence-rated prediction, active learning, and tracking are fundamental tasks of machine learning and statistics that arise repeatedly in large-scale problems; this project will develop rigorous solutions to these problems. The algorithms developed in this work are tested and used in the Automatic Cameraman project, an interactive, audio-visual installation in the UCSD Computer Science department. The interactive Automatic Cameraman system is used as an educational tool that can be extended in many different directions by teams of students at a variety of skill levels.
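For concreteness, here is a minimal sketch of confidence-rated prediction in which a classifier abstains whenever its estimated probability is close to 1/2, trading coverage for accuracy. The probability estimate and the fixed threshold are generic illustrations, not the rigorous, assumption-light confidence measures the project aims to develop.

```python
# Minimal sketch of confidence-rated prediction: abstain when the estimated
# class probability is near 1/2, trading coverage for accuracy. The estimate
# and threshold are generic; not the project's rigorous confidence measures.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)   # noisy labels

clf = LogisticRegression().fit(X[:1000], y[:1000])
probs = clf.predict_proba(X[1000:])[:, 1]
preds = (probs > 0.5).astype(int)

confident = np.abs(probs - 0.5) > 0.4       # predict only when fairly sure
coverage = confident.mean()
accuracy_all = (preds == y[1000:]).mean()
accuracy_covered = (preds[confident] == y[1000:][confident]).mean()
print(f"coverage {coverage:.2f}, accuracy overall {accuracy_all:.2f}, "
      f"accuracy when confident {accuracy_covered:.2f}")
```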
2012 — 2017 | Chan, Theodore; Patrick, Kevin; Dasgupta, Sanjoy; Griswold, William; Papakonstantinou, Yannis (co-PI)
SHB: Type II (INT): DELPHI: Data E-Platform Leveraged For Patient Empowerment and Population Health Improvement @ University of California-San Diego
In response to a healthcare crisis of epidemic proportions, thousands of software developers have been innovating new personal healthcare applications and technologies that leverage advances in medical and computing technology. Despite the endless streams of personal data that these tools process -- weight, activity, diet, heart rate, etc. -- they are relatively data poor. Left out of these applications is a comprehensive set of users' clinical electronic medical records, genomic data, comparative data with relevant subpopulations, and data on environmental influences important to health and quality of life.
There are numerous barriers to incorporating such data in applications, the dominant factors being the tremendous volume and heterogeneity of the data, much of it streaming in real time and spread across disparate stakeholder platforms. A related problem is drawing inferences from these data. With the proposed advances in databases and machine learning, we envision a new era of health and healthcare in which patients, providers, and consumers are empowered by data access and applicability, a state we characterize as personalized population health. In particular, we anticipate a new category of healthcare applications that infer one's health status - and help execute interventions - in light of one's entire life history and context.
This project is conducting fundamental and applied research in support of a platform, called DELPHI, that enables integrated access and analysis of all data relevant to health, and consequently promotes more rapid development of empowering, data-driven health apps and tools by a broad community of health-related software developers. The platform supports an integrated "whole health information model" of the individual that provides developers a single point of access that both (a) hides distribution and data heterogeneities, and (b) facilitates drawing inferences from these "noisy" data. The platform enables novel forms of analyses based on contextual and statistical metadata. Scalability is achieved through theoretically proven and newly proposed database and machine learning techniques. Our research is driven by three disparate case studies and field trials: a clinician-facing type-1 diabetes intervention, a patient and consumer-facing hypertension application, and a regional population health asthma and respiratory disease scenario.
Intellectual Merit
DELPHI is yielding fundamental advances in databases and machine learning that enable a wide community of programmers - from full-time professionals to relative novices - to program on top of a "live", streaming, population-scale medical dataset. Additionally, these techniques are being evaluated in at least three realistic field trials, yielding new insights on both the nature of computing on medical "big data" and the techniques we have proposed to make it tractable.
Broader Impact
This will be demonstrated through an ecosystem of personal well-being and population health applications, with three immediate beneficiaries: 1) the San Diego Beacon Community, a model for health information exchanges currently under development nationally; 2) governmental and non-profit agencies that serve as an example of public/private partnerships to promote community-wide health; and 3) private industry, in this case Qualcomm Life's 2net platform, where we demonstrate how to utilize existing services in novel ways to handle health data. Finally, this project will serve as a training ground in personalized population health for graduate students, postdocs, and medical residents.
2015 — 2018 | Rosing, Tajana (co-PI); Patrick, Kevin (co-PI); Dasgupta, Sanjoy; Griswold, William
CPS: TTP Option: Synergy: Collaborative Research: Calibration of Personal Air Quality Sensors in the Field - Coping With Noise and Extending Capabilities @ University of California-San Diego
All cyber-physical systems (CPS) depend on properly calibrated sensors to sense the surrounding environment. Unfortunately, the current state of the art is that calibration is often a manual and expensive operation; moreover, many types of sensors, especially economical ones, must be recalibrated frequently. This is typically costly, as calibration is performed in a lab environment and requires that sensors be removed from service. MetaSense will reduce the cost and management burden of calibrating sensors. The basic idea is that if two sensors are co-located, they should report similar values; if they do not, the least-recently-calibrated sensor is suspect. Building on this idea, this project will provide an autonomous system and a set of algorithms that will automate the detection of calibration issues and perform recalibration of sensors in the field, removing the need to take sensors offline and send them to a laboratory for calibration. The outcome of this project will transform the way sensors are engineered and deployed, increasing the scale of sensor network deployments. This in turn will increase the availability of environmental data for research, medical, personal, and business use. MetaSense researchers will leverage this new data to provide early warning of factors that could negatively affect health. In addition, graduate student engagement in the research will help to maintain the STEM pipeline.
This project will leverage large networks of mobile sensors connected to the cloud. The cloud will enable using large data repositories and computational power to cross-reference data from different sensors and detect loss of calibration. The theory of calibration will go beyond classical models for computation and physics of CPS. The project will combine big data, machine learning, and analysis of the physics of sensors to calculate two factors that will be used in the calibration. First, MetaSense researchers will identify measurement transformations that, applied in software after the data collection, will generate calibrated results. Second, the researchers will compute the input for an on-board signal-conditioning circuit that will enable improving the sensitivity of the physical measurement.
The project will contribute research results in multiple disciplines. In the field of software engineering, the project will contribute a new theory of service reconfiguration that will support new architecture and workflow languages. New technologies are needed because the recalibration will happen when the machine learning algorithms discover calibration errors, after the data has already been collected and processed. These technologies will support modifying not only the raw data in the database by applying new calibration corrections, but also the results of calculations that used the data.
In the field of machine learning, the project will provide new algorithms for dealing with spatiotemporal maps of noisy sensor readings. In particular, the algorithms will work with Gaussian processes, and the results of the research will provide more meaningful confidence intervals for these processes, substantially increasing the effectiveness of MetaSense models compared to the current state of the art.
In the field of pervasive computing, the project will build on the existing techniques for context-aware sensing to increase the amount of information available to the machine learning algorithms for inferring calibration parameters. Adding information about the sensing context is paramount to achieve correct calibration results. For example, a sensor that measures air pollution inside a car on a highway will get very different readings if the car window is open or closed.
Finally, the project will contribute innovations in sensor calibration hardware. Here, the project will contribute innovative signal-conditioning circuits that will interact with the cloud system and receive remote calibration parameters identified by the machine learning algorithms. This will be a substantial advance over current circuits based on simple feedback loops because it will have to account for the cloud and machine learning algorithms in the loop and will have to perform this more complex calibration under power and bandwidth constraints. Inclusion of graduate students in the research helps to maintain the STEM pipeline.
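The sketch below illustrates the co-location idea in software terms: a suspect sensor is compared with a recently calibrated reference over the same period, drift is flagged when they disagree, and a simple least-squares correction is fit and applied after the fact. The drift model, threshold, and linear correction are illustrative assumptions, not MetaSense's actual calibration algorithms.

```python
# Minimal sketch of co-location-based recalibration: compare a suspect
# sensor with a recently calibrated reference over the same period, flag
# drift if they disagree, and fit a simple linear correction in software.
# The drift model and threshold are illustrative, not MetaSense's algorithms.
import numpy as np

rng = np.random.default_rng(0)
truth = 30 + 10 * np.sin(np.linspace(0, 6, 200))        # true pollutant level
reference = truth + rng.normal(0, 0.5, 200)             # calibrated sensor
suspect = 1.3 * truth - 4.0 + rng.normal(0, 0.5, 200)   # drifted gain/offset

disagreement = np.mean(np.abs(suspect - reference))
if disagreement > 2.0:                                   # calibration is suspect
    # Least-squares fit of reference ~ a * suspect + b gives the correction.
    a, b = np.polyfit(suspect, reference, 1)
    corrected = a * suspect + b
    print(f"drift detected (mean gap {disagreement:.1f}); "
          f"correction: x -> {a:.2f}*x + {b:.2f}, "
          f"residual {np.mean(np.abs(corrected - reference)):.2f}")
```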
2022 — 2027 | Dasgupta, Sanjoy; Wang, Yusu (co-PI); Chaudhuri, Kamalika (co-PI); Mazumdar, Arya (co-PI); Saha, Barna
Collaborative Research: EnCORE: Institute For Emerging CORE Methods in Data Science @ University of California-San Diego
The proliferation of data-driven decision making, and its increased popularity, has fueled the rapid emergence of data science as a new scientific discipline. Data science is seen as a key enabler of future businesses, technologies, and healthcare that can transform all aspects of socioeconomic life. Its fast adoption, however, often comes with ad hoc implementation of techniques with suboptimal, and sometimes unfair and potentially harmful, results. The time is ripe to develop principled approaches that lay solid foundations for data science. This is particularly challenging, as real-world data is highly complex, with intricate structures, unprecedented scale, rapidly evolving characteristics, noise, and implicit biases. Addressing these challenges requires a concerted effort across multiple scientific disciplines: statistics, for robust decision making under uncertainty; mathematics and electrical engineering, for enabling data-driven optimization beyond the worst case; theoretical computer science and machine learning, for new algorithmic paradigms to deal with dynamic and sensitive data in an ethical way; and the basic sciences, to bring the technical developments to the forefront of health sciences and society. The proposed institute for emerging CORE methods in data science (EnCORE) brings together a diverse team of researchers spanning the aforementioned disciplines from the University of California San Diego, the University of Texas at Austin, the University of Pennsylvania, and the University of California Los Angeles. It presents an ambitious vision to transform the landscape of the four CORE pillars of data science: C for complexities of data, O for optimization, R for responsible learning, and E for education and engagement. Along with its transformative research vision, the institute fosters a bold plan for outreach and broadening participation by engaging students of diverse backgrounds at all levels, from K-12 to postdocs and junior faculty. The project aims to reach a wide demography of students by offering collaborative courses across its partner universities and a flexible co-mentorship plan for truly multidisciplinary research. With regular workshops, summer schools, and seminars, the project aims to engage the entire scientific community and become a new nexus of research and education on the foundations of data science. To bring the fruits of theoretical development to practice, EnCORE will work continuously with industry partners and domain scientists, and will forge strong connections with other National Science Foundation Harnessing the Data Revolution institutes across the nation.
EnCORE as an institute embodies intellectual merit that has the potential to lead ground-breaking research shaping the foundations of data science in the United States. Its research mission is organized around three themes. The first theme, on data complexity, addresses the complex characteristics of data such as massive size, huge feature spaces, rapid change, variety of sources, implicit dependence structures, arbitrary outliers, and noise. A major overhaul of the core concepts of algorithm design is needed, with a holistic view of different computational complexity measures. Faced with noise and outliers, uncertainty estimation is both necessary and, at the same time, difficult due to dynamic and changing data. Data heterogeneity poses major challenges even in basic classification tasks. The structural relationships hidden inside such data are crucial to understanding and processing it, and to downstream data analysis tasks such as visualization and neuroscience. The second theme of EnCORE aims to transform the classical area of optimization, where adaptive methods and human intervention can lead to major advances. It plans to revisit the foundations of distributed optimization to include heterogeneity, robustness, safety, and communication, and to address statistical uncertainty due to distributional shift in dynamic data in control and reinforcement learning. The third and final theme of EnCORE proposes to build the foundations of responsible learning. Applications of machine learning in human-facing systems are severely hampered when the learned models are hard for users to understand and reproduce, may give biased outcomes, are easily changed by an adversary, or reveal sensitive information. Thus, interpretability, reproducibility, fairness, privacy, and robustness must be incorporated in any data-driven decision making. The team's experience and dedication to mentoring and outreach, collaborative curriculum design, a socially aware and responsible research program, extensive institute activities, and industrial partnerships pave the way for substantial broader impact. Summer schools with year-long mentoring will take place in three states, involving a large demography. Joint courses with hybrid and fully online offerings will be developed. Utilizing prior experience of running the Thinkabit Lab, which has impacted over 74,000 K-12 students so far, EnCORE will embark on an ambitious and thoughtful outreach program to improve the representation of under-represented groups and help create a future generation of workforce that is diverse, responsible, and has solid foundations in data science.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2022 — 2025 | Dasgupta, Sanjoy; Rosing, Tajana (co-PI)
Collaborative Research: IIS: RI: Medium: Lifelong Learning With Hyperdimensional Computing @ University of California-San Diego
The use of artificial intelligence (AI) has enabled computers to solve some problems that were out of reach just a decade ago, such as recognizing familiar objects in images, or translating between languages with reasonable accuracy. In each case, a specific task (such as "translate spoken Mandarin into spoken Spanish") is defined, data is collected (consisting, say, of utterances in the two languages), and an AI system is trained to achieve this functionality. To further expand the scope of AI, it is important to build systems that are not just geared towards highly-specific and static predefined tasks, but are able to take on new tasks as they arise (new words, new accents, and new dialects, for instance). This is often called "lifelong learning", and it means, basically, that the systems are adaptive to change. This project develops an approach to lifelong learning using a brain-inspired framework for distributed computing, yielding machines that potentially can solve tasks more flexibly and consume significantly less power than traditional AI systems. It will: (1) advance the ability of AI systems to handle changing environments, (2) enable a host of new low-power AI systems with applications such as environmental sensing, (3) strengthen mathematical connections between computer science and neuroscience, and (4) serve as the basis for educational and outreach activities.
This project will develop lifelong learning within the framework of "hyperdimensional computing", a neurally-inspired model of computation in which information is encoded using randomized distributed high-dimensional representations, often with limited precision (e.g., with binary components), and processing consists of a few elementary operations such as vector summation. We will build HD algorithms for some fundamental statistical primitives -- similarity search, density estimation, and clustering -- and then use these as building blocks for various forms of lifelong learning. These will rest on mathematical advances in (1) the analysis of sparse codes produced by expansive random maps and (2) algorithmic exploitation of kernel properties of high-dimensional randomized representations. Our algorithms will be implemented in hardware, deployed on a network of low-power sensors, and evaluated experimentally in a lifelong learning task involving air quality sensing.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
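As a concrete, deliberately simplified sketch of the hyperdimensional framework described above, the code below encodes items as random high-dimensional sign vectors, bundles a class's examples by elementwise summation, and classifies a query by similarity to the bundled prototypes. The encoding and toy data are assumptions for illustration; the project's actual lifelong-learning algorithms are not shown.

```python
# Minimal sketch of hyperdimensional (HD) computing: encode with random
# high-dimensional +/-1 vectors, bundle by summation, classify by similarity
# to class prototypes. Shows the general framework only, not the project's
# lifelong-learning algorithms.
import numpy as np

rng = np.random.default_rng(0)
D = 10_000                                    # hypervector dimensionality
n_features = 16
projection = rng.choice([-1.0, 1.0], size=(D, n_features))

def encode(features: np.ndarray) -> np.ndarray:
    # Randomized distributed representation: sign of a random projection.
    return np.sign(projection @ features)

# Two toy classes drawn from different means.
class_means = [rng.normal(0, 1, n_features), rng.normal(2, 1, n_features)]
prototypes = []
for mean in class_means:
    examples = mean + rng.normal(0, 0.3, size=(50, n_features))
    # Bundling: elementwise summation of the encoded examples.
    prototypes.append(sum(encode(x) for x in examples))

query = class_means[1] + rng.normal(0, 0.3, n_features)
q = encode(query)
scores = [q @ p for p in prototypes]          # similarity search via dot product
print("predicted class:", int(np.argmax(scores)))
```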