2002 — 2008 |
Sahai, Amit (co-PI) [⬀] Charikar, Moses Arora, Sanjeev [⬀] |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Itr: New Directions in Clustering and Learning
New directions in clustering and learning
Faced with ever-larger amounts of data, researchers, government institutions, corporations and even the general public seek tools that help them deal with large bodies of information, identify patterns in it, learn what these patterns mean, and act upon that information in a timely fashion. Developing such tools involves a novel and interesting blend of algorithms, statistics, AI, and machine learning. The project assembles a team of experts (four from academia and two from industry) in these areas to attack an interesting and meaningful subset of such problems which have the general flavor of clustering or learning.
The defining philosophy of this proposal is that no clear boundary Separates the twin notions of clustering and learning. Clustering is usually driven by the end goal of learning, but can also be viewed as a learning task in itself since it results in a more compact description of the data. By the same token all learning involves clustering of some sort, and in fact this viewpoint is implicit in recent papers in the learning literature. The project takes an integrated view of the entire problem of learning patterns in data, starting from streaming computations that might produce representative sketches of the data as it streams by, to problems of clustering data into meaninful patterns (with attendant problems of outlier removal, multiobjective optimization etc.), to learning algorithms that fit sophisticated models (SVMs, bayesian nets, gaussian mixtures etc.) for inference and reasoning tasks.
The investigators believe that all these disparate algorithmic efforts have unifying ideas. Furthermore, their synergistic approach throws up several interesting ideas of its own that could lead to significant advances. Examples: include using coding theoretic ideas in disparate applications such as Multiclass learning (a broad class of learning problems including text and speech categorization, part-of-speech tagging, gesture recognition etc.) and shape recognition in vision; the use of clustering ideas to do dimension reduction (offering an alternative to popular SVD based approaches), and using ideas from approximation algorithms and clustering to do near-optimal model fitting for models such as bayesian nets.
The project also includes a management and educational plan that involves dissemination of the ideas of this research through development of new courses and also pieces of learning software that will be placed in the public domain. Algorithms developed in as part of this project will be tested on large datasets, including those obtained from Google Inc. Some algorithmic ideas will also be implemented in industry (including Google).
|
0.954 |
2003 — 2009 |
Charikar, Moses |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Career: Approximation Algorithms - New Directions and Techniques
The theory of NP Completeness has shown that several naturally occurring problems are unlikely to have polynomial time algorithms. One approach to overcome this fundamental intractability for optimization problems has been to shift the focus from exact solutions to obtaining approximate solutions. The study of approximation algorithms has thus emerged as a rich and exciting field and recent advances have led to good approximation algorithms for several fundamental optimization problems. Despite these developments, several interesting and difficult open problems remain. A primary focus of this career development plan is studying the approximability of fundamental problems and attempting to close such gaps in our understanding.
The broad research goals of this career development plan are the following:
1. Developing new tools to devise approximation algorithms for problems on directed graphs. 2. Developing new techniques for metric approximations and embeddings as building blocks for approximation. 3. Investigating a systematic way to obtain and exploit strengthened SDPs as well as the use of strengthened SDPs for graph partitioning minimization problems and ordering problems. 4. Devise techniques to deal with scheduling problems with precedence constraints. 5. Extend the machinery of approximation algorithms to new settings such as information theoretic and algebraic problems.
The research program will involve students at all levels, from undergraduate projects on understanding the quality of LP and SDP relaxations through computational experiments, to the mathematical research suitable for Ph.D. students. The educational component of this project includes the development of courses geared towards disseminating new algorithmic ideas outside the theoretical computer science community; the techniques developed in the latest research will be distilled into new graduate and undergraduate courses. Course materials developed for such new courses will be made freely available to enable similar courses to be taught elsewhere.
|
0.954 |
2003 — 2004 |
Charikar, Moses |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Finite Metric Spaces and Their Applications |
0.954 |
2003 — 2007 |
Charikar, Moses Barak, Boaz [⬀] Yao, Andrew (co-PI) [⬀] |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Foundations of Complexity Theory
Complexity theory addresses the question of how much resource is needed to solve a given computational problem. In recent years complexity theory has found connections and applications in many scientific and engineering disciplines. This project plans to investigate complexity theory across a broad front. The topics include communication complexity, decision trees, quantum algorithms, quantum cryptography, and other foundational issues in complexity theory. Te goal is to gain deep insights into the nature of computation, so that one can most effectively perform computational tasks in any model. It is expected that a variety of algebraic and combinatorial techniques will be called upon to explore these topics.
Two of the most interesting scientific developments in recent years spring from the realization that complexity theory can be utilized for cryptography, and that quantum mechanics can be used for performing powerful computation. Advances in these fronts have attracted interest from scientific communities at large, as their relevance is far-reaching. The research of this project could lead to substantial further progress in these two areas.
In the broader domain of public education, the Principal Investigator has given many speeches in recent years on various topics on complexity, cryptography and quantum computing. Many of these talks are geared toward general audiences and often undergraduate students. These lectures, if thoughtfully designed, can stimulate the intellectual curiosity of young minds and develop their interest in the computing sciences. With further advances of research in these areas, one can expect that public awareness and interest will be further enhanced to the good of a stronger scientific education.
|
0.954 |
2004 — 2006 |
Charikar, Moses Schapire, Robert (co-PI) [⬀] Osherson, Daniel (co-PI) [⬀] Fellbaum, Christiane [⬀] |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Constructing An Enhanced Version of Wordnet
WordNet is an important lexical resource for research in areas including NLP and AI. This project initiates the development of a radically enhanced version of WordNet. Constructing WordNet+ involves a novel combination of empirical methods: human annotation, corpus analysis, and machine learning. WordNet+ specifically addresses some of WordNet's limited ability to identify word senses, stemming from the sparsity of Boolean arcs among sets of synonymous words ("synsets"). First, quantified, oriented arcs are to be added among a core set of 5,000 synsets. These arcs reflect evocation--the extent to which the meaning of one synset brings to mind another. Following the selection of the core synsets, a random subset of 250,000 arcs are to be elicited from annotators. The annotators, trained and tested for inter- and intra-reliability, record the strength of their mental associations using a specially designed and tested interface. The remaining arcs are to be extrapolated from the manually obtained arcs using machine learning algorithms.
All results will be made available to the research community: the core concepts, the indirect co-occurrence matrices, and all available ratings. Given WordNet's past contributions to a number of diverse disciplines, the initial stages of the construction of this research tool should stimulate great interest and have a significant impact on related work.
|
0.954 |
2004 — 2009 |
Charikar, Moses Barak, Boaz (co-PI) [⬀] Yao, Andrew (co-PI) [⬀] |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Itr Collaborative Research: Ase-Dmc Computational Complexity of Interactive Computation
In the rapidly growing world of Internet infrastructures, we face many challenging new problems. These arise in part because the usual assumptions made in problems of this general type may no longer hold. For example, many typical questions dealing with massive data sets often involve networks or graphs of prohibitively large sizes. Only partial information can be obtained and in addition, this information is changing dynamically. There is an increasing need to develop the theoretical foundation for these myriad complex processes. In particular, there are many unresolved fundamental issues regarding the computational and informational-theoretic complexity of interactive computing, both in the classical setting as well as other emerging computational paradigms. In this proposal, we will investigate several inter-related areas : - Major open problems in communication complexity. - Two information-theoretic identification problems Guessing secrets and Finding favorites. - Two directions in quantum information processing quantum decision tree model and quantum communication complexity. - Using techniques in the study of the so-called "power law model" for realistic networks to develop new methods in the analysis of on-line algorithms. Impact Interactive computing is prevalent in almost all areas of computing and communcations with applications in numerous areas of science and engineering, such as security, finance, information retrieval, bioinformatics and beyond. However, the current state of the theoretical foundation for interactive computation is quite primitive and far from satisfactory. The proposed study on the computational complexity of interactive computation is meant to strengthen our understanding and provide insight that is crucial for the design and analysis of interactive algorithms. Because of the fundamental and far-reaching nature of the proposed work, this study will help bring together different areas and crossfertilization typically occur.
|
0.954 |
2005 — 2010 |
Charikar, Moses Arora, Sanjeev [⬀] |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Collaborative Research: Mspa-McS: Embeddings of Finite Metric Spaces - a Geometric Approach to Efficient Algorithms
Geometry has become a central notion in algorithm design, in fields as diverse as bioinformatics and graph partitioning. This research is a concerted and unified attack on a large subset of the underlying mathematical problems, which often have to do with geometric embeddings of finite metric spaces. The concrete applications range from clustering and learning to compact representation of data to graph partitioning to nearest neighbor searching. Since the research spans a a variety of fields, the assembled team is multidisciplinary, involving analysts (Johnson and Naor), a geometer (Gromov), a discrete mathematician and combinatorialist (Linial) and algorithm designers (Arora and Charikar).
The research area emerging from the ongoing geometrization of algorithms is an exciting new frontier for both mathematics and computer science. For example, deep mathematical results such as Lipschitz extension may turn out to have applications to the practical problem of compactly representing computer sounds. In turn, algorithmic settings provide a fertile new ground for mathematical theory. The investigators study geometric representations for data and low disortion mappings into structured spaces. Metrics that arise in the design of approximation algorithms for NP-hard problems are studied, especially to understand their local versus global properties. The research develops new understanding for practically important metrics such as earth mover and edit distance metrics, which are defined in terms of computational effort and have thus not been studied in mathematics.
|
0.954 |
2005 — 2009 |
Charikar, Moses Li, Kai [⬀] Cook, Perry (co-PI) [⬀] Troyanskaya, Olga (co-PI) [⬀] |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Csr-Pdos-Content-Searchable Storage For Feature-Rich Data
Storage capacity and data volume have been doubling every 18 months during the past two decades. A key challenging issue in building next-generation storage systems is to manage massive amounts of feature-rich (non-text) data, which has dominated the increasing volume of digital information. Comparing noisy, feature-rich data requires fast similarity match instead of exact match, and thus exploring such data requires similarity search instead of exact search. Current file systems are designed for named text files; they do not have mechanisms to manage feature-rich data. To date, there is no practical storage system with the ability to do similarity search for noisy, high-dimensional data and there is no index engine design for efficient similarity search. This research addresses this problem by studying how to design and implement a content-addressable and -searchable storage (CASS) system to manage and explore diverse feature-rich data. The system includes a built-in similarity search engine for general-purpose, noisy, highdimensional metadata using compact data structures and novel indexing methods. The research will also develop segmentation methods and feature extraction methods for audio, image and genomic data, and develop similarity search benchmarks and to evaluate the CASS system.
This research will advance knowledge and understanding in the area of storage system designs such as data structures, mechanisms, and APIs for managing, searching and exploring noisy, high-dimensional feature-rich data. The research will accelerate the development of next-generation storage systems which will revolutionize how to access, search, explore and manage massive amounts of feature-rich data in many disciplines.
|
0.954 |
2008 — 2014 |
Charikar, Moses Barak, Boaz (co-PI) [⬀] Tarjan, Robert (co-PI) [⬀] Arora, Sanjeev [⬀] Chazelle, Bernard (co-PI) [⬀] |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Collaborative Research: Understanding, Coping With, and Benefiting From Intractibility.
Expeditions in Computing: Understanding, Coping with, and Benefiting from Intractability Computational intractability imposes a limit on the ability to understand nature and design systems. Intractability is a stumbling block for postmen, travelling salesmen, map colorers, and millions of others who would like to complete their tasks as efficiently as possible, yet it is the foundation of 21st century cryptography, which in turn is a pillar of electronic commerce. In order to understand, manage, and exploit intractability it is imperative that progress be made on proving intractability in many forms and computational models, and on unraveling the interconnections among different forms and uses of intractability. This Expedition will explore an array of diverse but interrelated topics in computational intractability including algorithms, complexity, cryptography, analysis, geometry, combinatorics, and quantum mechanics. A "Center for Intractability," the first of its kind, will be based at Princeton. Addressing some of the deepest and hardest theoretical problems standing in the way of significant advancements in computer science, this Expedition involves a high degree of collegial interactivity through collaborations among geographically local participating institutions. Outreach includes an active "Women in Theory" program as well as programs targeting undergraduate and high-school students.
|
0.954 |
2009 — 2013 |
Charikar, Moses |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Af: Small: Mathematical Programming Methods in Approximation
Mathematical programming techniques like linear programming and semidefinite programming have firmly established themselves as valuable tools in the approximation algorithm design toolkit. They are often used as tractable relaxations to hard combinatorial optimization problems and as design guides to obtain approximation algorithms. This project attempts to enhance our understanding of the strengths and weaknesses of mathematical programming techniques for several fundamental optimization problems and proposes to investigate general methods to strengthen our currently known mathematical programming techniques.
The broad goals of this project are the following: (a) Attempt to devise better algorithms for unique games ? a constraint satisfaction problem that is known to capture the limitations of current semidefinite programming methods used in approximation algorithms. Beating the current best algorithms will necessarily involve developing new techniques that overcome the limitations of current SDP approaches. (b) Understand the structure of strengthened relaxations obtained by lift-and-project procedures and develop techniques to exploit the additional information provided by these stronger relaxations to obtain better approximation algorithms. (c) Work towards closing large gaps in our understanding of the approximability of fundamental optimization problems.
Successfully achieving the project goals will entail significant advances in the state of the art for approximation algorithms. The research could potentially develop tools and techniques with broad applicability to several optimization problems. Course materials for graduate and undergraduate courses will be developed distilling research results of this project, as well as new developments in the field.
|
0.954 |
2012 — 2016 |
Charikar, Moses |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Af: Small: Approximation Techniques For Combinatorial Optimization
Approximation techniques are valuable in the design of algorithms for combinatorial optimization problems. Mathematical programming relaxations provide tractable versions of hard optimization problems that are useful in the design of good algorithms -- in some cases, these relaxations and their duals serve as a guide for the design of algorithms and lower bound proofs. Such approximation techniques are useful not just in traditional optimization problems, but also for problems in online algorithms and other areas. This project proposes to study a variety of problems where approximation techniques play a crucial role -- many of these questions are about basic problems in approximation algorithms, but many are about questions where insights from mathematical relaxations and other approximation techniques play an important role. The broad goals of this project include (a) Obtaining a better understanding of the use of lift-and-project relaxations for optimization problems like coloring and the closely related question of developing algorithmic techniques for graphs of low threshold rank, (b) Attempting to close gaps in our understanding of classical optimization problems like the traveling salesman problem and bin packing, and (c) Shedding light on newer problems like online versions of weighted matching and new flow formulations.
Optimization problems are ubiquitous and for many such problems of interest, we have strong evidence that it is impossible to obtain exact efficient solutions. To circumvent this intractability, we design efficient heuristics that may not find the best solution necessarily, but have guarantees that the solution they produce is not far from the optimal (i.e. is approximately optimal). Mathematical programming is a very important tool in designing such approximation algorithms. Successfully achieving the project goals will require advances in our knowledge of this area, and especially new insights into the powerful and versatile mathematical programming toolkit. As part of this project, graduate and undergraduate students will be trained by involving them in these research activities. Course materials for graduate and undergraduate courses will be developed distilling research results of this project, as well as new developments in the field.
|
1 |
2013 — 2017 |
Charikar, Moses Arora, Sanjeev [⬀] |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Af: Medium: Towards Provable Bounds For Machine Learning
Many areas of machine learning (ML) currently rely on heuristic algorithms with no performance guarantees, and, in fact, the underlying problems are computationally intractable. The proposed project will lead to new machine learning algorithms with provable guarantees. The PIs will leverage theoretical ideas from average case analysis, semi-random models, approximation, solution stability, and approximate linear algebra, and develop new tools and techniques. Benefits from this endeavor will occur across fields. Theoretical computer science will benefit by gaining new problems and research agendas, and further development of algorithmic and mathematical tools. Machine learning will benefit by gaining new models (hopefully, more tractable than the ones currently in use), new modes of thinking, and new algorithms. The task of proving bounds on algorithms will provide insight into when they do or do not work, as well as suggest modifications to existing heuristics.
The project will involve training a new generation of graduate students and postdocs who will be fluent both in theoretical algorithms and machine learning. The PIs gained experience in delivering this kind of training and mentoring during the past couple of years and will continue this, including working with the undergraduate students. The PIs will disseminate the ideas of this project by lecturing, teaching, and creating new course materials, including videos. One specialized workshop will also be organized, devoted to the topics of study. Open-source software implementation will be released based upon any new learning algorithms that are discovered.
|
0.954 |
2014 — 2015 |
Charikar, Moses |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Funding Application For the Fourth Biennial Women-in-Theory Workshop (Wit)
The PIs are organizing the 4th Biennial Women-in-Theory Workshop. This workshop is to take place in New York City in May 2014 immediately before the 46th ACM Symposium on Theory of Computing (STOC 2014), in coordination with New York University and Princeton University. The intended audience of the workshop is graduate students in the field of theoretical computer science (TCS). Motivated by the low overall number of female faculty members, researchers, and students in TCS world wide, the workshop has two goals. The first is to deliver an invigorating educational program to the student participants by inviting leading female researchers to present tutorials of their research topics. The second is to provide an outstanding opportunity to bring together women students from different institutions across the country and internationally, so as to foster a sense of kinship and camaraderie, and to provide access to role models in this area by having senior and junior faculty members and industrial researchers present. The format of this workshop consists of more than ten technical tutorials, one non-technical talk, a student rump session, and a panel discussion.
The first three Women-in-Theory workshops were greeted with great enthusiasm both from the participants and from the TCS research community at large. Response has been strong for this year's workshop from female students at universities across the United States. Since the first workshop in 2008, the PIs have received overwhelmingly positive feedback and many participants have matured to be successful researchers. These serve as testimonials that the workshop is accomplishing its primary goal of helping women become more successful in TCS.
|
0.954 |
2016 — 2019 |
Charikar, Moses |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Af: Small: New Perspectives On Mathematical Programming Relaxations
The project will study methods for designing efficient heuristics with provable guarantees for hard optimization problems. Optimization problems arise in many different contexts (e.g. finding the best routes for delivery trucks, finding clusters in a social network, etc). We now know that for many such problems, we do not expect to design algorithms that are efficient (i.e. run in polynomial time) that produce optimal solutions. The field of approximation algorithms develops principled methods to design algorithms for such hard optimization problems. This project will advance our knowledge of mathematical programming relaxations ? a widely used tool in the design of such approximation algorithms. The idea here is as follows: First, express the problem in terms of integer variables with constraints on them. Next relax the problem by allowing variables to take on fractional values. The relaxed version can be solved exactly in polynomial time. Finally, map solutions to the fractional version back to integers, while attempting to maintain the quality of the solution as far as possible. The project will attempt to advance our understanding of the power of this technique and its limitations for fundamental problems in the field. In particular, the PI will investigate structure in problem instances where exact solutions to the original problem can be obtained from solving the relaxation, study new relaxations for basic problems such as k-means clustering. The PI will also study relaxations for problems where the goal is find a dense structure inside a graph and problems involving laying out the vertices of a graph on a line.
Course materials for graduate and undergraduate courses will be developed distilling research results from this project, as well as new developments in the field. The exact recovery questions investigated in this project for mathematical programming relaxations are of great interest in communities other than theoretical computer science; new insights here have the potential to influence and impact these other communities. Undergraduates will be involved in appropriately chosen pieces of the project so as to expose them to research at an early stage. The PI will encourage the participation of women and minorities in this research. He will organize outreach and community building activities at Stanford, leveraging his experience with organizing such activities in the past.
This project will study and shed new light on mathematical programming relaxations for hard combinatorial optimization problems in various settings. The broad goals of this project are:
1. Exact recovery via mathematical programming: Investigate and understand the phenomenon of mathematical programming relaxations returning integer solutions. The PI will apply this new lens to linear programming (LP) and semidefinite programming (SDP) relaxations for several problems. Specifically, the PI will study SDP relaxations for computing transformation invariant metrics on sets of points and solving approximate graph isomorphism for correlated random graph models (introduced recently in the study of de-anonymization and privacy in social networks).
2. Clustering problems: Investigate a recently introduced SDP relaxation for the k-means objective to obtain a better worst case approximation factor. The PI will also investigate the performance of this relaxation on random models of clustered points that better represent instances in practice.
3. Densest subgraph: Investigate hypergraph generalizations of the k-densest-subgraph problem and a close variant, the smallest-m-edge-subgraph, to understand the best approximation factor achievable and whether natural planted instances of random hypergraphs suggest a natural limit for efficient approximation algorithms. The PI will also study lower bounds for densest subgraph, to understand the Lasserre hierarchy for planted instances that are predicted to be the hardest for the problem.
4. Layout problems: The PI will study two different classical optimization problems involving mapping the vertices of a (un)directed graph to a line: minimizing (a) Cutwidth, and (b) Feedback Arc Set, to improve the worst case approximation factors. For cutwidth, the PI will investigate a new mathematical programming relaxation motivated by lift-and-project hierarchies. For Feedback Arc Set, the PI will study a new semidefinite programming relaxation.
5. Constructive Discrepancy: Recent advances in designing efficient algorithms for minimizing discrepancy suggest that new breakthroughs are possible. A recent result builds on a breakthrough on the Kadison-Singer problem to show that the integrality gap of a natural relaxation for Asymmetric Traveling Salesman is small. The PI will investigate whether this existential result can be made algorithmic. The PI will also study the Beck-Fiala and Komlos conjectures, where efficient algorithms can lead to new progress.
|
1 |