1993 — 1997 
Khuller, Samir 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Ria: Efficient Algorithms For Computational Graph Theory @ University of Maryland College Park
Computational graph theory has emerged as a powerful medium for the development of algorithmic techniques (e.g., for dynamic algorithms, online algorithms, parallel algorithms, approximation algorithms, data structure design, etc.). Graph theory provides an excellent way to model problems related to transportation, network design, optimization, VLSI and many other fields. This allows one to both develop and illustrate the power of different algorithmic tools. This project studies two aspects of efficient algorithm design. The first aspect is the study of intractable problems (many optimization problems fall under this category), and the design of efficient algorithms to obtain approximate solutions. The second aspect is the study of problems that are known to be solvable exactly in polynomial time, but there is a wide gap between the known upper and lower bounds. The specific problems studied are problems that are becoming important in the design of efficient and reliable communication networks. They address issues that are relevant for higher reliability of communication at low cost, and are of interest in network design and in the implementation of broadcast protocols. Some of the tools and ideas that are used in the solutions for these problems will have applications to solving other problems as well.

1 
1994 — 1998 
Smith, Carl (coPI) [⬀] Gasarch, William (coPI) [⬀] Khuller, Samir 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Capitol Area Theory Seminar: College Park, Md: Fall of 1994 @ University of Maryland College Park
9401842 Khuller Over recent years, the Capital Area Theory Seminar has been particularly active, hosting a talk every other week. Some of the speakers for the seminar are local, and others are visitors to the DC area. The mailing list has 107 entries, some of which are other mailing lists. The goal of the seminar is to stimulate activity in theoretical computer science. In this regard, the seminar has been quite successful. The seminar is often attended by researchers from Univ. of Maryland at Baltimore County, Johns Hopkins Univ. and Georgetown University. ***

1 
1995 — 1999 
Khuller, Samir 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Career: Approximation Algorithms For GraphTheoretic Problems @ University of Maryland College Park
Graphs are powerful tools that can be used to model objects and relationships between objects. Indeed, many distinct problems can be cast as optimization problems in graph theoretic terms. Specifically, graph theory provides an excellent way to model problems related to transportation, network design, scheduling, robotics and many other areas. It is used to develop and illustrate the power of different algorithmic tools. There are three broad areas from which the particular problems considered in this proposal are abstracted: (1) Network Design: How can a network that connects a given set of sites be built? How can the network be made reliable, and also provide high bandwidth between sites (while keeping the cost low)? Such fundamental questions are addressed here. (2) Motion Planning: How does an agent navigate in an environment? What are good navigation strategies for navigating in unknown environments? In this case the agent discovers the environment by actually touching obstacles. How does an agent record an approximate map of the environment, when working with limited memory? (3) Scheduling: Precedence based scheduling problems arise in the context of manufacturing and compilation for parallel machines. How can effective heuristics for such problems be designed? For most of these problems it is very difficult to obtain even an approximate solution in polynomial time. Education Plan: Most computer scientists are constantly faced with the task of designing algorithms to solve problems. Although these arise in many different contexts, some basic principles underlie the design of efficient algorithms and data structures. As a part of the Education Program of this CAREER award, the PI is planning: (a) to undertake the training of students in designing algorithms, and (b) to teach them the importance of reexamining existing algorithms so as to make them simpler and more efficient. An algorithms lab where students undertake projects, implement algorithms, and act as consultants to students in other research groups is a significant step in this direction.

1 
1998 — 2000 
Smith, Carl (coPI) [⬀] Gasarch, William (coPI) [⬀] Khuller, Samir 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Capital Area Theory Seminar @ University of Maryland College Park
Basic research in theoretical computer science is studied. Fundamental questions, about the analysis of algorithms and complexity theory are researched. This seminar series focuses on inviting distinguished outside speakers who are able to give an overview of a research area to first and second year graduate students. It is hoped that the interaction between graduate students, researchers in the DC metropolitan area, and outside visitors will be very productive and useful in stimulating new research and facilitating new collaborations. Students will also get an opportunity to present their work to the faculty. A special effort will be made to involve honors undergraduate students in this endeavor.

1 
1999 — 2003 
Khuller, Samir 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Designing Algorithms For NpHard Graph Problems @ University of Maryland College Park
CCR9820965 Khuller, Samir
This project will conduct a combination of theoretical and practical explorations of algorithms for graph theoretic problems in graph connectivity, facility location, distance preserving subgraphs, multimedia storage and data broadcast. Emphasis will be on good approximations for NPhard problems for problems in optimization. Particular emphasis will be on greedy methods and heuristics for algorithms.

1 
2001 — 2005 
Golubchik, Leana (coPI) [⬀] Khuller, Samir 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Itr/Sy: Algorithms For Data Storage and Movement @ University of Maryland College Park
The central focus of this proposal is the development of efficient algorithms for the storage and movement of data. Specifically, we are interested in algorithms that impact the performance of large multimedia data storage systems. In algorithmic terms, some of the principal challenges that arise in the context of multimedia data storage are: (a) deciding how many copies of each data item need to be stored, (b) determining the exact layout of data on a set of servers, (c) dealing with changing workloads and dynamic data access patterns. These related challenges require the development of efficient algorithms for optimizing data layout to maximize client satisfaction, monitoring the performance of data storage systems and scheduling the movement of large amounts of data.
Futhermore, what makes the issues that we consider even more significant is the fact that data storage and movement issues also arise within publicly share networks such as the Internet where the bandwidth can be dynamic and highly variable, and can result in a poor choice of paths chosen to transfer data in the network. One way to address this issue is to route data through specific holding points. By doing this we are able to increase throughput and decrease completion times by an order of magnitude to transfer data from several sources to a single destination. Algorithms related to this problem have been developed by us and are being tested with the Bistro framework, which is a framework for providing a data upload service such as one required by IRS for tax submission purposes. Our data movement algorithms are being used to schedule the transfer of data from many different locations to a final destination server.
While some specific instances of the individual problems have been considered earlier, there is no work dealing comprehensively with the range of issues that we focus on.

1 
2001 — 2007 
Samet, Hanan [⬀] Khuller, Samir Golubchik, Leana (coPI) [⬀] 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Digital Government: Scalable Data Collection Infrastructure For Digital Government Applications @ University of Maryland College Park
EIA 0091474 Hanan Samet University of Maryland
Digital Government: Scalable Data Collection Infrastructure for Digital Government Applications
This project in collaboration with the Internal Revenue Service, the Department of Justice, and the US Census Bureau will explore the collection of data over widearea networks. Of particular interest are situations with high submission volume from a variety of sites to one agency (hot spots); e.g. the submission of electronic tax forms to IRS in the April 1414 timeframe. Within that context, research issues will include scalability, and privacy/integrity. Other possible government applications include submissions of proposals at deadlines, such as the NSF Fastlane system, and voting in digital democracy. The technology to be developed will work at the application layer of TCP/IP, implementing a small set of primitive services.

1 
2004 — 2009 
Khuller, Samir 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Techniques in Approximation Algorithms @ University of Maryland College Park
This project aims to study the approximability measures and the design of heuristics for NPhard optimiza tion problems. This endeavor will result in the development of techniques that enable a deeper understanding of the principles that underlie the design of approximation algorithms for NPhard problems. An approxima tion algorithm is a polynomial time algorithm that produces solutions provably \close" to optimal. Many of the problems we study are graph theoretic. Graphs are powerful tools that can be employed to model objects, as well as the relationships between them. Indeed, many distinct optimization problems can be cast in graph theoretic terms. Specifically, graph theory provides an excellent way to model problems related to areas such as transportation, network design/routing, facility location, and cluster analysis. This powerful framework allows us to both develop and illustrate the power of different algorithmic tools. In addition to graph problems, we also study some fundamental scheduling, data management and broadcasting problems. Intellectual Merit: An increased understanding of techniques to develop and apply approximation algo rithms will have a significant impact on both research and practice. As we are well aware, NP complete problems are abundant in many scientific and engineering disciplines, any technique that effectively copes with this diffculty will have a significant impact on the design of heuristics. Broader Impact: The broader scientific goals are to both further the field of algorithms in terms of our understanding of what problems can be solved efficiently and what problems cannot. In addition, we explore applications of these algorithms in other areas such as networking, GRID computing, parallel computing etc. The project will train two full time PhD students in conducting research both in Universities and through internships during the summer at National Labs and Industrial Research Centers.

1 
2007 — 2012 
Khuller, Samir 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Ccf: Fundamental Algorithms For Data Management @ University of Maryland College Park
Fundamental Algorithms for Data Management
We are in an era where data (entertainment, media) is expected to be available on demand, and not primarily through broadcast methods such as TV. Storage technologies will need to cope with extremely high demand, with high quality of service. Storing large amounts of data and making it accessible to a very large number of users will be a challenge. The focus of this research is to develop fundamental algorithms for managing data in large scale storage systems. This research focuses on data of all types, ranging from multimedia data stored on a collection of disks, as well as data that is being collected and stored in a distributed sensor network. This research addresses the question of how to develop algorithms to manage this data effectively and efficiently. This project will involve training of a graduate student, and the development of a course on data management methods.
This research develops tools and methods for managing data, with a special focus on data layout and data migration. The data layout specifies for a storage system, how many replicas are required to cope with the demand, and which servers they should reside on. This problem is NPhard and our goal is to develop fast and practical approximation algorithms for it. The goal of data migration is to focus on the problem of reorganizing the layout when the demand pattern changes over time. We also consider energy issues in data collection and monitoring in sensor networks.

1 
2009 — 2013 
Khuller, Samir Srivastava, Ankur (coPI) [⬀] Deshpande, Amol (coPI) [⬀] 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Optimization Algorithms For LargeScale, ThermalAware Storage Systems @ University of Maryland College Park
This project investigates optimization problems that arise while performing thermal management in very large data storage centers. To satisfy the growing data management needs, such storage centers contain possibly hundreds of thousands of hard disks and other components, and typically are consistently active. These generate a lot of heat, and hence the storage system must be cooled to maintain reliability, resulting in significant cooling costs. The cooling mechanism and the workload assignments in a storage center are intricately tied together. This project is developing a general science of thermal management for large scale storage systems, by focusing on thermal modeling and management at different levels of the system hierarchy. Thermal aware techniques for allocating data access tasks to specific disks on which data is located, for controlling the schedules and speeds of thousands of tasks and disks to optimize quality of service, and for reorganizing data layouts on disks are being developed. This project will enable better thermal management in data storage centers, which can potentially result in significant reductions in the carbon footprint caused by those. The project will train several Ph.D. students in conducting research both at the University, and through internships at Industrial Research Labs.

1 
2010 — 2013 
Khuller, Samir 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Collaborative Research: Broader Impacts For Research and Discovery Summit @ University of Maryland College Park
This CISE special project funds the organization of the CISE Broader Impacts Summit. The goals of the summit include educating the CISE community about the NSF Broader Impacts criterion how to evaluate this criterion as part of a research project. The Summit should provide researchers and educators with the opportunity to forge collaborations and build longterm partnerships that may lead to a wide range of activities with the potential to enhance CISE research projects. The Summit is to be organized by a Steering Committee that represents all CISE divisions and a range of CISE disciplines. The investigators plan to discuss, present, and subsequently develop guidance materials for the CISE computing research community on how to effectively integrate broader impact activities into research projects.
Intellectual Merit: The intellectual merit of this project will arise from the working group discussions at the summit. At the end of the summit, a report is to be developed that synthesizes both current and future ideas in each of the five broader impact categories. This should provide research based methods for CISE investigators to connect their research and broader impacts in innovative ways.
Broader Impacts: The broader impacts of this project will arise from the computing community members, by those to whom they speak about the summit, and by those who access materials created after the summit. The project includes dissemination plans to ensure the efforts from this project have an impact on future NSF projects. The impact on future NSF projects should result in significant broader impacts from CISE research projects across all five broader impact categories.

1 
2012 — 2016 
Khuller, Samir 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Af: Small:Efficient Data Management Algorithms @ University of Maryland College Park
Within the last decade, with the growth of video ondemand applications and the explosion of data collected via sensors and other devices, computation over massive data sets is becoming the ubiquitous norm. This development calls for a deeper understanding of several issues specific to such contexts. The first issue is one of data placement; when data is located more closely to the demand, the need for network resources is reduced. The second issue pertains to energy minimization: how can one develop algorithmic methods to make data processing more efficient? Both of these issues lead to a host of interesting questions in the vein of scheduling and facility location type problems. In an effort to address these issues, this project focuses on the development of algorithms that manage data storage for processing, with energy efficiency as the primary consideration.
Much of the prior scheduling literature assumes a jobcentric perspective  algorithms are developed to optimize tardiness, completion time, makespan, etc. In contrast, this work is motivated by a systemcentric view in which utilizing resources in an "efficient'' way is of the utmost priority, subject to individual jobs being completed in a "satisfactory'' manner. Such efficiencies are primarily manifested in the form of the energy cost incurred by the system. These problems are particularly eminent in the context of large scale storage devices and data centers. The main focus is on data of all types, ranging from multimedia data stored on a collection of disks to data collected and stored in a distributed storage system. The amount of data to be stored and efficiently accessed is increasing at an unsustainable rate. The costs for managing this data are expected in turn to grow significantly. The main question is how can one develop scheduling algorithms to manage this data effectively and efficiently.
Data centers are fast becoming integral to society and have transformed everything from social networking to human communication to scientific collaboration, computation, and data exchange. This research will lead to increased efficiencies in this critical infrastructure. The project will train graduate students in conducting research both at universities and through internships at industrial research labs during the summer. Extensive mentoring and involvement of undergraduate students and women is expected. Over the last few years, the PI has developed a new course on "Science behind Computing'' and is working on a book for this course, the primary purpose of which is to educate the general public about important scientific concepts related to computing in the 21st century.

1 
2013 — 2016 
Gasarch, William (coPI) [⬀] Khuller, Samir 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Reu Site: Caar: Combinatorial Algorithms Applied Research @ University of Maryland College Park
This funding will establish a new CISE Research Experiences for Undergraduates (REU) Site at the University of MarylandCollege Park. This Site focuses on research in combinatorial algorithms, which are algorithms that work on discrete objects such as numbers, graphs, or data where one needs to optimize over a space of possible solutions. To help create a climate in which students learn new techniques, understand a few real world problems, and work on designing and implementing algorithms, the students will be involved in applied algorithms research projects that will be jointly advised by members of the UMD theory group in conjunction with faculty in different research areas, such as systems, artificial intelligence, and databases.
This REU Site will support eight undergraduate students for ten weeks each year. The Site will recruit computer science, computer engineering, and mathematics majors, with a focus on students from Historically Black Colleges and Women's colleges. The Site will help bridge the gap between theory and practice, which will benefit both as it gives theorists new and exciting problems and nontheorists tools they can use. The students will have the opportunity to see what research is like, obtain a solid grounding in algorithms, and encourage them to go to graduate school.

1 
2014 — 2015 
Khuller, Samir Daume, Hal [⬀] 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Eager: Discrete Algorithms in Nlp @ University of Maryland College Park
Algorithms that can understand human language must be able to recognize the underlying structure (e.g., subjectverbobject) of that language. Computational approaches developed in the natural language processing community typically have build ad hoc, oneoff algorithms for solving the hard, combinatorial optimization problems that arise in such tasks. Most largescale systems are built using complex combinations of heuristics applied to try to make approximate search techniques better. Concurrently, the algorithms community has developed scalable exact algorithms and approximation algorithms for solving many of these hard combinatorial optimization problems. This EArly Grant for Exploratory Research investigates the connection between these two extremes: the language processing community with the hard problems they need solved, and the algorithms community with the provably correct algorithms for solving such hard problems. The biggest technical challenge this exploration addresses is how to couple the statistical learning algorithms necessary to build effective language applications with the types of abstractions that make efficient algorithms possible. In particular, this project explores the application of "inverse optimization" to machine learning. For example, if one has access to an efficient algorithm for solving a particular discrete optimization problem, how can one learn parameters that make that particular algorithm as high accuracy as possible? Success in this project will give rise to theoretically principled, efficient algorithms for learning to solve complex linguistic tasks, which can transform to downstream applications like machine translation, automatic question answering and information retrieval.
This project's main technical innovation is the coupling of "inverse optimization" problems with online learning techniques. For instance, suppose that the end goal is to find some particular structure. The search for this structure can often be cast as a particular form of dynamic programming problem, which in turn often becomes a shortest path problem in a hypergraph. The machine learning challenge then is to learn a model under which the solution to this shortest path search is actually the desired structure. From an algorithmic perspective, this requires finding a set of inputs under which a given structure is optimal: inverse optimization. However, it is not enough for a given structure to be optimal: it must also beat all other (nonoptimal) structures by some given margin. This project will develop a combination of online learning algorithms and inverse optimization formulations that enable such advances.

1 
2016 — 2017 
Khuller, Samir 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Eager: Algorithms For Data Set Versioning: Store or ReCreate? @ University of Maryland College Park
Technologically facilitated access to large data sets is increasingly emerging as key to scientific research in areas ranging from medicine to climate change with teams of researchers simultaneously engaged in accessing, modifying and cleaning data sets. Not surprisingly, such collaborative datause has engendered substantial challenges related to data management. Indeed, the continuous modification of largescale data sets frequently results in the creation of thousands of versions of data sets over time, especially as multiple users? access and edit the data over time. Such proliferation raises some basic questions: Should all versions of a document be saved? While this is certainly convenient, the storage costs may be prohibitively high. Alternatively, should only a certain version be saved? In this case, while the storage costs are low, the cost of recreating a particular version can rise significantly due to the effort involved in making changes to an existing version. This project focuses on the fundamental challenges arising from balancing storage needs with efficient retrieval of information in the context of big data. Thus the primary research goal of this proposal is to design provably good algorithms that will not only result in a deeper understanding of the storage and recreation tradeoff but will also contribute to the development of effective data storage systems that are based on a sound theoretical foundation.
In previous NSFfunded projects, the PI has collaborated extensively and successfully with women and high school students and this project will also involve similar collaborations. Over the course of the past five years, the PI has graduated three women PhDs and is currently advising another three. He has also worked with several women undergraduates who are now pursuing doctoral degrees. Additionally, the PI has played a key role establishing connections with the national Braid project, supporting the departmental chapter of the Association of Women in Computing and organizing events and activities focused on bringing in established women computer scientists as role models for current students.
This fundamental problem can be modeled within a graph theoretic framework, as a directed weighted graph. Each node denotes a version. In the general form each edge (a,b) has two associated parameters  a weight denoting the storage cost to generate version b, given a copy of a and a cost denoting the cost to actually perform the computation of converting a to b. While both these are closely related, they could be different. In addition, the edge weights and costs can be wildly asymmetric. The primary reason for this is that when a new version is created by deleting data, we can simply specify that a significant portion of the data is deleted, however the reverse operation of insertion needs to actually specify the data to be inserted. In this framework, the goal is to compute a rooted tree and the structure and depth of the tree controls the storage and recreation tradeoff. While there exists a deep understanding of this problem for undirected graphs, none of those methods work effectively for directed graphs. This project will develop a deeper understanding of this basic problem.

1 
2016 — 2019 
Gasarch, William [⬀] Khuller, Samir 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Reu Site: Caar: Combinatorics and Algorithms Applied to Real Problems @ University of Maryland College Park
This REU site project to be hosted at University of Maryland College Park emphasizes combining theory with practice. The students will benefit from a strong faculty team. Building on the past experience, the project is expected to result in strong undergraduate research leading to successful research/industry careers for the students in the future. Outreach activities with high schools will attract high school students to this research. The site plans to recruit underrepresented groups and also students from nonresearch universities and HBCUs.
The projects proposed not only form cuttingedge theory research but are also relevant to practice and have high impact applications. The projects considered are in applied area of Big Data Analytics, Information Extraction in Natural Language Processing, Fair Division, Pricing over Social Networks, Secure Communication and Public Health. The approach of getting students interested in theory, starting from applied implementation has an added side benefit that it would train students in sharp algorithmic implementation and experimentation skills, which are deemed in high regard by leading Computer Science companies.

1 
2022 — 2027 
Nocedal, Jorge (coPI) [⬀] Khuller, Samir Berry, Randall (coPI) [⬀] Hartline, Jason Vijayaraghavan, Aravindan 
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information 
Institute For Data, Econometrics, Algorithms and Learning (Ideal) @ Northwestern University
The Institute for Data, Econometrics, Algorithms, and Learning (IDEAL) will consolidate and amplify research devoted to the foundations of data science across all the major researchfocused educational institutions in the greater Chicago area: the University of Illinois at Chicago, Northwestern University, the Toyota Technological Institute at Chicago, the University of Chicago, and the Illinois Institute of Technology. This transdisciplinary institute involves over 50 researchers working on key aspects of the foundations of data science across computer science, electrical engineering, mathematics, statistics, and several related fields like economics, operations research, and law, and they are complemented by members of Google’s learning theory team. Its research goals range from the core foundations of data science to its interfaces with other disciplines: 1) tackling important challenges related to foundations of machine learning and optimization, 2) addressing statistical, algorithmic and mathematical challenges in dealing with highdimensional data, and 3) exploring the foundations of aspects of data science that interact with society. The institute will foster strong connections with the community and local high schools, broaden participation in data science locally and nationally, and build lasting research and educational infrastructure through its activities. Institute activities will include workshops for undergraduate students, high school teacher workshops, public lectures, and museum exhibit designs. These will build new pathways for undergraduate students, high school students, and the broader public from diverse and underrepresented backgrounds, to increase participation and engagement with scientific fields related to data science.<br/><br/>The research thrusts of the institute will center around the foundations of machine learning, highdimensional data analysis and inference, and data science and society. Specific topics include foundations of deep learning, reinforcement learning, machine learning and logic, network inference, highdimensional data analysis, trustworthiness & reliability, fairness, and data science with strategic agents. The research activities are designed to facilitate collaboration between the different disciplines and across the five Chicagoarea institutions, and they build on the extensive experience from previous efforts of the participating universities. The activities include topical special programs, postdoctoral fellows, comentored PhD students, workshops, coordinated graduate courses, visiting fellows, research meetings, and brainstorming sessions. The proposed research will lead to new theoretical frameworks, models, mathematical tools and algorithms for analyzing highdimensional data, inference and learning. Successful outcomes will also lead to a better understanding of the foundations of data science and machine learning in both strategic and nonstrategic environments – including emerging concerns like reliability, fairness, privacy and interpretability as data science interacts with society in various ways. The institute will also have broader impacts of strengthening research and educational infrastructure, developing human resources, broadening participation from underrepresented groups, and by connecting theory to science and industry. The institute will organize activities to engage the community and a diverse group of students at all levels, including introductory workshops for undergraduate research participants, high school student and teacher outreach (through a partnership with the Math Circles of Chicago), and public lectures as part of both our research program and a partnership with the Museum of Science and Industry. The Chicago public institutions that we engage serve a very diverse population, so the outreach, recruitment, and training activities will broaden participation from underrepresented groups. Finally, the institute will have direct engagement with applications and industry through its activities involving Google, other industry partners in the broader Chicago area, and applied data science institutes.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

0.942 