1999 — 2003
Kannan, Sampath
Maximum Likelihood Estimation and Other Probabilistic Algorithms @ University of Pennsylvania
CCR-9820885 Kannan, S.
This project pursues research in a number of areas with the common theme being the design and analysis of probabilistic algorithms.
Maximum Likelihood Estimation is a classical and important technique in statistics. Specific versions of this general problem are defined by considering the specific stochastic processes that generate the sample data. The problem has generally been thought to be computationally intractable for most interesting applications. However, this project seeks to create very efficient algorithms that find approximately the most likely solutions. Specifically, such algorithms are sought for important problems in computational biology such as phylogeny construction and multiple alignment. The project also investigates similar approaches for the problem of inferring Belief Nets, a popular stochastic model in artificial intelligence.
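For intuition only, a minimal sketch of the flavor of such estimation, under the simplifying assumption of a symmetric two-state model on a single tree edge (the function name and the numbers are illustrative, not from the project):

# Hypothetical illustration: maximum likelihood estimate of a single
# edge mutation probability under a symmetric two-state model.
# Given n independent sites of which k changed state across the edge,
# the likelihood L(p) = p^k * (1-p)^(n-k) is maximized at p = k/n.

def edge_mutation_mle(changed_sites: int, total_sites: int) -> float:
    """Closed-form MLE for the per-edge mutation probability."""
    if total_sites == 0:
        raise ValueError("need at least one site")
    return changed_sites / total_sites

# Example: 12 of 100 aligned sites differ across the edge.
print(edge_mutation_mle(12, 100))  # 0.12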
The problem of contention resolution concerns message transmission over an Ethernet, where messages arrive according to a stochastic process such as a Poisson process. Each message uses a protocol whereby it attempts to transmit itself at each time instant with some probability.
A central open question is whether there is a protocol that ensures that the system is "stable." The PI proved that the answer is "no" for a number of "natural" classes of protocols and is working on extending these proofs to all acknowledgement-based protocols. Finally, the project investigates new models and problems in the area of program checking, which again involves the design of randomized algorithms.
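A minimal simulation sketch of this setting, with assumed arrival rate and transmission probability (a generic fixed-probability scheme, not a specific protocol from the project):

import random

# Illustrative simulation (assumed parameters): messages arrive roughly as a
# Poisson process, each backlogged message transmits in every slot with fixed
# probability p, and a slot succeeds only when exactly one message transmits.

def simulate(arrival_rate=0.3, p=0.1, slots=10_000, seed=0):
    rng = random.Random(seed)
    backlog = 0
    delivered = 0
    for _ in range(slots):
        # Poisson arrivals in this unit-length slot, via exponential gaps.
        arrivals = 0
        t = rng.expovariate(arrival_rate)
        while t < 1.0:
            arrivals += 1
            t += rng.expovariate(arrival_rate)
        backlog += arrivals
        # Each backlogged message independently attempts transmission.
        attempts = sum(rng.random() < p for _ in range(backlog))
        if attempts == 1:
            backlog -= 1
            delivered += 1
    return backlog, delivered

print(simulate())  # a growing final backlog is the signature of instability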
2001 — 2004
Feigenbaum, Joan; Kannan, Sampath
Massive Data Streams: Algorithms and Complexity
Title: "Massive Data Streams: Algorithms and Complexity"
Investigators: Joan Feigenbaum and Sampath Kannan
Abstract: Massive data sets are increasingly important in many applications, including observational sciences, product marketing, and monitoring and operations of large systems. In network operations, raw data typically arrive in streams, and decisions must be made by algorithms that make one pass over each stream, throw much of the raw data away, and produce "synopses" or "sketches" for further processing. Moreover, network-generated massive data sets are often distributed: several different, physically separated network elements may receive or generate data streams that, together, comprise one logical data set. The enormous scale, distributed nature, and one-pass processing requirement on the data sets of interest must be addressed with new algorithmic techniques.
Two programming paradigms for massive data sets are "sampling" and "streaming." Rather than take time even to read a massive data set, a sampling algorithm extracts a small random sample and computes on it. By contrast, a streaming algorithm takes time to read all the input, but little more time and little total space. Input to a streaming algorithm is a sequence of items; the streaming algorithm is given the items in order, lacks space to record more than a small amount of the input, and is required to perform its per-item processing quickly in order to keep up with the unbuffered input.
The investigators continue the study of fundamental algorithms for massive data streams. Specific problems of interest include, but are not limited to, the complexity of proving properties of data streams, the construction of one-pass testers of properties of massive graphs, and the streaming space complexity of clustering.
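To make the two paradigms concrete, a minimal sketch contrasting a sampling primitive (reservoir sampling) with a one-pass, small-space streaming primitive (Misra-Gries heavy hitters); both are textbook techniques used here purely for illustration, not code from the project:

import random

# Reservoir sampling keeps a small uniform random sample of the stream;
# Misra-Gries keeps approximate heavy hitters in one pass and small space.

def reservoir_sample(stream, k, seed=0):
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample

def misra_gries(stream, k):
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters  # candidate items with frequency greater than n/k

stream = [1, 2, 1, 3, 1, 2, 1, 4, 1, 5] * 100
print(reservoir_sample(stream, 5))
print(misra_gries(stream, 3))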
2006 — 2009
Kannan, Sampath; Blaze, Matthew
CT-ISG: Isotropic Cryptography @ University of Pennsylvania
PIs: Matthew Blaze and Sampath Kannan, University of Pennsylvania
The Isotropic Cryptography project introduces and lays the theoretical and engineering foundations for information-theoretically secure communication over open channels that meet certain requirements. More specifically, the project identifies and is concerned with isotropic channels, in which eavesdroppers monitoring the channel are unable to identify the sources and destinations of messages with certainty. Such channels appear to have very interesting and promising security properties. Even a partially isotropic channel (in which the adversary can identify a message source but only with some bounded certainty) can provide information-theoretic confidentiality.
The project's experimental work focuses on designing and analyzing practical protocols that exploit the isotropic properties of conventional communication channels (such as local Ethernet, near-field radio and mobile wireless networking). Can (partially or fully) isotropic channels be realized in practice over existing network technologies? The project also lays a theoretical foundation for isotropic cryptography, studying what assumptions are necessary to implement protocols for various well-known and new cryptographic primitives and what the most efficient protocols are for various primitives. Finally, the project examines the relationship between traditional anonymity networks and Mix networks on the one hand and isotropic channels on the other. Can isotropy be used to provide anonymity and can anonymous networks simulate isotropic channels? Can isotropic approaches be made practical and efficient?
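For intuition, a minimal sketch of how a perfectly source-anonymous channel can yield information-theoretic secrecy, using a classical toy key-agreement idea; this is illustrative only, not one of the project's protocols, and the parameters are made up:

import random

# Toy key agreement over a perfectly source-anonymous ("isotropic") channel.
# In each round Alice and Bob each broadcast one random bit; the eavesdropper
# sees both bits but cannot tell who sent which. When the bits differ, each
# party can infer the other's bit from its own, while the eavesdropper learns
# only that they differ, so Alice's bit becomes a shared secret bit.

def agree_key(num_bits, seed=0):
    rng = random.Random(seed)
    alice_key, bob_key = [], []
    while len(alice_key) < num_bits:
        a, b = rng.getrandbits(1), rng.getrandbits(1)
        if a != b:                   # eavesdropper sees {0, 1}, unattributed
            alice_key.append(a)
            bob_key.append(1 - b)    # Bob infers Alice's bit from his own
    return alice_key, bob_key

a_key, b_key = agree_key(16)
assert a_key == b_key
print(a_key)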
2007 — 2010
Lee, Insup; Kannan, Sampath
CT-ISG: Collaborative Research: Massive Dataset Algorithmics for Network Security @ University of Pennsylvania
As malicious attacks on computer systems increase in severity and sophistication, developing effective methods for protecting the Internet is among the most important challenges facing computer science today. Network-based security mechanisms offer both good coverage and the possibility of early threat detection, but they often conflict with the performance requirements of network elements because of the vast amounts of traffic data that must be analyzed. This project will apply massive-dataset (MDS) algorithmics to network security, bringing together two previously unconnected research areas. The objective is to achieve a qualitative improvement in network security by developing efficient, yet theoretically rigorous, algorithmic defenses that can be deployed at scale in modern networks.
The project addresses both fundamental algorithm-design problems and practical applications. MDS algorithmics provides a set of basic techniques for highly efficient (e.g., one-pass, small-space, polylog-time) analysis of large amounts of data. This project will investigate how these methods can be used for (1) online classification and property testing of packet streams, including efficient inference of streams generated by a mixture of stochastic sources, (2) detection of changes and anomalies in traffic patterns, and (3) development of computationally tractable models of traffic sources that support reasoning about a wide variety of adversarial behaviors and incorporate prior knowledge such as conventional intrusion-detection rules. The algorithmic toolkit developed in the course of the project will be applied to practical network-security problems such as recognizing denial-of-service activity, fingerprinting worms, and detecting botnets.
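As an illustration of the kind of small-space primitive involved, a minimal sketch of a Count-Min sketch summarizing per-source packet counts, with high estimated counts flagged for inspection; the sketch parameters and addresses are made up, and this is not the project's system:

import hashlib

# Count-Min sketch: a small 2-D array of counters indexed by hashed keys.
# Estimates never undercount, so large estimates are safe candidates to flag
# (e.g., possible denial-of-service sources).

class CountMin:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        return min(self.table[row][self._index(row, key)] for row in range(self.depth))

sketch = CountMin()
packets = ["10.0.0.5"] * 5000 + ["10.0.0.%d" % i for i in range(200)] * 10
for src in packets:
    sketch.add(src)
print(sketch.estimate("10.0.0.5"))   # large estimate: flag for inspection
print(sketch.estimate("10.0.0.7"))   # small estimate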
2011 — 2014
Kannan, Sampath
EAGER: Estimating Phylogenetic Trees When Character Evolution Is Neither Independent Nor Identically Distributed @ University of Pennsylvania
The evolutionary tree, or phylogeny, of a set of species is a tree that explains the history of their evolution from a common ancestor. To estimate this history, scientists make observations about species that are alive today and seek the tree that best fits these data. The most common observations today are biomolecular sequences, such as DNA or protein sequences. These sequences are aligned so that corresponding positions in the given sequences exhibit as much similarity as possible. Each column of the alignment is called a site or a character. In the standard model of evolution, each node in the tree has a certain state for a character and transmits this state to its children; the state is probabilistically mutated along each edge of the tree. The standard model also assumes that all characters evolve according to identical, independent stochastic processes. Tight bounds are known for the number of characters needed to infer the tree (and the mutation probabilities on the edges) under these assumptions. The problem is that these assumptions are not biologically realistic: it is well known that selection pressure operates differently on different sites and that the evolution of one character can depend on other characters. Much more sophisticated mathematical analysis is needed to infer the tree and the dependence structure under these conditions, and this is precisely the major goal of this project.
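A minimal sketch of the standard independent, identically distributed two-state model described above, which is the baseline this project seeks to move beyond; the tree shape and mutation probabilities here are made up:

import random

# Baseline two-state model: each character starts in a random state at the
# root and flips along each edge independently with that edge's mutation
# probability. Real characters are neither independent nor identically
# distributed, which is the point of the project; this is only the baseline.

def evolve_character(tree, root, flip_prob, rng):
    """tree: dict parent -> children; flip_prob: dict (parent, child) -> p."""
    states = {root: rng.getrandbits(1)}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            flip = rng.random() < flip_prob[(node, child)]
            states[child] = states[node] ^ int(flip)
            stack.append(child)
    return states

rng = random.Random(1)
tree = {"root": ["x", "y"], "x": ["A", "B"], "y": ["C", "D"]}
p = {e: 0.1 for e in [("root", "x"), ("root", "y"), ("x", "A"),
                      ("x", "B"), ("y", "C"), ("y", "D")]}
# One i.i.d. sample per character; leaves A-D are the observed species.
alignment = [evolve_character(tree, "root", p, rng) for _ in range(5)]
print([{leaf: s[leaf] for leaf in "ABCD"} for s in alignment])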
The problem of reconstructing evolutionary trees is of central importance in biology, since evolution is the unifying theory of the field. Computer scientists have made contributions to this area, but the methods they have provided so far simplify the problem too much to produce reliable results on realistic data. This project aims to take an important step toward making tree-inference algorithms more realistic.
2013 — 2015
Kannan, Sampath; Daniilidis, Kostas
NRI: Small: Collaborative Research: Active Sensing for Robotic Cameramen @ University of Pennsylvania
With advances in camera technology, and as cloud storage, network bandwidth, and protocols become widely available, visual media are becoming ubiquitous. Video recording has become the de facto universal means of instruction for a wide range of applications such as physical exercise, technology, assembly, or cooking. This project addresses the scientific and technological challenges of video shooting, in terms of coverage and optimal view planning, while leaving high-level aspects, including creativity, to the video editing and post-production stages.
Camera placement and novel-view selection are modeled as optimization problems that minimize the uncertainty in the locations of actors and objects, maximize coverage and effective appearance resolution, and optimize object detection for the sake of semantic annotation of the scene. New probabilistic models capture long-range correlations when the trajectories of actors are only partially observable. The quality of potential novel views is modeled in terms of resolution and optimized by maximizing the coverage of a 3D orientation histogram, while an active view-selection process for object detection minimizes a dynamic-programming objective that captures both the loss due to classification error and the resources spent on each view.
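For illustration, a minimal sketch of greedy view selection against a coverage objective of the kind described above; the bin sets and view names are made up, and the project's actual objectives are richer:

# Greedy selection of camera views to maximize coverage of orientation-
# histogram bins. Each candidate view is summarized by the set of bins it
# observes well; views are added one at a time by marginal coverage gain.

def greedy_view_selection(view_bins, budget):
    """view_bins: dict view_id -> set of covered bins; pick `budget` views."""
    chosen, covered = [], set()
    for _ in range(budget):
        best = max(view_bins, key=lambda v: len(view_bins[v] - covered))
        gain = view_bins[best] - covered
        if not gain:
            break  # no remaining view adds coverage
        chosen.append(best)
        covered |= gain
    return chosen, covered

views = {
    "front": {0, 1, 2, 3},
    "left":  {3, 4, 5},
    "right": {5, 6, 7},
    "top":   {1, 6, 8},
}
print(greedy_view_selection(views, budget=2))  # picks 'front' then 'right'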
The project advances active sensing and perception and provides technology for further automation of video capture. Such technology has broader impact on the production of educational videos for online courses as well as on telepresence applications. Research results are integrated into robotics and digital-media programs for K-12 students.
2015 — 2018
Ives, Zachary; Tannen, Val (co-PI); Davidson, Susan (co-PI); Kannan, Sampath
CICI: Data Provenance: Provenance-Based Trust Management for Collaborative Data Curation @ University of Pennsylvania
Data-driven science relies not only on statistics and machine learning, but also on human expertise. As data are being collected to tackle increasingly challenging scientific and medical problems, there is a need to scale up the amount of expert human input (curation and, in certain cases, annotation) accordingly. This project addresses this need by developing collaborative data curation: instead of relying on a small number of experts, it enables annotations to be made by communities of users of varying expertise. Since the quality of annotations by different users will vary, novel quantitative techniques are developed to assess the trustworthiness of each user, based on their actions, and to distinguish trustworthy experts from unskilled and malicious users. Algorithms are developed to combine users' annotations based on their trustworthiness. Collaborative data curation will greatly increase the amount of human-annotated data, which will, in turn, lead to better Big Data analysis and detection algorithms for the life sciences, medicine, and beyond.
The central problems of collaborative data curation lie in the high variability in the quality of users' annotations and in the form the data take when users annotate them. The proposal develops techniques to take annotations made by different users over different views of the data (such as an EEG display with filters and transformations applied to the signal), to use provenance to reason about how the annotations relate to the original data, and to reason about the reliability and trustworthiness of each user's annotations over these data. To accomplish this, the research defines data and provenance models that capture time- and space-varying data; develops novel reliability-calculus algorithms for computing and dynamically updating the reliability and trustworthiness of individuals, based on their annotations and on how these compare to annotations from recognized experts and the broader community; and designs a high-level language called PAL that enables the researchers to implement and compare multiple policies. The researchers will initially develop and validate the techniques on neuroscience and time-series data, within a 900+ user public data-sharing portal (with 1500+ EEG and other datasets for which annotations are required). The project team will later expand the techniques to other data modalities, such as imaging and genomics.
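A minimal sketch of the general idea of trust-weighted annotation combination with trust updates from expert agreement; the specific voting and update rules here are invented for illustration and are not the project's reliability calculus or PAL policies:

# Combine binary annotations by trust-weighted voting, and update each
# user's trust from agreement with expert labels.

def combine(annotations, trust):
    """annotations: dict user -> 0/1 label; trust: dict user -> weight in [0, 1]."""
    score = sum(trust[u] * (1 if label else -1) for u, label in annotations.items())
    return 1 if score > 0 else 0

def update_trust(trust, annotations, expert_label, rate=0.1):
    for user, label in annotations.items():
        agreement = 1.0 if label == expert_label else 0.0
        trust[user] = (1 - rate) * trust[user] + rate * agreement
    return trust

trust = {"alice": 0.9, "bob": 0.5, "mallory": 0.2}
annotations = {"alice": 1, "bob": 1, "mallory": 0}
print(combine(annotations, trust))           # 1: trusted users outvote mallory
trust = update_trust(trust, annotations, expert_label=1)
print(trust)                                 # mallory's trust decreases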
2017 — 2021
Tannen, Val (co-PI); Kannan, Sampath; Haeberlen, Andreas (co-PI)
AITF: Provenance with Privacy and Reliability in Federated Distributed Systems @ University of Pennsylvania
The goal of this project is to investigate the question of provenance in federated distributed systems, such as networks and scientific workflows, where several self-interested entities generate and share data and compute and make decisions on the shared data. Computations and decisions themselves give rise to new shared data. Provenance analysis allows us to understand the contributions of the various entities to the data. This project seeks to put provenance on a sound mathematical foundation by unifying the theoretical notion of semiring provenance with practical approaches to network provenance. The PIs also aim to design new algorithms and compression techniques for collecting and managing provenance data. A second thrust of the project is to compute reliability scores for each data item, in conjunction with its provenance. To do this, the PIs will design techniques for assigning reliability scores to primitive data items, as well as a calculus, based on sound axiomatic principles, for assigning reliability scores to derived data items. The project will achieve broad impact by allowing networks to operate more reliably and by enhancing reproducibility in scientific workflows.
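For intuition, a minimal sketch of the textbook semiring-provenance construction referenced above: the same provenance expression over base facts is evaluated once in a Boolean semiring (does the derived fact hold?) and once in a Viterbi-style semiring (score of the most reliable derivation). This is illustrative only, not the project's system:

from dataclasses import dataclass

# A derived fact's provenance is an expression over base facts using
# + (alternative derivations) and * (joint use); evaluating the same
# expression in different semirings yields different analyses.

@dataclass
class Semiring:
    plus: callable
    times: callable

BOOLEAN = Semiring(plus=lambda a, b: a or b, times=lambda a, b: a and b)
VITERBI = Semiring(plus=max, times=lambda a, b: a * b)  # most reliable derivation

def evaluate(expr, valuation, sr):
    """expr: nested tuples ('+', e1, e2), ('*', e1, e2), or a base-fact name."""
    if isinstance(expr, str):
        return valuation[expr]
    op, left, right = expr
    f = sr.plus if op == "+" else sr.times
    return f(evaluate(left, valuation, sr), evaluate(right, valuation, sr))

# Derived fact obtained either from (a and b) or from c alone.
provenance = ("+", ("*", "a", "b"), "c")
print(evaluate(provenance, {"a": True, "b": True, "c": False}, BOOLEAN))  # True
print(evaluate(provenance, {"a": 0.9, "b": 0.8, "c": 0.6}, VITERBI))      # 0.72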