1987 — 1989 |
Prasanna, Viktor |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Parallel Processing of Computer Vision Problems (Computer and Information Science) @ University of Southern California
Parallel Processing has been widely used in image processing. A number of parallel architectures such as cellular arrays, memory augmented arrays, and pyramids have been proposed for these tasks. In the past, many problems in low- and medium- level vision have been solved on these architectures. This research will explore parallel architectures for a range of problems in computer vision and develop efficient parallel algorithms for them. As part of the research, several novel VLSI arrays including arrays with efficient global communication features, arrays with reduced processing requirements, as well as reconfigurable VLSI arrays will be developed. While these architectures are also suitable for many other problems, they seem to be particularly well-suited for vision applications due to inherent properties of problems in low-level vision. This research will also investigate parallel algorithms for medium-and high-level vision problems. Finally, the use of information theoretic techniques to study the interprocessor communication requirements and inherent parallel complexity of solving several fundamental problems in vision will be investigated.
|
0.915 |
1989 — 1990 |
Prasanna, Viktor |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Cise Research Instrumentation @ University of Southern California
A front-end interface to a Connection Machine will be provided for researchers at the University of Southern California for research in the School of Engineering. This equipment is provided under the Instrumentation Grants for Research in Computer and Information Science and Engineering program. The research for which the equipment is to be used will be in the area of design and implementation of parallel algorithms for vision.
|
0.915 |
1990 — 1993 |
Prasanna, Viktor |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Parallel Techniques For Image Processing and Vision @ University of Southern California
This research will develop parallel algorithms for machine vision, especially for the interface between image processing and image understanding, with further study of new and existing parallel architectures for efficient execution of these algorithms. Architectures to be studied include fixed-size arrays, reconfigurable meshes, reduced VLSI arrays, and arrays with hypercube connections such as the Connection Machine. Data movement techniques will be designed to support parallel solutions to image computations in mid-level and high-level vision. Specific high-level problems to be studied are motion analysis, image matching, and stereo matching, as well as several discrete relaxation techniques. Neural-net approaches to vision will be supported by design of routing techniques based on preprocessing of the underlying neural graph and by mapping of such structures onto fine-grain parallel machines. A Connection Machine at the USC Information Sciences Institute will be used to evaluate data partitioning, data routing, and mapping techniques.
|
0.915 |
1993 — 1995 |
Prasanna, Viktor |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Parallel Techniques For Problems in Vision & Robotics @ University of Southern California
This research continues earlier efforts of the PI in studying parallelism for image understanding and robotics as well as in understanding the power of reconfiguration. There are three spheres of emphasis in this work: 1. design and analysis of efficient parallel algorithms for roblems in vision and robotics on well-established parallel models of computation, 2. implementation of the parallel solutions on state of the art parallel machines, and 3. a study of the power of reconfigurable meshes in solving fundamental problems of interest to the parallel processing community as well as problems arising in image processing, vision and robotics. These problems are among the generic high and intermediate level problems in image understanding and robotics. Specifically, in image understanding, design and analysis of algorithms for motion analysis (object tracking), image and stereo matching, model based object recognition as well as algorithms for several symbolic computation based approaches used in understanding images will be investigated. In robotic applications, parallel solutions to a variety of practical problems in real time robot motion- and task-planning arising in terrain navigation and industrial automation will be investigated. The parallel models to be employed include the mesh-connected processor array, reconfigurable mesh array, and the hypercube. The algorithms will be implemented on Connection Machine CM-5, Maspar MP-1 and the Image Understanding Architecture (IUA). In the work on the reconfigurable mesh model, design and analysis of fast and processor efficient parallel solutions to several fundamental problems on the reconfigurable mesh will be investigated. Problems to be considered include arithmetic problems, image problems and geometric problems on planar points. Known techniques on other parallel models will be studied for possible mapping onto the reconfigurable mesh.
|
0.915 |
1994 — 1998 |
Prasanna, Viktor |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Partitioning and Mapping Problems in Heterogeneous Computing @ University of Southern California
9317301 Prasanna Future parallel processing systems will consist of diverse high performance computers integrated to form a Heterogeneous Computing Network that allows them to cooperate in solving complex scientific and engineering problems. Each machine connected to the network may be suitable for a different class of parallel computations. Such an organization will exploit the current advances in parallel hardware architectures, interconnection technologies, and programming paradigms to provide a cost-effective environment for high performance parallel computing. Efficient utilization of such an environment depends on a number of issues to be addressed including techniques to partition application tasks and map the subtasks onto various machines in the network. This research addresses several key issues in using a heterogeneous computing environment. The work will focus on modeling the Heterogeneous Computing paradigm for algorithm development, addressing the computational issues involved in using such a paradigm, design of partitioning and mapping strategies, and integrating these strategies into existing programming systems to solve computationally demanding problems in image understanding. ***
|
0.915 |
1999 — 2002 |
Prasanna, Viktor |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Dynamic Libraries For Reconfigurable Architectures @ University of Southern California
The project will develop an algorithmic design methodology for designing Dynamic libraries. Configuration design techniques based on Parameterized Computation Structures will be developed for the library. Parameterized Computation Structures are designs optimized for a specific computation based on the algorithm and input instance and thus can lead to compact and efficient designs. The computation structures will incorporate run-time instantiation and optimization techniques resulting in a Dynamic library. The Dynamic library components will be modular and scalable. The methodology will also exploit partial and dynamic reconfiguration. The performance of the resulting designs will be evaluated using total execution time as a metric. The total execution time includes reconfiguration time in addition to execution time on the hardware. This effort is expected to significantly advance the understanding of configurable architectures and result in the development of dynamic configurable structures.
|
0.915 |
2002 — 2003 |
Raghavendra, Cauligi (co-PI) [⬀] Prasanna, Viktor |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Ngs: a Model-Based Framework For Adaptive Algorithm Design @ University of Southern California
Rapidly increasing performance requirements of applications have spurred the next generation of complex, dynamic, heterogeneous, parallel/distributed computing system architectures. The emerging computational grid, tightly-coupled petaflop grids-in-a-box (GiBs), distributed sensor networks, System-on-Chip (SoC) and polymorphous computing (PCA) architectures are examples of such systems. To exploit the full potential of this new computing architecture, applications, as they execute, must be able to adapt to the continuously changing system. Although some support for adaptive application development is available in the form of programming languages and runtime systems, there is a lack of high level system abstractions that model the dynamic behavior and runtime adaptivity. The proposed research will address fundamental issues in modeling these dynamic, complex architectures and the design and evaluation of adaptive algorithms for such architectures. The focus of the proposed research will be on creating a formal framework to reason about adaptivity at an abstract level. A direct educational impact of the proposed activity will be the introduction of new curriculum in academia, to impart knowledge on algorithm design aspects for dynamic system architectures. This will include initiating new course-work along with traditional courses offered on analysis of algorithms and architectures. One of the broader impacts we foresee is the preparation of future Grid/GiBs/SoC/PCA application developers.
|
0.915 |
2003 — 2007 |
Prasanna, Viktor |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Algorithms: Performance Programming For Advanced Cache Architectures @ University of Southern California
The recognition of drawbacks of traditional cache hierarchies, especially for irregular applications, has led to the emergence of a new breed of processors that allow the cache hierarchy to be directly manipulated at the application level. Based on the knowledge of the application's data access behavior, "intelligent" programming can lead to dramatic performance improvements.
This project will explore a new approach towards performance programming for advanced cache architectures, based on explicit memory hierarchy management at the application level. Our research will focus on: (i) Definition of a generalized model for split spatial/temporal caches and explicit cache control. This model will abstract available architecture features from a programmer's perspective. A high-level simulator based on this model will be implemented. (ii) Develop cache cognizant algorithms for regular and irregular application kernels. The kernels will be optimized to exploit spatial and temporal cache structures, data prefetch, and other features abstracted in the model. Performance improvements will be validated through low-level simulations and experiments on real architecture platforms such as Intel IA-64 and Sun UltraSPARC III Cu. (iii) Create a mathematical foundation for compile-time data placement in main memory to minimize cache misses at run time, using on Perfect Latin Squares (PLS) to reduce cache conflicts. (iv) Use the above techniques to optimize performance of algorithms used for database storage and access (search), tree traversal, unstructured mesh computations, and graph problems. We envision that our research will complement the ongoing advances in cache architectures and lead to the creation of a new computation model for programming the next generation of general-purpose processors.
|
0.915 |
2003 — 2007 |
Prasanna, Viktor |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Sensors: Models and Algorithms For Collaborative Computation in Sensor Networks @ University of Southern California
Majority of the research results in the field of wireless sensor networking have been substantiated mostly through simulations and empirical measurements, as is common in traditional networking research. For efficient design, in terms of latency, energy, robustness, etc., models that abstract node hardware and the network characteristics are needed for systematic algorithm design and analysis. The proposed work will demonstrate that models of computation for sensor networks (from a parallel and distributed systems' perspective) will create a modular, layered paradigm for application development.
The intellectual merit of this research is the development of computation models and robust, adaptive, energy-efficient collaborative algorithms for computation and communication in wireless sensor networks. High-level models will allow designers to make informed decisions regarding energy and time tradeoffs, and robustness at the node and network level - eliminating most of the ad-hoc-ness in application design for sensor networks. The benefits of our approach will be demonstrated on two classes of end-to-end applications. Highly optimized computation and communication kernels for information dissemination in sensor networks, and distributed image processing will be developed.
The broader impact of this work is in understanding, modeling, and exploiting sensor networks as a computing substrate - not just a loose federation of nodes equipped with sensors, processors, and radios. We expect this to lead towards a new discipline for programming sensor networks by providing the application developer with high-level technology-independent 'knobs' for analysis and performance optimization. A direct educational impact of the proposed activity will be the introduction of new curriculum in academia to impart knowledge on algorithm design for sensor networks.
|
0.915 |
2003 — 2008 |
Prasanna, Viktor |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Performance Modeling and Algorithm Design For Reconfigurable System-On-Chip Architectures @ University of Southern California
Prasanna
Performance Modeling and Algorithm Design for Reconfigurable System-on-Chip Architectures
Abstract
This project will explore algorithmic techniques to optimize application performance on reconfigurable System-on-Chip (RSoC) architectures based on a novel concept of malleable algorithms. Malleable algorithms are architecture-platform aware specification of alternate implementations of a given functionality, and form the basis of a new methodology for performance modeling and algorithm design for RSoC architectures. The proposed research will have the following main thrusts.
1. Domain-specific modeling: hybrid performance modeling using high-level analytic performance models and low-level simulations. 2. Energy-efficient designs with malleable algorithms: design of energy-efficient malleable algorithms for a set of embedded benchmarks and applications. 3. System-level optimization: combinatorial approaches including formulations using interval arithmetic and generalized assignment problem.
The proposed effort will lead to the development of highly optimized portable and reusable solutions for implementing embedded applications, and the definition of a new methodology for designing energy-efficient soft-IP cores for hybrid architectures consisting of multiple tightly integrated heterogeneous computing elements. The research will complement ongoing advances in design automation, and bridge the gap between the application developer and RSoC platform architectures.
|
0.915 |
2003 — 2007 |
Horowitz, Ellis (co-PI) [⬀] Hwang, Kai [⬀] Prasanna, Viktor Neuman, B. Clifford |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Itr: Gridsec: Trusted Grid Computing With Dynamic Resources and Automated Intrusion Responses @ University of Southern California
A new trust model will be developed for grid metacomputing over multiple administrative domains. The GridSec enhances grid operations with seamless security, assured privacy, data integrity, confidentiality, and optimized resource allocations. Distributed micro firewalls and intrusion repelling libraries for protecting grid resources will be generated as a part of this project. The new security system will be designed to adjust itself dynamically with changing threat patterns and network conditions. Fine-grain resource-access control at the file, device, and storage levels will be designed to enhanced the trusted aspect of the system. The results of this project will benefits all grid applications and offers protection of shared grid resources. The eventual construction of a production grid platform dedicated for global emergency response and crisis management is one of the project's goals. This work integrates advanced security research with higher education. The project enhances Internet and grid security, reduces the vulnerability of our society, and protects the global economy as a whole. The broader impacts are far reaching in science, education, business, and governments.
|
0.915 |
2004 — 2008 |
Prasanna, Viktor Krishnamachari, Bhaskar (co-PI) [⬀] |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Design Automation of Compute-Intensive Networked Embedded Systems @ University of Southern California
Networked embedded systems such as wireless sensor networks (WSNs) have the potential for revolutionizing data collection and analysis in physical sciences and other fields by allowing intelligent dense monitoring of the environment. State-of-the-art research in WSNs treats the problem of designing sensor network applications primarily as one of manual customization of low-level network protocols. The design complexity and required expertise make this approach insufficient for increasingly complex, compute-intensive distributed sensor systems.
There is a clear need for a new top-down methodology that automates a bulk of the low-level implementation aspects of design and allows the end user to focus on high level algorithm design and optimization.
Intellectual Merit: * Development of models and methodologies for design automation of compute-intensive sensor networks, with a focus on two WSN applications: (i) networked structural health monitoring (SHM) where a large-scale network of thousands of sensor and actuator devices embedded into a building or bridge is deployed to continuously monitor the structure, trigger alarms that identify the onset of damage, precisely pinpoint the location of damage and also provide a long-term history of ambient stresses imposed on the building, (ii) networked micro-climate monitoring (MCM) where a network of multi-modal sensors is deployed to provide information about climatic variables such as temperature, light, humidity, etc., in the operational environment (e.g. a wildlife reserve). *Application representation: A suitable model of computation (MoC), will be defined to capture the structure of computation and communication in the algorithms. *Virtual architectures: The virtual architecture (abstract machine model) for the target sensor networks will include a network model, a set of computation and communication primitives, cost functions, and middleware services. *Algorithms for design automation: Algorithms will be developed and middleware services used for in-network processing, such that the desired performance is achieved.. *Demonstration: The design automation methodology will be validated and demonstrated for the two target applications through simulation.
Broader Impacts: *The target applications are of great benefit to society: SHM networks will improve the safety of our civil infrastructure including roads, bridges and buildings; while the MCM networks will advance our scientific knowledge of the complex ecological interactions between organisms and their environment. *More broadly, the proposed methodology will facilitate the rapid design and synthesis for a wide range of compute-intensive sensor network applications. *The results of the proposed work will be disseminated on a timely basis to the research community and to industrial partners. *This project will build on their existing collaboration which includes co-advising of PhD students, and joint publications. *The proposed research will also provide educational material for an advanced graduate course on sensor networks at USC.
|
0.915 |
2005 — 2008 |
Singh, Manbir (co-PI) [⬀] Hwang, Kai (co-PI) [⬀] Leahy, Richard (co-PI) [⬀] Prasanna, Viktor Vashishta, Priya (co-PI) [⬀] |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Cri: Reconfigurable Computing Infrastructure For High End and Embedded Computing Applications @ University of Southern California
Abstract
Program: NSF 04-588 CISE Computing Research Infrastructure Title: CRI: Reconfigurable computing infrastructure for high end and embedded computing applications Proposal: CNS 0454407 PI: Prasanna, Viktor K. Institution: University of Southern California
The investigators will acquire a reconfigurable computer comprised of general purpose processors, field programmable gate arrays (FPGAs), a common memory, and an interconnect fabric joined under a programming model that works with all the parts. The acquisition of this machine will enable research at a realistic scale on actual reconfigurable machines for performance testing, validation, and applications demonstrations. This infrastructure will be robust enough to implement application "kernels" such as (e,g, an LU implementation or n-body simulation) that give realistic scale experimental results. Applications that will be explored include matrix operations, computational genomics, molecular dynamics, density functional theory, and finite element methods. The team will also be able to work on energy efficiency for embedded FPGAs. Broader impacts of this project include the potential impact on reconfigurable systems, use of FPGAs for applications, and discoveries in the applications areas. The investigators participate in USC's Minority Opportunities in Research (MORE) program.
|
0.915 |
2006 — 2010 |
Prasanna, Viktor Krishnamachari, Bhaskar (co-PI) [⬀] |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Nets-Noss: a Middleware Framework For Rapid Composition and Deployment of Compute-Intensive Networked Embedded Systems @ University of Southern California
Proposal Number: 0627028 Investigators: Viktor Prasanna (PI), Bhaskar Krishnamachari (Co-PI) Institution: University of Southern California Title: NeTS-NOSS: A Middleware Framework for Rapid Composition and Deployment of Compute-Intensive Networked Embedded Systems
Abstract
Wireless sensor networks (WSN) have the potential to revolutionize data collection and analysis in physical sciences and other fields by allowing intelligent dense monitoring of the environment. The primary focus of WSN research till now has been the design and implementation of the basic sensor node hardware and low-level protocols such as those for localization, time synchronization, medium access, routing, etc. However, the composition and deployment of a complex networked sensing application is still a daunting task for the non-expert end user.
This project involves the design and evaluation of reusable middleware functions that provide easy-to-program abstractions of the underlying hardware and network services, allowing rapid composition and deployment of sensor network applications. The topics addressed by our research include the development and evaluation of suitable topological abstractions, task mapping and migration techniques, communication and computational primitives, and realistic performance models to evaluate designs.
We expect the middleware techniques developed in this project to significantly enhance the state of the art in the design of sensor networks for complex applications. The abstractions and tools we develop will help establish a paradigm shift from the current dependence on application-specific customized solutions to a generalized automated approach that facilitates rapid design and ease of deployment for a wide range of applications.
The outcomes of the research will be disseminated in a timely basis through publications, presentations, and collaborations to the academic research community as well as to industry. The project will also have a significant educational impact, by supporting graduate student research, and providing material for a course on wireless sensor networks taught at USC.
|
0.915 |
2006 — 2009 |
Prasanna, Viktor |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Collaborative Research: Csr---Aes: a Framework For Optimizing Scientific Applications @ University of Southern California
Title: Collaborative Research: CSR---AES: A Framework for Optimizing Scientific Applications
The Design Optimizer for Scientific Applications (DOSA) framework allows the programmer or compiler writer to explore alternative designs and optimize for speed (or power) at design-time and use its run-time optimizer as an automatic application composition system (ACS) that constructs an efficient application that dynamically adapts to changes in the underlying execution environment based on the kernel model, architecture, system features, available resources, and performance feedback. DOSA allows design-time exploration and automatic run-time optimizations using continuous performance optimizations (CPO) so that application programmers and compiler writers are relieved from the challenging task of optimizing the computation in order to achieve high performance. As an illustration of the DOSA framework, one complex, full application is optimized for IBM Cell. The innovative performance optimization techniques for the memory hierarchy use new techniques for reducing I/O complexity, data layout, data remapping, and in-memory processing, and are supported by DOSA, the semi-automatic design framework and dynamic run-time system. This framework allows rapid, high-level performance estimation and detailed low-level simulation by incorporating high-level performance models into the model-integrated computing framework. The run-time system dynamically improves application performance using the component library, the models, and the run-time optimizer. The application studies are chosen for their broad impact to traditional and emerging scientific areas such as bioinformatics, computational biology, and medical applications, as well as for national security. The project especially encourages the participation by women, minorities, and underrepresented groups.
|
0.915 |
2007 — 2011 |
Prasanna, Viktor |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Parameterized and Tunable Linear Algebra Library For Fpga-Accelerated Systems @ University of Southern California
With the advances in technology, Field-Programmable Gate Arrays (FPGAs) have become an attractive choice for scientific computing. Indeed, several research groups as well as vendors are developing high performance computer systems which employ FPGAs for application acceleration. These hybrid systems integrate general-purpose processors, FPGAs, memory hierarchy consisting of SRAM and DRAM, and pose new design challenges in optimizing the overall performance. The challenges to achieving high performance include managing shared memory hierarchy, partitioning among multiple FPGAs, and hardware/software co-design between the general-purpose processors and the FPGAs.
This research develops a high performance linear algebra library for FPGA-accelerated systems. The operations considered include reduction of a series of floating-point values, data path synthesis using deeply pipelined FPUs, sparse matrix-vector multiplication, and dense matrix computations. These kernels are fundamental operations in many scientific applications. The library is parametrized using available configurable logic, on-chip memory (Block RAM), SRAM and its bandwidth, and DRAM bandwidth via interconnection network. Algorithmic exploration of hybrid computing platforms that consist of processors, reconfigurable logic and user controlled memory hierarchy are performed. These include: 1. Optimal algorithms to exploit memory hierarchy and reconfigurable logic, 2. Parameterized IP cores based on the design space characterized by available logic, SRAM and memory bandwidth, 3. Hardware/software partitioning to exploit the computational resources, 4. Synthesis of optimal data paths for arithmetic expression evaluation including reduction circuits, and 5. Demonstration on state of the art high end computing platforms from leading supercomputing vendors and research groups.
Comparison against highly optimized code developed for general purpose processors using well-defined benchmarks are performed using comparable architectural resources such as processor-memory bandwidth, memory and logic.
|
0.915 |
2010 — 2014 |
Prasanna, Viktor |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Dc: Small: Accelerating Large-Scale Pattern Matching For Data Intensive Applications @ University of Southern California
Pattern matching is a key function in many data intensive computing applications ranging from deep packet inspection, text processing, to genomic research. The explosive growth of digital information in the form of webpages, XML documents, network traffic and scientific data has put an enormous pressure on the performance requirements of large-scale pattern matching. This research will study the use of innovative algorithms and architectures on ASIC/FPGA and multi-core platforms to accelerate large-scale pattern matching for network security, data mining and filtering applications. Various types of pattern matching will be considered, including regular expression matching, dictionary-based string matching, and extended regular expression matching. The intellectual merit of this proposal includes the innovation in algorithms and architectures for matching large pattern sets against high bandwidth data input.
The proposed research will be conducted from two perspectives: (1) Novel algorithms and data structures for large-scale pattern matching; such as finite automata, dynamic search tree, formal language and graph theory. (2) Practical optimization techniques for pipelining, partitioning, scalable and modular designs on parallel architectures with ASICs/FPGAs, multi-core processors and general-purpose graphics processors (GPGPUs). Instead of producing heuristics specific to a particular input or pattern set, the proposed research aims to improve the fundamental understanding of large-scale pattern matching, and apply the understanding to both algorithmic and architectural innovations. This allows exploration of the design limits and tradeoffs in using practical optimizations on state-of-the-art computing platforms. The designs will be mapped onto parallel architectures based on both FPGA and multi-core technologies, including CPU-FPGA and CPU-GPU heterogeneous architectures.
|
0.915 |
2010 — 2011 |
Prasanna, Viktor Bader, David |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Workshop: Accelerators For Data Intensive Applications; a Workshop to Engage the Science and Engineering Community - Arlington, Va - Fall 2010 @ University of Southern California
We will hold a workshop in early Fall 2010 to promote, through collaboration, understanding of the issues that must be addressed to utilize accelerator-enhanced systems efficiently for solving challenging problems in several application domains. Working through these issues together will advance understanding in application domains and computer science. This will launch productive collaborations between computer scientists and application developers to enable efficient and early use of accelerator computing resources as they come online, and foster discussion of a potential Software Infrastructure Center organization. We will use application requirements from several important science and engineering disciplines to drive innovations in future computing technologies such as manycore processor architectures and application accelerators.
The workshop will promote meaningful information exchange between application developers and computer scientists and engineers. It will be structured as a tutorial: application scientists will describe their applications along with the computational challenges they present; computer scientists will describe their methods and tools for improving application performance and capability. Concrete actions will be identified at the workshop and included in the workshop report, such as collaborative development of a center-scale proposal for algorithm and model development and community software based on these abstractions. The objective is for investigators in both fields to become better positioned to use emerging application accelerators.
The workshop will facilitate an unprecedented exploration of important science problems at great scale and fidelity, enabling a broad set of science and engineering applications to productively employ parallelism and accelerators to tackle data- and compute-intensive problems. Collaborations between domain experts and computer scientists thus have the potential to improve U.S. economic competitiveness, improve understanding of emerging processor architectures, and keep the U.S. pre-eminent in this important and promising technology.
|
2011 — 2014 |
Prasanna, Viktor Rajagopal, Karthik |
Cic (Rddc) Parallelizing Large Scale Graph Problems On the Cloud @ University of Southern California
This research explores application development and optimizations for cloud platforms by developing: 1) cloud-based parallelization for data-intensive graph algorithms, 2) a framework for efficient scheduling and execution of applications in a heterogeneous cloud environment, and 3) hierarchical programming abstractions to specify parallelism. The work investigates and adapts a wealth of techniques from traditional parallel computing for graph problems, based on a performance model of the cloud, and explores strategies for scheduling and load balancing applications on the cloud. These include centralized and distributed approaches to scheduling, as well as work stealing and work sharing. Methodologies to evaluate the framework in executing applications that involve data-intensive graph computations are being developed. The broader impact of this project includes addressing key challenges in the areas of application mapping and performance optimization. The research makes developing data-intensive graph applications across public and private clouds easier. The developed software will be released as free and open source software to the community, making it possible for researchers and engineers in academia and industry to leverage this work and develop applications for the cloud. Graph problems and streaming applications arising in the area of energy informatics are considered to illustrate the techniques.
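The work-stealing and work-sharing strategies mentioned above can be illustrated with a minimal sequential simulation: each worker owns a double-ended queue, pops locally from one end, and an idle worker steals from the other end of a victim's queue. The `Worker` class and `run` loop below are hypothetical names for illustration and do not reflect the project's actual framework.

```python
from collections import deque
import random

class Worker:
    """Each worker owns a deque of tasks; local pops take the newest
    task (LIFO), while thieves take the oldest (FIFO)."""
    def __init__(self):
        self.tasks = deque()

def run(workers, rng):
    """Sequentially simulate one work-stealing round at a time until
    every deque is empty; returns the tasks in completion order."""
    done = []
    while any(w.tasks for w in workers):
        for w in workers:
            if w.tasks:
                done.append(w.tasks.pop())       # local LIFO pop
            else:                                # steal FIFO from a victim
                victims = [v for v in workers if v is not w and v.tasks]
                if victims:
                    w.tasks.append(rng.choice(victims).tasks.popleft())
    return done
```

Stealing from the opposite end of the deque is the standard design choice: it reduces contention with the owner and tends to move larger, older units of work.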
|
2011 — 2015 |
Prasanna, Viktor |
Shf: Small: Hardware-Software Co-Design For Next Generation Packet Forwarding Engines @ University of Southern California
The Internet backbone, including both core and edge routers, is becoming more flexible, scalable and programmable to enable future innovations in the next generation Internet. While the functionality of Internet routers evolves, performance remains a major concern for real-life deployment. Traditionally, core routers have been designed using throughput as a key performance metric. While the throughput requirements continue to grow, peak power and total energy dissipated have emerged as additional critical considerations in the design of core routers as well as other network equipment. Although ternary content addressable memories (TCAMs) have been widely used for packet forwarding, they have poor power efficiency. This work studies the use of low-power memory technology such as static random access memory (SRAM) combined with field-programmable gate arrays (FPGAs) / application-specific integrated circuits (ASICs) to develop high-throughput and power-efficient solutions for various packet forwarding engines including IP lookup, router virtualization, packet classification and flexible flow processing (e.g., OpenFlow). Packet forwarding engines in next generation routers and switches are designed using a hardware-software co-design framework. Based on this framework, novel architectures and algorithms are developed using power (including energy) as a key performance metric in addition to throughput. Specifically, to bridge the gap between software and hardware development, high-level power-performance models for hardware implementations of packet forwarding engines are developed and validated. These models facilitate design of various heuristics for power-efficient algorithms and architectures for virtualized IP lookup, multi-field packet classification and flexible flow processing. Instead of the highly popular TCAM-based solutions, this work focuses on SRAM-based parallel and pipeline architectures.
Novel techniques including partitioning, clock gating, power-aware data structure design, and power-aware load balancing are studied to simultaneously increase throughput and reduce power and/or energy dissipation.
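The trie-based longest-prefix matching that underlies SRAM-based pipelined IP lookup can be sketched in software: each trie level would map naturally to one SRAM pipeline stage. The `TrieLPM` class and `ip` helper below are illustrative names and model only the lookup logic, not the proposed hardware design.

```python
def ip(s):
    """Dotted-quad IPv4 string to a 32-bit integer."""
    a, b, c, d = map(int, s.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

class TrieLPM:
    """Binary trie for IPv4 longest-prefix matching; one bit per level."""
    def __init__(self):
        self.root = {}

    def insert(self, prefix, length, next_hop):
        """prefix: 32-bit int; length: prefix length in bits."""
        node = self.root
        for i in range(length):
            bit = (prefix >> (31 - i)) & 1
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, addr):
        """Return the next hop of the longest matching prefix, or None."""
        node, best = self.root, None
        for i in range(32):
            if "hop" in node:                 # remember best match so far
                best = node["hop"]
            bit = (addr >> (31 - i)) & 1
            if bit not in node:
                return best
            node = node[bit]
        return node.get("hop", best)
```

Because the lookup visits at most 32 nodes and each level can be placed in its own memory bank, one lookup can be issued per clock cycle in a pipelined realization.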
|
2012 — 2014 |
Prasanna, Viktor Simmhan, Yogesh |
Collaborative Research: Software Infrastructure For Accelerating Grand Challenge Science With Future Computing Platforms @ University of Southern California
Solving scientific grand challenges requires effective use of cyber infrastructure. Future computing platforms, including Field Programmable Gate Arrays (FPGAs), General Purpose Graphics Processing Units (GPGPUs), multi-core and multi-threaded processors, and Cloud computing platforms, can dramatically accelerate innovation to solve complex problems of societal importance when supported by a critical mass of sustainable software.
This project will organize scientific communities to help leverage the disruptive potential of future computing platforms through sustainable software. Grand challenge problems in biological science, social science, and security domains will be targeted based on their under-served needs and demonstrated possibilities. Users will be engaged through interdisciplinary workshops that bring together domain experts with software technologists with the goals of identifying core opportunity areas, determining critical software infrastructure, and discovering software sustainability challenges. The outcome will be an in-depth conceptual design for a Center for Sustainable Software on Future Computing Platforms, as part of the Software Infrastructure for Sustained Innovation (SI2) program. The design, scoped toward grand challenge problems, will identify common and specialized software infrastructure, research, development and outreach priorities, and coordination with the SSE and SSI components of the SI2 program. The interactions will offer a comprehensive understanding of grand challenges that best map to future computing platforms and the software infrastructure to best support scientists' needs. The workshops will enhance understanding of future platforms' potential for transformative research and lead to key insights into cross-cutting problems in leveraging their potential. Published results will help guide future research and reduce barriers to entry for under-represented groups.
|
2012 — 2015 |
Prasanna, Viktor Simmhan, Yogesh |
Us-India Workshop On Fostering Synergistic Collaborations to Accelerate Big Data Applications, December, 2012, Pune, India @ University of Southern California
This project supports a US-India workshop on fostering collaboration in the area of high performance computing, particularly as it relates to big data applications. The workshop will bring together participants from the US and India to identify promising areas of common interest and models for sustainable collaboration, including NSF's SAVI mechanism. Attendees will be chosen to represent three areas: Big Data Software Platforms, Accelerated Systems Infrastructure, and Scientific Applications of societal significance. The workshop will be co-located with the well-established HiPC conference, to be held in Pune, India, in December 2012. The participants from India will be supported through grants from the Department of Science and Technology, India.
|
2013 — 2017 |
Prasanna, Viktor |
Si2-Ssi: Collaborative: the Xscala Project: a Community Repository For Model-Driven Design and Tuning of Data-Intensive Applications For Extreme-Scale Accelerator-Based Systems @ University of Southern California
The increasing gap between processor and memory performance, referred to as the memory wall, has led high-performance computing vendors to design and incorporate new accelerators into their next-generation systems. Representative accelerators include reconfigurable hardware such as FPGAs, heterogeneous processors such as CPU+GPU processors, highly multicore and multithreaded processors, and manycore co-processors and general-purpose graphics processing units, among others. These accelerators contain myriad innovative architectural features, including explicit control of data motion, large-scale SIMD/vector processing, and multithreaded stream processing. Such features provide abundant opportunities for developers to achieve high performance for applications that were previously deemed hard to optimize. This project aims to develop tools that will assist developers in using hardware accelerators (co-processors) productively and effectively.
This project's specific technical focus is on data-intensive kernels, including large-dictionary string matching, dynamic programming, graph algorithms, and sparse matrix computations, that arise in the domains of biology, network security, and the social sciences. The project is developing XScala, a software framework for designing efficient accelerator kernels. The framework contains a variety of design-time and run-time performance optimization tools. The project concentrates on data-intensive kernels, which are bound by data movement. It proposes optimization techniques including (a) enhancing and exploiting maximal concurrency to hide data movement; (b) algorithmic reorganization to improve spatial and/or temporal locality; (c) data structure transformations to improve locality or reduce the size of the data (compressed structures); and (d) prefetching, among others. The project is also developing a public software repository and forum, called the XBazaar, for community-developed accelerator kernels. This project includes workshops, tutorials, and the PIs' classes and summer projects as various means by which to increase community involvement. The broader impacts include productive use of emerging classes of accelerator-augmented computer systems; creation of an open and accessible community repository, the XBazaar, for distributing accelerator-tuned computational kernels, software, and models; training of graduate and undergraduate students; and dissemination through publications, presentations at scientific meetings, lectures, workshops, and tutorials. The framework itself will be released as open-source code and as precompiled binaries for several common platforms, through the XBazaar, as an initial step toward building a community around accelerator kernels.
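As an example of the compressed-structure transformation in (c), the sketch below stores a sparse matrix in compressed sparse row (CSR) form and performs a matrix-vector product over it. This is a generic textbook kernel shown for illustration, not XScala code.

```python
def csr_from_dense(A):
    """Compress a dense row-major matrix into CSR form:
    (values, column indices, row pointers)."""
    vals, cols, ptrs = [], [], [0]
    for row in A:
        for j, x in enumerate(row):
            if x != 0:
                vals.append(x)
                cols.append(j)
        ptrs.append(len(vals))        # row r spans vals[ptrs[r]:ptrs[r+1]]
    return vals, cols, ptrs

def spmv(vals, cols, ptrs, x):
    """y = A @ x over the CSR layout; each row is an independent task,
    which is what makes the kernel easy to parallelize across cores."""
    y = []
    for r in range(len(ptrs) - 1):
        y.append(sum(vals[k] * x[cols[k]] for k in range(ptrs[r], ptrs[r + 1])))
    return y
```

The CSR layout touches only the nonzeros, so for a matrix with nnz nonzeros the kernel moves O(nnz) data instead of O(n^2), directly addressing the data-movement bound discussed above.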
|
2013 — 2017 |
Prasanna, Viktor |
Shf: Small: High-Performance Data Plane Kernels For Software Defined Networking @ University of Southern California
As the use of elastic cloud architectures and mobile computing systems increases, the need for an additional layer of software-based networking arises. The research community has proposed open standards to specify network services without coupling specifications with network interfaces, referred to as Software Defined Networking (SDN). Such a software layer can improve the performance of network routers and switches. As network security demands increase, aggregation of traffic from diverse applications, including high performance computing, will further require novel solutions for the data plane.
This project explores hardware- as well as software-based solutions to optimize the SDN data plane with respect to latency, throughput, and power efficiency. The work investigates novel algorithms, data structures, and architectures that exploit state-of-the-art technologies, including heterogeneous multi-processor system-on-chip architectures, multi/many-core processors, and Field Programmable Gate Arrays, to realize flexible designs for data plane kernels and understand performance tradeoffs. Novel solutions based on hashing and data structures for large-scale IP lookup, as well as parallel solutions for multi-field packet classification, will be developed to support high performance. The work also develops new techniques for network virtualization and data aggregation using hybrid trees and virtual engines to achieve high performance on various platforms. The broader impact of the project includes providing a flexible and scalable solution for a high performance Internet backbone to support next generation networking.
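A toy model of hash-based flow processing in an SDN data plane: packets are classified by an exact-match lookup on their 5-tuple, falling back to a table-miss default action. The class and field names below are illustrative assumptions, not part of the project's software or the OpenFlow specification.

```python
def flow_key(src, dst, sport, dport, proto):
    """Canonical 5-tuple key for an exact-match flow table."""
    return (src, dst, sport, dport, proto)

class FlowTable:
    """Exact-match flow table backed by a hash map, modeling the
    hash-based lookup stage of a software data plane."""
    def __init__(self, default_action="send_to_controller"):
        self.table = {}
        self.default = default_action     # table-miss behavior

    def add_flow(self, key, action):
        self.table[key] = action

    def classify(self, pkt):
        """pkt: dict with the 5-tuple fields; returns the matched action."""
        key = flow_key(pkt["src"], pkt["dst"], pkt["sport"],
                       pkt["dport"], pkt["proto"])
        return self.table.get(key, self.default)
```

An exact-match hash lookup is O(1) per packet but cannot express wildcard rules; multi-field classification with ranges and wildcards, as targeted by this project, requires the richer data structures discussed above.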
|
2013 — 2015 |
Swenson, Michelle Simmhan, Yogesh Prasanna, Viktor |
Accelerating Graph Analytics On Clouds For Genome Assembly @ University of Southern California
Graph analytics form a canonical Big Data problem that is of significant value to the long tail of science, from social sciences to genomics. While graph algorithms for big-memory machines abound, they are inaccessible to the wider community. Developing appropriate abstractions for graph applications on distributed cyber-infrastructure like Clouds and commodity clusters has been challenging. This work explores a subgraph-centric approach which offers the potential for an order-of-magnitude performance benefit. This work investigates graph algorithms, focused on de novo plant genome sequencing, that use a scalable subgraph-centric graph programming model for Clouds. It offers a novel research direction that can profoundly impact next-generation genome sequencing in addition to other domains where graph abstractions can be employed. It catalyzes research into distributed graph analytics through a critical mass of subgraph-centric algorithms, mitigating the lost opportunity cost in delayed adoption of the technology and domain specific computing abstractions. In the process, it will fundamentally advance scalable graph processing to rapidly accelerate and democratize cyber-infrastructure for Big Data for next generation sequencing.
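De novo genome assembly is commonly cast as a graph problem over a de Bruijn graph, in which nodes are (k-1)-mers and each k-mer in a read contributes an edge. The few lines below are a generic illustration of that graph abstraction, not the project's subgraph-centric implementation.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a de Bruijn graph from sequencing reads: nodes are
    (k-1)-mers, and every k-mer adds a directed edge between its
    prefix and suffix (k-1)-mers. Repeated k-mers yield parallel
    edges, preserving multiplicity information."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph
```

Assembly then reduces to finding paths (ideally an Eulerian path) through this graph; for plant genomes the graph has billions of edges, which is why distributed, subgraph-centric processing matters.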
|
2014 — 2015 |
Prasanna, Viktor Panangadan, Anand |
Ieee Ipdps Conference Student Participation Support @ University of Southern California
Title: Student Travel Support for 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS)
This award will support student travel to the 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS). The conference will be held in Hyderabad, India, in May 2015. IPDPS is an international forum for engineers and scientists to present their latest research findings in all aspects of parallel computation. Supporting student travel to attend professional conferences and workshops is a very important mission of the NSF. The broader significance and importance includes fostering the next generation of researchers in this research area, as well as providing international experiences to build a globally-aware workforce. In particular, students will have the opportunity to learn state-of-the-art methodologies, be exposed to novel techniques, and interact with senior researchers in their areas of expertise.
|
2016 — 2018 |
Prasanna, Viktor Tehrani, Arash |
Eager: Safer Connected Communities Through Integrated Data-Driven Modeling, Learning, and Optimization @ University of Southern California
Crime is a major problem in many urban communities. This project focuses on developing a framework for increased security and crime prevention in crime-prone environments by identifying and integrating hitherto disaggregated heterogeneous data and analyzing the causal and spatio-temporal interconnections between constituent parts of a connected community, including environmental aspects (e.g., traffic, lighting, poverty levels, proximity of businesses such as banks/ATMs), crime history, and social events. While existing crime prediction and prevention methods focus on the location of crimes to detect "hot-zones", this project takes a fundamentally different, data-driven approach towards integrated multi-scale data analytics for identifying the characteristics and features of crime-prone environments. This high-risk, high-payoff research is based on real-time crime data and interactions with crime prevention and safety agencies. By revealing the connections between crime and environmental, social, and economic factors, this research aims to demonstrate the critical need for an integrated systems approach to crime prevention, instead of focusing on post-crisis management.
This interdisciplinary endeavor of developing computational methods for crime prevention across public urban landscapes requires the combination of data mining and statistical methods in space and time to extract useful features and discover models from passive data sets. The proposed project will develop 1) new tools for the fundamental understanding of criminal behavior by analyzing the time-varying and location-specific systems and patterns observed as a result of complex processes between interacting cyber-physical entities, and 2) scalable data-driven nowcasting algorithms for crime prediction that will adapt to the constantly evolving state of criminal activity by continuously learning from a rich set of spatial and demographic features, including traffic, spatial attributes, socio-economic characteristics of neighborhoods, and current time, as well as context. To enable continuous forecasting over streaming data, while maintaining high prediction accuracy and low time complexity, the project will develop and train crime prediction artificial neural networks (CANN) for prediction across space and time. The output of the proposed data-driven models will feed a novel multi-objective optimization formulation that will be used for the integrated optimization of personnel positioning, patrol scheduling and safest-route calculation. The resulting decision support environment will be transferred to the USC Department of Public Safety (DPS), the Los Angeles Police Department (LAPD), and the South Park Business Improvement District (SPBID) for integration with their systems to enable decision makers to choose the best course of action at any given time.
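As a toy stand-in for the continuous learning described above, the sketch below maintains an exponentially weighted incident rate per grid cell and predicts from it. The `GridNowcaster` name and its update rule are illustrative assumptions, far simpler than the CANN approach proposed here.

```python
class GridNowcaster:
    """Toy spatio-temporal nowcaster: one exponentially weighted
    moving average of recent incident counts per grid cell.
    Illustrative only; a stand-in for a learned prediction model."""
    def __init__(self, alpha=0.5):
        self.alpha = alpha   # weight on the newest observation
        self.rate = {}       # cell -> smoothed incident rate

    def update(self, counts):
        """counts: dict mapping cell -> incidents observed this period."""
        for cell, c in counts.items():
            prev = self.rate.get(cell, 0.0)
            self.rate[cell] = self.alpha * c + (1 - self.alpha) * prev

    def predict(self, cell):
        """Nowcast for the next period; unseen cells default to 0."""
        return self.rate.get(cell, 0.0)
```

Even this trivial model captures the streaming requirement stated above: each update is O(cells observed), so forecasts stay current as data arrives.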
This project will lead to the development of technology for crime prevention that will be directly applicable to smart and connected communities across the US, with the potential to bring together white- and blue-collar residents from mixed urban communities: college campus residents, off-campus neighborhood residents and businesses with their employees, transiting commuters, and law enforcement, under the theme of making the communities quantifiably more secure. The project will leverage the USC Living Laboratory, a unique "city within a city" campus and its adjacent neighborhoods, as a real-world use case of a connected community of interrelated infrastructures.
|
2016 — 2019 |
Prasanna, Viktor Chelmis, Charalampos (co-PI) [⬀] |
Cns: Csr: Small: Exploiting 3d Memory For Energy-Efficient Memory-Driven Computing @ University of Southern California
Semiconductor technology is facing fundamental physical limits, creating increased demand for acceleration of data-intensive applications on architectures that bring memory much closer to reconfigurable compute logic. Three-dimensional integrated circuits (3DICs) appear to be the most prominent technology for memory-driven computing, by enabling large amounts of memory stacked in layers to be accessed by a logic unit through high-bandwidth vertical interconnects. Software-defined technologies can provide the framework for harnessing the potential breakthrough performance of 3D and other advanced memory technologies in a holistic but dynamic manner, while at the same time hiding their internal complexity. This project focuses on developing a novel software paradigm to perform algorithmic exploration of memory-driven computing on new memory architectures and facilitate the development of massively parallel algorithms for memory-unconstrained computing with the potential for breakthrough performance levels.
The project will develop Software-Defined 3D Memory (SD3DM) as a transformative layer for memory-driven computing that will not simply virtualize 3D memory but will holistically address the oncoming reality of massive on-chip 3D memory for accelerating data-intensive applications while jointly optimizing energy consumption. Memory access optimizations will be developed at the algorithm level to meet application performance objectives of throughput, latency, and energy efficiency. Specifically, the optimizations will be designed to fully exploit the characteristics of target architectures by (i) carefully defining application-specific dynamic data layouts, (ii) developing application-specific memory controllers for runtime support, and (iii) designing novel in-memory data permutation mechanisms to accelerate inter-stage communication. Integer Linear Programming (ILP) and Stochastic Programming (SP) based dynamic data layouts that exploit the interlayer pipelining and parallel vault access features of 3D memory for throughput- and energy-optimal mapping of data to different memory components will be developed. Data layout algorithms will be developed in conjunction with application-specific memory controllers to provide maximum pipeline execution efficiency for any given application.
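The flavor of such a data layout optimization can be conveyed by a brute-force toy: assign data blocks to memory vaults to minimize total access energy under a per-vault capacity constraint. This exhaustive search is a stand-in for the ILP/SP formulations described above, and all names and cost figures are hypothetical.

```python
from itertools import product

def optimal_layout(n_blocks, freq, vault_cost, cap):
    """Assign blocks to vaults minimizing sum(freq[b] * vault_cost[v])
    subject to each vault holding at most `cap` blocks.
    Exhaustive search: only viable for tiny instances; an ILP solver
    would handle realistic sizes."""
    best, best_cost = None, float("inf")
    for assign in product(range(len(vault_cost)), repeat=n_blocks):
        if any(assign.count(v) > cap for v in range(len(vault_cost))):
            continue                       # capacity violated
        c = sum(freq[b] * vault_cost[assign[b]] for b in range(n_blocks))
        if c < best_cost:
            best, best_cost = assign, c
    return best, best_cost
```

The objective rewards placing frequently accessed (hot) blocks in the lowest-energy vaults, which is the intuition behind the throughput- and energy-optimal mappings mentioned above.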
The proposed optimizations will be demonstrated on widely used signal processing and machine learning algorithms with diverse data access and logic use requirements. Successful completion of this project will directly lead to a significant increase in the size of signal processing and machine learning problems that can be solved on emerging 3DIC platforms at speeds that were not possible before. The developed work will potentially influence multiple application domains. The investigators will encourage the participation by women, minorities, and under-represented groups in the project through USC's Minority Opportunities in Research (MORE) Programs.
|
2019 — 2022 |
Prasanna, Viktor Kuppannagari, Sanmukh Rao |
Oac Core: Small: Scalable Graph Analytics On Emerging Cloud Infrastructure @ University of Southern California
Graphs are powerful tools for representing real-world networked data in a wide range of scientific and engineering domains. For example, graphs are used to represent people and their interactions in social networks, proteins and their functionality in biological networks, and landmarks and roads in transportation networks. Understanding graph properties and deriving hidden information by performing analytics on graphs at extreme scale is critical for the progress of science across multiple domains and for solving impactful real-world problems. Cloud platforms have been adopted to perform extreme-scale graph analytics. This has led to an exponential increase in workloads, while at the same time the rate of performance improvement of cloud platforms has slowed down. To address this, cloud platforms are being augmented with accelerators. However, the expertise required to realize high performance from such accelerator-enhanced cloud platforms will limit their accessibility to the broader scientific and engineering community. To address this issue, this project will research and develop a toolkit to provide Graph Analytics as a Service, enabling researchers to easily perform extreme-scale graph analytics workflows on accelerator-enhanced cloud platforms. This will significantly increase the productivity of researchers, as i) researchers will avoid the steep learning curve of developing parallel implementations of graph analytics algorithms, and ii) the increased size and scale of graph analytics will allow researchers to analyze significantly larger datasets at reduced latency, thereby enriching the quality of the domain research. Moreover, the techniques developed in this project will also be applicable to performing streaming graph analytics at the "edge" for applications such as autonomous vehicles and smart infrastructure.
The toolkit is expected to be used in many engineering and science disciplines including power systems engineering, network biology, preventive healthcare, smart infrastructure, etc. The research conducted in this project will also constitute materials appropriate for inclusion in graduate and undergraduate courses.
The project will research and develop high performance graph analytics algorithms and software for key graph workflows and kernels spanning multiple scientific and engineering domains. The target platform will be accelerator-enhanced cloud platforms consisting of emerging node architectures comprising multi-core processors, Field Programmable Gate Arrays (FPGAs) and high bandwidth memory (HBM) with a cache-coherent interface. An integrated optimization framework consisting of memory optimizations and partitioning and mapping techniques will be developed to exploit the heterogeneity of the target platforms. Specifically, techniques for optimal memory data layout and integrated optimizations for cloud execution will be developed to realize scalable performance on accelerator-enhanced cloud platforms. The memory data layout optimization seeks to fully exploit the high bandwidth provided by HBM by ensuring data reuse for a broad class of graph analytics problems. The proposed software will ensure seamless parallel processing of the entire graph on a single heterogeneous node architecture as well as on cloud platforms with multiple heterogeneous nodes. The integrated optimization framework will be developed into a scalable, deployable, robust Cyber Infrastructure (CI) toolkit to provide Graph Analytics as a Service (GAaaS). The framework will be developed using state-of-the-art heterogeneous platforms. By accelerating graph analytics workflows on cloud platforms, this project will enable researchers to perform extremely large-scale graph analytics workflows, which are key components of many scientific and engineering domains.
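A minimal sketch of the partitioning step in such a framework: greedily assigning vertices, along with their edge lists, to the currently lightest partition so that edge counts are balanced across nodes. This illustrates only the basic idea; the function name is hypothetical and the project's integrated optimization framework is far more involved.

```python
def partition_by_edges(adj, n_parts):
    """Greedy 1-D partitioning of an adjacency-list graph: place each
    vertex (with its edges) on the partition with the lowest current
    edge load. Heaviest vertices are placed first for better balance."""
    loads = [0] * n_parts
    parts = [[] for _ in range(n_parts)]
    for v in sorted(adj, key=lambda v: -len(adj[v])):
        p = loads.index(min(loads))   # lightest partition so far
        parts[p].append(v)
        loads[p] += len(adj[v])
    return parts, loads
```

Balancing edges rather than vertices matters for graph analytics because per-vertex work is typically proportional to degree; a vertex-balanced split of a skewed graph would leave one node doing most of the work.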
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
2019 — 2022 |
Prasanna, Viktor Qian, Xuehai [⬀] |
Spx: Collaborative Research: Fastleap: Fpga Based Compact Deep Learning Platform @ University of Southern California
With the rise of artificial intelligence in recent years, Deep Neural Networks (DNNs) have been widely used because of their high accuracy, excellent scalability, and self-adaptiveness. Many applications employ DNNs as the core technology, such as face detection, speech recognition, and scene parsing. To meet the high accuracy requirements of various applications, DNN models are becoming deeper and larger, and are evolving at a fast pace. They are computation- and memory-intensive and pose serious challenges to the conventional Von Neumann architecture used in computing. The key problem addressed by the project is how to accelerate deep learning, not only inference, but also training and model compression, which have not received enough attention in prior research. This endeavor has the potential to enable the design of fast and energy-efficient deep learning systems, applications of which are found in our daily lives, ranging from autonomous driving, through mobile devices, to IoT systems, thus benefiting society at large.
The outcome of this project is FASTLEAP, a Field Programmable Gate Array (FPGA)-based platform for accelerating deep learning. The platform takes a dataset as input and outputs a model that is trained, pruned, and mapped onto the FPGA, optimized for fast inference. The project will utilize emerging FPGA technologies that have access to High Bandwidth Memory (HBM) and consist of floating-point DSP units. From a vertical perspective, FASTLEAP integrates innovations from multiple levels of the whole system stack: algorithm, architecture, and efficient FPGA hardware implementation. From a horizontal perspective, it embraces systematic DNN model compression and associated FPGA-based training, as well as FPGA-based inference acceleration of compressed DNN models. The platform will be delivered as a complete solution, with both the software tool chain and the hardware implementation, to ensure ease of use. At the algorithm level of FASTLEAP, the proposed Alternating Direction Method of Multipliers for Neural Networks (ADMM-NN) framework will perform unified weight pruning and quantization, given training data, target accuracy, and target FPGA platform characteristics (performance models, inter-accelerator communication). The training procedure in ADMM-NN is performed on a platform with multiple FPGA accelerators, dictated by the architecture-level optimizations on communication and parallelism. Finally, the optimized FPGA inference design is generated based on the trained DNN model with compression, accounting for FPGA performance modeling. The project will address the following SPX research areas: 1) Algorithms: Bridging the gap between deep learning developments in theory and their system implementations, cognizant of the performance model of the platform. 2) Applications: Scaling of deep learning for domains such as image processing.
3) Architecture and Systems: Automatic generation of deep learning designs on FPGA optimizing area, energy-efficiency, latency, and throughput.
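As an illustration of the ADMM-style pruning idea mentioned above (a minimal sketch only, not the project's implementation), weight pruning can be cast as alternating a gradient step on the loss plus a quadratic penalty with a Euclidean projection onto a sparsity constraint; the toy least-squares loss and all parameter values below are hypothetical:

```python
import numpy as np

def project_topk(w, k):
    """Euclidean projection onto {w : at most k nonzeros}: keep the
    k largest-magnitude entries, zero out the rest."""
    z = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    z[idx] = w[idx]
    return z

def admm_prune(grad_f, w, k, rho=1.0, lr=0.01, steps=300):
    """Alternate a gradient step on f(w) + (rho/2)||w - z + u||^2
    with a projection (z-update) and a dual update (u)."""
    z, u = project_topk(w, k), np.zeros_like(w)
    for _ in range(steps):
        w = w - lr * (grad_f(w) + rho * (w - z + u))  # w-update
        z = project_topk(w + u, k)                    # z-update: projection
        u = u + w - z                                 # dual ascent
    return z  # z satisfies the sparsity constraint exactly

# Hypothetical toy loss: f(w) = 0.5 * ||A w - b||^2
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
w_pruned = admm_prune(lambda w: A.T @ (A @ w - b), np.zeros(10), k=3)
print(np.count_nonzero(w_pruned))  # at most 3 weights survive pruning
```

In the full framework, the same alternating structure also handles quantization, with the projection step replaced by rounding onto a quantization grid.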
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.915 |
2020 — 2021 |
Prasanna, Viktor Srivastava, Ajitesh |
N/A
Rapid: Recover: Accurate Predictions and Resource Allocation For Covid-19 Epidemic Response @ University of Southern California
The recent outbreak of COVID-19 and its world-wide impact call for urgent measures to contain the epidemic. Predicting the speed and severity of infectious diseases like COVID-19 and allocating medical resources appropriately is central to dealing with epidemics. Epidemics like COVID-19 not only affect world-wide health, but also have profound economic and social impact. Containing the epidemic, providing informed predictions, and preventing future epidemics are essential for the global population to resume day-to-day work and travel without fear. Shortage of resources puts undue stress on the healthcare system, further risking the health of the community. Preparedness and better management of available resources require specific predictions at the level of cities and counties around the world, rather than solely at the level of countries. The project will provide a predictive understanding of the spread of the virus by developing machine learning based computational models to study the transmission of the virus and evaluate the impact of various interventions on disease spread. The project will learn infection prediction models for COVID-19 considering the following. (i) Predicting at the state/county/city level rather than the country level, as finer granularity is essential in planning and managing resources. (ii) How infectious a person is changes over time; learning the model from observed data will help in understanding the temporal nature of the virality. (iii) At such granularity, travel is a significant driver of the spread and needs to be accounted for. (iv) Available data needs to be "corrected" by estimating the number of underlying unreported cases that are not observed and yet influence the epidemic dynamics. The project will also solve the resource allocation problem based on the predictions:
for instance, if a certain number of masks will be available next week in a certain state, how should they be distributed across the different hospitals in that state (which hospitals, and how many masks to each)?
The proposed project, ReCOVER, will use a novel fine-grained, heterogeneous infection rate model to perform predictions at various granularities (hospitals/airports, city, state, country) while accounting for human mobility. ReCOVER will integrate data from various sources to build highly accurate models for predicting the epidemic across the world at various granularities. Due to its ability to capture temporal heterogeneity in the infection rate, the approach has the potential to provide insights into the infectious nature of COVID-19, which is not yet fully understood. The project will address the issue of unreported cases through temporal analysis of historical infections and correct the data accordingly. The right granularities of modeling will be identified automatically, e.g., when to model a state rather than its cities to trade off precision for higher reliability in predictions. The proposed project also formulates and solves a resource allocation problem that can guide the response to contain the epidemic and prevent future outbreaks. This is provided by optimal solutions to resource allocation over a network where each node (representing a region) has a function that captures its probabilistic response. While the project obtains data with COVID-19 in consideration, the model and algorithms developed under the project are applicable to a wide class of contagious diseases. The project will culminate in an interactive, customizable tool that can be used to perform predictions and resource management by a qualified user, such as a government entity tasked with managing the epidemic response. The data and code will also be shared with the research community.
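As a rough illustration of the kind of model described (a sketch under simplifying assumptions, not the ReCOVER model itself), a discrete-time compartmental simulation can combine a per-region, time-varying infection rate with a mobility matrix; the two-region populations, rates, and mobility values below are hypothetical:

```python
import numpy as np

def simulate(S0, I0, beta, gamma, M, steps):
    """Discrete-time SIR-style dynamics over regions. beta[t] is a
    per-region, time-varying infection rate (temporal heterogeneity);
    M[i, j] is the fraction of region j's infectious pressure felt in
    region i (a hypothetical mobility parameterization)."""
    S, I = S0.astype(float), I0.astype(float)
    N = S0 + I0
    history = [I.copy()]
    for t in range(steps):
        exposure = M @ I                             # infections imported via travel
        new_inf = np.minimum(beta[t] * S * exposure / N, S)
        S = S - new_inf
        I = I + new_inf - gamma * I                  # recoveries leave I
        history.append(I.copy())
    return np.array(history)

# Two hypothetical regions; region 1 starts disease-free
S0, I0 = np.array([990.0, 1000.0]), np.array([10.0, 0.0])
beta = np.full((30, 2), 0.3)                         # constant rate for the demo
M = np.array([[0.95, 0.05], [0.05, 0.95]])
curve = simulate(S0, I0, beta, gamma=0.1, M=M, steps=30)
print(curve[-1])  # travel coupling seeds the initially disease-free region
```

In the actual project, beta would be learned from observed case data (after correcting for unreported cases) rather than fixed in advance.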
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.915 |
2020 — 2023 |
Prasanna, Viktor Kuppannagari, Sanmukh Rao |
N/A
Cns Core: Small: Accelrite: Accelerating Reinforcement Learning Based Ai At the Edge Using Fpgas @ University of Southern California
Artificial Intelligence (AI) has led to significant progress in several domains such as self-driving cars and robotics. Reinforcement Learning (RL) is a class of AI algorithms that enable machines to teach themselves optimal decision making. However, RL algorithms are complex and time-consuming, which renders them unsuitable for applications that require fast response. Heterogeneous platforms, which couple a Central Processing Unit (CPU) with a reconfigurable integrated circuit, the Field Programmable Gate Array (FPGA), are promising candidates for implementing fast algorithms. The project will develop fast implementations of RL algorithms targeting such platforms. The intellectual merits of the project include the research and development of innovative optimizations that exploit the heterogeneity of the emerging class of FPGA devices and address challenges such as conflicts in parallel accesses to shared objects, irregular memory accesses, and overheads in fine-grained acceleration. The project will develop parameterized performance models for key AI kernels, namely Stochastic Gradient Descent (SGD), conjugate gradient, parallel hash tables, and neural networks, to enable energy-performance trade-off analysis. The proposed project will develop a novel spatiotemporal constraint graph-based design space exploration technique to accelerate RL algorithms by taking a holistic view of the algorithm.
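For readers unfamiliar with RL, a minimal example of an agent "teaching itself" optimal decision making is tabular Q-learning; the toy chain environment below is hypothetical and unrelated to the project's target workloads:

```python
import numpy as np

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a hypothetical chain MDP: action 1 moves
    right, action 0 moves left; reaching the last state pays reward 1."""
    rng = np.random.default_rng(0)
    Q = np.zeros((n_states, 2))
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # temporal-difference update toward the Bellman target
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q

Q = q_learning()
print(np.argmax(Q, axis=1)[:-1])  # learned policy for non-terminal states
```

Even this toy loop shows the acceleration challenge: the Q-table is a shared object updated at every step, and real workloads replace it with neural networks trained via SGD.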
The broader impact of the project is in efficient use of heterogeneous architectures consisting of CPUs and FPGAs coupled with cache coherent memory for accelerating AI for edge computing. Successful completion of this project will lead to a significant increase in the complexity of AI applications that can be deployed in real-world environments. This will lead to a dramatic improvement in the capabilities of AI enabled devices such as self-driving cars, robotics, and wearable healthcare devices. The project will also produce materials appropriate for inclusion in graduate and undergraduate courses.
All software developed in the project will be posted on github at: https://github.com/pgroupATusc. Software releases will be maintained for a period of not less than 3 years after the conclusion of the grant.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.915 |
2021 — 2024 |
Kuppannagari, Sanmukh Rao Prasanna, Viktor |
N/A
Satc: Core: Small: Accelerating Privacy Preserving Deep Learning For Real-Time Secure Applications @ University of Southern California
Currently, to draw insights from data, the owner must send the data to a cloud server to perform complex Machine Learning based analytics. To maintain data security in transit, the data is encrypted by the owner and sent to the cloud server, where it is decrypted to perform analytics. For privacy-sensitive applications such as healthcare and finance, this raises data security concerns, as the decrypted data on the cloud may be snooped on by malicious actors. To address this concern, this proposal will develop techniques to efficiently perform Machine Learning (ML) analytics on encrypted data, without the need for decryption, thereby enabling end-to-end privacy.
The proposed project will develop optimizations targeting Field Programmable Gate Arrays (FPGAs) to address challenges such as conflicts in parallel accesses to shared objects, irregular memory accesses, and low data reuse, which are prevalent in many application domains. Moreover, the parameterized FPGA Intellectual Property (IP) cores for the key kernels of privacy preserving Deep Neural Networks (DNNs), such as the Number Theoretic Transform (NTT), rotation, and multiplication, that will be developed in the project will allow application developers to easily implement a wide variety of privacy preserving Machine Learning/Deep Learning models. Additionally, the proposed acceleration techniques are applicable to applications that rely on post-quantum lattice-based cryptography.
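To illustrate the NTT kernel mentioned above (a naive reference version for clarity, not an accelerator design), the transform is a DFT over the integers modulo a prime, evaluated at powers of a root of unity; the tiny modulus and length below are chosen only so the example is easy to check by hand:

```python
def ntt(a, root, mod):
    """Naive O(n^2) Number Theoretic Transform. Hardware implementations
    use the O(n log n) butterfly form, but the math is the same."""
    n = len(a)
    return [sum(a[j] * pow(root, i * j, mod) for j in range(n)) % mod
            for i in range(n)]

def intt(A, root, mod):
    """Inverse NTT: transform with the inverse root, then scale by n^-1."""
    n = len(A)
    inv_root = pow(root, mod - 2, mod)   # modular inverse via Fermat's little theorem
    inv_n = pow(n, mod - 2, mod)
    return [(x * inv_n) % mod for x in ntt(A, inv_root, mod)]

# mod 17: 4 is a primitive 4th root of unity (4^2 = 16 ≡ -1, 4^4 ≡ 1)
a = [1, 2, 3, 4]
A = ntt(a, root=4, mod=17)
print(intt(A, root=4, mod=17))  # → [1, 2, 3, 4]
```

The NTT matters for homomorphic encryption because it turns polynomial multiplication of ciphertexts into cheap pointwise multiplication, which is why it dominates the kernels targeted for FPGA acceleration.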
The broader impact of this work is in efficient use of emerging data center and cloud platforms for accelerating Homomorphic Encryption (HE) based DNNs for real-time secure applications. Successful completion of this project will lead to a significant increase in the capabilities of privacy sensitive applications by enabling them to utilize public clouds in a trusted and secure manner. The project will identify and expose underrepresented and underserved students to STEM (Science, Technology, Engineering, Mathematics) through various programs at the University of Southern California. The proposed research will also produce materials appropriate for inclusion in graduate and undergraduate courses.
All software developed in the project will be posted on github at: https://github.com/pgroupATusc. Software releases will be maintained for a period of not less than 3 years after the conclusion of the grant.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.915 |
2021 — 2022 |
Kuppannagari, Sanmukh Rao Prasanna, Viktor Qian, Xuehai (co-PI)
N/A
Collaborative Research: Pposs: Planning: Streamware - a Scalable Framework For Accelerating Streaming Data Science @ University of Southern California
In grand-challenge scientific applications, the enormous amount of data produced by the sensing and instrumentation infrastructure often loses its value after a small window of time. Thus, to obtain actionable intelligence from the data, streaming analytics, i.e., the ability to analyze in-motion data, is increasingly becoming critical. Moreover, modern computing systems are highly heterogeneous, consisting of processors, accelerators, and large high-bandwidth external memories. To develop scalable streaming analytics applications, challenges across the full system stack -- from application to target platform -- need to be addressed. In this regard, this planning project is identifying a comprehensive set of research challenges, goals, key innovations and timelines in algorithms and applications, systems software, hardware-software co-design, and computer architecture. This project is bringing together a community of application developers and users, computer scientists, and data scientists, whose interests lie in building streaming data science applications targeting a wide variety of scalable systems. This project is demonstrating preliminary results on how it will achieve significant cross-stack performance improvements using Privacy Preserving Streaming Graph Learning for Secure Smart Grids as the driving application.
Modern data-science applications are characterized as being highly decentralized and distributed, requiring composition and orchestration between localized analytics on thousands or millions of edge platforms and massive centralized analytics in cloud/data centers, as well as requiring real-time analytics on streaming data. To enable scalable performance of grand-challenge streaming data-science applications, a framework that allows developers to seamlessly build these applications targeting a wide variety of scalable systems is needed. This planning project is conducting preliminary research towards a large proposal for developing an open-source framework, StreamWare, that will enable users to develop streaming data-science applications. This project is establishing a community of application developers and users, computer scientists, and data scientists who would serve as early adopters and developers of the StreamWare framework. In consultation with domain experts, a list of key data-science kernels for StreamWare is being generated, and their existing state-of-the-art algorithms and hardware IPs are being evaluated to identify performance limitations and opportunities for improvement. This project is also articulating the requirements of novel abstractions that can represent and operate on streaming data on heterogeneous platforms. This project uses Privacy Preserving Streaming Graph Learning for Secure Smart Grids as a motivating application to show preliminary evidence of end-to-end scalability using a novel notion of symbiotic scalability that captures the impact of StreamWare's cross-layer optimizations. The expected outcomes of this planning project include a proposal for the research activities to be carried out in the large grant, publications on the results of the survey activities and future research directions for enabling streaming data science, and curricula for future graduate and undergraduate courses.
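As a minimal illustration of analytics on in-motion data (a generic sketch, not part of the planned StreamWare framework), a sliding-window aggregate can be maintained in constant time per element, since a stream can never be buffered in full; the sensor readings below are hypothetical:

```python
from collections import deque

def windowed_mean(stream, size):
    """Sliding-window mean over in-motion data: a running sum makes
    each update O(1), with O(size) memory, so the full stream never
    needs to be materialized."""
    window, total = deque(), 0.0
    for x in stream:
        window.append(x)
        total += x
        if len(window) > size:
            total -= window.popleft()   # evict the oldest element
        yield total / len(window)

readings = [10, 12, 11, 50, 13, 12]     # hypothetical sensor stream
out = list(windowed_mean(readings, size=3))
print(out)  # the spike at 50 is smeared across three consecutive windows
```

Real streaming frameworks generalize this pattern to distributed operators over partitioned streams, which is where the cross-stack scalability challenges described above arise.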
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.915 |
2022 — 2025 |
Prasanna, Viktor |
N/A
Oac Core: Scalable Graph Ml On Distributed Heterogeneous Systems @ University of Southern California
Methods that employ graph machine learning (Graph ML), which is a sub-discipline within machine learning that deals with graph data, are becoming important in many key science and engineering domains. For example, the predictive power of graph embedding has been effectively utilized in domains such as social media, biology, pharmacology, and knowledge understanding. However, such methods typically come with an expensive computational footprint, as the computations often need to be performed in real time on very large and highly heterogeneous static and dynamic graphs with billions of vertices and edges of different types. This project aims at conducting multi-pronged research to enable the creation of a cyberinfrastructure (CI) toolkit to run such complex Graph ML applications on emerging heterogeneous distributed systems.
The objective of this project is to develop high-performance Graph ML algorithms for key graph workflows spanning multiple scientific and engineering domains, targeting distributed heterogeneous systems composed of multi-core processors, Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs), accelerators, and high bandwidth memory interconnected with cache coherent interfaces. The project develops a scalable, deployable, and robust CI toolkit consisting of: (1) novel graph sampling algorithms and efficient Graph ML models for low complexity training and inference computation on static and dynamic graphs; (2) a heterogeneity-aware hardware mapping methodology to accelerate these algorithms and models; and (3) software and hardware libraries for automatic design generation. The project develops proof of concept software for the ML and Data Science communities to facilitate end-to-end deployment of various large-scale applications.
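As an illustration of the graph sampling idea (a sketch under simplifying assumptions, not the project's algorithms), GraphSAGE-style layer-wise neighbor sampling bounds minibatch cost by drawing at most a fixed fanout of neighbors per vertex per hop, regardless of the full graph's size; the small graph below is hypothetical:

```python
import random

def sample_neighbors(adj, seeds, fanout, depth, seed=0):
    """GraphSAGE-style neighbor sampling: starting from the seed
    (training) vertices, sample at most `fanout` neighbors per vertex
    for `depth` hops, producing one vertex set per layer."""
    rng = random.Random(seed)
    layers = [sorted(seeds)]
    frontier = set(seeds)
    for _ in range(depth):
        nxt = set()
        for v in frontier:
            nbrs = adj.get(v, [])
            nxt.update(rng.sample(nbrs, min(fanout, len(nbrs))))
        layers.append(sorted(nxt))
        frontier = nxt
    return layers

# Small adjacency list (hypothetical graph)
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0, 4], 4: [1, 3]}
layers = sample_neighbors(adj, seeds=[0], fanout=2, depth=2)
print(layers)  # seed layer plus two sampled hops
```

Bounding the per-hop fanout is what keeps training cost independent of the billion-edge scale of the target graphs, and the irregular, random accesses into `adj` are precisely what the hardware mapping methodology must optimize.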
Given that graph neural networks are increasingly becoming an important tool for analyzing data in many diverse domains, the outcomes of this project will have a strong impact across a broad range of disciplines, including domains that rely on edge computing, such as autonomous vehicles and smart cities. The project has a robust plan to integrate research into education programs and focuses on activities that promote the involvement of students from minority and economically disadvantaged backgrounds in the research.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.915 |