1988 — 1994 |
Li, Kai |
Shared Virtual Memory Systems For Large-Scale Multiprocessors
This project continues research on shared virtual memory for multicomputers and extends it to fault-tolerant, real-time, parallel systems based on shared virtual memory. Such a system will be called persistent shared virtual memory. The system research builds on the principal investigator's previous work: shared virtual memory for multicomputers, concurrent real-time methods for checkpointing, garbage collection, and main-memory transaction processing. Design methods for persistent memory management for a large class of multicomputers are studied, and a persistent shared virtual memory system for the Intel iPSC/2 and iPSC/i860 multicomputers will be implemented.
|
1 |
1995 — 1998 |
Li, Kai Clark, Douglas [⬀] |
Architectural and Organizational Issues For High-Performance Uniprocessors
Caches and write buffers are becoming increasingly important to memory hierarchy designs since they are the key components for bridging the widening performance gap between microprocessors and DRAMs. This research studies the details of memory-referencing behavior in contemporary uniprocessors, such as multiple referencing behavior. In particular, the research investigates the fine distinctions that characterize data references with respect to their cache and write-buffer behavior, and also studies possible architectural and organizational enhancements that will benefit writes.
|
1 |
1995 — 2000 |
Lipton, Richard Li, Kai Felten, Edward (co-PI) [⬀] Clark, Douglas (co-PI) [⬀] Martonosi, Margaret (co-PI) [⬀] |
SHRIMP: Architectural and Systems Support For Inexpensive, High-Performance Multicomputers
This project is building a high-performance multiprocessor from commodity desktop computer systems and off-the-shelf interconnects. Commercial Intel Pentium workstation boards, each with attached memory, disk, and I/O, are attached to a Paragon backplane. Communication uses a new mechanism called virtual memory-mapped communication, which disguises interprocessor communication as write operations to memory. The node interface maps physical pages in the memories of individual nodes to each other, so that a write to one mapped page results in messages to other nodes that share the mapped page. The operating systems on the individual nodes use their ordinary virtual memory mechanism to support virtual page mapping. In addition to this word-by-word communication, DMA transfers are available, with control registers located in the address space of individual processes. This allows high-bandwidth communication that maintains user-level protection. Research to be addressed in the project includes the achievement of high-bandwidth, low-latency communication between processes, the structure of an I/O system supported by the new communication mechanism, and performance evaluation of the resulting system.
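The page-mapping idea described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Node` class, `map_page`, and `write` are hypothetical names invented for illustration, and a direct assignment stands in for the message that the real network interface would send.

```python
# Toy model of memory-mapped communication: a store to a mapped page is
# forwarded to every node that shares that page. All names are hypothetical.
class Node:
    def __init__(self, name, num_pages=4, page_size=8):
        self.name = name
        self.memory = [[0] * page_size for _ in range(num_pages)]
        # local page number -> list of (remote node, remote page number)
        self.exports = {}

    def map_page(self, local_page, remote, remote_page):
        """Record that writes to local_page must also reach remote_page."""
        self.exports.setdefault(local_page, []).append((remote, remote_page))

    def write(self, page, offset, value):
        """An ordinary store; the simulated interface propagates it."""
        self.memory[page][offset] = value
        for remote, remote_page in self.exports.get(page, []):
            # Stands in for the message sent to the sharing node.
            remote.memory[remote_page][offset] = value


a, b = Node("a"), Node("b")
a.map_page(0, b, 2)   # a's page 0 is mapped onto b's page 2
a.write(0, 3, 42)     # looks like a plain memory write on node a
a.write(1, 0, 7)      # page 1 is not mapped, so nothing is forwarded
```

After the two writes, the value stored at offset 3 appears in page 2 of node `b`, while the write to the unmapped page stays local.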
|
1 |
1998 — 2003 |
Li, Kai Singh, Jaswinder [⬀] |
Experimental Software Systems: Integrated Applications and Systems Research in Software Shared Memory
9806751 Singh, Jaswinder Pal Li, Kai Princeton University
A coherent shared address space (SAS) is an attractive programming model for parallel computing. It is emerging as commercially successful when implemented in hardware on tightly coupled multiprocessors. With less tightly coupled clusters of workstations or symmetric multiprocessors becoming important platforms, it is important to support this model in software on clusters as well. Otherwise, the need to run applications on both types of platforms may drive the more difficult explicit message passing model to dominate. This research develops and evaluates software SAS systems on clusters of various scales and organizations. The research is distinguished by being highly application-driven, including collaboration with application scientists. It examines protocols/systems and applications simultaneously without treating either as fixed. Protocols and systems are enhanced based on bottlenecks encountered in real applications, the role of hardware support is examined, and different software approaches are compared. The focus is on both programming ease and performance, and includes comparing the SAS and message passing models on tightly-coupled systems and clusters. The research will result in new, scalable software SAS systems, a greater integration of applications and systems research that is now necessary for major advances, and a better understanding of the tradeoffs in programming models across platforms.
|
1 |
1999 — 2003 |
Peterson, Larry [⬀] Li, Kai Felten, Edward (co-PI) [⬀] |
General Purpose Routers For the Next Generation Internet
There has been tremendous pressure over the last several years to push functionality from end hosts onto network routers. Whether one calls the resulting systems routers, application gateways, active networks, proxies, or even topology-aware servers, a general trend is clear: the logic that decides how to process packets has grown more and more complex over time. The project contends that the potential for this trend to continue is almost unlimited, which raises the question: what are the important properties for routers in the next generation Internet? The project's answer is a new router architecture, called a general purpose router (GPR), that supports arbitrarily complex forwarding logic. The GPR architecture has six unique features:
Performance: Provides the throughput required by the next generation Internet.
Extensible: Easily extended to support new forwarding functions without compromising performance.
Scalable: Scales to relatively large sizes, on the order of a hundred Gbps ports.
Open: The hardware and software should be open so anyone can build or extend a router.
Commodity Components: Implemented using commercially available components.
Robustness: Robust enough to tolerate programming mistakes and malicious attacks.
The bottom line is that the project recognizes a need for routers to move from being closed, special-purpose network devices to being open, general-purpose computing/communication systems. The central challenge in making this shift is to simultaneously support increasingly complex forwarding logic and high performance, while using commercial hardware components and commercial operating systems. The GPR architecture achieves this through two key innovations.
Better integration of the router's switching capacity and compute cycles. The project expects this to result in significantly better scaling properties, and an order of magnitude improvement in performance for packets that require only minimum processing cycles.
A hierarchy of paths through the router, ranging from fast/fixed paths implemented entirely in hardware to slow/programmable paths implemented entirely in software, but also including intermediate paths that exploit the improved integration of cycles and switching.
In addition to implementing the GPR architecture (and solving the configuration, scheduling, and resource management problems that doing so will entail), the project will design and implement several novel applications:
Edge routers that transition between different assumption regions of the Internet. Of particular note, the project will develop router functionality for deeply nested networks that include thin devices (e.g., embedded systems and low-power devices). The router needs to subsume some of the responsibility usually taken by the end node.
A scalable display system that consists of an array of parallel display processors (and associated frame buffers), each of which is responsible for some region of a wall-sized display. The router that serves as a front end to this array (i.e., connects it to a graphics source) must fragment packets containing graphics directives and forward each fragment to the correct processor.
An internal firewall that implements enclaves and protects hosts within a site from each other. Unlike a firewall that sits at the edge of a site, such a router must authenticate users, enforce access control, log usage, and implement intrusion detection.
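For the scalable display application listed above, the fragmentation step can be sketched as follows. This is a minimal illustration, assuming a regular grid of equally sized tiles and a graphics directive that is just a filled rectangle in wall coordinates; `fragment_rect` and its parameters are hypothetical and are not taken from the project.

```python
def fragment_rect(rect, tile_w, tile_h, cols, rows):
    """Split rect = (x0, y0, x1, y1), in wall pixels with x1/y1 exclusive,
    into per-tile pieces. Returns {(col, row): clipped rect} in wall
    coordinates, with one entry per display processor that must draw."""
    x0, y0, x1, y1 = rect
    fragments = {}
    for row in range(rows):
        for col in range(cols):
            # Bounds of the region owned by this display processor.
            tx0, ty0 = col * tile_w, row * tile_h
            tx1, ty1 = tx0 + tile_w, ty0 + tile_h
            # Clip the directive to the tile; keep non-empty overlaps only.
            cx0, cy0 = max(x0, tx0), max(y0, ty0)
            cx1, cy1 = min(x1, tx1), min(y1, ty1)
            if cx0 < cx1 and cy0 < cy1:
                fragments[(col, row)] = (cx0, cy0, cx1, cy1)
    return fragments


# A rectangle straddling the corner shared by four 1024x768 tiles.
frags = fragment_rect((900, 700, 1100, 800), tile_w=1024, tile_h=768, cols=2, rows=2)
```

Each key in `frags` identifies the processor that should receive the corresponding clipped fragment; a rectangle that falls entirely inside one tile yields a single entry.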
|
1 |
1999 — 2003 |
Li, Kai Singh, Jaswinder (co-PI) [⬀] Funkhouser, Thomas (co-PI) [⬀] |
Next Generation Software: Adaptive, Performance-Portable Software For Next-Generation and Immersive Applications
EIA-9975011 Princeton University Kai Li
A new generation of applications is becoming very important for high-performance computing, including collaborative design, interactive walkthroughs and large data visualization, and telepresence. They require tremendous resources including CPU, memory, storage, and audio/visual devices, and they have substantially different characteristics, performance goals and system interactions than traditional scientific applications. For example, they have extremely irregular and unpredictable data access needs and workload distributions, they interact more dynamically and with many more types of input/output sensors and devices, they involve dynamic user interaction and steering, and their goal is to deliver the best possible quality at a fixed output refresh rate rather than a solution of fixed quality in the minimum possible time. As computer architectures become more complex, it becomes increasingly difficult to develop such applications to achieve the desired performance. Three properties are critical: (i) high performance for rich interactive behavior, (ii) adaptability and isolation in all layers (i.e., the complexity and unpredictability demand that each layer of application or system software adapt to the layers above and below it, through performance modeling and through runtime feedback and adaptation, and try to shield the neighboring layers from each other's complexity), and (iii) performance portability across component upgrades and across the different major types of platforms that may be used in such environments. Our goal is to develop the software building blocks, runtime systems and design methodologies to assist such application development.
|
1 |
2001 — 2007 |
Dobkin, David (co-PI) [⬀] Peterson, Larry [⬀] Li, Kai Felten, Edward (co-PI) [⬀] Martonosi, Margaret (co-PI) [⬀] |
CISE Research Infrastructure: CISE Pervasive Computing: Applications and Systems
EIA-0101247 David P. Dobkin Princeton University
We are entering a new era in computing, the era of ubiquitous computing. In this world, our classrooms, labs, offices, and homes will be filled with a diverse collection of sensor, display and computing devices. Ubiquitous and pervasive displays will revolutionize the way we use computers.
In such an environment, the conventional view of the network as providing bit-pipes between clients and servers will no longer be appropriate. Many of the devices available in the environment will have limited computational capabilities and be connected by limited-capacity networks. So, we need an intelligent network that will be implemented by a collection of servers and programmable routers that overlay the physical network substrate.
The award is to build a research infrastructure consisting of three components. At the "edge" of the system will be a variety of display technologies and sensors. At the "core" of the system will be an intelligent network using commodity PCs and emerging network processors. Underlying everything will be commodity wired and wireless networks to provide connectivity among the edge devices and nodes in the intelligent network. This network will augment the CS Department's current network, which already includes both wired and wireless components.
|
1 |
2003 — 2006 |
Li, Kai Wang, Randolph [⬀] |
ITR: Ubiquitous Mobile Storage
One consequence of the phenomenal storage density improvement is the emergence of highly compact disk storage that can be integrated into various computing and networking devices of various shapes and forms. Our conjecture is that mobile storage will become a dominant form of storage in the near future, especially for personal user data, subsuming conventional disks enshrined in server rooms. This proposal describes a project that studies how to build, manage, and use discrete storage devices to form ad hoc, distributed storage systems.
In this project, we propose to build system software to intelligently coordinate the discrete storage elements. The system has four core mechanisms: (1) a multicast-like data location mechanism, (2) an invalidation mechanism for purging obsolete data from the system, (3) a snapshot mechanism for supporting sharing and backup, and (4) a storage level solution that can support existing file systems. In this proposal, we describe how combinations of these four core mechanisms allow us to achieve our consistency, transparency, reliability, security, and performance goals.
Furthermore, the data management needs addressed by this project are by no means limited to traditional desktop applications. As the data management functionalities are separated from cumbersome generic computing devices, and as these functionalities are cleanly encapsulated in modular small form factor devices that can readily interact with other consumer electronic devices (such as cameras, MP3 players, phones, and email devices), these application-specific devices would be freed from the burden of having to solve and re-solve a difficult mobile storage problem, and we may multiply the utility of these devices and potentially foster new applications.
|
1 |
2004 — 2008 |
Li, Kai Funkhouser, Thomas (co-PI) [⬀] Rusinkiewicz, Szymon (co-PI) [⬀] Troyanskaya, Olga (co-PI) [⬀] |
NGS: Software Tools For New-Generation, Display-Centric Applications
The goal of this research project is to develop new software tools and applications for scalable display systems. The primary focus is on methods that coordinate multiple displays, multiple users, and multiple applications to enable true display-centric computing. For coordinating multiple displays, the project will develop dynamic feedback to build adaptive layered multi-resolution display systems and will study how to achieve integrated, continuous calibration capable of delivering high-quality information display. For coordinating multiple users, software tools that manage information display intelligently and securely for seamless exchange of visual information will be developed. For coordinating multiple applications, the project will study how to design an adaptive infrastructure that enables multiple applications to share a scalable display efficiently.
|
1 |
2005 — 2010 |
Peh, Li-Shiuan (co-PI) [⬀] Li, Kai Martonosi, Margaret (co-PI) [⬀] August, David (co-PI) [⬀] |
CSR--EHS: Flow-Based Computer Systems Support For Synergistic Hardware-Software Management of Embedded Systems
Today's embedded systems have been designed in an ad hoc manner, each system re-designed from scratch to handle new system and software requirements. As requirements for embedded systems are changing rapidly, a key challenge is to develop general design methodologies that can scale to new VLSI technologies such as multiple cores on a billion-transistor embedded chip, new power-performance targets, and new-generation software systems.
This research proposes a flow-based embedded system that focuses on an execution model based on flows and a corresponding embedded system platform based on the flow execution model. In a flow-based embedded system, the hardware dynamically adapts to (1) heterogeneity in an embedded system and (2) energy constraints while ensuring (3) real-time deadlines are met, and (4) the software is shielded from all the above hardware complexities through the flow execution model, and is thus (5) portable across hardware generations. Flows indicate all potential partition points in an application; thus they expose points that allow the systems software (and supporting hardware) to dynamically adapt the actual partitioning or parallelism in the face of real-time deadlines, energy and reliability constraints, and heterogeneity. The scope of the project includes investigating flow-parallelizing compiling techniques that automatically extract flows from sequential code, novel hardware mechanisms that ensure low-overhead dynamic execution adaptation, and lightweight OS support for the flow model across a range of embedded applications.
|
1 |
2005 — 2009 |
Charikar, Moses (co-PI) [⬀] Li, Kai Cook, Perry (co-PI) [⬀] Troyanskaya, Olga (co-PI) [⬀] |
CSR-PDOS-Content-Searchable Storage For Feature-Rich Data
Storage capacity and data volume have been doubling every 18 months during the past two decades. A key challenge in building next-generation storage systems is to manage massive amounts of feature-rich (non-text) data, which has dominated the increasing volume of digital information. Comparing noisy, feature-rich data requires fast similarity match instead of exact match, and thus exploring such data requires similarity search instead of exact search. Current file systems are designed for named text files; they do not have mechanisms to manage feature-rich data. To date, there is no practical storage system with the ability to do similarity search for noisy, high-dimensional data, and there is no index engine design for efficient similarity search. This research addresses this problem by studying how to design and implement a content-addressable and -searchable storage (CASS) system to manage and explore diverse feature-rich data. The system includes a built-in similarity search engine for general-purpose, noisy, high-dimensional metadata using compact data structures and novel indexing methods. The research will also develop segmentation methods and feature extraction methods for audio, image and genomic data, and develop similarity search benchmarks to evaluate the CASS system.
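To make the contrast between exact match and similarity match concrete, here is a small sketch of one well-known way to compare high-dimensional feature vectors with a compact data structure: random-hyperplane bit sketches compared by Hamming distance. The abstract does not say which data structures or indexing methods the CASS system uses, so the technique and every name below (`make_planes`, `sketch`, `hamming`, `nearest`) are assumptions chosen only for illustration.

```python
import random


def make_planes(dim, bits, seed=0):
    """Draw `bits` random hyperplanes in `dim` dimensions (fixed seed)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]


def sketch(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two sketches."""
    return bin(a ^ b).count("1")


def nearest(query, items, planes):
    """Name of the stored item whose sketch is closest to the query's."""
    q = sketch(query, planes)
    return min(items, key=lambda name: hamming(q, sketch(items[name], planes)))


planes = make_planes(dim=4, bits=64)
items = {"same": [1.0, 0.0, 0.0, 0.0], "opposite": [-1.0, 0.0, 0.0, 0.0]}
best = nearest([1.0, 0.0, 0.0, 0.0], items, planes)
```

Vectors separated by a small angle tend to agree on most bits, so a small Hamming distance between sketches serves as a cheap proxy for similarity; a linear scan is used here for brevity where a real system would use an index.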
This research will advance knowledge and understanding in the area of storage system designs such as data structures, mechanisms, and APIs for managing, searching and exploring noisy, high-dimensional feature-rich data. The research will accelerate the development of next-generation storage systems which will revolutionize how to access, search, explore and manage massive amounts of feature-rich data in many disciplines.
|
1 |
2007 — 2009 |
Peh, Li-Shiuan (co-PI) [⬀] Li, Kai |
U.S. Student Aid For Attending the 35th Annual International Symposium On Computer Architecture
This award helps U.S. students attend the ACM/IEEE 35th Annual International Symposium on Computer Architecture, which will be held in Beijing, China, in June 2008. The ACM/IEEE Annual International Symposium on Computer Architecture (ISCA) is the flagship conference in computer architecture, and its conference proceedings are viewed as the most prestigious publication venue in the computer architecture community. The symposium series started in 1973 and has been sponsored by the two most important academic organizations in computer architecture, the Special Interest Group on Computer Architecture (SIGARCH) of the Association for Computing Machinery (ACM) and the Technical Committee on Computer Architecture (TCCA) of the Institute of Electrical and Electronics Engineers (IEEE). The majority of historically high-impact publications on computer architecture have been published in this conference. The 35th symposium is significant because it will be the first ISCA meeting held in China since the series started. This award will assist deserving students to attend and present their work at the meeting.
|
1 |
2009 — 2010 |
Ostriker, Jeremiah (co-PI) [⬀] Li, Kai August, David [⬀] |
SGER: A Hybrid Approach For Petascale Computing: Accelerating Scientific
Intellectual Merit: The proposed work is an exploratory research effort to automatically extract parallelism from sequential programs and to schedule the resulting fine-grained computational elements on many-core processors. The goal is to allow existing sequential programs to run on many-core processors efficiently and to build the foundation for a hybrid approach, involving message passing and shared memory, to address petascale programmability. This exploratory research will attack the following issues:
Design a compiler to decompose the code running on a single node into fine-grained computation tasks to utilize the collection of cores on a single chip.
Develop a highly efficient runtime system to schedule fine-grained tasks to optimize for available parallelism and to maximize on-chip cache locality to overcome off-chip memory latency and bandwidth constraints.
Evaluate our success with a newly released benchmark suite, PARSEC, which allows us to compare our results with hand-tuned parallel solutions. We also plan to evaluate one computational science application.
Broader Impact: The potential impact of this project is significant. First, the success of the proposed research would advance knowledge and understanding in parallel programming to exploit the power of future parallel machines. Second, the success of the project will accelerate software development for petascale computing. Third, the proposed compiler and runtime systems will provide the capability to run existing large-scale computational science programs on petascale computers without burdensome programming efforts.
|
1 |
2012 — 2015 |
Li, Kai Norman, Kenneth (co-PI) [⬀] Turk-Browne, Nicholas (co-PI) [⬀] Lee, Ray Cohen, Jonathan [⬀] |
MRI: Acquisition of High Performance Compute Cluster For Multivariate Real-Time and Whole-Brain Correlation Analysis of fMRI Data
This Major Research Instrumentation award permits Dr. Jonathan Cohen and four co-investigators to purchase high-performance computing instrumentation (3,584 cores; 2TB/core; 100TB flash storage) to be used by faculty, postdocs, graduate students and undergraduates within the Princeton Neuroscience Institute (PNI). The instrumentation will allow the analysis of human brain imaging data at a speed and scale not previously possible.
The collaborating researchers are cognitive neuroscientists and computer scientists at Princeton with complementary expertise in human brain imaging and large-scale computing. Two primary research objectives are proposed, building on recent progress in applying multivariate pattern analysis (MVPA) methods from machine learning to detect neural signals that correspond to internal mental states, such as perceptions, memories and intentions, that are otherwise not accessible to direct observation. To date, use of MVPA has been restricted to "offline" analyses after data have been fully collected. However, a growing and powerful use of brain imaging is to give participants feedback about their brain states in real time, allowing them to use this information to better control brain function (e.g., providing feedback about pain areas as a way of learning to control chronic pain). Such real-time feedback methods could be greatly enhanced by adding MVPA. However, this has been computationally intractable until now. Objective 1 addresses this challenge by inserting a high-performance computing system into the brain scanning pipeline. This will be tested in an experiment that uses MVPA to detect patterns of brain activity associated with sustained attention, allowing us to provide real-time brain-based feedback to improve attentional abilities (with potential educational and health benefits).
Objective 2 focuses on another major advance in brain imaging, in which correlations between areas of activity are analyzed, rather than areas of activity in isolation from one another. Such correlations, often referred to as "functional connectivity", are likely to reveal more about how the brain actually functions by providing critical information about the interactions between areas. At present, virtually all approaches to functional connectivity focus on the correlations among a limited set of brain areas of interest. However, a more powerful approach would be to examine the correlation of every area with all others. This requires computing the whole-brain correlation matrix. The analysis of such high-dimensional data would be further enhanced by applying MVPA to patterns of correlation. However, doing this further increases computational demands. Applying this approach to a routine brain imaging dataset, using currently available instrumentation, would take 880 years to complete. The work under Objective 2 addresses this challenge by coupling massively parallel computing with sophisticated software optimizations. Doing so can bring previously intractable problems into the range of practicality. These methods will be tested in an experiment that seeks to identify neural representations of intentions, and their influence on brain mechanisms responsible for executing these intentions.
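The whole-brain correlation matrix mentioned above can be illustrated on a tiny scale. The sketch below is not the project's software: it is a plain, unoptimized Pearson correlation over a few made-up time series, with hypothetical function names, intended only to show why the cost grows with the number of area pairs.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def correlation_matrix(series):
    """Correlate every series with every other; the matrix is symmetric."""
    n = len(series)
    corr = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            corr[i][j] = corr[j][i] = pearson(series[i], series[j])
    return corr


def num_pairs(n):
    """Distinct pairs among n areas: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Three made-up "areas", each with four time points.
series = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
corr = correlation_matrix(series)
```

Because the number of pairs grows quadratically with the number of areas, a whole-brain analysis has far more correlations to compute than one restricted to a few areas of interest, which is the motivation for the parallel hardware.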
|
1 |