1994 — 1996
August, David A
F31 Activity Code Description: To provide predoctoral individuals with supervised research training in specified health and health-related areas leading toward the research degree (e.g., Ph.D.).
Neurally Plausible Methods for Encoding Analog Signals @ University of Virginia, Charlottesville
2000 — 2003
Clark, Douglas (co-PI); August, David
ITR: Collaborative Research--Ascertaining Runtime Branch Characteristics Through Algebraic Analysis of Programs
The goal of this research is to reduce the cost and improve the performance of observation-based branch characterization mechanisms in the compiler and hardware. Often, correlation discovered at great cost or missed entirely through execution can be determined in a simple algebraic fashion at compile time. Relationships between program structures can be inferred from these algebraic expressions and subsequently conveyed to compiler optimizations and to the hardware through appropriate mechanisms to be developed by this research. Once employed, these relationships can refocus the efforts expended by observation-based mechanisms, or can eliminate the need for them altogether.
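The award abstract does not include a worked example, but the kind of relationship it describes can be sketched in a few lines of C (illustrative only, not from the award): if one branch condition algebraically implies another, a compiler can establish their correlation symbolically at compile time instead of observing it at run time.

```c
/* Minimal illustration: two branches whose correlation follows
 * algebraically, with no profiling required. If (x > 100) is taken,
 * then (x > 0) must also be taken, because x > 100 implies x > 0.
 * A compiler that manipulates branch conditions symbolically can
 * convey this relationship to later optimizations or to hardware
 * instead of rediscovering it from execution counts. */
#include <stdio.h>

int classify(int x) {
    int penalty = 0;
    if (x > 100)        /* branch A */
        penalty += 10;
    if (x > 0)          /* branch B: algebraically implied by branch A */
        penalty += 1;
    return penalty;
}

int main(void) {
    printf("%d %d %d\n", classify(150), classify(5), classify(-3));  /* 11 1 0 */
    return 0;
}
```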
2002 — 2007
August, David
CAREER: Systematic Design Space Exploration
This project involves the research, development, and dissemination of techniques and tools necessary to perform rigorous, systematic processor design-space exploration. The key element is a framework in which architectural simulators and targeted compilers are automatically derived from a common architectural description, providing architects, microarchitects, and compiler developers with a dynamic, yet coherent, environment within which to design processor systems. Such a system would enhance productivity and design quality by tightening the loop between architectural decision-making and realistic performance feedback and by allowing for more productive collaboration among the members of a prototyping team. The resulting reduction in design cycle times will counter increasingly restrictive market pressures, safeguarding the practicality of designing both new general-purpose microprocessors and application-specific processors (ASIPs), such as those found in networking hardware, cellular telephones, and next-generation digital devices. Additionally, this research will provide invaluable experiences for both undergraduate and graduate students in computer architecture and compilers. This same system is an ideal mechanism by which students can interact with tangible examples of computer architecture and compiler concepts -- it will allow them to rapidly prototype systems, experiment with new ideas, and thereby build intuition about computer systems.
2005 — 2010
Peh, Li-Shiuan (co-PI); Li, Kai (co-PI); Martonosi, Margaret (co-PI); August, David
CSR--EHS: Flow-Based Computer Systems Support for Synergistic Hardware-Software Management of Embedded Systems
Today's embedded systems have been designed in an ad hoc manner, each system re-designed from scratch to handle new system and software requirements. As requirements for embedded systems are changing rapidly, a key challenge is to develop general design methodologies that can scale to new VLSI technologies such as multiple cores on a billion-transistor embedded chip, new power-performance targets, and new-generation software systems.
This research proposes a flow-based embedded system: an execution model based on flows together with a corresponding embedded system platform built on that model. In a flow-based embedded system, the hardware dynamically adapts to (1) heterogeneity in the embedded system and (2) energy constraints while ensuring that (3) real-time deadlines are met; (4) the software is shielded from all of these hardware complexities through the flow execution model and is thus (5) portable across hardware generations. Flows indicate all potential partition points in an application; they thereby expose points that allow the system software (and supporting hardware) to dynamically adapt the actual partitioning or parallelism in the face of real-time deadlines, energy and reliability constraints, and heterogeneity. The scope of the project includes investigating flow-parallelizing compilation techniques that automatically extract flows from sequential code, novel hardware mechanisms that ensure low-overhead dynamic execution adaptation, and lightweight OS support for the flow model across a range of embedded applications.
2006 — 2009
August, David
CSR--EHS: Software-Modulated Fault Tolerance
Microprocessor performance has been increasing exponentially due in large part to smaller and faster transistors enabled by improved fabrication technology. While such transistors yield performance enhancements, their lower threshold voltages and tighter noise margins make them less reliable, rendering processors that use them more susceptible to transient faults. While many fault-tolerance techniques have been proposed for high-end systems, the high hardware costs of these solutions make them impractical for the desktop and embedded computing markets.
This work develops the concept of software-modulated fault tolerance (SMFT) to reduce the cost of reliability by taking advantage of naturally occurring non-uniformity in programs. By letting the system, the programmer, or even the user decide when and how to apply protection, the impact of fault tolerance can be adapted to best suit the needs of the constantly varying system. By increasing reliability only when warranted, SMFT frees up resources to either increase performance or reduce power. With the development of a set of profiler, compiler, and language techniques, this work allows designers to continue scaling processor performance for all markets despite the presence of transient faults.
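The abstract leaves the protection mechanisms unspecified; the sketch below is a hedged illustration only, showing the flavor of software-only protection applied selectively. A hypothetical "critical" computation is duplicated and the two results compared, while non-critical work runs unprotected; a real compiler-based scheme would duplicate at the instruction and register level rather than at the source level.

```c
/* Hedged sketch of selective, software-only fault detection (illustration,
 * not the project's mechanism): the critical computation runs twice and the
 * results are compared before the value is used, while non-critical work is
 * left unprotected to save time and energy. */
#include <stdio.h>
#include <stdlib.h>

static long critical_sum(const long *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}

/* Protected wrapper: redundant execution plus comparison.
 * A mismatch indicates a transient fault affected one copy. */
static long critical_sum_protected(const long *a, int n) {
    long r1 = critical_sum(a, n);
    long r2 = critical_sum(a, n);   /* redundant copy */
    if (r1 != r2) {
        fprintf(stderr, "transient fault detected\n");
        exit(EXIT_FAILURE);         /* or trigger recovery */
    }
    return r1;
}

int main(void) {
    long a[4] = {1, 2, 3, 4};
    printf("%ld\n", critical_sum_protected(a, 4));  /* protected path   */
    printf("%d\n", 40 + 2);                         /* unprotected path */
    return 0;
}
```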
2006 — 2011
Appel, Andrew (co-PI); Clark, Douglas (co-PI); Martonosi, Margaret (co-PI); August, David; Walker, David
CT: Well-Typed Trustworthy Computing in the Presence of Transient Faults
In recent decades, microprocessor performance has been increasing exponentially, due in large part to smaller and faster transistors enabled by improved fabrication technology. While such transistors yield performance enhancements, their lower threshold voltages and tighter noise margins make them less reliable, rendering processors that use them more susceptible to transient faults caused by energetic particles striking the chip. Such faults can corrupt computations, crash computers, and cause heavy economic damage. Indeed, Sun Microsystems, Cypress Semiconductor, and Hewlett-Packard have all recently acknowledged massive failures at client sites due to transient faults.
This project addresses several basic scientific questions: How does one build software systems that operate on faulty hardware, yet provide ironclad reliability guarantees? For what fault models can these guarantees be provided? Can one prove that a given implementation does indeed tolerate all faults described by the model? Driven in part by the answers to these scientific questions, this project will produce a trustworthy, flexible, and efficient computing platform that tolerates transient faults. The multidisciplinary project team will do this by developing: (1) programming language-level reliability specifications so consumers can dictate the level of reliability they need, (2) reliability-preserving compilation and optimization techniques that improve the performance of reliable code while ensuring correctness, (3) automatic, machine-level verifiers so compiler-generated code can be proven reliable, (4) new software-modulated fault tolerance techniques at the hardware/software boundary to implement the reliability specifications, and finally (5) microarchitectural optimizations that explore trade-offs between reliability, performance, power, and cost.
2008 — 2009
August, David
CPA-CPL-T: Collaborative Research: Revisiting the Sequential Programming Model for Multicore Systems
Recently, the microprocessor industry has moved toward multicore microprocessor designs as a means of utilizing the increasing transistor counts in the face of physical and micro-architectural limitations. Unfortunately, providing multiple cores does not directly translate into performance for most codes. To make use of multicore, many new languages have been proposed to ease the burden of writing parallel programs, yet the programming effort involved in creating correct and efficient parallel programs is still far more substantial than that of writing the equivalent single-threaded version. A more attractive approach is to rely on tools, both compilers and runtime optimizers, to automatically extract threads from sequential applications. Unfortunately, despite decades of research on automatic parallelization, most techniques have only been effective in the scientific and data-parallel domains. With recently gained insight, the investigators showed that the limits of prior thread-extraction approaches are not fundamental. By applying known and new compilation techniques in a systematic manner, the investigators found that SPEC CINT2000, among the most sequential of codes, has abundant scalable parallelism.
In this project, the team of investigators is taking the initial steps toward developing the techniques necessary to build an automatic thread extraction framework. These techniques include developing static transformations that extract parallelism and quantifying the opportunities for dynamic optimization. The system will ultimately consist of a series of static transformations and compiler-inserted hints combined with a run-time optimization component. This framework will address the multicore challenge by reliably extracting parallelism from a wide range of applications without burdening the programmer with what should remain low-level implementation details.
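As a purely illustrative example (not taken from the award), the loop below shows the kind of code such a thread-extraction framework targets: the pointer-chasing recurrence makes it look inherently sequential, yet the per-node work is independent and could be overlapped or distributed across cores by an automatic tool.

```c
/* Illustrative workload: a loop that appears sequential because of the
 * pointer-chasing recurrence p = p->next, while the work performed on
 * each node is independent of every other node. */
#include <stdio.h>
#include <stdlib.h>

struct node { int value; struct node *next; };

int main(void) {
    /* Build a small linked list with values 1..5. */
    struct node *head = NULL;
    for (int i = 1; i <= 5; i++) {
        struct node *n = malloc(sizeof *n);
        n->value = i;
        n->next = head;
        head = n;
    }

    long total = 0;
    for (struct node *p = head; p != NULL; p = p->next) {
        /* Loop-carried dependence: only the traversal (p = p->next). */
        /* The per-node work below is independent and could run on    */
        /* another core once the traversal hands it the node.         */
        total += (long)p->value * p->value;
    }
    printf("%ld\n", total);              /* 55 = 1 + 4 + 9 + 16 + 25 */

    while (head != NULL) {               /* release the list */
        struct node *next = head->next;
        free(head);
        head = next;
    }
    return 0;
}
```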
2009 — 2010
Ostriker, Jeremiah (co-PI); Li, Kai (co-PI); August, David
SGER: A Hybrid Approach for Petascale Computing: Accelerating Scientific
Intellectual Merit: The proposed work is an exploratory research effort to automatically extract parallelism from sequential programs and to schedule the resulting fine-grained computational elements on manycore processors. The goal is to allow existing sequential programs to run on manycore processors efficiently and to build the foundation for a hybrid approach, involving message passing and shared memory, to address petascale programmability. This exploratory research will attack the following issues: (1) design a compiler that decomposes the code running on a single node into fine-grained computation tasks that utilize the collection of cores on a single chip; (2) develop a highly efficient runtime system that schedules fine-grained tasks, optimizing for available parallelism and maximizing on-chip cache locality to overcome off-chip memory latency and bandwidth constraints; and (3) evaluate the approach with the newly released PARSEC benchmark suite, which allows comparison against hand-tuned parallel solutions, as well as with one computational science application.
Broader Impact: The potential impact of this project is significant. First, the success of the proposed research would advance knowledge and understanding in parallel programming to exploit the power of future parallel machines. Second, the success of the project will accelerate software development for petascale computing. Third, the proposed compiler and runtime systems will provide the capability to run existing large-scale computational science programs on petascale computers without burdensome programming efforts.
2010 — 2014
August, David
CSR: Medium: Collaborative Research: Scaling the Implicitly Parallel Programming Model with Lifelong Thread Extraction and Dynamic Adaptation
The microprocessor industry has moved toward multicore designs to leverage increasing transistor counts in the face of physical and micro-architectural limitations. Unfortunately, providing multiple cores does not translate into performance for most applications. Rather than pushing all the burden onto programmers, this project advocates the use of the implicitly parallel programming model to eliminate the laborious and error-prone process of explicit parallel programming. Implicit parallel programming leverages sequential languages to facilitate shorter development and debug cycles, and relies on automatic tools, both static compilers and run-time systems, to identify parallelism and customize it to the target platform. Implicit parallelism can be systematically extracted using: (1) decoupled software pipelining, a technique to extract the pipeline parallelism found in many sequential applications; (2) low-frequency and high-confidence speculation to overcome limitations of memory dependence analysis; (3) whole-program scope for parallelization to eliminate analysis boundaries; (4) simple extensions to the sequential programming model that give the programmer the power to refine the meaning of a program; and (5) dynamic adaptation to ensure efficiency is maintained across changing environments. This project is developing the set of technologies to realize an implicitly parallel programming system with scalable, lifelong thread extraction and dynamic adaptation. At the broader level, the implicitly parallel programming approach will free programmers to consider the problems they are trying to solve, rather than forcing them to overcome the processor industry's failure to continue to scale performance. This approach will keep computers accessible, helping computing to continue its increasingly positive impact on other fields.
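As a hedged illustration of the decoupled software pipelining idea named above (not the project's implementation), the sketch below splits a sequential producer/consumer loop into two pipeline stages running as POSIX threads, decoupled by a small software queue: one stage carries the sequential recurrence, the other performs the independent per-item work.

```c
/* Hedged sketch of decoupled software pipelining: stage 1 (the sequential
 * recurrence) feeds stage 2 (the independent per-item work) through a
 * bounded queue, so the two stages execute concurrently on separate cores. */
#include <pthread.h>
#include <stdio.h>

#define N    8           /* items produced by the traversal stage */
#define QCAP 4           /* capacity of the inter-stage queue     */

static int q[QCAP];
static int qhead = 0, qtail = 0, qcount = 0;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void enqueue(int v) {
    pthread_mutex_lock(&m);
    while (qcount == QCAP) pthread_cond_wait(&not_full, &m);
    q[qtail] = v; qtail = (qtail + 1) % QCAP; qcount++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&m);
}

static int dequeue(void) {
    pthread_mutex_lock(&m);
    while (qcount == 0) pthread_cond_wait(&not_empty, &m);
    int v = q[qhead]; qhead = (qhead + 1) % QCAP; qcount--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&m);
    return v;
}

/* Stage 1: the sequential recurrence (stand-in for p = p->next). */
static void *traverse(void *arg) {
    (void)arg;
    for (int i = 1; i <= N; i++) enqueue(i);
    enqueue(-1);                         /* sentinel: end of stream */
    return NULL;
}

/* Stage 2: independent per-item work, overlapped with stage 1. */
static void *compute(void *arg) {
    long *sum = arg;
    for (int v = dequeue(); v != -1; v = dequeue())
        *sum += (long)v * v;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    long sum = 0;
    pthread_create(&t1, NULL, traverse, NULL);
    pthread_create(&t2, NULL, compute, &sum);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%ld\n", sum);                /* 204: sum of squares 1..8 */
    return 0;
}
```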
2010 — 2016
August, David; Walker, David (co-PI)
SI2-SSI: Accelerating the Pace of Research Through Implicitly Parallel Programming
Today, two trends conspire to slow down the pace of science, engineering, and academic research progress in general. First, researchers increasingly rely on computation to process ever larger data sets and to perform ever more computationally-intensive simulations. Second, individual processor speeds are no longer increasing with every computer chip generation as they once were. To compensate, processor manufacturers have moved to including more processors, or cores, on a chip with each generation. To obtain peak performance on these multicore chips, software must be implemented so that it can execute in parallel and thereby use the additional processor cores. Unfortunately, writing efficient, explicitly parallel software programs using today's software-development tools takes advanced training in computer science, and even with such training, the task remains extremely difficult, error-prone, and time consuming. This project will create a new high-level programming platform, called Implicit Parallel Programming (IPP), designed to bring the performance promises of modern multicore machines to scientists and engineers without the costs associated with having to teach these users how to write explicitly parallel programs. In the short term, this research will provide direct and immediate benefit to researchers in several areas of science as the PIs will pair computer science graduate students with non-computer science graduate students to study, analyze, and develop high-value scientific applications. In the long term, this research has the potential to fundamentally change the way scientists obtain performance from parallel machines, improve their productivity, and accelerate the overall pace of science. This work will also have major educational impact by developing courseware and tutorial materials, useable by all scientists and engineers, on the topics of explicit and implicit parallel computing.
IPP will operate by allowing users to write ordinary sequential programs and then to augment them with logical specifications that expand (or abstract) the set of sequential program behaviors. This capacity for abstraction will provide parallelizing compilers with the flexibility to more aggressively optimize programs than would otherwise be possible. In fact, it will enable effective parallelization techniques where they were impossible before. The language design and compiler implementation will be accompanied by formal semantic analysis that will be used to judge the correctness of compiler transformations, provide a foundation for reasoning about programs, and guide the creation of static analysis and program defect detection algorithms. Moreover, since existing programs and languages can be viewed as (degenerately) implicitly parallel, decades of investment in human expertise, languages, compilers, methods, tools, and applications are preserved. In particular, it will be possible to upgrade old legacy programs or libraries from slow sequential versions without overhauling the entire system architecture, but merely by adding a few auxiliary specifications. Compiler technology will help guide scientists and engineers through this process, further simplifying the task. Conceptually, IPP restores an important layer of abstraction, freeing programmers to write high-level code, designed to be easy to understand, rather than low-level code, architected according to the specific demands of a particular parallel machine.
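The abstract does not fix a concrete specification syntax; the sketch below is a hypothetical illustration in C, where an invented annotation (the pragma in the comment is not actual IPP syntax) tells the compiler that the order of histogram updates is irrelevant, widening the set of acceptable behaviors and thereby permitting the loop to be parallelized.

```c
/* Hypothetical illustration of the IPP idea: the program is written
 * sequentially, and an annotation (invented for this sketch) declares
 * that calls to histogram_add commute, so a parallelizing compiler may
 * reorder or interleave them across cores. */
#include <stdio.h>

#define BINS 4
static int hist[BINS];

static void histogram_add(int v) { hist[v % BINS]++; }

int main(void) {
    int data[8] = {3, 1, 4, 1, 5, 9, 2, 6};

    /* #pragma ipp commutative(histogram_add)   -- hypothetical annotation */
    for (int i = 0; i < 8; i++)
        histogram_add(data[i]);      /* order of updates does not matter */

    for (int b = 0; b < BINS; b++)
        printf("bin %d: %d\n", b, hist[b]);
    return 0;
}
```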
2012 — 2015
Tromp, Jeroen (co-PI); Stone, James (co-PI); August, David; Couzin, Iain (co-PI)
II-NEW: A Platform for Data-Parallel GPU Computing at Princeton
This is an Institutional Infrastructure proposal to build a GPU cluster to support research in data-parallel code development and optimization, as well as research applications, in three scientific domains: seismology, biology, and astrophysics. These goals build on a close collaboration with an expert team in GPU computing from computer science. The proposed cluster will serve not only as an invaluable resource for computation, but will also aid the cross-fertilization of techniques and concepts between disciplines and will be used to stimulate collaboration and synergistic research activity in a wide range of areas.
Even though domain scientists are increasingly dependent on computation to achieve their research goals, most are not experts in parallel programming or GPU architectures. The difficulty of parallel programming for GPU clusters is an impediment to scientific progress. In order to relieve scientists of the burdens of parallel programming, computer scientists at Princeton have developed systems for automatically parallelizing programs for GPUs. Building on this success, the PIs plan to extend these techniques to GPU clusters and to work closely with the seismologists, biologists, and astrophysicists to accelerate the pace of science.
2014 — 2017
August, David
XPS: EXPL: CCA: A Framework for Portable Parallel Performance
Unable to sustain the historic rate of processor performance growth, the microprocessor industry has shifted to bundling more processor cores in each computer. While additional processor cores put more processing power in each computer, the burden of making use of this power shifts to the programmer. To make matters worse, the cores in each computer are not all the same, and each computer model may have a different number and mix of cores. Software optimized for one computer may perform poorly on another, and having programmers optimize for each computer model is not practical. This project intends to overcome these problems by changing the way programmers write code for these systems and by having the computer optimize the software for its specific configuration. The project will study ways to convey the richer set of information necessary to help a computer best optimize the software running on it. This project has the potential to fundamentally alter how programmers develop software for modern architectures, relieving them of the arduous task of optimizing their code for different systems. Users of any software---from computational scientists to home desktop users---would experience an increase in performance and faster deployment of new applications.
The proposed software interface must represent code in a way that is automatically analyzable and highly amenable to automatic code transformations at runtime. The project will explore many designs, including explicitly encoding register dependences in Static Single Assignment (SSA) form, statically determining and encoding memory dependence information into instructions, and using an expressive code layout scheme to make runtime analysis and optimization more efficient. In concert with this design space exploration, various runtime optimizations will be developed which will customize the software execution for the specific computer while also optimizing for dynamically changing user desires such as execution speed and energy consumption. The synergistic design of the software interface and runtime optimizations will allow determination of the optimal set of information to include in the interface. This project also aims to create tools for developers (compiler, assembler, and architectural simulator) to enable others to use and evaluate the interface.
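For readers unfamiliar with the representation named above, the small example below (illustrative only, not from the award) shows what Static Single Assignment form makes explicit: each value is assigned exactly once, and values merging at a control-flow join are selected by a phi function, so register dependences can be read directly from the encoding.

```c
/* Minimal SSA illustration: the comments show the SSA rendering of the
 * function, in which every name is defined exactly once and the join
 * after the if is resolved by a phi function. */
#include <stdio.h>

int clamp(int x) {
    int y = x;          /*  y1 = x0                        */
    if (y > 10)         /*  branch on y1                   */
        y = 10;         /*  y2 = 10                        */
                        /*  y3 = phi(y1, y2) at the join   */
    return y * 2;       /*  r0 = y3 * 2                    */
}

int main(void) {
    printf("%d %d\n", clamp(7), clamp(42));   /* 14 20 */
    return 0;
}
```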
The PI will continue to build on his past successes in fostering diversity and in educational outreach. He will host several undergraduate student researchers each year, teaming them with graduate student mentors. He will also continue to incorporate current research results into undergraduate and graduate courses.
2014 — 2017
August, David
SaTC: An Architecture for Restoring Trust in Our Personal Computing Systems
Computers today are so complex and opaque that a user cannot possibly hope to know, let alone trust, everything occurring within the machine. While software security techniques help ensure the integrity of user computations, they are only as trustworthy as the underlying hardware. Even though many proposals provide some relief to the problem of hardware trust, the user must ultimately rely on the assurances of other parties. This work restores hardware trust through a simple, small, and slow pluggable hardware element. This project investigates techniques that provide a kernel of trust that keeps even the most aggressive systems in line without slowing them down, and that is easy to manufacture.
For this slow but trusted hardware element to be useful in real-world systems, it must not degrade system performance significantly. To achieve this goal, this work develops two complementary techniques: dependence-free parallel verification of executed instructions, and cryptographic hash-based memory integrity assurance. Additionally, cryptographic hashing ensures code integrity and prevents the processor from executing its own malicious code. A combination of these techniques provides a secure hardware environment where users need not worry about their data being compromised, as long as their software is also secure. Therefore, when combined with well-developed software security techniques, this work provides a significant increase in the level of trust users place in their computing systems.
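As a hedged sketch of the hash-based memory integrity idea (illustration only; a deployed design would use a cryptographic hash and would protect the digests themselves, e.g., with a hash tree), the code below records a digest of a memory block and detects later tampering by recomputing and comparing it.

```c
/* Hedged sketch of hash-based memory integrity checking: a digest of a
 * memory block is recorded when the block leaves the trusted element's
 * view and re-checked when the block is read again; a mismatch means
 * untrusted hardware (or an attacker) modified memory. The FNV-1a hash
 * used here is for illustration only and is not cryptographically secure. */
#include <stdint.h>
#include <stdio.h>

#define BLOCK 16

static uint64_t fnv1a(const uint8_t *p, size_t n) {
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 1099511628211ULL; }
    return h;
}

int main(void) {
    uint8_t block[BLOCK] = "trusted content";
    uint64_t digest = fnv1a(block, BLOCK);        /* recorded on write-back */

    block[3] ^= 0x01;                             /* simulated tampering    */

    if (fnv1a(block, BLOCK) != digest)            /* re-checked on load     */
        puts("integrity violation detected");
    else
        puts("block verified");
    return 0;
}
```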
2021 — 2025
August, David
Collaborative Research: SHF: Medium: Collaborative Automatic Parallelization
In the context of the end of Moore's law, the greatest value of multicore is ultimately in its potential to accelerate sequential codes. This potential can only be realized with the reliable extraction of sufficient multicore-appropriate thread-level parallelism (MATLP) from programs. Yet, despite many new tools, languages, and libraries designed for multicore, difficulties in MATLP extraction keep multicore grossly underutilized. The energy and performance impact of this is nearly universal. To address this problem, this project's novelties are in (i) redefining traditional abstractions used within compilers to enable constructive and tight collaborations that aim to coordinate the multiple code analyses and transformations required for MATLP extraction, (ii) producing RAPPORT, the first publicly available compiler with full collaboration support, a necessary element for robust automatic parallelization. This project's impact is in making computing faster and more efficient with reliable MATLP extraction.
In conventional compilers, optimizations perform well greedily and independently, enabling easy compiler modularity without much performance impact. However, in MATLP extraction, key parallelization techniques may succeed only if other transformations clear the path, sometimes by de-optimizing the code. Over the last decade, researchers have made steady progress toward the goal of robust and routine automatic MATLP with new MATLP parallelization patterns, stronger memory analyses, and more efficient speculation techniques. This team believes these MATLP technologies are sufficient but lack the coordination necessary to realize their full potential. This work produces the technology necessary for reliable MATLP extraction by redefining compiler abstractions to enable transformations and analyses to work together actively without loss of modularity. This new technology enables a globally beneficial behavior by centralizing, in a modular way, the decentralized and greedy decision-making found in conventional compilers. In this way, it makes the reliable and robust extraction of MATLP possible.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2021 — 2022
August, David
Collaborative Research: PPoSS: Planning: A Disciplined Approach to Scaling in the Post-Moore's Law Era
General-purpose processor speeds have not increased at their historical rates for over 15 years. Designers have instead offered scalable computing systems with additional barriers to realizing the performance gains that once came "for free" with each processor generation. This situation has slowed the progress of all endeavors that involve scalable computing. A disciplined approach to reversing this trend must start and end with a direct engagement with the scalable-system users and programmers faced with these barriers. With this goal, the investigators are performing in-depth face-to-face interviews with a broad set of users. In addition, with these users, the project team is also examining codes, exchanging ideas, and offering assistance. This is producing a deep understanding of how users are coping with their growing demands for computing while computing is placing more demands upon them. The project’s novelties are this in-depth study and the resulting formulation of an approach to address the limitations of scalable computing based on real users' needs. The project’s impacts are the dissemination of the survey results and a recommended approach forward that restores meaningful layers of abstraction to scalable systems, freeing programmers from being drawn deeper into the complexity of scalable computing while delivering higher performance to them.
The investigators performed a similar study in 2011. With this planning grant: (1) They are conducting a more ambitious study with a greater diversity of subjects. By re-engaging as many 2011 subjects as possible, this becomes a longitudinal study capable of revealing trends not visible in any single point-in-time study. (2) The investigators are using these interactions to explore transitioning their foundational work to practice, to build a larger team, and to expand the scope of future work. The results of the 2011 study inspired the investigators to produce breakthroughs in speculation, dependence handling, latency tolerance, and automatic parallelization. The 2021 study serves as a vehicle to explore ways to transition these results to practice. (3) The investigators believe that hardware can be more domain-adept without being domain-specific. Using prior insights, they are exploring hardware/software concepts that deliver top performance levels without undue programmer burden. By testing these ideas in the context of the study, the investigators can best frame the problem, refine approaches, and test hypotheses in the context of actual needs, opportunities, and constraints. All of these activities ensure that future work in scalable systems will have greater impact.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.