1985 — 1988
Dubois, Michel
Research Initiation: Designing and Programming High-Speed Multiprocessor Systems @ University of Southern California
1986 — 1987
Gaudiot, Jean-Luc (co-PI); Hwang, Kai (co-PI); Dubois, Michel
Engineering Research Equipment Grant: Computing Facilities For Experimentation With Multiprocessor Systems @ University of Southern California
1987 — 1990
Dubois, Michel
Implementations and Evaluations of Non-Numerical Algorithms For MIMD Multiprocessors @ University of Southern California
Multiprocessor supercomputers presently available on the market are oriented towards high-speed numerical computation. However, computers are increasingly being used for computations that are symbolic in nature. Nonnumerical algorithms for multiprocessors are more difficult to design and evaluate than numerical algorithms. This research is aimed at developing a new and integrated methodology for the design and performance prediction of nonnumerical parallel and distributed algorithms for multiprocessors. Because of the unpredictable behavior of such algorithms and the dependence of algorithm performance on data values, simulation is a fundamental ingredient of the methodology. The facility developed for simulating the execution of multiprocessor algorithms on uniprocessors will be expanded, improved and tuned to the purpose of this research. This simulator is trace-driven. Simulation results will be used to validate existing or new analytical or hybrid modeling techniques.
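As a concrete illustration of what "trace-driven" means here, the toy program below replays a short address trace through a direct-mapped cache and counts hits and misses. The trace, cache geometry and C++ realization are illustrative assumptions, not the project's actual simulator.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Toy trace-driven simulation: replay a memory-address trace through a
    // direct-mapped cache and count hits/misses. All parameters are invented
    // for illustration; the project's simulator is far more detailed.
    int main() {
        constexpr int kNumSets   = 64;   // 64 sets, direct-mapped
        constexpr int kBlockBits = 5;    // 32-byte blocks

        std::vector<uint64_t> tags(kNumSets, UINT64_MAX);  // empty cache
        // A stand-in trace; a real trace would be read from a file.
        std::vector<uint64_t> trace = {0x1000, 0x1004, 0x2000,
                                       0x1008, 0x3000, 0x1000};

        long hits = 0, misses = 0;
        for (uint64_t addr : trace) {
            uint64_t block = addr >> kBlockBits;
            int set = static_cast<int>(block % kNumSets);
            if (tags[set] == block) ++hits;
            else { ++misses; tags[set] = block; }
        }
        std::printf("hits=%ld misses=%ld\n", hits, misses);
        return 0;
    }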
1992 — 1994
Dubois, Michel
An Evaluation of Delayed Consistency Protocols @ University of Southern California
This research involves the experimental analysis of delayed consistency in large-scale cache-based multiprocessors, including its effects on false sharing and latency tolerance. False sharing refers to the read/write sharing of cache blocks in the absence of data sharing in a parallel computation. Latency tolerance refers to the reduction of the average latency of shared-memory accesses by overlapping them with computation. Delayed consistency takes advantage of weak ordering in cache-based systems. The sending of cache invalidations can be delayed, in which case they can be overlapped with processor accesses to the cache. Additionally, both the sending and receiving of invalidations can be delayed, in which case false-sharing effects are reduced by increasing the time during which cached blocks remain accessible to each local processor. The quantitative effects of delayed consistency and its variants will be evaluated through execution-driven simulation of parallel benchmark programs, including nine parallel numerical and non-numerical algorithms and thirteen Fortran programs from the Perfect Club benchmark suite, parallelized using an Alliant 2800 compiler. Emphasis will be on the effects of block size, cache size, and granularity of parallelism. Statistics will be collected not only on miss rates but also on memory traffic, memory access latencies, and total execution times.
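The false-sharing effect described above is easy to reproduce on modern hardware. In this sketch, two threads increment independent counters that either share a cache block or are padded onto separate blocks; the thread count, iteration count and 64-byte line-size assumption are mine, not the grant's.

    #include <chrono>
    #include <cstdio>
    #include <thread>

    // Two counters on the same cache line (false sharing) vs. padded apart.
    // The 64-byte line size and iteration count are assumptions for the demo.
    struct Shared { volatile long a = 0; volatile long b = 0; };
    struct Padded { volatile long a = 0; char pad[64]; volatile long b = 0; };

    template <typename T>
    double run() {
        T c;
        auto start = std::chrono::steady_clock::now();
        std::thread t1([&] { for (long i = 0; i < 50'000'000; ++i) c.a = c.a + 1; });
        std::thread t2([&] { for (long i = 0; i < 50'000'000; ++i) c.b = c.b + 1; });
        t1.join(); t2.join();
        return std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
    }

    int main() {
        std::printf("same line: %.2fs\n", run<Shared>());  // expect slower
        std::printf("padded:    %.2fs\n", run<Padded>());
        return 0;
    }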
1993 — 1997
Papavassilopoulos, George (co-PI); Dubois, Michel
Asynchronous Algorithms: Scalable Algorithms For Multiprocessors @ University of Southern California
This project is concerned with the analysis, implementation and simulation of asynchronous algorithms. Asynchronous algorithms do not require synchronization and thus are particularly suitable for large-scale multiprocessors or massively parallel systems. Moreover, asynchronous algorithms are tolerant of processor and link failures and easily adapt to changes in parameters or sensor data in real-time embedded systems. The theoretical analysis of convergence considers models with stochastic delays: asynchronous algorithms running on a multiprocessor are treated as processes with random communication delays. With this approach, complex problems such as the effects of random message delays due to variable load conditions in the interconnection network, and the reliability issues related to probabilistic link or processor failures, are easily analyzed. The approach is also applied to the analysis of problems in which the solution changes with time and of algorithms with good time-adaptation properties. Finally, the convergence of algorithms that become trapped in periodic orbits can be improved through randomization techniques. Six asynchronous algorithms with the above features are implemented on the Intel Touchstone DELTA machine accessible at Caltech. Each algorithm is evaluated extensively in order to assess the effectiveness of asynchronous algorithms in the context of large-scale multiprocessor systems.
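A minimal sketch of an asynchronous algorithm in the above sense: threads perform a Jacobi-style fixed-point iteration on a small diagonally dominant system, each updating its own component with no barriers and simply reading whatever values the other threads have published most recently. The system, its coefficients and the iteration count are invented for illustration.

    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // Asynchronous Jacobi-style iteration x_i <- (b_i - sum_{j!=i} x_j) / 4
    // for a tiny diagonally dominant system (a_ii = 4, a_ij = 1). Each thread
    // updates its own component with no synchronization, reading whatever
    // values the others have most recently stored.
    int main() {
        constexpr int n = 4;
        const double b[n] = {1.0, 2.0, 3.0, 4.0};
        std::vector<std::atomic<double>> x(n);
        for (auto& xi : x) xi.store(0.0);

        auto worker = [&](int i) {
            for (int it = 0; it < 10000; ++it) {   // no barriers at all
                double s = b[i];
                for (int j = 0; j < n; ++j)
                    if (j != i) s -= x[j].load(std::memory_order_relaxed);
                x[i].store(s / 4.0, std::memory_order_relaxed);
            }
        };

        std::vector<std::thread> ts;
        for (int i = 0; i < n; ++i) ts.emplace_back(worker, i);
        for (auto& t : ts) t.join();
        for (int i = 0; i < n; ++i) std::printf("x[%d] = %f\n", i, x[i].load());
        return 0;
    }

Because the iteration matrix is a contraction, the updates converge to the same fixed point regardless of the random interleaving of reads and writes, which is exactly the property that makes such algorithms failure- and delay-tolerant.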
1993 — 1997
Dubois, Michel; Danzig, Peter (co-PI); Pedram, Massoud (co-PI); Saavedra, Rafael
The U.S.C. Multiprocessor Testbed: a Testbed For Scalable Shared-Memory Systems @ University of Southern California
A testbed for experimenting with memory hierarchies in multiprocessors is being supported. A processor node in the testbed contains cache and memory-system controllers built from field-programmable gate arrays. To experiment with a memory-control mechanism or coherency technique, the investigators program the gate arrays to implement the mechanism. For software support of experimental techniques, the GNU C compiler is being modified to generate appropriate code, such as non-blocking prefetches, and the Mach microkernel is being ported to provide thread scheduling.
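Non-blocking prefetches of the kind the modified compiler would emit can also be written by hand in today's GNU C/C++ dialect via __builtin_prefetch; the array, loop and prefetch distance below are illustrative guesses, not tuned values from the project.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Software prefetching with GCC/Clang's __builtin_prefetch: hint that
    // a[i + dist] will be read soon so the load overlaps with current work.
    double sum_with_prefetch(const std::vector<double>& a) {
        constexpr std::size_t dist = 16;   // assumed prefetch distance
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            if (i + dist < a.size())
                __builtin_prefetch(&a[i + dist], /*rw=*/0, /*locality=*/1);
            s += a[i];
        }
        return s;
    }

    int main() {
        std::vector<double> a(1 << 20, 1.0);
        std::printf("sum = %f\n", sum_with_prefetch(a));
        return 0;
    }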
2001 — 2005
Dubois, Michel
Trace-Driven Evaluations of the Memory Behavior of Large Commercial Applications @ University of Southern California
Computer architecture research is based on experimental evaluation of application behavior. One critical issue in the architecture of high-end enterprise servers is the memory hierarchy, which must be designed to support current and future data-intensive applications.
Few evaluations of the memory behavior of large commercial applications exist, especially in the public domain. The research proposed here is based on a collaboration between IBM and USC to take advantage of the IBM Watson Server Performance Laboratory to explore the memory behavior of high-end commercial applications.
More specifically, it is proposed to use the IBM MemorIES board to collect bus activity traces from a modern server machine running large, finely tuned OLTP, DSS and Web workloads. Because of the sheer size of the traces, samples of transaction records will be collected only in selected time intervals.
Besides obtaining traces, the goal of this project is to characterize the memory behavior of these applications, to evaluate alternative memory hierarchies for future high-end commercial servers, and to explore new multiprocessor architectures for commercial systems.
The experimental environment provided by the IBM Watson Server Performance Laboratory cost millions of dollars and years of effort to set up. The proposed research will leverage these efforts. It is a unique opportunity to collect and disseminate traces and experimental data that would be practically impossible to obtain under any reasonable research budget.
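The interval-sampling approach can be sketched as follows: instead of logging every bus transaction, the collector keeps records only during periodic windows. The record format, window length and sampling period below are invented placeholders, not MemorIES parameters.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Interval sampling of a transaction stream: keep records only during
    // periodic windows to bound trace size. All parameters are invented.
    struct Record { std::uint64_t addr; bool write; };

    std::vector<Record> sample(const std::vector<Record>& stream,
                               std::size_t window, std::size_t period) {
        std::vector<Record> kept;
        for (std::size_t i = 0; i < stream.size(); ++i)
            if (i % period < window)   // inside a sampling window?
                kept.push_back(stream[i]);
        return kept;
    }

    int main() {
        std::vector<Record> stream(1'000'000, Record{0x1000, false});
        auto kept = sample(stream, /*window=*/1000, /*period=*/100'000);
        std::printf("kept %zu of %zu records (%.2f%%)\n",
                    kept.size(), stream.size(),
                    100.0 * kept.size() / stream.size());
        return 0;
    }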
2006 — 2009
Dubois, Michel
CSR-SMA: Collaborative Research - STAMP: A Universal Algorithmic Model For Next-Generation Multithreaded Machines and Systems @ University of Southern California
Due to the limits of power dissipation, the trend today in chip microarchitecture is to implement a multiprocessor with simple cores (Chip MultiProcessors, or CMPs) in which each core may run multiple threads concurrently (Chip MultiThreading, or CMT). Future performance improvements will mostly come from supporting more and more threads in every microprocessor generation. In this context, there is renewed interest in designing highly scalable multithreaded algorithms. The goal of this project is to develop, evaluate and expand a generic model for parallel algorithms called STAMP (Synchronous, Transactional and Asynchronous MultiProcessing). The STAMP model accounts for both performance and power and strikes a compromise among simplicity, generality and predictive value. The work in this project proceeds in two directions: the design and exploration of parallel algorithm models, and the application of the model to specific algorithms and applications. A simulation infrastructure for CMPs based on Simics, called SimWattchMP, is used for measuring performance and power and simulating system activity. If the model gains widespread acceptance, it will become a major abstract platform on which designers and programmers of parallel algorithms can reliably design, program and evaluate their algorithms without detailed knowledge of the machine and system. This will be critical to the evolution of hardware and software systems in the next 15 years. Research results will be widely disseminated through publications and will also be made available to the community at large through a project web page.
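The abstract does not spell out the three STAMP styles in code, but the distinction can be illustrated by computing one shared total three ways: synchronous phases separated by joins, a mutex-guarded critical section standing in for a transaction, and unsynchronized atomic updates. The workload and the mutex-for-transaction substitution are my assumptions, not the STAMP model's definitions.

    #include <atomic>
    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <vector>

    // Three sharing styles, loosely mirroring the S/T/A in STAMP.
    int main() {
        constexpr int kThreads = 4, kIters = 100000;

        // Synchronous: private partial sums, combined after a join "barrier".
        std::vector<long> partial(kThreads, 0);
        {
            std::vector<std::thread> ts;
            for (int t = 0; t < kThreads; ++t)
                ts.emplace_back([&, t] {
                    for (int i = 0; i < kIters; ++i) ++partial[t];
                });
            for (auto& th : ts) th.join();   // end of synchronous phase
        }
        long sync_total = 0;
        for (long p : partial) sync_total += p;

        // Transactional: a critical section guards the shared total.
        long tx_total = 0; std::mutex m;
        {
            std::vector<std::thread> ts;
            for (int t = 0; t < kThreads; ++t)
                ts.emplace_back([&] {
                    for (int i = 0; i < kIters; ++i) {
                        std::lock_guard<std::mutex> g(m);  // "transaction"
                        ++tx_total;
                    }
                });
            for (auto& th : ts) th.join();
        }

        // Asynchronous: threads race on an atomic, no ordering constraints.
        std::atomic<long> async_total{0};
        {
            std::vector<std::thread> ts;
            for (int t = 0; t < kThreads; ++t)
                ts.emplace_back([&] {
                    for (int i = 0; i < kIters; ++i)
                        async_total.fetch_add(1, std::memory_order_relaxed);
                });
            for (auto& th : ts) th.join();
        }

        std::printf("sync=%ld tx=%ld async=%ld\n",
                    sync_total, tx_total, async_total.load());
        return 0;
    }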
2008 — 2011
Dubois, Michel; Annavaram, Murali
CSR-PSCE,SM: Trade-Offs Between Static Power, Performance and Reliability in Future Chip Multiprocessors @ University of Southern California
As transistor growth has outpaced the design and verification effort in chip design, a large fraction of on-chip transistors is now allocated to storage structures, such as caches. The static power consumed by these storage structures worsens the critical power and thermal problems faced by current chip designs. To reduce static power, drowsy techniques are used, in which inactive components of a storage structure can be placed in a low-power state. Unfortunately, drowsy power states increase the susceptibility of transistors to transient errors. Motivated by these problems, this research explores the tradeoffs between static power, performance and reliability in chip multiprocessors. The fundamental contribution of this research is a novel hybrid analytical/simulation framework that allows designers to evaluate the impact of reducing static power on processor reliability and performance. Using this framework, the research explores new cache management schemes and cache protocols in chip multiprocessors and the impact of these new schemes on the reliability and performance of a computer system. The framework can also be extended to analyze the power, performance and reliability tradeoffs of other storage structures inside each core, such as the reorder buffer, the branch prediction tables, and various instruction and scheduling queues.
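A back-of-envelope version of the static-power/reliability tradeoff might look like the following; every constant (drowsy leakage ratio, soft-error-rate multiplier) is an invented placeholder, since the project's actual framework is a detailed hybrid of analysis and simulation.

    #include <cstdio>

    // Back-of-envelope drowsy-cache tradeoff. All constants are invented:
    // a drowsy line is assumed to leak 1/6 as much but to be 4x more
    // susceptible to transient bit flips while drowsy.
    int main() {
        const double drowsy_leak_ratio = 1.0 / 6.0;  // drowsy vs. full leakage
        const double drowsy_ser_factor = 4.0;        // soft-error multiplier

        for (double f = 0.0; f <= 1.0; f += 0.25) {  // fraction of time drowsy
            double leakage = (1.0 - f) + f * drowsy_leak_ratio;  // normalized
            double ser     = (1.0 - f) + f * drowsy_ser_factor;  // normalized
            std::printf("drowsy fraction %.2f: leakage x%.2f, SER x%.2f\n",
                        f, leakage, ser);
        }
        return 0;
    }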
2012 — 2016
Annavaram, Murali; Dubois, Michel
SHF: Small: Benchmarking of Transient and Intermittent Errors and Their Application to Microarchitecture @ University of Southern California
Computing infrastructure has been a driving force for our socio-economic progress over the past several decades. From drug discovery to space exploration, every scientific and engineering domain relies on computer systems to accurately analyze complex datasets. Historically, computational accuracy has been taken for granted in all these disciplines, but this notion is changing. While rapidly shrinking transistor dimensions lead to exponential power and performance benefits, the same trend is creating several unwanted side effects in computer-system reliability. Two types of errors will become prevalent in the near future: (1) multi-bit soft errors, where alpha particles and neutrons cause multiple bits to flip at the same time, and (2) intermittent errors, which occur due to stress accumulation over the lifetime of a computer. It is therefore critical to benchmark the impact of these errors on the lifetime of a computer chip. Only when the impact is accurately measured is it possible to judiciously deploy solutions to improve reliability. Since any protection scheme comes with a cost, it is necessary to understand when a particular protection scheme under consideration, such as parity or a single-error-correcting, double-error-detecting (SEC-DED) code, is too much or too little.
This project develops two solutions for benchmarking multi-bit soft errors and intermittent errors. First, it develops a unified methodology to benchmark the impact of single-bit and multi-bit soft errors on caches protected with an arbitrary protection scheme, such as an interleaved, block-level or word-level error-correcting code. Such a benchmarking framework will significantly enhance a computer designer's ability to objectively evaluate the performance, power and reliability tradeoffs of the various protection schemes proposed for protecting caches.
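Why interleaving helps against multi-bit upsets can be shown with a small Monte Carlo experiment: inject a burst of adjacent bit flips into a 64-bit word and check whether any single SEC-DED codeword receives more than one flip. The word width, interleave degrees and burst-length model are assumptions for illustration.

    #include <cstdio>
    #include <random>

    // Each SEC-DED codeword corrects one flip; with `ways`-way bit
    // interleaving, physically adjacent bits belong to different codewords
    // (bit b is in codeword b mod ways). Parameters are all assumptions.
    bool uncorrectable(int burst_start, int burst_len, int ways) {
        int flips[64] = {0};                    // flips per codeword
        for (int b = burst_start; b < burst_start + burst_len; ++b)
            ++flips[b % ways];
        for (int w = 0; w < ways; ++w)
            if (flips[w] > 1) return true;      // >1 flip in one codeword
        return false;
    }

    int main() {
        constexpr int kBits = 64, kTrials = 1000000;
        std::mt19937 rng(42);
        std::uniform_int_distribution<int> len(1, 6);  // assumed burst model

        for (int ways : {1, 2, 4}) {                   // 1 = no interleaving
            int fail = 0;
            for (int t = 0; t < kTrials; ++t) {
                int l = len(rng);
                std::uniform_int_distribution<int> start(0, kBits - l);
                if (uncorrectable(start(rng), l, ways)) ++fail;
            }
            std::printf("%d-way interleaving: uncorrectable fraction %.4f\n",
                        ways, (double)fail / kTrials);
        }
        return 0;
    }

Under this burst model, 4-way interleaving corrects every burst of up to four adjacent flips, since no codeword ever sees more than one of them.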
Second, the research develops a methodology to benchmark the vulnerability of an instruction set architecture (ISA) to intermittent errors. Each instruction in an ISA specification is annotated to quantify the amount of stress it is expected to cause on the underlying microarchitecture of a chip. The stress-level information from the ISA is combined with the operating conditions of the chip to continuously monitor the intermittent-error probability during application execution. Any unwanted degradation in chip reliability is then tackled by software exception handlers, which trigger redundant execution of vulnerable code.
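A toy rendering of that methodology: each opcode carries an assumed stress weight, a monitor accumulates weighted stress as instructions execute, and a handler is notionally triggered when a made-up threshold is crossed. The weights, the instruction stream and the threshold are all invented placeholders.

    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    // Toy ISA-level stress monitor: accumulate per-opcode stress weights
    // and flag when a wear-out threshold is crossed. All values invented.
    int main() {
        const std::map<std::string, double> stress = {
            {"add", 0.1}, {"mul", 0.4}, {"div", 1.0},
            {"load", 0.3}, {"store", 0.3}};
        const std::vector<std::string> program = {
            "load", "mul", "div", "add", "store", "div", "mul", "div"};
        const double threshold = 3.0;   // made-up trigger level

        double accumulated = 0.0;
        for (const auto& op : program) {
            accumulated += stress.at(op);
            if (accumulated >= threshold) {
                // Here a software exception handler would trigger redundant
                // execution of the vulnerable region; we report and reset.
                std::printf("stress %.1f reached at '%s': trigger handler\n",
                            accumulated, op.c_str());
                accumulated = 0.0;
            }
        }
        return 0;
    }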
Broader societal impact will result from these research solutions. Benchmarking is essential to objectively evaluate the cost-benefit tradeoffs of various solutions currently being proposed to tackle reliability concerns. Without benchmarking, building a system to meet reliability specifications is a guessing game. By providing the right set of tools to initiate just-in-time error correction and recovery mechanisms, a computer designer can significantly lower the cost of providing reliable computations.