1985 — 1988 |
Nicolau, Alexandru |
N/A |
Percolation Scheduling: a Hierarchical Parallel Compilation Technique |
0.957 |
1987 — 1991 |
Nicolau, Alexandru |
N/A |
A Support Environment For Parallel-Program Development
This research describes a support environment for exploiting parallelism in ordinary (scientific) programs. The environment will be designed for mapping programs with real-time constraints and/or massive compilation requirements onto parallel computers. It is envisioned as part of a scientist's workstation, serving as a front end for the NSF Supercomputing Center at Cornell. Within this system, the user may control the parallelization process while the system deals with the burdensome details of architecture, correctness preservation, and synchronization. Through a graphical interface, the user suggests what should be done in parallel, while the system performs the actual changes using semantics-preserving transformations. If a request cannot be satisfied, the system reports the problem causing the failure. The user may then eliminate the problem by supplying guidance or information not explicit in the code.
|
1 |
1997 — 2001 |
Dutt, Nikil; Nicolau, Alexandru |
N/A |
Design Exploration For Memory-Intensive Embedded Systems @ University of California-Irvine
This research is on techniques to support early system-level exploration of memory-intensive behaviors for multimedia applications such as video and image processing. In addition to memory, system performance, power consumption and total cost are constraints being considered. Techniques and tools being developed under this approach are for: (1) estimation of memory requirements from the system's specification under resource, performance and power constraints, enabling a tradeoff of computation time against estimation accuracy by system designers; (2) optimization of the embedded system's specification under area, performance and power constraints using a combination of coarse-grain and fine-grain transformations; and (3) partitioning, organization, and mapping of memory structures, including arrays, records and pointers, for implementation of the system specification into hardware and software. These techniques and tools are being integrated into an exploration environment which permits system designers to evaluate feasible hardware/software implementations of memory-intensive embedded applications.
|
1 |
2002 — 2006 |
Dutt, Nikil (co-PI); Nicolau, Alexandru; Gupta, Rajesh; Schmidt, Douglas; Shukla, Sandeep |
N/A |
Ngs: An Application Development Environment For Complex Heterogeneous Distributed Real-Time Embedded Computing Platforms @ University of California-Irvine
EIA-0204028 Alexandru Nicolau University of California-Irvine An Application Development Environment for Complex Heterogeneous Distributed Real-Time Embedded Computing Platforms
This proposal describes a number of coordinated strategies that will be developed to ensure robust and automatic composition of distributed real-time and embedded (DRE) systems, based on advances in architectural and resource descriptions that enable concurrent exploration. The DRE applications to be explored in this work are representative of a range of real-world distributed computing scenarios, from Grids to microelectronic systems-on-chip (SOCs). SOC-based computing resources incorporating diverse sensing and substantial processing capabilities are useful in many DRE application domains, such as avionics, biomedical computing resources and tele-medicine, remote sensing, space exploration, and command and control systems. Examples of domains that could benefit from these advances include automated transportation systems, distance learning, tele-medicine, analysis for combat situations, video conferencing, virtual reality simulation, and weather forecasting and analysis.
The core of our proposal focuses on coordinated compile-time and runtime strategies that enable simultaneous and integrated optimization of (1) DRE application and middleware software and (2) the underlying hardware platform consisting of distributed high-performance processing elements and customized memory systems.
|
1 |
2003 — 2008 |
Veidenbaum, Alexander; Nicolau, Alexandru |
N/A |
A Framework For Speeding Up Mobile Code Execution in Embedded Systems Using Superoperators and Annotations @ University of California-Irvine
Embedded platforms are increasingly used to connect to the Web and to execute mobile code. These platforms are resource-constrained environments in which interpreted execution of mobile code is the norm because dynamic compilation is not feasible. At the same time, the performance of the executed code is of critical importance and is often a limiting factor in both the capabilities of the system and user perception.
The goals of the research proposed here are 1) to significantly improve interpreter performance for mobile code on embedded platforms without increasing resource requirements and 2) to design a resource-constrained dynamic compilation system to be used with an interpreter for adaptive optimization to further improve performance.
The goals will be achieved by using extensive compile-time analysis and by passing the results of the analysis to the interpreter running on a client system via code annotations. Annotations will identify super-operators, groups of instructions that can be executed as a unit and optimized together. This will allow a more efficient interpretation by minimizing communication overhead and dispatch costs. Annotations will also permit adaptive dynamic optimization requiring fewer resources and little or no overhead.
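The superoperator idea can be illustrated with a minimal sketch. The toy stack bytecode, the opcode names, and the `fuse` and `run` helpers below are hypothetical and are not taken from the project; the sketch only shows how fusing a common instruction group into one unit reduces the number of interpreter dispatches. It handles straight-line code only and ignores branch targets.

```python
# Hypothetical toy stack bytecode; opcode names are illustrative only.
PUSH, LOAD, ADD, MUL, LOAD_LOAD_ADD = range(5)


def fuse(code):
    """Ahead-of-time pass: rewrite LOAD a; LOAD b; ADD as one superoperator."""
    out, i = [], 0
    while i < len(code):
        window = code[i:i + 3]
        if (len(window) == 3 and window[0][0] == LOAD
                and window[1][0] == LOAD and window[2][0] == ADD):
            out.append((LOAD_LOAD_ADD, window[0][1], window[1][1]))
            i += 3
        else:
            out.append(code[i])
            i += 1
    return out


def run(code, env):
    """Interpret the code; return (result, number of dispatches)."""
    stack, dispatches = [], 0
    for instr in code:
        dispatches += 1  # one dispatch per interpreted instruction
        op = instr[0]
        if op == PUSH:
            stack.append(instr[1])
        elif op == LOAD:
            stack.append(env[instr[1]])
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == LOAD_LOAD_ADD:
            # Superoperator: two loads and an add without touching the stack twice.
            stack.append(env[instr[1]] + env[instr[2]])
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack.pop(), dispatches


# (x + y) * 2 with x = 3, y = 4
program = [(LOAD, "x"), (LOAD, "y"), (ADD,), (PUSH, 2), (MUL,)]
env = {"x": 3, "y": 4}
plain = run(program, env)        # 5 dispatches
fused = run(fuse(program), env)  # 3 dispatches, same result
```

In this sketch the fusion happens in a separate pass, standing in for the compile-time analysis whose results would reach the interpreter as annotations.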
|
1 |
2008 — 2012 |
Veidenbaum, Alexander; Nicolau, Alexandru |
N/A |
Cpa-Cpl: Cache-Aware Synchronization and Scheduling of Data-Parallel Programs For Multi-Core Processors @ University of California-Irvine
Project Abstract
Multi-core (parallel) processors are becoming ubiquitous. The use of such systems is key to science, engineering, finance, and other major areas of the economy. However, increased application performance on such systems can only be achieved with advances in mapping applications to multi-core machines. This task is made more difficult by complex memory organizations, which are perhaps the key bottleneck to efficient execution and which have not previously been addressed effectively. This research makes the mapping of the program to the machine aware of the complexities of the memory hierarchy in all phases of the compilation process. This will ensure a good fit between the application code and the actual machine and thereby guarantee much more effective utilization of the hardware (and thus efficient/fast execution) than was previously possible.
Modern processors (multi-cores) employ increasingly complex memory hierarchies. Management of such hierarchies is becoming critical to the overall success of the compilation process since effective utilization of the memory hierarchy dominates overall performance. This research develops a new cache-hierarchy-aware compilation and runtime system (i.e., including compilation, scheduling, and static/dynamic processor mapping of parallel programs). These tasks have one thing in common: they all need accurate estimates of data element (iteration, task) computation and memory access times which are currently beyond the (cache-oblivious) state-of-the-art. This research thus develops new techniques for iteration space partitioning, scheduling, and synchronization which capture the variability due to cache, memory, and conditional statement behavior and their interaction. This research will have a broad impact on the computer industry as it will allow the ubiquitous multi-core systems of the future to be efficiently exploited by parallel programs.
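To make the role of per-iteration time estimates concrete, here is a minimal sketch of cost-balanced iteration space partitioning. The cost model (1 unit of compute plus 9 units per expected cache miss), the `misses` data, and the `partition_by_cost` helper are assumptions made for illustration; they are not the project's techniques.

```python
def partition_by_cost(costs, workers):
    """Split iterations 0..n-1 into at most `workers` contiguous chunks
    whose estimated costs are roughly equal."""
    total = sum(costs)
    chunks, start, acc = [], 0, 0
    for i, cost in enumerate(costs):
        acc += cost
        # Close a chunk once the running cost reaches the next equal share;
        # the final chunk is left open to absorb the remaining iterations.
        if len(chunks) < workers - 1 and acc >= total * (len(chunks) + 1) / workers:
            chunks.append(range(start, i + 1))
            start = i + 1
    if start < len(costs):
        chunks.append(range(start, len(costs)))
    return chunks


# Hypothetical estimates: iterations 4 and 5 are expected to miss in cache.
misses = [0, 0, 0, 0, 1, 1, 0, 0]
costs = [1 + 9 * m for m in misses]

chunks = partition_by_cost(costs, workers=2)
chunk_costs = [sum(costs[i] for i in chunk) for chunk in chunks]
```

A cache-oblivious split by iteration count would give the two workers estimated costs of 4 and 22; with the cost estimates the split becomes iterations 0 to 4 and 5 to 7, with costs 14 and 12.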
|
1 |
2012 — 2013 |
Veidenbaum, Alexander (co-PI); Banerjee, Utpal (co-PI); Nicolau, Alexandru |
N/A |
Eager: Identifying and Removing Barriers to Autovectorization @ University of California-Irvine
Most modern microprocessors support some form of vector operations that allow the same operation to be applied to small vectors of arguments simultaneously. Studies have shown that use of these instructions can improve the performance of many scientific codes by a factor of 2 or more. Unfortunately, the state of the art in autovectorization falls far short of this goal, only achieving improvements of 20-30% on the same codes.
While studies have shown that current autovectorizing compilers do not identify all of the opportunities for vectorization, little is known about why they fail to do so. The PIs plan to evaluate tradeoffs between different compiler optimizations and vectorization in an effort to understand how optimization choices affect opportunities for autovectorization. They will use an extensive set of benchmarks to evaluate these tradeoffs. This research will make it possible to develop better autovectorizing compilers by avoiding optimization choices that interfere with autovectorization. The performance benefits of such compilers will improve the performance of applications ranging from multimedia software to scientific computing.
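One generally known barrier to vectorization, offered here as an illustration rather than as a finding of this project, is a loop-carried dependence shorter than the vector width. The sketch below simulates the loop `a[i] = a[i - d] + b[i]` in scalar order and in vector chunks, where a chunk performs all of its loads before any of its stores. The function names and the data are hypothetical.

```python
def scalar(a, b, d):
    """Reference semantics: one element at a time, in order."""
    a = list(a)
    for i in range(d, len(a)):
        a[i] = a[i - d] + b[i]
    return a


def vector_chunks(a, b, d, width):
    """Simulated vector execution: each chunk loads everything, then stores."""
    a = list(a)
    for lo in range(d, len(a), width):
        hi = min(lo + width, len(a))
        loaded = [a[i - d] for i in range(lo, hi)]  # all loads first
        for k, i in enumerate(range(lo, hi)):       # then all stores
            a[i] = loaded[k] + b[i]
    return a


a = [1] * 8
b = [1] * 8

# Dependence distance 4 with width 4: chunks never read what they write.
safe = vector_chunks(a, b, d=4, width=4) == scalar(a, b, d=4)
# Dependence distance 1 with width 4: a chunk reads stale values.
unsafe = vector_chunks(a, b, d=1, width=4) == scalar(a, b, d=1)
```

Here `safe` is true and `unsafe` is false: the chunked version is equivalent to the scalar loop only when the dependence distance is at least the vector width, which is the kind of condition a vectorizer must prove before transforming a loop.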
|
1 |
2015 — 2019 |
Veidenbaum, Alexander; Nicolau, Alexandru |
N/A |
Xps: Full: Fp: Collaborative Research: Advancing Autovectorization @ University of California-Irvine
The goal of this project is to advance the state of the art in autovectorization. This is a technique applied by compilers to automatically transform computer programs so that they can take advantage of the vector devices found in most processors. Today, most compilers have autovectorization capabilities, but their effectiveness is limited. The intellectual merit of this project lies in its potential to advance an important and beautiful core area of computer science, compiler technology, by creating new techniques and extending our understanding of programming patterns, program analysis, and transformation techniques. Beyond computer science, the project's broader significance and importance is that its results aim at increasing the fraction of code segments that, without human intervention, make use of vector devices. The effect of this increase is the acceleration of computer programs and the reduction of the energy that they consume. Faster programs are of great importance in all application areas, but are particularly important in science and engineering where computing speed is an enabler of discoveries and better designs.
The research strategy is to develop and evaluate a prototype autovectorizer based on the exploration of the space of equivalent versions of a program guided by an intelligent search engine. The space of equivalent versions is obtained with a source-to-source restructurer. A repository of codelets is planned in order to train the search engine so that it becomes capable of guiding the selection in the space of possibilities in order to identify a highly efficient version of the code.
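The idea of exploring a space of equivalent program versions can be sketched in a few lines. This sketch substitutes exhaustive timing for the intelligent search engine described above, and the three variants, the `pick_fastest` helper, and the sample input are all hypothetical; a real restructurer would generate the variants automatically.

```python
import timeit


def v_loop(xs):
    total = 0
    for x in xs:
        total += x * x
    return total


def v_genexpr(xs):
    return sum(x * x for x in xs)


def v_map(xs):
    return sum(map(lambda x: x * x, xs))


def pick_fastest(variants, sample, repeats=3, number=20):
    """Time each variant on `sample` and return the fastest equivalent one."""
    reference = variants[0](sample)
    best, best_time = None, float("inf")
    for variant in variants:
        if variant(sample) != reference:
            continue  # discard versions that are not equivalent on the sample
        elapsed = min(timeit.repeat(lambda: variant(sample),
                                    repeat=repeats, number=number))
        if elapsed < best_time:
            best, best_time = variant, elapsed
    return best


sample = list(range(1000))
chosen = pick_fastest([v_loop, v_genexpr, v_map], sample)
```

Which variant wins depends on the machine and interpreter, so the sketch makes no claim about the outcome. Exhaustive timing does not scale to large spaces, which is presumably why a trained search engine is proposed to guide the selection.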
|
1 |