1992 — 1998
Blelloch, Guy
Nsf Young Investigator: a Functional Data-Parallel Language For High Performance Computers @ Carnegie-Mellon University
Current research on a first-order functional language (NESL), which supplies parallelism through a set of data-parallel constructs based on vectors, is extended in several directions. One is testing the language on algorithms drawn from scientific applications. Another is the addition of polymorphic types to the language. Yet another is the examination of lazy load balancing based upon dynamic testing. In addition, implementations on several newer high-performance machines are carried out.
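NESL's central construct is an apply-to-each over sequences, and these constructs nest. As a rough illustration only (not NESL itself and not project code; the sparse-matrix representation and names below are invented for the example), the nested data-parallel style can be mimicked with Python comprehensions:

```python
# Illustrative sketch of nested data parallelism in the NESL style, written
# as Python comprehensions. Each comprehension corresponds to an apply-to-each
# that could, in principle, be evaluated in parallel over its sequence.

# A sparse matrix as a sequence of rows; each row is a sequence of
# (column_index, value) pairs. This representation is assumed for the example.
sparse_matrix = [
    [(0, 2.0), (2, 1.0)],
    [(1, 3.0)],
    [(0, 1.0), (1, 4.0), (2, 5.0)],
]
x = [1.0, 2.0, 3.0]

# Nested apply-to-each: the outer comprehension runs over rows, the inner one
# over the nonzeros of a row; both levels expose data parallelism.
y = [sum(v * x[j] for (j, v) in row) for row in sparse_matrix]

print(y)  # [5.0, 6.0, 24.0]
```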
1997 — 2002
Lee, Peter (co-PI); Miller, Gary (co-PI); Blelloch, Guy; Harper, Robert (co-PI)
Advanced Languages For Scientific Computation Environments @ Carnegie-Mellon University
The goal of the project is to demonstrate the feasibility of a broad-based language for scientific and engineering computation. Such a language should serve equally well for developing production scientific code (which is now typically written in FORTRAN or C) and as the scripting language for interactive scientific computing environments (such as Mathematica or Matlab). This breadth would greatly simplify the development of code by allowing a clean transition from prototyping and small-scale experiments to full production code, and would better support the evolution of production code. As a broad-based language it should support both numerical and symbolic computation, should include constructs for parallel programming, and should include the high-level language features found in systems such as Mathematica (e.g. pattern matching, higher-order functions, and automatic memory management). The project involves demonstrating that it is possible to support these features while maintaining good code efficiency. As such, the project includes work on compiler techniques, on language design, on the development of scientific codes, and on experimentation with those codes using the compiler and language. The project is using ML extended with the parallel constructs from NESL as a testbed.
2000 — 2003
Lafferty, John (co-PI); Blum, Manuel (co-PI); Blelloch, Guy; Sleator, Daniel (co-PI); Blum, Avrim (co-PI)
Itr: Algorithms: From Theory to Application @ Carnegie-Mellon University
With the explosion in connectivity of computers and in the size of data sets available for analysis, mathematically sophisticated algorithms are becoming increasingly crucial for a wide range of real-world applications. Unfortunately, it often takes many years for an algorithm to make it from theory into applications. In fact, the trend has been for different areas to develop their own algorithms independently, with the result that similar techniques are reinvented many times in different contexts, and radically new approaches that require an algorithmic level of abstraction take a long time to make it into applications. The intellectual core of this proposal is to create a coordinated effort in "Algorithms from Theory to Practice" that connects the basic development of fundamental algorithms and data structures to their many disparate uses. This work will address critical needs by connecting relevant algorithms to application areas, by exposing and tackling important issues that are common to multiple applications, and by developing fundamentally new approaches to solving key problems via the connections made.
This proposal aims to provide impact at a number of different levels. At the lowest level are specific research projects that target key application domains. These include algorithms for mesh generation with applications to scientific simulations and graphics, algorithms for indexing and searching needed for a number of data analysis tasks, and protocols that connect machine learning with cryptography to produce a fundamentally new way for people to securely authenticate to their computers. At a higher level, this proposal will create a center to which researchers in application areas can come to build connections and integrate algorithmic techniques and principles into their own projects. At the highest level, this proposal will create tools to improve the process of moving algorithms from theory to applications more broadly. As one example, the course "Algorithms in the Real World" run by PI Blelloch has already developed a set of web pages detailing how algorithms are used in various applications and what turn out to be the crucial issues involved. A new, extensible version of this database would provide support for theoreticians, practitioners, and educators. We hope the end result will be both a faster pipeline from algorithm design to application and improved sharing of algorithmic techniques across application areas. In addition, we expect the students supported by this effort to fulfill the highest-level goals of this project by becoming the next generation of vertically-integrated algorithm researchers.
The PIs each have a strong track record in algorithms, both theoretical and applied. Guy Blelloch is the developer of the NESL parallel programming language, as well as of fast parallel algorithms for a number of core problems. Avrim Blum is known for his work in machine learning and approximation algorithms, and is the developer of the Graphplan planning algorithm, used as the basis of many AI planning systems. Manuel Blum is a winner of the ACM Turing Award for his work in the foundations of computational complexity theory and its applications to cryptography and program checking. John Lafferty is known for his work in language modeling and information retrieval, and is co-developer (along with PI Sleator) of the Link Grammar natural-language parser. Daniel Sleator is a winner of this year's ACM Kanellakis "Theory and Practice" award for the development of the splay tree data structure, and has more recently been developing algorithms for natural-language applications.
2000 — 2005
Walkington, Noel (co-PI); Miller, Gary; Blelloch, Guy; Ghattas, Omar
Itr: Simulation of Flows With Dynamic Interfaces On Multi-Teraflop Computers @ Carnegie-Mellon University
This project will develop advanced parallel algorithms and software for simulating complex flows with dynamic interfaces. The development of scalable, parallel, high-accuracy algorithms for simulating such flows poses enormous challenges in computational science. The project will use these algorithms for microstructural simulation of blood flow. This application provides an excellent testbed for the methods: it is extremely computationally challenging and of critical medical importance.
Blood flow belongs to a class of flow problems with dynamic interfaces. Blood is a mixture of interacting gel-filled solid cells and fluid plasma. Current blood flow models are macroscopic, treating the mixture as a homogeneous continuum. Microstructural models resolve the deformations of individual cells and their interaction with the surrounding plasma. Because of the computational difficulties of resolving tens of thousands of dynamically deforming cells, no one to date has simulated realistic blood flows at this level. Yet such simulations are necessary in order to gain a better understanding of blood damage - which is central to improved artificial organ design - and for the development of more rational macroscopic blood models.
Simulating flows with dynamic interfaces is much more difficult than simulating flows in well-understood fixed domains. The central challenges are to develop numerical algorithms that stably and accurately couple the moving fluid and solids, and geometric algorithms for computing the resulting dynamic meshes. This project takes the approach of treating both fluid and solid domains as collections of grid points, with associated meshes, that evolve over time, and devising numerical algorithms that couple the domains seamlessly. It will attack the difficulty of creating and managing the evolving mesh by developing scalable parallel algorithms for its convex hull, Delaunay triangulation, and mesh partitioning components. With careful attention to fundamental algorithmic issues, these geometric computations can be kept cheap, enabling the dynamic flow simulations to scale to thousands of processors on multi-teraflop systems.
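For concreteness, the kind of geometric kernel named above can be exercised with off-the-shelf tools. The sketch below uses SciPy's sequential Delaunay triangulation purely to illustrate the operation; it is not the project's code, which concerns scalable parallel versions of such kernels.

```python
# Minimal illustration of a Delaunay triangulation of a 2-D point set,
# the kind of geometric kernel referred to above. SciPy's implementation
# is sequential; the project targets parallel algorithms for this step.
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
points = rng.random((100, 2))      # 100 random points in the unit square

tri = Delaunay(points)
print("number of triangles:", len(tri.simplices))
print("first triangle (point indices):", tri.simplices[0])
```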
This research will benefit a wide community of scientists and engineers. The computational algorithms will be widely applicable to a variety of fluid-solid and fluid-fluid interaction problems. More generally, the core parallel computational geometry kernels will provide generic support for the geometric computations underlying many dynamic irregular problems. The project will distribute a portable library of efficient implementations of these algorithms. Also, the project will undertake a broad-based, interdisciplinary program integrating research and education. It will be part of a new program in Computational Science and Engineering, serving as the archetype of how applications, computational, computer, and mathematical scientists can work together to tackle societal problems that cannot be solved solely by any one discipline.
2001 — 2007
Lafferty, John (co-PI); Blum, Lenore (co-PI); Blelloch, Guy; Sleator, Daniel (co-PI); Ravi, Ramamoorthi (co-PI)
Itr/Sy+Im+Ap: Center For Applied Algorithms @ Carnegie-Mellon University
Algorithms are the basic procedures by which computers solve problems. With the explosion in the use and connectivity of computers, and in the sizes of the data sets being used, the performance of algorithms is becoming increasingly important. Being able to solve a problem ten times faster, for example, could mean designing a drug next year instead of several years later, or reducing the cost of developing a new space structure by allowing faster and more extensive computer simulations. Over the past 30 years there have been significant advances in the basic theory of algorithms. These advances have led to a "core knowledge" concerning algorithms and algorithmic techniques that has now been applied across an amazing diversity of fields and applications---surely more broadly than calculus is now applied.
The problem, however, is that there is a large gap between ongoing theoretical research, and the current use of algorithms in applications. It often takes more than ten years for the core ideas in a new algorithm to make it into an application, and ongoing theoretical research often does not properly address the needs of the applications. The purpose of the Center is to bridge this gap so that efficient and effective algorithms can be deployed more rapidly. This will be achieved through (1) a set of Problem Oriented Explorations (PROBEs), (2) developing an extensive set of web resources on algorithms, and (3) educational activities including holding workshops for educating teachers. The PROBEs will bring together algorithm designers and domain experts to rapidly deploy new algorithmic ideas within a specific domain.
2006 — 2011
Blelloch, Guy; Ravi, Ramamoorthi (co-PI); Schwartz, Russell
Generalizing Haplotype Models For Phylogenetics @ Carnegie-Mellon University
This project will develop a combination of novel computational methods to improve inferences of human genomic history from data on genetic variability between individuals. Large amounts of data have recently been gathered on the small genetic differences that distinguish one person from another. Such information is valuable for inferring how the genome has changed over the course of human history and for understanding the specific biological processes that have shaped it over time. These basic scientific results in turn have relevance to many practical problems in identifying the functions of genes, facilitating future genetic studies, and finding and mitigating risk factors for disease. Making optimal use of this information will, however, require new advances in mathematical models of how genetic variations are produced and in the computer algorithms needed to use those models to analyze real sequence data. The proposed work is based on a strategy to overcome limitations of the data and current computational capabilities by combining theoretical tools for haplotype structure identification, which finds segments of DNA that have been largely conserved over long periods of human history, with tools for phylogenetics, which infers how different individuals or populations are related to one another. Executing this strategy will involve a multidisciplinary effort, bringing together techniques from population genetics, computer science, mathematics, and operations research. In addition to its direct scientific outputs, the project will develop human resources by creating new course material and student research projects to help prepare future scientists to work at the intersection of these fields.
2010 — 2014
Blelloch, Guy
Shf: Af: Small: Locality With Dynamic Parallelism @ Carnegie-Mellon University
With the recent dominance of computers with many parallel cores, and the widespread use of large data centers, the need for high-level, simple, and general approaches to developing parallel code for these machines has become critical. There are many challenges to developing such parallel code effectively, but certainly a principal difficulty is dealing with communication costs - or, looked at from the other side, taking advantage of locality. Unfortunately this challenge has only become more difficult on modern parallel machines that have many forms of locality - network latency and bandwidth, shared and distributed caches, partitioned memories, and secondary storage in a variety of organizations.
To address this problem, this project is developing an approach that lets programmers understand locality in their parallel code without needing to know any details of the particular parallel machine they are using. In this approach, programmers express the full dynamic parallelism of their algorithm without describing how it is mapped onto processors, and are given a simple high-level model for analyzing locality. The research is based on the conjecture that locality should be viewed by the programmer as a property of the algorithm or code and not of the machine. To ensure that real machines can take proper advantage of the locality analyzed in the model, the research is developing scheduling approaches that map the "algorithm locality" onto various forms of machine locality, including shared caches, distributed caches, trees of caches, and distributed-memory machines. The results of the research include both theoretical bounds for such schedulers on specific machine organizations and experimental validation on a set of benchmark applications.
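One common way to express locality as a property of the algorithm rather than the machine is cache-oblivious divide and conquer: the recursion produces subproblems that eventually fit in whatever caches exist, without the code naming any cache sizes. The sketch below is an illustration under that general idea, not the project's model or code; the four quadrant calls are independent, so a scheduler could also run them in parallel.

```python
# Cache-oblivious matrix transpose by divide and conquer (illustrative sketch).
# The recursion exposes locality without referring to any particular cache:
# once a quadrant fits in some level of the memory hierarchy, all of its
# accesses hit that level. The four recursive calls are independent and could
# be forked in parallel by a scheduler. Assumes n is a power of two.

def transpose(A, B, row, col, n):
    """Write the transpose of the n-by-n block of A at (row, col) into B."""
    if n <= 16:                       # base case: small block, done directly
        for i in range(row, row + n):
            for j in range(col, col + n):
                B[j][i] = A[i][j]
    else:
        h = n // 2                    # split into four quadrants
        transpose(A, B, row,     col,     h)
        transpose(A, B, row,     col + h, h)
        transpose(A, B, row + h, col,     h)
        transpose(A, B, row + h, col + h, h)

n = 64
A = [[i * n + j for j in range(n)] for i in range(n)]
B = [[0] * n for _ in range(n)]
transpose(A, B, 0, 0, n)
assert all(B[j][i] == A[i][j] for i in range(n) for j in range(n))
```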
2012 — 2013
Blelloch, Guy
Nsf Workshop On Research Directions in the Principles of Parallel Computing @ Carnegie-Mellon University
A one-day workshop on "Research Directions in the Principles of Parallel Computing" is scheduled for June 28, 2012 at Carnegie Mellon University (CMU) in Pittsburgh, PA. The workshop will bring together researchers from academia and industry to inform, discuss, and initiate collaborations on understanding the key research challenges in the foundations of parallel computing, and directions that might address these challenges. The emphasis of the workshop is on theoretical approaches to parallel computation and how to bridge these approaches with practice. It will be held the day after the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), which takes place June 25-27, 2012, also at CMU.
The workshop will address several issues described in NSF's Advanced Computing Infrastructure Strategic Plan including, but not limited to, computational models, programming languages, rethinking the canonical computing "stack," and new algorithmic paradigms. The workshop will consist of approximately 20 invited speakers who will be asked to present their thoughts on "what are three big research challenges in the principles of parallel computing." Attendance at the workshop is open to all students, faculty, and industry researchers at no charge (as space permits). The output of the workshop will be a public report documenting the discussions. It is hoped that this report and other interactions at the workshop will have a broad impact on future directions of research in the theory and practice of parallel computation.
2013 — 2017
Blelloch, Guy
Shf: Af: Large: Collaborative Research: Parallelism Without Concurrency @ Carnegie-Mellon University
The widespread deployment of parallel machines --- from multicores to supercomputers --- has made it critical to develop simple approaches to programming them. Significant progress has been made in simplifying parallel programming by developing programming models to support parallelism without concurrency, that is, without the nondeterminacies in the logic of programs caused by the relative and nondeterministic timing of communicating processes. Yet most parallel programs in practice are concurrent, and hence, nondeterministic, leading to code that can only be programmed and understood by experts. This research project aims to understand how parallel computers can be made easier to use by the vast majority of programmers by developing software technology that enables deterministic parallel computing.
The project takes a holistic view of the problem from the key perspectives of programming linguistics, software systems, algorithmic analysis, and absolute performance. It acknowledges the reality that parallel programming cannot be fully deterministic at every level of abstraction. It is pursuing three key strategies for dealing with concurrency: encapsulating concurrency so that it is hidden by layered abstractions at appropriate abstraction levels, avoiding concurrency by restructuring programs to employ deterministic approaches, and managing concurrency when it is impractical to either encapsulate or avoid concurrency completely. Among the specific techniques being studied are commutative building blocks, deterministic nonassociative reducers, deterministic pipelined parallelism, deterministic interfaces, and generalized race detection for detecting invariant races. The project is developing open-source libraries, tools, and runtime extensions integrated into a multicore-software platform, as well as a problem-based benchmark suite to compare approaches.
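As a small illustration of the deterministic-by-construction style (a sketch under assumptions, not one of the project's libraries): floating-point addition is not associative, so a reduction whose combining tree depends on scheduling can return different answers on different runs; fixing the tree shape, as below, makes the answer independent of how the two halves are scheduled, even if they run in parallel.

```python
# Deterministic reduction sketch: the combining tree is fixed by the input
# size (always split at the midpoint), so the result does not depend on how
# many workers run it or in what order the two halves finish. This matters
# for nonassociative operations such as floating-point addition.

def reduce_fixed_tree(xs, combine):
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    left = reduce_fixed_tree(xs[:mid], combine)    # could run in parallel
    right = reduce_fixed_tree(xs[mid:], combine)   # with this call
    return combine(left, right)

data = [0.1, 0.2, 0.3, 1e16, -1e16, 0.4]
print(reduce_fixed_tree(data, lambda a, b: a + b))  # same value on every run
```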
2014 — 2018
Acar, Umut; Blelloch, Guy
Shf: Medium: Collaborative Research: Automatic Locality Management For Dynamically Scheduled Parallelism @ Carnegie-Mellon University
Today's multicore and manycore computers provide increasing amounts of computational power in the form of parallel processing, coupled with a complex memory organization that has many levels of hierarchy and orders-of-magnitude differences in cost between accessing different levels. When software exhibits spatial and temporal locality, meaning that it reads and writes memory addresses that are close to one another within a relatively small time span, it is able to access data primarily in fast caches rather than in slow main memory, and so deliver good sequential and parallel performance. Unfortunately, with software written in high-level managed programming languages it is difficult to ensure or to predict the amount of spatial and temporal locality, due to the lack of low-level programmer control and the complex interactions among the specific hardware platform, the thread scheduler, and the memory manager. This project explores techniques for automatic management of locality in high-level managed programming languages executing on parallel computers with sophisticated memory hierarchies. Using the theoretical models, efficient algorithms, and practical implementations being developed in the project, programmers are able to reason about the expected locality of their programs independent of the target hardware, while a runtime system, including a thread scheduler and memory manager, maps the program onto specific hardware to achieve the established performance bounds.
In particular, this project addresses the problem of automatically managing locality via the runtime system of a high-level, garbage-collected, parallel functional programming language. A comprehensive approach that considers scheduling, memory allocation, and memory reclamation together is used, allowing the thread scheduler to influence the memory manager and vice versa. A key insight of this research program is to view the allocated data of a program as a hierarchical collection of task- and scheduler-mapped heaps. This view guides the theoretical cost model that enables a programmer to reason about locality at a high level, the efficient algorithms that control when to create and when to garbage-collect a heap with provable bounds, and the practical implementation that delivers automatic locality management in a parallel functional programming language. The intellectual merits are advances in understanding the interaction of thread scheduling and memory management with locality on modern parallel hardware, the development of a high-level, machine-independent cost model, and a synthesis of programming languages, algorithmic theory, and system design to address the challenges of automatic locality management. The broader impacts are improvements in software quality and programmer productivity, the creation of a parallel functional programming language usable in both education and research, and the integration of results into courses and outreach activities.
2015 — 2018
Gibbons, Phillip; Blelloch, Guy
Xps: Full: Fp: Write-Efficient Parallel Algorithms For Emerging Memory Technologies @ Carnegie-Mellon University
Chip manufacturers in the past ten years have been enhancing computing performance by including multiple processor cores per chip. Given that all the cores have to access a shared memory, however, this access has increasingly become a bottleneck in terms of energy, latency, and bandwidth. To help deal with these and other problems, industry has been developing a variety of new memory technologies such as phase-change memory, Spin-Torque Transfer Magnetic RAM, and Memristor-based Resistive RAM. These technologies offer the promise of significantly lower energy and higher density than standard DRAM memory technology. One of the key issues, however, is that writing to memory based on these technologies is significantly more costly than reading from it, suffering from higher latency, lower per-chip bandwidth, and higher energy costs.
The goal of this project is to develop new sequential and parallel algorithms and algorithm design techniques that are efficient in terms of the number of writes they perform, and hence make better use of these new technologies by reducing energy consumption and improving performance. This contrasts with 50 years of research on algorithms in which writes are assumed to be no more costly than reads. If successful, the research will have a broad impact on future users of such technologies, of which there could be very many, as well as on the models and approaches for future algorithm design. The PIs also plan to develop efficient implementations of the algorithms, which they will make freely and openly available. The project includes an educational outreach component in which, as part of courses on databases and applied algorithms, the PIs will teach students about the new memory technologies and algorithms that can take advantage of them.
Within the scope of work the PIs will (1) develop appropriate abstract models for capturing the asymmetric costs in memories, (2) develop and analyze algorithms in the models, (3) prove lower bounds, (4) develop programming abstractions that help express such algorithms, (5) develop working applications (e.g., in graph analytics and databases) based on the algorithms developed, and (6) experimentally verify the utility of the models and abstractions in guiding the development of efficient algorithms. The intellectual challenge within this context will be in developing such models, algorithms, and programming abstractions that are simultaneously simple, elegant, and practical, while at the same time gaining insights into fundamental limits and trade-offs.
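The asymmetric setting can be made concrete with a toy cost model. The sketch below is purely an illustration of the kind of accounting involved; the parameter OMEGA and the array wrapper are invented for the example and are not part of the project. It charges 1 per read and OMEGA per write and compares two ways of doing the same job.

```python
# Toy asymmetric cost accounting: reads cost 1, writes cost OMEGA (a made-up
# parameter standing in for the read/write asymmetry of NVRAM-like memories).
OMEGA = 10

class AsymMemory:
    """Array wrapper that charges 1 per read and OMEGA per write (illustrative)."""
    def __init__(self, data):
        self._data = list(data)
        self.cost = 0
    def read(self, i):
        self.cost += 1
        return self._data[i]
    def write(self, i, v):
        self.cost += OMEGA
        self._data[i] = v

# Task: count how many elements are below a threshold.
values = list(range(100))

# Version 1: materialize a 0/1 flag array, then sum it (n extra writes).
mem1 = AsymMemory(values)
flags = AsymMemory([0] * len(values))
for i in range(len(values)):
    flags.write(i, 1 if mem1.read(i) < 50 else 0)
total1 = sum(flags.read(i) for i in range(len(values)))

# Version 2: accumulate directly, writing only the final answer (1 write).
mem2 = AsymMemory(values)
count = sum(1 for i in range(len(values)) if mem2.read(i) < 50)
out = AsymMemory([0])
out.write(0, count)

print("flag-array cost :", mem1.cost + flags.cost)   # n reads + n writes + n reads
print("accumulate cost :", mem2.cost + out.cost)     # n reads + 1 write
```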
2016 — 2019
Acar, Umut (co-PI); Harchol-Balter, Mor (co-PI); Blelloch, Guy
Xps: Full: Bridging Parallel and Queueing-Theoretic Scheduling @ Carnegie-Mellon University
Scheduling computational tasks on computational resources has played a fundamental role in computer science since the 1950s. One branch of scheduling (queueing-theoretic) has focused on analyzing multiple sequential jobs competing for a shared resource, with the goal of minimizing the response time (latency) over all jobs. Most operating system and server schedulers use ideas developed as part of this research. Another branch of scheduling (parallel) has focused on analyzing a single parallel job running on a dedicated parallel machine, with the goal of maximizing the efficiency (throughput) of the job. Most schedulers for dynamically parallel programs use ideas developed as part of this research. Until recently these branches were adequate on their own, and the communities studying them have had very little interaction. However, the mainstream availability of parallel hardware, and the need to handle many parallel jobs sharing a single resource, has recently changed this. The goal of this project is to bridge the two branches by developing new theory and practical scheduling algorithms that can handle multiple dynamically parallel jobs competing for shared resources. The project brings together PIs with expertise from each area, and will apply and combine techniques from each. The project has the potential to have significant broad impact on the theory and practice of widely used shared parallel systems. The project will also include an educational outreach component in which the PIs will incorporate ideas from the project into courses they teach.
This project is the first to tackle the union of these two domains. The PIs will develop scheduling algorithms for a stream of arriving jobs, where the jobs are complex multi-threaded fine-grained parallel jobs with dynamic parallelism and dependencies among tasks. The research will address three specific challenges, as follows. Challenge 1, Statistical Characterization/Modeling: Scheduling multiple jobs requires knowing something about when each job will complete and what it will do in the future, such as the number of tasks it will create or the granularity of tasks. A significant part of the project is devoted to measuring parallel jobs and statistically characterizing them to create simple models of their behavior. Challenge 2, Algorithmic Development and Analysis: There are currently no scheduling algorithms for the problem that is being considered. While queueing theory touts the optimality of always running the job with the "shortest expected remaining time," this has little meaning when jobs are parallel and there are many resources. New theorems and analytical techniques will be developed. Challenge 3, Implementation and Benchmarking: An important component of the project will be the implementation and benchmarking of our algorithms on prototype systems. The PIs will investigate multiple metrics including latency, throughput, fairness, and robustness.
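A tiny worked example of the queueing-theoretic side (made-up job sizes; the jobs here are sequential, which is exactly the setting the project goes beyond): with all jobs present at time zero, running them shortest-first minimizes mean response time, as the sketch below confirms.

```python
# Mean response time of sequential jobs on one server, all arriving at time 0.
# Running the shortest job first minimizes mean response time; this classical
# intuition stops applying directly once jobs are themselves parallel and
# dynamic, which is the setting of this project.

def mean_response_time(sizes):
    t, total = 0.0, 0.0
    for s in sizes:
        t += s          # this job finishes at time t
        total += t      # its response time (it arrived at time 0)
    return total / len(sizes)

jobs = [10.0, 1.0, 2.0]
print("FCFS order    :", mean_response_time(jobs))           # 10, 11, 13 -> ~11.33
print("shortest-first:", mean_response_time(sorted(jobs)))   # 1, 3, 13   -> ~5.67
```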
2019 — 2022
Blelloch, Guy
Af: Small: Shared-Memory Parallel Algorithms: Theory and Practice @ Carnegie-Mellon University
With the advent in recent years of multicore processors, ranging from fifty-dollar hobby kits to multi-million-dollar supercomputers, shared-memory parallel algorithms have increasingly significant practical and theoretical relevance. This project is developing new algorithmic approaches and results relevant to today's shared-memory parallel machines. The impact of this project will be felt in applications being able to make better use of the computational power of modern multicore architectures. The project seeks to develop library implementations of many of these algorithms, which will be made available to the public. On the educational side, the project will result in coursework that will help undergraduate students learn about parallel algorithms and their implementation.
The project focuses on three areas. The first is developing results in a model, the binary forking model, that is more relevant to today's machines than some previous models. In particular, the model matches the software platforms available on most parallel machines, and supports an asynchronous form of parallelism that is well matched to the machines these programs run on. The second area is to better understand the parallelism already available in many sequential algorithms. The goal is to derive algorithms that are simpler and more efficient. The third area is to develop algorithms that allow the user to efficiently make batches of updates to underlying data structures. This is referred to as batch-parallel dynamic algorithms, and follows significant prior work on sequential, single-update dynamic algorithms. In the binary forking model each task can only fork into two child tasks, but can do so recursively and asynchronously. At present no tight performance bounds are known in the binary forking model even for some basic problems such as sorting and graph connectivity, which this project seeks to remedy. For the thrust on understanding parallelism in sequential algorithms, the project will study the dependencies among sub-computations in iterative sequential algorithms. In the thrust on batch-parallel algorithms, the project is looking at applying the ideas to graph connectivity and related problems. The goal is to achieve algorithms that are work-efficient relative to the best (or near-best) sequential algorithms---and in particular, for graph connectivity, to achieve O(log^2 n) amortized work per update while allowing batches of updates.
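A minimal sketch of the binary forking pattern follows (illustrative only; Python threads are used just to make the forks explicit, and the helper names are invented). Each call forks exactly two children and joins them, giving mergesort Theta(n log n) work; the merge below is left sequential for brevity, so a parallel merge would be needed to bring the span down to polylogarithmic in the model.

```python
# Binary fork-join mergesort sketch. Each call forks two logical child tasks
# and joins them, matching the binary forking model's fork/join primitive.
# Python threads only make the structure explicit; a real runtime would use
# a work-stealing scheduler. The merge is sequential here, so the span of this
# particular sketch is O(n); a parallel merge would make it polylog(n).
import threading

def fork2(f, g):
    """Run f on the current task and g in a forked task; join before returning."""
    result = [None, None]
    t = threading.Thread(target=lambda: result.__setitem__(1, g()))
    t.start()
    result[0] = f()
    t.join()
    return result[0], result[1]

def merge(a, b):
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def msort(xs):
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left, right = fork2(lambda: msort(xs[:mid]), lambda: msort(xs[mid:]))
    return merge(left, right)

print(msort([5, 3, 8, 1, 9, 2, 7]))  # [1, 2, 3, 5, 7, 8, 9]
```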
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2019 — 2023
Blelloch, Guy; Harper, Robert (co-PI); Acar, Umut (co-PI)
Shf: Medium: Algorithmic Lambda-Calculus For the Design, Analysis, and Implementation of Parallel Algorithms @ Carnegie-Mellon University
The hardware advances of recent years have brought multicore chips and parallel computing to the mainstream. While there have been many advances in parallel software for such multicore chips over the past decade, there is still no satisfactory model for analyzing the performance of parallel algorithms. In particular, and unlike the case for sequential algorithms, there is a gap between cost models for estimating the performance of algorithms and easy-to-use programming models for implementing them. This project is developing a practical approach to parallel algorithms by bringing together two fundamental but distinct theories of computing---one based on machine models, following work that dates back to Alan Turing, and the other based on language models, following work that dates back to Alonzo Church. The novelty of the project is in combining these two theories, between which there is currently a large rift. The impact of the project is in making significant steps toward simplifying the design and analysis of parallel algorithms, and toward a better understanding of the relationship between the two theories of computing. The educational component of this project involves teaching undergraduates parallel algorithms and creates ample opportunities to test the practical effectiveness of the proposed approach, along with concrete efforts to broaden participation in computing through programs at Carnegie-Mellon University.
The work follows an end-to-end methodology bridging Church's and Turing's theories along with practice. On the theory side, the project is developing a calculus, called the "algorithmic lambda-calculus," that equips Church's lambda-calculus with a cost semantics, making it possible to reason about the total work and parallel span (time) of programs, which in turn can be used to understand the performance of algorithms, at least asymptotically. To show that this calculus is realizable, the project is establishing theoretically that the calculus is faithful to a transition semantics, which can then be realized efficiently on an abstract machine such as the Parallel-RAM. On the practical side, the project is developing a programming language and a compiler that faithfully implement this theory. The project is also extending the algorithmic lambda-calculus, the realizability theorems, and the implementation to support important features such as aggregate parallel data structures, interaction, and different forms of parallelism.
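To give a flavor of what a cost semantics assigns to programs (a toy sketch in Python, not the project's calculus or language): every expression is given a work and a span, sequential composition adds both components, and parallel composition adds work but takes the maximum of the spans.

```python
# Toy cost semantics: each "expression" evaluates to (value, work, span).
# Sequential composition adds work and adds span; parallel composition adds
# work but takes the max of the two spans, reflecting that the branches may
# run simultaneously. Purely illustrative of the idea, not the calculus itself.

def const(v):
    return (v, 1, 1)

def seq(e1, k):
    """Evaluate e1, then feed its value to k (sequential composition)."""
    v1, w1, s1 = e1
    v2, w2, s2 = k(v1)
    return (v2, w1 + w2, s1 + s2)

def par(e1, e2):
    """Combine two already-evaluated expressions as a parallel pair."""
    (v1, w1, s1), (v2, w2, s2) = e1, e2
    return ((v1, v2), w1 + w2 + 1, max(s1, s2) + 1)

def psum(lo, hi):
    """Sum the integers in [lo, hi) with a balanced parallel recursion."""
    if hi - lo == 1:
        return const(lo)
    mid = (lo + hi) // 2
    return seq(par(psum(lo, mid), psum(mid, hi)),
               lambda pair: const(pair[0] + pair[1]))

value, work, span = psum(0, 16)
print(value, work, span)   # work grows linearly in n, span logarithmically
```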
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2019 — 2023
Blelloch, Guy; Gibbons, Phillip
Spx: Parallel Models and Algorithms For Emerging Memory Systems @ Carnegie-Mellon University
With the advent of highly parallel many-core machines, memory has increasingly become a limiting factor in continued performance improvement and scalability, in terms of energy usage, component density, latency, bandwidth, and reliability. To help deal with these and other problems, the semiconductor industry has been developing new byte-addressable nonvolatile random access memory (NVRAM) technologies. These offer the promise of significantly lower energy needs and higher density than standard dynamic random access memory (DRAM), while not losing their state on power loss. However, in NVRAM technologies, operations that write to memory are more costly in terms of throughput and energy than operations that read from memory. This project is developing and testing new abstractions for emerging large and extreme-scale computer systems based on NVRAM, and investigating how to effectively leverage this asymmetry for better performance in large computing systems. The focus will be on combining theory and practice, and on considering issues across multiple levels of abstraction, from the hardware itself to high-level algorithms and programming models. The project will include an educational component that will teach students about the new technology and how to use it effectively.
The project consists of three main components: (1) developing methodologies for systems combining volatile and nonvolatile memory that allow individual processors to fail while permitting the overall system to continue correctly, (2) developing efficient algorithms and caching policies for settings where writes are more expensive than reads, and (3) developing techniques to take advantage of the significant computing capability in each memory controller. In the first component, the project is studying how to automatically convert arbitrary concurrent programs into a setting where processors can fail so that the overhead for both running the converted program and recovering from a failure is low. In the second component, the project is developing general purpose techniques to reduce the numbers of writes compared to reads, or reduce the fraction of the memory that needs to be written to, and applying the techniques across a broad class of algorithms. The research team will both develop theory and experimentally measure the effectiveness of these techniques and algorithms. In the third component, the project is looking at how to use the memory controllers to reduce the cost of fault tolerance and allow for weaker memory models, with the purpose of scaling to large systems. A key intellectual challenge is to ensure that the models, techniques, and algorithms are simultaneously simple, elegant, and practical.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2021 — 2025
Blelloch, Guy; Acar, Umut
Collaborative Research: Pposs: Large: Unifying Software and Hardware to Achieve Performant and Scalable Frictionless Parallelism in the Heterogeneous Future @ Carnegie-Mellon University
Exploiting parallelism is essential to making full use of computer systems, from phones to supercomputers. It is thus intrinsic to most applications today, and is becoming increasingly so with time, especially as hardware becomes more heterogeneous. Programming effective and performant parallel applications remains a serious challenge, however: achieving both high productivity and high performance currently requires multiple experts, and the project seeks to bring this within reach of an ordinary programmer. This problem is often approached along only one of two lines, "theory down", focusing on high-level parallel languages and the theory and practice of parallel algorithms, or "architecture up", focusing on rethinking abstractions at multiple layers, starting with the hardware. The project's core novelties are (1) to unify these two approaches, combining their strengths to reduce the expertise needed to write performant parallel programs, and (2) to develop integrated techniques that can take advantage of heterogeneous hardware. Realizing these novelties will require designing a "full-stack" approach to parallelism and innovation across the hardware/software stack. The project's impacts are (1) the development of techniques that dramatically simplify parallel programming, including for heterogeneous machines, putting it within the purview of the ordinary programmer, and (2) the development of systems and educational materials to teach this skill to broader audiences, including students at the researchers' institutions.
The technical strategy of the project is to bridge high-level parallel languages, which allow clean expression and analysis of program parallelism, to heterogeneous, extensible hardware (modeled using FPGAs) through an integrated series of intermediate representations (IRs) of a program and of the hardware/software capabilities of the target platform. The design of these representations will be geared toward avoiding the information loss (going both up and down the compiler/runtime/OS/hardware stack) that currently hampers optimization at all levels. A new compilation model for high-level parallel languages is being developed that extensively leverages modern compiler technology, but avoids "premature lowering" of parallel constructs and "premature abstraction" of hardware and low-level software features. Benchmarks are being developed to measure the effectiveness of the approach.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.