2002 — 2006
Appel, Andrew; Walker, David
Collaborative Research: High-Assurance Common Language Runtime
The proposed research focuses on the design and implementation of new technologies for building high-confidence component software systems. The new technology is directly relevant to improving security of commercial virtual machines such as the Java virtual machine (JVM) and Microsoft's .NET Common Language Runtime (CLR). The work concentrates on three areas:
1. High-level specifications for low-level software. General and flexible logic-based type systems (LTS) are being designed. The type systems are derived from the Certified Binaries technology developed by the authors and extend the scope, expressiveness and precision of verification techniques used in current JVM and CLR implementations.
2. A high-assurance virtual machine. Using the authors' Foundational Proof-Carrying Code technology, higher-assurance, validated implementations of the JVM or CLR infrastructure are being built. The authors are engaged in technology transfer of their ideas to a virtual machine being built at Intel.
3. Resource certification. New technologies for specifying, composing, and verifying advanced properties such as resource bounds on memory and network bandwidth are being developed. These properties are crucial for safe and secure interoperation between untrusted components in large-scale systems.
2003 — 2008
Walker, David
CAREER: Programming Languages for Secure and Reliable Computing
Abstract (CCR-0238328; David Walker, Princeton University)
This research develops advanced program analysis techniques that increase the security and reliability of component software systems. In particular, the researchers design, implement and analyze programming languages that contain both of the following complementary technologies.
1) Effective Type Refinements (ETR) that conservatively extend modern type systems. ETR allow programmers to specify fine-grained invariants about contents of component data structures and the ordering and effect of operations in a computation. ETR guarantee properties of programs when they are written and as they evolve through the software development cycle.
2) Run-time Program Monitors (RPM). RPM execute in parallel with an untrusted application, monitoring, auditing and modifying the untrusted application's actions so that it obeys critical security and resource usage policies. RPM complement ETR by enforcing those properties at run-time that cannot be enforced at compile-time.
Together, effective type refinements and run-time program monitors provide a cohesive language-based framework for improving the reliability and security of large-scale software systems.
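A minimal Python sketch of the run-time monitoring idea in item 2 above: a monitor observes an untrusted component's security-relevant actions and stops the component when a simple resource and access policy is violated. The policy, class names, and paths are invented for illustration; this is not the project's actual system.

```python
# Hypothetical sketch: a run-time program monitor that observes an untrusted
# component's actions and enforces a simple security/resource policy.

class PolicyViolation(Exception):
    pass

class Monitor:
    """Allows at most `max_opens` file opens, all confined to `allowed_dir`."""
    def __init__(self, allowed_dir="/tmp/sandbox", max_opens=3):
        self.allowed_dir = allowed_dir
        self.max_opens = max_opens
        self.opens = 0

    def check_open(self, path):
        if not path.startswith(self.allowed_dir + "/"):
            raise PolicyViolation(f"open outside sandbox: {path}")
        if self.opens >= self.max_opens:
            raise PolicyViolation("too many file opens")
        self.opens += 1

def untrusted_component(monitor):
    # The component must route security-relevant actions through the monitor.
    for path in ["/tmp/sandbox/a.txt", "/tmp/sandbox/b.txt", "/etc/passwd"]:
        monitor.check_open(path)      # the third call raises PolicyViolation
        print("would open", path)

if __name__ == "__main__":
    try:
        untrusted_component(Monitor())
    except PolicyViolation as err:
        print("policy violation stopped the component:", err)
```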
2006 — 2010
Walker, David
SEIII: Automatic Tool Generation for Ad Hoc Scientific Data
In every scientific discipline, researchers are digitizing their knowledge and using computational methods. This process has generated enormous amounts of ad hoc data, data for which standard data processing tools are not readily available. Such data poses challenges to both the users and the software that manipulate it. To maximize the efficiency and accuracy with which scientists deal with ad hoc data, this work will extend an existing data description and processing language, expanding the kinds of data sources it can process and modifying the system to generate tools robustly and automatically. The system will also be able to pass descriptive information about the data directly to the generated tools, and it will be formalized in order to prove the correctness of those tools. The research combines novel programming language design, high-performance systems engineering and theoretical analysis to solve crucial data processing problems. The system will be evaluated on real problems, including fraud detection, genomic pathway modeling, and the analysis of cosmology data. Both graduate and undergraduate students are engaged in the interdisciplinary research.
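To give a concrete flavor of the data-description idea, here is a small Python sketch (illustration only; the project defines its own description language and generates far more capable tools): a declarative description of a hypothetical record format, from which a parser that reports per-field errors is derived.

```python
# Hypothetical sketch of a data description: the format of an ad hoc data
# source is stated declaratively, and generic machinery turns that
# description into a parser that also reports errors. Field names and the
# sample lines are invented.

from datetime import datetime

# One line of a hypothetical instrument log: fields appear in order,
# separated by '|', each with a name and a conversion function.
DESCRIPTION = [
    ("station",   str),
    ("timestamp", lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S")),
    ("reading",   float),
]

def parse_line(line, description=DESCRIPTION):
    """Parse one record; collect per-field errors instead of crashing."""
    record, errors = {}, []
    parts = line.rstrip("\n").split("|")
    if len(parts) != len(description):
        return None, [f"expected {len(description)} fields, got {len(parts)}"]
    for (name, convert), raw in zip(description, parts):
        try:
            record[name] = convert(raw)
        except ValueError as e:
            errors.append(f"field '{name}': {e}")
    return record, errors

if __name__ == "__main__":
    print(parse_line("KECK-2|2006-03-14 02:17:55|41.7"))
    print(parse_line("KECK-2|yesterday|not-a-number"))
```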
2006 — 2010
Walker, David; Fiuczynski, Marc
Collaborative Research: CSR-PDOS: Managing OS Extensibility via Aspect-Oriented Programming Technology
Developers often extend large, C-based software systems, including operating systems, databases, and web servers, in an ad-hoc manner to meet their performance and functionality requirements. Many of the extensions represent crosscutting concerns in that they do not fit within a single program module and are scattered throughout the source. Maintaining and applying such extensions with commonly used tools, notably diff and patch, is time consuming and error prone. Furthermore, it is challenging to ensure correctness when composing multiple extensions. This work makes crosscutting concerns part of the architecture of C-based systems by leveraging aspect-oriented software development techniques. Specifically, extensions are captured as aspects, which provide a language-supported methodology for expressing crosscutting concerns. The work also builds a new set of tools that extract, inject, and translate system extensions represented as aspects. Moreover, the work develops static program analysis technology that aids in the semantic separation of extension from mainline code and safe composition of extensions. To validate the tools' effectiveness, the PIs conduct case studies on the Linux kernel and open source extensions such as the Nooks system that provides device driver isolation and recovery. For the educational community, the work offers an attractive approach to hands-on learning about real-world systems. For the system software community, the work offers the ability to more rapidly and seamlessly move from idea to design to implementation for new system-level extensions. Finally, for the broader community of commercial developers and their customers, the work offers a path to more flexible reuse of system software.
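A rough analogy in Python of the aspect-oriented idea described above (the project itself targets C-based systems software and aspect weaving tools, not Python): a crosscutting tracing concern is written once and applied to several functions, rather than being scattered by hand through each of them.

```python
# Rough analogy only: a crosscutting concern (here, tracing every call) is
# captured in one place as an "aspect" and woven onto many functions.

import functools

def tracing_aspect(func):
    """Advice applied around a function: log entry and exit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"enter {func.__name__}{args}")
        result = func(*args, **kwargs)
        print(f"exit  {func.__name__} -> {result!r}")
        return result
    return wrapper

# "Pointcut": the set of join points (functions) the aspect applies to.
@tracing_aspect
def read_block(device, block):
    return f"<data from {device}:{block}>"

@tracing_aspect
def write_block(device, block, data):
    return len(data)

if __name__ == "__main__":
    read_block("sda", 42)
    write_block("sda", 42, "hello")
```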
2006 — 2009
Pai, Vivek; Walker, David
CSR-SMA: Language Support for Data-Centric Systems Monitoring
Complex distributed systems must be monitored to find problems, record/archive system health, oversee system operation, detect malicious activity, and perform a myriad of other tasks. To improve the reliability, security, performance, ease of construction, and maintenance of system monitors, this research develops a high-level, domain-specific language to specify the data that monitoring systems accumulate, archive and present to users. Using high-level specifications of distributed data sources, our compiler generates reliable, secure, and high-performance monitoring tools that concurrently fetch distributed data, archive (self-describing) data for later analysis, query data to troubleshoot problems, and display statistical data summaries to monitor real-time system health.
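The sketch below illustrates, in ordinary Python rather than the project's domain-specific language, the intended workflow: a declarative specification of distributed data sources drives a generic collector that fetches samples concurrently and archives them in a self-describing form. The feed names, hosts, and fetch functions are invented placeholders.

```python
# Hypothetical sketch: a declarative specification of monitoring data
# sources, plus a generic collector that concurrently fetches, archives,
# and returns the samples.

import json, time
from concurrent.futures import ThreadPoolExecutor

# Specification: feed name -> how to fetch one sample from a host.
SPEC = {
    "load": lambda host: {"host": host, "load": 0.42},
    "disk": lambda host: {"host": host, "free_gb": 17},
}
HOSTS = ["node1.example.org", "node2.example.org"]

def collect_once(spec=SPEC, hosts=HOSTS):
    """Fetch every (feed, host) pair in parallel and archive the samples."""
    samples = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [(name, pool.submit(fetch, host))
                   for name, fetch in spec.items() for host in hosts]
        for name, fut in futures:
            samples.append({"feed": name, "time": time.time(), **fut.result()})
    with open("archive.jsonl", "a") as f:   # self-describing JSON archive
        for s in samples:
            f.write(json.dumps(s) + "\n")
    return samples

if __name__ == "__main__":
    print(collect_once())
```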
To make a broad impact, the PIs will develop tools for monitoring distributed systems, including clusters and wide-area platforms such as PlanetLab, a global network research testbed with 400-450 nodes and 200-250 network experiments running at any given time. We will make the technology open-source, enabling others to use the monitoring system for their own projects. Additionally, we will work with industrial partners to transfer our monitoring technology to industry.
Finally, the language and compiler system being built will make a broad impact outside the networking community as the kind of ad hoc data found in monitoring systems also appears across the natural and social sciences, including biology, chemistry, physics and economics. Hence it will be possible to use the specification language to describe the formats of scientific data sets and to generate querying and visualization tools that help improve the productivity of a wide range of computational scientists.
2006 — 2011
Appel, Andrew (co-PI); Clark, Douglas (co-PI); Martonosi, Margaret (co-PI); August, David (co-PI); Walker, David
CT: Well-Typed Trustworthy Computing in the Presence of Transient Faults
Abstract (award 0627650, panel 060970; David Walker, Princeton University)
In recent decades, microprocessor performance has been increasing exponentially, due in large part to smaller and faster transistors enabled by improved fabrication technology. While such transistors yield performance enhancements, their lower threshold voltages and tighter noise margins make them less reliable, rendering processors that use them more susceptible to transient faults caused by energetic particles striking the chip. Such faults can corrupt computations, crash computers, and cause heavy economic damages. Indeed, Sun Microsystems, Cypress Semiconductor and Hewlett-Packard have all recently acknowledged massive failures at client sites due to transient faults.
This project addresses several basic scientific questions: How does one build software systems that operate on faulty hardware, yet provide ironclad reliability guarantees? For what fault models can these guarantees be provided? Can one prove that a given implementation does indeed tolerate all faults described by the model? Driven in part by the answers to these scientific questions, this project will produce a trustworthy, flexible and efficient computing platform that tolerates transient faults. The multidisciplinary project team will do this by developing: (1) programming language-level reliability specifications so consumers can dictate the level of reliability they need, (2) reliability-preserving compilation and optimization techniques that improve the performance of reliable code while ensuring correctness, (3) automatic, machine-level verifiers so compiler-generated code can be proven reliable, (4) new software-modulated fault tolerance techniques at the hardware/software boundary to implement the reliability specifications, and finally (5) microarchitectural optimizations that explore trade-offs between reliability, performance, power, and cost.
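As a toy illustration of one software-level technique in this space, the Python sketch below detects (simulated) transient faults by executing a computation redundantly and comparing the results. The project itself applies such ideas in the compiler and at the hardware/software boundary, with formal reliability guarantees, rather than in Python.

```python
# Toy illustration: redundant execution with comparison to detect
# (simulated) transient faults, retrying when the two copies disagree.

import random

def flaky_add(x, y):
    """Addition that occasionally suffers a simulated single-bit flip."""
    result = x + y
    if random.random() < 0.05:                 # rare transient fault
        result ^= 1 << random.randrange(8)     # flip a low-order bit
    return result

def reliable_add(x, y, max_retries=10):
    """Run the computation twice; accept only when both copies agree."""
    for _ in range(max_retries):
        a, b = flaky_add(x, y), flaky_add(x, y)
        if a == b:
            return a
        print("mismatch detected, retrying")
    raise RuntimeError("could not obtain agreeing results")

if __name__ == "__main__":
    print(sum(reliable_add(i, i) for i in range(1000)))
```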
2010 — 2014
Walker, David
SHF: Small: Language Support for Ad Hoc Data Processing
In every business, engineering endeavour and scientific discipline, workers are digitizing their knowledge with the hope of using computational methods to categorize, query, filter, search, diagnose, and visualize their data. While this effort is leading to remarkable industrial and scientific advances, it is also generating enormous amounts of ad hoc data (i.e., that data for which standard data processing tools such as query engines, statistical packages, graphing tools, or other software is not readily available). Ad hoc data poses tremendous challenges to its users because it is often highly varied, poorly documented, filled with errors, and continuously evolving --- yet ad hoc data also contains much valuable information. The goal of this research is to develop general-purpose software tools and techniques capable of managing ad hoc data efficiently. This research has the potential for a broad impact on society by dramatically improving the productivity of industrial data analysts, computer systems administrators and academics who must deal with ad hoc data on a day-to-day basis.
The central technical challenge of the research involves designing, implementing and evaluating a new domain-specific programming language that facilitates the management of ad hoc data sets. This new programming language will allow data analysts to specify the structure of ad hoc data files, how those files are arranged in a file system and what meta-data is associated with them. Once a specification is complete, it will be possible to use it as documentation for the data set or for generating data-processing tools. The research will also involve developing new methods for enabling users to generate specifications quickly and accurately, without actually having to write down all of the details by hand. Finally, the research will develop new algorithms for implementing the generated data-processing tools efficiently.
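One of the ideas above, generating specifications without writing every detail by hand, can be illustrated with a small and deliberately naive format-inference sketch in Python. The separators, field classes, and sample lines are invented, and the project's actual inference techniques are far more sophisticated.

```python
# Hypothetical sketch of format inference: guess a per-field description of
# an ad hoc data file from example lines, so the analyst does not have to
# write the whole specification by hand.

import re

def classify(token):
    if re.fullmatch(r"-?\d+", token):
        return "int"
    if re.fullmatch(r"-?\d+\.\d+", token):
        return "float"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", token):
        return "date"
    return "string"

def infer_description(lines, sep="|"):
    """Infer one type per column; fall back to 'string' on disagreement."""
    columns = None
    for line in lines:
        kinds = [classify(tok) for tok in line.rstrip("\n").split(sep)]
        if columns is None:
            columns = kinds
        else:
            columns = [a if a == b else "string" for a, b in zip(columns, kinds)]
    return columns

if __name__ == "__main__":
    sample = ["2009-07-01|srv14|200|0.031",
              "2009-07-01|srv02|404|0.007"]
    print(infer_description(sample))   # ['date', 'string', 'int', 'float']
```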
2010 — 2016
August, David; Walker, David
SI2-SSI: Accelerating the Pace of Research Through Implicitly Parallel Programming
Today, two trends conspire to slow down the pace of science, engineering, and academic research progress in general. First, researchers increasingly rely on computation to process ever larger data sets and to perform ever more computationally-intensive simulations. Second, individual processor speeds are no longer increasing with every computer chip generation as they once were. To compensate, processor manufacturers have moved to including more processors, or cores, on a chip with each generation. To obtain peak performance on these multicore chips, software must be implemented so that it can execute in parallel and thereby use the additional processor cores. Unfortunately, writing efficient, explicitly parallel software programs using today's software-development tools takes advanced training in computer science, and even with such training, the task remains extremely difficult, error-prone, and time consuming. This project will create a new high-level programming platform, called Implicit Parallel Programming (IPP), designed to bring the performance promises of modern multicore machines to scientists and engineers without the costs associated with having to teach these users how to write explicitly parallel programs. In the short term, this research will provide direct and immediate benefit to researchers in several areas of science as the PIs will pair computer science graduate students with non-computer science graduate students to study, analyze, and develop high-value scientific applications. In the long term, this research has the potential to fundamentally change the way scientists obtain performance from parallel machines, improve their productivity, and accelerate the overall pace of science. This work will also have major educational impact by developing courseware and tutorial materials, useable by all scientists and engineers, on the topics of explicit and implicit parallel computing.
IPP will operate by allowing users to write ordinary sequential programs and then to augment them with logical specifications that expand (or abstract) the set of sequential program behaviors. This capacity for abstraction will provide parallelizing compilers with the flexibility to more aggressively optimize programs than would otherwise be possible. In fact, it will enable effective parallelization techniques where they were impossible before. The language design and compiler implementation will be accompanied by formal semantic analysis that will be used to judge the correctness of compiler transformations, provide a foundation for reasoning about programs, and guide the creation of static analysis and program defect detection algorithms. Moreover, since existing programs and languages can be viewed as (degenerately) implicitly parallel, decades of investment in human expertise, languages, compilers, methods, tools, and applications are preserved. In particular, it will be possible to upgrade old legacy programs or libraries from slow sequential versions without overhauling the entire system architecture, but merely by adding a few auxiliary specifications. Compiler technology will help guide scientists and engineers through this process, further simplifying the task. Conceptually, IPP restores an important layer of abstraction, freeing programmers to write high-level code, designed to be easy to understand, rather than low-level code, architected according to the specific demands of a particular parallel machine.
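The following Python sketch conveys the flavor of the approach under an invented annotation: an ordinary sequential loop is accompanied by a declaration that its iterations are independent, which licenses parallel execution without changing the program's observable results. IPP itself is a language and compiler platform, so the helper function here is purely illustrative.

```python
# Hypothetical sketch: a sequential loop plus a lightweight declaration of
# independence, which permits out-of-order, parallel execution.

from concurrent.futures import ProcessPoolExecutor

def simulate_cell(cell_id):
    """A stand-in for an expensive, independent scientific computation."""
    total = 0.0
    for k in range(1, 10_000):
        total += 1.0 / (k * k + cell_id)
    return total

def independent_map(func, items, parallel_ok=True):
    """Sequential semantics; the 'parallel_ok' declaration states that
    iterations are independent, so they may run in parallel."""
    if not parallel_ok:
        return [func(x) for x in items]
    with ProcessPoolExecutor() as pool:
        return list(pool.map(func, items))

if __name__ == "__main__":
    results = independent_map(simulate_cell, range(32))
    print(f"{sum(results):.6f}")
```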
2011 — 2016
Walker, David; Rexford, Jennifer (co-PI)
TC: Large: Collaborative Research: High-Level Language Support for Trustworthy Networks
Computer networks are now, arguably, the United States' most critical infrastructure. They control all communication amongst our citizenry, our businesses, our government, and our military. Worryingly, however, today's networks are remarkably unreliable and insecure. A significant source of vulnerability is the fact that the underlying network equipment (e.g., routers and switches) runs complicated programs written in obtuse, low-level programming languages, which makes managing networks a difficult and error-prone task. Simple mistakes can have disastrous consequences, including making the network vulnerable to denial-of-service attacks, hijackings, and wide-scale outages.
The goal of this research is to transform the way that networks are managed by introducing a new class of network programming languages with the following essential features: (i) network-wide, correct-by-construction abstractions; (ii) support for fault-tolerance and scalability; (iii) coordination with end-hosts and independently-administered networks, as well as mechanisms for establishing trust between them; (iv) formal verification tools based on rigorous semantic foundations; and (v) compilers capable of generating efficient and portable code that runs on heterogeneous equipment. To demonstrate how to build a language with these features, the researchers are designing a language for OpenFlow networks called Frenetic, and evaluating it on several novel security applications. This project will have broad impact by (i) discovering key techniques for increasing the reliability of our networks, (ii) opening up the interfaces used to program networks, thereby enabling grass-roots innovation where it was previously not possible, and (iii) educating a new community of researchers with advanced skills in both networking and programming languages.
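The sketch below, written in plain Python rather than Frenetic's actual syntax, conveys the intended programming style: network behavior is expressed as small declarative policies that are composed into a network-wide program, instead of as low-level per-switch rules. The packet fields and the example policy are invented.

```python
# Illustrative sketch of composable, declarative network policies.
# A policy maps a packet to a (possibly empty) list of output packets.

def match(**fields):
    """Policy: pass the packet through unchanged if all fields match."""
    def policy(pkt):
        return [pkt] if all(pkt.get(k) == v for k, v in fields.items()) else []
    return policy

def forward(port):
    """Policy: send the packet out of the given port."""
    return lambda pkt: [dict(pkt, port=port)]

def seq(p1, p2):
    """Sequential composition: feed p1's results into p2."""
    return lambda pkt: [out for mid in p1(pkt) for out in p2(mid)]

def union(p1, p2):
    """Parallel composition: behave like both policies at once."""
    return lambda pkt: p1(pkt) + p2(pkt)

# Web traffic goes to port 1; traffic destined to the monitor host is also
# copied to port 9. Written once, for the whole network.
policy = union(seq(match(tcp_dst=80), forward(1)),
               seq(match(ip_dst="10.0.0.99"), forward(9)))

if __name__ == "__main__":
    print(policy({"ip_dst": "10.0.0.99", "tcp_dst": 80, "port": 0}))
```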
2017 — 2021
Walker, David
NeTS: Medium: Collaborative Research: Network Configuration Synthesis: A Path to Practical Deployment
All sectors of society depend on properly functioning computer networks. For example, every day, millions of citizens order prescription drug refills, pay their electricity bills, book hotels, shop for groceries, and participate in thousands more activities online, through the cloud. But none of these services will work if the networks that deliver information are down. Moreover, modern business, healthcare, the military and the government are just as dependent on reliable networks as everyday citizens. Many network outages are caused by operators manually (and incorrectly) programming the 'configuration files' that manage the ways that network devices forward information. While the flexibility allowed by configuration files is essential, network outages are often caused by operators using hundreds of low-level directives at each network device to create network-wide policy. Because the global consequences of making even small configuration changes are so drastic, many organizations take several weeks to audit even small changes, limiting their ability to respond effectively to traffic fluctuations, business opportunities, security threats and hardware failures. A natural solution to these problems -- analogous to the trend in programming languages for software development over the last several decades as programmers have moved from machine code to Java -- is to define more robust, higher-level programming languages for implementing network policies. However, there are technical and pragmatic hurdles to surmount before it will be possible to deploy new languages in industrial settings on a large scale. In particular, existing network-wide policy languages are not expressive enough for many desired network policies and often require wholesale migration to new networking platforms. Hence, the overarching goal of this project is to surmount the technical challenges that impede practical deployment of high-level network programming languages. The project is developing the core technology necessary to efficiently support and incrementally deploy high-level network policies. The project leverages connections to two major cloud providers as a means to test the resulting languages and systems on real industrial networks, identify pragmatic barriers to adoption, and ultimately deploy the technology where possible.
The project builds on the PIs' recent work on Propane, a new network programming language that allows users to describe end-to-end paths for intra- and inter-domain routing, along with a compiler that produces configurations for the industry-standard BGP protocol. The results of this project will extend Propane in several ways to support practical deployment: First, users will be able to declare device roles (e.g., top-of-rack switch) and the connectivity invariants related to them to enable concise specifications. A new compiler will verify safety properties of policies in the presence of such declarations and generate parameterized templates that make compiler outputs more intelligible for operators. Second, users will specify financial contracts that govern transit costs using a new declarative language and the compiler will optimize routes automatically by generating refined policies that meet objectives. Third, the Butane compiler will target and exploit the benefits of heterogeneous back-end protocols and platforms. Fourth, tools will help network operators infer new high-level configurations from existing low-level configurations and to verify that new configurations are equivalent to old ones. Finally, Butane will support mixed mode (legacy- and high-level network operations) so engineers can migrate their networks slowly over time and test partial deployment on small fractions of their live traffic.
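As a hypothetical illustration of the kind of end-to-end policy described above (not Propane's or Butane's actual syntax), the Python sketch below writes a routing policy as an ordered list of path preferences plus a prohibition, and selects the most preferred available route that the policy permits.

```python
# Illustrative sketch of a declarative routing policy: ranked path
# preferences plus a forbidden transit AS. AS names and routes are invented.

import re

# Most-preferred first: leave via the primary provider AS100, else the
# backup provider AS200; never carry traffic through AS666.
POLICY = [
    {"allow": r"^AS100(,AS\d+)*$"},
    {"allow": r"^AS200(,AS\d+)*$"},
]
FORBIDDEN = r"\bAS666\b"

def best_route(available_paths, policy=POLICY, forbidden=FORBIDDEN):
    """Return (rank, path) for the highest-ranked permitted available path."""
    for rank, rule in enumerate(policy):
        for path in available_paths:
            if re.search(forbidden, path):
                continue
            if re.match(rule["allow"], path):
                return rank, path
    return None

if __name__ == "__main__":
    routes = ["AS200,AS300", "AS100,AS666,AS300", "AS100,AS400"]
    print(best_route(routes))   # (0, 'AS100,AS400'): primary, avoiding AS666
```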
2018 — 2022
Rexford, Jennifer (co-PI); Walker, David; Gupta, Aarti
FMitF: OpenRDC: A Framework for Implementing Open, Reliable, Distributed Network Control
Computer networks, whether connecting servers across a data center or users across the globe, are an important part of society's critical infrastructure. However, existing network protocols and services are simply not worthy of the trust society now places in them. Today's networks suffer from poor performance, cyberattacks, configuration errors, software bugs, and more, leading to serious consequences for consumers, businesses, and governments alike. The goal of this project is to enable the design and operation of better networks, which requires enabling both innovation (to create better protocols and services) and verification (to ensure these services work correctly). A major part of the functionality of the network depends on the software running in the control plane, which computes routes, collects and analyzes network measurement data, balances load over multiple paths or servers, and even hosts in-network applications. This project involves the theory, design, and implementation of OpenRDC, a new platform constructed for programming reliable, distributed network control planes.
The technical core of OpenRDC centers around computations of Stable Information Trees (SITs) that communicate information (e.g., traffic conditions, failure information, available external routes, end-host job statistics, etc.) across a network, and then perform local actions to change network functionality or record information gathered. These structured computations suffice to express core control plane algorithms and yet can also be converted into logical representations that can be used to verify a variety of important properties of operational networks ranging from reachability to access control to multi-path consistency. The OpenRDC platform will simultaneously: (1) allow researchers to develop new control-plane algorithms, (2) enable automatic verification of network properties, and (3) make use of emerging programmable switch capabilities. The project involves acceleration of the development of new control-plane algorithms, via new abstractions for network programming. The project will also define new compiler technology for translating these abstractions to programmable network hardware. In addition, its open source infrastructure will lay a foundation for academic and industrial engagement and for the training of students. The project will also have impact on formal methods, with new algorithms for the verification of graph-oriented programming languages based on abstraction and modular decomposition.
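The Python sketch below models the kind of computation a SIT is meant to capture: nodes repeatedly exchange information with neighbors until the network stabilizes, and each node records a local action (here, its next hop toward a destination). The topology is invented and the code is a plain Bellman-Ford-style illustration, not OpenRDC's actual formalism.

```python
# Illustrative sketch: propagate information between neighbors until a
# stable state is reached, recording each node's local action (next hop).

GRAPH = {                       # invented topology: node -> {neighbor: cost}
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}

def stable_routes(graph, dest):
    """Iterate local updates until no node changes its (distance, next hop)."""
    dist = {n: (0 if n == dest else float("inf")) for n in graph}
    next_hop = {n: None for n in graph}
    changed = True
    while changed:
        changed = False
        for node, nbrs in graph.items():
            for nbr, cost in nbrs.items():
                if cost + dist[nbr] < dist[node]:
                    dist[node], next_hop[node] = cost + dist[nbr], nbr
                    changed = True
    return dist, next_hop

if __name__ == "__main__":
    print(stable_routes(GRAPH, "D"))
```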
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2022 — 2024
Rexford, Jennifer (co-PI); Walker, David; Kim, Hyojoon
IMR: MT: Tools for Programming Distributed Data-Plane Measurements
Understanding the flow of traffic across key networks---what it is composed of and how it changes---is critical for improving modern information services. Traditionally, however, it has been difficult for researchers to develop new tools for dissecting this traffic and analyzing its characteristics, while taking care to maintain user privacy. Recently, though, the development of relatively cheap programmable switches has made it possible to develop diagnostic tools and place them directly inside the network, on the path through which traffic flows. In such a position, new tools have the potential to see all the internet traffic as it flows by, from a university campus to the broader internet, for instance, or along a corporate wide-area network or data center. Unfortunately, while it is possible to develop such tools, doing so is currently an incredibly difficult and error-prone process. To ameliorate this situation, the research team will develop Lucid, a new programming language and system that will facilitate the process of developing, debugging, and deploying network measurement tools in live programmable networks. The research team will deliver a compiler that translates high-level Lucid programs into lower-level code that executes in multiple places---directly on programmable switches, or in support, on servers connected to the network in question. In addition, the team will deliver a collection of reusable components that network measurement researchers can plug together to get started on a new idea quickly. To help teach researchers how to use the new language, the team is developing tutorials for major conferences in networking. To summarize, this project will impact the performance, reliability, and security of critical networks by facilitating the development of new measurement tools that can discover network optimization opportunities, detect failures, and rapidly recognize attacks that disrupt online services.

Traditional measurement tools and datasets, while incredibly useful, have significant limitations in scale and coverage. Measurement researchers should capitalize on the exciting advances in programmable data planes to analyze Internet traffic and performance as packets traverse the network. Analyzing traffic directly in the data plane (e.g., network switches, routers) enables sophisticated analysis without sacrificing efficiency or divulging sensitive user information, and enterprise networks, such as university campuses, provide an excellent opportunity to use these programmable data planes in practice. However, programming the data plane is not easy. Existing languages, such as P4, are very low-level, have an extremely steep learning curve, and are notoriously difficult to work with (with seemingly legitimate programs often failing to compile). This project addresses these pain points by delivering new programming support in the form of Lucid, a high-level language designed to support cooperative measurement across multiple locations and device types. More specifically, the research team is developing compilers that will target both Intel Tofino programmable switches (via P4) and software servers (via eBPF). Using both kinds of devices, researchers will be able to develop and deploy a range of different kinds of distributed measurement tools. The research team will also develop an interpreter for the language so that interesting new research ideas may be developed and debugged prior to deployment.
The infrastructure developed by the research team will also include a suite of libraries that implement key data structures and utilities useful in network measurement and in support of data privacy. To teach the community how to use our language, libraries, tools, and infrastructure, the team will develop documentation and tutorials.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
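As an example of the kind of reusable measurement data structure such a library might provide, the Python sketch below implements a count-min sketch for approximating per-flow packet counts in small, fixed memory. It is illustrative only; Lucid programs targeting switches would express such structures quite differently.

```python
# Illustrative sketch: a count-min sketch for estimating per-flow counts
# with bounded memory; estimates may over-count but never under-count.

import hashlib

class CountMinSketch:
    def __init__(self, rows=4, cols=1024):
        self.rows, self.cols = rows, cols
        self.table = [[0] * cols for _ in range(rows)]

    def _index(self, row, key):
        digest = hashlib.sha256(f"{row}:{key}".encode()).hexdigest()
        return int(digest, 16) % self.cols

    def add(self, key, count=1):
        for r in range(self.rows):
            self.table[r][self._index(r, key)] += count

    def estimate(self, key):
        """Minimum over rows; over-counting only comes from hash collisions."""
        return min(self.table[r][self._index(r, key)] for r in range(self.rows))

if __name__ == "__main__":
    cms = CountMinSketch()
    for _ in range(5000):
        cms.add(("10.0.0.1", "10.0.0.2", 443))     # a heavy flow
    cms.add(("10.0.0.3", "10.0.0.4", 80))          # a light flow
    print(cms.estimate(("10.0.0.1", "10.0.0.2", 443)),
          cms.estimate(("10.0.0.3", "10.0.0.4", 80)))
```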
2022 — 2025
Walker, David
Collaborative Research: FMitF: Track I: Specifying and Verifying Network-Wide Properties of Dynamic Data Planes
Computer networks connect users to bank accounts, online commerce, information about government, social media, and other cloud services. They are essential for modern business and everyday life. However, managing such networks is difficult and error-prone. As a result, damaging outages that have shut down everything from food deliveries to plane flights to critical information services occur all too frequently. This research project develops new technology for verification of a key component of modern computer networks---their data planes---which traditionally have been used just to forward traffic, but are increasingly used to implement complex network control logic and next-generation applications. The verification tools being developed will check network configurations both prior to deployment of those configurations, and also as those configurations execute, thereby making networks more reliable, and preventing many future outages before they can occur. This research project also provides educational opportunities for a diverse range of students, teaching them key skills in both networking and the application of formal verification methods.

Spurred by the recent development of programmable network switches and programmable network interface cards (NICs), the data planes of industrial computer networks are becoming highly dynamic, allowing them not only to forward traffic but also to implement network control logic that can adjust to evolving network conditions. The goal of the proposed research is to make these new networks substantially more reliable via tools that can formally specify and verify network-wide behavior. These specifications use a common language of events to represent data plane activity (e.g., packet arrivals), control plane messages (e.g., routing announcements), and application layer events (e.g., installation of a cache entry) in a uniform way. The verification technology to validate these specifications uses a combination of static and dynamic techniques. Static verification catches bugs before network program changes are deployed, while dynamic verification, via monitors synthesized from high-level specifications, checks that static assumptions hold up, bridges scaling gaps, and provides defense in depth.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
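The toy Python sketch below illustrates the dynamic side of this approach: a monitor consumes a stream of uniformly represented events and flags violations of a simple network-wide property. The event names, the property, and the trace are invented; in the project, such monitors would be synthesized from high-level specifications.

```python
# Toy illustration: a monitor over a stream of data-plane and control-plane
# events checking the invented property "a cache entry may only be installed
# for a flow whose packet has actually arrived."

def monitor(events):
    seen_flows = set()
    violations = []
    for kind, flow in events:
        if kind == "packet_arrival":
            seen_flows.add(flow)
        elif kind == "cache_install" and flow not in seen_flows:
            violations.append(f"cache entry installed for unseen flow {flow}")
    return violations

if __name__ == "__main__":
    trace = [
        ("packet_arrival", "10.0.0.1->10.0.0.2"),
        ("cache_install",  "10.0.0.1->10.0.0.2"),   # fine: packet was seen
        ("cache_install",  "10.0.0.9->10.0.0.2"),   # violates the property
    ]
    print(monitor(trace))
```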