2010 — 2015 |
Varghese, George (co-PI); Vahdat, Amin (co-PI); Porter, George |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
CSR: Medium: Scale, Isolation, and Performance in Data Center Networks @ University of California-San Diego
Modern data centers host operations as varied as flight planning, drug discovery, and Internet search running on thousands of machines. These services are often limited by how quickly the underlying network can coordinate parallel data access. In current data centers, network I/O remains a primary bottleneck and a significant fraction of capital expenditure ($10B/year). Compounding the problem are operational issues caused by interference between services, downtime due to failures, and violations of performance requirements. This project will develop a hardware/software architecture with the following capabilities: i) non-blocking bandwidth to hundreds of thousands of hosts; ii) "slicing" across services with minimum bandwidth guarantees; iii) detecting fine-grained performance violations; iv) tolerating a range of failure scenarios; v) supporting end host virtualization and migration. Our goal is to enable modular deployment and management of networking infrastructure to keep pace with the explosive growth of computation and storage in data centers. This work will result in a fully functional prototype of a virtualizable data center network fabric to support higher-level services. Broader impacts include: i) outreach to under-represented minorities through the UCSD COSMOS program; ii) a public release of the data center communication workloads, protocols, and algorithms we develop; iii) working with our industrial partners and advisory board to address key performance and reliability issues in a critical portion of the national computation infrastructure. A significant outcome will be students trained in data center networking and cloud computing.
|
0.903 |
2011 — 2014 |
Vahdat, Amin (co-PI); Porter, George |
CSR: Small: Highly Efficient, Pipeline-Oriented Data-Intensive Scalable Computing @ University of California-San Diego
Intellectual Merit: Data-Intensive Scalable Computing (DISC) is increasingly important to peta-scale problems, from search engines and social networks to biological and scientific applications. Datacenters built to support large-scale DISC computing already operate at staggering scale, housing up to hundreds of thousands of compute nodes, exabytes of storage, and petabytes of memory. Current DISC systems have addressed these data sizes through scalability; however, the resulting per-node performance has lagged behind per-server capacity by more than an order of magnitude. For example, in current systems as much as 94% of available disk I/O and 33% of CPU remain idle. This results in unsustainable cost and energy requirements. Meeting future data processing challenges will only be possible if DISC systems can be deployed in a sustainable, efficient manner.
This project focuses on two specific, unaddressed challenges to building and deploying sustainable DISC systems:
- a lack of per-node efficiency and cross-resource balance as the system scales, and
- highly efficient storage fault tolerance tailored to DISC workloads.
This project's approach is to automatically ensure cross-resource balance among compute, memory, network, and underlying storage components, both statically during system design and dynamically at runtime. The goal is to support general DISC processing in a balanced manner despite changing application behavior and heterogeneous computing and storage configurations. This work will result in a fully functional prototype DISC system that implements the Map/Reduce programming model to support general-purpose application programs.
Broader impacts include:
- training diverse students, such as undergraduates and members of underrepresented groups, to understand DISC services as an interesting part of the overall curriculum and as a resource for interdisciplinary collaboration;
- a public release of the proposed balanced runtime system, including support for higher-level programming models;
- working with industrial partners as part of UCSD's Center for Networked Systems to address sustainability and efficiency issues in this critical portion of industrial and governmental data processing.
|
0.903 |
2013 — 2016 |
Papen, George; Ford, Joseph (co-PI); Snoeren, Alex (co-PI); Porter, George |
NeTS: Large: Collaborative Research: HCPN: Hybrid Circuit/Packet Networking @ University of California-San Diego
Ever-larger data centers are powering the cloud computing revolution, but the scale of these installations is currently limited by the ability to provide sufficient internal network connectivity. Delivering scalable packet-switched interconnects that can support the continually increasing data rates required between hundreds of thousands of servers is an extremely challenging problem that is only getting harder. This project leverages microsecond optical circuit-switch technology to develop a hybrid switching paradigm that spans the gap between traditional circuit switching and full-fledged packet switching, achieving a level of performance and scale not previously attainable. This will result in a hybrid switch whose optical switching capacity is orders of magnitude larger than that of the electrical packet switch, yet whose performance from an end-to-end perspective is largely indistinguishable from that of a giant (electrical) packet switch.
The research provides a quantitative baseline for hybrid network design across a wide range of present and future technologies. The project will consist of five parts: i) traffic characterization to identify the class of network traffic that a circuit switch can support as well as the partitioning of the traffic between the circuit and packet portions of the network; ii) circuit scheduling to enable the circuit switch to rapidly multiplex a set of circuits across a large set of data center traffic flows; iii) traffic conditioning to reduce the variability of traffic at the end hosts, easing the demands placed on switch scheduling; iv) a prototype hybrid network that can use an optical circuit switch that operates three orders of magnitude faster than existing solutions; and v) a trend analysis to understand the tradeoffs resulting from potential future technology advances.
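As a rough illustration of part i), the partitioning of traffic between the circuit and packet portions of the network, the sketch below assigns a flow to the circuit switch only when its transfer time is long enough to amortize the circuit reconfiguration delay. The names `Flow`, `partition_flows`, and `min_efficiency`, along with the amortization rule itself, are assumptions made for illustration; the abstract does not specify how the project partitions traffic.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    size_bytes: int


def partition_flows(flows, circuit_rate_bps, reconfig_delay_s, min_efficiency=0.9):
    """Split flows into (circuit, packet) lists.

    Assumed rule (not from the abstract): a flow is sent over the circuit
    switch only if the circuit would spend at least `min_efficiency` of its
    lifetime carrying data rather than waiting for reconfiguration.
    """
    circuit, packet = [], []
    for flow in flows:
        transfer_s = flow.size_bytes * 8 / circuit_rate_bps
        efficiency = transfer_s / (transfer_s + reconfig_delay_s)
        if efficiency >= min_efficiency:
            circuit.append(flow)
        else:
            packet.append(flow)
    return circuit, packet


# Hypothetical workload: one short RPC and one bulk transfer on a 10 Gb/s
# circuit with a 10 microsecond reconfiguration delay.
flows = [Flow("rpc", 10_000), Flow("bulk", 100_000_000)]
circuit, packet = partition_flows(flows, 10e9, 10e-6)
print([f.name for f in circuit], [f.name for f in packet])
```

Under these assumed numbers the bulk transfer is assigned to the circuit switch and the short RPC stays on the packet switch; a faster circuit switch (smaller `reconfig_delay_s`) widens the class of traffic that qualifies for circuits.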
The work stands to dramatically improve data center networks, significantly reducing operating costs and increasing energy efficiency. The research material will be incorporated into courses, helping to train the next generation of computer networking scientists and engineers. The PIs will also continue ongoing outreach to high school students, both through the UCSD COSMOS summer program and through talks delivered at local high schools.
|
0.903 |
2016 — 2021 |
Porter, George |
CAREER: A Scalable Multiplane Data Center Network @ University of California-San Diego
Large Internet data center providers, both public and private, must support ever-increasing data rates between literally hundreds of thousands of servers to meet processing and storage demand. Operators have relied on similar scale-out network fabrics (typically folded-Clos topologies) to construct their networks. Since their deployment in the mid-2000s, these scale-out designs have leveraged the steadily increasing performance and decreasing cost of complementary metal-oxide semiconductor (CMOS)-based switching silicon to keep pace with demand. Unfortunately, these trends cannot continue: network switches face the same CMOS process-scaling limitations that currently hamper central processing unit (CPU) manufacturers. Just as CPUs have moved to multi-core designs to side-step their scaling limitations, so too will data center operators need to adopt alternative architectures to scale to next-generation link rates.
This project will demonstrate a hybrid electrical/optical network topology, called SelectorNet, which scales to hundreds of thousands of servers at link rates reaching 1.6 terabits per second. Unlike recent proposals which utilize two- or three-dimensional microelectromechanical systems (2D- or 3D-MEMS) optical crossbar switches, SelectorNet relies on a novel optical device that abandons the crossbar abstraction. Instead, it relies on indirection to deliver packets between hosts that are not directly connected by our novel "selector" switches. The result is a network fabric that is not only cost-competitive with state-of-the-art Clos-based designs in 2020, but continues to scale in terms of cost, energy, performance, and reliability as link rates surpass 400 gigabits per second.
Broader Impact: Ensuring that the benefits of this work have impact beyond the traditional metrics of research is integral to its design. The results of this research will make it easier to design and build scalable, efficient, and highly-available cloud and data center services. By reducing the cost to deploy cloud infrastructure, the researchers hope to lower costs for the largest operators, while reducing the barrier to entry of the cloud for smaller organizations. They will further expand the research skills of graduate and undergraduate students to address necessary datacenter efficiency and cloud computing research challenges in a hands-on manner. Exposing undergraduate students to cloud computing technologies in their courses and through mentored research will enhance their marketability at graduation and has the potential to inspire their curiosity and encourage the pursuit of graduate studies. Teaching students how to build state-of-the-art networked systems that are grounded in rigorous analysis and practical constraints is essential in our increasingly networked world. An additional component of this research will be the creation and dissemination of videos that will broaden public awareness and appreciation of the science and engineering challenges facing large-scale computing, machine learning, and Internet systems.
|
0.903 |
2016 — 2019 |
Voelker, Geoffrey (co-PI); Savage, Stefan (co-PI); Snoeren, Alex; Porter, George; Levchenko, Kirill (co-PI) |
II-New: A Dual-Purpose Data Analytics Laboratory @ University of California-San Diego
The research enabled by the supported infrastructure has the potential to dramatically impact society in two ways. First, it can undermine entire cybercrime ecosystems by disrupting underground activities, infrastructure, and social networks through strategic intervention. Inhibiting the flow of money reduces the profitability of these activities, thereby subverting the key incentive underlying modern cybercrime. Second, improved efficiency of data center networks will significantly reduce operating costs and increase energy efficiency. The infrastructure will also create educational opportunities for students at a variety of levels, expanding the research skills of postdoctoral, graduate, and undergraduate researchers to address both data center network design and security research challenges.
This project pursues two separate multi-year research agendas. One is to collect and analyze extremely large datasets pertaining to various aspects of Internet malware and cybercrime. The other, pursued concurrently, is to explore new high-performance hybrid optical/electrical network architectures that dramatically decrease the cost and complexity of the infrastructure required to support such analytics. This award supports compute and storage resources that both run the analytics required for the e-crime research and serve as a testbed for our prototype hybrid network switches.
The research enabled by this infrastructure has two key components: 1) Through in-depth empirical analyses of a range of online criminal activities, the PIs are developing an understanding of the shape of key economic and social forces, as seen at scale, in terms of relevance for both attackers and defenders. 2) The PIs are characterizing network traffic generated by large-scale data analytics, focusing specifically on identifying the class of network traffic that a circuit switch can support as well as the partitioning of the traffic between the circuit and packet portions of the network.
|
0.903 |
2016 — 2019 |
Snoeren, Alex (co-PI); Porter, George |
NeTS: Medium: Improving Network Performance and Efficiency Through Multi-Channel Network Links @ University of California-San Diego
This project seeks to challenge the conventional abstract model of a high-speed network link as a single, logical point of attachment. Instead, the proposed approach exposes the inherent parallelism that exists within the end host, network links, and the network fabric as a whole, to applications and the network control plane. The result is a network fabric consisting of a number of distinct physical networks that coexist within a single physical topology. By decoupling the channels making up network links, the project radically redesigns the network fabric to address today's requirements and challenges. A key hypothesis of this work is that composing multiple, potentially heterogeneous networks provides for greater scaling, performance, service quality, and manageability than maintaining the legacy fat-pipe link abstraction.
The proposed research will impact the broader community in four ways: (1) by addressing the societal need for large-scale networking infrastructure to support next-generation clusters and data centers, (2) by engaging with industry to help inform the design and construction of new devices, (3) by interacting with other scientific communities through interdisciplinary research, and (4) by engaging with graduate and undergraduate students to translate the resulting research into structured courses and hands-on learning experiences for traditionally under-represented student groups.
|
0.903 |
2019 — 2022 |
Snoeren, Alex; Porter, George |
CNS Core: Small: Designing Efficient Cloud Datacenter Network Fabrics @ University of California-San Diego
Cloud datacenter networks are tasked with providing connectivity between an ever-increasing number of end hosts whose link rates improve by orders of magnitude every few years. What network operators would ideally like is a single, full-bandwidth switch that could connect every endpoint at full rate. Such an idealized network would enable them to place jobs and data where it is convenient, without worrying about bandwidth bottlenecks, hotspots, and other network-induced limitations. Unfortunately, preserving this "big-switch" illusion of a single network with full bandwidth is increasingly cost prohibitive and likely soon infeasible.
This project will explore an alternative method of constructing datacenter network fabrics based upon a provably optimal topological construct, an expander graph. If successful, the project will result in network fabrics that are more flexible, capable, and scalable than existing state-of-the-art approaches. This project will develop a family of cloud datacenter network topologies based on expander graphs that eliminate the capacity bottlenecks inherent in hierarchical Clos-based topologies while minimizing the bandwidth tax incurred due to indirect routing. A single, large expander-graph network topology can be constructed out of multiple, disjoint expander graphs; this project will show how judicious tenant placement can then provide both isolation and dynamic capacity while minimizing the bandwidth tax. Moreover, by employing reconfigurable network components (i.e., circuit switches), it is even possible to evolve the set of constituent expander graphs over various time scales, allowing cloud datacenter operators to better suit the needs of their current tenants. Indeed, if the timescales are sufficiently small (e.g., hundreds of milliseconds), tenants may then choose to buffer traffic until a particularly favorable path or set of paths is available, further decreasing the overall bandwidth inefficiency or "tax". If the network topology evolves at a rapid rate, it is possible to choose, on a per-packet basis, whether to (1) immediately send a packet over whatever static expander is currently instantiated, incurring a modest tax on this small fraction of traffic, or (2) buffer the packet and wait until a direct link is established to the ultimate destination, eliminating the bandwidth tax on the vast majority of bytes.
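The per-packet choice described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's algorithm: the function names, the per-packet `max_wait_s` deadline, and the definition of the tax as extra hops per delivered byte are all hypothetical choices made for the example.

```python
def choose_action(wait_for_direct_s, max_wait_s):
    """Return 'buffer' if a direct link arrives within the packet's
    tolerable wait (an assumed per-packet deadline), else 'send_now'."""
    return "buffer" if wait_for_direct_s <= max_wait_s else "send_now"


def bandwidth_tax(packets, indirect_hops):
    """Extra link capacity consumed per byte of traffic delivered.

    packets: iterable of (size_bytes, wait_for_direct_s, max_wait_s).
    Assumption: a buffered packet later crosses 1 direct hop, while a packet
    sent immediately crosses `indirect_hops` hops over the current expander,
    paying (indirect_hops - 1) bytes of extra capacity per byte.
    """
    total = taxed = 0
    for size, wait, max_wait in packets:
        total += size
        if choose_action(wait, max_wait) == "send_now":
            taxed += size * (indirect_hops - 1)
    return taxed / total if total else 0.0


# Hypothetical mix: 1% of bytes are latency-sensitive and cannot wait 1 ms
# for a direct link; the remaining 99% are bulk bytes that can.
packets = [(100, 1e-3, 1e-4), (9_900, 1e-3, 1e-2)]
print(bandwidth_tax(packets, indirect_hops=3))
```

With this assumed mix, only the latency-sensitive bytes take the indirect path, so the aggregate tax stays small even though each indirect byte consumes three hops of capacity.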
This project will engage graduate and undergraduate students through structured courses, intense mentorship, and hands-on research activities through participation in the NSF-funded UC San Diego Early Research Scholars Program (ERSP).
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.903 |