1994 — 1997 |
Yang, Tao |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Research Initiation Award: Scheduling Task and Loop Parallelism On Message-Passing Architectures @ University of California-Santa Barbara
9409695 Yang. Efficient scheduling of computational task execution and data movement is essential to high-performance computing on massively parallel machines and workstation clusters. This research studies graph scheduling algorithms and run-time schedule execution methods for mapping program parallelism onto message-passing architectures. The optimization goal is to minimize parallel time by balancing load among a limited number of processors, eliminating unnecessary communication, and overlapping computation with communication. A program exhibits two types of parallelism. Task parallelism is expressed as a collection of functional tasks with communication between them; its dependence structure is a directed acyclic graph (DAG). Loop parallelism is expressed as a collection of iterated computations with loop-carried dependences. The main research topics in this project are to develop scheduling algorithms for both task and loop parallelism, along with run-time methods to execute the scheduled graph; to study loop transformation methods that help graph scheduling algorithms expose more parallelism; to apply these algorithms in scientific computing, such as sparse matrix factorization and iterative methods; and to analyze the impact of program partitioning on scheduling performance. Research activities include developing and implementing algorithms for these problems, evaluating their performance on parallel machines and workstation clusters, integrating the algorithms into a scheduling tool, and distributing the implementation.
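The scheduling goal described above (limited processors, balanced load, communication avoided where possible) can be illustrated with a generic list-scheduling heuristic. This is a minimal sketch and not the project's own algorithm: the function name `list_schedule`, the bottom-level priority rule, and the cost model (a message costs time only when producer and consumer run on different processors) are all assumptions made for illustration.

```python
from collections import defaultdict


def list_schedule(cost, edges, num_procs):
    """Greedy list scheduling of a task DAG (illustrative sketch only).

    cost:  {task: compute_time}
    edges: {(u, v): comm_time}; the message cost is paid only when
           u and v are placed on different processors (assumed model).
    Returns ({task: (proc, start, finish)}, parallel_time).
    """
    preds, succs = defaultdict(list), defaultdict(list)
    for (u, v) in edges:
        preds[v].append(u)
        succs[u].append(v)

    # Priority: longest compute + communication path from a task to an exit.
    level = {}

    def bottom_level(t):
        if t not in level:
            level[t] = cost[t] + max(
                (edges[(t, s)] + bottom_level(s) for s in succs[t]), default=0
            )
        return level[t]

    for t in cost:
        bottom_level(t)

    placed = {}
    proc_free = [0] * num_procs          # time at which each processor is idle
    remaining = {t: len(preds[t]) for t in cost}
    ready = [t for t in cost if remaining[t] == 0]

    while ready:
        # Highest priority first; ties broken by task name for determinism.
        ready.sort(key=lambda t: (-level[t], t))
        task = ready.pop(0)

        # Pick the processor on which the task can start earliest.
        best = None
        for p in range(num_procs):
            data_ready = max(
                (placed[u][2] + (0 if placed[u][0] == p else edges[(u, task)])
                 for u in preds[task]),
                default=0,
            )
            start = max(proc_free[p], data_ready)
            if best is None or start < best[1]:
                best = (p, start)

        p, start = best
        finish = start + cost[task]
        placed[task] = (p, start, finish)
        proc_free[p] = finish

        for s in succs[task]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)

    return placed, max(f for _, _, f in placed.values())
```

Placing a task next to its predecessor removes the message cost for that edge, which is how the sketch trades load balance against communication.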
|
0.961 |
1996 — 1997 |
Yang, Tao; Ibarra, Oscar (co-PI); Schauser, Klaus (co-PI); Rinard, Martin |
N/A |
CISE Research Instrumentation: A Next-Generation High Performance Network of Commodity PCs @ University of California-Santa Barbara
CDA 9529418. Ibarra, Oscar H.; Rinard, Martin C.; Schauser, Klaus E.; Yang, Tao. University of California, Santa Barbara. A Next-Generation High-Performance Network of Commodity PCs. The proposed instrumentation consists of a collection of commodity personal computers (PCs) connected by a commodity ATM network. This computing platform supports four experimental research projects in the area of high performance parallel and distributed systems. The first project investigates efficient communication primitives for cost-effective commodity computing and networking platforms. The objective is to enable important parallel computations to execute efficiently in this computing environment. The second project investigates a new parallelizing compiler technique called commutativity analysis. The objective is to extend the range of parallelizing compilers to include computations that manipulate complex pointer-based data structures. The third project investigates scheduling and run-time support techniques for irregular scientific computations. The objective is to understand how to map these computations efficiently onto a modern parallel computing environment consisting of commodity hardware components. The final project investigates issues in developing a scalable WWW server for digital library applications. The objective is to strengthen the server's processing capabilities to match huge expected increases in simultaneous access requests from the Internet. The main research activity in all of these projects is developing, testing and measuring software. The instrumentation provides the hardware platform required to perform these activities.
|
0.961 |
1996 — 2001 |
Yang, Tao |
N/A |
U.S.-France Cooperative Research: Parameterized Task Graph Scheduling @ University of California-Santa Barbara
This three-year award supports U.S.-France cooperative research in computer architecture between Tao Yang at the University of California, Santa Barbara and Michel Cosnard of the Laboratory for Parallel Computing at the Ecole Normale Superieure, Lyon, France. Dr. Apostolos Gerasoulis of Rutgers University will also participate in the project. The investigators propose to study symbolic scheduling of tasks for parallel distributed memory architectures. Efficient scheduling of computational task execution and data movement is considered essential to high performance computing on massively parallel machines and workstation clusters. The U.S. investigators have developed scheduling algorithms and software tools. This is complemented by French expertise in techniques for automated parallelization of sequential FORTRAN programs. The project takes advantage of a prototype system developed by the French research group.
|
0.961 |
1997 — 2002 |
Yang, Tao |
N/A |
CAREER: Scheduling and Run-Time Support For Parallel Irregular Computations @ University of California-Santa Barbara
This project will focus on the development and evaluation of scheduling and run-time optimization techniques for supporting the parallelization of irregular scientific code with mixed granularity on distributed memory machines and workstation clusters. The main research topics will be to develop general space/time efficient scheduling techniques and integrate memory and communication support for solving large-scale irregular problems. A software tool will be developed for evaluating the effectiveness of the proposed techniques and providing an infrastructure which can be used by other researchers. Interoperability with other tools will also be investigated if time and funds permit. The targeted applications will be mainly in irregular scientific computing such as iterative methods for nonlinear equations and sparse matrix based solvers. The educational activities will include course development for undergraduate parallel computing, with an emphasis on parallel programming and basic scientific algorithms on distributed and shared memory machines.
|
0.961 |
1999 — 2002 |
Yang, Tao; Acharya, Anurag (co-PI); Abbadi, Amr El (co-PI); Egecioglu, Omer; Agrawal, Divyakant |
N/A |
CISE Research Instrumentation: Scalable Storage Servers For Advanced Information Systems @ University of California-Santa Barbara
9818320 Agrawal, Divyakant Acharya, Anurag University of California at Santa Barbara
Scalable Storage Servers for Advanced Information Systems
This research instrumentation enables research projects in:
- Architecture for Rapidly Growing Datasets
- A Scalable Distributed Architecture for Locating Heterogeneous Information Sources
- Data Placement and Information Access
- Clustering Support for Parallel Digital Library/Web Servers
To support the aforementioned projects, this award contributes to building an instrumentation infrastructure for conducting experimental research on scalable storage servers at the University of California at Santa Barbara, Computer Science Department. The funds will contribute to the purchase of a cluster of quad-processor workstations, several single-processor PCs, a Fast Ethernet switch and a Fibervault with a large number of FiberChannel disks. This equipment will be dedicated to supporting research in databases, operating systems, and parallel computing. The equipment will be used for several interrelated research projects, including in particular: evaluation of architectures for rapidly growing datasets, design and implementation of a scalable distributed architecture for locating information, data placement for efficient content-based retrieval, and design and implementation of cluster-based servers for digital libraries. Active Disk architectures, which integrate significant processing power and memory into a disk drive, will be investigated. Data placement and information access techniques on active disks will be explored. A scalable distributed architecture for locating heterogeneous information sources will be developed on the proposed instrumentation infrastructure. Finally, clustering support for parallel digital library and parallel web servers will be developed. A common theme among the proposed research projects is to provide efficient and scalable support for advanced information systems.
|
0.961 |
2000 — 2007 |
Singh, Ambuj; El Abbadi, Amr; Manjunath, Bangalore (co-PI); Yang, Tao; Madhow, Upamanyu (co-PI) |
N/A |
CISE Research Infrastructure: Digital Campus: Scalable Information Services On a Campus-Wide Wireless Network @ University of California-Santa Barbara
EIA-0080134 Singh, Ambuj University of California - Santa Barbara
CISE Research Infrastructure: Digital Campus: Scalable Information Services on a Campus-Wide Wireless Network
Researchers at the University of California at Santa Barbara will implement a wireless-networked, distributed heterogeneous environment on campus and use it to conduct research in databases, networking, distributed systems, and multimedia. The PIs will focus on large-scale systems in which data is the critical resource and system services are based on various data manipulation functions including data collection, movement/delivery, aggregation/processing, and presentation. A significant part of the research will be conducted using a digital classroom, a remote classroom, and individual and team kiosks. Services such as lecture on demand, virtual offices, and remote learning will be provided using this infrastructure. Specific research issues that will be investigated include content-based access, personalized views, multi-dimensional indexing, smart end-to-end applications, joint source-network coding, scalable storage, reliable network service, information summarization, distributed collaboration, multimedia annotation, and interactivity.
|
0.961 |
2000 — 2005 |
Yang, Tao; Petzold, Linda; Mezic, Igor (co-PI); Macdonald, Noel (co-PI); Tirrell, Matthew (co-PI) |
N/A |
ITR: Computational Infrastructure For Microfluidic Systems With Applications to Biotechnology @ University of California-Santa Barbara
The applications of microfluidic devices (which involve liquids moving in spaces measured in micrometers, i.e. millionths of a meter) are growing explosively. As a specific example, consider the development of microsystems for blood testing and screening. For consumers, one could envision devices available in drugstores that could perform genetic screening for conditions of concern to individuals. At a larger scale, use of such devices in blood banks could significantly reduce the time and blood lost in screening the 14 million pints of blood donated per year. Sample preparation is a critical bottleneck in the development of integrated miniature analytical systems, and it remains largely unaddressed. It is currently done outside the microsystem by mixing, shaking, and pipetting, because there is no effective integrated design method. Improved computational methods promise to allow integration and interconnection of microfluidics. This will have an effect analogous to that of automated VLSI design methods on microelectronics; it will revolutionize the field.
This project will develop a computational infrastructure for simulation and design of microfluidic systems involving non-Newtonian, micrometer/nanometer-scale flows dominated by surface-related phenomena. Computational tools and analytical tools will be developed and used to compare with theoretical and experimental results. The project emphasizes methods to deliver complex molecules to flow surfaces, to create surface reaction sites and to provide the components for molecular-scale mixing and dispensing. It will design, fabricate, and characterize both stationary and oscillating MEMS fluidic channels and surfaces to evaluate molecular-scale mixing, flow, delivery, and dispensing of complex biological fluids. The focus will be on surface dominated flow and reaction phenomena that can be scaled for delivery of single molecules to programmed reaction sites. Such surface-related phenomena should find broad application in making MEMS-based, "chip-scale" analytical instruments and "biochips". The computational tools required to analyze and design such devices are currently nonexistent. This project brings together a team of computer scientists, numerical analysts, fluid dynamicists, experimentalists, and microscale process theoreticians who will collaborate closely on creating those tools and using them.
|
0.961 |
2000 — 2004 |
Yang, Tao |
N/A |
ITR: Optimizing Execution of Parallel Programs On a Cluster of Shared Memory Machines @ University of California-Santa Barbara
Parallel programs using MPI are widely used in compute-intensive scientific and engineering applications. They perform well on dedicated distributed memory machines and workstation clusters. However, their performance can deteriorate on multiprogrammed shared memory machines (SMMs) or clusters of those machines. This project will optimize execution of parallel programs through both program transformation and efficient run-time support. The resulting programs will deliver robust performance in both dedicated and multiprogrammed SMM clusters.
Technically, the work has three aspects:
- It will study compile-time code transformations to achieve threaded execution of parallel code on a cluster of SMMs, allowing each MPI node to be executed safely as a thread.
- It will study thread-safe run-time execution and fast lock-free communication that takes advantage of address space sharing among threads within an SMM.
- It will evaluate and model a variety of scientific applications (including sparse-matrix algorithms with irregular computation, PDE computations with coarse-grain computation, and data-intensive applications) to verify the proposed techniques.
|
0.961 |
2004 — 2010 |
Yang, Tao; Liu, Xu-Dong (co-PI); Petzold, Linda; Alkire, Richard |
N/A |
ITR - (ASE) - (sim+dmc): Computational Toolbox For the Investigation of Multiscale Surface Processes @ University of California-Santa Barbara
This project focuses on the development of a computational toolbox for investigation of multiscale surface processes that are central to nanotechnology as well as other current technologies. Two physical systems will be studied that span from nano-scale phenomena to large-scale deterministic transport phenomena. The algorithms and software, developed to simulate and extract information from multiscale systems, are generic over a broad class of problems, and will contribute well beyond the applications used in their development. The physical systems include electrodeposition of metallic nanoclusters with additives to achieve specific shapes, and environmental degradation through interaction of pits, crevices and cracks. The physical systems, chosen for their computational structure, are characteristic of a large class of systems where controlled shape evolution is exploited to produce desired structures. Key issues are to understand how small-scale surface interactions guide spontaneous self-organization, how to extract insight from noisy data and uncertain fundamental understanding, and how to insure quality control at multiple scales in manufacturing. Computational tools will be developed for simulation and sensitivity analysis in multi-phenomena multiscale systems that require methods for coupling of stochastic and deterministic models. Challenges for deterministic simulation include the effective use of parallel computers, and dealing with moving boundaries, ill-conditioning and stiffness. We will explore classes of preconditioners for the iterative methods that solve large linear systems of equations at each time step, in particular a newly-developed multigrid method that is well-suited to moving boundary problems. Challenges for stochastic simulation include stiffness, which has only recently been recognized as a barrier to efficiency for stochastic simulation. Sensitivity analysis is an important part of this effort. 
For the deterministic computations, we will make use of recently developed methods that are adaptive in space and time. We will develop new sensitivity analysis methods and software for stochastic systems, and couple them to deterministic sensitivity analysis for the physical systems of interest. We will facilitate the use of our toolkit by extension to larger-scale software systems of a recently-developed environment for the rapid creation of GUIs for scientific and numerical software. This project addresses the National Priority Area of Advanced Science and Engineering (ASE), and the Technical Focus Areas of Innovation in computational modeling or simulation in research or education (sim) (primary), and of Innovative approaches to the integration of data, models, communications, analysis and/or control systems, including dynamic, data-driven applications for use in prediction, risk-assessment and decision-making (dmc) (secondary). Broader Impacts The proposed project will impact the National Priority Area of ASE through the development of algorithms and software to enhance the use of high performance computers in the investigation of multiscale surface processes. The availability of such a toolbox will accelerate fundamental scientific research and engineering design in an area with the potential for large economic impact. Software developed as a result of this project will be widely distributed in the scientific and engineering, computer science and mathematical sciences communities. The educational activities feature a multidisciplinary, cross-institutional approach to graduate education. Students will work in multidisciplinary teams, with joint thesis advisors from a primary and a secondary discipline. This approach has recently been undertaken at UCSB with some success; we plan to institutionalize this approach to graduate education in Computational Science and Engineering (CSE) at UCSB, and to export the model to UIUC. 
The model also includes industrial internships, career development workshops, and mentoring of undergraduates. Both UIUC and UCSB have been pioneers in developing graduate programs in CSE and have programs with a similar structure which will facilitate the sharing of educational ideas and innovations across the institutions.
|
0.961 |
2004 — 2008 |
Yang, Tao; Krintz, Chandra (co-PI); Belding, Elizabeth (co-PI); El Abbadi, Amr; Agrawal, Divyakant |
N/A |
RR: Wireless Sensor Network Laboratory Infrastructure @ University of California-Santa Barbara
This project, conducting research in the areas of sensor networks and distributed and cluster computing, experiments with precision-based query processing over sensor networks and wireless networking. Since sensor devices are powered by ordinary batteries, power constitutes a limiting resource in sensor networks. To address this limitation, power-aware query techniques are proposed for aggregation queries. Instead of providing users with exact answers, precision is introduced, giving users full control of the tradeoff between precision and energy usage. Another project studies the design and implementation of a self-organizing storage cluster built upon commodity components, investigating performance and availability of large-scale storage clusters as well as load balancing and data replication support, suitable for data intensive applications. Off-loading techniques that effectively partition computation across devices are also under study, since these techniques can significantly extend the life of hand-held devices. Still another project investigates techniques to effectively partition computation across devices, conserving the "virtual battery life" of the system. By considering feedback from the system, the techniques should be able to "adapt" to changing conditions in the system, improving the efficacy of future partitioning decisions. The off-loading techniques for heterogeneous systems include three categories of devices: sensors, mobile devices, and power workstations. The research seeks lightweight service discovery solutions that can determine the resources provided by nodes in the heterogeneous network. The infrastructure enables research in the following areas: Sensor Information Processing, Sensor Data Collection, Cluster-Based Data Storage, Power Management of Sensor-Based System Components, and Wireless and Service and Resource Discovery in Sensor Networks.
The proposed infrastructure will be used to experiment with precision-based query processing over sensor networks.
Broader Impact: The lab infrastructure provides means to integrate a variety of sensing devices with different characteristics. Academic as well as commercial research will benefit. Software artifacts and experience into novel curricular directions impact the training of students.
|
0.961 |
2011 — 2016 |
Yang, Tao |
N/A |
III: Small: Parallel Similarity Comparison and Duplicate Detection With Incremental Computing @ University of California-Santa Barbara
All-pairs similarity comparison is one of the core algorithms in many data-intensive mining and search applications such as near duplicate detection among web pages, spam detection, advertisement click analysis, similar news/fresh content grouping, and recommendation for similar product purchases and search queries. Conducting similarity search on large datasets is time consuming and becomes more challenging when data are being updated continuously. It is important to develop high performance algorithms and software to meet the increasing speed demands in many consumer and business applications using similarity computation.
This project studies efficient and cost-effective parallel algorithms for settings where data are updated periodically or dynamically. Techniques for partitioning data and balancing computation on a cluster of machines are developed to optimize input/output operations, communication, and computing resource usage. As data are often updated continuously, leveraging previously computed results to handle updated data can eliminate a large number of unnecessary operations and speed up the entire computation process by an order of magnitude. The project develops efficient software on a cluster of machines. The project starts with incremental duplicate detection for web data analysis and search, and continues to work on similarity comparison in several other applications. Performance of developed software is evaluated in those applications.
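The idea of reusing earlier results when data are updated can be sketched on a single machine as follows. This is an illustrative example only, not the project's software: the class name `IncrementalSimilarity`, the use of Jaccard similarity over whitespace tokens, and the inverted-index candidate filter are assumptions. Each newly added document is compared only against earlier documents that share at least one token with it, and pairs found in earlier rounds are kept rather than recomputed.

```python
from collections import defaultdict


class IncrementalSimilarity:
    """Maintains near-duplicate pairs as documents arrive (sketch only)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.docs = {}                   # doc_id -> set of tokens
        self.index = defaultdict(set)    # token -> ids of docs containing it
        self.pairs = {}                  # (id_a, id_b) -> Jaccard similarity

    def add(self, doc_id, text):
        """Add a new document; return only the pairs it newly creates.

        Assumes doc_id has not been added before.
        """
        tokens = set(text.lower().split())

        # Candidates: earlier documents sharing at least one token.
        candidates = set()
        for tok in tokens:
            candidates |= self.index[tok]

        new_pairs = {}
        for other in candidates:
            other_tokens = self.docs[other]
            sim = len(tokens & other_tokens) / len(tokens | other_tokens)
            if sim >= self.threshold:
                new_pairs[tuple(sorted((doc_id, other)))] = sim

        # Register the document so later additions can match against it.
        self.docs[doc_id] = tokens
        for tok in tokens:
            self.index[tok].add(doc_id)
        self.pairs.update(new_pairs)
        return new_pairs
```

A parallel version would additionally partition `docs` and `index` across machines; that part is outside the scope of this sketch.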
This research has the potential to develop fully-optimized solutions with significantly reduced cost and increased speed for a variety of big data applications that perform similarity analysis. Developed software will be made available for application developers or data engineers to conduct large-scale computation without involving the complexity of managing parallelism. The project web site (http://www.cs.ucsb.edu/projects/psc/) is used for dissemination of results. The educational plan contains research mentoring, undergraduate and graduate instruction improvement, and outreach activities such as working with high school students.
|
0.961 |
2015 — 2018 |
Yang, Tao; Tessaro, Stefano (co-PI) |
N/A |
III: Small: Low-Cost Deduplication and Search For Versioned Datasets @ University of California-Santa Barbara
Organizations and companies often archive high volumes of versioned digital datasets. There are research challenges and opportunities in developing the integrated archival and search support needed for data preservation, electronic discovery, and regulatory compliance. Since versioned datasets contain highly repetitive content, deduplication can reduce the storage demand by an order of magnitude or more; however, such an optimization is resource-intensive. After deduplication, the structure of an inverted index for versioned data becomes complex and it is expensive to search for relevant results. This project will study low-cost solutions for compact archiving and indexing and develop efficient algorithms and systems techniques for searching versioned datasets. It will also consider that the archived data can be stored in an untrusted server environment and investigate tradeoffs in efficiency and privacy-preservation for search. The developed solutions will bring significant computing and storage cost advantages for application users involving large-scale versioned data management and search. The developed software will be made public for research communities. The research effort will be integrated with an educational plan containing research mentoring, instruction improvement, and outreach activities.
This project will be focused on studying key challenges and cost-sensitive technical aspects in integrated archival and search support for managing large versioned datasets. The main tasks include efficient software architecture and optimization for detecting duplicated content on a cloud cluster architecture, fast multi-phase search with a hybrid index structure to exploit content similarity and query characteristics, and an efficient privacy-preserving framework with top result ranking.
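To make the storage saving from deduplicating versioned data concrete, here is a minimal sketch of content-addressed chunk storage. It is not the project's system: the class name `DedupStore`, the fixed-size chunking, and the tiny default chunk size are assumptions chosen for readability. Each version is stored as a list of SHA-256 chunk fingerprints, and a chunk shared by several versions is kept only once.

```python
import hashlib


class DedupStore:
    """Stores versions as lists of chunk fingerprints (sketch only)."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}      # fingerprint -> chunk bytes, stored once
        self.versions = {}    # version name -> ordered list of fingerprints

    def put(self, name, data):
        """Split data into fixed-size chunks and store unseen chunks only."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.versions[name] = recipe

    def get(self, name):
        """Rebuild a version from its chunk list."""
        return b"".join(self.chunks[d] for d in self.versions[name])

    def stored_bytes(self):
        """Physical bytes held after deduplication."""
        return sum(len(c) for c in self.chunks.values())
```

Fixed-size chunking misses duplicates when content shifts by a few bytes; production systems commonly use content-defined chunk boundaries instead, which this sketch does not attempt.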
|
0.961 |
2020 — 2022 |
Yang, Tao |
N/A |
EAGER: Efficient Privacy-Aware Document Search in the Cloud @ University of California-Santa Barbara
As sensitive information is increasingly stored in the cloud, privacy protection is a critical factor for users to adopt cloud-based information services such as document search. A cloud server can observe the client-initiated query processing flow, extract statistical patterns, and reason about client's data. As a result, the risk of leakage-abuse attacks exists when searching in the cloud. The main challenge to perform privacy-preserving search is that index visitation can reveal sensitive data patterns, and computation involved in advanced ranking can further expose private feature information. On the other hand, hiding index and feature information through full encryption prevents the server from performing effective scoring and result comparison. This project explores the challenging open problems in algorithmic indexing and ranking solutions for privacy-aware cloud data search. The approach emphasizes an evaluation-driven design where search performance is assessed in multiple aspects of relevance, efficiency, and privacy for practical system deployment. The project integrates the proposed research with an educational plan including undergraduate and graduate students' involvement in the research project, instructional material development, and outreach activities.
The exploratory research addresses two fundamental research challenges: (1) privacy-aware indexing and runtime support in matching documents for a given query with an emphasis to curtail statistical text information leakage while providing efficient and private access of ranking features; (2) privacy-aware end-to-end top-K ranking with a multi-stage scheme which seeks a combination of linear and nonlinear methods such as neural nets and learning ensembles. The design goal is to minimize the leakage of document features and characteristics while still accomplishing a reasonable response time and competitive relevance. The evaluation process will use public datasets to assess the effectiveness of the developed techniques for practical system deployment. This research effort will open the door for bridging the gap between privacy and advanced information retrieval in searching large encrypted datasets. The developed research results will be made public for research and industry communities.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.961 |
2022 — 2025 |
Yang, Tao |
N/A |
III: Small: Efficiency Optimization For Neural Document Ranking With Compact Representations @ University of California-Santa Barbara
Over the last few years, the resurgence of neural models has greatly advanced the field of information retrieval, enabling retrieval engines to effectively match and rank search results in response to a user query. For example, this new technology has made it possible to determine the most relevant documents in response to a query even when some query keywords do not appear in these documents. The main drawback of using deep neural models for ranking is that the retrieval is extremely time consuming. As a result, such models cannot be deployed in many practical search applications. This project focuses on studying efficient solutions for neural ranking computation, and the developed techniques will be evaluated using public datasets to assess their effectiveness. The project integrates the research with an educational plan including undergraduate and graduate students' involvement, instructional material development, and outreach activities.
This project carries out a two-thrust research agenda for efficient neural ranking. The first thrust investigates a fast re-ranking scheme for a dual-encoding architecture by leveraging precomputed embeddings to compose a query representation with approximation, and combining deep contextual token interactions and traditional lexical matching features. The second thrust investigates a compact representation of document embeddings that strikes a balance between relevance and space efficiency, which affects online inference latency. The project exploits the composite nature of ranking inference for answering a query to approximate query embeddings, and decouples ranking contribution of document embeddings in deriving a compact representation.
This research will advance our fundamental understanding of relevance and efficiency tradeoffs in neural information retrieval, and significantly reduce the computing and space cost of online inference while retaining the essential benefits of deep learning for effective ranking on affordable computing platforms.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
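The combination of precomputed embeddings with lexical matching features mentioned in the first thrust can be illustrated with a toy re-ranker. This is a sketch under stated assumptions and not the project's method: the function name `rerank`, the dot-product dense score, the term-overlap lexical score, and the interpolation weight `alpha` are all hypothetical choices.

```python
def rerank(query_vec, query_terms, candidates, alpha=0.7):
    """Re-rank candidates by mixing a dense score with a lexical score.

    query_vec:   list of floats, the query embedding (assumed given)
    query_terms: list of query keywords
    candidates:  list of (doc_id, doc_vec, doc_terms), where doc_vec is a
                 precomputed document embedding
    alpha:       weight of the dense score (hypothetical default)
    Returns doc ids ordered from most to least relevant.
    """
    q_terms = set(query_terms)
    scored = []
    for doc_id, doc_vec, doc_terms in candidates:
        # Dense score: dot product with the precomputed document embedding.
        dense = sum(q * d for q, d in zip(query_vec, doc_vec))
        # Lexical score: fraction of query terms that appear in the document.
        lexical = len(q_terms & set(doc_terms)) / max(len(q_terms), 1)
        scored.append((alpha * dense + (1 - alpha) * lexical, doc_id))

    # Highest combined score first; ties broken by doc id.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in scored]
```

Because document embeddings are computed offline, the online cost per candidate is one dot product plus a set intersection, which is the kind of latency saving the abstract is concerned with.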
|
0.961 |