2019 — 2022 
Chen, Yuxin 
CIF: Small: Taming Nonconvexity in High-Dimensional Statistical Estimation
Many of today's applications in science and engineering require the efficient processing of massive data sets in order to extract critical information and actionable insights for reliable decision making. Yet, even with the enormous power of cloud computing, it is computationally infeasible for classical statistical algorithms to process and analyze the massive amount of data generated daily. At the core of such challenges is the mathematical concept of nonconvexity, which permeates contemporary information processing tasks. Due to the highly complex nature of data acquisition mechanisms, classical statistical estimators often require the solution of highly nonconvex optimization problems. Current theory predicts that such tasks can be daunting to solve in the worst case, yet simple iterative algorithms like gradient descent are used thousands of times every day to solve highly nonconvex problems with remarkable empirical success. This wide gap between theory and practice needs to be bridged, and the goal of this project is to do so by developing new theory that better explains and predicts the performance of nonconvex optimization algorithms. The impact of this new theory will be felt by creating a foundational understanding of nonconvexity, and it will suggest novel ways to tackle hard practical problems that feature nonconvexity as well.
This research project plans to address these pressing challenges by investigating low-complexity nonconvex optimization methods that enable efficient statistical estimation. The main goal is to demystify the unreasonable effectiveness of simple optimization algorithms through a novel combination of ideas from statistics and optimization, offering scalable statistical estimation solutions that are of immediate value in guiding scientific discovery. In particular, the objective of this research project is fourfold: (1) understand why random initialization suffices for solving important nonconvex statistical problems; (2) understand why simple optimization algorithms are guaranteed to work even without sophisticated regularization; (3) investigate how to reduce the undesired variability of optimization algorithms in the sample-starved regime; and (4) study the effectiveness and benefits of simple spectral methods. The algorithms and techniques to be developed in this project will significantly enhance signal processing capabilities beyond state-of-the-art methods.
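To make objectives (2) and (4) concrete, the toy sketch below (a hypothetical illustration under standard Gaussian assumptions, not code from the award itself) applies plain gradient descent, without any regularization, to a nonconvex phase-retrieval objective: recovering a signal x from quadratic measurements y_i = (a_i^T x)^2. A simple spectral method, the leading eigenvector of a data-weighted matrix, supplies the initial point; all problem sizes and step sizes here are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy phase-retrieval instance: recover x from y_i = (a_i^T x)^2.
n, m = 20, 400                      # illustrative dimensions (assumption)
x = rng.standard_normal(n)
x /= np.linalg.norm(x)              # unit-norm ground truth
A = rng.standard_normal((m, n))     # Gaussian sensing vectors a_i as rows
y = (A @ x) ** 2                    # phaseless (quadratic) measurements

def grad(z):
    # Gradient of the nonconvex loss f(z) = (1/4m) * sum_i ((a_i^T z)^2 - y_i)^2
    Az = A @ z
    return A.T @ ((Az ** 2 - y) * Az) / m

# Simple spectral method: leading eigenvector of (1/m) sum_i y_i a_i a_i^T,
# rescaled to the estimated signal energy sqrt(mean(y)).
Y = (A * y[:, None]).T @ A / m
_, eigvecs = np.linalg.eigh(Y)      # eigh returns eigenvalues in ascending order
z = eigvecs[:, -1] * np.sqrt(y.mean())

# Plain gradient descent on the nonconvex objective -- no regularization.
for _ in range(500):
    z -= 0.1 * grad(z)

# x is identifiable only up to a global sign flip.
err = min(np.linalg.norm(z - x), np.linalg.norm(z + x))
```

Despite the nonconvexity of the loss, the iterates converge to the ground truth (up to sign), illustrating the kind of benign behavior this project seeks to explain.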
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

2019 — 2023 
Chen, Yuxin 
RI: Medium: Collaborative Research: Algorithmic High-Dimensional Statistics: Optimality, Computational Barriers, and High-Dimensional Corrections
This research aims to address the pressing challenges of learning and inference from large-dimensional data. Contemporary sensing and data acquisition technologies produce data at an unprecedented rate. A ubiquitous challenge in modern data applications is thus to efficiently and reliably extract relevant information and associated insights from a deluge of data. In the meantime, this challenge is exacerbated by the unprecedented growth in the number of relevant features one needs to reason about, which oftentimes outpaces the growth of data samples. Classical statistical inference paradigms, which either work only in the presence of an enormous number of data samples or ignore the computational cost of the estimators altogether, become highly insufficient, or even unreliable, for many emerging applications of machine learning and big-data analytics.
To address the above pressing issues in high dimensions, novel theoretical tools need to be brought into the picture in order to provide a comprehensive understanding of the performance limits of various algorithms and tasks. The goal of this project is fourfold: first, to develop a modern theory that characterizes the precise performance of classical statistical algorithms in high dimensions; second, to suggest proper corrections of classical statistical inference procedures to accommodate the sample-starved regime; third, to develop computationally efficient algorithms that provably attain the fundamental statistical limits whenever possible; and fourth, to identify potential computational barriers when the fundamental statistical limits cannot be met. The transformative potential of the proposed research program lies in the development of foundational statistical data analytics theory through a novel combination of statistics, approximation theory, statistical physics, mathematical optimization, and information theory, offering scalable statistical inference and learning algorithms. The theory and algorithms developed within this project will have direct impact on engineering and science applications such as large-scale machine learning, DNA sequencing, genetic disease analysis, and natural language processing. This collaborative program provides cross-university opportunities for student training, and the investigators are committed to engaging and supporting underrepresented and women students in STEM through long-term mentorships and outreach activities.
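The breakdown of classical fixed-dimension asymptotics that motivates these corrections can be seen in a minimal simulation (a hypothetical illustration, not part of the award): even for Gaussian data with identity covariance, the eigenvalues of the sample covariance matrix deviate sharply from their population counterparts once the number of features is proportional to the sample size.

```python
import numpy as np

rng = np.random.default_rng(1)

# Proportional high-dimensional regime: p/n = 0.5 (illustrative sizes).
n, p = 2000, 1000
X = rng.standard_normal((n, p))    # true covariance is the identity
S = X.T @ X / n                    # sample covariance matrix
eigs = np.linalg.eigvalsh(S)

# Classical theory (p fixed, n -> infinity) predicts all eigenvalues near 1;
# here they instead spread over roughly [(1 - sqrt(p/n))^2, (1 + sqrt(p/n))^2],
# the Marchenko-Pastur bulk, so naive plug-in inference is systematically biased.
spread = (eigs.min(), eigs.max())
```

The average eigenvalue still concentrates near 1, but individual eigenvalues range from near 0.09 to near 2.9, which is exactly the kind of high-dimensional distortion that corrected inference procedures must account for.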
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
