Learning with Neural Networks in High Dimensions /
Record type:
Bibliographic record - Electronic resource : Monograph/item
Title / Author:
Learning with Neural Networks in High Dimensions / Theodor Misiakiewicz.
Author:
Misiakiewicz, Theodor,
Extent:
1 electronic resource (393 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 85-04, Section: B.
Contained By:
Dissertations Abstracts International, 85-04B.
Subject:
Neural networks.
Electronic resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30615088
ISBN:
9798380485166
Abstract:
A central challenge of modern statistics concerns the curse of dimensionality, which refers to the difficulty of learning in high dimensions due to an exponential increase in degrees of freedom. Over the past half-century, researchers have developed various methods to overcome this curse, including clever feature engineering and prior assumptions on the data structure. In recent years, however, deep learning has emerged as a standard and powerful approach to exploiting massive high-dimensional datasets, vastly outperforming previous handcrafted and domain-specific methods. Despite the black-box nature of neural networks, they appear to avoid the curse of dimensionality by automatically learning relevant features and adapting to inherent structures in the data. However, understanding precisely the mechanism behind this capability remains an outstanding challenge. For example, it is unclear what relevant features neural networks can learn efficiently, how gradient-type algorithms construct these features dynamically, or how different resources (sample size, network width, and training time) should scale for solving different tasks. Historically, neural networks' approximation and statistical properties have been investigated separately from computational aspects. This gap leaves open the study of the subset of neural networks that are of practical interest, namely those efficiently constructed by gradient-based algorithms.
In this thesis, we consider three models, corresponding to two-layer neural networks trained in three different optimization regimes, and investigate their properties in high dimensions. In particular, our aim is to characterize precisely, in each setting, how the performance of neural networks depends on the number of samples, neurons, and gradient descent iterations used in their training.
• Neural networks in the linear regime: Neural networks trained in this regime effectively behave as linear methods. Specifically, we study random feature and kernel regression, which correspond respectively to finite-width and infinite-width models for linearized neural networks. We derive sharp asymptotics for the test error in a new polynomial high-dimensional scaling, under certain abstract conditions on the underlying kernel, and show that these conditions are verified by some classical high-dimensional examples. These results allow us to precisely characterize the performance of neural networks in the linear regime, including the optimal width and the impact of network architecture. In addition, they explain phenomena such as multiple descents in the risk curve and benign overfitting.
• Neural networks in the mean-field regime: Training in this regime is non-linear and allows neural networks to perform feature learning, i.e., to construct features that are adapted to the data. As an instructive example, we consider the problem of learning multi-index functions on Boolean or isotropic Gaussian data. These functions depend only on a latent low-dimensional subspace and are not learnable efficiently by kernel methods. In contrast, we show that in the mean-field regime, training sequentially learns the function support via a saddle-to-saddle dynamic. The overall time complexity now depends on the target function's leap, which measures how "hierarchical" the function is, and not on the target function's smoothness as in the linear regime. In particular, this illustrates how non-linear training of neural networks can vastly outperform kernel methods by exploiting low-dimensional and hierarchical structures in the data to construct good features.
• Convex neural networks: We introduce a family of convex problems over the space of infinite-width two-layer neural networks. These can be seen as generalizations of kernel methods, whereby the RKHS norm (viewed as a weighted ℓ2 norm over the second-layer weights) is replaced by a weighted functional ℓp norm, 1 ≤ p ≤ 2. We first show that, for p > 1, these problems can be solved in polynomial time using a random feature approximation. This implies that convex neural networks are tractable methods that break the curse of dimensionality on an increasing set of functions as p decreases to 1. On the other hand, we argue that for p = 1, which achieves near-optimal statistical rates on multi-index functions, learning becomes computationally hard under some standard hardness assumptions.
The picture that emerges from this study is that neural networks can realize a rich range of trade-offs between statistical and computational complexity. In particular, neural networks trained in the linear, mean-field, and convex regimes, which can be seen as implementing three different statistical learning paradigms (fixed features, feature learning, and feature selection, respectively), suffer very differently from the curse of dimensionality. While linearized neural networks are simple computational models, they are statistically inefficient in high dimensions, except for very smooth target functions. On the other hand, convex neural networks can achieve near-optimal statistical rates on multi-index functions, but do not admit general tractable algorithms. Finally, non-linear training of neural networks can be seen as a tractable middle ground between these two extremes. In particular, it outperforms fixed-feature methods by allowing efficient construction of relevant features for data with low-dimensional and favorable hierarchical structure.
Language: English
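The linear-regime bullet of the abstract can be illustrated with a short sketch. The numpy snippet below is a hypothetical illustration, not code from the thesis; the dimension, widths, target function, and ridge level are arbitrary assumed choices. It fits random-feature ridge regression with frozen first-layer weights, the finite-width linearized model the abstract refers to: only the second layer is trained, so the model is linear in its trainable parameters.

import numpy as np

rng = np.random.default_rng(0)
d, n = 30, 1000                      # input dimension and sample size (illustrative)
lam = 1e-3 * n                       # ridge regularization (illustrative)

def target(X):
    # A simple low-degree target; per the abstract, the linear regime is
    # best suited to such smooth (low-degree) functions.
    return X[:, 0] + 0.5 * X[:, 0] * X[:, 1]

X_tr = rng.standard_normal((n, d))
y_tr = target(X_tr) + 0.1 * rng.standard_normal(n)
X_te = rng.standard_normal((n, d))
y_te = target(X_te)

for N in (50, 500, 5000):            # number of random (frozen) first-layer neurons
    W = rng.standard_normal((d, N)) / np.sqrt(d)   # frozen first-layer weights
    F_tr = np.maximum(X_tr @ W, 0.0)               # ReLU random features
    F_te = np.maximum(X_te @ W, 0.0)
    # Only the second layer is fit (ridge regression on the features),
    # so the trained model is linear in its free parameters.
    a = np.linalg.solve(F_tr.T @ F_tr + lam * np.eye(N), F_tr.T @ y_tr)
    print(f"width N={N:5d}  test MSE={np.mean((F_te @ a - y_te) ** 2):.3f}")

Kernel ridge regression corresponds to the infinite-width limit of this same construction, which is the other model studied in the linear-regime chapter.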
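For the mean-field bullet, the contrast is that both layers are trained, so the first-layer weights can align with the low-dimensional subspace a multi-index target depends on. Below is a hedged numpy sketch of that setting, with assumed hyperparameters and an assumed target (it is not the thesis's experimental setup): online SGD on fresh isotropic Gaussian batches, while monitoring how much first-layer weight energy lies in the relevant two-dimensional subspace.

import numpy as np

rng = np.random.default_rng(1)
d, N = 20, 512                       # input dimension, network width (illustrative)
lr, batch, steps = 0.05, 256, 5000   # optimization hyperparameters (illustrative)

def target(X):
    # Multi-index target: depends only on the subspace spanned by e1 and e2.
    return X[:, 0] * X[:, 1]

W = rng.standard_normal((N, d)) / np.sqrt(d)  # first-layer weights
b = np.zeros(N)                               # first-layer biases
a = rng.standard_normal(N) / np.sqrt(N)       # second-layer weights

for t in range(steps):
    X = rng.standard_normal((batch, d))       # fresh isotropic Gaussian data
    y = target(X)
    pre = X @ W.T + b                         # pre-activations, shape (batch, N)
    h = np.maximum(pre, 0.0)                  # ReLU activations
    err = h @ a - y                           # residuals
    act = (pre > 0).astype(float)             # ReLU derivative
    # Gradients of the mean squared error with respect to all parameters.
    grad_a = h.T @ err / batch
    back = (err[:, None] * a[None, :]) * act
    grad_W = back.T @ X / batch
    grad_b = back.mean(axis=0)
    a -= lr * grad_a
    W -= lr * grad_W
    b -= lr * grad_b
    if t % 1000 == 0:
        # Fraction of first-layer weight energy in the relevant 2-d subspace;
        # chance level is 2/d, and it should grow if features are being learned.
        align = (W[:, :2] ** 2).sum() / (W ** 2).sum()
        print(f"step {t:5d}  batch MSE {np.mean(err**2):.3f}  alignment {align:.2f}")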
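For the convex neural networks of the third bullet, one standard way to write such an objective (the notation below is assumed for illustration, not quoted from the thesis) parametrizes an infinite-width two-layer network by a second-layer weight function a(w) over first-layer weights w drawn from a base measure μ, and replaces the RKHS penalty (the p = 2 case) with a weighted L^p one:

% Illustrative notation only: sigma is the activation, mu a base measure over
% first-layer weights, and a(w) the second-layer weight function being optimized.
\[
  f_a(x) = \int a(w)\,\sigma(\langle w, x \rangle)\,\mathrm{d}\mu(w),
  \qquad
  \|a\|_{L^p(\mu)} = \Big( \int |a(w)|^p \,\mathrm{d}\mu(w) \Big)^{1/p},
\]
% A regularized estimator of this type; the problem is convex in a.
\[
  \widehat{a} \in \arg\min_{a}\;
  \frac{1}{n}\sum_{i=1}^{n} \big( y_i - f_a(x_i) \big)^2
  \;+\; \lambda\, \|a\|_{L^p(\mu)}^{p}, \qquad 1 \le p \le 2 .
\]

Taking p = 2 recovers a kernel method, while smaller p promotes solutions supported on fewer neurons, which is the feature-selection behavior the abstract attributes to the p = 1 end of the family.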
MARC record:
LDR  06838nmm a22003853i 4500
001  2400452
005  20250522084129.5
006  m o d
007  cr|nu||||||||
008  251215s2023 miu||||||m |||||||eng d
020  $a 9798380485166
035  $a (MiAaPQD)AAI30615088
035  $a (MiAaPQD)STANFORDqx556bh6619
035  $a AAI30615088
040  $a MiAaPQD $b eng $c MiAaPQD $e rda
100 1  $a Misiakiewicz, Theodor, $e author. $3 3770451
245 1 0  $a Learning with Neural Networks in High Dimensions / $c Theodor Misiakiewicz.
264 1  $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2023
300  $a 1 electronic resource (393 pages)
336  $a text $b txt $2 rdacontent
337  $a computer $b c $2 rdamedia
338  $a online resource $b cr $2 rdacarrier
500  $a Source: Dissertations Abstracts International, Volume: 85-04, Section: B.
500  $a Advisors: Donoho, David; Johnstone, Iain; Montanari, Andrea. Committee members: Bent, Stacey F.
502  $b Ph.D. $c Stanford University $d 2023.
520  $a [Abstract as given above.]
546  $a English
590  $a School code: 0212
650 4  $a Neural networks. $3 677449
690  $a 0800
710 2  $a Stanford University. $e degree granting institution. $3 3765820
720 1  $a Donoho, David $e degree supervisor.
720 1  $a Johnstone, Iain $e degree supervisor.
720 1  $a Montanari, Andrea $e degree supervisor.
773 0  $t Dissertations Abstracts International $g 85-04B.
790  $a 0212
791  $a Ph.D.
792  $a 2023
856 4 0  $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30615088
Holdings (1 item):
Barcode: W9508772
Location: Electronic resources
Circulation category: 11. Online reading
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0