Deep Learning on a Diet: An Error Landscape Perspective on Parameter and Data Efficiency in Deep Learning /
Record type:
Bibliographic record - Electronic resource : Monograph/item
Title/Author:
Deep Learning on a Diet: An Error Landscape Perspective on Parameter and Data Efficiency in Deep Learning / Mansheej Paul.
Author:
Paul, Mansheej,
Description:
1 electronic resource (107 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 85-06, Section: B.
Contained By:
Dissertations Abstracts International, 85-06B.
Subject:
Sparsity.
Electronic resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30726832
ISBN:
9798381019674
LDR
:04225nmm a22003613i 4500
001
2400489
005
20250522084137.5
006
m o d
007
cr|nu||||||||
008
251215s2023 miu||||||m |||||||eng d
020
$a
9798381019674
035
$a
(MiAaPQD)AAI30726832
035
$a
(MiAaPQD)STANFORDwh462kf8223
035
$a
AAI30726832
040
$a
MiAaPQD
$b
eng
$c
MiAaPQD
$e
rda
100
1
$a
Paul, Mansheej,
$e
author.
$3
3770506
245
1 0
$a
Deep Learning on a Diet: An Error Landscape Perspective on Parameter and Data Efficiency in Deep Learning /
$c
Mansheej Paul.
264
1
$a
Ann Arbor :
$b
ProQuest Dissertations & Theses,
$c
2023
300
$a
1 electronic resource (107 pages)
336
$a
text
$b
txt
$2
rdacontent
337
$a
computer
$b
c
$2
rdamedia
338
$a
online resource
$b
cr
$2
rdacarrier
500
$a
Source: Dissertations Abstracts International, Volume: 85-06, Section: B.
500
$a
Advisors: Ganguli, Surya. Committee members: Druckmann, Shaul; Yamins, Dan; Bent, Stacey F.
502
$b
Ph.D.
$c
Stanford University
$d
2023.
520
$a
For example, for many applications on edge devices, networks need to have a small memory footprint to fit on-device; current models, on the other hand, need entire servers to run on. The latency of current LLMs is also too large for many applications that require fast generation. Additionally, the large capital investment required to train current AI systems is a threat to the democratization of AI, as only a few incumbents make all the decisions that go into developing these systems. It is thus imperative to develop methods that can be used to train models of similar capabilities under compute and data resource constraints. This leads us to the central question that motivates this thesis: when and how can we train small networks on small datasets that nevertheless perform as well as large, state-of-the-art models? For network parameter efficiency, we study sparse networks that share the same architectural structure as larger networks but have only a fraction of the parameters. For data efficiency, we study how to prune datasets by identifying the subset of data most important for achieving good performance. Throughout this work, our focus will be on large-scale empirical investigations that develop a scientific understanding of the principles that enable data and parameter efficiency in deep learning. The goal is not only to build better methods for pruning network parameters and training datasets but also to build a foundational understanding of why pruning is possible in the first place. This intuition is critical for identifying the most important directions for developing future methods. To investigate these questions, we empirically study the training dynamics of deep networks from the perspective of the geometric structure of the error landscape. This approach has been used in recent works to scientifically understand many aspects of deep learning such as generalization, calibration, and the different phases of training [18, 20-22, 25, 45, 53]. We will use the tools developed in these works to study when and why pruning both network parameters and training examples is possible. In particular, in Chapter 2, we will show that the curvature of the error landscape around well-performing dense networks governs our ability to find equally well-performing sparse networks. In Chapter 3, we will develop a method that ranks examples by difficulty and discover how example difficulty impacts the large-scale structure of the optimization landscape by shaping its error basins. Finally, in Chapter 4, we will use these insights to bias the training data distribution in the early phase of training to create an optimization landscape that is more amenable to efficiently finding sparse deep networks (here we return to parameter sparsity). Through these investigations, we will uncover novel connections between training dynamics, the geometric properties of the error landscape, and efficiency in deep learning.
546
$a
English
590
$a
School code: 0212
650
4
$a
Sparsity.
$3
3680690
650
4
$a
Connectivity.
$3
3560754
650
4
$a
Deep learning.
$3
3554982
690
$a
0800
710
2
$a
Stanford University.
$e
degree granting institution.
$3
3765820
720
1
$a
Ganguli, Surya
$e
degree supervisor.
773
0
$t
Dissertations Abstracts International
$g
85-06B.
790
$a
0212
791
$a
Ph.D.
792
$a
2023
856
4 0
$u
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30726832
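The abstract (field 520) above discusses two efficiency questions: finding sparse networks that keep only a fraction of a dense network's parameters, and pruning training data by ranking examples with a difficulty score. The record itself does not describe the exact procedures, so the sketch below is only an illustration of the general idea under stated assumptions: it uses an EL2N-style error-norm score as the difficulty proxy and plain magnitude pruning for parameter sparsity, on a toy NumPy softmax classifier. The model, scores, and thresholds here are assumptions for illustration, not the thesis's methods.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: two classes in 10 dimensions.
n, d, k = 1000, 10, 2
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
Y = np.eye(k)[y]                          # one-hot labels

# Toy model: a single linear layer with a softmax output.
W = 0.01 * rng.normal(size=(d, k))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Brief training so that the difficulty scores are informative.
for _ in range(200):
    P = softmax(X @ W)
    W -= 0.1 * X.T @ (P - Y) / n          # gradient step on cross-entropy

# (1) Data pruning: score each example by the norm of its error vector
# (an EL2N-style proxy for difficulty; the thesis may rank differently),
# then keep only the highest-scoring fraction of the training set.
scores = np.linalg.norm(softmax(X @ W) - Y, axis=1)
keep_frac = 0.5
keep_idx = np.argsort(scores)[-int(keep_frac * n):]
X_pruned, Y_pruned = X[keep_idx], Y[keep_idx]

# (2) Parameter pruning: zero the smallest-magnitude weights so that only a
# fraction of the parameters remain (simple magnitude pruning, shown here
# only as one standard baseline for reaching a target sparsity).
sparsity = 0.8
thresh = np.quantile(np.abs(W), sparsity)
mask = (np.abs(W) >= thresh).astype(W.dtype)
W_sparse = W * mask

print(f"kept {len(keep_idx)}/{n} examples; "
      f"{int(mask.sum())}/{mask.size} weights remain nonzero")

Keeping the highest-scoring examples reflects the intuition, noted in the abstract, that some training examples matter more than others for good performance; in practice the keep fraction and the scoring rule are the main knobs one would tune.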
Holdings:
Barcode: W9508809
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online reading)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0