Optimisation & Generalisation in Networks of Neurons.
Record Type:
Electronic resources : Monograph/item
Title/Author:
Optimisation & Generalisation in Networks of Neurons. / Bernstein, Jeremy.
Author:
Bernstein, Jeremy.
Description:
1 online resource (99 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
Contained By:
Dissertations Abstracts International, 84-12B.
Subject:
Applied mathematics. - Deep learning. - Computer peripherals. - Hilbert space. - Neural networks. - Computer engineering. - Algorithms.
Online resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30548381
click for full text (PQDT)
ISBN:
9798379694326
Optimisation & Generalisation in Networks of Neurons.
Bernstein, Jeremy. Optimisation & Generalisation in Networks of Neurons. - 1 online resource (99 pages)
Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
Thesis (Ph.D.)--California Institute of Technology, 2023.
Includes bibliographical references
The goal of this thesis is to develop the optimisation and generalisation theoretic foundations of learning in artificial neural networks. The thesis tackles two central questions. Given training data and a network architecture: Which weight setting will generalise best to unseen data, and why? What optimiser should be used to recover this weight setting?

On optimisation, an essential feature of neural network training is that the network weights affect the loss function only indirectly through their appearance in the network architecture. This thesis proposes a three-step framework for deriving novel "architecture aware" optimisation algorithms. The first step, termed functional majorisation, is to majorise a series expansion of the loss function in terms of functional perturbations. The second step is to derive architectural perturbation bounds that relate the size of functional perturbations to the size of weight perturbations. The third step is to substitute these architectural perturbation bounds into the functional majorisation of the loss and to obtain an optimisation algorithm via minimisation. This constitutes an application of the majorise-minimise meta-algorithm to neural networks.

On generalisation, a promising recent line of work has applied PAC-Bayes theory to derive non-vacuous generalisation guarantees for neural networks. Since these guarantees control the average risk of ensembles of networks, they do not address which individual network should generalise best. To close this gap, the thesis rekindles an old idea from the kernels literature: the Bayes point machine. A Bayes point machine is a single classifier that approximates the aggregate prediction of an ensemble of classifiers. Since aggregation reduces the variance of ensemble predictions, Bayes point machines tend to generalise better than other ensemble members. The thesis shows that the space of neural networks consistent with a training set concentrates on a Bayes point machine if both the network width and normalised margin are sent to infinity. This motivates the practice of returning a wide network of large normalised margin.

Potential applications of these ideas include novel methods for uncertainty quantification, more efficient numerical representations for neural hardware, and optimisers that transfer hyperparameters across learning problems.
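To make the three-step framework in the abstract concrete, here is a minimal sketch in assumed notation (the loss L, network function f_w, functional perturbation Δf, weight perturbation Δw, curvature constant λ, architectural constant C, and posterior p(w | data) are illustrative symbols, not the thesis's own definitions or exact bounds):

% Illustrative majorise-minimise sketch; all symbols are assumed notation, not the thesis's exact bounds.
\begin{align*}
  &\text{Step 1 (functional majorisation):}
    && L(f_{w+\Delta w}) \le L(f_w) + \langle \nabla_f L, \Delta f \rangle + \tfrac{\lambda}{2} \, \|\Delta f\|^2 \\
  &\text{Step 2 (architectural perturbation bound):}
    && \|\Delta f\| \le C(w, \mathrm{architecture}) \, \|\Delta w\| \\
  &\text{Step 3 (minimise the majorant):}
    && \Delta w^\star = \arg\min_{\Delta w} \Big[ \langle \nabla_w L, \Delta w \rangle + \tfrac{\lambda}{2} \, C(w, \mathrm{architecture})^2 \, \|\Delta w\|^2 \Big]
\end{align*}
% The generalisation half appeals to a Bayes point machine: a single network whose prediction
% approximates the ensemble-average prediction under an assumed posterior p(w | data):
\[
  f_{\mathrm{BPM}}(x) \approx \mathbb{E}_{w \sim p(w \mid \mathrm{data})} \big[ f_w(x) \big]
\]

In this sketch the architecture enters the update only through the constant C, which is what would make the resulting optimiser "architecture aware"; the final display restates the Bayes point machine idea as a single network approximating the ensemble-average prediction.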
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2023.
Mode of access: World Wide Web
ISBN: 9798379694326
Subjects--Topical Terms:
Applied mathematics.
Index Terms--Genre/Form:
Electronic books.
Optimisation & Generalisation in Networks of Neurons.
LDR  03618nmm a2200349K 4500
001  2360144
005  20230925052836.5
006  m o d
007  cr mn ---uuuuu
008  241011s2023 xx obm 000 0 eng d
020    $a 9798379694326
035    $a (MiAaPQ)AAI30548381
035    $a (MiAaPQ)Caltech_oaithesislibrarycaltechedu15041
035    $a AAI30548381
040    $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Bernstein, Jeremy. $3 924212
245 10 $a Optimisation & Generalisation in Networks of Neurons.
264  0 $c 2023
300    $a 1 online resource (99 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
500    $a Advisor: Yue, Yisong.
502    $a Thesis (Ph.D.)--California Institute of Technology, 2023.
504    $a Includes bibliographical references
520    $a The goal of this thesis is to develop the optimisation and generalisation theoretic foundations of learning in artificial neural networks. The thesis tackles two central questions. Given training data and a network architecture: Which weight setting will generalise best to unseen data, and why? What optimiser should be used to recover this weight setting? On optimisation, an essential feature of neural network training is that the network weights affect the loss function only indirectly through their appearance in the network architecture. This thesis proposes a three-step framework for deriving novel "architecture aware" optimisation algorithms. The first step, termed functional majorisation, is to majorise a series expansion of the loss function in terms of functional perturbations. The second step is to derive architectural perturbation bounds that relate the size of functional perturbations to the size of weight perturbations. The third step is to substitute these architectural perturbation bounds into the functional majorisation of the loss and to obtain an optimisation algorithm via minimisation. This constitutes an application of the majorise-minimise meta-algorithm to neural networks. On generalisation, a promising recent line of work has applied PAC-Bayes theory to derive non-vacuous generalisation guarantees for neural networks. Since these guarantees control the average risk of ensembles of networks, they do not address which individual network should generalise best. To close this gap, the thesis rekindles an old idea from the kernels literature: the Bayes point machine. A Bayes point machine is a single classifier that approximates the aggregate prediction of an ensemble of classifiers. Since aggregation reduces the variance of ensemble predictions, Bayes point machines tend to generalise better than other ensemble members. The thesis shows that the space of neural networks consistent with a training set concentrates on a Bayes point machine if both the network width and normalised margin are sent to infinity. This motivates the practice of returning a wide network of large normalised margin. Potential applications of these ideas include novel methods for uncertainty quantification, more efficient numerical representations for neural hardware, and optimisers that transfer hyperparameters across learning problems.
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2023
538    $a Mode of access: World Wide Web
650  4 $a Applied mathematics. $3 2122814
650  4 $a Deep learning. $3 3554982
650  4 $a Computer peripherals. $3 659962
650  4 $a Hilbert space. $3 558371
650  4 $a Neural networks. $3 677449
650  4 $a Computer engineering. $3 621879
650  4 $a Algorithms. $3 536374
655  7 $a Electronic books. $2 lcsh $3 542853
690    $a 0364
690    $a 0464
690    $a 0800
710 2  $a ProQuest Information and Learning Co. $3 783688
710 2  $a California Institute of Technology. $b Biology and Biological Engineering. $3 3700756
773 0  $t Dissertations Abstracts International $g 84-12B.
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30548381 $z click for full text (PQDT)
Items (1 record)
Inventory Number: W9482500
Location Name: Electronic resources (電子資源)
Item Class: 11.線上閱覽_V (online reading)
Material type: E-book (電子書)
Call number: EB
Usage Class: Normal use (一般使用)
Loan Status: On shelf
No. of reservations: 0