Information-Directed Sampling for Reinforcement Learning.
Record type: Bibliographic - electronic resource : Monograph/item
Title/Author: Information-Directed Sampling for Reinforcement Learning.
Author: Lu, Xiuyuan.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2020
Pagination: 84 p.
Notes: Source: Dissertations Abstracts International, Volume: 82-10, Section: B.
Contained By: Dissertations Abstracts International 82-10B.
Subject: Algorithms.
Electronic resource: https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28354109
ISBN: 9798597045030
Lu, Xiuyuan.
Information-Directed Sampling for Reinforcement Learning. - Ann Arbor : ProQuest Dissertations & Theses, 2020 - 84 p.
Source: Dissertations Abstracts International, Volume: 82-10, Section: B.
Thesis (Ph.D.)--Stanford University, 2020.
This item must not be sold to any third party vendors.
Reinforcement learning has enjoyed a resurgence in popularity over the past decade thanks to the ever-increasing availability of computing power. Many success stories of reinforcement learning seem to suggest a potential gateway to creating intelligent agents that are capable of performing tasks with human-level proficiency. However, many state-of-the-art reinforcement learning algorithms require a tremendous amount of simulated data, which is not practical when data is generated from actual interactions in the real world. Addressing data efficiency will be crucial for making reinforcement learning practical for real-world applications.

In this dissertation, we take an information-theoretic approach to reason about how an agent should acquire information in an environment to improve decision-making. We generalize the information-directed sampling (IDS) decision rule from online decision-making literature to reinforcement learning. This decision rule aims to acquire useful information about the environment while also taking into consideration the costs of information acquisition. We argue that IDS can demonstrate desirable information-seeking behaviors in a reinforcement learning problem where existing methods fail. We hypothesize that in practical environments that are typically rich in observations, IDS has the potential to significantly improve data efficiency relative to existing exploration schemes.

Furthermore, we analyze the expected regret of IDS for three stylized classes of environments: linear bandits, tabular Markov decision processes (MDPs), and factored MDPs. We derive regret bounds that are nearly competitive with state-of-the-art regret bounds, which demonstrate the promise of our information-theoretic design concept.

Lastly, the form of IDS studied in this dissertation should be viewed as an agent design concept rather than a concrete algorithm. Major work needs to be done to design practical algorithms that preserve the benefits of this conceptual decision rule while being computationally tractable. We highlight some key aspects for designing a practical IDS agent and propose several research directions for addressing each aspect.
ISBN: 9798597045030
Subjects--Topical Terms:
536374
Algorithms.
Subjects--Index Terms:
Reinforcement learning
LDR 03321nmm a2200349 4500
001 2284753
005 20211124102955.5
008 220723s2020 ||||||||||||||||| ||eng d
020 $a 9798597045030
035 $a (MiAaPQ)AAI28354109
035 $a (MiAaPQ)STANFORDmx606hx2868
035 $a AAI28354109
040 $a MiAaPQ $c MiAaPQ
100 1 $a Lu, Xiuyuan. $3 3563937
245 1 0 $a Information-Directed Sampling for Reinforcement Learning.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2020
300 $a 84 p.
500 $a Source: Dissertations Abstracts International, Volume: 82-10, Section: B.
500 $a Advisor: Van Roy, Benjamin; Brunskill, Emma; Johari, Ramesh.
502 $a Thesis (Ph.D.)--Stanford University, 2020.
506 $a This item must not be sold to any third party vendors.
520 $a Reinforcement learning has enjoyed a resurgence in popularity over the past decade thanks to the ever-increasing availability of computing power. Many success stories of reinforcement learning seem to suggest a potential gateway to creating intelligent agents that are capable of performing tasks with human-level proficiency. However, many state-of-the-art reinforcement learning algorithms require a tremendous amount of simulated data, which is not practical when data is generated from actual interactions in the real world. Addressing data efficiency will be crucial for making reinforcement learning practical for real-world applications. In this dissertation, we take an information-theoretic approach to reason about how an agent should acquire information in an environment to improve decision-making. We generalize the information-directed sampling (IDS) decision rule from online decision-making literature to reinforcement learning. This decision rule aims to acquire useful information about the environment while also taking into consideration the costs of information acquisition. We argue that IDS can demonstrate desirable information-seeking behaviors in a reinforcement learning problem where existing methods fail. We hypothesize that in practical environments that are typically rich in observations, IDS has the potential to significantly improve data efficiency relative to existing exploration schemes. Furthermore, we analyze the expected regret of IDS for three stylized classes of environments: linear bandits, tabular Markov decision processes (MDPs), and factored MDPs. We derive regret bounds that are nearly competitive with state-of-the-art regret bounds, which demonstrate the promise of our information-theoretic design concept. Lastly, the form of IDS studied in this dissertation should be viewed as an agent design concept rather than a concrete algorithm. Major work needs to be done to design practical algorithms that preserve the benefits of this conceptual decision rule while being computationally tractable. We highlight some key aspects for designing a practical IDS agent and propose several research directions for addressing each aspect.
590 $a School code: 0212.
650 4 $a Algorithms. $3 536374
650 4 $a Decision making. $3 517204
650 4 $a Markov analysis. $3 3562906
653 $a Reinforcement learning
653 $a Decision making
653 $a Information acquisition
690 $a 0984
690 $a 0454
710 2 $a Stanford University. $3 754827
773 0 $t Dissertations Abstracts International $g 82-10B.
790 $a 0212
791 $a Ph.D.
792 $a 2020
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28354109
Holdings
Barcode: W9436486
Location: Electronic resource
Circulation category: 11. Online viewing_V
Material type: E-book
Call number: EB
Usage type: Normal
Loan status: On shelf
Holds: 0