Scaling Human Feedback.
Candidate, Minae Kwon.
Record type: Bibliographic - Electronic resource : Monograph/item
Title/Author: Scaling Human Feedback.
Author: Candidate, Minae Kwon.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2023
Description: 120 p.
Notes: Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
Contained By: Dissertations Abstracts International, 85-11B.
Subject: Robots.
Electronic resource: https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31049679
ISBN: 9798382642543
Scaling Human Feedback. / Candidate, Minae Kwon. - Ann Arbor : ProQuest Dissertations & Theses, 2023. - 120 p.
Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
Thesis (Ph.D.)--Stanford University, 2023.
Human-generated data has been pivotal for significant advancements in artificial intelligence (AI). As AI models scale and are applied to a wider range of tasks, the demand for more and increasingly specialized human data will grow. However, current methods of acquiring human feedback, such as learning from demonstrations or preferences, and designing objective functions or prompts, are becoming unsustainable due to their high cost and the extensive effort or domain knowledge they require from users. We address this challenge by developing algorithms that reduce the cost and effort of providing human feedback. We leverage foundation models to aid users in offering feedback. Users initially define their objectives (through language or a small dataset), and foundation models expand them into more detailed feedback. A key contribution is an algorithm, based on a large language model, that allows users to cheaply define their objectives and train a reinforcement learning agent without needing to develop a complex reward function or provide extensive data. For situations where initial objectives are poorly defined or biased, we introduce an algorithm that efficiently queries humans for more information, reducing the number of queries needed. Finally, we propose an information-gathering algorithm that eliminates the requirement for human intervention altogether, streamlining the feedback process. By making it cheaper for users to give feedback, either during training or when queried for more information, we hope to make learning from human feedback more scalable.
ISBN: 9798382642543
Subjects--Topical Terms: Robots.
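The abstract above describes using a foundation model to expand a short, user-written objective into richer feedback for a reinforcement learning agent. The snippet below is a purely illustrative sketch of that general pattern, not the dissertation's actual algorithm; the prompt format and the stand-in ask_llm function are assumptions, and a real system would query an actual language model.

# Illustrative sketch only: an LLM-style scorer used as a stand-in reward
# signal for RL training. ask_llm is a placeholder; swap in a real model.

def ask_llm(prompt: str) -> str:
    # Placeholder "model": a real implementation would call an LLM API.
    # Here we simply answer yes if any content word of the objective
    # appears in the behavior description.
    lines = prompt.lower().splitlines()
    objective = lines[0].removeprefix("objective: ")
    behavior = lines[1].removeprefix("behavior: ")
    content_words = {w for w in objective.split() if len(w) > 3}
    return "yes" if content_words & set(behavior.split()) else "no"

def feedback_from_objective(objective: str, trajectory_summary: str) -> float:
    # Turn a plain-language objective into a scalar reward for one rollout.
    prompt = (
        f"Objective: {objective}\n"
        f"Behavior: {trajectory_summary}\n"
        "Did the behavior satisfy the objective? Answer yes or no."
    )
    return 1.0 if ask_llm(prompt).strip().startswith("yes") else 0.0

# Example: score one rollout summary against a user-written objective.
print(feedback_from_objective(
    "stack the red block on the blue block",
    "The robot picked up the red block and placed it on the blue block."))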
LDR     02554nmm a2200313 4500
001     2398374
005     20240812064624.5
006     m o d
007     cr#unu||||||||
008     251215s2023 ||||||||||||||||| ||eng d
020     $a 9798382642543
035     $a (MiAaPQ)AAI31049679
035     $a (MiAaPQ)STANFORDsy876pv8068
035     $a AAI31049679
040     $a MiAaPQ $c MiAaPQ
100 1   $a Candidate, Minae Kwon. $3 3768282
245 10  $a Scaling Human Feedback.
260 1   $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2023
300     $a 120 p.
500     $a Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
500     $a Advisor: Goodman, Noah; Yang, Diyi; Sadigh, Dorsa.
502     $a Thesis (Ph.D.)--Stanford University, 2023.
520     $a Human-generated data has been pivotal for significant advancements in artificial intelligence (AI). As AI models scale and are applied to a wider range of tasks, the demand for more and increasingly specialized human data will grow. However, current methods of acquiring human feedback, such as learning from demonstrations or preferences, and designing objective functions or prompts, are becoming unsustainable due to their high cost and the extensive effort or domain knowledge they require from users. We address this challenge by developing algorithms that reduce the cost and effort of providing human feedback. We leverage foundation models to aid users in offering feedback. Users initially define their objectives (through language or a small dataset), and foundation models expand them into more detailed feedback. A key contribution is an algorithm, based on a large language model, that allows users to cheaply define their objectives and train a reinforcement learning agent without needing to develop a complex reward function or provide extensive data. For situations where initial objectives are poorly defined or biased, we introduce an algorithm that efficiently queries humans for more information, reducing the number of queries needed. Finally, we propose an information-gathering algorithm that eliminates the requirement for human intervention altogether, streamlining the feedback process. By making it cheaper for users to give feedback, either during training or when queried for more information, we hope to make learning from human feedback more scalable.
590     $a School code: 0212.
650  4  $a Robots. $3 529507
650  4  $a Negotiations. $3 3564485
650  4  $a Games. $3 525308
650  4  $a Robotics. $3 519753
690     $a 0771
710 2   $a Stanford University. $3 754827
773 0   $t Dissertations Abstracts International $g 85-11B.
790     $a 0212
791     $a Ph.D.
792     $a 2023
793     $a English
856 40  $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31049679
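The MARC fields above are simply a mapping from numeric tags to subfield codes and values. As a minimal illustration (plain Python, no MARC library; the values are copied from the record above), the key bibliographic elements can be pulled out like this:

# Minimal sketch: represent the key MARC fields above as a dictionary
# and extract common bibliographic elements from their subfields.
record = {
    "020": [{"a": "9798382642543"}],
    "100": [{"a": "Candidate, Minae Kwon."}],
    "245": [{"a": "Scaling Human Feedback."}],
    "260": [{"a": "Ann Arbor :", "b": "ProQuest Dissertations & Theses,", "c": "2023"}],
    "502": [{"a": "Thesis (Ph.D.)--Stanford University, 2023."}],
    "650": [{"a": "Robots."}, {"a": "Negotiations."}, {"a": "Games."}, {"a": "Robotics."}],
    "856": [{"u": "https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31049679"}],
}

def first(tag: str, code: str) -> str:
    # Return the first occurrence of a subfield, or an empty string.
    fields = record.get(tag, [])
    return fields[0].get(code, "") if fields else ""

print("Title:    ", first("245", "a"))
print("Author:   ", first("100", "a"))
print("ISBN:     ", first("020", "a"))
print("Subjects: ", ", ".join(f["a"] for f in record.get("650", [])))
print("Full text:", first("856", "u"))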
Holdings:
Barcode: W9506694
Location: Electronic resources
Circulation category: 11. Online Reading_V
Material type: E-book
Call number: EB
Usage type: Normal
Loan status: On shelf
Hold status: 0