Scaling Human Feedback.
Candidate, Minae Kwon.
Record type: Bibliographic - Electronic resource : Monograph/item
Title/Author: Scaling Human Feedback.
Author: Candidate, Minae Kwon.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2023
Description: 120 p.
Notes: Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
Contained By: Dissertations Abstracts International, 85-11B.
Subject: Robots.
Electronic resource: https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31049679
ISBN: 9798382642543
Scaling Human Feedback. / Candidate, Minae Kwon. - Ann Arbor : ProQuest Dissertations & Theses, 2023. - 120 p.
Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
Thesis (Ph.D.)--Stanford University, 2023.
Human-generated data has been pivotal for significant advancements in artificial intelligence (AI). As AI models scale and are applied to a wider range of tasks, the demand for more and increasingly specialized human data will grow. However, current methods of acquiring human feedback, such as learning from demonstrations or preferences, and designing objective functions or prompts, are becoming unsustainable due to their high cost and the extensive effort or domain knowledge they require from users. We address this challenge by developing algorithms that reduce the cost and effort of providing human feedback. We leverage foundation models to aid users in offering feedback. Users initially define their objectives (through language or a small dataset), and foundation models expand them into more detailed feedback. A key contribution is an algorithm, based on a large language model, that allows users to cheaply define their objectives and train a reinforcement learning agent without needing to develop a complex reward function or provide extensive data. For situations where initial objectives are poorly defined or biased, we introduce an algorithm that efficiently queries humans for more information, reducing the number of queries needed. Finally, we propose an information-gathering algorithm that eliminates the requirement for human intervention altogether, streamlining the feedback process. By making it cheaper for users to give feedback, either during training or when queried for more information, we hope to make learning from human feedback more scalable.
ISBN: 9798382642543
Subjects--Topical Terms: Robots.
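The abstract above describes using a foundation model to expand a short, user-written objective into richer feedback for a reinforcement learning agent. The snippet below is a purely illustrative sketch of that general pattern, not the dissertation's actual algorithm; the prompt format and the stand-in ask_llm function are assumptions, and a real system would query an actual language model.

# Illustrative sketch only: an LLM-style scorer used as a stand-in reward
# signal for RL training. ask_llm is a placeholder; swap in a real model.

def ask_llm(prompt: str) -> str:
    # Placeholder "model": a real implementation would call an LLM API.
    # Here we simply answer yes if any content word of the objective
    # appears in the behavior description.
    lines = prompt.lower().splitlines()
    objective = lines[0].removeprefix("objective: ")
    behavior = lines[1].removeprefix("behavior: ")
    content_words = {w for w in objective.split() if len(w) > 3}
    return "yes" if content_words & set(behavior.split()) else "no"

def feedback_from_objective(objective: str, trajectory_summary: str) -> float:
    # Turn a plain-language objective into a scalar reward for one rollout.
    prompt = (
        f"Objective: {objective}\n"
        f"Behavior: {trajectory_summary}\n"
        "Did the behavior satisfy the objective? Answer yes or no."
    )
    return 1.0 if ask_llm(prompt).strip().startswith("yes") else 0.0

# Example: score one rollout summary against a user-written objective.
print(feedback_from_objective(
    "stack the red block on the blue block",
    "The robot picked up the red block and placed it on the blue block."))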
LDR     02554nmm a2200313 4500
001     2398374
005     20240812064624.5
006     m o d
007     cr#unu||||||||
008     251215s2023 ||||||||||||||||| ||eng d
020     $a 9798382642543
035     $a (MiAaPQ)AAI31049679
035     $a (MiAaPQ)STANFORDsy876pv8068
035     $a AAI31049679
040     $a MiAaPQ $c MiAaPQ
100 1   $a Candidate, Minae Kwon. $3 3768282
245 10  $a Scaling Human Feedback.
260 1   $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2023
300     $a 120 p.
500     $a Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
500     $a Advisor: Goodman, Noah; Yang, Diyi; Sadigh, Dorsa.
502     $a Thesis (Ph.D.)--Stanford University, 2023.
520     $a Human-generated data has been pivotal for significant advancements in artificial intelligence (AI). As AI models scale and are applied to a wider range of tasks, the demand for more and increasingly specialized human data will grow. However, current methods of acquiring human feedback, such as learning from demonstrations or preferences, and designing objective functions or prompts, are becoming unsustainable due to their high cost and the extensive effort or domain knowledge they require from users. We address this challenge by developing algorithms that reduce the cost and effort of providing human feedback. We leverage foundation models to aid users in offering feedback. Users initially define their objectives (through language or a small dataset), and foundation models expand them into more detailed feedback. A key contribution is an algorithm, based on a large language model, that allows users to cheaply define their objectives and train a reinforcement learning agent without needing to develop a complex reward function or provide extensive data. For situations where initial objectives are poorly defined or biased, we introduce an algorithm that efficiently queries humans for more information, reducing the number of queries needed. Finally, we propose an information-gathering algorithm that eliminates the requirement for human intervention altogether, streamlining the feedback process. By making it cheaper for users to give feedback, either during training or when queried for more information, we hope to make learning from human feedback more scalable.
590     $a School code: 0212.
650  4  $a Robots. $3 529507
650  4  $a Negotiations. $3 3564485
650  4  $a Games. $3 525308
650  4  $a Robotics. $3 519753
690     $a 0771
710 2   $a Stanford University. $3 754827
773 0   $t Dissertations Abstracts International $g 85-11B.
790     $a 0212
791     $a Ph.D.
792     $a 2023
793     $a English
856 40  $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31049679
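The MARC fields above are simply a mapping from numeric tags to subfield codes and values. As a minimal illustration (plain Python, no MARC library; the values are copied from the record above), the key bibliographic elements can be pulled out like this:

# Minimal sketch: represent the key MARC fields above as a dictionary
# and extract common bibliographic elements from their subfields.
record = {
    "020": [{"a": "9798382642543"}],
    "100": [{"a": "Candidate, Minae Kwon."}],
    "245": [{"a": "Scaling Human Feedback."}],
    "260": [{"a": "Ann Arbor :", "b": "ProQuest Dissertations & Theses,", "c": "2023"}],
    "502": [{"a": "Thesis (Ph.D.)--Stanford University, 2023."}],
    "650": [{"a": "Robots."}, {"a": "Negotiations."}, {"a": "Games."}, {"a": "Robotics."}],
    "856": [{"u": "https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31049679"}],
}

def first(tag: str, code: str) -> str:
    # Return the first occurrence of a subfield, or an empty string.
    fields = record.get(tag, [])
    return fields[0].get(code, "") if fields else ""

print("Title:    ", first("245", "a"))
print("Author:   ", first("100", "a"))
print("ISBN:     ", first("020", "a"))
print("Subjects: ", ", ".join(f["a"] for f in record.get("650", [])))
print("Full text:", first("856", "u"))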
Holdings:
Barcode: W9506694
Location: Electronic resources
Circulation category: 11. Online Reading_V
Material type: E-book
Call number: EB
Usage type: Normal
Loan status: On shelf
Hold status: 0