End-to-End Modeling for Abstractive Speech Summarization.
Record type: Bibliography - Electronic resource : Monograph/item
Title/Author: End-to-End Modeling for Abstractive Speech Summarization.
Author: Sharma, Roshan.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2024
Pagination: 130 p.
Note: Source: Dissertations Abstracts International, Volume: 85-09, Section: A.
Contained by: Dissertations Abstracts International, 85-09A.
Subject: Computer science.
Electronic resource: https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30993760
ISBN: 9798381953312
Thesis (Ph.D.)--Carnegie Mellon University, 2024.

In our increasingly interconnected world, where speech remains the most intuitive and natural form of communication, spoken language processing systems face a crucial challenge: they must do more than just categorize speech; they need to truly understand it to generate meaningful responses. One key aspect of this understanding is speech summarization, where a system condenses the important information from spoken input into a concise summary. This thesis delves into the challenge of generating abstractive textual summaries directly from speech.

The classical approach involves cascade systems that realize speech summarization by first transcribing speech and then summarizing the resulting transcript. However, this comes with many challenges, including computational efficiency, domain mismatch, and error propagation. In this thesis, we propose an alternative: an end-to-end framework that directly optimizes a single sequence model for speech summarization. To implement such end-to-end models with constrained computing resources, we address challenges such as abstract learning, learning global acoustic context, dealing with the paucity of data, and improving the quality of summaries using multiple references. We also shed light on observations from human annotation for speech summarization.

We present multi-stage training that uses speech transcription as a pre-training task to address abstract learning and improve the performance of end-to-end models. We describe multiple solutions to the problem of global acoustic context: restricted self-attention, replacing self-attention with the Fourier transform, and two block-wise adaptation solutions, BASS and R-BASS, that reframe speech summarization through the lens of block-wise processing. To address the challenge of data paucity, we introduce two new datasets, SLUE-TED and Interview, for abstractive speech summarization. An exploration of human annotation provides insights into best practices and the nature of the differences between speech-based and transcript-based summaries. Finally, we propose a novel method called AugSumm that improves the diversity and fluency of speech summaries by leveraging auxiliary references from generative text models.
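The abstract mentions two attention-free or attention-restricted alternatives for handling long acoustic sequences. The sketch below is not taken from the dissertation; it is a minimal NumPy illustration of the two generic ideas named above — restricted (windowed) self-attention and FNet-style Fourier-transform token mixing — with all function names and the window size chosen here for illustration only.

```python
import numpy as np

def restricted_attention(x, window=2):
    """Self-attention where each position may only attend to neighbors
    within +/- `window` positions (restricted self-attention).
    x: (seq_len, d) array of frame features."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)                     # raw attention scores
    idx = np.arange(T)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    scores = np.where(mask, scores, -np.inf)          # block out-of-window pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the window
    return weights @ x

def fourier_mixing(x):
    """Attention-free token mixing: replace self-attention with a 2-D
    Fourier transform over the sequence and feature axes, keeping the
    real part (the FNet-style construction)."""
    return np.fft.fft2(x).real

# Both layers map a (seq_len, d) sequence to a sequence of the same shape.
x = np.random.randn(6, 4)
assert restricted_attention(x, window=1).shape == x.shape
assert fourier_mixing(x).shape == x.shape
```

Restricted attention caps the cost per position at O(window) rather than O(seq_len); Fourier mixing removes the learned attention weights entirely, which is why both are attractive for long speech inputs.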
ISBN: 9798381953312
Subjects--Topical Terms: Computer science.
Subjects--Index Terms: Abstractive speech summarization
MARC record:

LDR    03419nmm a2200397 4500
001    2398355
005    20240812064619.5
006    m o d
007    cr#unu||||||||
008    251215s2024 ||||||||||||||||| ||eng d
020    $a 9798381953312
035    $a (MiAaPQ)AAI30993760
035    $a AAI30993760
040    $a MiAaPQ $c MiAaPQ
100 1  $a Sharma, Roshan. $3 3432720
245 10 $a End-to-End Modeling for Abstractive Speech Summarization.
260 1  $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2024
300    $a 130 p.
500    $a Source: Dissertations Abstracts International, Volume: 85-09, Section: A.
500    $a Advisor: Raj, Bhiksha.
502    $a Thesis (Ph.D.)--Carnegie Mellon University, 2024.
520    $a In our increasingly interconnected world, where speech remains the most intuitive and natural form of communication, spoken language processing systems face a crucial challenge: they must do more than just categorize speech; they need to truly understand it to generate meaningful responses. One key aspect of this understanding is speech summarization, where a system condenses the important information from spoken input into a concise summary. This thesis delves into the challenge of generating abstractive textual summaries directly from speech. The classical approach involves cascade systems that realize speech summarization by first transcribing speech and then summarizing the resulting transcript. However, this comes with many challenges, including computational efficiency, domain mismatch, and error propagation. In this thesis, we propose an alternative: an end-to-end framework that directly optimizes a single sequence model for speech summarization. To implement such end-to-end models with constrained computing resources, we address challenges such as abstract learning, learning global acoustic context, dealing with the paucity of data, and improving the quality of summaries using multiple references. We also shed light on observations from human annotation for speech summarization. We present multi-stage training that uses speech transcription as a pre-training task to address abstract learning and improve the performance of end-to-end models. We describe multiple solutions to the problem of global acoustic context: restricted self-attention, replacing self-attention with the Fourier transform, and two block-wise adaptation solutions, BASS and R-BASS, that reframe speech summarization through the lens of block-wise processing. To address the challenge of data paucity, we introduce two new datasets, SLUE-TED and Interview, for abstractive speech summarization. An exploration of human annotation provides insights into best practices and the nature of the differences between speech-based and transcript-based summaries. Finally, we propose a novel method called AugSumm that improves the diversity and fluency of speech summaries by leveraging auxiliary references from generative text models.
590    $a School code: 0041.
650  4 $a Computer science. $3 523869
650  4 $a Linguistics. $3 524476
650  4 $a Acoustics. $3 879105
650  4 $a Communication. $3 524709
653    $a Abstractive speech summarization
653    $a Speech recognition
653    $a Human summarization
653    $a Human annotations
653    $a Spoken language processing
690    $a 0984
690    $a 0459
690    $a 0290
690    $a 0986
710 2  $a Carnegie Mellon University. $b Electrical and Computer Engineering. $3 2094139
773 0  $t Dissertations Abstracts International $g 85-09A.
790    $a 0041
791    $a Ph.D.
792    $a 2024
793    $a English
856 40 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30993760
Holdings (1 item):
Barcode: W9506675
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online reading)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0