End-to-End Modeling for Abstractive Speech Summarization.
Record type: Bibliography - Electronic resource : Monograph/item
Title/Author: End-to-End Modeling for Abstractive Speech Summarization.
Author: Sharma, Roshan.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2024
Pagination: 130 p.
Note: Source: Dissertations Abstracts International, Volume: 85-09, Section: A.
Contained by: Dissertations Abstracts International, 85-09A.
Subject: Computer science.
Electronic resource: https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30993760
ISBN: 9798381953312
Thesis (Ph.D.)--Carnegie Mellon University, 2024.

In our increasingly interconnected world, where speech remains the most intuitive and natural form of communication, spoken language processing systems face a crucial challenge: they must do more than just categorize speech; they need to truly understand it to generate meaningful responses. One key aspect of this understanding is speech summarization, where a system condenses the important information from spoken input into a concise summary. This thesis delves into the challenge of generating abstractive textual summaries directly from speech.

The classical approach involves cascade systems that realize speech summarization by first transcribing speech and then summarizing the resulting transcript. However, this comes with many challenges, including computational efficiency, domain mismatch, and error propagation. In this thesis, we propose an alternative: an end-to-end framework that directly optimizes a single sequence model for speech summarization. To implement such end-to-end models with constrained computing resources, we address challenges such as abstract learning, learning global acoustic context, dealing with the paucity of data, and improving the quality of summaries using multiple references. We also shed light on observations from human annotation for speech summarization.

We present multi-stage training that uses speech transcription as a pre-training task to address abstract learning and improve the performance of end-to-end models. We describe multiple solutions to the problem of global acoustic context: restricted self-attention, replacing self-attention with the Fourier transform, and two block-wise adaptation solutions, BASS and R-BASS, that reframe speech summarization through the lens of block-wise processing. To address the challenge of data paucity, we introduce two new datasets, SLUE-TED and Interview, for abstractive speech summarization. An exploration of human annotation provides insights into best practices and the nature of the differences between speech-based and transcript-based summaries. Finally, we propose a novel method called AugSumm that improves the diversity and fluency of speech summaries by leveraging auxiliary references from generative text models.
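The abstract mentions two attention-free or attention-restricted alternatives for handling long acoustic sequences. The sketch below is not taken from the dissertation; it is a minimal NumPy illustration of the two generic ideas named above — restricted (windowed) self-attention and FNet-style Fourier-transform token mixing — with all function names and the window size chosen here for illustration only.

```python
import numpy as np

def restricted_attention(x, window=2):
    """Self-attention where each position may only attend to neighbors
    within +/- `window` positions (restricted self-attention).
    x: (seq_len, d) array of frame features."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)                     # raw attention scores
    idx = np.arange(T)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    scores = np.where(mask, scores, -np.inf)          # block out-of-window pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the window
    return weights @ x

def fourier_mixing(x):
    """Attention-free token mixing: replace self-attention with a 2-D
    Fourier transform over the sequence and feature axes, keeping the
    real part (the FNet-style construction)."""
    return np.fft.fft2(x).real

# Both layers map a (seq_len, d) sequence to a sequence of the same shape.
x = np.random.randn(6, 4)
assert restricted_attention(x, window=1).shape == x.shape
assert fourier_mixing(x).shape == x.shape
```

Restricted attention caps the cost per position at O(window) rather than O(seq_len); Fourier mixing removes the learned attention weights entirely, which is why both are attractive for long speech inputs.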
ISBN: 9798381953312
Subjects--Topical Terms: Computer science.
Subjects--Index Terms: Abstractive speech summarization
MARC record:

LDR    03419nmm a2200397 4500
001    2398355
005    20240812064619.5
006    m o d
007    cr#unu||||||||
008    251215s2024 ||||||||||||||||| ||eng d
020    $a 9798381953312
035    $a (MiAaPQ)AAI30993760
035    $a AAI30993760
040    $a MiAaPQ $c MiAaPQ
100 1  $a Sharma, Roshan. $3 3432720
245 10 $a End-to-End Modeling for Abstractive Speech Summarization.
260 1  $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2024
300    $a 130 p.
500    $a Source: Dissertations Abstracts International, Volume: 85-09, Section: A.
500    $a Advisor: Raj, Bhiksha.
502    $a Thesis (Ph.D.)--Carnegie Mellon University, 2024.
520    $a In our increasingly interconnected world, where speech remains the most intuitive and natural form of communication, spoken language processing systems face a crucial challenge: they must do more than just categorize speech; they need to truly understand it to generate meaningful responses. One key aspect of this understanding is speech summarization, where a system condenses the important information from spoken input into a concise summary. This thesis delves into the challenge of generating abstractive textual summaries directly from speech. The classical approach involves cascade systems that realize speech summarization by first transcribing speech and then summarizing the resulting transcript. However, this comes with many challenges, including computational efficiency, domain mismatch, and error propagation. In this thesis, we propose an alternative: an end-to-end framework that directly optimizes a single sequence model for speech summarization. To implement such end-to-end models with constrained computing resources, we address challenges such as abstract learning, learning global acoustic context, dealing with the paucity of data, and improving the quality of summaries using multiple references. We also shed light on observations from human annotation for speech summarization. We present multi-stage training that uses speech transcription as a pre-training task to address abstract learning and improve the performance of end-to-end models. We describe multiple solutions to the problem of global acoustic context: restricted self-attention, replacing self-attention with the Fourier transform, and two block-wise adaptation solutions, BASS and R-BASS, that reframe speech summarization through the lens of block-wise processing. To address the challenge of data paucity, we introduce two new datasets, SLUE-TED and Interview, for abstractive speech summarization. An exploration of human annotation provides insights into best practices and the nature of the differences between speech-based and transcript-based summaries. Finally, we propose a novel method called AugSumm that improves the diversity and fluency of speech summaries by leveraging auxiliary references from generative text models.
590    $a School code: 0041.
650  4 $a Computer science. $3 523869
650  4 $a Linguistics. $3 524476
650  4 $a Acoustics. $3 879105
650  4 $a Communication. $3 524709
653    $a Abstractive speech summarization
653    $a Speech recognition
653    $a Human summarization
653    $a Human annotations
653    $a Spoken language processing
690    $a 0984
690    $a 0459
690    $a 0290
690    $a 0986
710 2  $a Carnegie Mellon University. $b Electrical and Computer Engineering. $3 2094139
773 0  $t Dissertations Abstracts International $g 85-09A.
790    $a 0041
791    $a Ph.D.
792    $a 2024
793    $a English
856 40 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30993760
Holdings (1 item):
Barcode: W9506675
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online reading)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0