End-to-End Modeling for Abstractive Speech Summarization.
Record Type:
Electronic resources : Monograph/item
Title/Author:
End-to-End Modeling for Abstractive Speech Summarization. / Sharma, Roshan.
Author:
Sharma, Roshan.
Published:
Ann Arbor : ProQuest Dissertations & Theses, 2024.
Description:
130 p.
Notes:
Source: Dissertations Abstracts International, Volume: 85-09, Section: A.
Contained By:
Dissertations Abstracts International, 85-09A.
Subject:
Computer science.
Online resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30993760
ISBN:
9798381953312
Thesis (Ph.D.)--Carnegie Mellon University, 2024.
In our increasingly interconnected world, where speech remains the most intuitive and natural form of communication, spoken language processing systems face a crucial challenge: they must do more than just categorize speech; they need to truly understand it to generate meaningful responses. One key aspect of this understanding is speech summarization, where a system condenses the important information from spoken input into a concise summary. This thesis delves into the challenge of generating abstractive textual summaries directly from speech.

The classical approach involves cascade systems that realize speech summarization by first transcribing speech and then summarizing the resulting transcript. However, this comes with many challenges, including computational inefficiency, domain mismatches, and error propagation. In this thesis, we propose an alternative: an end-to-end framework that directly optimizes a single sequence model for speech summarization. To implement such end-to-end models with constrained computing resources, we address challenges such as abstract learning, learning global acoustic context, dealing with the paucity of data, and improving the quality of summaries using multiple references. We also shed light on observations from human annotation for speech summarization.

We present multi-stage training using speech transcription as a pre-training task to address abstract learning and facilitate improved performance of end-to-end models. We describe multiple solutions to the problem of global acoustic context: restricted self-attention, replacing self-attention with the Fourier transform, and two block-wise adaptation solutions, BASS and R-BASS, that reframe speech summarization through the lens of block-wise processing. To address the challenge of data paucity, we introduce two new datasets, SLUE-TED and Interview, for abstractive speech summarization. An exploration of human annotation provides insights into best practices and the nature of differences between speech-based and transcript-based summaries. Finally, we propose a novel method called AugSumm to improve the diversity and fluency of speech summaries by leveraging auxiliary references from generative text models.
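The "restricted self-attention" named in the abstract limits how far each speech frame can attend, so very long acoustic inputs stay tractable. A minimal sketch of that idea, using a banded (local-window) mask in a toy single-head attention — this is illustrative only and makes no claim about the thesis's actual architecture; the window size and single-head form are assumptions:

```python
import numpy as np

def restricted_self_attention(x, window):
    """Toy single-head self-attention in which each position may only
    attend to neighbours within +/- `window` steps (a banded mask)."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)                          # (T, T) similarity scores
    idx = np.arange(T)
    band = np.abs(idx[:, None] - idx[None, :]) <= window   # True inside the window
    scores = np.where(band, scores, -np.inf)               # forbid attention outside it
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # row-wise softmax
    return weights @ x                                     # locally mixed features

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))    # 6 "frames" of 4-dim features
out = restricted_self_attention(x, window=1)
print(out.shape)  # (6, 4)
```

With `window=1` each frame mixes only with its immediate neighbours, which is the essential trade the abstract describes: cheaper, local context instead of full global attention over the whole recording.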
Subjects--Topical Terms:
Computer science.
Subjects--Index Terms:
Abstractive speech summarization
LDR 03419nmm a2200397 4500
001 2398355
005 20240812064619.5
006 m o d
007 cr#unu||||||||
008 251215s2024 ||||||||||||||||| ||eng d
020 $a 9798381953312
035 $a (MiAaPQ)AAI30993760
035 $a AAI30993760
040 $a MiAaPQ $c MiAaPQ
100 1 $a Sharma, Roshan. $3 3432720
245 10 $a End-to-End Modeling for Abstractive Speech Summarization.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2024
300 $a 130 p.
500 $a Source: Dissertations Abstracts International, Volume: 85-09, Section: A.
500 $a Advisor: Raj, Bhiksha.
502 $a Thesis (Ph.D.)--Carnegie Mellon University, 2024.
520 $a In our increasingly interconnected world, where speech remains the most intuitive and natural form of communication, spoken language processing systems face a crucial challenge: they must do more than just categorize speech; they need to truly understand it to generate meaningful responses. One key aspect of this understanding is speech summarization, where a system condenses the important information from spoken input into a concise summary. This thesis delves into the challenge of generating abstractive textual summaries directly from speech. The classical approach involves cascade systems that realize speech summarization by first transcribing speech and then summarizing the resulting transcript. However, this comes with many challenges, including computational inefficiency, domain mismatches, and error propagation. In this thesis, we propose an alternative: an end-to-end framework that directly optimizes a single sequence model for speech summarization. To implement such end-to-end models with constrained computing resources, we address challenges such as abstract learning, learning global acoustic context, dealing with the paucity of data, and improving the quality of summaries using multiple references. We also shed light on observations from human annotation for speech summarization. We present multi-stage training using speech transcription as a pre-training task to address abstract learning and facilitate improved performance of end-to-end models. We describe multiple solutions to the problem of global acoustic context: restricted self-attention, replacing self-attention with the Fourier transform, and two block-wise adaptation solutions, BASS and R-BASS, that reframe speech summarization through the lens of block-wise processing. To address the challenge of data paucity, we introduce two new datasets, SLUE-TED and Interview, for abstractive speech summarization. An exploration of human annotation provides insights into best practices and the nature of differences between speech-based and transcript-based summaries. Finally, we propose a novel method called AugSumm to improve the diversity and fluency of speech summaries by leveraging auxiliary references from generative text models.
590 $a School code: 0041.
650 4 $a Computer science. $3 523869
650 4 $a Linguistics. $3 524476
650 4 $a Acoustics. $3 879105
650 4 $a Communication. $3 524709
653 $a Abstractive speech summarization
653 $a Speech recognition
653 $a Human summarization
653 $a Human annotations
653 $a Spoken language processing
690 $a 0984
690 $a 0459
690 $a 0290
690 $a 0986
710 2 $a Carnegie Mellon University. $b Electrical and Computer Engineering. $3 2094139
773 0 $t Dissertations Abstracts International $g 85-09A.
790 $a 0041
791 $a Ph.D.
792 $a 2024
793 $a English
856 40 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30993760
Items (1 record):
Inventory Number: W9506675
Location Name: Electronic resources (電子資源)
Item Class: 11. Online reading (11.線上閱覽_V)
Material type: E-book (電子書)
Call number: EB
Usage Class: Normal (一般使用)
Loan Status: On shelf
No. of reservations: 0