Towards Inclusive Low-Resource Speech Technologies: A Case Study of Educational Systems for African American English-Speaking Children.
Record type: Bibliographic - Electronic resource : Monograph/item
Title proper: Towards Inclusive Low-Resource Speech Technologies: A Case Study of Educational Systems for African American English-Speaking Children.
Author: Johnson, Alexander.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2024
Extent: 119 p.
Notes: Source: Dissertations Abstracts International, Volume: 85-09, Section: A.
Contained by: Dissertations Abstracts International 85-09A.
Subject: Electrical engineering.
Electronic resource: https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31139575
ISBN: 9798381953404
Thesis (Ph.D.)--University of California, Los Angeles, 2024.
The potential of speech technology to improve educational outcomes has been a topic of great interest in recent years. For example, automatic speech recognition (ASR) systems could be employed to provide kindergarten-aged children with real-time feedback on their literacy and pronunciation as they practice reading aloud. Within these systems, speaker identification (SID) technology could additionally be used to identify the user's speaker characteristics in order to ensure that they receive age-, language-, and dialect-appropriate feedback. While these technologies are more established for well-represented groups in STEM (i.e., able-bodied, adult, first-language speakers of mainstream dialects), they give much worse performance for underrepresented groups (young children, speakers of non-mainstream dialects, people with speech-related disabilities, etc.). This work focuses on improving speech technology performance for children's speech and African American English (AAE) dialect speech with the goal of creating more equitable outcomes in early education. The contributions of this work span three primary areas: 1) dialect identification and density scoring, 2) data augmentation for speech recognition, and 3) natural language processing for fair and inclusive automatic speech assessment.
First, we create a robust system for dialect identification of African American English for both children's and adults' speech. This system aims to take an input utterance from a speaker of either African American English or Mainstream American English and determine to which of the two dialects the utterance belongs. The system fuses features from paralinguistics, self-supervised learning representations, automatic speech recognition system outputs, prosodic contours, and other descriptors of the speech signal in order to learn a mapping from the input acoustic information to a dialect classification decision. We further explore this architecture in automatic dialect density estimation, a task we create and develop. In dialect density scoring, we train a system to automatically predict a speaker's frequency of usage of dialect-specific patterns. This information can then be passed to a speech recognition system for more dialect-informed processing.
Second, we develop a data augmentation algorithm to improve zero-shot and few-shot speech recognition of low-resource dialects. The algorithm, named LPCAugment, deconstructs an input speech signal into a source and filter representation using linear predictive coding (LPC) analysis. The poles of the filter representation can then be perturbed independently of the source representation in order to model formant shifts that may be seen across accents and dialects. We use this perturbation method to artificially generate speech samples with shifted formant locations to serve as additional training data for a speech recognition system. This speech recognition system is then evaluated on children's speech for child speakers of a Southern California dialect and child speakers of an Atlanta, Georgia, area dialect.
Third, we explore automatic analysis and scoring of speech recognition transcripts for educational assessments. Given information about a student's spoken dialect and automatically generated transcripts of their oral response to an assessment prompt, we train a system to automatically grade the quality of the response with respect to a pre-determined criterion. This system uses language modeling and spoken information retrieval to identify key features in the spoken response and holistically decide if the response aligns with the grading criteria. Combined, the steps in this work form a framework for inclusive spoken language understanding technology that can be used to provide students with dialect-appropriate language training or language assessment.
Subjects--Topical Terms:
649834
Electrical engineering.
Subjects--Index Terms:
Speech technology
LDR 05165nmm a2200409 4500
001 2398387
005 20240812064627.5
006 m o d
007 cr#unu||||||||
008 251215s2024 ||||||||||||||||| ||eng d
020 $a 9798381953404
035 $a (MiAaPQ)AAI31139575
035 $a AAI31139575
040 $a MiAaPQ $c MiAaPQ
100 1 $a Johnson, Alexander. $3 3768298
245 1 0 $a Towards Inclusive Low-Resource Speech Technologies: A Case Study of Educational Systems for African American English-Speaking Children.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2024
300 $a 119 p.
500 $a Source: Dissertations Abstracts International, Volume: 85-09, Section: A.
500 $a Advisor: Alwan, Abeer A.
502 $a Thesis (Ph.D.)--University of California, Los Angeles, 2024.
590 $a School code: 0031.
650 4 $a Electrical engineering. $3 649834
650 4 $a Educational technology. $3 517670
650 4 $a Linguistics. $3 524476
650 4 $a African American studies. $3 2122686
650 4 $a Communication. $3 524709
653 $a Speech technology
653 $a Automatic speech recognition
653 $a Speaker identification
653 $a Linear predictive coding
653 $a African American English
690 $a 0544
690 $a 0459
690 $a 0290
690 $a 0710
690 $a 0296
710 2 $a University of California, Los Angeles. $b Electrical and Computer Engineering 0333. $3 3542511
773 0 $t Dissertations Abstracts International $g 85-09A.
790 $a 0031
791 $a Ph.D.
792 $a 2024
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31139575
Holdings (1 item):
Barcode: W9506707
Location: Electronic resource
Circulation category: 11.線上閱覽_V (online viewing)
Material type: E-book
Call number: EB
Usage type: Normal
Loan status: On shelf
Holds: 0