Morphological Generation with Deep Learning Approaches.
Record type:
Bibliographic - Electronic resource : Monograph/item
Title/Author:
Morphological Generation with Deep Learning Approaches.
Author:
Liu, Ling.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2021
Pagination:
161 p.
Notes:
Source: Dissertations Abstracts International, Volume: 83-07, Section: B.
Contained By:
Dissertations Abstracts International, 83-07B.
Subject:
Linguistics.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28777127
ISBN:
9798762107280
Morphological Generation with Deep Learning Approaches.
Liu, Ling.
Morphological Generation with Deep Learning Approaches.
- Ann Arbor : ProQuest Dissertations & Theses, 2021 - 161 p.
Source: Dissertations Abstracts International, Volume: 83-07, Section: B.
Thesis (Ph.D.)--University of Colorado at Boulder, 2021.
This item must not be sold to any third party vendors.
Most human languages use inflection to express grammatical meanings such as case, person, number, gender, tense, and aspect. How to generate the correct inflected forms is an essential part of linguistic research, and morphological generation is also a fundamental task in natural language processing.

The outstanding performance of deep learning approaches on morphological generation relies on large amounts of annotated data. This dependence makes it challenging to apply deep learning approaches to morphological generation effectively in low-resource scenarios. Therefore, this thesis aims to explore different ways to improve the performance of deep learning approaches for morphological generation when faced with data scarcity.

We compare different data augmentation methods and propose to use substrings rather than individual characters for data hallucination. Experimental results show that data hallucination with substrings is more effective than hallucination from individual characters. We also apply backtranslation to morphological generation, but find that adding backtranslated data does not significantly improve performance, even though backtranslation has become a common-practice data augmentation approach for low-resource machine translation.

We further propose a method inspired by the linguistic notion of "principal parts" in morphology to diversify the source form, and find that it is helpful. Motivated by the analogy mechanism humans usually resort to when processing morphology, we propose to include both source words and target inflected forms of different words in the input, in order to provide interparadigmatic as well as intraparadigmatic resources for the models to learn from. Previous approaches have only source words as input, i.e., intraparadigmatic resources. Our experiments show that models trained with both interparadigmatic and intraparadigmatic resources perform better in low-resource situations. We find that the overlap in lemmata between the training and evaluation sets in the current common-practice setup for morphological generation has obscured the difficulty of the task. Model performance is much worse than reported in the literature when evaluated on unseen lemmata, meaning that the model does not generalize well. Based on this finding, we suggest that future work should include lemma counts, rather than only inflected word form counts, as a metric for defining data amount, and should evaluate model performance on seen and unseen lemmata separately.

Another way to address data scarcity is to reduce the cost of data annotation. We propose a method that uses deep learning approaches to facilitate morphological data annotation by detecting annotation errors. Experiments show that the proposed method is robust to noise in the training data and can detect annotation errors very effectively.

Lastly, we conduct experiments on morphological generation with deep learning approaches in context. This task turns out to be much more challenging than morphological generation out of context, and there remains considerable room for improvement for morphological generation in context with deep learning approaches.
ISBN: 9798762107280
Subjects--Topical Terms:
Linguistics.
Subjects--Index Terms:
Deep learning
Morphological Generation with Deep Learning Approaches.
LDR
:04419nmm a2200385 4500
001
2350666
005
20221020130412.5
008
241004s2021 ||||||||||||||||| ||eng d
020
$a
9798762107280
035
$a
(MiAaPQ)AAI28777127
035
$a
AAI28777127
040
$a
MiAaPQ
$c
MiAaPQ
100
1
$a
Liu, Ling.
$3
1036240
245
1 0
$a
Morphological Generation with Deep Learning Approaches.
260
1
$a
Ann Arbor :
$b
ProQuest Dissertations & Theses,
$c
2021
300
$a
161 p.
500
$a
Source: Dissertations Abstracts International, Volume: 83-07, Section: B.
500
$a
Advisor: Hulden, Mans.
502
$a
Thesis (Ph.D.)--University of Colorado at Boulder, 2021.
506
$a
This item must not be sold to any third party vendors.
520
$a
Most human languages use inflection to express grammatical meanings such as case, person, number, gender, tense, and aspect. How to generate the correct inflected forms is an essential part of linguistic research, and morphological generation is also a fundamental task in natural language processing. The outstanding performance of deep learning approaches on morphological generation relies on large amounts of annotated data. This dependence makes it challenging to apply deep learning approaches to morphological generation effectively in low-resource scenarios. Therefore, this thesis aims to explore different ways to improve the performance of deep learning approaches for morphological generation when faced with data scarcity. We compare different data augmentation methods and propose to use substrings rather than individual characters for data hallucination. Experimental results show that data hallucination with substrings is more effective than hallucination from individual characters. We also apply backtranslation to morphological generation, but find that adding backtranslated data does not significantly improve performance, even though backtranslation has become a common-practice data augmentation approach for low-resource machine translation. We further propose a method inspired by the linguistic notion of "principal parts" in morphology to diversify the source form, and find that it is helpful. Motivated by the analogy mechanism humans usually resort to when processing morphology, we propose to include both source words and target inflected forms of different words in the input, in order to provide interparadigmatic as well as intraparadigmatic resources for the models to learn from. Previous approaches have only source words as input, i.e., intraparadigmatic resources. Our experiments show that models trained with both interparadigmatic and intraparadigmatic resources perform better in low-resource situations. We find that the overlap in lemmata between the training and evaluation sets in the current common-practice setup for morphological generation has obscured the difficulty of the task. Model performance is much worse than reported in the literature when evaluated on unseen lemmata, meaning that the model does not generalize well. Based on this finding, we suggest that future work should include lemma counts, rather than only inflected word form counts, as a metric for defining data amount, and should evaluate model performance on seen and unseen lemmata separately. Another way to address data scarcity is to reduce the cost of data annotation. We propose a method that uses deep learning approaches to facilitate morphological data annotation by detecting annotation errors. Experiments show that the proposed method is robust to noise in the training data and can detect annotation errors very effectively. Lastly, we conduct experiments on morphological generation with deep learning approaches in context. This task turns out to be much more challenging than morphological generation out of context, and there remains considerable room for improvement for morphological generation in context with deep learning approaches.
590
$a
School code: 0051.
650
4
$a
Linguistics.
$3
524476
650
4
$a
Computer science.
$3
523869
650
4
$a
Artificial intelligence.
$3
516317
653
$a
Deep learning
653
$a
Generation
653
$a
Inflection
653
$a
Language
653
$a
Low-resource
653
$a
Morphology
690
$a
0290
690
$a
0984
690
$a
0800
710
2
$a
University of Colorado at Boulder.
$b
Linguistics.
$3
1037973
773
0
$t
Dissertations Abstracts International
$g
83-07B.
790
$a
0051
791
$a
Ph.D.
792
$a
2021
793
$a
English
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28777127
Holdings:
Barcode: W9473104
Location: Electronic resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0