Xu, Ying.
A tightness continuum measure of Chinese semantic units, and its application to information retrieval.
Record type:
Bibliographic - Language material, printed : Monograph/item
Title/Author:
A tightness continuum measure of Chinese semantic units, and its application to information retrieval.
Author:
Xu, Ying.
Extent:
68 p.
Notes:
Source: Masters Abstracts International, Volume: 48-05, page: 2561.
Contained By:
Masters Abstracts International 48-05.
Subject:
Language, Linguistics.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=MR60539
ISBN:
9780494605394
Thesis (M.Sc.)--University of Alberta (Canada), 2010.
Chinese is very different from alphabetical languages such as English, as there are no delimiters between Chinese words. So Chinese segmentation is an important step for most Chinese natural language processing (NLP) tasks such as machine translation (MT) and information retrieval (IR). Previous work has shown a non-monotonic relation between improvements in Chinese segmentation performance and performance on NLP tasks. Our research also suggests that different tasks need different criteria for Chinese segmentation.
ISBN: 9780494605394
Subjects--Topical Terms:
1018079
Language, Linguistics.
LDR: 03039nam 2200289 4500
001 1395489
005 20110518115305.5
008 130515s2010 ||||||||||||||||| ||eng d
020 $a 9780494605394
035 $a (UMI)AAIMR60539
035 $a AAIMR60539
040 $a UMI $c UMI
100 1 $a Xu, Ying. $3 1674185
245 1 2 $a A tightness continuum measure of Chinese semantic units, and its application to information retrieval.
300 $a 68 p.
500 $a Source: Masters Abstracts International, Volume: 48-05, page: 2561.
502 $a Thesis (M.Sc.)--University of Alberta (Canada), 2010.
520 $a Chinese is very different from alphabetical languages such as English, as there are no delimiters between Chinese words. So Chinese segmentation is an important step for most Chinese natural language processing (NLP) tasks such as machine translation (MT) and information retrieval (IR). Previous work has shown a non-monotonic relation between improvements in Chinese segmentation performance and performance on NLP tasks. Our research also suggests that different tasks need different criteria for Chinese segmentation.
520 $a We propose a tightness continuum for Chinese semantic units which provides a more principled approach to the coupling of segmentation methods and NLP application tasks. The construction of the continuum is based on calculating the frequency distribution of units' segmentation patterns. For a Chinese character sequence of length n, 2^(n-1) potential segmentation candidates exist. Based on this continuum, sequences can be dynamically segmented, and then that information can be exploited in a number of information retrieval tasks.
520 $a In order to show that our tightness continuum is useful for NLP tasks, we propose two methods to exploit the tightness continuum within IR systems. The first method refines the result of a general Chinese word segmenter: it combines units which are tightly connected according to statistical information but segmented by the former segmenter, and segments units which are not tight but previously treated as one unit. The second method embeds the tightness value into IR score functions according to our hypothesis that terms in tight queries are more likely to be consecutive in relevant documents than terms in loose queries. After analyzing the currently available Chinese test collections, we found that they are not suitable for evaluating the effects of Chinese segmentation, especially the segmentation of Chinese compounds, on IR. So we created a focused test collection. Experimental results show that our tightness measure is reasonable and does improve the performance of IR systems. As another consequence our experiments demonstrate a strong need for additional corpora for the investigation of Chinese IR.
590 $a School code: 0351.
650 4 $a Language, Linguistics. $3 1018079
650 4 $a Information Science. $3 1017528
650 4 $a Computer Science. $3 626642
690 $a 0290
690 $a 0723
690 $a 0984
710 2 $a University of Alberta (Canada). $3 626651
773 0 $t Masters Abstracts International $g 48-05.
790 $a 0351
791 $a M.Sc.
792 $a 2010
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=MR60539
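Illustrative note on the abstract: the count of segmentation candidates follows from treating each of the n-1 gaps between adjacent characters as either a boundary or not. The Python sketch below enumerates the candidates for a short sequence and tabulates a relative-frequency distribution over observed segmentation patterns. It is a minimal illustration only and is not taken from the thesis: the function names `segmentations` and `pattern_distribution`, the placeholder string, and the toy observations are all assumptions, and the abstract does not give the actual tightness formula.

```python
from collections import Counter
from itertools import product


def segmentations(seq):
    """Yield every split of seq into contiguous, non-empty units."""
    n = len(seq)
    if n == 0:
        return
    # Each of the n-1 gaps between characters is a boundary (True) or not (False).
    for cuts in product([False, True], repeat=n - 1):
        units, start = [], 0
        for gap, is_cut in enumerate(cuts, start=1):
            if is_cut:
                units.append(seq[start:gap])
                start = gap
        units.append(seq[start:])
        yield tuple(units)


def pattern_distribution(observed):
    """Relative frequency of each observed segmentation pattern (hypothetical helper)."""
    counts = Counter(observed)
    total = sum(counts.values())
    return {pattern: count / total for pattern, count in counts.items()}


# Placeholder four-character sequence; any string works the same way.
candidates = list(segmentations("ABCD"))
assert len(candidates) == 2 ** (4 - 1)

# Toy observations (made up for illustration), one pattern per corpus occurrence.
toy = [("AB",), ("AB",), ("A", "B"), ("AB",)]
distribution = pattern_distribution(toy)
```

In this toy input the unsegmented pattern accounts for three of the four occurrences. How such a distribution is turned into a position on the tightness continuum is defined in the thesis itself, not here.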
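Holdings
Barcode: W9158628
Location: Electronic resources
Circulation category: 11. Online viewing_V
Material type: E-book
Call number: EB
Usage type: General use (Normal)
Loan status: On shelf
Reservation status: 0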