A tightness continuum measure of Chinese semantic units, and its application to information retrieval.
Record Type: Language materials, printed : Monograph/item
Title/Author: A tightness continuum measure of Chinese semantic units, and its application to information retrieval.
Author: Xu, Ying.
Description: 68 p.
Notes: Source: Masters Abstracts International, Volume: 48-05, page: 2561.
Contained By: Masters Abstracts International 48-05.
Subject: Language, Linguistics.
Online resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=MR60539
ISBN: 9780494605394
Thesis (M.Sc.)--University of Alberta (Canada), 2010.
Chinese is very different from alphabetic languages such as English in that there are no delimiters between Chinese words. Chinese word segmentation is therefore an important step for most Chinese natural language processing (NLP) tasks, such as machine translation (MT) and information retrieval (IR). Previous work has shown a non-monotonic relation between improvements in Chinese segmentation performance and performance on NLP tasks. Our research also suggests that different tasks need different criteria for Chinese segmentation.
LDR    03039nam 2200289 4500
001    1395489
005    20110518115305.5
008    130515s2010 ||||||||||||||||| ||eng d
020    $a 9780494605394
035    $a (UMI)AAIMR60539
035    $a AAIMR60539
040    $a UMI $c UMI
100 1  $a Xu, Ying. $3 1674185
245 12 $a A tightness continuum measure of Chinese semantic units, and its application to information retrieval.
300    $a 68 p.
500    $a Source: Masters Abstracts International, Volume: 48-05, page: 2561.
502    $a Thesis (M.Sc.)--University of Alberta (Canada), 2010.
520    $a Chinese is very different from alphabetic languages such as English in that there are no delimiters between Chinese words. Chinese word segmentation is therefore an important step for most Chinese natural language processing (NLP) tasks, such as machine translation (MT) and information retrieval (IR). Previous work has shown a non-monotonic relation between improvements in Chinese segmentation performance and performance on NLP tasks. Our research also suggests that different tasks need different criteria for Chinese segmentation.
520    $a We propose a tightness continuum for Chinese semantic units, which provides a more principled approach to coupling segmentation methods with NLP application tasks. The continuum is constructed by calculating the frequency distribution of units' segmentation patterns. For a Chinese character sequence of length n, 2^(n-1) potential segmentation candidates exist. Based on this continuum, sequences can be segmented dynamically, and that information can then be exploited in a number of information retrieval tasks.
520    $a To show that our tightness continuum is useful for NLP tasks, we propose two methods for exploiting it within IR systems. The first method refines the output of a general Chinese word segmenter: it merges units that are tightly connected according to statistical information but were split by the segmenter, and splits units that are not tight but were previously treated as one unit. The second method embeds the tightness value into IR scoring functions, based on our hypothesis that terms in tight queries are more likely to appear consecutively in relevant documents than terms in loose queries. After analyzing the currently available Chinese test collections, we found that they are not suitable for evaluating the effects of Chinese segmentation, especially the segmentation of Chinese compounds, on IR, so we created a focused test collection. Experimental results show that our tightness measure is reasonable and does improve the performance of IR systems. As a further consequence, our experiments demonstrate a strong need for additional corpora for the investigation of Chinese IR.
590    $a School code: 0351.
650  4 $a Language, Linguistics. $3 1018079
650  4 $a Information Science. $3 1017528
650  4 $a Computer Science. $3 626642
690    $a 0290
690    $a 0723
690    $a 0984
710 2  $a University of Alberta (Canada). $3 626651
773 0  $t Masters Abstracts International $g 48-05.
790    $a 0351
791    $a M.Sc.
792    $a 2010
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=MR60539
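The abstract states that a Chinese character sequence of length n has 2^(n-1) potential segmentation candidates, and that the continuum is built from the frequency distribution of a unit's segmentation patterns. The Python sketch below is illustrative only. It enumerates the candidates (each of the n-1 gaps between adjacent characters either is or is not a boundary) and computes a hypothetical tightness score, defined here as the share of observed occurrences in which the whole sequence was kept as one unit. That score, the function names `segmentations` and `tightness`, and the toy counts are assumptions made for illustration, not the measure defined in the thesis.

```python
from collections import Counter
from itertools import product


def segmentations(chars: str) -> list[tuple[str, ...]]:
    """Enumerate every split of `chars` into contiguous units.

    Each of the n-1 gaps between adjacent characters is either a unit
    boundary or not, giving 2**(n-1) candidates for n >= 1.
    """
    if not chars:
        return []
    results = []
    for cuts in product((False, True), repeat=len(chars) - 1):
        units, start = [], 0
        # cuts[i-1] says whether to split between chars[i-1] and chars[i]
        for i, cut in enumerate(cuts, start=1):
            if cut:
                units.append(chars[start:i])
                start = i
        units.append(chars[start:])
        results.append(tuple(units))
    return results


def tightness(pattern_counts: Counter, sequence: str) -> float:
    """HYPOTHETICAL score: fraction of observed occurrences of `sequence`
    that were left unsegmented. Not the thesis's actual definition."""
    total = sum(pattern_counts.values())
    if total == 0:
        return 0.0
    return pattern_counts[(sequence,)] / total


if __name__ == "__main__":
    seq = "信息检索"  # "information retrieval", four characters
    candidates = segmentations(seq)
    print(len(candidates))  # 8, i.e. 2**(4-1)

    # Toy segmentation-pattern counts, made up for illustration only.
    counts = Counter({("信息检索",): 6, ("信息", "检索"): 3, ("信", "息", "检索"): 1})
    print(tightness(counts, seq))  # 0.6
```

Under this assumed score, a value near 1 places a sequence at the tight end of the continuum (keep it as one unit) and a value near 0 at the loose end; the thesis's actual measure may weight the segmentation patterns differently.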
Items
Inventory Number: W9158628
Location Name: Electronic Resources (電子資源)
Item Class: 11.線上閱覽_V (online reading)
Material Type: E-book
Call Number: EB
Usage Class: Normal
Loan Status: On shelf
No. of Reservations: 0