Graph-based weakly-supervised methods for information extraction & integration.
Record Type:
Language materials, printed : Monograph/item
Title/Author:
Graph-based weakly-supervised methods for information extraction & integration.
Author:
Talukdar, Partha Pratim.
Description:
170 p.
Notes:
Source: Dissertation Abstracts International, Volume: 71-07, Section: B, page: 4359.
Contained By:
Dissertation Abstracts International 71-07B.
Subject:
Information Technology.
Online resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3414209
ISBN:
9781124064130
Thesis (Ph.D.)--University of Pennsylvania, 2010.
The variety and complexity of potentially-related data resources available for querying---webpages, databases, data warehouses---has been growing ever more rapidly. There is a growing need to pose integrative queries across multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse sources. This has traditionally been the focus of research within Information Extraction (IE) and Information Integration (II) communities, with IE focusing on converting unstructured sources into structured sources, and II focusing on providing a unified view of diverse structured data sources. However, most of the current IE and II methods, which can potentially be applied to the problem of integration across sources, require large amounts of human supervision, often in the form of annotated data. This need for extensive supervision makes existing methods expensive to deploy and difficult to maintain. In this thesis, we develop techniques that generalize from limited human input, via weakly-supervised methods for IE and II. In particular, we argue that graph-based representation of data and learning over such graphs can result in effective and scalable methods for large-scale Information Extraction and Integration.
LDR :04009nam 2200325 4500
001 1401080
005 20111013150244.5
008 130515s2010 ||||||||||||||||| ||eng d
020 $a 9781124064130
035 $a (UMI)AAI3414209
035 $a AAI3414209
040 $a UMI $c UMI
100 1 $a Talukdar, Partha Pratim. $3 1680191
245 1 0 $a Graph-based weakly-supervised methods for information extraction & integration.
300 $a 170 p.
500 $a Source: Dissertation Abstracts International, Volume: 71-07, Section: B, page: 4359.
500 $a Adviser: Fernando Pereira.
502 $a Thesis (Ph.D.)--University of Pennsylvania, 2010.
520 $a The variety and complexity of potentially-related data resources available for querying---webpages, databases, data warehouses---has been growing ever more rapidly. There is a growing need to pose integrative queries across multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse sources. This has traditionally been the focus of research within Information Extraction (IE) and Information Integration (II) communities, with IE focusing on converting unstructured sources into structured sources, and II focusing on providing a unified view of diverse structured data sources. However, most of the current IE and II methods, which can potentially be applied to the problem of integration across sources, require large amounts of human supervision, often in the form of annotated data. This need for extensive supervision makes existing methods expensive to deploy and difficult to maintain. In this thesis, we develop techniques that generalize from limited human input, via weakly-supervised methods for IE and II. In particular, we argue that graph-based representation of data and learning over such graphs can result in effective and scalable methods for large-scale Information Extraction and Integration.
520 $a Within IE, we focus on the problem of assigning semantic classes to entities. First we develop a context pattern induction method to extend small initial entity lists of various semantic classes. We also demonstrate that features derived from such extended entity lists can significantly improve performance of state-of-the-art discriminative taggers.
520 $a The output of pattern-based class-instance extractors is often high-precision and low-recall in nature, which is inadequate for many real world applications. We use Adsorption, a graph based label propagation algorithm, to significantly increase recall of an initial high-precision, low-recall pattern-based extractor by combining evidences from unstructured and structured text corpora. Building on Adsorption, we propose a new label propagation algorithm, Modified Adsorption (MAD), and demonstrate its effectiveness on various real-world datasets. Additionally, we also show how class-instance acquisition performance in the graph-based SSL setting can be improved by incorporating additional semantic constraints available in independently developed knowledge bases.
520 $a Within Information Integration, we develop a novel system, Q, which draws ideas from machine learning and databases to help a non-expert user construct data-integrating queries based on keywords (across databases) and interactive feedback on answers. We also present an information need-driven strategy for automatically incorporating new sources and their information in Q. We also demonstrate that Q's learning strategy is highly effective in combining the outputs of "black box" schema matchers and in re-weighting bad alignments. This removes the need to develop an expensive mediated schema which has been necessary for most previous systems.
590 $a School code: 0175.
650 4 $a Information Technology. $3 1030799
650 4 $a Information Science. $3 1017528
650 4 $a Computer Science. $3 626642
690 $a 0489
690 $a 0723
690 $a 0984
710 2 $a University of Pennsylvania. $3 1017401
773 0 $t Dissertation Abstracts International $g 71-07B.
790 1 0 $a Pereira, Fernando, $e advisor
790 $a 0175
791 $a Ph.D.
792 $a 2010
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3414209
Items
Inventory Number:
W9164219
Location Name:
Electronic resources
Item Class:
11. Online viewing_V
Material type:
E-book
Call number:
EB
Usage Class:
General use (Normal)
Loan Status:
On shelf
No. of reservations:
0