Effective Record Linkage Techniques for Complex Population Data.
Record Type:
Electronic resources : Monograph/item
Title/Author:
Effective Record Linkage Techniques for Complex Population Data.
Author:
Nanayakkara, Charini.
Published:
Ann Arbor : ProQuest Dissertations & Theses, 2022
Description:
229 p.
Notes:
Source: Dissertations Abstracts International, Volume: 84-03, Section: B.
Contained By:
Dissertations Abstracts International 84-03B.
Subject:
Active learning.
Online resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29289056
ISBN:
9798845456649
Thesis (Ph.D.)--The Australian National University (Australia), 2022.
Real-world data sets are generally of limited value when analysed on their own; the true potential of data can be exploited only when two or more data sets are linked to analyse patterns across records. A classic example is the need to merge medical records with travel data for effective surveillance and management of pandemics such as COVID-19 by tracing the points of contact of infected individuals. Record Linkage (RL), the process of identifying records that refer to the same entity, is therefore an area of data science of paramount importance for making informed decisions based on the plethora of information available in the modern world. Two of the primary concerns of RL are obtaining linkage results of high quality and maximising efficiency. Furthermore, the lack of ground-truth data in the form of known matches and non-matches, and the privacy concerns involved in linking sensitive data, have hindered the application of RL in real-world projects. In traditional RL, methods such as blocking and indexing are generally applied to improve efficiency by reducing the number of record pairs that need to be compared. Once the record pairs retained from blocking are compared, classification methods are employed to separate matches from non-matches. Thus, the general RL process comprises blocking, comparison, classification, and finally evaluation to assess how well a linkage program has performed. In this thesis we first provide a holistic understanding of the background of RL, and then conduct an extensive literature review of the state-of-the-art techniques applied in RL to identify current research gaps.
Next, we present our initial contribution of incorporating data characteristics, such as temporal and geographic information, into unsupervised clustering. Compared to regular unsupervised clustering, this achieves significant improvements in precision (more than 16%) at the cost of a minor reduction in recall (less than 2.5%) when applied to real-world data sets. We then present a novel active learning-based method that filters record pairs after the record pair comparison step to improve the efficiency of the RL process. Furthermore, we develop a novel active learning-based classification technique for RL that yields high-quality linkage results with limited ground-truth data. Even though semi-supervised learning techniques such as active learning methods have already been proposed in the context of RL, this is a relatively novel paradigm that is worthy of further exploration. We experimentally show more than 35% improvement in clustering efficiency with our proposed filtering approach, and linkage quality on par with or exceeding existing active learning-based classification methods with our active learning-based classification technique. Existing RL evaluation measures such as precision and recall evaluate the classification outcome of record pairs, which can cause ambiguity when applied in the group RL context. We therefore propose a more robust RL evaluation measure which evaluates linkage quality based on how individual records have been assigned to clusters rather than considering record pairs. Next, we propose a novel graph anonymisation technique that extends the literature by introducing methods of anonymising data to be linked in a human-interpretable manner, without compromising the structure and interpretability of the data as existing state-of-the-art anonymisation approaches do.
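The general RL process named in the abstract (blocking, comparison, classification, evaluation) can be illustrated with a minimal sketch. This is not the thesis's method: the toy records, the postcode blocking key, the `difflib` name similarity, and the 0.8 threshold are all assumptions chosen for illustration, and the evaluation is the conventional pairwise precision and recall rather than the cluster-based measure the thesis proposes.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records: (record_id, name, postcode).
records = [
    (1, "john smith", "2600"),
    (2, "jon smith", "2600"),
    (3, "mary jones", "2601"),
    (4, "mary jonas", "2601"),
    (5, "peter brown", "2600"),
]
# Hypothetical ground truth: id pairs known to refer to the same entity.
true_matches = {(1, 2), (3, 4)}


def block(recs):
    """Blocking: group records by postcode so only same-block pairs are compared."""
    blocks = defaultdict(list)
    for rec in recs:
        blocks[rec[2]].append(rec)
    return blocks


def compare(a, b):
    """Comparison: similarity of the two name fields, a value in [0, 1]."""
    return SequenceMatcher(None, a[1], b[1]).ratio()


def link(recs, threshold=0.8):
    """Classification: a candidate pair is a match if its similarity reaches the threshold."""
    matches = set()
    for block_records in block(recs).values():
        for a, b in combinations(block_records, 2):
            if compare(a, b) >= threshold:
                matches.add((a[0], b[0]))
    return matches


def evaluate(predicted, truth):
    """Evaluation: pairwise precision and recall against the known matches."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


predicted = link(records)
precision, recall = evaluate(predicted, true_matches)
print(sorted(predicted), precision, recall)
```

Blocking on postcode means records 1, 2 and 5 yield three candidate pairs and records 3 and 4 yield one, instead of the ten pairs a full comparison of five records would need.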
LDR :04609nmm a2200337 4500
001 2399135
005 20240909100731.5
006 m o d
007 cr#unu||||||||
008 251215s2022 ||||||||||||||||| ||eng d
020 $a 9798845456649
035 $a (MiAaPQ)AAI29289056
035 $a (MiAaPQ)AustNatlU1885264165
035 $a AAI29289056
040 $a MiAaPQ $c MiAaPQ
100 1 $a Nanayakkara, Charini. $3 3769099
245 1 0 $a Effective Record Linkage Techniques for Complex Population Data.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2022
300 $a 229 p.
500 $a Source: Dissertations Abstracts International, Volume: 84-03, Section: B.
500 $a Advisor: Christen, Peter.
502 $a Thesis (Ph.D.)--The Australian National University (Australia), 2022.
590 $a School code: 0433.
650 4 $a Active learning. $3 527777
650 4 $a Data science. $3 3689306
650 4 $a Clustering. $3 3559215
650 4 $a Education. $3 516579
650 4 $a Bibliographic records. $3 3769100
650 4 $a Markov analysis. $3 3562906
650 4 $a Pedagogy. $3 2122828
690 $a 0515
690 $a 0796
690 $a 0456
710 2 $a The Australian National University (Australia). $3 1952885
773 0 $t Dissertations Abstracts International $g 84-03B.
790 $a 0433
791 $a Ph.D.
792 $a 2022
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29289056
Items
Inventory Number:
W9507455
Location Name:
Electronic resources
Item Class:
11. Online viewing_V
Material type:
E-book
Call number:
EB
Usage Class:
Normal (general use)
Loan Status:
On shelf
No. of reservations:
0