Public Health Surveillance using Social Media: Short Text Classification and Trend Analysis of Nonstationary Time Series.
Record Type:
Electronic resources : Monograph/item
Title/Author:
Public Health Surveillance using Social Media: Short Text Classification and Trend Analysis of Nonstationary Time Series./
Author:
Dai, Xiangfeng.
Published:
Ann Arbor : ProQuest Dissertations & Theses, 2017
Description:
112 p.
Notes:
Source: Dissertation Abstracts International, Volume: 78-10(E), Section: B.
Contained By:
Dissertation Abstracts International 78-10B(E).
Subject:
Computer science.
Online resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10264204
ISBN:
9781369851076
Thesis (Ph.D.)--North Carolina Agricultural and Technical State University, 2017.
Traditional public health surveillance is often limited by the time required to collect data. Social media (e.g., Twitter) provide a low-cost alternative data source for public health surveillance. In this dissertation, we develop a set of methods based on short text classification and trend analysis. First, we propose a hybrid classification method for collecting disease-related data from social media. The proposed method combines basic Natural Language Processing (NLP), rule-based classifiers, and supervised machine learning classifiers. This method is efficient and achieves better results than any single approach. To generalize the method, we also propose a word-embedding-based clustering method for text classification. Word embedding is an NLP method that can capture the semantic information of words. A text can be represented as a few vectors and divided into clusters of similar words. According to similarity measures of all the clusters, the text can then be classified as related or unrelated to a topic (e.g., influenza). Our simulations show good performance, with a best accuracy of 87.1%. The proposed method is unsupervised; hence it does not require labor to label training data and can be readily extended to other classification problems or other diseases.
ISBN: 9781369851076
Subjects--Topical Terms:
523869
Computer science.
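The word-embedding clustering idea in the abstract (represent a text as word vectors, group similar words, then compare the clusters with a topic) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the dissertation's implementation: the 3-dimensional `TOY_EMBEDDINGS` table, the greedy single-pass clustering, the cosine threshold of 0.9, and the topic words `flu` and `fever` are all hypothetical.

```python
import math

# Hypothetical toy embeddings (assumption). Real word vectors are learned
# from a corpus and typically have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "flu": (0.9, 0.1, 0.0),
    "fever": (0.8, 0.2, 0.1),
    "cough": (0.85, 0.15, 0.05),
    "sick": (0.7, 0.3, 0.1),
    "game": (0.0, 0.9, 0.2),
    "movie": (0.1, 0.8, 0.3),
    "pizza": (0.1, 0.2, 0.9),
}


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def cluster_words(words, threshold=0.9):
    """Greedy clustering: a word joins the first cluster whose centroid
    is at least `threshold` similar, otherwise it starts a new cluster.
    Words missing from the embedding table are skipped."""
    clusters = []
    for word in words:
        vec = TOY_EMBEDDINGS.get(word)
        if vec is None:
            continue
        for cluster in clusters:
            if cosine(vec, centroid(cluster)) >= threshold:
                cluster.append(vec)
                break
        else:
            clusters.append([vec])
    return clusters


def is_related(text, topic_words=("flu", "fever"), threshold=0.9):
    """Label a text as topic-related if any word cluster's centroid is
    close enough to the centroid of the topic words."""
    topic_vec = centroid([TOY_EMBEDDINGS[w] for w in topic_words])
    clusters = cluster_words(text.lower().split(), threshold)
    return any(cosine(centroid(c), topic_vec) >= threshold for c in clusters)
```

With these toy vectors, `is_related("bad cough and fever today")` returns `True` and `is_related("watching a movie with pizza")` returns `False`. No labeled training data is involved, which is the sense in which such an approach is unsupervised; the threshold still has to be chosen somehow.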
LDR
:03670nmm a2200325 4500
001
2126598
005
20171128150727.5
008
180830s2017 ||||||||||||||||| ||eng d
020
$a
9781369851076
035
$a
(MiAaPQ)AAI10264204
035
$a
AAI10264204
040
$a
MiAaPQ
$c
MiAaPQ
100
1
$a
Dai, Xiangfeng.
$3
3288704
245
1 0
$a
Public Health Surveillance using Social Media: Short Text Classification and Trend Analysis of Nonstationary Time Series.
260
1
$a
Ann Arbor :
$b
ProQuest Dissertations & Theses,
$c
2017
300
$a
112 p.
500
$a
Source: Dissertation Abstracts International, Volume: 78-10(E), Section: B.
500
$a
Adviser: Marwan Bikdash.
502
$a
Thesis (Ph.D.)--North Carolina Agricultural and Technical State University, 2017.
520
$a
Traditional public health surveillance is often limited by the time required to collect data. Social media (e.g., Twitter) provide a low-cost alternative data source for public health surveillance. In this dissertation, we develop a set of methods based on short text classification and trend analysis. First, we propose a hybrid classification method for collecting disease-related data from social media. The proposed method combines basic Natural Language Processing (NLP), rule-based classifiers, and supervised machine learning classifiers. This method is efficient and achieves better results than any single approach. To generalize the method, we also propose a word-embedding-based clustering method for text classification. Word embedding is an NLP method that can capture the semantic information of words. A text can be represented as a few vectors and divided into clusters of similar words. According to similarity measures of all the clusters, the text can then be classified as related or unrelated to a topic (e.g., influenza). Our simulations show good performance, with a best accuracy of 87.1%. The proposed method is unsupervised; hence it does not require labor to label training data and can be readily extended to other classification problems or other diseases.
520
$a
The collected temporal disease-related social media data are quite noisy and nonstationary. To detect the onset of a disease from social media, we apply a distance-based outlier method to transform the noisy social media data into regions of inliers and outliers, then perform region-based hypothesis testing for outbreak detection. We then propose a Hypothesis testing-based Adaptive Spline Filtering (HASF) method, which breaks the nonstationary time series into sections of adapted lengths, each of which is curve-fitted with a cubic spline. The method allows the imposition of appropriate constraints such as continuity and smoothness between the sections, minimum or maximum section length, etc. The number of sections and the nodes between them are adapted from the data by testing hypotheses regarding the second-order statistics of the residuals computed using different configurations of nodes. The resulting cubic-spline curve can therefore be interpreted as capturing the disease trends and turning the residual into a stationary process as much as possible. The HASF approach is extended to solve the problem of missing data in time series. Three "filling" variants are considered, and the most promising variant fills big gaps with linear splines while maintaining smoothness and continuity between the sections.
590
$a
School code: 1544.
650
4
$a
Computer science.
$3
523869
650
4
$a
Artificial intelligence.
$3
516317
650
4
$a
Mining engineering.
$3
788403
650
4
$a
Public health.
$3
534748
690
$a
0984
690
$a
0800
690
$a
0551
690
$a
0573
710
2
$a
North Carolina Agricultural and Technical State University.
$b
Computational Science and Engineering.
$3
2105946
773
0
$t
Dissertation Abstracts International
$g
78-10B(E).
790
$a
1544
791
$a
Ph.D.
792
$a
2017
793
$a
English
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10264204
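The second abstract paragraph mentions a distance-based outlier method that turns the noisy series into regions of inliers and outliers. The sketch below shows one common reading of that idea, assumed here for illustration only: a point is an outlier when fewer than `min_neighbors` other points lie within `radius` of it in value. The function names, the parameters, the value-only distance (time is ignored), and the sample counts are hypothetical, and the region-based hypothesis test that follows in the dissertation is not attempted.

```python
def distance_based_outliers(values, radius, min_neighbors):
    """Return one boolean per value: True when fewer than `min_neighbors`
    other values lie within `radius` of it (i.e., it is an outlier)."""
    flags = []
    for i, x in enumerate(values):
        neighbors = sum(
            1 for j, y in enumerate(values) if j != i and abs(x - y) <= radius
        )
        flags.append(neighbors < min_neighbors)
    return flags


def group_regions(flags):
    """Collapse runs of equal flags into (start, end, is_outlier) tuples,
    with `end` inclusive."""
    regions = []
    start = 0
    for i in range(1, len(flags) + 1):
        if i == len(flags) or flags[i] != flags[start]:
            regions.append((start, i - 1, flags[start]))
            start = i
    return regions


# Hypothetical daily counts of disease-related posts (made-up numbers).
counts = [10, 12, 11, 13, 40, 42, 12, 11]
flags = distance_based_outliers(counts, radius=3, min_neighbors=2)
regions = group_regions(flags)
# regions == [(0, 3, False), (4, 5, True), (6, 7, False)]
```

Here the two high counts form a single outlier region between two inlier regions. This brute-force version compares every pair of points, so it is quadratic in the series length and only suitable for short series.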
Items
Inventory Number:
W9337210
Location Name:
Electronic resources
Item Class:
01. For loan (book)_YB
Material type:
E-book
Call number:
EB
Usage Class:
General use (Normal)
Loan Status:
On shelf
No. of reservations:
0