National Dong Hwa University Library

Statistical Methods to Incorporate External Summary-Level Information into a Current Study.

Record type:	Bibliographic - electronic resource : Monograph/item
Title/Author:	Statistical Methods to Incorporate External Summary-Level Information into a Current Study / Gu, Tian.
Author:	Gu, Tian.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2021
Extent:	115 p.
Notes:	Source: Dissertations Abstracts International, Volume: 83-05, Section: B.
Contained By:	Dissertations Abstracts International 83-05B.
Subject:	Biostatistics.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28844406
ISBN:	9798471100381

Statistical Methods to Incorporate External Summary-Level Information into a Current Study. - Ann Arbor : ProQuest Dissertations & Theses, 2021 - 115 p.

Thesis (Ph.D.)--University of Michigan, 2021.

This item must not be sold to any third party vendors.

In the era of big data, it is becoming increasingly common for researchers to incorporate external information from large studies to improve the accuracy of statistical inference, instead of relying solely on a modestly sized dataset collected internally. We consider a general statistical problem in which some known regression models or risk calculators predict an outcome of interest from a set of commonly used predictors. Different types of summary information are available for these external models. A modestly sized internal dataset, containing individual-level data for the variables in the known models and for some new variables, is available for the current analysis. The three chapters below consider different settings with the same goal: to build an improved prediction model that includes the new variables, using both the internal individual-level data and summary information obtained from the known external model(s).

In Chapter 2, we focus on the simple case where there is only one large, well-characterized previous study from the external population. We propose a synthetic data approach, which first converts the external information into synthetic data and then analyzes a combined dataset consisting of the observed internal data and the synthetic data. A theoretical justification and extensive simulation studies establish the efficiency gain and improved prediction performance of the proposed data integration method. We also illustrate that, even under less restrictive requirements on the information that is available externally, the combined estimates have the same asymptotic properties as an alternative constrained maximum likelihood estimation approach.

In Chapter 3, we consider a more complicated but quite plausible situation where several external prediction models are available to aid inference and prediction for the internal study. We assume that each of the external studies developed a prediction model for the same outcome but may have used a slightly different set of covariates. We propose a meta-inference framework using an empirical Bayes estimation approach, which adaptively combines the estimates from the external models. This adaptive approach diminishes the influence of information that is less compatible with the internal data while balancing the bias-variance trade-off. The proposed estimators are more efficient than the naive analysis of the internal data alone.

In Chapter 4, we first extend the synthetic data method from Chapter 2 to accommodate multiple external prediction models, and further allow for heterogeneity of covariate effects across the external populations. Each external model could potentially be built on a slightly different subset of the covariates measured in the internal study. The proposed approach generates synthetic outcome data in each population, uses stacked multiple imputation to create a long dataset with complete covariate information, and finally analyzes the imputed data with weighted regression. By leveraging multiple sources of auxiliary information from a broad class of externally fitted predictive models or established risk calculators, based on parametric regression or machine learning methods, this new strategy can make statistical inference more accurate for both the internal population and the external populations. We evaluate the proposed methods through extensive simulations and apply them to improve models for predicting the risk of high-grade prostate cancer.
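
The adaptive combination described for Chapter 3 can be illustrated with a generic scalar shrinkage rule. The sketch below is not the estimator developed in the thesis, which this record does not specify; it is a common empirical-Bayes-style weighting, shown only to make the idea concrete. The function name `adaptive_combine`, its arguments, and all numeric values are hypothetical. The assumed rule puts weight `var_int / (var_int + delta**2)` on the external estimate, where `delta` is the gap between the internal and external estimates, so an external estimate that disagrees strongly with the internal data receives little weight.

```python
def adaptive_combine(theta_int, var_int, theta_ext):
    """Shrink an internal estimate toward one external estimate.

    Hypothetical illustration, not the thesis's method.
    theta_int: estimate from the internal data alone
    var_int:   estimated variance of theta_int (must be > 0)
    theta_ext: estimate implied by an external model
    Returns (combined_estimate, weight_on_external).
    """
    if var_int <= 0:
        raise ValueError("var_int must be positive")
    delta = theta_int - theta_ext
    # Weight is 1 when the two estimates agree and falls toward 0
    # as the squared gap grows relative to the internal variance.
    w_ext = var_int / (var_int + delta ** 2)
    combined = theta_int - w_ext * delta
    return combined, w_ext


if __name__ == "__main__":
    # Made-up numbers: one compatible and one incompatible external estimate.
    for theta_ext in (1.05, 3.0):
        est, w = adaptive_combine(theta_int=1.0, var_int=0.04, theta_ext=theta_ext)
        print(f"external={theta_ext:.2f}  weight={w:.3f}  combined={est:.3f}")
```

With several external models, one simple option under the same assumptions is to apply the rule to each external estimate separately and compare the resulting weights; how the thesis pools multiple sources is not stated in this record.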

ISBN: 9798471100381
Subjects--Topical Terms: Biostatistics. (1002712)
Subjects--Index Terms: Data integration

LDR:04779nmm a2200373 4500
001 2346917
005 20220706051323.5
008 241004s2021 ||||||||||||||||| ||eng d
020 $a 9798471100381
035 $a (MiAaPQ)AAI28844406
035 $a (MiAaPQ)umichrackham003740
035 $a AAI28844406
040 $a MiAaPQ $c MiAaPQ
100 1 $a Gu, Tian. $3 3686124
245 1 0 $a Statistical Methods to Incorporate External Summary-Level Information into a Current Study.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300 $a 115 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-05, Section: B.
500 $a Advisor: Mukherjee, Bhramar;Taylor, Jeremy M. G.
502 $a Thesis (Ph.D.)--University of Michigan, 2021.
506 $a This item must not be sold to any third party vendors.
506 $a This item must not be added to any third party search indexes.
590 $a School code: 0127.
650 4 $a Biostatistics. $3 1002712
650 4 $a Statistics. $3 517247
650 4 $a Statistical physics. $3 536281
653 $a Data integration
653 $a Prediction models
653 $a Regression inference
690 $a 0308
690 $a 0217
690 $a 0463
710 2 $a University of Michigan. $b Biostatistics. $3 3352160
773 0 $t Dissertations Abstracts International $g 83-05B.
790 $a 0127
791 $a Ph.D.
792 $a 2021
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28844406