Statistical and Computational Methods for Analyzing High-Throughput Genomic Data.
Record Type:
Language materials, printed : Monograph/item
Title/Author:
Statistical and Computational Methods for Analyzing High-Throughput Genomic Data.
Author:
Li, Jingyi.
Description:
113 p.
Notes:
Source: Dissertation Abstracts International, Volume: 75-01(E), Section: B.
Contained By:
Dissertation Abstracts International 75-01B(E).
Subject:
Biology, Biostatistics.
Online resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3593901
ISBN:
9781303373770
Thesis (Ph.D.)--University of California, Berkeley, 2013.
The first part of this thesis focuses on addressing an important question in genomics: how to identify and quantify mRNA products of gene transcription (i.e., isoforms) from next-generation mRNA sequencing (RNA-Seq) data? We developed a statistical method called Sparse Linear modeling of RNA-Seq data for Isoform Discovery and abundance Estimation (SLIDE) that employs probabilistic modeling and L1 sparse estimation to answer this question. SLIDE takes exon boundaries and RNA-Seq data as input to discern the set of mRNA isoforms that are most likely to be present in an RNA-Seq sample. It is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation.
LDR 05294nam a2200325 4500
001 1964575
005 20141010092520.5
008 150210s2013 ||||||||||||||||| ||eng d
020 $a 9781303373770
035 $a (MiAaPQ)AAI3593901
035 $a AAI3593901
040 $a MiAaPQ $c MiAaPQ
100 1 $a Li, Jingyi. $3 1672761
245 1 0 $a Statistical and Computational Methods for Analyzing High-Throughput Genomic Data.
300 $a 113 p.
500 $a Source: Dissertation Abstracts International, Volume: 75-01(E), Section: B.
500 $a Includes supplementary digital materials.
500 $a Advisers: Peter J. Bickel; Haiyan Huang.
502 $a Thesis (Ph.D.)--University of California, Berkeley, 2013.
520 $a The first part of this thesis focuses on addressing an important question in genomics: how to identify and quantify mRNA products of gene transcription (i.e., isoforms) from next-generation mRNA sequencing (RNA-Seq) data? We developed a statistical method called Sparse Linear modeling of RNA-Seq data for Isoform Discovery and abundance Estimation (SLIDE) that employs probabilistic modeling and L1 sparse estimation to answer this question. SLIDE takes exon boundaries and RNA-Seq data as input to discern the set of mRNA isoforms that are most likely to be present in an RNA-Seq sample. It is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation.
520 $a The second part of this thesis demonstrates the power of simple statistical analysis in correcting biases of system-wide protein abundance estimates and in understanding the relationship between gene transcription and protein abundances. We found that proteome-wide surveys have significantly underestimated protein abundances, which differ greatly from previously published individual measurements. We corrected proteome-wide protein abundance estimates by using individual measurements of 61 housekeeping proteins, and then found that our corrected protein abundance estimates show a higher correlation and a stronger linear relationship with mRNA abundances than do the uncorrected protein data. To estimate the degree to which mRNA expression levels determine protein levels, it is critical to measure the error in protein and mRNA abundance data and to consider all genes, not only those whose protein expression is readily detected, a fact that previous proteome-wide surveys ignored. We took two independent approaches to re-estimate the percentage of the variance in protein abundances that mRNA levels explain. While the percentages estimated from the two approaches vary across different sets of genes, all suggest that previous proteome-wide surveys have significantly underestimated the importance of transcription.
520 $a In the third and final part, I will introduce a modENCODE (the Model Organism ENCyclopedia Of DNA Elements) project in which we compared developmental stages, tissues and cells (or cell lines) of Drosophila melanogaster and Caenorhabditis elegans, two well-studied model organisms in developmental biology. Understanding the similarity of gene expression patterns throughout their developmental time courses is an interesting and important question in comparative genomics and evolutionary biology. The availability of modENCODE RNA-Seq data for different developmental stages, tissues and cells of the two organisms enables a transcriptome-wide comparison study to address this question. We undertook a comparison of their developmental time courses and tissues/cells, seeking commonalities in orthologous gene expression. Our approach centers on using stage/tissue/cell-associated orthologous genes to link the two organisms. For every stage/tissue/cell in each organism, its associated genes are selected as the genes capturing specific transcriptional activities: genes highly expressed in that stage/tissue/cell but lowly expressed in a few other stages/tissues/cells. We aligned a pair of D. melanogaster and C. elegans stages/tissues/cells by a hypergeometric test, where the test statistic is the number of orthologous gene pairs associated with both stages/tissues/cells. The test is against the null hypothesis that the two stages/tissues/cells have independent sets of associated genes. We first carried out the alignment approach on pairs of stages/tissues/cells within D. melanogaster and C. elegans respectively, and the alignment results are consistent with previous findings, supporting the validity of this approach. When comparing fly with worm, we unexpectedly observed two parallel collinear alignment patterns between their developmental time courses and several interesting alignments between their tissues and cells. Our results are the first findings regarding a comprehensive comparison between D. melanogaster and C. elegans time courses, tissues and cells. (Abstract shortened by UMI.)
590 $a School code: 0028.
650 4 $a Biology, Biostatistics. $3 1018416
650 4 $a Biology, Bioinformatics. $3 1018415
650 4 $a Statistics. $3 517247
690 $a 0308
690 $a 0715
690 $a 0463
710 2 $a University of California, Berkeley. $b Biostatistics. $3 2101048
773 0 $t Dissertation Abstracts International $g 75-01B(E).
790 $a 0028
791 $a Ph.D.
792 $a 2013
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3593901
Items
Inventory Number:
W9259574
Location Name:
Electronic resource
Item Class:
11. Online viewing_V
Material type:
E-book
Call number:
EB
Usage Class:
Normal use
Loan Status:
On shelf
No. of reservations:
0