Views: 4136 | Replies: 11

Discussion: Is it necessary to convert RNASeq V2 normalized data to TPM?


Threads: 2 | Posts: 12 | Credits: 59 | Registered member, Rank: 2
Posted on 2017-6-7 09:42:14
After the conversion, does the data need to be normalized again?

Original poster | Posted on 2017-6-7 11:07:27
LUAD.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2016012800.0.0.tar.gz

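The archive contains both tumor and adjacent normal tissue samples.
Can it be used directly to identify genes differentially expressed between tumor and adjacent normal tissue?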
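
Because tumor and normal samples share one matrix, they have to be separated before any comparison. A minimal sketch, assuming the column headers are TCGA aliquot barcodes in which the first two digits of the fourth field are the sample-type code (01 = primary solid tumor, 11 = solid tissue normal, i.e. adjacent normal). The helper names and the example barcodes are hypothetical.

```python
from typing import Iterable, List, Tuple

PRIMARY_TUMOR = 1          # sample-type code "01": primary solid tumor
SOLID_TISSUE_NORMAL = 11   # sample-type code "11": solid tissue normal


def sample_type_code(barcode: str) -> int:
    """Return the two-digit sample-type code of a TCGA barcode.

    Assumes the layout TCGA-<site>-<participant>-<sample+vial>-...,
    e.g. "TCGA-AA-0001-01A-..." -> 1.
    """
    fields = barcode.split("-")
    if len(fields) < 4 or len(fields[3]) < 2 or not fields[3][:2].isdigit():
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    return int(fields[3][:2])


def split_tumor_normal(barcodes: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split barcodes into (primary tumor, solid tissue normal); other codes are skipped."""
    tumor, normal = [], []
    for b in barcodes:
        code = sample_type_code(b)
        if code == PRIMARY_TUMOR:
            tumor.append(b)
        elif code == SOLID_TISSUE_NORMAL:
            normal.append(b)
    return tumor, normal


if __name__ == "__main__":
    # Made-up barcodes for illustration only.
    cols = ["TCGA-AA-0001-01A-01R-0000-07", "TCGA-AA-0001-11A-01R-0000-07"]
    print(split_tumor_normal(cols))
```

Restricting to codes 01 and 11 keeps the comparison to primary tumor versus adjacent normal; a paired design would additionally match samples on the participant field (the third field).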

Original poster | Posted on 2017-6-7 11:10:19
RNASeq Version 2
Created by Pihl, Todd (NIH/NCI) [C], last modified on May 09, 2013
Description
RNASeq Version 2 is similar to RNASeq in that it uses sequencing data to determine gene expression levels. RNASeq Version 2 uses a different set of algorithms to determine the expression levels, and the results are presented in a slightly different set of files.



What quantification measure do the RNASeq Version 2 results use: RPKM, FPKM, or TPM?

Original poster | Posted on 2017-6-7 11:13:38
There are two analysis pipelines used to create Level 3 expression data from RNA Sequence data. The first approach used at TCGA relies on the RPKM method, while the second method uses MapSplice to do the alignment and RSEM to perform the quantitation.
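
For reference, the RPKM measure used by the first pipeline scales each gene's read count by its exon-model length (in kilobases) and by the sample's mapped reads (in millions). A minimal sketch of that standard formula; the function name is hypothetical, and defaulting the mapped-read total to the sum of the gene counts is a simplifying assumption (a pipeline may count all mapped reads instead).

```python
from typing import List, Optional, Sequence


def rpkm(counts: Sequence[float], lengths_bp: Sequence[float],
         total_mapped: Optional[float] = None) -> List[float]:
    """RPKM_i = count_i * 1e9 / (length_i * total_mapped_reads).

    counts: reads assigned to each gene; lengths_bp: exon-model length in bases.
    total_mapped defaults to sum(counts) (an assumption, see the lead-in).
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    if total_mapped is None:
        total_mapped = sum(counts)
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million mapped reads)
    return [c * 1e9 / (length * total_mapped) for c, length in zip(counts, lengths_bp)]


if __name__ == "__main__":
    # The 3 kb gene gets 3x the reads of the 1 kb gene, so both get the same RPKM.
    print(rpkm([100, 300], [1000, 3000]))
```

Because the denominator depends on each sample's total, one sample's RPKM values do not sum to a fixed number, which is the cross-sample comparability issue that the RSEM transcript-fraction measure is meant to address.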

Original poster | Posted on 2017-6-7 11:32:07
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
Bo Li and Colin N Dewey
BMC Bioinformatics 2011, 12:323
DOI: 10.1186/1471-2105-12-323
© Li and Dewey; licensee BioMed Central Ltd. 2011
Received: 10 May 2011 | Accepted: 4 August 2011 | Published: 4 August 2011


The primary output of RSEM consists of two files, one for isoform-level estimates, and the other for gene-level estimates. Abundance estimates are given in terms of two measures. The first is an estimate of the number of fragments that are derived from a given isoform or gene. We can only estimate this quantity because reads often do not map uniquely to a single transcript. This count is generally a non-integer value and is the expectation of the number of alignable and unfiltered fragments that are derived from an isoform or gene given the ML abundances. These (possibly rounded) counts may be used by a differential expression method such as edgeR [9] or DESeq [8]. The second measure of abundance is the estimated fraction of transcripts made up by a given isoform or gene. This measure can be used directly as a value between zero and one or can be multiplied by 10^6 to obtain a measure in terms of transcripts per million (TPM). The transcript fraction measure is preferred over the popular RPKM [18] and FPKM [6] measures because it is independent of the mean expressed transcript length and is thus more comparable across samples and species [7].
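
The conversions described in this passage can be written out directly. A minimal sketch with hypothetical function names: the transcript fraction times 10^6 gives TPM; for comparison, TPM can also be computed from expected counts and effective lengths with the textbook length-then-depth normalization (not RSEM's internal EM estimate), or by rescaling one sample's RPKM/FPKM values so that they sum to 10^6 (this assumes the same gene lengths were used for both).

```python
from typing import List, Sequence


def tpm_from_fraction(fractions: Sequence[float]) -> List[float]:
    """RSEM-style transcript fractions (each in [0, 1]) -> TPM."""
    return [f * 1e6 for f in fractions]


def tpm_from_counts(counts: Sequence[float], eff_lengths: Sequence[float]) -> List[float]:
    """Divide counts by effective length, then rescale so the sample sums to 1e6."""
    rates = [c / length for c, length in zip(counts, eff_lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def tpm_from_rpkm(rpkm_values: Sequence[float]) -> List[float]:
    """Rescale one sample's RPKM/FPKM values so that they sum to 1e6."""
    total = sum(rpkm_values)
    return [v / total * 1e6 for v in rpkm_values]


if __name__ == "__main__":
    sample = tpm_from_counts([100, 300, 50], [1000, 3000, 500])
    print(sample, sum(sample))  # a TPM sample sums to 1e6 (up to float rounding)
```

Since every TPM sample already sums to 10^6, the definition itself implies no further library-size scaling; whether an additional between-sample normalization (for composition effects) is wanted is a separate modelling choice.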


Which of these two measures is the data we download?

Original poster | Posted on 2017-6-7 12:16:20
How is the RSEM RNASeq V2 data on TCGA quantified: RPKM, FPKM, or TPM?

Original poster | Posted on 2017-6-7 12:42:44
Data derived from the sequencing of RNA is one of the sources of gene expression data collected by TCGA. Currently, the Level 3 data is created using two distinct methods. The original method followed the RPKM (Reads Per Kilobase of exon model per Million mapped reads) method of quantitation. The newer version 2 data (RNASeqV2, introduced in May 2012) used a combination of MapSplice and RSEM to determine expression levels. In the near future, this data will also be used to identify variants such as SNPs or indels.

Original poster | Posted on 2017-6-7 16:41:30
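Is there any problem with downloading the rnaseqv2__RSEM_genes_normalized data from TCGA and using wilcox.test directly to find genes differentially expressed between tumor and adjacent normal tissue?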
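
A per-gene version of this approach, as a minimal sketch: a two-sided Wilcoxon rank-sum test for each gene, followed by Benjamini-Hochberg adjustment across genes. It uses the normal approximation with tie and continuity correction (the formula R's `wilcox.test` uses with `exact = FALSE` or when ties are present), so for small samples without ties its p-values will differ from R's exact default. The dictionary layout and helper names are hypothetical. Because only ranks are used, applying log2(x + 1) or any other monotone transform to all values leaves the result unchanged, but per-sample scaling can reorder values within a gene, so the choice of normalization still matters.

```python
import math
from typing import Dict, List, Sequence, Tuple


def wilcoxon_rank_sum(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (W, p), where W = rank sum of x minus n_x(n_x + 1)/2, the statistic
    R's wilcox.test reports. Ties get average ranks; the variance is
    tie-corrected and a 0.5 continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one value")
    pooled = list(x) + list(y)
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        t = j - i + 1
        tie_term += t ** 3 - t
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the group
        i = j + 1
    n = n1 + n2
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = w - n1 * n2 / 2
    if var <= 0 or z == 0:
        return w, 1.0
    z = (z - math.copysign(0.5, z)) / math.sqrt(var)
    return w, min(1.0, math.erfc(abs(z) / math.sqrt(2)))


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order of p
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_genes(expr: Dict[str, Sequence[float]],
                       tumor_idx: Sequence[int],
                       normal_idx: Sequence[int]) -> Dict[str, Tuple[float, float]]:
    """expr maps gene -> values across samples (hypothetical layout).

    Returns gene -> (p, BH-adjusted p).
    """
    genes = list(expr)
    pvals = [wilcoxon_rank_sum([expr[g][i] for i in tumor_idx],
                               [expr[g][i] for i in normal_idx])[1] for g in genes]
    return {g: (p, q) for g, p, q in zip(genes, pvals, bh_adjust(pvals))}
```

For example, `differential_genes(expr, tumor_idx, normal_idx)` with the column indices of the tumor and adjacent normal samples returns raw and adjusted p-values per gene; a fold-change cutoff is usually applied alongside the adjusted p-value.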

Original poster | Posted on 2017-6-7 17:20:30
Hello, teacher. I had read the RSEM paper earlier: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-323




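How is the RSEM RNASeq V2 data on TCGA quantified: RPKM, FPKM, or TPM? Which of the two measures described in the image is the RSEM RNASeq V2 data obtained from TCGA: the number of fragments or the fraction of transcripts? Can both kinds of data be obtained from TCGA? And how is the fraction of transcripts converted to TPM?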
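I want to find differentially expressed genes. Is there any problem with downloading the rnaseqv2__RSEM_genes_normalized data from TCGA and using wilcox.test directly to find genes differentially expressed between tumor and adjacent normal tissue? The results I get this way do not quite agree with those from your website http://gepia.cancer-pku.cn/index.html, so I would like to ask about your method.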


That is quite a few questions. Could you help me work through them one at a time? Thank you.
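
On what the `RSEM_genes_normalized` values are: a description often given for TCGA's UNC pipeline is that they are RSEM expected counts scaled so that each sample's upper quartile equals 1000, with no gene-length term. Treat that as an assumption to check against TCGA's own documentation. Under that assumption the values are neither RPKM/FPKM nor TPM; TPM would instead come from the transcript-fraction output multiplied by 10^6. A minimal sketch of upper-quartile scaling; the function names, the type-7 quantile, and excluding zero counts from the quartile are all assumptions.

```python
import math
from typing import List, Sequence


def quantile_type7(sorted_vals: Sequence[float], p: float) -> float:
    """Linear-interpolation quantile of already sorted values (R's default type 7)."""
    h = (len(sorted_vals) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def upper_quartile_normalize(counts: Sequence[float], target: float = 1000.0,
                             drop_zeros: bool = True) -> List[float]:
    """Scale one sample so that its 75th percentile equals `target`.

    drop_zeros=True computes the quartile over non-zero counts only (assumption).
    """
    vals = sorted(c for c in counts if c > 0) if drop_zeros else sorted(counts)
    if not vals:
        raise ValueError("no counts available to compute an upper quartile")
    uq = quantile_type7(vals, 0.75)
    if uq <= 0:
        raise ValueError("upper quartile is zero; cannot scale this sample")
    return [c / uq * target for c in counts]


if __name__ == "__main__":
    print(upper_quartile_normalize([0, 10, 20, 30, 40, 50]))
    # [0.0, 250.0, 500.0, 750.0, 1000.0, 1250.0]
```

Because this scaling adjusts only for sequencing depth, a gene's normalized value still grows with transcript length, which is one reason such values cannot be read as TPM.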


使用道具 举报

2

主题

12

帖子

59

积分

注册会员

Rank: 2

积分
59
Original poster | Posted on 2017-6-7 17:36:48




生信技能树 (粤ICP备15016384号)

