搜索
查看: 7313|回复: 3

[mRNA-seq] wgcna流程代码

[复制链接]

29

主题

131

帖子

1208

积分

金牌会员

Rank: 6Rank: 6

积分
1208
发表于 2017-3-20 21:58:37 | 显示全部楼层 |阅读模式
本帖最后由 anlan 于 2017-3-21 10:27 编辑

这里分享一个个人感觉很完整的wgcna流程代码,一般文章或者研究用到的方法以及作图都有。链接:http://www.stat.wisc.edu/~yandell/statgen/ucla/WGCNA/wgcna.html


这个教程提供了表达矩阵、表型数据以及注释信息文件等等,我们一般只要前两者就能完成一般的wgcna的流程了
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Book/MouseData.zip

然后我按照自己的理解简化下代码,图就不贴了,看链接里面即可

1. 读入数据
[AppleScript] 纯文本查看 复制代码
suppressMessages(library(WGCNA))
options(stringsAsFactors = FALSE)
femData = read.csv(file.path("C:/Users/anlan/Desktop/MouseData", "LiverFemale3600.csv"))

2. 清理数据,形成行名是gene,列名是样品名的形式,这就是表达矩阵的格式
[AppleScript] 纯文本查看 复制代码
datExprFemale = as.data.frame(t(femData[, -c(1:8)]))
names(datExprFemale) = femData$substanceBXH
rownames(datExprFemale) = names(femData)[-c(1:8)]
3. 读入表型数据,提取表型,排序,最后的datTraits 就是我们需要的表型的格式
[AppleScript] 纯文本查看 复制代码
# Order the rows of allTraits so that they match those of datExprFemale:
Mice = rownames(datExprFemale)
traitRows = match(Mice, allTraits$Mice)
datTraits = allTraits[traitRows, -1]
rownames(datTraits) = allTraits[traitRows, 1]
# show that row names agree
table(rownames(datTraits) == rownames(datExprFemale))

4. 确定soft thresholding值,主要参数有networkType、powers,其他参数默认即可
[AppleScript] 纯文本查看 复制代码
# Choose a set of soft thresholding powers
powers = c(1:10)  # in practice this should include powers up to 20.
# choose power based on SFT criterion
sft = pickSoftThreshold(datExprFemale, powerVector = powers)
#sft=pickSoftThreshold(datExprFemale,powerVector=powers, networkType = "signed")
# Plot the results:
par(mfrow = c(1, 2))
# SFT index as a function of different powers
plot(sft$fitIndices[, 1], -sign(sft$fitIndices[, 3]) * sft$fitIndices[, 2], 
     xlab = "Soft Threshold (power)", ylab = "SFT, signed R^2", type = "n", main = paste("Scale independence"))
text(sft$fitIndices[, 1], -sign(sft$fitIndices[, 3]) * sft$fitIndices[, 2], 
     labels = powers, col = "red")
# this line corresponds to using an R^2 cut-off of h
abline(h = 0.9, col = "red")
# Mean connectivity as a function of different powers
plot(sft$fitIndices[, 1], sft$fitIndices[, 5], type = "n", xlab = "Soft Threshold (power)", 
     ylab = "Mean Connectivity", main = paste("Mean connectivity"))
text(sft$fitIndices[, 1], sft$fitIndices[, 5], labels = powers, col = "red")

5. 通过dynamic tree cutting分割模块,里面的参数百度即可
[AppleScript] 纯文本查看 复制代码
mergingThresh = 0.25
net = blockwiseModules(datExprFemale, corType = "pearson", maxBlockSize = 5000, 
                       networkType = "unsigned", power = 7, minModuleSize = 30, mergeCutHeight = mergingThresh, 
                       numericLabels = TRUE, saveTOMs = TRUE, pamRespectsDendro = FALSE, saveTOMFileBase = "femaleMouseTOM")
moduleLabelsAutomatic = net$colors
# Convert labels to colors for plotting
moduleColorsAutomatic = labels2colors(moduleLabelsAutomatic)
# A data frame with module eigengenes can be obtained as follows
MEsAutomatic = net$MEs

6. 选择表型中的weight数据,然后跟表达矩阵关联,获得GS值
[AppleScript] 纯文本查看 复制代码
# this is the body weight
weight = as.data.frame(datTraits$weight_g)
names(weight) = "weight"
# Next use this trait to define a gene significance variable
GS.weight = as.numeric(cor(datExprFemale, weight, use = "p"))
# This translates the numeric values into colors
GS.weightColor = numbers2colors(GS.weight, signed = T)
blocknumber = 1
datColors = data.frame(moduleColorsAutomatic, GS.weightColor)[net$blockGenes[[blocknumber]],]
# Plot the dendrogram and the module colors underneath
plotDendroAndColors(net$dendrograms[[blocknumber]], colors = datColors, groupLabels = c("Module colors", 
                                                                                        "GS.weight"), dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05)
7. 将模块与weight表型关联,代码里面是基本的R语言
[AppleScript] 纯文本查看 复制代码
# Choose a module assignment
moduleColorsFemale = moduleColorsAutomatic
# Define numbers of genes and samples
nGenes = ncol(datExprFemale)
nSamples = nrow(datExprFemale)
# Recalculate MEs with color labels
MEs0 = moduleEigengenes(datExprFemale, moduleColorsFemale)$eigengenes
MEsFemale = orderMEs(MEs0)
modTraitCor = cor(MEsFemale, datTraits, use = "p")
modTraitP = corPvalueStudent(modTraitCor, nSamples)
textMatrix = paste(signif(modTraitCor, 2), "\n(", signif(modTraitP, 1), ")", 
                   sep = "")
dim(textMatrix) = dim(modTraitCor)
par(mar = c(6, 8.5, 3, 3))
# Display the correlation values within a heatmap plot
labeledHeatmap(Matrix = modTraitCor, xLabels = names(datTraits), yLabels = names(MEsFemale), 
               ySymbols = names(MEsFemale), colorLabels = FALSE, colors = greenWhiteRed(50), 
               textMatrix = textMatrix, setStdMargins = FALSE, cex.text = 0.5, zlim = c(-1, 1), 
               main = paste("Module-trait relationships"))
8. Gene Significance与特定模块的关联(要先算每个基因在每个模块的KME值以及上面计算的GS值)
[AppleScript] 纯文本查看 复制代码
datKME = signedKME(datExprFemale, MEsFemale)
colorOfColumn = substring(names(datKME), 4)
par(mfrow = c(2, 2))
selectModules = c("blue", "brown", "black", "grey")
par(mfrow = c(2, length(selectModules)/2))
for (module in selectModules) {
  column = match(module, colorOfColumn)
  restModule = moduleColorsFemale == module
  verboseScatterplot(datKME[restModule, column], GS.weight[restModule], 
                     xlab = paste("Module Membership ",module, "module"), 
                     ylab = "GS.weight", main = paste("kME.", module, "vs. GS"), col = module)
}
9. 然后导出模块中TOM值的数据,导入cytoscape作图
10. 后续对感兴趣的gene注释下生物功能就行
上述代码是我从那篇流程中挑出来的,大家可以自行看原版即可

还有一个推荐文献,如何用wgcna分析无表型数据的文章,可以给大家作为参考
Floral Transcriptomes in Woodland Strawberry Uncover Developing Receptacle and Anther Gene Networks1
分析流程跟有表型的差不多,唯一要注意的是:
第一:应该是没有GS这样跟表型相关的定义了
第二:它是自定义表型矩阵,类似于单位矩阵,因此它找出来的模块应该是相对其他模块是显著上调或者下调在一个样本中
第三:它不是跟表型相关联,主要是根据模块特征值来选择表达特异的模块

以上属于自己的理解,有错误的地方大家可以给予纠正。最后附上wgcna tutorials ,里面有详细的R代码以及模拟数据,也可以照着跑一跑
https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/index.html






上一篇:circos画图中snp dinsity数据准备
下一篇:footprintDB是目前为止我看到的最全的转录因子数据库
回复

使用道具 举报

4

主题

18

帖子

161

积分

注册会员

Rank: 2

积分
161
发表于 2017-9-8 22:03:23 | 显示全部楼层
请问如果没有表型数据,怎么计算module与样本的相关性呢?
回复 支持 1 反对 0

使用道具 举报

1

主题

43

帖子

463

积分

中级会员

Rank: 3Rank: 3

积分
463
发表于 2017-3-22 10:54:50 | 显示全部楼层
http://pages.stat.wisc.edu/~yandell/statgen/ucla/WGCNA/
那个网站我打不开 然后我找到了这个
人生若只如初见!
回复 支持 反对

使用道具 举报

29

主题

131

帖子

1208

积分

金牌会员

Rank: 6Rank: 6

积分
1208
 楼主| 发表于 2017-3-26 00:46:40 | 显示全部楼层
y461650833y 发表于 2017-3-22 10:54
http://pages.stat.wisc.edu/~yandell/statgen/ucla/WGCNA/
那个网站我打不开 然后我找到了这个 ...

确实有时能打开,有时不行
你找到的那个跟我发的网址里面的内容是一样的~~
回复 支持 反对

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

QQ|手机版|小黑屋|生信技能树 ( 粤ICP备15016384号  

GMT+8, 2019-8-22 22:33 , Processed in 0.037343 second(s), 28 queries .

Powered by Discuz! X3.2

© 2001-2013 Comsenz Inc.