搜索
查看: 1043|回复: 2

单细胞测序分析工具SC3文章图片绘制

[复制链接]

13

主题

28

帖子

178

积分

注册会员

Rank: 2

积分
178
发表于 2017-7-1 14:39:02 | 显示全部楼层 |阅读模式
SC3: consensusclustering of single-cellRNA-seq data

Single cell sequence,单细胞测序是一个今年来很热门的测序技术,其在探索生物过程、疾病机理等方面展现了前所未有的精度,通过对单细
胞进行DNA和RNAseq我们得以高精度的了解单个细胞水平的体细胞突变情况和基因的表达情况,并且对于一个样本中细胞类型的组成和解析获得空前的认识 。作者在用公开发表的数据上进行测试,SC3比5种(tSNE,PCA,SNN-Cliq , SINCERA  and SEURAT)目前存在的方法更优


SC3于2017年5月发表于Nature Methods


单细胞测序能够定量描述(quantitative characterization)细胞的类型(cell type)基于全局的转录组分析。Vladimir Yu Kiselev 发明单细胞共识聚类( single-cell consensus clustering)既SC3,whichachieves high accuracy and robustness by combining multipleclustering solutions through a consensus approach

原文链接:https://www.nature.com/nmeth/journal/v14/n5/pdf/nmeth.4236.pdf

R包下载:http://bioconductor.org/packages/release/bioc/html/SC3.html
里面有详细的PDF说明http://bioconductor.org/packages/release/bioc/manuals/SC3/man/SC3.pdf和网页教程http://bioconductor.org/packages/release/bioc/vignettes/SC3/inst/doc/my-vignette.html
作图所需代码和数据:https://github.com/hemberg-lab/SC3-paper-figures真是偶像 一个大写的

首先是SC3的工作流程和文章选取的数据


SC3 takes as input an expression matrix, M, in which columns correspond to cells and rows correspond to genes/transcripts
以行为细胞,列为基因的矩阵输入
接下来是五个步骤:[size=14.495px] Gene filter.Distance calculations[size=14.495px].Transformations.[size=14.495px] k[size=14.495px]-means[size=14.495px]. Consensus clustering

选取几个公共数据作为后续验证
RPKM, reads per kilobase of transcript per million mapped reads; RPM, reads per million mapped reads; FPKM, fragments per kilobase of
transcript per million mapped reads; TPM, transcripts per million mapped reads; UMI, unique molecular identifiers;

CPM counts per million可以看看这看看这几种测序结果的具体解释https://www.plob.org/article/9212.html

Fig1c . c) Eigenvector (d) values that achieve adjusted Rand index (ARI) > 0.95 on gold-standard datasets. Black vertical lines indicate the interval
d = 4–7% of N, showing high accuracy in the classification
[Python] 纯文本查看 复制代码
library(cowplot)
library(dplyr)
library(ggplot2)
options(stringsAsFactors = FALSE)#不把字符串string识别成factor

d <- read.csv("C:\\Users\\yang\\Desktop\\SC3-paper-figures-master\\data\\1c.csv")

d$Dataset <- factor(
  d$Dataset, 
  levels = c("Biase", "Yan", "Goolam", "Deng", "Pollen", "Kolodziejczyk")
)#设置因子,分组

cols <- c("Biase" = "#bc80bd", 
          "Yan" = "#ccebc5", 
          "Goolam" = "#ffed6f", 
          "Deng" = "#bebada", 
          "Pollen" = "#fb8072", 
          "Kolodziejczyk" = "#bf812d")

p <- ggplot(d, aes(d, fill = Dataset)) +
  geom_histogram(bins = 22)#做柱状图 
p+ scale_fill_manual(values = cols)#改颜色 
p+  geom_vline(xintercept = c(3.5,7.5))#添加参考线 
p+  labs(x = "d as % of N", y = "# of solutions with ARI > 95% of max.")#设置x,y轴名字 
P+  theme_classic(base_size = 14)#加注释


ggsave("pdf/1c.pdf", w = 6, h = 4.5)
ggsave("jpeg/1c.jpeg", w = 6, h = 4.5)


(d) 100 realizations of the SC3 clustering of the datasets in b
[Python] 纯文本查看 复制代码
library(cowplot)
library(dplyr)
setwd("C:\\Users\\yang\\Desktop\\SC3-paper-figures-master")
options(stringsAsFactors = FALSE)

gtable_select <- function (x, ...)
{
    matches <- c(...)
    x$layout <- x$layout[matches, , drop = FALSE]
    x$grobs <- x$grobs[matches]
    x
}

gtable_stack <- function(g1, g2){
    g1$grobs <- c(g1$grobs, g2$grobs)
    g1$layout <- transform(g1$layout, z= z-max(z), name="g2")
    g1$layout <- rbind(g1$layout, g2$layout)
    g1
}

d <- read.csv("data/1d.csv")

d[d$Dataset == "Kolodziejczyk", ]$Dataset <- "Kolodz."

cols <- c("Biase" = "#bc80bd", "Treutlein" = "#8dd3c7", "Ting" = "#ffffb3", 
          "Yan" = "#ccebc5", "Goolam" = "#ffed6f", "Deng" = "#bebada",
          "Pollen1" = "#fb8072", "Pollen1-TopHat" = "#fb8072",
          "Pollen2-TopHat" = "#fb8072", "Pollen2" = "#fb8072",
          "Patel" = "#80b1d3", "Usoskin1" = "#fdb462", "Usoskin2" = "#fdb462",
          "Usoskin3" = "#fdb462", "Petropoulos" = "#8dd3c7", "Kolodz." = "#bf812d",
          "Klein" = "#b3de69", "Zeisel" = "#fccde5")#颜色
d$Dataset <- factor(
    d$Dataset, 
    levels = c(
        "Biase", "Yan", "Goolam", "Deng", "Pollen1", "Pollen2", "Kolodz.",
        "Treutlein", "Ting", "Patel", "Usoskin1", "Usoskin2", "Usoskin3",
        "Klein", "Zeisel"
    )
)#分组,设置因子水平

colnames(d)[2] <- "Clustering"#修改列,第二列为列名
d$Clustering <- factor(d$Clustering, levels = c("Individual", "Consensus"))#再次分组

cols.clust <- c("Individual" = "#999999", "Consensus" = "#e41a1c")

d1 <- d %>%
    group_by(Clustering, Dataset) %>%
    dplyr::summarise(Median = median(ARI))#dplyr提供了一个分组函数group_by,把分组依据相同的数据组合成行,然后用summarise统计ARI的平均值

p <- ggplot(d, aes(x = 1, y = ARI, fill = Clustering, group = Clustering)) +
    geom_bar(data = d1, aes(y = Median), position="dodge", stat="identity", width = 0.1) +
    geom_point(position = position_jitterdodge(jitter.width = 0.1, dodge.width = 0.1), size = 0.4) +
    facet_wrap(ncol = 5, ~ Dataset) +
    scale_fill_manual(values = cols.clust) +
    geom_hline(yintercept = 0.8) +
    labs(x = "") +
    theme_classic(base_size = 14) +
    theme(axis.ticks.x = element_blank(), axis.text.x=element_blank(),
          axis.title.x=element_blank(), axis.line=element_blank()) +
    annotate("segment", x=-Inf, xend=Inf, y=-Inf, yend=-Inf, color = "black")+
    annotate("segment", x=-Inf, xend=-Inf, y=-Inf, yend=Inf, color = "black")

dummy <- ggplot(d, aes(x = 1, y = ARI, fill = Clustering)) +
    facet_wrap(ncol = 5, ~ Dataset) +
    geom_rect(aes(fill = Dataset), xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf) +
    scale_fill_manual(values = cols) +
    theme_minimal()#根据Dataset绘制颜色

g1 <- ggplotGrob(p)
g2 <- ggplotGrob(dummy)

panels <- grepl(pattern="panel", g2$layout$name)
strips <- grepl(pattern="strip-t", g2$layout$name)
g2$layout$t[panels] <- g2$layout$t[panels] - 1
g2$layout$b[panels] <- g2$layout$b[panels] - 1

new_strips <- gtable_select(g2, panels | strips)#提取dummy的布局和颜色

## ideally you'd remove the old strips, for now they're just covered
new_plot <- gtable_stack(g1, new_strips)#合并

plot_grid(new_plot)

ggsave("pdf/1d.pdf", w = 6.3, h = 4.5)
ggsave("jpeg/1d.jpeg", w = 6.3, h = 4.5)
这里有个介绍dplyr包的博客,这绘图用了不少dplyr包的东西http://www.cnblogs.com/nxld/p/6060534.html

ggplot2的具体函数,个人觉得这个介绍的比较好http://www.cnblogs.com/nxld/p/6059603.html


Benchmarking of SC3 against existing methods
这个图是一次性出来的(图a,c,d),代码比较复杂,尝试强行介绍下
[Python] 纯文本查看 复制代码
library(cowplot)
library(pheatmap)
library(dplyr)

set.seed(12323414)
options(stringsAsFactors = FALSE)
font_size <- 8

gtable_select <- function (x, ...)
{
    matches <- c(...)
    x$layout <- x$layout[matches, , drop = FALSE]
    x$grobs <- x$grobs[matches]
    x
}

gtable_stack <- function(g1, g2){
    g1$grobs <- c(g1$grobs, g2$grobs)
    g1$layout <- transform(g1$layout, z= z-max(z), name="g2")
    g1$layout <- rbind(g1$layout, g2$layout)
    g1
}
#A图和之前的图绘制方法基本相似
get_a <- function() {
    d <- read.csv("data/2a.csv")
    
    d[d$Hierarchy == "Kolodziejczyk", ]$Hierarchy <- "Kolodz."
    
    d$Hierarchy <- factor(
        d$Hierarchy, 
        levels = c("Biase", "Yan", "Goolam", "Deng", "Pollen1", "Pollen2", 
                   "Kolodz.", "Treutlein", "Ting",
                    "Patel", "Usoskin1", "Usoskin2", "Usoskin3",
                    "Klein", "Zeisel")
    )
    
    d$Method <- factor(
        d$Method, 
        levels = c("SC3", "tSNE+kmeans", "pcaReduce", "SNN-Cliq", "SINCERA", "SEURAT")
    )
    
    cols <- c("Biase" = "#bc80bd", "Treutlein" = "#8dd3c7", "Ting" = "#ffffb3", 
              "Yan" = "#ccebc5", "Goolam" = "#ffed6f", "Deng" = "#bebada",
              "Pollen1" = "#fb8072", "Pollen2" = "#fb8072",
              "Patel" = "#80b1d3", "Usoskin1" = "#fdb462", "Usoskin2" = "#fdb462",
              "Usoskin3" = "#fdb462", "Kolodz." = "#bf812d",
              "Klein" = "#b3de69", "Zeisel" = "#fccde5", "Macosko" = "#d9d9d9")
    
    meth_cols <- c(
        "SC3" = "#e41a1c", 
        "tSNE+kmeans" = "#377eb8", 
        "pcaReduce" = "#40E0D0", 
        "SNN-Cliq" = "#984ea3", 
        "SINCERA" = "#ff7f00", 
        "SEURAT" = "#ffff33"
    )
    
    d1 <- d %>%
        group_by(Method, Hierarchy) %>%
        dplyr::summarise(Median = median(ARI))
    
    p <- ggplot(d, aes(x = 1, y = ARI, fill = Method, group = Method)) +
        geom_bar(data = d1, aes(y = Median), position="dodge", stat="identity") +
        geom_point(position = position_jitterdodge(jitter.width = 0.45, dodge.width = 0.9), size = 0.4) +
        facet_wrap(ncol = 5, ~ Hierarchy) +
        scale_fill_manual(values = meth_cols) +
        scale_colour_manual(values = meth_cols) +
        geom_hline(yintercept = 0.8) +
        labs(x = "") +
        theme_classic(base_size = font_size) +
        theme(axis.ticks.x = element_blank(), axis.text.x=element_blank(),
              axis.title.x=element_blank(), axis.line=element_blank(),
              legend.key.size = unit(0.4, "cm")) +
        annotate("segment", x=-Inf, xend=Inf, y=-Inf, yend=-Inf, color = "black")+
        annotate("segment", x=-Inf, xend=-Inf, y=-Inf, yend=Inf, color = "black")
    
    
    dummy <- ggplot(d, aes(x = 1, y = ARI, fill = Method)) +
        facet_wrap(ncol = 5, ~ Hierarchy) +
        geom_rect(aes(fill = Hierarchy), xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf) +
        scale_fill_manual(values = cols) +
        theme_minimal()
    
    g1 <- ggplotGrob(p)
    g2 <- ggplotGrob(dummy)
    
    panels <- grepl(pattern="panel", g2$layout$name)
    strips <- grepl(pattern="strip-t", g2$layout$name)
    g2$layout$t[panels] <- g2$layout$t[panels] - 1
    g2$layout$b[panels] <- g2$layout$b[panels] - 1
    
    new_strips <- gtable_select(g2, panels | strips)
    
    new_plot <- gtable_stack(g1, new_strips)
    return(new_plot)
}

get_c <- function() {
    cols <- c("Treutlein" = "#8dd3c7", "Ting" = "#ffffb3", "Deng" = "#bebada",
              "Pollen2" = "#fb8072", "Patel" = "#80b1d3", 
              "Kolodziejczyk" = "#bf812d", "Usoskin3" = "#fdb462",
              "Klein" = "#40E0D0", "Zeisel" = "#fccde5", "Macosko" = "#d9d9d9")
    
    d <- read.csv("data/2c.csv")
    
    d$Dataset <- factor(
        d$Dataset,
        levels = c(
            "Deng",
            "Pollen2",
            "Kolodziejczyk",
            "Patel",
            "Usoskin3",
            "Klein",
            "Zeisel",
            "Macosko"
        )
    )
    
    d$Fraction <- factor(
        d$Fraction,
        levels = sort(unique(as.numeric(d$Fraction)))
    )
    
    p <- ggplot(d, aes(x = 1, ARI, fill = Dataset, color = Dataset)) +
        geom_boxplot(position = position_dodge(width = 1.5), outlier.size = 0.8) +
        geom_hline(yintercept = 0.8) +
        labs(x = "# of training cells as % of N", y = "ARI") +
        scale_fill_manual(values = cols) +
        scale_colour_manual(values = cols) +
        facet_grid(. ~ Fraction) +
        theme_classic(base_size = font_size) +
        theme(axis.ticks.x = element_blank(), axis.text.x=element_blank(),
              axis.title.x=element_blank(), axis.line=element_blank(),
              strip.background = element_rect(colour = "white"),
              legend.key.size = unit(0.4, "cm")) +
        ylim(0,1) +
        annotate("segment", x=-Inf, xend=Inf, y=-Inf, yend=-Inf, color = "black")+
        annotate("segment", x=-Inf, xend=-Inf, y=-Inf, yend=Inf, color = "black")
    p <- ggdraw(p) + 
        draw_label("% of total # of cells\nin a training set", 
                   fontface = "bold",
                   size = font_size-3,
                   x = 0.87, y = 0.93)
    return(p)
}

#热图,用pheatmap绘制[url]ftp://cran.r-project.org/pub/R/web/packages/pheatmap/pheatmap.pdf[/url]
get_d <- function() {
    d <- readRDS("data/2d.rds")
    ann <- data.frame(Stage = factor(d$cell.names, levels = unique(d$cell.names)))
    anno_colors <- list(Stage = c("#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C",
                                  "#FB9A99", "#FF00FF", "#FDBF6F", "#FF7F00", 
                                  "#CAB2D6", "#6A3D9A"))
    names(anno_colors$Stage) <- levels(ann$Stage)
    dat <- d$consensus
    colnames(dat) <- d$cell.names
    write.csv(dat[d$hc$order, d$hc$order], file = "data/2d.csv", quote = FALSE, row.names = FALSE)
    p <- pheatmap(d$consensus,
             cluster_rows = d$hc,
             cluster_cols = d$hc,
             cutree_rows = 10,
             cutree_cols = 10,
             treeheight_col = 9,
             treeheight_row = 9,
             annotation_col = ann,
             annotation_colors = anno_colors,
             show_rownames = F,
             show_colnames = F,
             fontsize = font_size,
             annotation_names_col = F,
             silent = TRUE)
    return(p$gtable)
}
#调整布局
first_col <- plot_grid(get_a(), get_c(), nrow = 2, labels = c("a", "c"), rel_heights = c(2, 1))#Arrange multiple plots into a grid,第一列图

second_col <- plot_grid(NULL, get_d(), nrow = 2, labels = c("b", "d"), rel_heights = c(1.5, 1))#第二列

plot_grid(first_col, second_col, ncol = 2)

ggsave("jpeg/2.jpeg", w = 9, h = 6)
ggsave("pdf/2.pdf", w = 9, h = 6)



图3是聚类基因的热图,用的还是pheatmap。这是一个很好的画热图的包,可以ftp://cran.r-project.org/pub/R/web/packages/pheatmap/pheatmap.pdf看看他的命名跑一遍,也可以看看群主的博客
代码和前面和相似
[Python] 纯文本查看 复制代码
library(pheatmap)
library(cowplot)

options(stringsAsFactors = FALSE)
font_size <- 8

d <- readRDS("data/3.rds")

dat <- d$exprs
write.csv(dat[, d$hc$order], file = "data/3.csv", quote = FALSE, row.names = FALSE)

anno_colors <- list(Cluster = c("#e41a1c", "#984ea3", "#40E0D0", "#FFFF33"))
names(anno_colors$Cluster) <- levels(d$ann$Cluster)

p <- pheatmap(d$exprs,
         cluster_cols = d$hc,
         cluster_rows = F,
         cutree_cols = 3,
         gaps_row = c(10, 20),
         annotation_col = d$ann,
         annotation_colors = anno_colors,
         show_colnames = F,
         silent = T)

plot_grid(p$gtable, nrow = 1)

ggsave("jpeg/3.jpeg", w = 9, h = 6)
ggsave("pdf/3.pdf", w = 9, h = 6)




本帖子中包含更多资源

您需要 登录 才可以下载或查看,没有帐号?立即注册

x



上一篇:lncRNA鉴定中CNCI的使用
下一篇:【免疫组测序-2】免疫组测序背景介绍(免疫组库)
回复

使用道具 举报

0

主题

26

帖子

197

积分

注册会员

Rank: 2

积分
197
发表于 2017-9-7 21:30:25 | 显示全部楼层
谢谢楼主!!!
回复

使用道具 举报

2

主题

41

帖子

331

积分

中级会员

Rank: 3Rank: 3

积分
331
发表于 2017-10-12 00:26:50 | 显示全部楼层
谢谢楼主分享~
单细胞测序也是生物信息学发展的一个重要方向
回复 支持 反对

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

QQ|手机版|小黑屋|生信技能树    

GMT+8, 2017-12-18 05:39 , Processed in 0.174766 second(s), 30 queries .

Powered by Discuz! X3.2

© 2001-2013 Comsenz Inc.