搜索
查看: 5437|回复: 3

circos画图中snp dinsity数据准备

[复制链接]

6

主题

23

帖子

201

积分

中级会员

Rank: 3Rank: 3

积分
201
发表于 2017-3-20 17:41:15 | 显示全部楼层 |阅读模式
本帖最后由 qin_qinyang 于 2017-3-20 17:45 编辑

在circos画图中,突变的数据经常会以SNP密度来表示该样本在染色体上的突变情况。
例如:图中所示。
现在就根据bed文件,计算每MB的SNP密度。
方法1:根据ggplot
[AppleScript] 纯文本查看 复制代码
library(ggplot2)
genes <- read.table("test1.txt",sep="\t",header=T)
head(genes)
#   chr   stard     end
#1 chr1 1607614 1607614
#2 chr1 2332350 2332350
#3 chr1 3703465 3703465
#4 chr1 3746444 3746444
#5 chr1 6504544 6504544
#6 chr1 6529135 6529135
genes$chr <- with(genes, factor(chr, levels=paste("chr",c(1:22,"X","Y"),sep=""), ordered=TRUE))
plottedGenes <- ggplot(genes) + geom_histogram(aes(x=pos),binwidth=1000000) + facet_wrap(~chr,ncol=2) + ggtitle("RefSeq genes density over human genome 19") + xlab("Genomic position (bins 1 Mb)") + ylab("Number of genes")
allPlottingData <- ggplot_build(plottedGenes)
df <- allPlottingData[["data"]][[1]]

写出数据就直接有dinsity
方法2:在github中找到的参数:
[AppleScript] 纯文本查看 复制代码
knownCanonical = read.delim("test1.txt")

# Measure gene density per megabase
interval = as.numeric(100000)
####################################
# print(ls.str())
chromosomes = unique(knownCanonical[,1])
# Remove mitochondrial chromosome (chrM) and anything with underscores
chromosomes = grep("_", chromosomes, invert=TRUE, value=TRUE)
chromosomes = grep("chrM", chromosomes, invert=TRUE, value=TRUE)
# a = sapply(chromosomes, function(chr){
#         ind = which(knownCanonical[,1] = chr)
#         })
outFile="outFile.txt"
write("#chr\tchrStart\tchrEnd\tdensity", file=outFile, append=FALSE)
for( chr in chromosomes ){
  ind = which(knownCanonical[,1] == chr)
  chrMin = min(knownCanonical[ind,2])
  chrMax = max(knownCanonical[ind,3])
  numIntervals = ceiling((chrMax-chrMin)/interval)
  for( i in 1:numIntervals ){
    # Get the value of the start and end positions on the chromosome
    # for this interval
    thisIntervalMin = chrMin + (i-1)*interval
    thisIntervalMax = min(chrMin + i*interval-1, chrMax) 
    # Get the indices of genes in this interval
    thisIntervalInd = which((knownCanonical[,2] >= thisIntervalMin) &
                              (knownCanonical[,3] < thisIntervalMax))
    
    # Get the density of this interval
    thisIntervalDensity = length(thisIntervalInd)/(
      thisIntervalMax - thisIntervalMin)
    
    write.table(matrix(c(chr, thisIntervalMin, thisIntervalMax, 
                         thisIntervalDensity), nrow=1), file=outFile, append=TRUE,sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE)
  }}
    


3.方法2:利用bedmap
确保文件中没有^M
替换命令:sed -i 's/\r//' hg19.chromSizes.bed

[AppleScript] 纯文本查看 复制代码
sort-bed hg19.chromSizes.bed > hg19.chromSizes.sort.bed
bedops --chop 1000000  hg19.chromSizes.sort.bed |bedmap --echo --count --delim '\t' -  test.bed

三种方法的结果略有出入,有感兴趣的同学可以看看造成差异的原因。

本帖子中包含更多资源

您需要 登录 才可以下载或查看,没有帐号?立即注册

x



上一篇:Pfam数据库蛋白编码能力预测
下一篇:wgcna流程代码
回复

使用道具 举报

0

主题

3

帖子

47

积分

新手上路

Rank: 1

积分
47
发表于 2017-3-20 19:43:47 | 显示全部楼层
在circos 中用conf文件也能实现,有一例子:
CIRCOS.CONF
<<include etc/colors_fonts_patterns.conf>>

<<include ideogram.conf>>
<<include ticks.conf>>

<image>
<<include etc/image.conf>>
</image>

karyotype = data/karyotype/karyotype.human.txt

chromosomes_units           = 1000000
chromosomes                 = -hsX;-hsY
chromosomes_display_default = yes

<plots>

# Data out of bounds should be hidden. Otherwise the
# default is to clip the data to range min/max.
range = hide

# scatter plot for values [-3,0]
<plot>
type = scatter
file = data/8/13/data.cnv.txt
r0   = 0.6r
r1   = 0.75r
min  = -3
max  = 0
glyph = circle
glyph_size = 8
color = red

<axes>
<axis>
color     = lred
thickness = 2
spacing   = 0.1r
</axis>
</axes>

<backgrounds>
<background>
color = vlred_a5
</background>
</backgrounds>

<rules>
<rule>
condition  = 1
glyph_size = eval( 6 + 4*abs(var(value)))
flow       = continue
</rule>
<rule>
condition  = var(value) < -2
stroke_color = black
stroke_thickness = 2
</rule>
</rules>
</plot>

# scatter plot for values [0,3]
<plot>
type = scatter
file = data/8/13/data.cnv.txt
r0   = 0.75r
r1   = 0.9r
min  = 0
max  = 3
glyph = circle
glyph_size = 8
color = green

<axes>
<axis>
color     = lgreen
thickness = 2
spacing   = 0.1r
</axis>
</axes>

<backgrounds>
<background>
color = vlgreen_a5
</background>
</backgrounds>

<rules>
<rule>
condition  = 1
glyph_size = eval( 6 + 4*abs(var(value)))
flow       = continue
</rule>
<rule>
condition    = var(value) > 2
stroke_color = black
stroke_thickness = 2
</rule>
</rules>

</plot>

# scatter plot for values [-3,3] turned into a heat map
# by using r0=r1
<plot>
type = scatter
file = data/8/13/data.cnv.txt
r0   = 0.935r
r1   = 0.935r
min  = -3
max  = 0
glyph = square
glyph_size = 8
fill_color = red

<rules>
<rule>
condition  = 1
fill_color = eval( "red_a" . remap_int(var(value),-3,3,1,5) )
</rule>
</rules>

</plot>

# scatter plot for values [0,3] turned into a heat map
# by using r0=r1
<plot>
type = scatter
file = data/8/13/data.cnv.txt
r0   = 0.955r
r1   = 0.955r
min  = 0
max  = 3
glyph = square
glyph_size = 8
fill_color = green

<rules>
<rule>
condition  = 1
fill_color = eval( "green_a" . remap_int(var(value),0,3,1,5) )
</rule>
</rules>

</plot>

</plots>

<<include etc/housekeeping.conf>>


BANDS.CONF

show_bands            = yes
fill_bands            = yes
band_stroke_thickness = 2
band_stroke_color     = white
band_transparency     = 3

IDEOGRAM.CONF


<ideogram>

<spacing>
default = 0.005r
break   = 0.5r

axis_break_at_edge = yes
axis_break         = yes
axis_break_style   = 2

<break_style 1>
stroke_color = black
fill_color   = blue
thickness    = 0.25r
stroke_thickness = 2
</break>

<break_style 2>
stroke_color     = black
stroke_thickness = 2
thickness        = 1.5r
</break>

</spacing>

<<include ideogram.position.conf>>
<<include ideogram.label.conf>>
<<include bands.conf>>

</ideogram>


IDEOGRAM.LABEL.CONF

show_label       = yes
label_font       = default
label_radius     = dims(image,radius) - 50p
label_size       = 36
label_parallel   = yes
label_case       = lower
label_format     = eval(sprintf("chr%s",var(label)))


IDEOGRAM.POSITION.CONF

radius           = 0.90r
thickness        = 30p
fill             = yes
fill_color       = black
stroke_thickness = 2
stroke_color     = black

TICKS.CONF


show_ticks          = yes
show_tick_labels    = yes

<ticks>
tick_separation      = 3p
label_separation     = 5p
radius               = dims(ideogram,radius_outer)
multiplier           = 1e-6
color          = black
size           = 20p
thickness      = 4p
label_offset   = 5p
format         = %d

<tick>
spacing        = 1u
show_label     = yes
label_size     = 16p
</tick>

<tick>
spacing        = 5u
show_label     = yes
label_size     = 18p
</tick>

<tick>
spacing        = 10u
show_label     = yes
label_size     = 20p
</tick>

<tick>
spacing        = 20u
show_label     = yes
label_size     = 24p
</tick>

</ticks>
回复 支持 反对

使用道具 举报

0

主题

3

帖子

107

积分

注册会员

Rank: 2

积分
107
发表于 2017-10-29 09:23:18 | 显示全部楼层
有个问题,请问能指导一下吗?能加您的qq吗,我的“276398980”
回复 支持 反对

使用道具 举报

13

主题

44

帖子

522

积分

高级会员

Rank: 4

积分
522
发表于 2018-1-30 10:18:16 | 显示全部楼层
请问这个图怎么看啊,不太懂?
回复 支持 反对

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

QQ|手机版|小黑屋|生信技能树 ( 粤ICP备15016384号  

GMT+8, 2019-8-23 07:21 , Processed in 0.030777 second(s), 26 queries .

Powered by Discuz! X3.2

© 2001-2013 Comsenz Inc.