
[ChIP-seq] Differences between the three peak types: broad, narrow, and gapped

Posted on 2016-11-25 16:38:17
Beginners will surely be interested in this question: https://www.biostars.org/p/159925/

Many publicly available data sets (Roadmap Epigenomics data in particular) have three kinds of peaks, based on the different parameters used for peak calling in MACS2: "narrow" peaks, "broad" peaks, and "gapped" peaks. I was wondering how these three types of peaks differ from each other and, when doing downstream analysis with this kind of peak data, whether it would be OK to use bedtools to merge these peak regions and use the merged set instead of running the analysis separately on each peak type.


You can find more info on the UCSC site and forum.
narrowPeaks are generally called for TFs, since the bound region is fairly limited. broadPeaks are better for histone modifications or histone modifiers, since the enriched regions can be much wider. I'm not totally sure about gapped peaks, but it looks like you can use them when combining regions that contain gaps (yeah, intuitive...).
And yes, you can use them for downstream analysis.
Why would you want to merge them, though? That can give you gigantic peaks with basically no information. If you are looking at the co-binding of multiple TFs or histone marks, then I'd go for overlaps.
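To make the merge-versus-overlap point concrete, here is a minimal sketch in plain Python. It only imitates the spirit of `bedtools merge` and `bedtools intersect` on a single chromosome; the helper names and all coordinates are made up for illustration and are not taken from any real data set.

```python
from typing import List, Tuple

# Half-open [start, end) interval on one chromosome, BED-style.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Union overlapping or touching intervals (roughly what `bedtools merge` does)."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the last merged interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the segments shared by two interval sets (roughly `bedtools intersect`)."""
    shared: List[Interval] = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo < hi:
                shared.append((lo, hi))
    return sorted(shared)


# Hypothetical peaks: two sharp narrow peaks and one wide broad peak.
narrow = [(1000, 1200), (5000, 5150)]
broad = [(900, 5200)]

merged_regions = merge(narrow + broad)    # one large region swallowing both narrow peaks
shared_regions = overlaps(narrow, broad)  # the two narrow peaks survive as separate hits

print(merged_regions)
print(shared_regions)
```

In this toy case, merging collapses everything into the single broad region, so the positions of the narrow peaks are lost, while the overlap keeps them.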
We used MACS2 to identify three types of regions of enrichment: (i) narrow peaks of contiguous enrichment (narrowPeaks) that pass a Poisson P value threshold of 0.01; (ii) broader regions of enrichment (broadPeaks) that pass a Poisson P value threshold of 0.1 (using MACS2's broad peak mode); and (iii) gapped/chained regions of enrichment (gappedPeaks) defined as broadPeaks that contain at least one strong narrowPeak. To obtain reliable regions of enrichment, we restricted our analysis to enriched regions identified using pooled data that were also independently identified in both pseudoreplicates. The coverage and conservation analysis only used histone modification datasets from the Broad Institute Production group.

We used the gappedPeak representation for the histone marks with relatively compact enrichment patterns. These include H3K4me3, H3K4me2, H3K4me1, H3K9ac, H3K27ac, and H2A.Z. For the diffused histone marks, H3K36me3, H3K79me2, H3K27me3, H3K9me3, and H3K9me1, we used the broadPeak representation. These peak calls were not optimally thresholded by design to allow for analysis of genomic coverage over a wide range of signal enrichment. Additional details and step-by-step instructions are provided at https://sites.google.com/site/anshulkundaje/projects/encodehistonemods.

The gappedPeak and broadPeak files can be downloaded from www.broadinstitute.org/~anshul/projects/encode/rawdata/peaks_histone/mar2012/broad/combrep_and_ppr/. The narrowPeak files (not used in any of the analyses) can be downloaded from www.broadinstitute.org/~anshul/projects/encode/rawdata/peaks_histone/mar2012/narrow/combrep_and_ppr/. The negative log10 of Poisson P values of enrichment present in column 8 of the peak files was used as scores for the peaks in the coverage analysis.

DNase-I high-resolution footprints. High-resolution footprints from deep DNase-seq data (January 2011 freeze) were previously identified in ENCODE Project Consortium 2012. These can be downloaded from http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/byDataType/footprints/jan2011/gencode_TF_footprints.out.

Bound TF motifs. TF binding site motif instances present within ChIP-seq peaks of the corresponding TFs were previously identified in ENCODE Project Consortium 2012 (January 2011 freeze). These can be downloaded from http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/byDataType/motifs/jan2011/bound_motifs.bed.

Repeat elements. RepeatMasker annotations were downloaded from the University of California, Santa Cruz (UCSC) Genome Browser (April 2011). The file that was used can be downloaded from http://woldlab.caltech.edu/~georgi/ENCODE-Function-2014_public/repeatMasker/hg19-repeats.
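The peak definitions quoted above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline the authors ran: it reads tab-separated narrowPeak/broadPeak-style records, takes column 8 as the -log10 P value (as in the UCSC narrowPeak and broadPeak layouts), and keeps broad regions that contain at least one strong narrow peak. Treating "strong" as P <= 0.01 and "contain" as full containment are my assumptions, and every record below is invented.

```python
import math
from typing import List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    neg_log10_p: float  # column 8 of narrowPeak/broadPeak: -log10(P value)


def parse_peak_line(line: str) -> Peak:
    """Parse one tab-separated narrowPeak or broadPeak record."""
    fields = line.rstrip("\n").split("\t")
    return Peak(fields[0], int(fields[1]), int(fields[2]), float(fields[7]))


def passes(peak: Peak, p_threshold: float) -> bool:
    """True if the peak's P value is at or below the threshold."""
    return peak.neg_log10_p >= -math.log10(p_threshold)


def broad_with_strong_narrow(
    broad: List[Peak], narrow: List[Peak], strong_p: float = 0.01
) -> List[Peak]:
    """Keep broad regions fully containing >= 1 narrow peak with P <= strong_p.

    Assumption: "strong" means passing strong_p; the quoted text does not define it.
    """
    strong = [n for n in narrow if passes(n, strong_p)]
    return [
        b
        for b in broad
        if any(
            n.chrom == b.chrom and b.start <= n.start and n.end <= b.end
            for n in strong
        )
    ]


# Invented example records (narrowPeak has 10 columns, broadPeak has 9).
narrow_lines = [
    "chr1\t1000\t1200\tn1\t0\t.\t8.5\t6.2\t4.1\t90",
    "chr1\t7000\t7100\tn2\t0\t.\t2.0\t1.3\t0.5\t40",
]
broad_lines = [
    "chr1\t900\t2500\tb1\t0\t.\t4.0\t3.1\t2.0",
    "chr1\t6800\t7600\tb2\t0\t.\t1.8\t1.2\t0.4",
]

narrow = [parse_peak_line(line) for line in narrow_lines]
broad = [parse_peak_line(line) for line in broad_lines]
gapped_like = broad_with_strong_narrow(broad, narrow)

for peak in gapped_like:
    print(peak.chrom, peak.start, peak.end, peak.neg_log10_p)
```

Here the second narrow peak passes the looser 0.1 threshold but not 0.01, so the broad region around it is dropped while the first broad region is kept.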




