[ChIP-seq] Browsing motifs from the ENCODE project's TF ChIP-seq results

Posted by the administrator on 2017-3-13 11:19:46
Everything is on this website: http://compbio.mit.edu/encode-motifs/


This website allows you to browse known and discovered motifs for the ENCODE TF ChIP-seq datasets.
Pouya Kheradpour and Manolis Kellis
Nucleic Acids Research, 2013 December 13, doi:10.1093/nar/gkt1249

Send any questions/comments to Pouya Kheradpour.
  • Each experiment is put into a "factor group" on the basis of the ChIP TF and its known motifs, with the intention of grouping factors with very similar motifs. Known motifs are assigned to the factor group using the same criteria.
  • The navigation is done in the left frame. For each factor group it indicates the number of known motifs, discovered motifs, and experimental datasets. Clicking on the headers will re-sort the table.
  • The "discovered motifs" for each factor group are the top 10 (in terms of enrichment in their discovery dataset using the Intergenic background) where no two are more than 0.75 similar to each other (this prevents very similar variants of the same motif from being taken).
  • Enrichments are computed by taking the fraction of motif instances that are inside the bound regions and dividing that by the fraction of shuffled control motif instances that are inside them (where the bound regions are filtered against the background regions, defined below). They are also corrected for small counts by using a confidence interval (with Z=1.5) around each fraction and taking the extreme which leads to the enrichment closest to 1.
  • Clicking on a factor group will change the middle and right frames.
    • The middle frame shows the known and discovered motifs. Clicking on the name of the motif will highlight it in the heatmap. Clicking on the logo will provide the PFM (position frequency matrix). For all PFMs, see motifs.txt below.
    • The right frame is a heatmap indicating:
      • Top, in a white/black color scale: the similarity (as a correlation) between the known and discovered motifs. This is computed directly from the PFMs without using the genome at all.
      • Below, in a white/red color scale: the enrichment of each of the motifs. Enrichments are not available for motifs with too little information content or for which control motifs could not be created. The enrichments are shown as three triangles, one per background region: all intergenic/intronic (top), only within +/- 2kb of a TSS (left), and only outside +/- 2kb of a TSS (right). All three backgrounds exclude coding regions, 3'UTRs, and repetitive regions. The indicated number is for the all-Intergenic background. Experiment names are assigned systematically using a name mapping scheme.
  • Motifs are matched to the genome using a p-value of 4^-8 (threshold for each motif computed using TFM-PVALUE). A custom program is used to do the actual matching.
  • motif-disc.pdf (13M): A printable version of the web page with logos and heatmaps for each factor group.
  • encode-motifs-v1.3.tar.gz (43K): software to (1) compute enrichments and produce heatmaps on custom data and (2) perform unified motif discovery. See README contained within for more information.
  • The following bulk datafiles are available:
    • matches.txt.gz (962M): the motif matches which can be used for carrying out custom analyses (all coordinates are in hg19).
      • The file is 1-indexed and end-inclusive.
      • The strand of the motif may not match the logo displayed on this website (which may be flipped to match others in the factor group). See the motifs.txt file below for the strand used to produce these matches.
      • matches-with-controls.txt.gz (11G) contains the same matches as matches.txt.gz, plus matches for the shuffled control motifs (indicated with _C#).
      • matches-with-controls-0.3.txt.gz (1.3G) contains motif instances at the 0.3 confidence level, based on conservation in closely related species (Kheradpour, et al. 2007; Lindblad-Toh, et al. 2011).
    • back-regions.txt.gz (29M): the background regions used for the analysis. This file is also 1-indexed and end-inclusive.
    • motifs-sim.txt.gz (15M): similarities between all pairs of motifs.
    • motifs.txt (1.1M): all the known and discovered motifs.
    • motifs-toscan.txt.gz (875K): known and discovered motifs plus the control shuffles, in log-odds format, with the cut-off following each name.
    • enrichments.txt.gz (34M): the enrichments of every motif in every dataset. Columns indicate the (1) background, (2) dataset with (3) corresponding factor group, (4) motif, (5) enrichment (as defined above), the count of the motif in the (6) background and (7) foreground, and the count of the control motifs in the (8) background and (9) foreground.
    • exp-regions-motifs.txt.gz (123M): for each experimental region (with names as described in the mapping scheme), a semicolon-separated list of matching motifs (in the order they occur on the positive strand).
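
The enrichment definition in the list above (motif fraction divided by control fraction, each pulled toward an enrichment of 1 by a Z=1.5 confidence interval) can be sketched roughly as follows. This is only an illustration, not the site's actual code: the page does not say which interval is used, so a Wilson score interval is assumed here. Computing each fraction as foreground count over background count, and capping the result at 1 when the interval crosses it, are also assumptions. The function names are made up.

```python
import math


def wilson_interval(k, n, z=1.5):
    """Wilson score interval for k hits out of n (an assumed choice of interval)."""
    if n == 0:
        return 0.0, 1.0
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def corrected_enrichment(motif_fg, motif_bg, ctrl_fg, ctrl_bg, z=1.5):
    """Motif fraction over control fraction, using the interval ends nearest to 1.

    Assumes each *_fg count is at most its *_bg count, i.e. the bound regions
    lie inside the background regions.
    """
    m_lo, m_hi = wilson_interval(motif_fg, motif_bg, z)
    c_lo, c_hi = wilson_interval(ctrl_fg, ctrl_bg, z)
    # Cross-multiply to compare the raw fractions without dividing by zero.
    if motif_fg * ctrl_bg >= ctrl_fg * motif_bg:
        # Apparent enrichment: shrink the numerator, grow the denominator.
        return max(1.0, m_lo / c_hi)
    # Apparent depletion: grow the numerator, shrink the denominator.
    return min(1.0, m_hi / c_lo)
```

With large counts the corrected value stays close to the raw ratio, while with only a handful of instances it collapses to 1, which is the point of the small-count correction.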
Last modified on 2013-11-01.
Previous versions: 201006freeze 201101freeze-run1 20110629freeze-run2
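
The rule for "discovered motifs" (top 10 by enrichment, with no two more than 0.75 similar to each other) reads like a greedy filter. The page does not say the selection is greedy, so the sketch below is one plausible interpretation. The function name and the `similarity` callback are hypothetical.

```python
def pick_discovered_motifs(candidates, similarity, max_motifs=10, max_sim=0.75):
    """Greedily keep the most enriched motifs that are not too similar.

    candidates: list of (motif_name, enrichment) pairs.
    similarity: function (name_a, name_b) -> similarity score.
    """
    chosen = []
    # Walk the motifs from most to least enriched.
    for name, _enrichment in sorted(candidates, key=lambda c: c[1], reverse=True):
        # Keep a motif only if it is at most max_sim similar to every kept one.
        if all(similarity(name, kept) <= max_sim for kept in chosen):
            chosen.append(name)
        if len(chosen) == max_motifs:
            break
    return chosen


# Toy example with made-up names and scores: "b" is a near-duplicate of "a".
sims = {frozenset(("a", "b")): 0.9}
picked = pick_discovered_motifs(
    [("a", 5.0), ("b", 4.0), ("c", 3.0)],
    lambda x, y: sims.get(frozenset((x, y)), 0.0),
)
print(picked)  # ['a', 'c']
```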

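Since matches.txt.gz and back-regions.txt.gz are described as 1-indexed and end-inclusive, their coordinates need a small shift before they can be used with BED-style tools, which expect 0-based, half-open intervals. A minimal sketch of that conversion (the function name is made up, and the column layout of the files is deliberately left out because the page does not describe it):

```python
def to_bed_interval(start, end):
    """Convert a 1-indexed, end-inclusive interval to 0-based, half-open."""
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-indexed interval: {start}-{end}")
    # Only the start moves; the inclusive end equals the exclusive BED end.
    return start - 1, end


bed_start, bed_end = to_bed_interval(100, 107)
print(bed_start, bed_end)  # 99 107
```

The interval length is unchanged: positions 100 through 107 cover 8 bases, and 107 - 99 is also 8.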

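For custom analyses on enrichments.txt.gz, the nine columns listed above can be mapped to named fields. The sketch below assumes whitespace-delimited columns and no header line, neither of which the page states. The field names and the sample line are invented for illustration.

```python
import gzip
from typing import NamedTuple


class EnrichmentRow(NamedTuple):
    background: str      # (1) background
    dataset: str         # (2) dataset
    factor_group: str    # (3) corresponding factor group
    motif: str           # (4) motif
    enrichment: float    # (5) enrichment
    motif_bg: float      # (6) motif count in the background
    motif_fg: float      # (7) motif count in the foreground
    ctrl_bg: float       # (8) control motif count in the background
    ctrl_fg: float       # (9) control motif count in the foreground


def parse_enrichment_line(line):
    """Parse one line; counts are read as floats in case they are not integers."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    return EnrichmentRow(*fields[:4], *(float(x) for x in fields[4:]))


def read_enrichments(path):
    """Yield one EnrichmentRow per non-empty line of a gzip-compressed file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_enrichment_line(line)


# Made-up line, only to show the column order.
row = parse_enrichment_line("Intergenic EXAMPLE_DATASET EXAMPLE_GROUP EXAMPLE_disc1 2.5 1000 200 900 80")
print(row.motif, row.enrichment)  # EXAMPLE_disc1 2.5
```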

Original poster, posted on 2017-3-13 11:21:17
To be honest I did not really understand it, but it looks very useful. Leaving http://egg2.wustl.edu/roadmap/web_portal/predict_reg_motif.html here for now.