
ChIP-seq: Learn ChIP-seq Analysis in One Article (Part 1)

Posted on 2017-7-19 17:35:39

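Preface: "Learn ChIP-seq Analysis in One Article (Part 1)" and "Learn ChIP-seq Analysis in One Article (Part 2)" collect the related posts from the 生信菜鸟团 blog into nine lectures. They take you from reading the relevant paper, collecting resources, and downloading public data, through the main steps of a ChIP-seq analysis (software installation, read alignment, peak calling and annotation, motif discovery), and finish with an introduction to visualization tools.

Lecture 1: Choosing and reading the paper
Paper: CARM1 Methylates Chromatin Remodeling Factor BAF155 to Enhance Tumor Progression and Metastasis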
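I first came across this paper a long time ago, when I wanted to teach myself ChIP-seq. I knew little back then: I downloaded the data and ran some software without even reading the paper carefully. Reading it closely this time, I found it has a lot of depth, especially on the experimental basis of ChIP-seq and the underlying biology of epigenetics.
The authors first showed experimentally that knocking down CARM1 with small hairpin RNA (shRNA) removes only about 90% of it and, interestingly, has very little effect on CARM1 function. This suggests that a very small amount of CARM1 is enough for it to do its job. They therefore used zinc finger nuclease (ZFN) genome editing to build cells in which CARM1 is knocked out completely. (Better genome-editing tools exist now, of course.)
This makes it possible to compare the modification state of various proteins with and without CARM1. BAF155, a subunit of the SWI/SNF (BAF) chromatin remodeling complex, stood out: it is methylated normally only in cell lines with intact CARM1. The authors showed that BAF155 is a very good substrate of CARM1 and, through a clever experimental design, that arginine 1064 (R1064) of BAF155 is the site CARM1 acts on.
Many earlier papers had already established the importance of the SWI/SNF (BAF) chromatin remodeling complex in cancer, so the authors naturally wanted to explore what BAF155 does in cancer, and they chose ChIP-seq to do it. As a component of the complex, BAF155 must bind DNA either directly or indirectly, and ChIP-seq is the best-suited technique for studying proteins that do so. The authors therefore built an MCF7 cell line in which R1064 of BAF155 is mutated so that CARM1 can no longer methylate it, and compared the BAF155 ChIP-seq results of the mutant and wild-type lines. This tests whether BAF155 has to be methylated by CARM1 to carry out its biological function.
Using a me-BAF155-specific antibody and western blotting, the authors showed that ~74% of BAF155 is methylated in wild-type MCF7 cells.
One cell line, SKOV3, expresses the other 14 subunits of the SWI/SNF (BAF) chromatin remodeling complex but not BAF155. Adding back either mutant or wild-type BAF155 promotes assembly of the complex, so methylation does not affect assembly. The question to focus on is whether methylation changes where BAF155 binds across the genome.
The results show that the BAF155 binding sites (peaks) in the mutant and wild-type lines overlap substantially. The interesting part is how the peaks differ in their distribution across genomic features (regions upstream and downstream of genes, 5' and 3' UTRs, promoters, introns, exons, intergenic regions, microRNA regions), in their distance to transcription start sites, in the genes they annotate to, and in the pathways those genes are enriched in.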
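To make "overlap between two peak sets" concrete, here is a minimal sketch in plain shell and awk. It assumes each peak set is a BED-like file (chrom, start, end, tab-separated, half-open intervals); the toy coordinates are made up. The paper itself used HOMER and Galaxy for this step, and on real data a dedicated tool such as bedtools would be the usual choice; the brute-force loop here is only meant to show the interval test.

```shell
# count_overlaps A.bed B.bed
# Print how many peaks in A overlap at least one peak in B.
# Assumes tab-separated chrom/start/end columns and a non-empty B.
count_overlaps() {
  awk -F'\t' '
    NR == FNR { n++; chr[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; next }  # load B
    {
      for (i = 1; i <= n; i++)
        # two half-open intervals overlap when each starts before the other ends
        if (chr[i] == $1 && s[i] < $3 + 0 && $2 + 0 < e[i]) { hits++; break }
    }
    END { print hits + 0 }
  ' "$2" "$1"
}

# Toy demo with hypothetical peaks (not data from the paper).
wt=$(mktemp); mut=$(mktemp)
printf 'chr1\t100\t200\nchr1\t500\t600\nchr2\t100\t200\n' > "$wt"
printf 'chr1\t150\t250\nchr2\t300\t400\n' > "$mut"
count_overlaps "$wt" "$mut"   # only chr1:100-200 overlaps a mutant peak
rm -f "$wt" "$mut"
```

Every peak in A is compared against every peak in B, which is fine for a toy file but too slow for genome-wide peak lists.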
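Although the ChIP-seq was done in the human cell line MCF7, the mRNA microarray analysis was done in a different human breast cancer cell line, MDA-MB-231, comparing cells carrying the BAF155 R1064 mutant with wild type on the widely used Affymetrix HG U133 Plus 2.0 platform.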
which was hybridized to Affymetrix HG U133 Plus 2.0 microarrays containing 54,675 probesets for >47,000 transcripts and variants, including 38,500 human genes.
To identify genes differentially expressed between MDA-MB-231-BAF155WT and MDA-MB-231-BAF155R1064K
Expression matrix download: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4004525/bin/NIHMS556863-supplement-03.xlsx
Here is a brief excerpt of the authors' description of their bioinformatics analysis of the ChIP-seq data:
  • All samples were mapped from fastq files using BOWTIE [-m 1 --best] to mm9 [UCSC mouse genome build 9]
  • Sequences were mapped to the human genome (hg19) using BOWTIE (--best -m 1) to yield unique alignments
  • Peaks were called by using HOMER [http://biowhat.ucsd.edu/homer/] and QuEST [http://mendel.stanford.edu/sidowlab/downloads/quest/].
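As a rough illustration of the alignment settings quoted above, the sketch below only prints a bowtie (version 1) command line rather than running it. The index name `hg19` and the output naming are placeholders of mine, not paths from the paper; `-m 1` keeps reads with a single reportable alignment, `--best` reports the best alignment, and `-S` requests SAM output.

```shell
# bowtie_cmd FASTQ [INDEX]
# Print (do not run) a bowtie command using the flags quoted from the paper.
# INDEX defaults to the placeholder name "hg19".
bowtie_cmd() {
  fastq=$1
  index=${2:-hg19}
  printf 'bowtie -m 1 --best -S %s %s %s\n' "$index" "$fastq" "${fastq%.fastq}.sam"
}

bowtie_cmd SRR1042593.fastq
# prints: bowtie -m 1 --best -S hg19 SRR1042593.fastq SRR1042593.sam
```

Printing the command first makes it easy to check file naming for all eight samples before committing hours of compute to the real run.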

Software used:
  • QuEST 2.4 (Valouev et al., 2008) was run using the recommended settings for transcription factor (TF) like binding with the following exceptions:
    kde bandwidth=30, region size=600, ChIP threshold=35, enrichment fold=3, rescue fold=3.
  • HOMER (Heinz et al., 2010) analysis was run using the default settings for peak finding.
    False Discovery Rate (FDR) cut off was 0.001 (0.1%) for all peaks.
    The tag density for each factor was normalized to 1x10^7 tags and displayed using the UCSC genome browser.
  • Motif analysis (de novo and known) was performed using the HOMER software and Genomatix.
  • Peak overlaps were processed with HOMER and Galaxy (Giardine et al., 2005).
  • Peak comparisons between replicates were processed with EdgeR statistical package in R
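The normalization of tag density to 10 million tags mentioned in the list above is simple scaling, sketched below. This is my own arithmetic illustration, not HOMER's code: a raw count is multiplied by 10^7 and divided by the library's total tag count, so that libraries sequenced to different depths become comparable. The numbers in the demo are invented.

```shell
# normalize_tags COUNT TOTAL
# Scale a raw tag count to what it would be in a library of 1e7 tags.
normalize_tags() {
  awk -v c="$1" -v t="$2" 'BEGIN { printf "%.2f\n", c * 1e7 / t }'
}

normalize_tags 50 20000000   # 50 tags in a 20M-tag library -> 25.00
normalize_tags 50 5000000    # 50 tags in a 5M-tag library  -> 100.00
```

The same 50 raw tags count for four times as much in the shallower library, which is why raw peak heights from the 16.9M-read and 76.4M-read samples should not be compared directly.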

Those are the pipeline steps we will work through next. Below is a screenshot of the main workflow, although it mostly shows how the experiment was designed.
There is also a published protocol for the ChIP-seq workflow: http://biow.sb-roscoff.fr/ecolebioinfo/protected/jacques.van-helden/ThomasChollierNatProtoc2012peak-motifs.pdf
I also recommend reading a few related papers:
  • Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. http://www.nature.com/nature/journal/v448/n7153/pdf/nature06008.pdf
  • Mapping and analysis of chromatin state dynamics in nine human cell types (GSE26386): http://www.nature.com/nature/journal/v473/n7345/full/nature09906.html
  • Promiscuous RNA binding by Polycomb Repressive Complex 2 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3823624/pdf/nihms517229.pdf


Lecture 2: Collecting resources
ChIP-seq is a very mature NGS workflow, and there is no shortage of material on it.

Start with the complete-workflow slide decks below to get a rough picture of the ChIP-seq pipeline. The key data-processing points I listed from the paper above look much like the figure below.
  • QuEST is a statistical software for analysis of ChIP-Seq data with data and analysis results visualization through UCSC Genome Browser. http://www-hsc.usc.edu/~valouev/QuEST/QuEST.html
  • Choosing peak-calling thresholds: http://www.nature.com/nprot/journal/v7/n1/figtab/nprot.2011.420F2.html
  • MeDIP-seq and histone modification ChIP-seq analysis http://crazyhottommy.blogspot.com/2014/01/medip-seq-and-histone-modification-chip.html
  • 2011-review-CHIP-seq-high-quality-data: http://www.nature.com/ni/journal/v12/n10/full/ni.2117.html?message-global=remove
  • Differential peak analysis between ChIP-seq conditions: http://www.slideshare.net/thefacultyl/diffreps-automated-chipseq-differential-analysis-package
  • A worked ChIP-seq data analysis example: http://www.biologie.ens.fr/~mthomas/other/chip-seq-training/
  • http://biow.sb-roscoff.fr/ecolebioinfo/trainingmaterial/chip-seq/documents/presentation_chipseq.pdf
  • http://ecole-bioinfo-aviesan.sb-roscoff.fr/sites/ecole-bioinfo-aviesan.sb-roscoff.fr/files/files/chipseqCarlHerrmannRoscoff2015.pdf
  • http://ecole-bioinfo-aviesan.sb-roscoff.fr/sites/ecole-bioinfo-aviesan.sb-roscoff.fr/files/files/defrance-ChIP-seq_annotation.pdf

The resources below cover individual steps of the ChIP-seq pipeline, and some cover background knowledge in epigenetics.
  • ppt : http://159.149.160.51/epigenmilano/epigenbarozzi.pdf
  • best practice: http://bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/
  • pipeline : https://github.com/shenlab-sinai/chip-seq_preprocess
  • https://sites.google.com/site/anshul...e/projects/idr ## samtools view -b -F 1548 -q 30 chipSampleRep1.bam
  • pipeline : http://daudin.icmb.utexas.edu/wiki/index.php/ChIPseqprepand_map
  • pipeline : https://github.com/BradyLab/ChipSeq/blob/master/chipseq.sh
  • https://github.com/crukci-bioinformatics/chipseq-pipeline
  • https://github.com/ENCODE-DCC/chip-seq-pipeline
  • Hands-on introduction to ChIP-seq analysis - VIB Training http://www.biologie.ens.fr/~mthomas/other/chip-seq-training/
  • video (A Step-by-Step Guide to ChIP-Seq Data Analysis Webinar) : http://www.abcam.com/webinars/a-step-by-step-guide-to-chip-seq-data-analysis-webinar
  • Using ChIP-Seq to identify and/or quantify bound regions (peaks) http://barcwiki.wi.mit.edu/wiki/SOPs/chipseqpeaks
  • http://jura.wi.mit.edu/bio/education/hottopics/ChIPseq/ChIPSeqHotTopics.pdf
  • http://pedagogix-tagc.univ-mrs.fr/courses/ASG1/practicals/chip-seq/mapping_tutorial.html
  • Open online course: https://www.coursera.org/learn/galaxy-project/lecture/FUzcg/chip-sequence-analysis-with-macs
  • EBI tutorial: https://www.ebi.ac.uk/training/online/course/ebi-next-generation-sequencing-practical-course/chip-seq-analysis/chip-seq-practical
  • Tutorial from Taiwan: http://lsl.sinica.edu.tw/Services/Class/files/20151118475_2.pdf (徐唯哲, Paul Wei-Che HSU)
  • Catalog of peak finder software: http://wodaklab.org/nextgen/data/peakfinders.html
  • https://www.encodeproject.org/documents/049704a4-5c58-4631-acf1-4ef152bdb3ef/@@download/attachment/LearningChromatinStatesfromChIP-seq_data.pdf
  • https://bioshare.bioinformatics.ucdavis.edu/bioshare/download/47aq5pp5mzza5vb/PDFs/TuesdayMBChIP-Seq_Intro.pdf
  • paper: Large-Scale Quality Analysis of Published ChIP-seq Data http://www.g3journal.org/content/4/2/209.full
  • paper: Chip-seq data analysis: from quality check to motif discovery and more http://ccg.vital-it.ch/var/sibapril15/cases/landt12/strandcorrelation.html
  • Workshop hands on session (RNA-Seq / ChIP-Seq) : https://hpc.oit.uci.edu/biolinux/handson.docx
  • http://www.gqinnovationcenter.com/documents/bioinformatics/ChIPseq.pptx
  • paper supplement : http://genome.cshlp.org/content/suppl/2015/10/02/gr.192005.115.DC1/Supplemental_Information.docx
  • http://www.illumina.com/documents/products/datasheets/datasheetchipsequence.pdf
  • http://www.ncbi.nlm.nih.gov/pubmed/22130887 "Analyzing ChIP-seq data: preprocessing, normalization, differential identification, and binding pattern characterization."
  • http://www.ncbi.nlm.nih.gov/pubmed/22499706 "Normalization, bias correction, and peak calling for ChIP-seq." (stat heavy)
  • http://www.ncbi.nlm.nih.gov/pubmed/24244136 "Practical guidelines for the comprehensive analysis of ChIP-seq data."
  • http://www.ncbi.nlm.nih.gov/pubmed/25223782 "Identifying and mitigating bias in next-generation sequencing methods for chromatin biology."
  • http://www.ncbi.nlm.nih.gov/pubmed/24598259 "Impact of sequencing depth in ChIP-seq experiments."
  • figures: https://github.com/shenlab-sinai/ngsplot

Visualization tools
  • https://github.com/daler/metaseq
  • http://liulab.dfci.harvard.edu/CEAS/usermanual.html

Bioconductor tools and tutorials:
  • http://faculty.ucr.edu/~tgirke/HTMLPresentations/Manuals/WorkshopDec610_2012/Rchipseq/Rchipseq.pdf
  • http://bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/Day4/chipqc_sweave.pdf
  • http://bioconductor.org/packages/release/bioc/html/chipseq.html
  • http://bioconductor.org/help/workflows/chipseqDB/
  • http://bioconductor.org/help/workflows/generegulation/
  • http://bioconductor.org/help/course-materials/2009/EMBLJune09/Practicals/chipseq/BasicChipSeq.pdf

Vendor tutorials
  • http://www.partek.com/Tutorials/microarray/Tiling/ChipSeqTutorial.pdf


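Lecture 3: Downloading public data
This step is the same as when teaching yourself any other high-throughput sequencing analysis: read the paper carefully and find which public database the authors deposited the raw sequencing data in. It is usually NCBI's GEO or SRA, and this paper is no exception. Then work out how many samples there are and find the pattern in the download links.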
## step1 : download raw data
cd ~
mkdir CHIPseq_test && cd CHIPseq_test
mkdir rawData && cd rawData
## batch download the raw data by shell script :
for ((i=593;i<601;i++)); do wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP033/SRP033492/SRR1042$i/SRR1042$i.sra; done
All 8 sequencing files download without trouble. The file size and sequencing depth of each sample are as follows:
621M Jun 27 14:03 SRR1042593.sra (16.9M reads)
2.2G Jun 27 15:58 SRR1042594.sra (60.6M reads)
541M Jun 27 16:26 SRR1042595.sra (14.6M reads)
2.4G Jun 27 18:24 SRR1042596.sra (65.9M reads)
814M Jun 27 18:59 SRR1042597.sra (22.2M reads)
2.1G Jun 27 20:30 SRR1042598.sra (58.1M reads)
883M Jun 27 21:08 SRR1042599.sra (24.0M reads)
2.8G Jun 28 11:53 SRR1042600.sra (76.4M reads)
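Before starting a multi-gigabyte batch download it can help to rehearse the link pattern as a dry run. The sketch below prints the eight URLs for runs SRR1042593 through SRR1042600 instead of fetching them; the base path is the one used in this post, and whether NCBI still serves that FTP layout is something to check before relying on it. The function name `sra_urls` is mine.

```shell
# Base path copied from the download command in this post.
base=ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP033/SRP033492

# sra_urls: print one URL per run, SRR1042593 .. SRR1042600, without downloading.
sra_urls() {
  i=593
  while [ "$i" -le 600 ]; do
    echo "$base/SRR1042$i/SRR1042$i.sra"
    i=$((i + 1))
  done
}

sra_urls
```

If the list looks right, piping it to `wget -i -` would fetch the files in one go.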
SRA is a widely used standard, but it is only a compressed archive format, and almost no analysis software reads SRA files directly. We need to convert them to fastq, as follows:
## step2 :  change sra data to fastq files.
## cell line: MCF7 //  Illumina HiSeq 2000 //  50bp // Single ends // phred+33
## http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52964
## ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP033/SRP033492
for id in *.sra; do ~/biosoft/sratoolkit/sratoolkit.2.6.3-centos_linux64/bin/fastq-dump "$id"; done
rm *sra
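Since the conversion step deletes the .sra files afterwards, a quick sanity check on the fastq output is worthwhile. The sketch below counts reads in an uncompressed FASTQ file by dividing the line count by four, which assumes the standard four-lines-per-record layout with no wrapped sequences. The toy reads in the demo are made up; on the real files the count should be close to the read totals listed above.

```shell
# count_reads FILE
# Number of records in an uncompressed FASTQ file (4 lines per record assumed).
count_reads() {
  awk 'END { print int(NR / 4) }' "$1"
}

# Demo on a two-read toy file (sequences and qualities are invented).
tmp=$(mktemp)
printf '%s\n' '@r1' 'ACGT' '+' 'IIII' '@r2' 'TTGA' '+' 'IIII' > "$tmp"
count_reads "$tmp"   # prints 2
rm -f "$tmp"
```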
Decompression details are below. SRA gives roughly 6- to 9-fold compression, well above the 2- to 3-fold of zip:
##  621M --> 3.9G
##  2.2G --> 14G
##  541M --> 3.3G
##  2.4G --> 15G
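The fold change can be checked directly from the sizes listed above. This is a small sketch of the arithmetic, assuming `ls -h` style sizes with an M or G suffix and 1G = 1024M; the helper names `to_mb` and `ratio` are mine.

```shell
# to_mb SIZE: convert a size such as 621M or 3.9G to megabytes.
to_mb() {
  awk -v s="$1" 'BEGIN {
    unit = substr(s, length(s))            # last character: M or G
    num  = substr(s, 1, length(s) - 1) + 0 # numeric part
    if (unit == "G") num *= 1024
    print num
  }'
}

# ratio SRA_SIZE FASTQ_SIZE: fold increase in size after fastq-dump.
ratio() {
  awk -v a="$(to_mb "$1")" -v b="$(to_mb "$2")" 'BEGIN { printf "%.1f\n", b / a }'
}

ratio 621M 3.9G   # prints 6.4
```

All four pairs listed above come out a little above 6-fold, at the low end of the quoted range.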
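Lecture 4: Installing the required software and downloading the published results
The order of these posts is a little muddled. I was worried that readers of the previous post on downloading public sequencing data would not know how I call the various tools, so I am squeezing in this post to cover it, though only briefly. In principle all my software is installed in the biosoft folder under my home directory, so my install steps usually start like this:
cd ~/biosoft
mkdir macs2 && cd macs2 ## each tool is installed in its own folder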
This is just my personal habit: I am not root, so there is not much I can do system-wide on the Linux machine. Here is all of my software installation code:
## pre-step: download sratoolkit /fastx_toolkit_0.0.13/fastqc/bowtie2/bwa/MACS2/HOMER/QuEST/mm9/hg19/bedtools
## http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software
## http://www.ncbi.nlm.nih.gov/books/NBK158900/
## Download and install sratoolkit
cd ~/biosoft
mkdir sratoolkit && cd sratoolkit
wget http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.6.3/sratoolkit.2.6.3-centos_linux64.tar.gz
##
## Length: 63453761 (61M) [application/x-gzip]
## Saving to: "sratoolkit.2.6.3-centos_linux64.tar.gz"
tar zxvf sratoolkit.2.6.3-centos_linux64.tar.gz
## Download and install bedtools
cd ~/biosoft
mkdir bedtools && cd bedtools
wget https://github.com/arq5x/bedtools2/releases/download/v2.25.0/bedtools-2.25.0.tar.gz
## Length: 19581105 (19M) [application/octet-stream]
tar -zxvf bedtools-2.25.0.tar.gz
cd bedtools2
make
## Download and install PeakRanger
cd ~/biosoft
mkdir PeakRanger && cd PeakRanger
wget https://sourceforge.net/projects/ranger/files/PeakRanger-1.18-Linux-x86_64.zip/
## Length: 1517587 (1.4M) [application/octet-stream]
unzip PeakRanger-1.18-Linux-x86_64.zip
~/biosoft/PeakRanger/bin/peakranger -h
## Download and install bowtie
cd ~/biosoft
mkdir bowtie && cd bowtie
wget https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.2.9/bowtie2-2.2.9-linux-x86_64.zip/download
## Length: 27073243 (26M) [application/octet-stream]
## Saving to: "download"   (wget names the file after the last URL component, so rename it below)
mv download bowtie2-2.2.9-linux-x86_64.zip
unzip bowtie2-2.2.9-linux-x86_64.zip
mkdir -p ~/biosoft/bowtie/hg19_index
cd ~/biosoft/bowtie/hg19_index
# download hg19 chromosome fasta files
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
# unzip and concatenate chromosome and contig fasta files
tar zxvf chromFa.tar.gz
cat *.fa > hg19.fa
rm chr*.fa
## ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2-build ~/biosoft/bowtie/hg19_index/hg19.fa ~/biosoft/bowtie/hg19_index/hg19
## Download and install BWA
cd ~/biosoft
mkdir bwa && cd bwa
## download bwa-0.7.12.tar.bz2 from http://sourceforge.net/projects/bio-bwa/files/
tar xvjf bwa-0.7.12.tar.bz2 # x extracts, v is verbose (details of what it is doing), j tells it to unzip .bz2 files, and f names the archive file to read
cd bwa-0.7.12
make
export PATH=$PATH:/path/to/bwa-0.7.12 # Add bwa to your PATH by editing ~/.bashrc file (or .bash_profile or .profile file)
# /path/to/ is a placeholder. Replace with real path to BWA on your machine
source ~/.bashrc
# bwa index [-a bwtsw|is] index_prefix reference.fasta
bwa index -p hg19bwaidx -a bwtsw ~/biosoft/bowtie/hg19_index/hg19.fa
# -p index name (change this to whatever you want)
# -a index algorithm (bwtsw for long genomes and is for short genomes)
## Download and install macs2
## // https://pypi.python.org/pypi/MACS2/
cd ~/biosoft
mkdir macs2 && cd macs2
wget ~~~~~~~~~~~~~~~~~~~~~~MACS2-2.1.1.20160309.tar.gz
tar zxvf MACS2-2.1.1.20160309.tar.gz
cd MACS2-2.1.1.20160309
python setup.py install --user
#################### The log for installing MACS2:
## Creating ~/.local/lib/python2.7/site-packages/site.py
## Processing MACS2-2.1.1.20160309-py2.7-linux-x86_64.egg
## Copying MACS2-2.1.1.20160309-py2.7-linux-x86_64.egg to ~/.local/lib/python2.7/site-packages
## Adding MACS2 2.1.1.20160309 to easy-install.pth file
## Installing macs2 script to ~/.local/bin
## Finished processing dependencies for MACS2==2.1.1.20160309
############################################################
~/.local/bin/macs2 --help
## Example for regular peak calling:
## macs2 callpeak -t ChIP.bam -c Control.bam -f BAM -g hs -n test -B -q 0.01
## Example for broad peak calling:
## macs2 callpeak -t ChIP.bam -c Control.bam --broad -g hs --broad-cutoff 0.1
## Download and install homer (Hypergeometric Optimization of Motif EnRichment)
## // http://homer.salk.edu/homer/
## // http://blog.qiubio.com:8080/archives/3024
## pre-install: Ghostscript,seqlogo,blat
cd ~/biosoft
mkdir homer && cd homer
wget http://homer.salk.edu/homer/configureHomer.pl
perl configureHomer.pl -install
perl configureHomer.pl -install hg19
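The install steps above repeat the same "cd ~/biosoft, make a folder for the tool, cd into it" pattern for every tool. A small helper can capture that habit; the sketch below is one way to do it, not something from the original post. `tool_dir` and the `BIOSOFT` variable are names I made up, with `BIOSOFT` defaulting to `~/biosoft` to match the layout used here.

```shell
# tool_dir NAME
# Create $BIOSOFT/NAME if it does not exist yet and print its path.
# BIOSOFT is a hypothetical override; it defaults to ~/biosoft.
tool_dir() {
  dir="${BIOSOFT:-$HOME/biosoft}/$1"
  mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Demo in a throwaway directory so nothing is created under the real home.
BIOSOFT=$(mktemp -d)
tool_dir macs2
rm -rf "$BIOSOFT"; unset BIOSOFT
```

In an install script this would be used as `cd "$(tool_dir macs2)"`. Because `mkdir -p` does not fail on an existing folder, re-running a script is safe, unlike plain `mkdir macs2 && cd macs2`, which stops at the second run.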
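Generally speaking, for someone at my level installing software is routine and no longer causes problems. If you are a beginner it will certainly not be that easy, so please put in the study time; I cannot go into more detail here.
Once all the software is installed, download the paper's own processed ChIP-seq results. This matters, because it lets us check whether we have reproduced the authors' analysis.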
## step3 : download the results from paper
## http://www.bio-info-trainee.com/1571.html
mkdir paper_results && cd paper_results
wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE52nnn/GSE52964/suppl/GSE52964_RAW.tar
tar xvf GSE52964_RAW.tar
gunzip *.gz
## step4 : run FastQC to check the sequencing quality.
## The raw data we downloaded had already been processed by the authors: adapters and low-quality reads were removed.
## note: run this from the directory that holds the fastq files (rawData), not from paper_results
for id in *.fastq; do ~/biosoft/fastqc/FastQC/fastqc "$id"; done
## Sequence length 51
## %GC 39
## Adapter Content passed
## The quality of the reads is pretty good, we don't need to do any filter or trim
mkdir QC_results
mv *zip *html QC_results/
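With eight samples, opening every FastQC HTML report by hand gets tedious. The sketch below lists modules that did not pass, given a FastQC `summary.txt`. It assumes the report zip has been unpacked and that each line of `summary.txt` has the form status, module name, file name, separated by tabs, which is the layout I understand FastQC to write; check one file by eye before trusting it. The function name and the demo content are mine.

```shell
# failed_modules SUMMARY
# Print "STATUS: module" for every FastQC module whose status is not PASS.
failed_modules() {
  awk -F'\t' '$1 != "PASS" { print $1 ": " $2 }' "$1"
}

# Demo on a synthetic summary (not real FastQC output for these samples).
tmp=$(mktemp)
printf 'PASS\tBasic Statistics\tx.fastq\nWARN\tPer base sequence content\tx.fastq\nPASS\tAdapter Content\tx.fastq\n' > "$tmp"
failed_modules "$tmp"   # prints: WARN: Per base sequence content
rm -f "$tmp"
```

An empty result means every module passed, which would match the "pretty good" verdict above.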


