Finding species-specific genes by gene name, e.g. human vs. mouse

Posted on 2017-2-24 10:56:50
The code below downloads GTF files from the Ensembl FTP site:
http://asia.ensembl.org/info/data/ftp/index.html
I simply prefer Ensembl.
[Shell]
# Download the release-87 GTF files in parallel, one background job per species
nohup wget ftp://ftp.ensembl.org/pub/release-87/gtf/felis_catus/Felis_catus.Felis_catus_6.2.87.chr.gtf.gz &
nohup wget ftp://ftp.ensembl.org/pub/release-87/gtf/gallus_gallus/Gallus_gallus.Gallus_gallus-5.0.87.chr.gtf.gz &
nohup wget ftp://ftp.ensembl.org/pub/release-87/gtf/canis_familiaris/Canis_familiaris.CanFam3.1.87.chr.gtf.gz &
nohup wget ftp://ftp.ensembl.org/pub/release-87/gtf/equus_caballus/Equus_caballus.EquCab2.87.chr.gtf.gz &
nohup wget ftp://ftp.ensembl.org/pub/release-87/gtf/ailuropoda_melanoleuca/Ailuropoda_melanoleuca.ailMel1.87.gtf.gz &
nohup wget ftp://ftp.ensembl.org/pub/release-87/gtf/sus_scrofa/Sus_scrofa.Sscrofa10.2.87.chr.gtf.gz &
nohup wget ftp://ftp.ensembl.org/pub/release-87/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.87.chr.gtf.gz &
nohup wget ftp://ftp.ensembl.org/pub/release-87/gtf/mus_musculus/Mus_musculus.GRCm38.87.chr.gtf.gz &
nohup wget ftp://ftp.ensembl.org/pub/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh38.87.chr.gtf.gz &
wait  # block until every background download has finished

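# Extract every gene_name value, upper-case it so that mouse and human symbols compare case-insensitively, then de-duplicate
zcat Mus_musculus.GRCm38.87.chr.gtf.gz | perl -lne 'print uc $1 if /gene_name\s"(.*?)";/' | sort -u > mouse.genelist
zcat Homo_sapiens.GRCh38.87.chr.gtf.gz | perl -lne 'print uc $1 if /gene_name\s"(.*?)";/' | sort -u > human.genelist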

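# Names in both lists: "1" is appended while reading the first file and "0" while reading the second, so a value ending in "10" means "seen in mouse, now seen in human"
perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' mouse.genelist human.genelist > both_have
# Equivalent, more explicit form (writes the same both_have): store the first file in %h, then test the second against it
perl -lne 'if (@ARGV == 1) { $h{$_} = 1 } else { print if exists $h{$_} }' mouse.genelist human.genelist > both_have
# Human gene names absent from the mouse list
perl -lne 'if (@ARGV == 1) { $h{$_} = 1 } else { print unless exists $h{$_} }' mouse.genelist human.genelist > human_yes_mouse_not
# Mouse gene names absent from the human list
perl -lne 'if (@ARGV == 1) { $h{$_} = 1 } else { print unless exists $h{$_} }' human.genelist mouse.genelist > mouse_yes_human_not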
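
The same three lists can also be produced without Perl. The sketch below is one possible alternative, not the method used above: it relies on `comm` from coreutils, which compares two sorted files line by line. The `demo_genelists` directory, the `demo_*` file names and the toy gene symbols are made up for illustration, and `LC_ALL=C` is an assumption that keeps `sort` and `comm` in agreement about ordering.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # sort and comm must use the same collation order

# Hypothetical scratch directory holding toy stand-ins for the real lists
mkdir -p demo_genelists
printf '%s\n' ACTB GAPDH TRP53 | sort -u > demo_genelists/demo_mouse.genelist
printf '%s\n' ACTB GAPDH TP53 LINC00001 | sort -u > demo_genelists/demo_human.genelist

# comm prints three columns: lines only in file 1, only in file 2, and in both.
# The flags -1, -2 and -3 suppress the corresponding column.
comm -12 demo_genelists/demo_mouse.genelist demo_genelists/demo_human.genelist \
  > demo_genelists/demo_both_have
comm -13 demo_genelists/demo_mouse.genelist demo_genelists/demo_human.genelist \
  > demo_genelists/demo_human_yes_mouse_not
comm -23 demo_genelists/demo_mouse.genelist demo_genelists/demo_human.genelist \
  > demo_genelists/demo_mouse_yes_human_not

# Show each result list
for f in demo_both_have demo_human_yes_mouse_not demo_mouse_yes_human_not; do
  echo "== $f"
  cat "demo_genelists/$f"
done
```

With the toy data, `demo_both_have` holds ACTB and GAPDH, while TP53 lands in the human-only list and TRP53 in the mouse-only list. The comparison is purely by name, so symbols that differ between species are reported as species-specific even when the genes correspond to each other.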

 

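The resulting list still has to be filtered to remove pseudogene-, miRNA- and lncRNA-related entries, and then annotated with gene names.

For this step I prefer to use R.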
[R]
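# Read the human-only gene names (first column of the file)
a <- read.table('human_yes_mouse_not', stringsAsFactors = FALSE)[, 1]
library(humanid)  # provides geneAnno()
tmp <- geneAnno(a)
# write.csv(tmp, 'human_yes_mouse_not.csv')
tmp <- na.omit(tmp)  # drop rows that contain NA
tmp <- tmp[!grepl('pseudo', tmp$gene_name), ]  # remove pseudogenes
tmp <- tmp[!grepl('LINC', tmp$symbol), ]       # remove lncRNAs (LINC* symbols)
tmp <- tmp[!grepl('MIR', tmp$symbol), ]        # remove miRNAs (MIR* symbols)
write.csv(tmp, 'human_yes_mouse_not_filter.csv')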
 
 


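The result is clear at a glance: more than five thousand genes have been studied in human but not in mouse. Of course, many of them are still poorly characterized.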
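
The pseudogene/miRNA/lncRNA filtering could alternatively be done while reading the GTF, since Ensembl GTF files carry a `gene_biotype` attribute. The following is only a minimal sketch under the assumption that protein-coding genes are all one wants, which is stricter than the name-based filter in the R code. The `demo_biotype` directory, the toy GTF records and the output file name are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C

mkdir -p demo_biotype   # hypothetical scratch directory

# Toy GTF with three gene records (9 tab-separated columns, made-up coordinates)
{
  printf '1\tdemo\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_name "Abc1"; gene_biotype "protein_coding";\n'
  printf '1\tdemo\tgene\t1500\t1600\t.\t-\t.\tgene_id "G2"; gene_name "Mir999"; gene_biotype "miRNA";\n'
  printf '2\tdemo\tgene\t300\t800\t.\t+\t.\tgene_id "G3"; gene_name "Abc2-ps"; gene_biotype "processed_pseudogene";\n'
} | gzip -c > demo_biotype/toy.gtf.gz

# Keep gene records whose biotype is protein_coding, pull out gene_name,
# upper-case it and de-duplicate (same idea as the perl one-liner, using sed and tr)
gzip -dc demo_biotype/toy.gtf.gz |
  awk -F'\t' '$3 == "gene" && $9 ~ /gene_biotype "protein_coding";/' |
  sed -n 's/.*gene_name "\([^"]*\)";.*/\1/p' |
  tr '[:lower:]' '[:upper:]' |
  sort -u > demo_biotype/toy.coding.genelist

cat demo_biotype/toy.coding.genelist
```

On the toy file only ABC1 survives. Applied to the real human and mouse GTF files, lists built this way would feed into the same set comparison as before.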



