
[alignment] The Ensembl human reference genome

Posted 2017-1-5 22:09:42
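A question for the experts here:

The human genome I downloaded from Ensembl, Homo_sapiens.GRCh37.75.dna.toplevel.fa.gz, is 30 GB after decompression.
But the human genomes I downloaded from NCBI and UCSC are only about 3 GB. And isn't the human genome supposed to be only around 3 Gb anyway?
Why is the genome downloaded from Ensembl so large? Is there an explanation for this?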
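One way to look into the size difference is to measure each record in the FASTA file: how long it is and how many of its bases are N. The sketch below is only an illustration, not something from the Ensembl documentation; the names fasta_stats and RecordStats are made up for this example, and it assumes a plain (already decompressed) FASTA file read line by line.

```python
from collections import namedtuple

# Hypothetical helper type for this sketch: one row per FASTA record.
RecordStats = namedtuple("RecordStats", "name length n_count")


def fasta_stats(lines):
    """Yield RecordStats(name, length, n_count) for each FASTA record.

    `lines` is any iterable of text lines, for example an open file handle.
    The record name is the first whitespace-separated field of the header.
    """
    name, length, n_count = None, 0, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield RecordStats(name, length, n_count)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            length, n_count = 0, 0
        elif name is not None:
            length += len(line)
            n_count += line.count("N") + line.count("n")
    # Emit the last record.
    if name is not None:
        yield RecordStats(name, length, n_count)
```

Passing an open handle for Homo_sapiens.GRCh37.75.dna.toplevel.fa to fasta_stats and printing each row would show which records account for most of the file and how much of each is N rather than A, C, G or T.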

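I Googled it.
Some say that besides the chromosome sequences it also contains assembly patches and haplotype sequences,
but others say it contains GL contigs and no patches or haplotypes.

Does anyone know more about this? I see that many pipelines use Homo_sapiens.GRCh37.75.dna.toplevel.fa directly.
I have always used the UCSC genome and have not figured out the Ensembl one.

Thanks!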

Posted 2017-1-6 08:26:05
I don't know either. Bumping this.
Original poster | Posted 2017-1-6 23:19:12
Last edited by anlan on 2017-1-6 23:30

I took a rough look.
grep -c '^>' Homo_sapiens.GRCh37.75.dna.toplevel.fa reports 297 sequences.
Then I checked roughly what they are. Some of the headers:
>1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 REF
>10 dna:chromosome chromosome:GRCh37:10:1:135534747:1 REF
>11 dna:chromosome chromosome:GRCh37:11:1:135006516:1 REF
>12 dna:chromosome chromosome:GRCh37:12:1:133851895:1 REF
>13 dna:chromosome chromosome:GRCh37:13:1:115169878:1 REF
>14 dna:chromosome chromosome:GRCh37:14:1:107349540:1 REF
>15 dna:chromosome chromosome:GRCh37:15:1:102531392:1 REF
>16 dna:chromosome chromosome:GRCh37:16:1:90354753:1 REF
>17 dna:chromosome chromosome:GRCh37:17:1:81195210:1 REF
>18 dna:chromosome chromosome:GRCh37:18:1:78077248:1 REF
>19 dna:chromosome chromosome:GRCh37:19:1:59128983:1 REF
>2 dna:chromosome chromosome:GRCh37:2:1:243199373:1 REF
>20 dna:chromosome chromosome:GRCh37:20:1:63025520:1 REF
>21 dna:chromosome chromosome:GRCh37:21:1:48129895:1 REF
>22 dna:chromosome chromosome:GRCh37:22:1:51304566:1 REF
>3 dna:chromosome chromosome:GRCh37:3:1:198022430:1 REF
>4 dna:chromosome chromosome:GRCh37:4:1:191154276:1 REF
>5 dna:chromosome chromosome:GRCh37:5:1:180915260:1 REF
>6 dna:chromosome chromosome:GRCh37:6:1:171115067:1 REF
>7 dna:chromosome chromosome:GRCh37:7:1:159138663:1 REF
>8 dna:chromosome chromosome:GRCh37:8:1:146364022:1 REF
>9 dna:chromosome chromosome:GRCh37:9:1:141213431:1 REF
>HG1007_PATCH dna:chromosome chromosome:GRCh37:HG1007_PATCH:243059660:243125680:1 PATCH_FIX
>HG1032_PATCH dna:chromosome chromosome:GRCh37:HG1032_PATCH:190828226:191125710:1 PATCH_FIX
>HG104_HG975_PATCH dna:chromosome chromosome:GRCh37:HG104_HG975_PATCH:144743526:145173331:1 PATCH_FIX
>HG1063_PATCH dna:chromosome chromosome:GRCh37:HG1063_PATCH:102420838:102687153:1 PATCH_FIX
>HG1074_PATCH dna:chromosome chromosome:GRCh37:HG1074_PATCH:51028872:51334771:1 PATCH_FIX
>HG1079_PATCH dna:chromosome chromosome:GRCh37:HG1079_PATCH:54528888:55587573:1 PATCH_FIX
>HG1082_HG167_PATCH dna:chromosome chromosome:GRCh37:HG1082_HG167_PATCH:140144410:140687734:1 PATCH_FIX
>HG1091_PATCH dna:chromosome chromosome:GRCh37:HG1091_PATCH:60558332:60952100:1 PATCH_FIX
>HG1133_PATCH dna:chromosome chromosome:GRCh37:HG1133_PATCH:10953894:11326502:1 PATCH_NOVEL
>HG1146_PATCH dna:chromosome chromosome:GRCh37:HG1146_PATCH:43663045:44117429:1 PATCH_FIX
>HG115_PATCH dna:chromosome chromosome:GRCh37:HG115_PATCH:101718951:102075280:1 PATCH_FIX
>HG1208_PATCH dna:chromosome chromosome:GRCh37:HG1208_PATCH:57774022:57871366:1 PATCH_FIX
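The headers listed above end in a tag such as REF, PATCH_FIX or PATCH_NOVEL, so counting headers per tag gives a quick overview of what the 297 sequences are. A minimal sketch, assuming the tag is always the last whitespace-separated field of the header line; header_tag and tally_tags are names invented for this example:

```python
from collections import Counter


def header_tag(header):
    """Return the last whitespace-separated field of a FASTA header line.

    Assumption: the tag (REF, PATCH_FIX, ...) is separated from the
    location string by whitespace, as in most of the headers above.
    """
    fields = header.lstrip(">").split()
    return fields[-1] if fields else ""


def tally_tags(lines):
    """Count FASTA header lines per trailing tag; sequence lines are ignored."""
    return Counter(header_tag(line) for line in lines if line.startswith(">"))
```

Calling tally_tags on an open handle for the toplevel file returns a Counter keyed by tag, which can be printed with its most_common() method.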
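Can anyone explain what the sequences other than the chromosomes are?
I also looked at the Ensembl documentation. Are the ones starting with HG unlocalized or unplaced scaffolds?
And if I only need the chromosome sequences, I should download the primary assembly from Ensembl, right?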
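If only the chromosome records are needed from a file that is already downloaded, one option is to filter by the header tag. The sketch below keeps only records whose header ends in REF, as seen in the listing above. It is an illustration under that assumption (keep_ref_records is a made-up name), and nothing in this thread confirms that the result matches Ensembl's primary assembly file.

```python
def keep_ref_records(lines, wanted_tag="REF"):
    """Yield only the lines of FASTA records whose header ends in wanted_tag.

    Assumption: the tag is the last whitespace-separated field of the
    header line. Lines are yielded unchanged, including their newlines.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            fields = line.split()
            # Decide once per record whether to keep it.
            keep = bool(fields) and fields[-1] == wanted_tag
        if keep:
            yield line
```

The generator can be fed an open input handle and its output written line by line to a new file, so the whole genome never has to be held in memory.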


Thanks!


生信技能树