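Hi everyone at 生信技能树 (Biotrainee),
After running STAR, only 102 genes ended up with counts. Going back to the alignment results, 93% (6249540) of the reads are uniquely mapped, but N_noFeature is 6438119. My guess is that the very large number of no-feature reads is why so few genes get counted. Is that right?
My STAR commands are below. Genome generation: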
STAR --runThreadN 6 --runMode genomeGenerate --genomeSAindexNbases 9 \
--genomeDir lsali_STAR_genome \
--genomeFastaFiles Lsali.fna \
--sjdbGTFfile Lsali.gtf \
--sjdbOverhang 149
align_out:
STAR --runThreadN 5 --limitBAMsortRAM 1101674636 --genomeDir lsali_STAR_genome \
--readFilesCommand zcat --readFilesIn B1_1_val_1.fq.gz /B1_2_val.fq.gz \
--outFileNamePrefix B1t_ \
--outSAMtype BAM SortedByCoordinate \
--outBAMsortingThreadN 5 \
--quantMode TranscriptomeSAM GeneCounts
Log.final.out results:
Started job on | May 03 13:37:23
Started mapping on | May 03 13:37:24
Finished on | May 03 15:05:10
Mapping speed, Million of reads per hour | 5.94
Number of input reads | 8688613
Average input read length | 299
UNIQUE READS:
Uniquely mapped reads number | 8127480
Uniquely mapped reads % | 93.54%
Average mapped length | 298.32
Number of splices: Total | 35463
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 5707
Number of splices: GC/AG | 298
Number of splices: AT/AC | 237
Number of splices: Non-canonical | 29221
Mismatch rate per base, % | 0.72%
Deletion rate per base | 0.01%
Deletion average length | 2.24
Insertion rate per base | 0.01%
Insertion average length | 2.44
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 63800
% of reads mapped to multiple loci | 0.73%
Number of reads mapped to too many loci | 652
% of reads mapped to too many loci | 0.01%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
ReadsPerGene.out.tab results:
N_unmapped 497446 497446 497446
N_multimapping 63800 63800 63800
N_noFeature 6438119 8090065 6475534
N_ambiguous 17286 534 16752
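
To put the four summary rows next to the reads that were actually assigned to genes, something like the sketch below could be run on the full file. It assumes the standard tab-separated ReadsPerGene.out.tab layout (gene ID, then unstranded, read-1-strand and read-2-strand counts). `summarise_counts` is a helper name made up for this post, not part of STAR.

```shell
# Hypothetical helper: summarise one count column of a STAR ReadsPerGene.out.tab file.
# $1 = path to the file
# $2 = column to use (2 = unstranded, 3 or 4 = stranded); defaults to 2
summarise_counts() {
    awk -F'\t' -v col="${2:-2}" '
        # The N_* rows at the top are the summary rows; echo them unchanged.
        $1 ~ /^N_/ { print $1 "\t" $col; next }
        # Every other row is a gene: add up its reads and note whether it has any.
        { assigned += $col; if ($col > 0) detected++ }
        END {
            printf "assigned_reads\t%d\n", assigned
            printf "genes_with_counts\t%d\n", detected
        }
    ' "$1"
}
```

For the run above this would be called as `summarise_counts B1t_ReadsPerGene.out.tab 2`, and again with 3 and 4 to see whether one strand column assigns noticeably more reads than the others.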
The reference genome and annotation files are from NCBI:
I converted the GFF to GTF with gffread, using the following command:
gffread Lsali.gff -T -o Lsali.gtf
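After the conversion the file size dropped noticeably. Could information have been lost during the conversion? I tried different versions of Cufflinks and the result was the same.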
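
One way to see what the conversion kept would be to compare the feature types in the two files and to check how many exon records carry a `gene_id` tag, since by default STAR builds its gene counts from `exon` records grouped by `gene_id`. This is only a rough sketch: `feature_counts` and `exons_with_gene_id` are helper names invented here, and both assume the usual 9-column tab-separated GFF/GTF layout.

```shell
# Hypothetical helper: count records per feature type (column 3) in a GFF or GTF
# file, skipping comment lines and lines without the 9 standard columns.
feature_counts() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
                END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

# Hypothetical helper: count exon records whose attribute column (column 9)
# contains a gene_id tag in GTF style, i.e. gene_id "...".
exons_with_gene_id() {
    awk -F'\t' '!/^#/ && $3 == "exon" && $9 ~ /gene_id "/ { c++ }
                END { print c + 0 }' "$1"
}
```

Running `feature_counts Lsali.gff` and `feature_counts Lsali.gtf` side by side would show which feature types disappeared, and `exons_with_gene_id Lsali.gtf` would show whether the GTF still has exon records that STAR can assign to a gene.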
Any pointers would be much appreciated. Thanks!