Platypus: a new tool for finding variant sites

Posted on 2017-2-17 09:25:05
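I have been used to my old pipeline for a long time (bwa, samtools, bcftools, picard, GATK and so on), but I noticed that a major sequencing paper published in Nature in October 2016 used some newer software, so here is a recommendation:
I did not feel like translating this, and the English reads fine anyway. The author introduces the tool in a witty, humorous style that is well worth a read.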
Platypus: A New Variant Caller that Integrates Mapping, Assembly and Haplotype-based approaches

Nature Genetics just published a really interesting paper from the Gerton Lunter group (and the McVean group) at the University of Oxford. It is about a variant-calling approach that integrates ideas from mapping, assembly and haplotypes. I can hear you saying “Yeah, right. Yet another variant caller?” Believe me, this one looks different. The paper looks really interesting and I want to read the whole thing.

Here is a really short summary of what the paper is about.

The most common approach, used by older versions of GATK [check the comments below from GATK on the benefits of the new GATK HaplotypeCaller, which also does local re-assembly], is to call variants by aligning reads to a reference genome and finding locations where nucleotides differ from the reference base. This approach has served us well: it has high sensitivity; it uses most of the human genome, including repetitive regions; it exploits the information in paired-end reads; and it does not need crazy computing resources.
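To make the alignment-based idea concrete, here is a minimal sketch of a pileup-style SNP caller. It is a toy, not how GATK or any real caller works: it assumes gapless alignments given as hypothetical `(start, sequence)` pairs, ignores base and mapping qualities, and uses arbitrary `min_depth` and `min_alt_fraction` thresholds chosen only for illustration.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=3, min_alt_fraction=0.3):
    """Toy pileup caller.

    aligned_reads: list of (start, sequence) pairs, 0-based, gapless
    (an assumption made for this sketch; real aligners also emit indels).
    Returns a list of (position, ref_base, alt_base, alt_count, depth).
    """
    # One base counter per reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little evidence at this position
        ref_base = reference[pos]
        alt_counts = {b: n for b, n in counts.items() if b != ref_base}
        if not alt_counts:
            continue  # every read agrees with the reference
        alt_base, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
        if alt_n / depth >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, alt_n, depth))
    return calls


reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTATGT"), (3, "TATGTA"),
         (4, "ATGTAC"), (4, "ACGTAC")]
print(call_snps(reference, reads))  # [(5, 'C', 'T', 3, 5)]
```

Each position is judged on its own here, which is exactly why this style of caller depends so heavily on the accuracy of the alignment at the nucleotide level.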

One weakness is that alignment-based approaches focus on a single variant type, such as SNPs or indels. This can cause errors around indels and larger variants. They are also prone to high false-positive rates in highly diverged regions. In addition, they rely mainly on alignment accuracy at the nucleotide level, and realignment around indels to improve that accuracy can be costly. Multi-sample variant calling helps by borrowing information between samples to call variants that do not look reliable in a single sample.

Alternative variant-calling approaches use reference-free sequence assembly, building a de Bruijn graph to find evidence of polymorphisms. Such approaches work at the level of local haplotypes rather than individual variants, and do well in highly divergent regions. However, they have huge computational requirements.
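As a rough illustration of the assembly idea, the sketch below builds a plain de Bruijn graph from a handful of reads and enumerates the paths between two anchor nodes with a depth-first search. A polymorphism shows up as a "bubble": two paths that start and end at the same nodes. The function names, the choice of k and the `max_nodes` cap are assumptions for this example only, and real assemblers also handle sequencing errors, coverage and repeats.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; every k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def enumerate_paths(graph, source, sink, max_nodes=50):
    """Depth-first enumeration of all simple paths from source to sink.

    Each path is returned as the sequence it spells out.
    max_nodes caps the path length so the search cannot run away.
    """
    paths = []

    def dfs(node, visited, sequence):
        if node == sink:
            paths.append(sequence)
            return
        if len(visited) >= max_nodes:
            return
        for nxt in sorted(graph.get(node, ())):
            if nxt not in visited:
                # Moving along an edge appends one new base to the sequence.
                dfs(nxt, visited | {nxt}, sequence + nxt[-1])

    dfs(source, {source}, source)
    return sorted(paths)


# Reads drawn from two haplotypes that differ at a single base (T vs A).
reads = ["ACGTTG", "GTTGCA", "ACGTAG", "GTAGCA"]
graph = build_de_bruijn(reads, k=4)
print(enumerate_paths(graph, "ACG", "GCA"))  # ['ACGTAGCA', 'ACGTTGCA']
```

The two returned sequences are the two local haplotypes, recovered without looking at a reference at all.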

Why not give it a try?

The Nature Genetics paper presents a new approach that integrates local sequence assembly with a haplotype-based, multi-sample variant caller within a single Bayesian statistical framework, implemented as the software Platypus. Platypus takes mapped and sorted BAM files as input and calls candidate variants from read alignments, local assembly and external sources. Platypus can identify SNPs, MNPs and short indels smaller than the read length, as well as larger indels: deletions of up to several kb and insertions of maybe 200 bp.
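For readers who want to try it, the helper below assembles a Platypus command line as a Python list. It only builds the arguments and does not run anything. The subcommand and flags (`callVariants`, `--bamFiles`, `--refFile`, `--output`, `--nCPU`, `--assemble`) are as I remember them from the Platypus documentation, so treat them as assumptions and check them against `python Platypus.py callVariants --help` for your version. The helper name `platypus_command` and the file names are hypothetical.

```python
def platypus_command(bam_files, ref_fasta, output_vcf, assemble=False, n_cpu=1):
    """Build (but do not run) a Platypus callVariants command.

    Flag names are recalled from the Platypus documentation and should be
    verified against the installed version before use.
    """
    cmd = [
        "python", "Platypus.py", "callVariants",
        "--bamFiles=" + ",".join(bam_files),  # sorted BAMs, comma-separated
        "--refFile=" + ref_fasta,             # reference FASTA
        "--output=" + output_vcf,             # output VCF
        "--nCPU=" + str(n_cpu),
    ]
    if assemble:
        cmd.append("--assemble=1")            # switch on local assembly
    return cmd


# Hypothetical file names, for illustration only.
cmd = platypus_command(["sample1.bam", "sample2.bam"], "ref.fa",
                       "variants.vcf", assemble=True, n_cpu=4)
print(" ".join(cmd))
```

Passing several BAM files in one call is what enables the multi-sample calling described above. To actually launch the command, the list could be handed to `subprocess.run(cmd, check=True)`.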

First, Platypus generates candidate variants from the read alignments, from local assembly and from external sources. The local assembler looks at a small window (~few kb) at a time and uses all the reads in the window, and their mates, to generate a colored de Bruijn graph. Candidate alleles are generated by finding all unique paths in the graph with a depth-first traversal. Platypus is tuned for high sensitivity and, unlike other assemblers, returns an exhaustive list of paths. Candidate haplotypes are generated by clustering the candidate alleles across windows. Haplotype frequencies are estimated by EM (expectation-maximization) magic. Variants are then called using the estimated haplotype frequencies. A lot of interesting detail on how it works is hidden in the supplementary methods section.
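The "EM magic" can be demystified a little with a toy version. The sketch below estimates haplotype frequencies from a table of per-read likelihoods, treating the reads as a simple mixture over candidate haplotypes. This is a deliberate simplification and not the model from the paper, which works with diploid genotypes across samples. The likelihood values are made up for illustration, and the function name is hypothetical.

```python
def em_haplotype_frequencies(likelihoods, n_iter=100, tol=1e-9):
    """Estimate mixture frequencies of candidate haplotypes by EM.

    likelihoods[i][j] = P(read i | haplotype j), assumed to be given.
    Returns a list of frequencies, one per haplotype, summing to 1.
    """
    n_haps = len(likelihoods[0])
    freqs = [1.0 / n_haps] * n_haps  # start from a uniform guess
    for _ in range(n_iter):
        # E-step: share each read among haplotypes in proportion to
        # frequency * likelihood.
        totals = [0.0] * n_haps
        for row in likelihoods:
            weighted = [f * lik for f, lik in zip(freqs, row)]
            norm = sum(weighted)
            if norm == 0.0:
                continue  # read fits no haplotype; ignore it
            for j, w in enumerate(weighted):
                totals[j] += w / norm
        n_used = sum(totals)
        if n_used == 0.0:
            break  # no informative reads at all
        # M-step: new frequency = average share of reads.
        new_freqs = [t / n_used for t in totals]
        converged = max(abs(a - b) for a, b in zip(freqs, new_freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


# Made-up data: 6 reads fit haplotype 0, 2 fit haplotype 1, 2 are ambiguous.
likelihoods = [[0.99, 0.01]] * 6 + [[0.01, 0.99]] * 2 + [[0.5, 0.5]] * 2
print(em_haplotype_frequencies(likelihoods))
```

The ambiguous reads are what make the iteration worthwhile: they get split according to the current frequency estimate, which in turn is updated from the split.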

The paper goes on to show how this approach is useful in four different scenarios of variant calling applications.

  • calling variation from whole-genome data
  • calling SNPs and indels from whole-exome data
  • de novo mutations in parent-offspring trios
  • genotyping HLA loci

The paper also shows that integrating the approaches yields high sensitivity and specificity in several clinically relevant experimental designs, and that it is also an order of magnitude faster.






