USeq, MiSeq, WeAllSeq...to Seek: 2013

Friday, December 27, 2013

Evomics 2014 Workshop!

Looking for the blog on the Workshop on Genomics for 2014?

Seems the workshop's organizers enjoyed my blogging shenanigans so much last year they've asked me to blog from the Evomics website this year.

First blog post is up...

"Evomics 2014...ready, set go!"

If you'd like to relive the guts and glory, the posts from last year's workshop that received the highest traffic are linked below!

Comparison charts for NGS platforms 2013: a tangent I included during the workshop last year that got a surprisingly high number of pageviews.
The wrap up from last year's workshop with highlights
The Modern Genome Sequencing preparation blog
Konrad's talk on Short Read alignment

and

Konrad again..."So you want to be an NGS sequencer eh?" from day 2 where he discusses the latest and greatest in the technology.

Feel free to peruse others tagged as workshop on genomics or WoG and keep your eye on the Evomics blog for this year's workshop rundown and roundup!

Ready, set...

Friday, October 25, 2013

GATK Best Practices Workshop: Variant Calling


"Examining the evidence for variation from reference via Bayesian inference"

There are two essential approaches to finding genetic variation that's 'real'.

Initial approach, which is very fast and uses an independent-base assumption
Evolved approach, which is more computationally intensive and involves local de novo assembly of the variable region

There were two variant callers discussed: the Unified Genotyper and the Haplotype Caller

Unified Genotyper (UG)

Calls SNPs and indels separately by considering each variant locus independently

Determine possible SNP and indel alleles
Compute likelihoods of data given genotypes
Compute the allele frequency distribution to determine the most likely allele count, and emit a variant call only if one is warranted
Assign genotypes to samples
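
To make steps 2 through 4 a bit more concrete, here is a rough Python sketch of how they could fit together for a single biallelic site in diploid samples. It is not GATK's code. The function names, the theta/k prior on the allele count, the theta value of 0.001 and the 0.99 emit threshold are all my own assumptions for illustration, and genotypes are assigned by simply taking each sample's most likely genotype.

```python
from math import comb


def allele_count_likelihoods(sample_gls):
    """P(data | alt allele count = k) for k = 0..2N.

    sample_gls holds one (L0, L1, L2) tuple per diploid sample: the likelihood
    of that sample's reads given 0, 1 or 2 copies of the alt allele.
    """
    dp = [1.0]  # dp[k]: weighted sum over genotype configurations with k alt alleles
    for gl in sample_gls:
        new = [0.0] * (len(dp) + 2)
        for k, acc in enumerate(dp):
            for g in range(3):
                new[k + g] += acc * comb(2, g) * gl[g]
        dp = new
    n_chrom = 2 * len(sample_gls)
    return [dp[k] / comb(n_chrom, k) for k in range(n_chrom + 1)]


def allele_count_prior(n_chrom, theta=0.001):
    """Assumed prior: P(AC = k) = theta / k for k > 0, remainder on AC = 0."""
    prior = [0.0] + [theta / k for k in range(1, n_chrom + 1)]
    prior[0] = 1.0 - sum(prior)
    return prior


def allele_count_posterior(sample_gls, theta=0.001):
    """Posterior distribution over the alt allele count across all samples."""
    lik = allele_count_likelihoods(sample_gls)
    prior = allele_count_prior(len(lik) - 1, theta)
    joint = [l * p for l, p in zip(lik, prior)]
    total = sum(joint)
    return [j / total for j in joint]


def call_site(sample_gls, theta=0.001, min_variant_prob=0.99):
    """Return (most likely allele count, P(variant), emit?, genotypes or None)."""
    post = allele_count_posterior(sample_gls, theta)
    best_ac = max(range(len(post)), key=post.__getitem__)
    p_variant = 1.0 - post[0]
    emit = p_variant >= min_variant_prob
    # Simplification: each sample gets its own maximum-likelihood genotype.
    genotypes = [max(range(3), key=gl.__getitem__) for gl in sample_gls] if emit else None
    return best_ac, p_variant, emit, genotypes
```

With two samples that look homozygous reference and one that looks heterozygous, call_site should report an allele count of 1 and emit the site. With three reference-looking samples it should stay quiet. A real implementation would work in log space to avoid underflow at high depth.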

Accepts any 'ploidy'
Can do pooled calling
You need high sample numbers
Remember that you have to run indel realignment, as described in the previous blog; UG requires it.

Bayesian modeling is used for SNP and indel calling; you can see all the modeling action here (this is an older presentation but you can download the new one per the forum link).

Inference: What is genotype G given sample read data D?
Calculate (Bayes' rule) the probability of each possible genotype (G)
Assumes reads are independent
Relies on the likelihood function to estimate probability of sample data given a 'proposed' haplotype
Considers the 'pileup' of bases and their associated quality scores
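
The inference above can be sketched for one diploid sample at one SNP site. This is a minimal illustration under my own assumptions, not the UG implementation: each pileup entry is a (base, Phred quality) pair, a sequencing error is assumed to be equally likely to produce any of the three other bases, and the prior defaults to a flat placeholder. All function names are hypothetical.

```python
import math


def phred_to_error(qual):
    """Convert a Phred base quality into an error probability."""
    return 10 ** (-qual / 10)


def base_likelihood(base, qual, allele):
    """P(observed base | true allele), spreading errors evenly over 3 other bases."""
    err = phred_to_error(qual)
    return 1 - err if base == allele else err / 3


def genotype_log_likelihood(pileup, genotype):
    """log10 P(D | G): reads are independent, so per-base terms multiply."""
    a1, a2 = genotype
    total = 0.0
    for base, qual in pileup:
        # A read comes from either chromosome copy with probability 1/2.
        p = 0.5 * base_likelihood(base, qual, a1) + 0.5 * base_likelihood(base, qual, a2)
        total += math.log10(p)
    return total


def genotype_posteriors(pileup, ref, alt, priors=(1 / 3, 1 / 3, 1 / 3)):
    """Bayes' rule: P(G | D) is proportional to P(G) * P(D | G)."""
    genotypes = [(ref, ref), (ref, alt), (alt, alt)]
    log_post = [
        genotype_log_likelihood(pileup, g) + math.log10(p)
        for g, p in zip(genotypes, priors)
    ]
    top = max(log_post)  # subtract the max before exponentiating to avoid underflow
    unnorm = [10 ** (lp - top) for lp in log_post]
    total = sum(unnorm)
    return {g: u / total for g, u in zip(genotypes, unnorm)}
```

For example, genotype_posteriors([("A", 30)] * 5 + [("G", 30)] * 5, "A", "G") puts nearly all of the posterior mass on the heterozygous genotype ("A", "G").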

Only considers "Good Bases"
Good bases satisfy the minimum requirements for base quality, read mapping quality and pair (mate) mapping quality
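
A "good base" filter along these lines might look like the sketch below. The PileupBase class, its field names and the threshold values of 20 are placeholders I made up for illustration; they are not GATK's defaults or its data model.

```python
from dataclasses import dataclass


@dataclass
class PileupBase:
    """One base from one read overlapping the site (hypothetical structure)."""
    base: str
    base_quality: int          # Phred quality of this base call
    mapping_quality: int       # mapping quality of the read carrying the base
    mate_mapping_quality: int  # mapping quality of that read's mate


def is_good_base(b, min_base_quality=20, min_mapping_quality=20,
                 min_mate_mapping_quality=20):
    """A base counts only if it clears all three minimum thresholds."""
    return (
        b.base_quality >= min_base_quality
        and b.mapping_quality >= min_mapping_quality
        and b.mate_mapping_quality >= min_mate_mapping_quality
    )


def good_bases(pileup, **thresholds):
    """Keep only the bases that would be considered by the genotyper."""
    return [b for b in pileup if is_good_base(b, **thresholds)]
```

Only the bases that survive this filter would go into the pileup used for the likelihood calculation.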

The prior for the Bayesian inference is (or was) tuned using human data, so if you are using a different data set you will want to tune your prior differently.
Indels are more involved because the number of possibilities increases dramatically.
In the end...you get simultaneous estimation of allele frequency, the probability that a variant exists, and the assignment of genotypes to each sample.
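
On the point about tuning the prior, here is a small sketch of why it matters. I am assuming a simple heterozygosity-based genotype prior for a diploid site (hom-ref 1 - 3θ/2, het θ, hom-alt θ/2), which is one commonly described form and not something the workshop notes confirm. The θ values and the relative likelihoods in the usage note are invented for illustration.

```python
def genotype_priors(theta):
    """Assumed diploid priors (hom-ref, het, hom-alt) for heterozygosity theta."""
    if not 0 < theta < 2 / 3:
        raise ValueError("theta must be between 0 and 2/3")
    return (1 - 1.5 * theta, theta, 0.5 * theta)


def posteriors(likelihoods, theta):
    """Combine (hom-ref, het, hom-alt) likelihoods with the theta-based prior."""
    priors = genotype_priors(theta)
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]


def most_likely_genotype(likelihoods, theta):
    """Index of the best genotype: 0 = hom-ref, 1 = het, 2 = hom-alt."""
    post = posteriors(likelihoods, theta)
    return max(range(3), key=post.__getitem__)
```

Take borderline evidence where the reads favor a het 50 to 1 over hom-ref, that is, relative likelihoods of (1.0, 50.0, 0.01). With θ = 0.001 the prior wins and the site is called hom-ref. With θ = 0.05, as you might choose for a much more diverse population, the same reads give a het call.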