Friday, December 27, 2013

Evomics 2014 Workshop!

Looking for the blog on the Workshop on Genomics for 2014?

Seems the workshop's organizers enjoyed my blogging shenanigans so much last year that they've asked me to blog from the Evomics website this year.

First blog post is up...

"Evomics 2014...ready, set go!"

If you'd like to relive the guts and glory from last year, the posts that received the highest traffic from the workshop last year are linked below!

Feel free to peruse others tagged as workshop on genomics or WoG and keep your eye on the Evomics blog for this year's workshop rundown and roundup!

Ready, set...


Friday, October 25, 2013

GATK Best Practices Workshop: Variant Calling

GATK Best Practices Workshop
Variant Calling


"Examining the evidence for variation from reference via Bayesian inference"

There are two essential approaches to finding genetic variation that's 'real'.
  1. Initial approach, which is very fast and uses an independent base assumption.
  2. Evolved approach, which is more computationally intensive and involves local de novo assembly of the variable region.
There were two variant callers discussed: the Unified Genotyper and the Haplotype Caller.

Unified Genotyper (UG)
  • Calls SNPs and indels separately by considering each variant locus independently
    • Determine possible SNP and indel alleles
    • Compute likelihoods of data given genotypes
    • Compute the allele frequency distribution to determine the most likely allele count, and emit a variant call if one is warranted.
    • Assign genotypes to samples
  • Accepts any 'ploidy'
  • Can do pooled calling
  • You need high sample numbers
  • Remember you have to run indel realignment per the previous blog; it's required by UG.
Bayesian modeling is used for SNP and indel calling; you can see all the modeling action here (this is an older presentation but you can download the new one per the forum link).
  • Inference: What is genotype G given sample read data D
  • Calculate (Bayes' rule) the probability of each possible genotype (G)
  • Assumes reads are independent
  • Relies on the likelihood function to estimate probability of sample data given a 'proposed' haplotype
  • Considers the 'pileup' of bases and their associated quality scores
    • Only considers "Good Bases"
    • Good bases satisfy the minimum requirements for base quality, read mapping quality and pair mapping quality
  • The prior on Bayesian inference is or was tuned using human data...if you are using a different data set you will want to tune your prior differently.
  • Indels are more involved because the number of possibilities increases dramatically.
  • In the end...you get simultaneous estimation of allele frequency, the probability that a variant exists and the assignment of genotypes to each sample
credit: http://www.broadinstitute.org/partnerships/education/broade/best-practices-variant-calling-gatk-0 (may not be most recent, search forum for most recent slides from Oct 2013 workshop)
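
The Bayes' rule bullets above can be sketched in a few lines of Python. This is only a toy illustration, not how UG is actually implemented: it assumes a diploid, biallelic site, independent reads, a hypothetical 'good base' cutoff of Q20 and made-up genotype priors (remember the real prior was tuned on human data).

```python
import math

MIN_BASE_QUAL = 20  # hypothetical "good base" cutoff, not a GATK default


def base_likelihood(base, qual, allele):
    """P(observed base | true allele), using the Phred-scaled base quality."""
    err = 10 ** (-qual / 10)
    return 1 - err if base == allele else err / 3


def genotype_posteriors(pileup, ref, alt, priors):
    """Return P(G | D) for the genotypes ref/ref, ref/alt and alt/alt.

    pileup: list of (base, base_quality) tuples at one site
    priors: dict mapping genotype name -> prior probability (illustrative values)
    """
    genotypes = {"ref/ref": (ref, ref), "ref/alt": (ref, alt), "alt/alt": (alt, alt)}
    # Only "good bases" contribute to the likelihood.
    good = [(b, q) for b, q in pileup if q >= MIN_BASE_QUAL]

    log_post = {}
    for name, (a1, a2) in genotypes.items():
        log_lik = 0.0
        for base, qual in good:
            # Each read is equally likely to come from either chromosome copy.
            p = 0.5 * base_likelihood(base, qual, a1) + 0.5 * base_likelihood(base, qual, a2)
            log_lik += math.log(p)
        log_post[name] = math.log(priors[name]) + log_lik

    # Normalise in log space so the posteriors sum to 1.
    top = max(log_post.values())
    unnorm = {g: math.exp(lp - top) for g, lp in log_post.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


# Six reads support the reference A, five good reads support G, one G is low quality.
pileup = [("A", 30)] * 6 + [("G", 30)] * 5 + [("G", 5)]
priors = {"ref/ref": 0.998, "ref/alt": 0.0015, "alt/alt": 0.0005}
posteriors = genotype_posteriors(pileup, ref="A", alt="G", priors=priors)
best = max(posteriors, key=posteriors.get)
print(best, posteriors[best])
```

Even with a prior that strongly favours the reference genotype, a balanced pileup of good bases pulls the posterior toward the heterozygous call, which is the whole point of weighing the likelihood against the prior.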

Now that we've gone over Unified Genotyper...let me tell you about the program that will eventually replace it...doh!

Haplotype Caller
  • Calls variants by local de novo assembly
  • More accurate especially for indels
  1. Propose haplotypes with local de novo assembly using de Bruijn graphs
  2. Evaluate haplotypes with Pair HMM
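
To make the first step a bit more concrete, here is a toy de Bruijn graph in Python. It is a sketch under my own assumptions (tiny made-up reads, k = 4, no cycles, hypothetical function names) and not the Haplotype Caller's code. The `min_support` argument is only loosely analogous to pruning low-confidence paths, and the pair HMM scoring step is not shown.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to its successor nodes, counting edge support."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def enumerate_haplotypes(graph, start, end, min_support=1):
    """Walk every path from start to end, skipping edges seen fewer than min_support times."""
    haplotypes = []
    stack = [(start, start, {start})]
    while stack:
        node, seq, seen = stack.pop()
        if node == end:
            haplotypes.append(seq)
            continue
        for nxt, count in graph.get(node, {}).items():
            # Ignore weakly supported edges and avoid revisiting nodes (no cycles).
            if count >= min_support and nxt not in seen:
                stack.append((nxt, seq + nxt[-1], seen | {nxt}))
    return sorted(haplotypes)


# Three reference reads, two reads carrying a SNP, one read with a likely error.
reads = ["ACGTTGCA"] * 3 + ["ACGTAGCA"] * 2 + ["ACGTCGCA"]
graph = build_de_bruijn(reads, k=4)
all_paths = enumerate_haplotypes(graph, "ACG", "GCA", min_support=1)
pruned_paths = enumerate_haplotypes(graph, "ACG", "GCA", min_support=2)
print(all_paths)
print(pruned_paths)
```

Each surviving path is a candidate haplotype; in the real tool these candidates would then be scored against the reads.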


As a note: If you invoke minPruning this will improve performance as it will 'toss' low confidence haplotypes.

These variant callers give you 'raw' calls...so more work is needed to assess the quality of the variant calls...

Variant Quality Score Recalibration (VQSR)
  • Mutation callers typically 'cast a wide net'; VQSR helps us narrow that 'net' to what is most likely 'real'.
  • Requires A LOT of data; if you have a small number of variants or samples, VQSR will not help your analysis.
  • Allows analyst to trade off sensitivity and specificity depending on their project goals. (you can control this).
  • Builds a model of what true genetic variation looks like and allows for rank-ordering of variants based on their likelihood of being real.
  • Some assumptions and notes:
    • Each variant has a diverse set of statistics associated with it (variant annotations)
    • Real variants tend to cluster together
    • Clusters tend to follow a Gaussian distribution (the GATK team noted this over many runs on many datasets)
    • Uses a Gaussian mixture model to fit the data
    • Tries to find the fewest clusters that explain the data.

  • First you will build your model (VariantRecalibrator), then apply filters and write the new annotated VCF (ApplyRecalibration).
  • Recalibrate your SNP calls and indels separately...how?
    • In your first analysis do the SNPs
    • In the second analysis use the recalibrated VCF and do indels (indel mode).
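
The build-a-model, rank, then filter idea can be sketched roughly as below. Big caveat: this toy stands in a single diagonal Gaussian for the real Gaussian mixture model, the annotation values are invented (QD and FS are real GATK annotation names, the numbers are not real data), and the function names are my own.

```python
import math
from statistics import mean, pstdev


def fit_gaussian(rows):
    """Fit an independent (diagonal) Gaussian to each annotation column."""
    columns = list(zip(*rows))
    return [(mean(col), pstdev(col)) for col in columns]


def log_density(row, model):
    """Log-probability of one variant's annotations under the fitted model."""
    total = 0.0
    for value, (mu, sigma) in zip(row, model):
        total += -0.5 * math.log(2 * math.pi * sigma ** 2) - (value - mu) ** 2 / (2 * sigma ** 2)
    return total


def sensitivity_cutoff(training_scores, target_sensitivity):
    """Lowest score that still retains the target fraction of training variants."""
    ranked = sorted(training_scores, reverse=True)
    keep = max(1, math.ceil(target_sensitivity * len(ranked)))
    return ranked[keep - 1]


# Made-up (QD, FS) annotations for variants we trust to be real.
training = [(25.0, 1.0), (22.0, 2.5), (28.0, 0.5), (24.0, 3.0), (26.0, 1.5)]
# Made-up raw calls we want to rank and filter.
candidates = {"site_a": (24.5, 1.8), "site_b": (3.0, 45.0), "site_c": (27.0, 0.9)}

model = fit_gaussian(training)
cutoff = sensitivity_cutoff([log_density(row, model) for row in training], 0.9)

scores = {name: log_density(row, model) for name, row in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
passing = [name for name in ranked if scores[name] >= cutoff]
print(ranked)
print(passing)
```

Raising or lowering the target sensitivity moves the cutoff, which is the sensitivity versus specificity trade-off mentioned above.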

Note: Plots require the R statistical package be installed...


GATK definitely provides a step by step framework that should get you from raw data to variant calls pretty seamlessly. Remember though that many of their parameters and assumptions are based on their extensive work on human genome projects and may not necessarily be applicable to your bacterial or viral genome project.

That being said they are a receptive group and questions on the forum are welcome.

Additionally, there are a variety of variant callers out there...for instance Mike Zody (also from the Broad Institute) has been involved with work on V-Phaser, V-Phaser 2 and V-Profiler, which are specialized callers for viral population data:
  • Henn MR, Boutwell CL, Charlebois P, Lennon NJ, Power KA, Macalalad AR, Berlin AM, Malboeuf CM, Ryan EM, Gnerre S, Zody MC, Erlich RL, Green LM, Berical A, Wang Y, Casali M, Streeck H, Bloom AK, Dudek T, Tully D, Newman R, Axten KL, Gladden AD, Battis L, Kemper M, Zeng Q, Shea TP, Gujja S, Zedlack C, Gasser O, Brander C, Hess C, Gunthard HF, Brumme ZL, Brumme CJ, Bazner S, Rychert J, Tinsley JP, Mayer KH, Rosenberg E, Pereyra F, Levin JZ, Young SK, Jessen H, Altfeld M, Birren BW, Walker BD, Allen TM (2012) Whole Genome Deep Sequencing of HIV-1 Reveals the Impact of Early Minor Variants Upon Immune Recognition During Acute Infection. PLoS Pathogens 8(3): e1002529.
  • Macalalad AR, Zody MC, Charlebois P, Lennon NJ, Newman RM, Malboeuf CM, Ryan EM, Boutwell CL, Power KA, Brackney DE, Pesko KN, Levin JZ, Ebel GD, Allen TM, Birren BW, Henn MR (2012) Highly sensitive and specific detection of rare variants in mixed viral populations from massively parallel sequence data. PLoS Computational Biology 8(3): e1002417.
  • Yang X, Charlebois P, Macalalad A, Henn MR, and Zody MC. (2013) V-Phaser 2.0: Variant Inference for Viral Populations. Submitted.
Incidentally, Mike Zody also spoke at the Workshop on Genomics, where I blogged about his talk on NGS experimental design; his publications and presentations are worth a look...

Other Variant Callers:
A really nice paper in Nature (disclaimer: it's from 2011, and other programs have come out since) discusses SNP/variant callers and provides a referenced table plus links to where you can find information about other programs on SEQwiki.

So you can find a caller to fit your data or NGS experimental design. Just be aware of your program's assumptions and caveats:
  • What are the underlying data assumptions of the program?
  • What was the program 'calibrated on'?
  • Does it have certain weaknesses to be aware of?
  • Has it been compared to other callers? How did it do?
  • Does it require you to do back handsprings with your data formatting to even get your data into the caller?
  • Are there readily available tutorials or support sites or receptive developers so you don't potentially f*ck it up and if you do you have somewhere to go?
  • What are the default parameters? Do they 'make sense' for your data?
  • If the defaults don't make sense, do you understand how to change them to fit your data?

Thursday, October 24, 2013

GATK Best Practices Workshop: Data Pre-Processing

GATK Best Practices Workshop
Data Pre-Processing

This past Monday and Tuesday I was able to attend the GATK Best Practices Seminars being held at the Broad Institute in Cambridge, MA.

Here's the download of the Monday morning session:

And by way of note...all slides from the workshop can be found on a link via the GATK forum.

By way of introduction...
  • GATK doesn't really do 'mapping' though they have suggestions for tools of preferred use for mapping.
  • GATK is a post-processing tool.
  • The Integrative Genomics Viewer (IGV) is a user friendly tool for visualizing whole genome data.
  • Be sure to pay attention to your study design...are you doing 'deep' sequencing or 'shallow' sequencing? This will affect data processing and interpretation.
With deep sequencing designs:
  • You will have increased sensitivity for variant detection
  • More accurate genotyping
  • Caveat: No information about multiple samples as a deep sequencing design often times means you can only do 1 or very few samples.
With shallow sequencing designs:
  • Your sensitivity will depend on the frequency of the variation of interest.
  • Genotyping is potentially not as accurate.
  • You may discover more 'total' variants across more samples, however your confidence in real variants versus 'error' due to not deep enough sequencing may be reduced.
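
A quick back-of-the-envelope way to see why depth matters for sensitivity: treat each read at a heterozygous site as a coin flip between the two alleles and ask how often you'd see enough variant-supporting reads to make a call. This is a simplified binomial sketch; the threshold of 3 supporting reads is a number I made up for illustration, and sequencing error is ignored.

```python
from math import comb


def detection_probability(depth, alt_fraction, min_alt_reads):
    """P(at least min_alt_reads of `depth` reads carry the variant allele)."""
    miss = sum(
        comb(depth, k) * alt_fraction ** k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - miss


# Compare a shallow and a deep design for a heterozygous site (alt_fraction = 0.5).
for depth in (4, 30):
    p = detection_probability(depth, alt_fraction=0.5, min_alt_reads=3)
    print(f"depth {depth}x: P(detect het) = {p:.3f}")
```

Lowering `alt_fraction` mimics a rarer variant in a mixed population (hello, viruses), where you need far more depth for the same chance of detection.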
Best Practices:
  • Based on human whole genome or whole exome analysis
  • Definitely hit up the documentation, it is quite extensive
  • Not necessarily applicable to all datasets. They can be used as a general guideline, but if you are working with bacteria or viruses (like me) then some parts may not be applicable or calibrated the way you need them to be, so you'll have to 'play around'.
  • Depends on design (see above)
  • Use the forum, a great place to lob questions at the developers regarding the tools.
So here's the layout (also available on their website):

credit:  http://www.broadinstitute.org/gatk/guide/best-practices

Other suggestions before we jumped in per the Exome sequencing they have been doing:
  • Add 50 bp of 'padding' on either side of your intervals (i.e., exome, genes, loci, regions, etc...)
  • Run at least 50 samples/run
  • If you don't have 50 samples you can pull from the 1000 Genomes Project and do 'joint-calling'. If you don't work on humans...pull samples from your preferred database, but make sure the formatting and meta-information are the same, so you'll potentially have to do some more manipulation.
  • You can also use 'hard filters' per the best practices recommendations.
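
The padding suggestion is easy to script yourself if your interval list lives in a simple table. Here's a minimal sketch, assuming 1-based inclusive coordinates and a hypothetical `pad_intervals` helper of my own (it is not a GATK function); it also merges intervals that end up overlapping after padding.

```python
def pad_intervals(intervals, padding=50, contig_lengths=None):
    """Extend each (contig, start, end) interval by `padding` bp and merge overlaps.

    Coordinates are treated as 1-based and inclusive; contig_lengths is an
    optional dict used to stop padded intervals running off the contig end.
    """
    padded = []
    for contig, start, end in intervals:
        new_start = max(1, start - padding)
        new_end = end + padding
        if contig_lengths is not None:
            new_end = min(new_end, contig_lengths[contig])
        padded.append((contig, new_start, new_end))

    merged = []
    for contig, start, end in sorted(padded):
        # Merge with the previous interval when they overlap or touch.
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + 1:
            prev = merged[-1]
            merged[-1] = (contig, prev[1], max(prev[2], end))
        else:
            merged.append((contig, start, end))
    return merged


exons = [("chr1", 1000, 1200), ("chr1", 1260, 1400), ("chr2", 20, 90)]
padded_exons = pad_intervals(exons, padding=50, contig_lengths={"chr1": 5000, "chr2": 120})
print(padded_exons)
```

Note that the sort is a plain string sort on contig names, which is fine for a sketch but won't give you karyotypic order on a real genome.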
BTW....Tangent: if you ever get the opportunity to attend a seminar/talk with Eric Banks, do it. He's a fabulous speaker who answers any question great or small.

Let's get started...

Tuesday, October 15, 2013

NIAID-DVI: Dengue Cohort Studies in Thailand -- Tim Endy

As an intermittent blogger...or rather any type of blogger, I get an absolute thrill when I see my pageviews jump from a few dozen to a couple hundred at any given time, especially since my blog goes "inactive" if I've nothing to say (i.e., no conference, no workshop, no training to write about, or I'm being lackadaisical about reviewing literature on a blog). On October 9th I hit 244 pageviews...and when I checked today I saw that on Oct 13th I had 488...it was indeed thrilling. The last time I experienced a thrill such as this was when I got into the thousands after I live-blogged the Workshop on Genomics Blog Series in Jan 2013.

It's always nice to know one's blog is proving useful from time to time...

Back to NIAID-DVI!

Human immune responses to dengue infection: lessons from cohort studies in Thailand
Tim Endy
State University of New York Upstate Medical University

Tim Endy, like Dr. Kuhn, is another one of those individuals in the field I could listen to for hours and not get bored. He's been a longtime collaborator of my postdoc mentor, MAJ Jarman, and of WRAIR, where I work now. He is also a retired COL who did a stint at AFRIMS in Bangkok, is highly involved in, and I would contend is a pillar of, dengue work in Thailand.

Additionally, it's always exciting when someone you look up to in your field uses figures you generated in his presentation...huzzah!

That all being said...

Tuesday, October 8, 2013

NIAID-DVI: Pediatric Dengue Cohort Study -- Eva Harris

Neutralizing antibody responses in the Nicaraguan Pediatric Dengue Cohort Study
Eva Harris
Division of Infectious Diseases & Vaccinology
SPH, UC Berkeley

So I've always enjoyed talks by Eva Harris...she's always very animated and excited about her work and she almost always has an overwhelming amount of information to show--scientific sensory overload at times...you wish you could just have copies of the slides or write faster to catch everything she says. She has a very active group, active collaborations, so she usually has a lot to say in a short time. But y'know when you've been involved in work in a country for 23+ years...you're going to have a lot to say.

I first heard Eva talk at ASTMH about Molly OhAinle's work which came out in 2011 in Science Translational Medicine...

Monday, October 7, 2013

NIAID-DVI: Understanding the E gene Part II: 3 Tales of the power of small changes

Now that we have an appreciation for the dynamic dengue particle back to the ever elusive E gene...

Small Change #1: 2 amino acids

The type-specific neutralizing antibody response elicited by a dengue vaccine candidate is focused on two amino acids of the envelope protein
Ted Pierson
NIAID-NIH


As stated in many of these blog series posts the failure of the Sanofi vaccine has highlighted the limited understanding we still have about dengue and the research continues in many aspects of disease pathogenesis as well as genetic influences on the virus. Ted Pierson (and others) seek to better understand the humoral immune response against DENV infection. They wanted to identify epitopes recognized by serotype-specific neutralizing antibodies elicited by monovalent DENV1 vaccination. To do this they constructed a panel of over 50 DENV1 structural gene variants containing substitutions at surface accessible residues of the envelope protein to match the corresponding DENV2 sequence. They identified two mutations that contribute significantly to type-specific recognition by polyclonal DENV1 immune sera. When they analyzed sera from 24 participants of a phase I clinical study, they found that there was a reduced capacity to neutralize a DENV1 variant which contained both mutations. Sera from 77% of subjects recognized the DENV1 variant and DENV2 equivalently (less than 3 fold difference). The data indicated that the type-specific component of the DENV1 neutralizing antibody response to vaccination was focused on just two regions of the E protein. The amino acids in question? E157 and E126.

Unfortunately the paper hasn't come out specifically on this study that I can find...but Pierson has been involved in numerous studies characterizing aspects of the E gene:
Small Change #2: The fusion loop of the E gene

Wednesday, October 2, 2013

NIAID-DVI: The Dynamic Dengue Particle -- Richard Kuhn

New Lessons in dengue virus structure and composition and their influence on vaccine strategies
Richard Kuhn
Purdue University

I had the honor and pleasure of meeting Dr. Kuhn where I work and he is fantastically animated and I very much enjoyed listening to what he had to say.

The structure of dengue has been known for more than 10 years via cryo-electron microscopy; however now with more sophisticated tools we can see the virus in ways unimaginable in the past. His group employed a variety of structural and biochemical tools to probe the structure of the dengue virion as well as its conformation, composition and dynamics.


Let's look at some particles...
mmm...pretty. Clustering of dengue particles. Source: MicrobiologyBytes

Monday, September 30, 2013

NIAID-DVI: Understanding the dengue E gene, Part I -- Human antibody neutralization, Aravinda de Silva

So we're going to spend several blog posts now focusing in on the E gene and understanding it genetically, structurally and its role immunologically...as there were several topics covering this during NIAID-DVI...

Recent Advances in our understanding of how human antibodies neutralize dengue viruses
Aravinda de Silva
University of North Carolina School of Medicine

From the Abstract

Ten years ago it was known that people exposed to dengue virus developed strongly neutralizing antibodies against the homologous serotype but the molecular basis of neutralization was not known. Over the course of the subsequent decade several groups have studied the properties of DENV-specific human serum antibodies (Abs) derived from plasma cells and monoclonal Abs (mAbs) derived from memory B-cells. These studies demonstrated:
  • The dominant human Ab response is serotype cross-reactive and non-neutralizing
  • While functionally important, neutralizing Abs are a small 'component' of the entire response.
  • Human neutralizing Abs bind to complex epitopes centered in the hinge region between domain I and II (DI and DII) of the dengue E gene.
  • Humans also produce strong neutralizing Abs that bind in domain III (DIII) of the E protein.
  • Replicating viruses stimulate DI/DII hinge antibodies whereas recombinant antigens trigger DIII directed neutralizing antibody response.
Recent studies have indicated that DENVs in cell culture are dynamic and structurally heterogeneous particles. This heterogeneity is a result of incomplete cleavage of the pre-membrane (prM) protein during viral release from infected cells. This produces a virus which is a mix of immature, partially mature and mature virions. The extent of heterogeneity is dependent on the cell lines where the virus was grown.

The 'maturation state' of the virus influences both its ability to infect cells and the ability of antibodies to neutralize DENVs...this is why DENV neutralization titers are notoriously variable.

Tuesday, September 24, 2013

NIAID-DVI: B-cell responses to dengue infection -- Jens Wrammert

Blog Series: NIAID-DVI

Human B cell responses during dengue infection
Jens Wrammert
Emory University

"Humoral immune responses are thought to play a major role in dengue-induced immuno-pathology, however little is known about the plasmablasts producing these antibodies during an ongoing infection." ~Jens
  • The group analyzed plasmablast responses in patients with acute dengue infection.
    • Plasmablast responses increased more than 1000-fold over baseline levels.
    • These responses made up as much as 30% of the peripheral lymphocyte population
    • Responses were dengue specific
      • IgG secreting cells that reached high numbers after fever onset coinciding with 'the window' where serious dengue-induced pathology is observed.
 
What is a plasmablast?
"The most immature blood cell that is considered a plasma cell instead of a B cell is the plasmablast. Plasmablasts secrete more antibodies than B cells, but less than plasma cells. They divide rapidly and are still capable of internalizing antigens and presenting them to T cells. A cell may stay in this state for several days, and then either die or irrevocably differentiate into a mature, fully differentiated plasma cell." ~Wikipedia
"A plasmablast is basically a B cell that is actively secreting antibodies. This is in response to acute infection only so it is not 'long term' "~Dr. Friberg-Robertson, Immunologist and Friend Extraordinaire 
Questions raised:
Do these cells have a role in dengue immuno-pathology during an ongoing infection?
Ongoing/Future Research to answer said question:
  • Understanding the complete repertoire and specificity of the antibodies that are secreted [en masse] by the plasmablasts.
  • Jens' group has generated (and is still generating) panels of human monoclonal antibodies from dengue infection-induced plasmablasts from 5 patients and has done initial functional analyses...stay tuned to his research for the results.
Other papers that highlight plasmablast responses in dengue infection:

Saturday, September 21, 2013

NIAID-DVI: Transcriptional responses to dengue -- Stephen Popper

Blog Series: NIAID-DVI

Early genome-wide host transcriptional responses to dengue that correlate with neutralizing antibody titer following vaccination and natural infection
Stephen Popper
Stanford University School of Medicine

Vaccine development has typically been hampered because we don't have a good understanding of the correlates of protection or the links between innate and adaptive immune responses. We also have a limited understanding of the role of background immune status and its effect on subsequent infection or vaccination. In order to identify links between early host responses and later adaptive immune responses, Popper's group studies the genome-wide transcriptional response to dengue during a vaccine trial (a controlled setting) and in natural infection settings.

  • They characterized the temporal dynamics of transcript abundance in subjects vaccinated with rDEN3delta30/31 (TetraVax-DV live vaccine candidate Den3 component developed by NIAID).
  • Of Note:
    • During early transcriptional response there is an interferon-associated transcript expression pattern that peaked in most subjects between day 6 and day 12 post-immunization and this correlated with the titer of neutralizing antibodies (PRNT60) measured at day 42.
In their newest work...

Wednesday, September 18, 2013

NIAID-DVI: Systems Vaccinology -- Bali Pulendran

Blog Series: NIAID-DVI

So as I stated in an earlier blog...some things flew well above my head at this meeting, so some of these blogs will be a re-print of the abstract plus some general links and thoughts, as well as a load of definitions and attempts to understand the components that made up said person's abstract, with a healthy dose of trying to understand what they do. Also, some of the presentations drove me to take lots of notes that I could draw on, while for others, I'll admit, my eyes glazed over...and the only note I wrote (several times, on many talks) was
"Crash Course Immunology?"
It actually became a bit comic with how many presentations I wrote that on. I now have a new found appreciation for what immunologists do AND a new understanding of why I don't do it. So kudos to all vaccinologists and immunologists everywhere!

Sunday, September 15, 2013

NIAID-DVI: Sanofi CYD dengue vaccine--Where do we go from here? Jean Lang and Bruno Guy

Blog Series: NIAID-DVI

Sanofi Pasteur CYD dengue vaccine programme update
Jean Lang
Sanofi Pasteur
and

Immunological characterization of the Sanofi Pasteur dengue vaccine candidate: hypotheses and investigations to explain results of the Phase IIb proof of concept efficacy trial in Ratchaburi
Bruno Guy
Sanofi Pasteur

So by way of quick review:
  • the CYD-TDV vaccine is a yellow fever backbone with the prM and E genes replaced by each of the dengue serotypes.
  • There was good response in efficacy trials to dengue 1, 3 and 4; however, the vaccine failed in eliciting a response to dengue 2 (response was about 30%) DESPITE having satisfactory PRNT titers...see previous blog post.
  • In 2012 there was a Phase IIb efficacy trial in Ratchaburi, Thailand
    • It showed the vaccine was safe.
    • It showed that it was possible to make an efficacious vaccine against dengue
    • It raised questions on the reference PRNT assay
    • It challenged some of the dengue vaccine development hypotheses
    • It reminds us how complex the disease is

Thursday, September 12, 2013

NIAID-DVI: TV003 live attenuated tetravalent dengue vaccine -- Stephen Whitehead

Blog Series: NIAID-DVI

Safety and immunogenicity of the NIH live attenuated tetravalent dengue vaccine candidate TV003
Stephen Whitehead
National Institute of Allergy and Infectious Diseases, NIH


On January 13, 2013, the NIH released a news statement, "NIH-developed candidate dengue vaccine shows promise in early-stage trial", discussing the new dengue vaccine that has done well in an early-stage clinical trial; the study was published in the Journal of Infectious Diseases on 17 January 2013. Unfortunately the article is behind a paywall...curses!!! So I will do my best to convey what I can through numerous other links and information garnered from pillaged internet slides about the vaccine and of course what I learned in the meeting.

Falling back on John T. Roehrig's presentation because it's such a nice clear presentation...here is the makeup of TV003.



Tuesday, September 10, 2013

NIAID-DVI: Dengue subunit vaccine -- Beth-Ann Coller

Blog Series: NIAID-DVI

Development update - recombinant subunit dengue vaccine
Beth-Ann Coller
Merck & Co., Inc.

So Merck decided to nix the use of the whole genome and opted to focus in on the Envelope gene (80% of it).

The current status per the meeting abstract:
  • Preclinical studies of this recombinant subunit vaccine have been conducted in non-human primates in order to evaluate immunogenicity and efficacy of tetravalent formulations
  • They are doing testing with and without adjuvant
  • The formulations have been evaluated in both dengue naive and experienced animals.
  • The work has shown that recombinant proteins can induce balanced tetravalent responses without evidence of interference.
  • They are in Phase 1 clinical trials with the vaccine in healthy flavivirus naive adults.
So I had one potentially naive question in this presentation which of course I kept to myself because it was a lively bunch and I feared for my life at times given I secretly harbor a copy of immunology for dummies book...

Monday, September 9, 2013

NIAID-DVI: GSK/FIOCRUZ/US Tetravalent DPIV -- Alexander Schmidt

Blog Series: NIAID-DVI

Tetravalent dengue purified inactivated vaccine (DPIV): status of the GSK/FIOCRUZ/US Army dengue vaccine candidate
Alexander Schmidt

So some of these talks I processed more than others, there were quite a few after all, so some postings will be more detailed than others. That being said, this is one of those lesser 'noted' talks so essentially you'll be hearing a lot from the abstract and I'll do my best to augment what I can with further research links.

Lucky for me there are two great slide sets freely available about GSK's DPIV as well as their mission within the dengue vaccine 'initiative'. They state not to reproduce any of the content/slides at the bottom of all the slides without permission so I'll just be linking them.



Saturday, September 7, 2013

NIAID-DVI: DENVax -- Jorge E Osorio

Blog Series: NIAID-DVI

Pre-Clinical and clinical development of recombinant live attenuated tetravalent dengue vaccine (DENVax)
Jorge E Osorio
Takeda Vaccines

So Takeda Vaccines took over Inviragen and have been developing a tetravalent, live attenuated dengue vaccine called DENVax. It consists of a molecularly characterized, attenuated DENV-2 strain and three chimeras. The chimeras all have the backbone of the attenuated DENV-2 strain, but the prM and E genes have been swapped out with DENV-1, DENV-3 and DENV-4.


DENVax design from Osorio et al., 2011.

NIAID-DVI --A Preamble: The Sanofi Dengue Vaccine Failure (Chimerivax)

Blog Series: NIAID-DVI

A preamble: The Sanofi dengue vaccine failure, ChimeriVax

So given I am not daily entrenched in the world of vaccine development, the most that I ever heard about dengue vaccines was that there were several out there in development and Sanofi's was the furthest along. Then I heard the Sanofi vaccine failed. It elicited a good response to dengue 1, 3 and 4 and did a fantastic fail on dengue serotype 2.

I could end this blog there but my curiosity and incessant need to comb the internet and literature revealed just how 'fantastic' of a fail this was:

Still Here...new series of blog posts coming

Greetings to my 6 followers and many others who peruse this blog via google, facebook or other link.

Rest assured I have not fallen off the planet; the purpose of this blog was to be a communication, teaching, disseminating tool for when I attend workshops, conferences, develop courses or read literature. Indeed apparently I don't travel as much as some in terms of conferences and workshops so I've had little to write about...

That and I got swept away in the bid for grant funds available at my institution...was successful--HUZZAH, 4th year of trying was the charm apparently and now am in the thick of my own research as well as what I do on a daily basis as the Viral Diseases Branch bioinformatician.

However following the Workshop on Genomics this last January (and coming up again this January) I have been teaching a basic class on bioinformatic sequence analysis and recently attended a meeting on the NIAID - Dengue Vaccine Initiative.

Sunday, January 20, 2013

Blogs I follow, for those who are now addicted to Science-y blogs...

So it was requested of me to post what blogs I personally follow so I will list them below.

As I said in the first blog for the series in Cesky Krumlov...I most likely will let this blog go dormant until I have something useful to say, but for those of you who would love to get daily musings from other scientists in the field who post on a semi-regular/daily basis, let me whet your appetite for what's out there and you can search on your own too.

Normally I subscribe via RSS/Google reader. If you have google reader and there isn't an RSS subscribe button on the blog simply copy/paste the URL into google reader and it'll subscribe to what's posted at the URL.

Happy Reading!


  1. Aetiology: discussing causes, origins, evolution and implications in human disease (Tara Smith)
  2. Avian Flu Diary: infectious disease hobbyist, influenza and disaster preparedness
  3. Xkcd: comic on physics, math, science--pretty much geeky fun.
  4. What If: same guy as xkcd but answers questions using math, science, physics etc...in the most awesome geeky way possible.
  5. The tree of Life: Jonathan Eisen's blog about microbes, genomes, omics, UCDavis and the tree of life
  6. BacPathGenomics: genomics and evolution of bacterial pathogens
  7. coastalpathogens: blog about coastal pathogens among other things.
  8. This is NOT Junk: Michael Eisen's blog (equally great) about DNA, evolution, open science, genomes etc.
  9. Daniel Wilson's blog: evolutionary biologist and researcher in genetics
  10. Download the Universe: book reviews that include science books, focused on e-books
  11. DrugMonkey: US biomedical research industry blogger
  12. iMicroBham: Science teaching blog
  13. microBEnet: microbiology of the built environment
  14. Microbiology Bytes: latest news about microbiology
  15. Omics! Omics!: A computational biologist's personal views on new technologies & publications on genomics & proteomics and their impact on drug discovery.
  16. Outbreak News: news on outbreaks all over the world
  17. Pathogens: Genes and Genomes: A heady mix of bacterial pathogenomics, next-generation sequencing, type-III secretion, bioinformatics and evolution!
  18. PLoS Blogs Network
  19. Rob Dunn: Wildlife of our bodies writer/biologist
  20. Science Professor: life of a science professor
  21. Science-Based Medicine: Issues and controversies between science and medicine
  22. Science Hubb: A blog about interesting science
  23. SEQAnswers.com
  24. Seqonomics: economics of personalized medicine from a Sanger Inst. researcher
  25. Small Things Considered: Amer Soc of Micro blog on microbiology, virology and parasites
  26. The febrile muse: portrayal of infectious disease in literature and arts
  27. The medicine show: Forbes blog from Matthew Herper on Science, politics, education etc.
  28. The molecular ecologist: blog on molecular ecology (all organisms)
  29. Twisted Bacteria: science communicator who focuses on actinomycetes
  30. Virology blog: Vincent Racaniello's excellent virology blog and podcasts
  31. What's up doc?: Blog for postgraduate researchers with resources, links and advice
  32. zoonotica: blog that focuses on the viral/pathogen human/animal interface
  33. All creatures great and small: Preaching microbial supremacy: Science teaching blog from a professor at a primarily undergraduate teaching institution
That ought to get you started...

Workshop on Genomics 2013: Final musings, anecdotes, mutterings and OCD tendencies garnered...

Greetings bioinformatic campers!

Last blog from ever so idyllic Prague, Czech Republic.

It has been a long two weeks, jam packed with beer, wine, sequencing, programs like RAD and programs like CRAP--mmm, haha; fun-guys, snow ball fights, genomics, scary math, more oddly named software, tutorials galore, unix ninjas and sequencing gurus, enough 'considerations' to have us spinning for years to come...so let's rehash the glory a little bit shall we?

Of life, love, linux and latent twitching... Highlights from the 2013 Workshop on Genomics, Cesky Krumlov, Czech Republic, in no particular order. Most of the below is in good fun; a lot was learned intellectually (see all the previous blogs!), but on reflection there were some truly golden moments both scientifically and personally, as those who attended and those who ran the workshop were indeed awesome like that:

Blog Series: WoG, Cesky Krumlov: Day 12: Evolutionary genomics, Sake and a program called CRAP (you can't make this stuff up...)

Well I hope everyone has been enjoying reading about the various topics discussed at this workshop. Hopefully you have some new ideas, new programs and new considerations to fuel your OCD tendencies and make you twitch at night...

And with that...our last speaker for the workshop:

Antonis Rokas
Vanderbilt University
Nashville, TN

Topic Evolutionary Genomics

So, given we've been on high octane genomics using NGS for 11 days straight, Antonis was merciful to us for this last talk and, after a brief introduction to evolutionary genomics, gave us some 'vignettes from the field' on the topics of comparative functional genomics, population genomics and phylogenomics.

The progress of our understanding and study of genomics is no different than other fields historically. In geography old maps would have you falling off the end of the world (cause you know it's flat right?). Then as we became more informed about our world (and it helps that explorers didn't fall off the end) our maps evolved and today we have google maps, google view, google directions...yes, be comforted in knowing....google is always there...watching....you.

During our infantile understanding of chemistry--it was alchemy and you can imagine all the shenanigans that must've taken place as people desperately attempted to heal in a time of limited medicine, limited understanding of the human body and of disease in general--so they turned to herbs and 'potions' and did many things that ended up more harmful than helpful. Now there's a pill for everything...and then some!

When the first genomes were sequenced it was much the same thing...ok, we have a genome--now what? Some people thought that discovering the genome and putting it together would decode the 'language' of life...not so much. In our process of understanding a genome, we conduct assembly, gene/ORF finding, and assignment of motifs and regulatory areas if possible. Understanding genomes also requires theory: just as differences in anatomy suggest adaptation in animals and similarities suggest common origins, so it is with genomes. Similarity in sequence suggests common origins and differences in sequence suggest adaptation. Genomes provide a common 'yardstick'.

Friday, January 18, 2013

Blog Series: WoG, Cesky Krumlov: Day 11: Functional metagenomic modeling with some Drosophila sperm thrown in for good measure...

Joseph Bielawski
Dalhousie University
Halifax, Nova Scotia, Canada

Topic: Searching for functional divergence in genomes and metagenomes

We are going to start off with the metagenomic portion of the talk first then at the bottom we'll hash out the genomic portion of his talk...

So in conducting research in metagenomics, as we learned from Rob Beiko's talk it's about who is there and what are they doing. Today we focused on how to infer function from metagenomic data. How to tack on phenotype to a metagenome 'genotype' if you will.

So you have two approaches in metagenomics: targeted analysis (à la PCR amplification from the environment using universal primers to catch all the organisms with your gene of interest) and random analysis, which is a catch-all for everything you got within your sample. Now it's always great to have a priori knowledge and you are highly encouraged to collect as much metadata as possible about your sample...but inferring function from metagenomics is quite daunting, especially if you have little to go on, so it helps to have a model.

Now models are by no means going to fully explain exactly what's going on in the actual environment but they allow you to make inferences based on your data that you can explore in further detail and corroborate.

The model we will discuss actually doesn't have a name that I could find within his slides! So I will call it MetaG-MetaP-Modeling (MMM)...metagenomic metabolic pathway modeling. Bear in mind when the publication comes out it'll most likely have a cooler name.

Thursday, January 17, 2013

Blog Series: WoG, Cesky Krumlov: Day 10: How beavers and black queens teach us about Metagenomics...

Robert Beiko
Dalhousie University
Halifax, Nova Scotia, Canada

Topic: Metagenomics

So the term 'metagenomics' was coined by Jo Handelsman in 1998. Metagenomics describes the functional and sequence based analysis of the collective microbial genomes contained in an environmental sample.
  • This rather 'pure' definition excludes PCR based metagenomic studies as they only provide information about one gene.
The beaver gut is an example of a microbial community hard at work digesting the wood the beaver eats. Unfortunately, as I learned anew today...that microbial community is apparently also nom-i-licious and also gets digested at some point. Sucks to be them. But given turnover the cycle continues, the wood is digested and the balance of nature maintained. Still sucks to be a bacterium in the beaver gut...I gotta say.

Metagenomics asks two essential questions:
  1. Who is there?
  2. What are they doing?

TANGENT!: Comparison Charts for NGS Platforms 2013

If you are thinking about buying an NGS platform...

Travis Glenn has released the 2013 tables comparing NGS platforms seven ways from Sunday, so you can make an informed decision.

I found them on a blog I follow: www.molecularecologist.com

Table 1a-c: de novo, resequencing and other applications. NGS Platform grades.

Table 2a: Runtime, reads and yield

Table 2b: Costs/run, Costs/MB, minimum costs

Table 3a: Instrument Costs

Table 3b: Computation Resources

Table 3c: Error Rates

Table 4: Advantages and disadvantages of each instrument

Citation: Glenn, TC. 2011. Field Guide to Next Generation DNA Sequencers. Molecular Ecology Resources. doi: 10.1111/j.1755-0998.2011.03024.x

Blog Series: WoG, Cesky Krumlov; Day 9: A Tale of RAD-taggin Sea Drag-ins

Dr. Bill Cresko
University of Oregon
Sensei of the Unix Ninja

Topic: Genomic analysis of non-model organisms, RAD-tags and STACKS

Greetings!

So today we launch into how to analyze critters that do not contain a reference genome or have relatively few annotations associated with their genome, the proverbial black box we bang our heads against.

What is a non-model versus model organism anyway?

Model organisms have 'an entire community trying to dissect one species usually in an effort to understand humans better.' The mouse is a model organism, E. coli K12 is a model organism.

Non-model organisms are studied by relatively few, don't contain good references usually and have no correlation to research going on in humans; meaning they don't necessarily further our knowledge of processes in humans.

However, despite not having the breadth of knowledge that model organisms have--the literal arsenal of annotations--we still have the same questions about non-model organisms as we do with model organisms!
  1. How do major differences among lineages evolve?
  2. What is the relatedness between organisms?
  3. What is adaptation like in these organisms?
There are fundamental processes in evolution:
  1. Origin of genetic variation: via mutation and migration
  2. Sorting of that variation: variation can be affected by genetic drift and natural selection for instance. For those drawing a blank on 'genetic drift'...it is the random change in allele frequency over generations due to chance...for those of you drawing a blank on 'allele'...an allele is essentially a gene or better yet a specific 'version' of the gene. So one gene can have more than one 'flavor'. An allele is a particular flavor of that gene as determined by its sequence data. For those of you drawing a blank on gene...click over to the nyan cat YouTube video--be entertained and don't come back to this blog until you know what a gene is.
  3. Simultaneous genotyping of neutral and adaptive loci (for population genomics): neutral loci provide a genome wide background that gives you estimates of effective population size and can be used for phylogeography. Adaptive loci are outliers from the neutral background and can lend insight into selective sweeps or local adaptation.
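For anyone still fuzzy on genetic drift from the list above, here is a minimal sketch of it in Python. This is a toy Wright-Fisher style simulation that I am adding for illustration, not something from Dr. Cresko's slides. It assumes a diploid population of constant size with no selection, mutation or migration, and the function name and parameters are made up for this example.

```python
import random

def simulate_drift(pop_size, start_freq, generations, seed=0):
    """Toy Wright-Fisher model: track one allele's frequency over time.

    Assumptions (illustrative only): diploid individuals, constant
    population size, no selection, no mutation, no migration.
    """
    rng = random.Random(seed)      # fixed seed so runs are repeatable
    n_copies = 2 * pop_size        # diploid: two gene copies per individual
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random
        # from the current generation's allele pool.
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        history.append(freq)
    return history

if __name__ == "__main__":
    small = simulate_drift(pop_size=10, start_freq=0.5, generations=50)
    large = simulate_drift(pop_size=1000, start_freq=0.5, generations=50)
    print("small population, final frequency:", small[-1])
    print("large population, final frequency:", large[-1])
```

Run it a few times with different seeds: the small population tends to wander much further from 0.5 (and can lose or fix the allele entirely), which is the chance-driven change in allele frequency described above.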
So the naive solution when approaching a project with a non-model organism is to just 'sequence everything'. Why is that naive? Because sequencing is still quite expensive and for many studies is pretty much a waste.

Genomes are generally organized as linkage blocks, so essentially as long as you have well spaced markers, that will work just as well for genotyping at a fraction of the cost. Genetic maps are very useful in genome studies and oftentimes a great first step to guide you in whether full genome sequencing really needs to be done to answer the scientific question you have.

So the Cresko lab is heavily involved in an alternative approach that exploits these linkage blocks in the genome. It bypasses the need for a full genome and still provides tons of data about the genome in addition to building a genetic map and doing genotyping that can guide further studies/hypotheses.

The technique is called RAD-tags, the program/pipeline is called STACKS and the 'grand architect' is our resident Unix Ninja--Julian Catchen.

Tuesday, January 15, 2013

Blog Series: WoG, Cesky Krumlov: Day 8: Transcriptomics

Where did days 6 and 7 go??? 

Lost inevitably in the chaos of RStudio Lab tutorial, Python Lab Tutorial, Cesky Krumlov Castle touring and wine tasting followed by copious amounts of wandering, eating, more wandering, snow ball fights, freezing feet, beer drinking and sleeping.

The tutorials are well written; given there was no lecture for these, I've simply linked the tutorials...go nutz.

Remember I've discussed python in previous blogs in the programming prep blog for instance and linked Tyghe's website for further tutorials using python--the resources are at your fingertips, have at it! Feel free to direct python related questions to Tyghe's blog or if you happen to know Daniel McDonald from the Univ. of Colorado feel free to harass him as well...I will not link his email in case he hunts me down for giving you all his information and challenges my husband to a python duel--coding at dawn!

Now...on to week 2!!!

Friday, January 11, 2013

Blog Series: WoG, Cesky Krumlov: "One *ome to rule them all...?" and other anecdotes from this evening

Apologies to the Lord of the Rings die hards who are probably outside my door with torches and pitch forks...but truly this evening's discussion really brought some things to light and well...tossed other things into the dark...

Ah Science...

Blog Series: WoG, Cesky Krumlov; Day 5: Short Read Alignment

I found myself attempting to remember what day it was today...apparently lots of attendees including myself are losing track of the days. One thing we never lose track of though is the bar...and heading to it for a few beers/drinks after the 7-10pm session...cheers.

Dr. Konrad Paszkiewicz
University of Exeter

Topic: Short Read Alignment

In general short read alignments are difficult because the shorter your read the less likely it is to match uniquely to a given reference or sequence of interest. Instead it'll match to multiple places and you won't be sure exactly where it goes.
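To see why shorter reads are harder to place, here is a minimal sketch that counts how many places a read matches a reference exactly. The reference string and the reads are made up for illustration, and exact string matching is a stand-in I am using here: real short read aligners work from indexes and tolerate mismatches, which this toy does not attempt.

```python
def count_occurrences(reference, read):
    """Count exact (possibly overlapping) matches of read in reference."""
    count = 0
    start = reference.find(read)
    while start != -1:
        count += 1
        # Step forward one base so overlapping matches are counted too.
        start = reference.find(read, start + 1)
    return count

if __name__ == "__main__":
    # Hypothetical toy reference, not real sequence data.
    reference = "ACGTAGGCTTACGTCCA"
    for read in ("ACGT", "ACGTAGG"):
        hits = count_occurrences(reference, read)
        print(f"{read} (length {len(read)}) matches {hits} place(s)")
```

In this toy reference the 4-base read lands in two places while the 7-base read lands in only one, which is the ambiguity problem in miniature.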

Ok, so if it's difficult to align short reads, why do people generate them? Well for one it's cheaper. Additionally, for many applications a short read of about 50 bp is enough to work with; for example resequencing of small organisms, de novo analysis of bacterial genomes which are usually quite small compared to a human genome, ChIP-seq or digital gene expression.

Blog Series: WoG, Cesky Krumlov: A note on Emacs

An Emacs aside...

So if you've been following the slides Julian had a slide in his Unix section about Emacs versus Vim and how to use Emacs. Now, we didn't get to it yet and I don't know if we will, but I saw this and thought it would be amusing to those who use Emacs or Vim or other editors...

Of course my husband is a die hard Vim-er. When I told him we'd be learning Emacs, his response over gmail chat was "Vim or death". However, after some chat discussion he acquiesced that I can learn Emacs if I wish, however I am never to speak of it...

credit: www.xkcd.com/378
For those interested in striking out on their own:

Emacs: http://www.gnu.org/software/emacs/
Emacs Tutorial: see Julian's slides and http://www2.lib.uchicago.edu/keith/tcl-course/emacs-tutorial.html

and so I don't go home to divorce papers sitting on the kitchen table...

Vim: http://www.vim.org/download.php
Vim Tutorial: http://blog.interlinked.org/tutorials/vim_tutorial.html
Vim Tips: http://vim.wikia.com/wiki/Tutorial

Cheers.

Thursday, January 10, 2013

Blog Series: WoG: Cesky Krumlov; Day 4: Assembly

Dr. Rayan Chikhi
Pennsylvania State University

Topic: De novo Assembly

A whole day of assembly!

There is no single program right now that is considered 'the assembler'. Different assemblers have advantages and disadvantages, and each is better suited to some jobs than others. One thing today's assemblers have in common is that they all take a lot of time and memory to run--especially when doing de novo assembly. One of the exceptions is the program Minia, developed by Dr. Chikhi, which was designed to run efficiently with low memory requirements.

One of the important things that you need to know for assembly is what a k-mer is. A k-mer is any sequence of length k.

AGC is a k-mer with k=3
AGCT is a k-mer with k=4
AGCTT is a k-mer with k=5

You hopefully get the idea. 
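If you want to play with k-mers yourself, here is a minimal sketch that lists every k-mer in a sequence by sliding a window of length k along it. The function name is my own for this example; assemblers do this at a much larger scale with far more efficient data structures.

```python
def kmers(sequence, k):
    """Return every substring of length k, in order, sliding one base at a time."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

if __name__ == "__main__":
    print(kmers("AGCTT", 3))  # ['AGC', 'GCT', 'CTT']
    print(kmers("AGCTT", 4))  # ['AGCT', 'GCTT']
    print(kmers("AGCTT", 5))  # ['AGCTT']
```

A sequence of length L contains L - k + 1 k-mers (counting repeats), so the longer the k, the fewer k-mers you get from each read.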

There are two essential methods that assemblers use to assemble: de Bruijn graphs and overlap/string graphs. Now we sort of covered this in the Assembly prep blog...let's see if I can explain this better here now...

Blog Series: WoG, Cesky Krumlov; Day 3: Unix, Part 2--Ninja-ery

Julian Catchen
University of Oregon
Unix Ninja

Topic: Unix Part 2

So unfortunately as with the quality control blog, I am or will be unable to give you files that we practiced on but Julian's slides are quite good. We learned about pipes and added on to our current knowledge of command line. We finished up the end of the first slide set... which included all the following commands:

  • man [command you are confused about]: manual for commands that gives you all the options. Some man pages are more helpful than others but if you stare at it long enough it'll start to make sense. Often times there are examples of usage so pay attention to those.
  • ls, gunzip, more, cat, head, tail, grep, wc...all that we learned yesterday--so don't forget it and today we ended up having to man some of those commands to learn the options.
  • sort [filename], just as it sounds--depending on what kind of sort you want you have to specify an option (see man page for sort; ie. numeric, alphabetical)
  • uniq [filename], as with sort, lots of options to tease things out of your file.
  • cut [filename], we learned this before too--BUT I learned anew today that it is only for files organized in columns.
  • tr "\t" "," < [filename]: translate command that changes all tabs to commas in the given file (tr only reads standard input, so the file has to be redirected or piped in).
  • tab = Ctrl+V+tab or \t
  • |, this is a 'pipe' read Julian's slides part 1
Example: What do you think I just did to this file?

cat batch_1.genotypes_1.loc | tr "\t" "," | grep "^96053"

  1. I grabbed the file = cat
  2. I piped it to the command 'tr' specifying I wanted all tabs changed to commas
  3. Then piped it to another command 'grep' (remember what grep does?)
  4. With grep I specified I wanted to look for all entries where the beginning of the line (hat symbol) started with 96053
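Since we also had a Python lab this workshop, here is a minimal sketch of the same three steps in Python, for comparison with the pipe above. I don't have the workshop file to share, so the sample lines below are made up, and the function name is my own.

```python
def tabs_to_commas_and_filter(lines, prefix):
    """Mimic: tr tab-to-comma, then grep for lines starting with prefix."""
    for line in lines:
        converted = line.replace("\t", ",")   # the 'tr' step
        if converted.startswith(prefix):      # the 'grep "^prefix"' step
            yield converted

if __name__ == "__main__":
    # Hypothetical stand-in for the contents of a tab-separated file
    # (this list plays the role of the 'cat' step).
    sample = [
        "96053\taa\tbb",
        "12345\tab\tbb",
        "96053\tab\taa",
        "x96053\taa\taa",
    ]
    for row in tabs_to_commas_and_filter(sample, "96053"):
        print(row)
```

Only the first and third sample lines come out the other end; the last one contains 96053 but doesn't start with it, which is exactly what the hat symbol is for.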

We didn't get through all these slides but they are great so have a gander...

What we did get through:

Wednesday, January 9, 2013

Blog Series: WoG, Cesky Krumlov; Day 3: Genomics Study Design, a.k.a. "To seq or not to seq, that is the question!"

So I totally slacked off today and went to lunch instead of writing the usual afternoon blog of the morning session, I hope you'll all forgive me, but to be fair those of you in the U.S. weren't even out of bed by the time lunch for me rolled around!

All of the presentations so far have been really awesome and informational so I hope you will take advantage of all the slides being posted on the website!

Today's morning session is great for PIs and students wishing to design sequencing experiments and deciding whether to get an NGS platform.

I will be interjecting during this blog post...my interjections will be in a different color (probably green, because I like the color green).

Tuesday, January 8, 2013

Blog Series: WoG, Cesky Krumlov; Day 2: Data Quality Control--no really it's more fun than it sounds...

In truth...to me, sequence data quality control is necessary and ok, and it was fun when I was learning it, but the further in you get the more you want to automate the hell out of it! WELL, lucky for you, you probably aren't at that stage yet so we are going to start fresh thanks to Naiara's talk, slides and exercises à la this evening's lab!

It is time, my fellow command-line/terminal apprentice ninjas...let's DO SCIENCE

Now I won't be going through absolutely everything in this blog entry but I will cover most and offer tips to help the exercises go smoothly for you.

Blog Series: WoG, Cesky Krumlov; Day 2: Unix

Dr. Julian Catchen
University of Oregon
Full-time Unix Ninja

Topic: Unix

So in my prep blog on programming I went through some of the basics of command-line and ninja-ery. But really you cannot do Unix command-line justice unless you jump in and do it yourself with the UnixTutorial provided. Additionally, Julian's slides are up on that same page and they are great at illustrating comparisons between what we are used to (graphical user interface) and how that translates or what that looks like on the command-line (or in the terminal). The slides are in pdf format, so download them, learn about the history of Unix and take the tutorial.

Highlights?

  • Unix was originally developed by AT&T!
  • Steve Jobs was fired from Apple, developed NeXTSTEP, Apple went into the tubes, re-hired Steve Jobs--Steve Jobs promptly threw their operating system out and applied NeXTSTEP, which became OS X (it's Unix based).
  • Google Android runs Linux
  • Airplanes with personal movie systems run Linux
  • Wireless internet routers run Linux
  • By the end of these two weeks he plans to make us Unix/Linux converts
  • We googled 'unix commands' to obtain a cheat sheet of commands to help us if we forget, HEY there is also a cheat sheet I linked for you on the programming prep blog--go figure! :)
  • By the end of this session you should know about the following, if not, go back to his slides and the Unix Tutorial
  1. change directories
  2. list files, list all files, list files that humans can read
  3. move up and down commands on the command line
  4. create directories
  5. know what relative vs. absolute paths are
  6. know how to figure out where you are
  7. know three ways to 'get home' from anywhere on the computer system
  8. know what tab and tab tab do and why they are totally cool
  9. more, head, tail and cat
  10. know how to unzip and de-tar files or do both at once
  11. what is grep?
  12. how to obtain line counts in a file
There is a slide called "Explore the file hierarchy" YOU CAN DO THIS WITHOUT WORKSHOP FILES! Explore your own computer file system! Huzzah!

My favorite quote of the evening:

"When you type 'cd ..' it's a like a worm hole that comes and sucks you up a directory, it's really cool" ~Dr. Julian Catchen, Workshop on Genomics, Cesky Krumlov 2013


Blog Series: WoG, Cesky Krumlov; Day 2: So you want a NGS Sequencer eh?

Dr. Konrad Paszkiewicz
University of Exeter
Director of Wellcome Trust Biomedical Bioinformatics Hub

Topic: DNA Sequencing Technology: Past, present, future

Good morning bioinformatic campers! Well, afternoon for me, but morning for many of you back in the U.S.

So much of what was covered this morning is going to be redundant with the DNA sequencing preparation blog I wrote previously. So between this entry and that other one you will hopefully get a complete view of the 'state of the union' where sequencing is concerned and prospects for the future. The nice thing is that Konrad tossed in a lot of pro/con lists for different platforms, so those of you considering NGS in the future, this is a bare bones, get you started guide as to what's out there and whether 'it's worth it' for your own research to invest in a platform and which one to invest in. Again, I still highly recommend Dr. Elaine Mardis' talk that I linked in my DNA sequencing technology prep blog.

First and foremost, if you are familiar with what molecular biology is and what sequencing is and don't know who Fred Sanger is...then you've probably been living in a hole...

Monday, January 7, 2013

Blog Series: WoG, Cesky Krumlov; Day 1: Amazon Cloud

The Amazon Cloud
Dr. Konrad Paszkiewicz
Director, Wellcome Trust Biomedical Bioinformatics Hub

Topic: Amazon Cloud

Pros

  • No need to house/maintain servers
  • No need to worry about backing up
  • Only pay for what you use
  • Upgrades are handled by Amazon
  • You can expand and delete storage as you have need
  • There are many many preconfigured virtual machines (VM) to pick from if you don't want to develop one on your own (QIIME, STACKS and Short read alignment all have their own VMs)
Cons
  • You will pay for it: storage even when you are not using it, time using it etc. Many researchers are resistant to the idea that they have to 'pay' for computing power and storage, and they are surprised by how much computational power and storage can cost. Researchers need to start being trained to think about computing costs in their grant proposals--cost of programming and software, costs of hardware, costs of the people that will need to be hired to program and do analysis.
  • Cost of an Amazon VM can run $0.20-$3.00/hr; you also pay by the Gigabyte.
  • Data transfer to your VM could be slow
  • If the network is down you won't be able to access your VM
  • Typical per month cost is $200-$400 depending on how much data you store and how extensively you use the machine. So you have to decide: is it worth it? What are the costs and benefits of buying and maintaining your own server or computing system that's powerful enough to run analysis and store all your data versus using Amazon web services?
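If you want to ballpark your own bill before deciding, the arithmetic behind the numbers above can be sketched in a few lines of Python. The hourly rates are the ones quoted in the list; the per-Gigabyte storage price and the usage figures are placeholders I made up, so plug in Amazon's current pricing and your own usage.

```python
def monthly_cost(hourly_rate, hours_used, storage_gb, storage_rate_per_gb):
    """Rough monthly estimate: compute time plus storage, in dollars."""
    compute = hourly_rate * hours_used
    storage = storage_gb * storage_rate_per_gb
    return compute + storage

if __name__ == "__main__":
    # Placeholder assumption: $0.10 per GB per month (not a quoted price).
    storage_rate = 0.10
    # Light use on a cheap VM versus heavy use on an expensive one.
    light = monthly_cost(0.20, hours_used=100, storage_gb=500,
                         storage_rate_per_gb=storage_rate)
    heavy = monthly_cost(3.00, hours_used=100, storage_gb=500,
                         storage_rate_per_gb=storage_rate)
    print(f"light use: ${light:.2f} per month")
    print(f"heavy use: ${heavy:.2f} per month")
```

Note that the storage term is charged whether or not the VM is running, which is the 'you pay even when you are not using it' point from the cons list.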
Read on if you are interested in the tutorial/exercise we went through today despite it most likely not being applicable to your situation unless you have access and/or are interested in implementing amazon cloud services.