Sunday, January 20, 2013

Blog Series: WoG, Cesky Krumlov: Day 12: Evolutionary genomics, Sake and a program called CRAP (you can't make this stuff up...)

Well I hope everyone has been enjoying reading about the various topics discussed at this workshop. Hopefully you have some new ideas, new programs and new considerations to fuel your OCD tendencies and make you twitch at night...

And with that...our last speaker for the workshop:

Antonis Rokas
Vanderbilt University
Nashville, TN

Topic: Evolutionary Genomics

So, given we've been on high-octane genomics using NGS for 11 days straight, Antonis was merciful to us for this last talk. After a brief introduction to evolutionary genomics, he gave us some 'vignettes from the field' on the topics of comparative functional genomics, population genomics and phylogenomics.

The progress of our understanding and study of genomics is no different than other fields historically. In geography, old maps would have you falling off the end of the world ('cause you know it's flat, right?). Then as we became more informed about our world (and it helps that explorers didn't fall off the end) our maps evolved and today we have Google maps, Google view, Google directions...yes, be comforted in knowing....Google is always there...watching....you.

During our infantile understanding of chemistry, it was alchemy, and you can imagine all the shenanigans that must've taken place as people desperately attempted to heal in a time of limited medicine and limited understanding of the human body and of disease in general. So they turned to herbs and 'potions' and did many things that ended up more harmful than helpful. Now there's a pill for everything...and then some!

When the first genomes were sequenced it was much the same thing...ok, we have a genome--now what? Some people thought that discovering the genome and putting it together would decode the 'language' of life...not so much. In our process of understanding a genome, we conduct assembly, gene/ORF finding, and assignment of motifs and regulatory regions where possible. Understanding genomes also requires theory. Just as similarity in anatomy suggests common origins and differences suggest adaptation in animals, so it is with genomes: similarity in sequence suggests common origins and differences in sequence suggest adaptation. Genomes provide a common 'yardstick'.



Vignette 1: Of Whisky and Fungus, Comparative Genomics

So, Antonis' lab studies fungi, the most sequenced eukaryotes out there. They study the DNA record to gain insight into evolution's patterns and processes using computational and experimental approaches in a variety of lineages. Genomics and reverse genetics have taught them that the language of genomics is to be found in biochemistry. You will never see the 'nose' gene but you will see the gene for serpentine receptors (necessary for the building of the nose). Fungi have two types of metabolism, one for eating and one for defense, and they grow just about everywhere on everything...just about. Apparently while they will grow on normal dinner fries, they will not grow on McDonald's fries--ya that's right, think about that next time you indulge in your McDonald's addiction. Move over Twinkies, McDonald's has you covered after the apocalypse...or at least McDonald's french fries do.

We also learned fungi like whisky:

“I put maybe a shot of whiskey in a liter of agar and filled the petri plates with it,” Scott says. “That made it grow a hell of a lot faster.” ~Dr. James Scott, Mycologist (quote from Wired Magazine story)

So in an effort to understand and compare different species of fungi from full genome sequencing, a couple of programs were designed for pathway reconstruction or identification and clustering. The KEGG pipeline focuses on discerning biochemistry from sequence data, and the program entitled CRAP (cluster reconstruction and phylogeny) was used to address well...c.r.a.p--cluster reconstruction and phylogeny (I told you, you couldn't make this stuff up...ah science). So it took me a while to hunt down CRAP on the web but I finally found it...and I threw my arms in the air and said "HUZZAH! I found CRAP"...and then I stopped for a moment, oohhh Antonis and James, what have you done to me!

Another way to understand genomes is to conduct comparative synteny analyses...synten-huh? By looking at how similar the gene order is across clusters of genes, you can work out which sections of the genome have been flipped (inverted), moved (translocated) or perhaps gained (via horizontal gene transfer). If two genomes have identical gene order then they are 100% syntenic; if not, they'll have clusters of genes that have been deleted, moved or inverted when compared to each other.

Examples of differences in synteny among fungal species, à la Antonis' presentation. I haven't pointed all of them out, just some to give you an idea.
So understanding genomes and comparing them is more than just sequencing and annotation; evaluating genome architecture (i.e. synteny) is also very useful when looking at how genomes differ and perhaps why.
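For those who like to see the idea in code, here is a minimal sketch of what a synteny comparison boils down to. This is NOT what CRAP or any real synteny tool does; it's a toy under some big assumptions: each genome is just an ordered list of gene names along one chromosome, every gene appears at most once, and orthologs share the same name. The function name `synteny_report` and the gene names are made up for illustration.

```python
def synteny_report(ref_order, query_order):
    """Split the query gene order into blocks that are collinear with,
    or inverted relative to, the reference gene order.

    Toy assumptions: one chromosome, no duplicated genes, and orthologous
    genes carry identical names in both lists.
    """
    query_set = set(query_order)
    # Genes present in the reference but absent from the query.
    missing = [g for g in ref_order if g not in query_set]

    # Rank shared genes by reference order, so that a deleted gene
    # does not break up an otherwise conserved block.
    ref_shared = [g for g in ref_order if g in query_set]
    rank = {g: i for i, g in enumerate(ref_shared)}
    shared = [g for g in query_order if g in rank]

    blocks = []
    i = 0
    while i < len(shared):
        j = i
        step = 0  # +1 means collinear, -1 means inverted, 0 means undecided
        while j + 1 < len(shared):
            d = rank[shared[j + 1]] - rank[shared[j]]
            if d not in (1, -1) or (step != 0 and d != step):
                break
            step = d
            j += 1
        label = {1: "collinear", -1: "inverted", 0: "singleton"}[step]
        blocks.append((shared[i:j + 1], label))
        i = j + 1
    return blocks, missing


# Hypothetical example: g6 is deleted and g3-g5 are flipped in the query.
ref = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
query = ["g1", "g2", "g5", "g4", "g3", "g7"]
blocks, missing = synteny_report(ref, query)
for genes, label in blocks:
    print(label, genes)
print("missing from query:", missing)
```

Ranking only the shared genes is a deliberate choice here: it lets a deletion show up in the `missing` list without also chopping the surrounding block in two. Real genomes need orthology calls, strand information and multiple chromosomes, none of which this handles.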

Seriously...I wasn't kidding about the fries


Or the CRAP...


Vignette 2: Domesticating fungus like we domesticate the dog...'HEEL FungIDO!'

Let's switch fungal gears and head over to Aspergillus. Study of Aspergillus oryzae has led to interesting insights into domestication of my favorite sake fungus. Evolutionarily, A. oryzae is closely related to his/her ugly cousin A. flavus--a nasty bugger that is an agricultural pest, an aflatoxin producer, and costs a lot in damage every year. What's an aflatoxin? Well, I know it's a carcinogen and since it doesn't sound like a spa treatment, I think I'll stay away.

A. oryzae on the other hand assists in the production of sake, soy sauce (shoyu) and miso. It's a non-aflatoxin producer and receives a big thumbs up from the USDA. It has the pathway for aflatoxin but it's inactivated. Essentially you can think of A. oryzae as the domesticated version of A. flavus. In the study they analyzed 16 genomes, 8 oryzae and 8 flavus, to discern how this process of domestication occurred. They used Illumina sequencing and obtained 12-30 million 80 bp reads that amounted to about 20x coverage across the genomes. They then untangled ~100,000 SNPs and asked how many genetic populations they had, and also did some gene cluster mapping in regions. This led to some RNA-seq work that elucidated how A. oryzae became atoxic. The reason was that in the sake making process yeast was put into the mix to break down sugar to alcohol, and so A. oryzae had to become atoxic so as to not kill the yeast (S. cerevisiae). In summary Antonis gives us the 'road to domestication'.
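A quick aside on the coverage number, since "reads times read length divided by genome size" is the back-of-envelope formula everyone uses. The sketch below is just that arithmetic. The genome size of roughly 37 Mb is my assumption (it was not given in the talk), and `usable_fraction` is a hypothetical knob for reads lost to filtering or failing to map. With these inputs the raw depth comes out higher than the ~20x quoted above, which I'd guess reflects exactly that kind of loss, but that is a guess.

```python
def raw_coverage(n_reads, read_length_bp, genome_size_bp, usable_fraction=1.0):
    """Average sequencing depth: (reads * read length * usable fraction) / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    if not 0.0 <= usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be between 0 and 1")
    return n_reads * read_length_bp * usable_fraction / genome_size_bp


# Assumption: an Aspergillus genome of roughly 37 Mb (not a figure from the talk).
ASSUMED_GENOME_BP = 37_000_000

for n_reads in (12_000_000, 30_000_000):
    depth = raw_coverage(n_reads, 80, ASSUMED_GENOME_BP)
    print(f"{n_reads:,} reads x 80 bp -> about {depth:.0f}x raw depth")
```

Treat the raw figure as an upper bound; the depth you actually get to call SNPs with depends on how many reads survive quality trimming and mapping.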


Pretty cool and if you want to know the nitty gritty details of all this:

Gibbons et al., 2012. Global transcriptome changes underlying colony growth in the opportunistic human pathogen Aspergillus fumigatus. Eukaryotic Cell 11: 68.

Gibbons et al., 2009. Benchmarking next generation transcriptome sequencing for functional and evolutionary genomics. Mol Biol Evol 26: 2731.

Rokas et al., 2007. What can comparative genomics tell us about species concepts in the genus Aspergillus? Stud Mycol 59: 11.

Vignette 3: 'Tree of life' delusions?

In our last story of the morning, we walked over into phylogenomics. Richard Dawkins was once quoted saying that he believed there was a tree of life and that we'd discover all its branches by 2050. When I think of this in terms of bacteria my mind reels; we haven't been able to scratch the surface of bacterial diversity or all the branch relationships within it. And as Antonis goes on to discuss, it's pretty difficult for fungi...yeast...just about anything in fact. Where do you draw the line or define 'species'? At what point is your marker, whether it be one gene or a whole genome, accurate enough at reflecting the 'true' evolutionary history of your organism? What resolution genetically do you have to go to?

Give us some spells and voodoo! Sadly there are none.

Going back to a similar mantra repeated many times during the workshop, 'it's going to depend'. The problem lies in gene(s) selection. Fact is, many genes when you compare their phylogenetic trees are highly incongruent...for those of you who are tired, that means 'they don't match'. Antonis showed that incongruency is pervasive (35-48%) in mammals, insects...and worse in other organisms. In using yeast it was very difficult to resolve relationships among species with deeper branches. There is a lot of disagreement. Concatenation of genes helps but again, it'll depend how many you concatenate and the selection of genes you choose to concatenate (adaptive? conserved? a mix of both?). He got a comment from a reviewer that seemed particularly appropriate in describing the conundrum we face in finding 'the tree of life'.

"Plainly stated, taxonomists keep digging the same hole and falling down it; all that has changed over the years is the sophistication of the shovel..." ~Anon Reviewer

So what do we do with all this? Is phylogenetics worthless then? Absolutely not...but as with all these blog entries...let's throw in some more considerations for your OCD tendencies:
  • A high bootstrap value does not mean a 'seal of approval' for relationships in a tree. While it's not a useless measure by any means, the more taxa you have in your tree the higher the bootstrap can go just inherently--it's a part of the math involved in bootstrapping and the Rokas lab found this in their work.
  • Be aware of deep branches. I'm not saying they aren't 'true' but generally shorter branches with high support are easier to tease apart than longer branches. Try to get lots of taxa in (computational time permitting) to 'fill in' where you are getting longer branches to resolve the clades better.
  • I (Mel) generally run trees with lots of closely related species/taxa as much as possible, then zoom into my clades of interest. This helps in that I can have lots of taxa that help with resolution but I don't have to print a tree with hundreds of taxa I ultimately don't care about; I can zoom into the topology and focus on the relationships of interest.
  • Gene trees can and will disagree many times because of duplication events, lineage sorting, HGT/recombination, gene loss etc. Try concatenation. I also have a habit of avoiding genes near mobile elements like transposons, because being near one increases the likelihood that a gene has been transferred.
  • Consider using a concordance factor, which is the proportion of the genome for which a clade is true. Using a program like BUCKy will help you. (This is a Bayesian program from Cécile Ané, Bret Larget and colleagues.) When in doubt, try to use multiple tools to confirm your tree nodes, clades and relationships.
  • Use other data, like RNA-seq, or increase your genomic sequencing depth to clarify/confirm your sequences and improve your confidence in the quality of the sequences underlying your tree.
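
To make the concordance factor idea a bit more concrete, here is a naive sketch: count the fraction of gene trees in which a given clade shows up. To be clear, this is a plain tally and NOT what BUCKy does (BUCKy is Bayesian and accounts for uncertainty in each gene tree). The assumptions are mine: each gene tree has already been reduced to the set of clades it contains, trees are rooted, and the taxa A-D and the three trees are invented for illustration.

```python
def concordance_factor(clade, gene_tree_clades):
    """Fraction of gene trees that contain `clade`.

    `clade` is any iterable of taxon names; `gene_tree_clades` is a list
    with one entry per gene tree, each entry being a set of frozensets
    (the clades found in that tree).
    """
    if not gene_tree_clades:
        raise ValueError("need at least one gene tree")
    target = frozenset(clade)
    hits = sum(1 for clades in gene_tree_clades if target in clades)
    return hits / len(gene_tree_clades)


# Three hypothetical rooted gene trees over taxa A, B, C, D:
#   gene 1: ((A,B),(C,D))
#   gene 2: (((A,B),C),D)
#   gene 3: (((A,C),B),D)
gene_trees = [
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AC"), frozenset("ABC")},
]

for clade in ("AB", "AC", "CD", "ABC"):
    cf = concordance_factor(clade, gene_trees)
    print(f"clade {clade}: found in {cf:.2f} of gene trees")
```

A clade found in most gene trees is one the genome broadly agrees on; a clade with strong bootstrap support in a concatenated tree but a low tally here is worth a second look.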

And there you have it, munch on that for a bit while I construct my final blog...

Next Up: Final musings and random highlights from this year's workshop...à la Mel