Sunday, January 20, 2013

Blogs I follow, for those who are now addicted to Science-y blogs...

So it was requested of me to post what blogs I personally follow so I will list them below.

As I said in the first blog for the series in Cesky Krumlov...I most likely will let this blog go dormant until I have something useful to say, but for those of you who would love to get daily musings from other scientists in the field who post on a semi-regular/daily basis, let me whet your appetite for what's out there, and you can search on your own too.

Normally I subscribe via RSS/Google Reader. If you have Google Reader and there isn't an RSS subscribe button on the blog, simply copy/paste the URL into Google Reader and it'll subscribe to what's posted at the URL.

Happy Reading!


  1. Aetiology: discussing causes, origins, evolution and implications in human disease (Tara Smith)
  2. Avian Flu Diary: infectious disease hobbyist, influenza and disaster preparedness
  3. Xkcd: comic on physics, math, science--pretty much geeky fun.
  4. What If: same guy as xkcd but answers questions using math, science, physics etc...in the most awesome geeky way possible.
  5. The tree of Life: Jonathan Eisen's blog about microbes, genomes, omics, UCDavis and the tree of life
  6. BacPathGenomics: genomics and evolution of bacterial pathogens
  7. coastalpathogens: blog about coastal pathogens among other things.
  8. This is NOT Junk: Michael Eisen's blog (equally great) about DNA, evolution, open science, genomes etc.
  9. Daniel Wilson's blog: evolutionary biologist and researcher in genetics
  10. Download the Universe: book reviews that include science books, focused on e-books
  11. DrugMonkey: US biomedical research industry blogger
  12. iMicroBham: Science teaching blog
  13. microBEnet: microbiology of the built environment
  14. Microbiology Bytes: latest news about microbiology
  15. Omics! Omics!: A computational biologist's personal views on new technologies & publications on genomics & proteomics and their impact on drug discovery.
  16. Outbreak News: news on outbreaks all over the world
  17. Pathogens: Genes and Genomes: A heady mix of bacterial pathogenomics, next-generation sequencing, type-III secretion, bioinformatics and evolution!
  18. PLoS Blogs Network
  19. Rob Dunn: Wildlife of our bodies writer/biologist
  20. Science Professor: life of a science professor
  21. Science-Based Medicine: Issues and controversies between science and medicine
  22. Science Hubb: A blog about interesting science
  23. SEQAnswers.com
  24. Seqonomics: economics of personalized medicine from a Sanger Inst. researcher
  25. Small Things Considered: Amer Soc of Micro blog on microbiology, virology and parasites
  26. The febrile muse: portrayal of infectious disease in literature and arts
  27. The medicine show: Forbes blog from Matthew Herper on Science, politics, education etc.
  28. The molecular ecologist: blog on molecular ecology (all organisms)
  29. Twisted Bacteria: science communicator who focuses on actinomycetes
  30. Virology blog: Vincent Racaniello's excellent virology blog and podcasts
  31. What's up doc?: Blog for postgraduate researchers with resources, links and advice
  32. zoonotica: blog that focuses on the viral/pathogen human/animal interface
  33. All creatures great and small: Preaching microbial supremacy: Science teaching blog from a professor at a primarily undergraduate teaching institution
That ought to get you started...

Workshop on Genomics 2013: Final musings, anecdotes, mutterings and OCD tendencies garnered...

Greetings bioinformatic campers!

Last blog from ever so idyllic Prague, Czech Republic.

It has been a long two weeks, jam packed with beer, wine, sequencing, programs like RAD and programs like CRAP--mmm, haha; fun-guys, snow ball fights, genomics, scary math, more oddly named software, tutorials galore, unix ninjas and sequencing gurus, enough 'considerations' to have us spinning for years to come...so let's rehash the glory a little bit, shall we?

Of life, love, linux and latent twitching... Highlights from the 2013 Workshop on Genomics, Cesky Krumlov, Czech Republic, in no particular order. Most of the below is in good fun; a lot was learned intellectually (see all the previous blogs!). But on reflection there were some truly golden moments, both scientifically and personally, as those who attended and those who ran the workshop were indeed awesome like that:

Blog Series: WoG, Cesky Krumlov: Day 12: Evolutionary genomics, Sake and a program called CRAP (you can't make this stuff up...)

Well I hope everyone has been enjoying reading about the various topics discussed at this workshop. Hopefully you have some new ideas, new programs and new considerations to fuel your OCD tendencies and make you twitch at night...

And with that...our last speaker for the workshop:

Antonis Rokas
Vanderbilt University
Nashville, TN

Topic: Evolutionary Genomics

So, given we've been on high octane genomics using NGS for 11 days straight, Antonis was merciful to us for this last talk and, after a brief introduction to evolutionary genomics, gave us some 'vignettes from the field' on the topics of comparative functional genomics, population genomics and phylogenomics.

The progress of our understanding and study of genomics is no different than other fields historically. In geography old maps would have you falling off the end of the world (cause you know it's flat right?). Then as we became more informed about our world (and it helps that explorers didn't fall off the end) our maps evolved and today we have google maps, google view, google directions...yes, be comforted in knowing....google is always there...watching....you.

During our infantile understanding of chemistry--it was alchemy and you can imagine all the shenanigans that must've taken place as people desperately attempted to heal in a time of limited medicine, limited understanding of the human body and of disease in general--so they turned to herbs and 'potions' and did many things that ended up more harmful than helpful. Now there's a pill for everything...and then some!

When the first genomes were sequenced it was much the same thing...ok, we have a genome--now what? Some people thought that discovering the genome and putting it together would decode the 'language' of life...not so much. In our process of understanding a genome, we conduct assembly, gene/ORF finding, and assignment of motifs and regulatory areas if possible. Understanding genomes also requires theory. Just as differences in anatomy suggest adaptation in animals and similarities suggest common origins, so it is with genomes: similarity in sequence suggests common origins and differences in sequence suggest adaptation. Genomes provide a common 'yardstick'.

Friday, January 18, 2013

Blog Series: WoG, Cesky Krumlov: Day 11: Functional metagenomic modeling with some Drosophila sperm thrown in for good measure...

Joseph Bielawski
Dalhousie University
Halifax, Nova Scotia, Canada

Topic: Searching for functional divergence in genomes and metagenomes

We are going to start off with the metagenomic portion of the talk first then at the bottom we'll hash out the genomic portion of his talk...

So in conducting research in metagenomics, as we learned from Rob Beiko's talk it's about who is there and what are they doing. Today we focused on how to infer function from metagenomic data. How to tack on phenotype to a metagenome 'genotype' if you will.

So you have two approaches in metagenomics: targeted analysis (à la PCR amplification from the environment using universal primers to catch all the organisms with your gene of interest) and random analysis, which is a catch-all for everything you've got within your sample. Now it's always great to have a priori knowledge and you are highly encouraged to collect as much metadata as possible about your sample...but inferring function from metagenomics is quite daunting, especially if you have little to go on, so it helps to have a model.

Now models are by no means going to fully explain exactly what's going on in the actual environment but they allow you to make inferences based on your data that you can explore in further detail and corroborate.

The model we will discuss actually doesn't have a name that I could find within his slides! So I will call it MetaG-MetaP-Modeling (MMM)...metagenomic metabolic pathway modeling. Bear in mind when the publication comes out it'll have most likely a cooler name.

Thursday, January 17, 2013

Blog Series: WoG, Cesky Krumlov: Day 10: How beavers and black queens teach us about Metagenomics...

Robert Beiko
Dalhousie University
Halifax, Nova Scotia, Canada

Topic: Metagenomics

So the term 'metagenomics' was coined by Jo Handelsman in 1998. Metagenomics describes the functional and sequence based analysis of the collective microbial genomes contained in an environmental sample.
  • This rather 'pure' definition excludes PCR based metagenomic studies as they only provide information about one gene.
The beaver gut is an example of a microbial community hard at work digesting the wood the beaver eats. Unfortunately, as I learned anew today...that microbial community is apparently also nom-i-licious and also gets digested at some point. Sucks to be them. But given turnover the cycle continues, the wood is digested and the balance of nature maintained. Still sucks to be a bacterium in the beaver gut...I gotta say.

Metagenomics asks two essential questions:
  1. Who is there?
  2. What are they doing?

TANGENT!: Comparison Charts for NGS Platforms 2013

If you are thinking about buying an NGS platform...

Travis Glenn has released the 2013 tables comparing NGS platforms seven ways from Sunday, so you can make an informed decision.

I found them on a blog I follow: www.molecularecologist.com

Table 1a-c: de novo, resequencing and other applications. NGS Platform grades.

Table 2a: Runtime, reads and yield

Table 2b: Costs/run, Costs/MB, minimum costs

Table 3a: Instrument Costs

Table 3b: Computation Resources

Table 3c: Error Rates

Table 4: Advantages and disadvantages of each instrument

Citation: Glenn, TC. 2011. Field Guide to Next Generation DNA Sequencers. Molecular Ecology Resources. doi: 10.1111/j.1755-0998.2011.03024.x

Blog Series: WoG, Cesky Krumlov; Day 9: A Tale of RAD-taggin Sea Drag-ins

Dr. Bill Cresko
University of Oregon
Sensei of the Unix Ninja

Topic: Genomic analysis of non-model organisms, RAD-tags and STACKS

Greetings!

So today we launch into how to analyze critters that do not contain a reference genome or have relatively few annotations associated with their genome, the proverbial black box we bang our heads against.

What is a non-model versus model organism anyway?

Model organisms have 'an entire community trying to dissect one species usually in an effort to understand humans better.' The mouse is a model organism, E. coli K12 is a model organism.

Non-model organisms are studied by relatively few, don't contain good references usually and have no correlation to research going on in humans; meaning they don't necessarily further our knowledge of processes in humans.

However, despite not having the breadth of knowledge that model organisms have--the literal arsenal of annotations--we still have the same questions about non-model organisms as we do with model organisms!
  1. How do major differences among lineages evolve?
  2. What is the relatedness between organisms?
  3. What is adaptation like in these organisms?
There are fundamental processes in evolution:
  1. Origin of genetic variation: via mutation and migration
  2. Sorting of that variation: variation can be affected by genetic drift and natural selection, for instance. For those drawing a blank on 'genetic drift'...it is the random change in allele frequency over generations...for those of you drawing a blank on 'allele'...an allele is a specific 'version' of a gene. So one gene can have more than one 'flavor'. An allele is a particular flavor of that gene as determined by its sequence.  For those of you drawing a blank on gene...click over to the nyan cat YouTube video--be entertained and don't come back to this blog until you know what a gene is.
  3. Simultaneous genotyping of neutral and adaptive loci (for population genomics): neutral loci provide a genome wide background that gives you estimates of effective population size and can be used for phylogeography. Adaptive loci are outliers from the neutral background and can lend insight into selective sweeps or local adaptation.
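
If 'random change in allele frequency over generations' still feels abstract, here is a minimal Python sketch of genetic drift. It was not part of the talk: I'm assuming the textbook Wright-Fisher setup (a diploid population of fixed size, gene copies drawn at random each generation, no selection, mutation or migration), and the function name and all the numbers are made up for illustration.

```python
import random


def simulate_drift(start_freq: float, pop_size: int, generations: int,
                   seed: int = 0) -> list[float]:
    """Track one allele's frequency under pure drift (assumed Wright-Fisher model).

    pop_size is the number of diploid individuals, so each generation
    holds 2 * pop_size gene copies.
    """
    rng = random.Random(seed)          # seeded so a run can be repeated
    copies_total = 2 * pop_size
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        # Each gene copy in the next generation picks its parent copy at random,
        # so it carries our allele with probability equal to the current frequency.
        carriers = sum(1 for _ in range(copies_total) if rng.random() < freq)
        freq = carriers / copies_total
        history.append(freq)
    return history


# Hypothetical example: 20 individuals, allele starts at 50%.
trajectory = simulate_drift(start_freq=0.5, pop_size=20, generations=30)
print([round(f, 2) for f in trajectory])
```

Nothing in the loop favors either allele, yet the frequency wanders; try a smaller pop_size and it tends to wander faster, which is why small populations can lose or fix alleles by chance alone.
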
So the naive solution when approaching a project with a non-model organism is to just 'sequence everything'. Why is that naive? Because sequencing is still quite expensive and for many studies sequencing everything is pretty much a waste.

Genomes are generally organized as linkage blocks, so as long as you have well-spaced markers, those will work just as well for genotyping at a fraction of the cost. Genetic maps are very useful in genome studies and are often a great first step to guide you in deciding whether full genome sequencing really needs to be done to answer the scientific question you have.

So the Cresko lab is heavily involved in an alternative approach that exploits these linkage blocks in the genome. It bypasses the need for a full genome and still provides tons of data about the genome in addition to building a genetic map and doing genotyping that can guide further studies/hypotheses.

The technique is called RAD-tags, the program/pipeline is called STACKS and the 'grand architect' is our resident Unix Ninja--Julian Catchen.

Tuesday, January 15, 2013

Blog Series: WoG, Cesky Krumlov: Day 8: Transcriptomics

Where did days 6 and 7 go??? 

Lost inevitably in the chaos of RStudio Lab tutorial, Python Lab Tutorial, Cesky Krumlov Castle touring and wine tasting followed by copious amounts of wandering, eating, more wandering, snow ball fights, freezing feet, beer drinking and sleeping.

The tutorials are well written and, given there was no lecture for these, I've simply linked the tutorials...go nutz.

Remember I've discussed python in previous blogs in the programming prep blog for instance and linked Tyghe's website for further tutorials using python--the resources are at your fingertips, have at it! Feel free to direct python related questions to Tyghe's blog or if you happen to know Daniel McDonald from the Univ. of Colorado feel free to harass him as well...I will not link his email in case he hunts me down for giving you all his information and challenges my husband to a python duel--coding at dawn!

Now...on to week 2!!!

Friday, January 11, 2013

Blog Series: WoG, Cesky Krumlov: "One *ome to rule them all...?" and other anecdotes from this evening

Apologies to the Lord of the Rings die-hards who are probably outside my door with torches and pitchforks...but truly this evening's discussion really brought some things to light and well...tossed other things into the dark...

Ah Science...

Blog Series: WoG, Cesky Krumlov; Day 5: Short Read Alignment

I found myself attempting to remember what day it was today...apparently lots of attendees including myself are losing track of the days. One thing we never lose track of though is the bar...and heading to it for a few beers/drinks after the 7-10pm session...cheers.

Dr. Konrad Paszkiewicz
University of Exeter

Topic: Short Read Alignment

In general short read alignments are difficult because the shorter your read the less likely it is to match uniquely to a given reference or sequence of interest. Instead it'll match to multiple places and you won't be sure exactly where it goes.
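
To make that concrete, here is a minimal Python sketch (mine, not from Konrad's slides). The toy reference string and the helper name count_hits are invented for illustration, and it only counts exact matches; real aligners use indexes and tolerate mismatches.

```python
def count_hits(reference: str, read: str) -> int:
    """Count exact occurrences of read in reference, including overlapping ones."""
    hits = 0
    start = reference.find(read)
    while start != -1:
        hits += 1
        # Resume one base past the last hit so overlapping matches are counted.
        start = reference.find(read, start + 1)
    return hits


# Toy reference, invented for this example only.
reference = "ACGTACGTTAGCACGTTAGG"

short_read = "ACGT"       # 4 bp
longer_read = "ACGTTAGG"  # 8 bp

print(count_hits(reference, short_read))   # 3 candidate positions
print(count_hits(reference, longer_read))  # 1 candidate position
```

The 4 bp read could have come from three places in this reference, while the 8 bp read fits only one.
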

Ok, so if it's difficult to align short reads, why do people generate them? Well for one it's cheaper. Additionally, for many applications a short read of about 50 bp is enough to work with; for example resequencing of small organisms, de novo analysis of bacterial genomes which are usually quite small compared to a human genome, ChIP-seq or digital gene expression.

Blog Series: WoG, Cesky Krumlov: A note on Emacs

An Emacs aside...

So if you've been following the slides Julian had a slide in his Unix section about Emacs versus Vim and how to use Emacs. Now, we didn't get to it yet and I don't know if we will, but I saw this and thought it would be amusing to those who use Emacs or Vim or other editors...

Of course my husband is a die hard Vim-er. When I told him we'd be learning Emacs, his response over gmail chat was "Vim or death". However, after some chat discussion he acquiesced that I can learn Emacs if I wish, however I am never to speak of it...

credit: www.xkcd.com/378
For those interested in striking out on their own:

Emacs: http://www.gnu.org/software/emacs/
Emacs Tutorial: see Julian's slides and http://www2.lib.uchicago.edu/keith/tcl-course/emacs-tutorial.html

and so I don't go home to divorce papers sitting on the kitchen table...

Vim: http://www.vim.org/download.php
Vim Tutorial: http://blog.interlinked.org/tutorials/vim_tutorial.html
Vim Tips: http://vim.wikia.com/wiki/Tutorial

Cheers.

Thursday, January 10, 2013

Blog Series: WoG: Cesky Krumlov; Day 4: Assembly

Dr. Rayan Chikhi
Pennsylvania State University

Topic: De novo Assembly

A whole day of assembly!

There is no single program right now that is considered 'the assembler'. Different assemblers have advantages and disadvantages, as well as tasks they are and are not well suited for. One thing today's assemblers have in common is that they all take a lot of time and memory to run--especially when doing de novo assembly. One of the exceptions is the program Minia, developed by Dr. Chikhi, which was designed to run efficiently with a low memory requirement.

One of the important things that you need to know for assembly is what a k-mer is. A k-mer is any sequence of length k.

AGC is a k-mer with k=3
AGCT is a k-mer with k=4
AGCTT is a k-mer with k=5

You hopefully get the idea. 
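
In practice, assemblers care about all the k-mers contained in a read, not just one. As a small sketch in Python (the function name kmers is my own, not something from the workshop materials), sliding a window of length k along a sequence looks like this:

```python
def kmers(sequence: str, k: int) -> list[str]:
    """Return every k-mer (substring of length k) in sequence, left to right."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # A sequence of length n holds n - k + 1 k-mers (none at all if k > n).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(kmers("AGCTT", 3))  # ['AGC', 'GCT', 'CTT']
print(kmers("AGCTT", 4))  # ['AGCT', 'GCTT']
```

Note that neighboring k-mers overlap by k-1 letters, which is the property graph-based assemblers build on.
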

There are two essential methods that assemblers use to assemble: de Bruijn graphs and overlap/string graphs. Now we sort of covered this in the Assembly prep blog...let's see if I can explain this better here now...

Blog Series: WoG, Cesky Krumlov; Day 3: Unix, Part 2--Ninja-ery

Julian Catchen
University of Oregon
Unix Ninja

Topic: Unix Part 2

So unfortunately as with the quality control blog, I am or will be unable to give you files that we practiced on but Julian's slides are quite good. We learned about pipes and added on to our current knowledge of command line. We finished up the end of the first slide set... which included all the following commands:

  • man [command you are confused about]: manual for commands that gives you all the options. Some man pages are more helpful than others but if you stare at it long enough it'll start to make sense. Often times there are examples of usage so pay attention to those.
  • ls, gunzip, more, cat, head, tail, grep, wc...all that we learned yesterday--so don't forget it and today we ended up having to man some of those commands to learn the options.
  • sort [filename], just as it sounds--depending on what kind of sort you want you have to specify an option (see man page for sort; ie. numeric, alphabetical)
  • uniq [filename], as with sort, lots of options to tease things out of your file.
  • cut [filename], we learned this before too--BUT today I learned anew that it is only for files organized in columns.
  • cat [filename] | tr "\t" ",": translate command that changes all tabs to commas. tr reads from standard input rather than taking a filename, so the file has to be piped (or redirected) into it.
  • tab = Ctrl+V then Tab, or \t
  • |, this is a 'pipe'; read Julian's slides part 1
Example: What do you think I just did to this file?

cat batch_1.genotypes_1.loc | tr "\t" "," | grep "^96053"

  1. I grabbed the file = cat
  2. I piped it to the command 'tr' specifying I wanted all tabs changed to commas
  3. Then piped it to another command 'grep' (remember what grep does?)
  4. With grep I specified I wanted to look for all entries where the beginning of the line (the hat symbol, ^) started with 96053
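
For anyone who thinks more easily in Python than in pipes, the same chain of steps can be sketched as below. This is only an illustration: the function name and the sample lines are made up (I can't share the real workshop file), and it works on a list of strings instead of reading batch_1.genotypes_1.loc.

```python
def tabs_to_commas_then_filter(lines, prefix):
    """Imitate the tr-then-grep pipeline on an iterable of text lines."""
    kept = []
    for line in lines:
        converted = line.replace("\t", ",")  # the 'tr' step: tabs become commas
        if converted.startswith(prefix):     # the 'grep ^prefix' step: match at line start only
            kept.append(converted)
    return kept


# Hypothetical tab-separated lines standing in for the real file.
sample_lines = [
    "96053\tab\taa",
    "10021\tbb\tab",
    "96053\taa\t-",
    "19605\t96053\taa",  # contains 96053, but not at the start of the line
]

for row in tabs_to_commas_then_filter(sample_lines, "96053"):
    print(row)
```

The last sample line is left out because the hat symbol anchors the match to the beginning of the line, exactly as in step 4.
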

We didn't get through all these slides but they are great so have a gander...

What we did get through:

Wednesday, January 9, 2013

Blog Series: WoG, Cesky Krumlov; Day 3: Genomics Study Design, a.k.a. "To seq or not to seq, that is the question!"

So I totally slacked off today and went to lunch instead of writing the usual afternoon blog of the morning session, I hope you'll all forgive me, but to be fair those of you in the U.S. weren't even out of bed by the time lunch for me rolled around!

All of the presentations so far have been really awesome and informational so I hope you will take advantage of all the slides being posted on the website!

Today's morning session is great for PIs and students wishing to design sequencing experiments and deciding whether to get an NGS platform.

I will be interjecting during this blog post...my interjections will be in a different color (probably green, because I like the color green).

Tuesday, January 8, 2013

Blog Series: WoG, Cesky Krumlov; Day 2: Data Quality Control--no really it's more fun than it sounds...

In truth...to me, data sequence quality control is necessary and ok and it was fun when I was learning it, but the further in you get the more you want to automate the hell out of it! WELL, lucky for you, you probably aren't at that stage yet so we are going to start fresh thanks to Naiara's talk, slides and exercises à la this evening's lab!

It is time, my fellow command-line/terminal apprentice ninjas...let's DO SCIENCE

Now I won't be going through absolutely everything in this blog entry but I will cover most and offer tips to help the exercises go smoothly for you.

Blog Series: WoG, Cesky Krumlov; Day 2: Unix

Dr. Julian Catchen
University of Oregon
Full-time Unix Ninja

Topic: Unix

So in my prep blog on programming I went through some of the basics of command-line and ninja-ery. But really you cannot do Unix command-line justice unless you jump in and do it yourself with the UnixTutorial provided. Additionally, Julian's slides are up on that same page and they are great at illustrating comparisons between what we are used to (graphical user interface) and how that translates or what that looks like on the command-line (or in the terminal). The slides are in pdf format, so download them, learn about the history of Unix and take the tutorial.

Highlights?

  • Unix was originally developed by AT&T!
  • Steve Jobs was fired from Apple and developed NeXTSTEP; Apple went into the tubes and re-hired Steve Jobs--who promptly threw their operating system out and brought in NeXTSTEP, which became OS X (it's Unix based).
  • Google Android runs Linux
  • Airplanes with personal movie systems run Linux
  • Wireless internet routers run Linux
  • By the end of these two weeks he plans to make us Unix/Linux converts
  • We googled 'unix commands' to obtain a cheat sheet of commands to help us if we forget, HEY there is also a cheat sheet I linked for you on the programming prep blog--go figure! :)
  • By the end of this session you should know about the following, if not, go back to his slides and the Unix Tutorial
  1. change directories
  2. list files, list all files, list files that humans can read
  3. move up and down commands on the command line
  4. create directories
  5. know what relative vs. absolute paths are
  6. know how to figure out where you are
  7. know three ways to 'get home' from anywhere on the computer system
  8. know what tab and tab tab do and why they are totally cool
  9. more, head, tail and cat
  10. know how to unzip and de-tar files or do both at once
  11. what is grep?
  12. how to obtain line counts in a file
There is a slide called "Explore the file hierarchy" YOU CAN DO THIS WITHOUT WORKSHOP FILES! Explore your own computer file system! Huzzah!

My favorite quote of the evening:

"When you type 'cd ..' it's a like a worm hole that comes and sucks you up a directory, it's really cool" ~Dr. Julian Catchen, Workshop on Genomics, Cesky Krumlov 2013


Blog Series: WoG, Cesky Krumlov; Day 2: So you want a NGS Sequencer eh?

Dr. Konrad Paszkiewicz
University of Exeter
Director of Wellcome Trust Biomedical Bioinformatics Hub

Topic: DNA Sequencing Technology: Past, present, future

Good morning bioinformatic campers! Well, afternoon for me, but morning for many of you back in the U.S.

So much of what was covered this morning is going to be redundant with the DNA sequencing preparation blog I wrote previously. So between this entry and that other one you will hopefully get a complete view of the 'state of the union' where sequencing is concerned and prospects for the future. The nice thing is that Konrad tossed in a lot of pro/con lists for different platforms, so those of you considering NGS in the future, this is a bare bones, get you started guide as to what's out there and whether 'it's worth it' for your own research to invest in a platform and which one to invest in. Again, I still highly recommend Dr. Elaine Mardis' talk that I linked in my DNA sequencing technology prep blog.

First and foremost, if you are familiar with what molecular biology is and what sequencing is and don't know who Fred Sanger is...then you've probably been living in a hole...

Monday, January 7, 2013

Blog Series: WoG, Cesky Krumlov; Day 1: Amazon Cloud

The Amazon Cloud
Dr. Konrad Paszkiewicz
Director, Wellcome Trust Biomedical Bioinformatics Hub

Topic: Amazon Cloud

Pros

  • No need to house/maintain servers
  • No need to worry about backing up
  • Only pay for what you use
  • Upgrades are handled by Amazon
  • You can expand and delete storage as you have need
  • There are many many preconfigured virtual machines (VM) to pick from if you don't want to develop one on your own (QIIME, STACKS and Short read alignment all have their own VMs)
Cons
  • You will pay for it: storage even when you are not using it, time using it, etc. Many researchers are resistant to the idea that they have to 'pay' for computing power and storage, and they are surprised by how much computational power and storage can cost. Researchers need to start being trained to think about computing costs in their grant proposals--cost of programming and software, costs of hardware, costs of the people that will need to be hired to program and do analysis.
  • Cost of an Amazon VM can run $0.20-$3.00/hr; you also pay by the gigabyte.
  • Data transfer to your VM could be slow
  • If the network is down you won't be able to access your VM
  • Typical per month cost is $200-$400 depending on how much data you store and how extensively you use the machine. So you have to decide: is it worth it? What are the costs and benefits of buying and maintaining your own server or computing system that's powerful enough to run analysis and store all your data versus using Amazon Web Services?
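
If you want to play with the 'is it worth it?' question, a back-of-the-envelope estimate is easy to sketch in Python. Big caveats: the hourly rate below is simply picked from the range quoted above, the per-gigabyte storage price is a placeholder I made up (it is NOT Amazon's actual price), and the function name is my own. Plug in real numbers from your own bill or quote.

```python
def estimate_monthly_cost(hours_used: float, hourly_rate: float,
                          gb_stored: float, rate_per_gb_month: float) -> float:
    """Rough monthly bill: compute time plus storage, ignoring everything else."""
    compute = hours_used * hourly_rate
    storage = gb_stored * rate_per_gb_month
    return compute + storage


# Hypothetical month: 100 hours on a VM at the top of the quoted range,
# plus 500 GB stored at a placeholder price per GB per month.
estimate = estimate_monthly_cost(
    hours_used=100,
    hourly_rate=3.00,        # within the 0.20-3.00/hr range quoted above
    gb_stored=500,
    rate_per_gb_month=0.10,  # placeholder assumption, not a real Amazon price
)
print(f"Estimated monthly cost: ${estimate:.2f}")
```

Because storage is billed whether or not the machine is running, the second term never drops to zero in a quiet month, which is the point made in the first 'con' above.
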
Read on if you are interested in the tutorial/exercise we went through today despite it most likely not being applicable to your situation unless you have access and/or are interested in implementing amazon cloud services.

Blog Series: WoG, Cesky Krumlov; Day 1: Mammalian Genomes

Dr. Chris Ponting
MRC Functional Genomics Unit
University of Oxford

Topic: Mammalian Genomes

This afternoon's session started off with a topic I'm pretty far from, human and mammalian genomics. I've worked hard in my career to stay away from diploid genetics--I like the simplicity of viruses and prokaryotes by comparison. Today's talk was in 3 parts focusing on the human and mouse genomes, functional DNA and transcript maps and the future.

Blog Series: WoG, Cesky Krumlov; Day 1: Introduction

Greetings!

Day 1: Introductions, Reasons, Rationales and Methods to the Madness amounting to the creation of this workshop.

So last night we had the opening reception where everyone was able to meet; it is quite the diverse crowd but I simply had to corner Scott Handley and ask the question I'm sure he gets several times over--so much so that he's got slides worked into his introduction presentation talking about how the workshop came about and how it ended up in an isolated idyllic town 3 hours south of Prague.

Scott and his colleagues have been hosting workshops for several years now. Originally he'd attended, then TA'd for, the workshop out of Woods Hole, MA, which focused on molecular evolution. A workshop I tried for many years to go to as a graduate student, but alas was unable. But from others I have heard over and over how amazing a workshop it is and therefore I highly recommend you look into it if that's the direction you'd like to go in your research. From a Woods Hole start they commenced doing workshops in other areas, such as one for the CDC in Atlanta, at the Smithsonian, as well as in Fort Collins, CO. Then they decided to do one in Europe and in the course of their research and contacts, found Cesky Krumlov. Cesky Krumlov was ideal--it had infrastructure for computer analysis (internet, ability to charge laptops easily), it was beautiful and different, but not so exciting that you'll miss classes to 'see the sights', and inexpensive. The workshops pay for themselves via the registration fees; there is no outside grant that covers workshop costs here. If they get enough registrants the workshop is on, if not it's cancelled. So far attendance has been good and the workshops have all been rewarding experiences for all involved--so they've continued to host them.

In the course of our conversation the rationale for this kind of workshop came up. When they first hosted the genomics workshop most of their attendees were PIs that had just gotten next generation sequencing capabilities and now had all this data they had no idea what to do with. Now in its third year, the attendance has moved more toward graduate students and postdocs who now are wrestling with mass quantities of data produced in their labs. He used the analogy of a tidal wave of data, easily overwhelming anyone in its path. There are simply not enough folks with the training and skills to manage and analyze this deluge; hence the workshop on genomics was born.

Another topic of consideration is the value of continuing education in our field. Many times family members ask me 'what? you're taking classes?...didn't you get your Ph.D.? Aren't you done with school already!?' Quite frankly in our line of work, the 'schooling' never stops; when it does, your career will be dead. If you are not moving as fast or at least attempting to catch up with your field through continual reading, workshops, collaborations, short courses, conferences...then you will become a relic before your career even starts. In the past it took 30 years to become a relic in biological research--then they invented high throughput sequencing--curses! Now if you don't keep up you can be easily outdated within a year--a year! Not only that, but it takes a skill set in order to conduct bioinformatic analysis, a skill set that has to be learned flexibly so that it can easily and quickly change and adapt as the technology does. And it takes time to learn that skill set and perfect its flexibility, so you have to invest that time (and money) to seek out those opportunities that allow you to grow and learn with the field.

Often times I think researchers have the impression that once the laboratory experiment (as in, wet lab) is run that the hard part is truly over. Not so much. Bioinformatics is not a 'magic black box' you push data into and out comes the next Science paper. Utilizing software, writing code, implementing bioinformatics is an experiment in and of itself. It can malfunction, you can put the wrong 'reagents' (parameters) into it causing you to have to re-do it, it can break altogether, you can set up the experiment incorrectly and the computer gives you garbage back--after running for 48 hrs--doh!!! In silico computer based 'experiments' can take just as long or even longer to obtain results from as any laboratory experiment and more and more there should be a growing respect for this in the field. Quite frankly, though things are moving faster--they are still going to take as long as they take. And believe me I scream and yell at the computer runs as much as any gel or culturing experiment--both can take days and days to run, only to find out you missed one small crucial point and now have to start all over. SO, have the same patience and respect for bioinformatics as you would any wet lab experiment, believe me--it'll save you frustration, anxiety and exhaustion in the long run.

And that's just getting to the point of getting output from your computer experiment...

Now you have to figure out if the result makes sense or if you goofed and have to troubleshoot. See, no different than the wet lab. Honestly, sometimes I wish I could fix the problem with duct tape, new wiring, or cleaning the system and tubing, as opposed to having to comb through hundreds of lines of code or parameters to find the misplaced semi-colon! In the lab you visualize everything on your bench; in bioinformatics it's all in your head, in the computer (be it code or software parameters) and of course on 500 little sheets of paper littering your desk with the schematics and flow of what you are doing or trying to do at each computational step.

Continuing education? Best thing ever.

Ready, set, sequence! Read on for today's highlights...

Saturday, January 5, 2013

Blog Series: Workshop on Genomics, Cesky Krumlov; Preparation--Programming

Dobrý večer! from Cesky Krumlov, Czech Republic! Ok that's enough Czech from me...

Section 8: Programming

So BioPerl and BioPython (I'm going to plug BioPython in here too, along with PyCogent--see below) are basically what they sound like: Perl code/scripts/modules (however it is easiest for you to think about it) and Python code (Perl and Python being programming languages) geared toward applications in biological analysis.

Now my husband is a programmer turned bioinformatic programmer and his best advice is to jump right in and just keep using it. My main concern with that...aside from my inability to manifest a 36 hour day that would allow me to take on learning a computer language...is I don't use it every day in my job. This makes it difficult, to say the least, to retain all the commands in your head, and even then--when in doubt, 'Google!' The links above (the BioPerl one was suggested in the preparation materials for the workshop; the BioPython one is my addition) will give you an idea of the modules/programs that have been constructed using those languages, and provide more links if you want to get further into the mire and meld programming with biological analysis. You will have to learn some basics of the language itself before jumping into the biological application of it, just for functionality's sake.
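To give a flavor of what 'some basics of the language' looks like when pointed at biology, here is a minimal sketch in plain Python (no BioPython needed). The function names gc_content and reverse_complement are my own picks for illustration, not anything from the workshop materials, and the sketch assumes a DNA sequence containing only A, C, G and T.

```python
# Toy helpers for a DNA string; assumes only the letters A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    # Swap each base for its partner, then read the strand backwards.
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(gc_content("ATGC"))
    print(reverse_complement("ATGCC"))
```

Nothing fancy, but strings, functions and a little arithmetic are exactly the kind of basics you'll lean on before the biology-specific modules start to make sense.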

Now that's all I'm going to say for the moment, but we'll come back to this...my husband, who was trained as a computer scientist and only recently got thrown head first into the world of biology, is now programming (using Python) for bioinformatic analysis. He has a Python tutorial and some advice to dispense to all of you who aspire to move in that direction...but he has to write it up.

In the meantime, one thing you absolutely need to get comfortable with is the ominous black box called command-line. You simply have to learn how to navigate around your computer in command-line interface. The conference organizers have provided a helpful tutorial so we are going to go through that and I'll add as we go based on my own trial and error experiences.

Thursday, January 3, 2013

So a funny thing happened on the way to the Czech Republic...

Section X: Nervous flying whilst on OTC sleeping medications, what happens when said flight is delayed.

Indeed, your blog post to assist you in preparing for the programming highlights of this workshop is coming...but first a slight digression...

The time is now 20 minutes to midnight and I sit at the Dulles airport in Washington DC. In a perfect world I would be 37,000 feet in the air enjoying some slightly questionable airplane food they call 'dinner'. Alas, our airplane door had other plans. In an effort to conduct a formal protest, our airplane door malfunctioned during boarding and we all de-planed while mechanics endeavored to remedy the situation.

Now I am somewhat of a nervous flyer, so thinking myself ever so clever, a half hour prior to our scheduled departure I took some pills to help me sleep. I was promptly notified, nay, chastised by a fellow passenger as we de-planed that you NEVER take OTC sleeping medication until you've taken off and have leveled off. Duly noted, fellow passenger with the blue cap. So now I sit in the terminal, thankfully with free wifi, entertaining my insomniac facebook friends with random postings to keep myself from passing out and inevitably missing my flight as a consequence.

I would dearly regret having to cut this blog short...but should you notice no further blogs, rest assured that they will start up again when my sleeping pills wear off and I wake up...probably still in the Dulles Airport awaiting my flight, the blue airport rug judging me for being so naive as to take sleeping medication prior to actually being in the air. Go ahead blue Dulles airport rug...judge me, I judge myself a little bit.

What does any of this have to do with the workshop? Um...you got me, except I really ought to get there at some point so I can share the wealth of knowledge with you.

Lessons learned:

  1. Murphy's Law states that when you get to the airport earlier than God himself, there will be no lines...anywhere. However, when you are running late with your hair on fire and your head rolling on the ground--you'll inevitably be stuck in security behind Myrtle, the 80 year old biddy who keeps forgetting to take the 3 oz bottle of lotion out of her bag and put it on the conveyor belt despite being told several times by TSA.
  2. Never take OTC anything until you are in the air and can be assured you won't be shuffled into an alternate dimension by said malfunctioning airplane door and a glass of wine.
  3. Well played malfunctioning door...well played.
  4. Charging stations and free wifi at the airport are golden, cherish them.
Sites to assist you in your preparation for attempting to leave an international airport in D.C.

Back to academics! Programming prep blog to come...assuming my plane takes off at some point and I make it to the Czech Republic. Many safe travels to all workshop attendees who will be entering the cloudy highway to or through Europe soon.

Next Up: Truly....there will be a Preparation--Programming Blog soon...

Blog Series: Workshop on Genomics, Cesky Krumlov; Preparation--Metagenomics

Section 7: Metagenomics

Metagenomics is a massive topic! My first encounter with metagenomics was in my Ph.D. work; metagenomics of a hot spring microbial community (we focused on two hot springs in Yellowstone National Park).

For the purposes of this workshop 2 readings and a PubMed search are suggested:
  1. Wooley, JC; A Godzik and I Friedberg. 2010. A primer on metagenomics. PLoS Computational Biology 6:e1000667. (open access and a good read)
  2. QIIME PubMed Search
  3. Knights, D; EK Costello and R Knight. 2011. Supervised classification of human microbiota. FEMS Microbiology Reviews. 35:343-359. (not open access, subscription required).
Essentially, metagenomics focuses on all of the organisms within an environment, or as many as can be detected using today's methods (usually unculturable organisms are the 'target'). That's not to say you can't 'create' an environment in the lab that hosts several known or culturable organisms and do a metagenomics study on that, but most of the research has been done on natural environments such as hot springs, the ocean, acid drainage sites, and humans...to name a few. Dr. Rob Knight's lab at the University of Colorado Boulder, for instance, has been involved in quite a bit of work in metagenomics as it pertains to humans and environmental bacteria. One of their papers, aside from the one above, is quite nice and focuses on human gut microbiota (Lozupone, CA et al., 2012. Nature), though alas it is not freely available either.

Looking for more open access pubs with a focus on microbiomes/metagenomics?
Nature Reviews also has a focus on metagenomics that might be useful in finding more studies across fields in this subject, though I cannot guarantee what's open access and what's not.

QIIME = Quantitative Insights into Microbial Ecology. It's a software package (refer to disclaimer about software!) that assists in the analysis of microbial communities and focuses on data generated via high throughput sequencing methods.

Personal Opinion...it's pretty cool. Unfortunately it's not approved at WRAIR...yet.

Jesse Stombaugh from the BioFrontiers Institute (University of Colorado Boulder) has some nice slides that show some of QIIME's analysis. Slides 14-19 show you the QIIME workflow. Additionally, if you find microBEnet (a group that focuses on microbiology of the Built Environment) on youtube, they have several videos detailing how to use QIIME.

Or if you're the type of bioinformatic cowboy to just jump into the program itself, check out the QIIME website. If you're a programmer, Python is useful to know, though not totally necessary, if you end up getting into the nuts and bolts of the software. For those of you going python-wha??? See the next section on programming, which will talk about BioPerl and some Python.

Alright fellow aspiring bioinformaticians...my last prep blog will probably come in the Czech Republic as I head out tonight, then we'll be jumping right into the workshop!

Next Up: Preparation--Programming

Wednesday, January 2, 2013

Blog Series: Workshop on Genomics, Cesky Krumlov; Preparation--GMOD/GBrowse

Section 6: GMOD/GBrowse

GMOD was set up to be a kind of community for biologists to allow them to focus on research and 'the science' if you will instead of fretting over the nitty gritty application acrobatics they need to implement to obtain useful information from their data. It is a series of interconnected applications and databases for scientific use.

GBrowse is a component of GMOD that is for, as you might have guessed, Genome Browsing. I did see a magic word in the description that is dreaded by many who work at institutions with enhanced security protocols...'install'. Here's where the disclaimer from the first blog entry comes in...make sure you know how it installs and that you've submitted all the proper paperwork and links so that information assurance can approve and install the program. Yep, it's doubtful you'll get to install it yourself; your IT or IA department will usually take care of that either remotely or by coming to your computer. This of course applies to all of the programs mentioned in previous blogs as well.

Other components include: Community Annotation, Comparative Genome Visualization, Database tools, Gene expression visualization, Genome annotation, Molecular pathway visualization and Blast sequence alignment...I lifted all this from the GMOD wiki, so check it out, it is also one of the suggested 'readings'.

It will be interesting to explore Galaxy and GMOD and see how and if they overlap...they seem to target the same audience but potentially with different available functionality...

The other suggested reading focuses on GBrowse: Stein et al., 2002. The generic genome browser: a building block for a model organism system database. Genome Research. 12:1599-1610.

Looks like we'll be learning about some really interesting platforms and packages of tools. Myself, I tend to gravitate toward the web-based platforms (I'm already clicking through Galaxy), as most of the time they don't require a software install, which reduces my work headache by leaps and bounds and spares the bottle of wine at home that would've been in danger of being drunk in one swallow the minute I walked in the door!

Next Up: Preparation--Metagenomics

Blog Series: Workshop on Genomics, Cesky Krumlov; Preparation--Galaxy

Section 5: Galaxy

This will be a short one (and the people rejoiced!).

Galaxy is an open source, web-based platform designed to assist researchers with computational analysis. It helps with data management, and the framework contains analysis tools for everything from converting one data format to another, manipulating fasta files, fetching sequence data and calculating statistics to conducting some evolutionary and metagenomic analyses. It also has a link to the ENCODE project's tools.

Nuff said...to dive in see the suggested readings:
  1. Goecks, J; A Nekrutenko and J Taylor. 2010. Galaxy: a comprehensive approach for supporting accessible, reproducible and transparent computational research in the life sciences. Genome Biology. 11:R86.
  2. Blankenberg, D et al., Integrating diverse databases into an unified analysis framework: a Galaxy approach. Database (Oxford) 2011:bar011.
  3. J Goecks also has a slideshare up on Galaxy that shows some of the nifty graphics that can be generated from some of the tools.
  4. 2 years ago David Coil put up a series of tutorials using Galaxy to visualize various datasets. Not sure how up to date the push to click operation is (Galaxy in 2010 versus 2013), but worth a view. Below is the first video in the set. David Coil actually has a nice set of videos on his profile all less than 10 min long so the commitment is minimal, see if any are of interest--most deal with sequencing and programs for analysis.


Or just hop to the website directly: Galaxy and perhaps take a tutorial on how to get started.

Next Up: Preparation--GMOD/GBrowse

Blog Series: Workshop on Genomics, Cesky Krumlov; Preparation--Assembly

Section 4: Assembly

This next section deals with an introduction to assembly and assemblers. All the suggested readings are freely available which is awesome:

  1. Birol, I et al., 2009. De novo transcriptome assembly with ABySS. Bioinformatics. 25:2872-2877.
  2. Zerbino, DR and E Birney. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research 18:821-829.
  3. Langmead, B. 2010. Aligning short sequencing reads with Bowtie. Current Protocols in Bioinformatics, Chapter 11, Unit 11.7.
There are two general ways in which you can compile short reads to make your genome: reference mapping assembly and de novo assembly. There are benefits and caveats to both.
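To make the difference a bit more concrete, here is a toy sketch in Python. It is emphatically not how Bowtie, Velvet or ABySS work internally (real tools use clever indexes, handle sequencing errors and cope with millions of reads); the function names and the tiny example reads are made up for illustration. The reference mapping half simply looks for an exact match of a read in a known reference, while the de novo half breaks reads into k-mers and links overlapping pieces in a simple de Bruijn-style graph, the idea named in the Velvet paper's title.

```python
from collections import defaultdict


def map_read(reference, read):
    """Toy reference mapping: position of the first exact match, or -1."""
    return reference.find(read)


def kmers(read, k):
    """Return every substring of length k in the read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_graph(reads, k):
    """Toy de novo step: link each k-mer's prefix to its suffix.

    Nodes are (k-1)-mers; each k-mer found in a read adds an edge
    from its first k-1 bases to its last k-1 bases.
    """
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


if __name__ == "__main__":
    reads = ["ATGCG", "GCGTA"]  # made-up overlapping reads
    print(map_read("CCATGCGTAGG", "GCGTA"))
    for node, neighbors in sorted(de_bruijn_graph(reads, 3).items()):
        print(node, "->", sorted(neighbors))
```

With a reference you only have to ask 'where does this read fit?'; without one you have to work out how the reads overlap each other, which is where the graph comes in.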

Tuesday, January 1, 2013

Blog Series: Workshop on Genomics, Cesky Krumlov; Preparation--Transcriptomics

Section 3: Transcriptomics

Four readings were suggested for this section which sadly are not open access, I'll link them below and then we'll dive into some crash-course transcriptomics and explore other resources available to round out your knowledge base on the subject.

Richard Twyman writes a nice short summary of transcriptomics and its applications on the Wellcome Trust site. He defines transcriptomics as the global study of gene expression at the RNA level. So now we are not only talking about genes and their nucleotides, we are talking about what those genes are doing, when those genes are active and to what degree those genes are active and regulated. All of this is measured through various forms of RNA (mRNA, tRNA, rRNA, ncRNA, siRNA or total RNA). The type of RNA you are interested in depends on what question you are asking. Richard Twyman has assisted in many research publications involving bioinformatic analysis and his recent publications can be found on the writescience site.

In terms of more articles to sift through that explore transcriptomics you can try the OmicsGateway with subject: Transcriptomics through Nature Publishing, though I cannot guarantee they'll be open access. Alternatively you can go to BMC Genomics which has a whole section on Transcriptomics where many of the articles are open access.

Blog Series: Workshop on Genomics, Cesky Krumlov; Preparation--Genome Structure

Section 2: Genomic Structure

The prep section for genomic structure was small with two articles suggested, one of which you'll need a subscription to read; the other is freely available! Huzzah!

So let's jump into Mills et al. and learn something...the one caveat to this article is that they automatically assume you know what a 'structural variant' is, and they are specifically talking about this with respect to the human genome. So let's back-track a little--skim if you're already a structural 'pro'...or better yet, add your two cents in the comments along with other links to clarify this topic.

Blog Series: Workshop on Genomics, Cesky Krumlov; Preparation--Modern Genome Sequencing Technology

Given that the workshop will be moving quite quickly, the organizer(s) suggested that those of us with less background do some reading up on certain topics, which they outline and link to on the website in a section called Preparation.

This page, with respect to the workshop on genomics, is broken down into sections. In total there are 23 suggested readings or websites. Unfortunately, for many of the links (11 of 23) you need a subscription to view the articles beyond the abstract, which I do not have. My institution has limited subscriptions, and besides, most of my preparation is being done at home where I wouldn't have access to the institutional subscriptions anyway. I'll be going section by section with the papers linked and offer some additional suggestions for those unable to access articles due to lack of subscription.