USeq, MiSeq, WeAllSeq...to Seek: Blog Series: WoG, Cesky Krumlov: "One *ome to rule them all...?" and other anecdotes from this evening

Apologies to the Lord of the Rings die-hards who are probably outside my door with torches and pitchforks...but truly, this evening's discussion brought some things to light and, well...tossed other things into the dark...

Ah Science...

Disclaimer: The views and opinions expressed in this blog are those of the author and do not necessarily represent any policy, views, musings, opinions, final straws, or coding errors of the instructors, TAs, faculty, or organizers of the Workshop on Genomics, Cesky Krumlov, 2013. In other words, I am purely to blame, and in the end...at least I am learning whilst amusing myself.

So...if you aren't outside my penzion with a torch and pitchfork you may now read on and learn of the problems, pitfalls, questions, confusion, frustrations, confessions, advice and successes facing different facets of genomics (at least in terms of our research).

www.tumblr.com/tagged/bioinformatics

Rather than give you a laundry list...'cause that's quite boring...I'll hit on the major themes I got out of tonight's forum and reiterate some of the important concepts to keep in mind when you decide to start an NGS project.

"One *ome (-ome) to rule them all..."

We ended up having quite the conversation in several of the groups about "The -Ome": 'the genome' or 'the transcriptome' and, of course, the all-encompassing, all-powerful solver of all problems, 'the reference' (ok, that's not an 'ome', but you get what I'm saying). Most of the workshop attendees actually work on non-model organisms that do not have a reference genome in GenBank. Quite honestly, 'the human genome' was assembled from a mix of donors, and it's funny that it has become the standard 'reference'. That's not a bad thing (we need something to start with, after all), but should we seek 'the genome' for our organisms at all? If we do seek the whole genome and make it 'the genome', how do we choose what 'kind of genome' we want for our critter? If we sequence a lab rat strain/organism that's been in the lab 1200 years, it might be easier and/or more cost effective, but is that going to be applicable to genomes 'here and now'? Probably not. BUT if we sequence the full genome of a strain taken directly out of the environment, is it necessarily going to be 'general' enough to be of use to other investigators? If we are only interested in one aspect of differential expression, do we still go after the whole genome or whole transcriptome, when perhaps a chip or a targeted transcript approach would answer our hypothesis just as well? You also have to think about the future of your research: do you 'go for it' with NGS in the hope of generating a useful dataset that can be utilized for years to come, even though it's more expensive? Or do you stay small and cost effective and target your study?

How do we design for the optimal bang for our buck, in terms of information gained and sanity retained?

I think the general consensus was that everyone loves a reference; quite frankly, it just makes life easier for us bioinformaticists. Should we consider it 'the One'? Probably not, when you think about the 'world' of our organism(s) and their relationship with the environment and other organisms. But for our research, our time, our place...it kind of is; you have to start somewhere. And sometimes you have to invest in that baseline, that overall view: if you have the funds, you'll get to use a couple of Illumina lanes (or other platform of choice) to obtain a nice genome, run RNA-seq as well, and combine the results to put some annotation on top of that genome. Wouldn't that be lovely...

www.tumblr.com/tagged/bioinformatics

"One *ome to find them..."

I find it interesting how science teeters back and forth between discovery-based and hypothesis-based research. In high school, college and often in grad school, we have it drilled into us 'not to waste funds' and that everything MUST have a concrete, answerable hypothesis. In military research the first question is 'What is your hypothesis?!' Don't expect to get funding unless you can come up with a clean, concise hypothesis that they think you can actually answer. Other funding sources are a little more flexible: objectives and goals replace the concrete hypothesis, and discovery is given free rein to roam. Some research is all sequence data mining with no actual 'experiments', while in other research the sequencing outcomes depend heavily on the wet-lab design.

How do you go about finding your 'genome'/'transcriptome' and writing up interesting research when you have essentially nothing to go on, yet can't propose anything until you have something interesting to say? And then there are all the considerations that go into finding 'the one', because finding 'the one' depends on: platform choice, PCR, library construction, expression levels (if you are doing RNA), DNA extraction, time of year, time of day, whether another method is cheaper and better, and whether grants will fund you if you don't add the buzz of NGS.

So many things can confound your analysis and put your -ome out of reach: chimeras, low/no coverage, low DNA/RNA yields, contamination, lack of references or annotations, computational power, selection, ploidy...the list goes on. Want more reasons for your head to spin? Go back through the talks. There's simply a lot to consider when designing a sequencing experiment and finding your "-ome".

"One *ome to bring them all and in the darkness bind them..."

I'd like to take a moment to acknowledge those who program and construct software and algorithms for biology, and the few-and-far-between bioinformaticists. It's a mess, I know, one big mess in a big black box, and oftentimes when you start you are literally shooting in the dark with your parameters. You have to try a lot of things, make a lot of mistakes and run a lot of analyses. Bioinformatics is a dark room of discovery: right when you find the light switch and turn it on, you often only see the corner of the room where your tiny lamp sits...and it's fantastic. Then you realize how much more of the room you have to discover, with little or no path to guide you. It's both exciting and frustrating, and it's what science is all about.

So many times I have been asked by collaborators, students and technicians to 'can' an analysis for them.

Collaborator: "So I have 100 genomes of bug 1, 50 genomes of bug 2 and 120 genomes of bug 3 that we collected in different places at different times and sequenced, can you do an analysis for me?"

Me: "Ok, what are you looking for? What's the question or what are you interested in?"

Collaborator: "Ahhh, I dunno. We just have a bunch of data. Could you just, like, tree it out, maybe tell us anything interesting? Amino acids under selection, what drift is doing...it'd be cool to know what the population is doing..."

Me: "That's a master's thesis."

Collaborator: "Ya, so I have an abstract due in 10 days, can you get me the analysis in like 5 so I can get that written up?"

Me: "Uh...no."

One of the faculty at the workshop has a similar story: they were recently contacted about analyzing a few million or so reads. Collaborator: "Hey, can you just toss these into BLAST for me!? That'd be great....thanx." Bioinformatics Instructor: "Uh...no."

"Can open...worms...everywhere!" The benefits and pitfalls of 'canned' analysis:

Yes, there are some analyses that can be 'canned'. I am in the process of constructing such a can to help PIs or technicians who wish to do some molecular evolution analysis on their own get started and gain confidence. Our genomics workshop here consists of sets of canned analyses designed to show us what programs can do and what we may encounter during NGS analysis. But not everything can be canned. As much as we'd love a push-button solution to our scientific problems, it's just not there.

sed "s/water/beer/g" water.txt > drinkbeer.txt

(I love the coding semantics involved in drinking beer)
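For anyone who hasn't met `sed` yet, here is a tiny sketch of what that one-liner actually does. The contents of `water.txt` are made up purely for illustration, as is the extra `onebeer.txt` file. In `s/water/beer/g`, the `s` means substitute, and the trailing `g` replaces every match on a line rather than only the first one.

```shell
#!/bin/sh
# Hypothetical input file, just so there is something to pour from
printf 'water and more water\nwater, water everywhere\n' > water.txt

# s/old/new/g: replace every "water" on each line with "beer"
sed "s/water/beer/g" water.txt > drinkbeer.txt

# Without the trailing g, only the first "water" on each line is replaced
sed "s/water/beer/" water.txt > onebeer.txt

cat drinkbeer.txt
cat onebeer.txt
```

Running it prints both versions, so you can see exactly what the `g` flag buys you (more beer).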

But another topic that came up was the trade-off, and possible danger, of standardization. In the military there is a standard operating procedure (SOP) for EVERYTHING! I swear I passed one on my way into the bathroom specifically related to using the bathroom facilities. Everything has to be standardized to ensure consistency and reproducibility. Ahhh, reproducibility, another dark corner of our room. How do we deal with it? You can't necessarily 'standardize' NGS analysis, because different projects have different standards, platforms, samples, goals and thresholds, each appropriate to its own study. Perhaps it's just a matter of better annotation and communication on the part of researchers. So perhaps you don't have to standardize, but please, for the love of God, tell us what platform and parameters you used. That still won't ensure our datasets are comparable, but it's a start.

I think that over time an appreciation, or if not that, at least an awareness, will arise that there are people behind that black box of analysis, people tapping an IV of coffee to get you your report, who would desperately rather just teach you and give you the tools so you can do it yourself. But we realize the field is scary and unpredictable, and I think it falls on those of us who are trained, or becoming trained, to find ways to make our field more accessible and to encourage PIs, graduate students and postdocs to 'fear not' and manhandle their own million-read datasets. I hope you are getting some of that from this blog.

I am fortunate that my boss actually respects the time it takes to do what I do. I still get the odd crazy deadline where I am hopped up on Red Bull, twitching and typing like a madwoman to construct pretty graphics of analysis output for reports and such, but for the most part it's not so bad, and when it is, a lot of that is the pressure I put on myself.

This is what I do for a living and I still get daunted...it's normal!

http://www.cse.chalmers.se/~molokov/seminar.html

Remember, none of us started out knowing what we were doing. It took trial and error, good mentorship, a willingness to learn...and a lot of reading, then some more trial and error. Perhaps a pot of coffee or eight. A few head bangs against the computer, perhaps a few cuss words...but that's Science in general; it's not exclusive to our field.

We are all in it together, research doesn't happen alone. Work with others, collaborate, talk, ask questions...no question is dumb--ok maybe some questions are kind of dumb BUT we will never say that :). I've asked plenty of dumb questions...still do.

And with that our first week comes to an end...well sort of, we still have two labs tomorrow prior to the brewery tour and the beer drinking that will commence after.

I'll be here, still blogging...

Cheers.

http://www.tumblr.com/tagged/human%20genome

Friday, January 11, 2013
