New Ventures: Babies, Stats and Python

When I created this blog, I set out to have something online where I could share what I was learning, and I didn't feel the need to post unless I thought I had something worthwhile to offer my audience. That's why you are seeing a new post 8 mos after the previous one, and why the previous one was a newsletter of what I find awesome in the field rather than a written blog post.

Since my last post a lot has changed and I am back after a very long hiatus to have a child, change jobs, move states and attempt to figure out the trajectory of my career from this point on. By way of updates:
  My daughter is almost 11 mos old, awesome, adorable and sleeps through the night - which is great for my sanity.
  My family has moved to accommodate a job my husband got.
  I still work remotely for the research institute in Maryland doing data analysis and consulting. It's really a great set up for me right now.
  I am dipping my toes into the academic job market, seeing what's out there and developing my philosophies on research and teaching.
So where do I go from here?

I was looking forward to this 'sort of' break because I knew it would give me a chance to explore my field and how I want my career to play out. If you aren't constantly learning you are falling behind.
"Anyone who stops learning is old, whether at twenty or eighty. Anyone who keeps learning stays young" - Henry Ford
I am determined not to let the gray hairs that are starting to tease me in the mirror get the better of me.

So what does that mean?

I had two main professional goals - to get better at programming (in Python) and to refresh my abilities with stats, because I think the last time I seriously did any stats was at Montana State University in 200*-does it matter?! My knowledge is outdated. I also wanted to learn an open source stats program (I've used SPSS and SAS in the past).

Goal 1 - Python programming. I'm not too proud to admit I attempted a formal course around 2013 via edX administered through MIT...and I failed in fantastic form. I didn't fail the course, I ended up dropping it. At the time (i) I underestimated the challenge level of the course. I liked the challenge, but with challenge comes the need to 'meet the challenge', which translates into time! (ii) Time was something I did not have. I was in the middle of building a viral genetics and bioinformatics section at my institute in Maryland, which was daunting to say the least, especially in light of a million regulations. I made it halfway through the course and had to drop it, as the demands of the job allowed me little time after work to do anything except go home, maybe eat, zone out for a half hour, then pass out - I'd work runs into the mix when I could. So I conceded defeat, determined to conquer the course at a later date, because up until that point I was actually really loving the course. It was (and still is) a well designed online course. I did my epidemiology and biostats certificate completely online, and I learn well through an online format. So - now that I find myself with some free time, I have re-enrolled in an attempt to actually conquer the course.

So I'm out to start improving my programming from the basics up. If you cannot embrace and learn from defeat you have no business heralding your successes.


Goal 2 - Stats. I actually really like stats, despite the heavy math component that can get scary when visions of my high school AP calculus homework start invading my dreams and my stats textbooks. Am I familiar with stats? Yes. Am I a 'pro' with stats? No, but I used to be and I'd like to be again. I'd also like to update my stats programming from the ever so expensive SAS and SPSS to the open source (i.e. awesome as hell) R. You can do some beautiful stuff in R.

To this end I could've decided to just do another online course - meh. I think it much more personal and awesome to instead try out a friend/colleague's textbook, published in 2014. I'm really excited about it, and that's what you'll be hearing about in the coming blogs on this site - a walk through:

Foundational and Applied Statistics for Biologists Using R by Ken A Aho

I think I'm the perfect candidate to do a self study on this book and see how it goes. I have stats background and to be frank I really appreciated a sentence in his preface:
"Statistical texts and classes within biology curricula generally ignore or fail to instill foundational concepts...Unfortunately, this problem has been exacerbated by advances in statistical software. These tools do not require any knowledge of foundational principles. However, a poor understanding of the theory underlying the algorithms often leads to misapplication of analyses, misunderstanding of results, and invalid inferences."
How often have I 'preached' about using bioinformatics tools without knowledge to back up why you are using the tool you are using and are you using the best tool (especially in phylogenetics)? Do you even know what a model is? Do you know why you are using that model? You are using Geneious to do WHAT?!! Etc.

Bioinformatics is not about pressing buttons and black boxes. It is not about getting results and taking them with no grains of salt. If you are a researcher, I would hope you want to know the fundamentals of a tool: what it is doing, what it can tell you when you run your data through it, and the caveats associated with it.

I am guilty of pressing buttons in stats, especially of late - time to back up and practice more of what I preach.

Disclaimer: This will not be a fast process. Clocking in at 554 pages, Ken's book is long. But I am looking forward to it. I cannot guarantee every-day postings but I'll get through it! Tune into this blog if you are interested in the journey.

What about the newsletters?

Right - so I spent 3 years running newsletters around my institute via email then started posting some on this blog. I got generally favorable feedback. My newsletters will continue to appear on and I encourage you to head there. Though given my goals above I will not be updating them as often as I did before.


Also - is a pay-for service if you want multiple newsletters so I may reduce in the future as readership has been light for over a year now and the service isn't exactly cheap. Just a heads up.

I think that about sums it up.

One final note before I leave you - for those that are in a continual pursuit to improve their abilities in the world of bioinformatics. There are some great reads out there - and here's where I shamelessly plug my bioinformatics justice league - leaders in the field, they will make you think and they will hold you accountable. It makes us all better scientists.

For those that read this blog - thanks for your readership, I do hope it's helpful or interesting or both! Next up - A Walk Through: Foundational and Applied Statistics for Biologists Using R (FASBuR) - and it's a cool acronym too!

Friday, January 22, 2016

Bioinformatics Newsletter 1: 22 Jan 2016

East Coasters – Stay Warm! The blizzard LOOMS! Find some of your favorite scientists’ publications and cozy up with a hot beverage, I know I will!
Latest and greatest in the Lit
Escobar-Zepeda et al., 2015. The road to metagenomics: from microbiology to DNA sequencing technologies and bioinformatics. Front in Genet
Brierley et al., 2016. Quantifying global drivers of zoonotic bat viruses: a process-based perspective. Amer Naturalist.
Borucki et al., 2016. Middle east respiratory syndrome coronavirus intra-host populations are characterized by numerous high frequency variants. PLOS ONE.
Didelot et al., 2016. Within-host evolution of bacterial pathogens. Nature Reviews
Metcalf et al., 2016. Microbial community assembly and metabolic function during mammalian corpse decomposition. Science.
Gill et al., 2016. Understanding past population dynamics: Bayesian coalescent-based modeling with covariates. arXiv
Ziegenhain et al., 2016. Comparative analysis of single-cell RNA-sequencing methods. bioRxiv
Parekh et al., 2016. The impact of amplification on differential expression analyses by RNA-seq. bioRxiv
Kim et al., 2016. Host specific and segment-specific evolutionary dynamics of avian and human influenza A viruses: A systematic review. PLOS ONE.
Scientific Internet Chatter
“CDC Zika press conference: Audio and Transcript” (Avian Flu Diary)

“Will Zika become the 2016 NTD of the Year?” (PLOS Blogs)

“Almost everything you wanted to know about Illumina HiSeq 4000” (Core Genomics)

“Databases in Bioinformatics” (YouTube) Hours 1 and 2

“BIOM23: a 16S Practical” (Loman Lab)

“Paper summary: fast and accurate single-cell RNA-seq analysis using equivalence class counts” (NextGenSeek)

Software Notes:

#14 Velvet is so named because Daniel Zerbino wore velvet gloves when coding it (via Nick Loman)

Friday, December 11, 2015

Dusting off the blog site...Newsletter time

Greetings and Salutations oh Minions of Bioinformatics...

So in keeping with the premise of this blog as you'll notice - I haven't exactly gone anywhere or done anything meriting a hash on this site. Well that's not totally true - I did once again give my Phylogenetics/Sequence Analysis workshop in January 2015 - this time in Peru and I am shamefully delinquent on posting the materials for that but it's on the list.

My excuse? Well, I had just found out I was pregnant. That being said, I spent the following 9 mos alternately excited, busy attempting to thwart morning sickness and keep up with work, and fretting about 10 pound baby head nightmares. Then I was on maternity leave. I just recently returned from leave and have a healthy gorgeous baby girl who lights me up no matter how freaking tired I am...and I am tired.

Thus! I have attended no conferences although my work has appeared at conferences with others presenting. Nor have I attended any workshops. I did have two papers accepted (Huzzah!) and 2 more that'll go into submission before the end of the year - so it hasn't been a total intellectual loss.

As part of my return I emailed my Bioinformatics Enthusiasts listserv with the monthly offering in the form of a Bioinformatics Newsletter that I've been putting together and sharing for the past 3 years. Why haven't I posted it here? Because for the past 3 years the institution I work for has blocked this site as a 'blog' site. I don't know why they suddenly unblocked it but I'll take it! And so here I will post my monthly one-hit-wonder for those that are interested in what I find interesting in the world of virology and bioinformatics.

Many thanks to my utterly fabulous colleague Nick Loman for encouraging me to start posting these online - here's the shameless plug to his site: Go Here I am Nick Loman's Awesome Lab Site. If I were on twitter right now I'd be hash-tagging that sentence with #scigroupie

Many thanks to my tens of followers of this blog, I hope you find its content useful and now in the stead of all the conferences and workshops I am not currently attending - 
I shall post my newsletter instead.

Happy Holidays!
Online Short Course:
Pathogen evolution, selection and immunity
Latest and greatest in the Lit
Darriba, Flouri and Stamatakis. 2015. The state of software in evolutionary biology. bioRxiv. 
Khang and Lau. 2015. Getting the most out of RNA-seq data analysis. PeerJ.  
Wang et al., 2015. Reemergence of autochthonous transmission of dengue virus, Eastern China, 2014. EID
Duong et al., 2015. Asymptomatic humans transmit dengue virus to mosquitoes. PNAS.
Jennifer Doody. 2015. Ebola outbreak: a system that failed. Harvard Gazette.
Lau et al., 2015. A systematic Bayesian integration of epidemiological and genetic data. PLoS Comp Biol
Bowers et al., 2015. Impact of library preparation protocols on the metagenomic reconstruction of a mock microbial community. BMC Genomics.
Worby, Lipsitch and Hanage. 2015. Shared genomic variants: identification of transmission routes using pathogen deep sequencing data. bioRxiv
Scientific Internet Chatter
Tutorial: RNA-seq differential expression & pathway analysis with Sailfish, DESeq2, GAGE and Pathview. (Getting Genetics Done)

A computational pipeline for cross-species analysis of RNA-seq data using R and Bioconductor. (RNAseq Blog)

How to extract FASTQ from the new MinION FAST5 format using poRe. (Opiniomics)

“Do demons dream of phylogeny packages?” (Omics Omics)

“The five habits of bad bioinformaticians” (Opiniomics)

“DENGVAXIA®, world's first dengue vaccine, approved in Mexico” (Sanofi Pasteur)

“MinION and time to result” (Omics Omics)

“Oxford Nanopore’s Software Side.” (BioIT World)

Software Notes:

The HGAP assembler is actually an elaborate front-end hiding three thousand slave labourers all running GAP4!
You can also find a totally exhaustive listing of my internet bioinformatic trolling spoils at where I link to research in general virology, dengue, influenza, bioinformatic software that's published or coming out as well as post job openings and trainings I come across so feel free to hit that up if you desire more bioinformatic madness.

Monday, March 10, 2014

Want more Training in Bioinformatics?

More Training Opportunities, courtesy of Dr. Stephen Turner...

Dr. Stephen Turner runs the Getting Genetics Done Blog which has a lot of great posts regarding bioinformatics; general news, software developments and training. It's linked also over to the left as one of the blogs I follow.

He has compiled an extensive listing of Bioinformatics Workshops and Trainings which includes training programs, MOOCs (online courses/modules), workshops, short courses, literature, recommended reading and useful blogs.

It's worth a look.


