Saturday, January 5, 2013

Blog Series: Workshop on Genomics, Cesky Krumlov; Preparation--Programming

Dobrý večer! from Cesky Krumlov, Czech Republic! Ok that's enough Czech from me...

Section 8: Programming

So BioPerl and BioPython (I'm going to plug BioPython in here too, and see below for PyCogent) are basically what they sound like: Perl code (scripts or modules, however is easier for you to think about it) and Python code, Perl and Python being programming languages, geared toward applications in biological analysis.

Now my husband is a programmer turned bioinformatic programmer and his best advice is to jump right in and just keep using it. My main concern with that...aside from my inability to manifest a 36 hour day that would allow me to take on learning a computer language...is I don't use it every day in my job. This makes it difficult, to say the least, to retain all the commands in your head and even then--when in doubt 'Google!' The links above (the BioPerl one is suggested in the preparation materials for the workshop; the BioPython one is my plug) will give you an idea of the modules/programs that have been constructed using those languages and provide more links if you want to get further into the mire and meld programming with biological analysis. You will have to learn some basics of the language itself before jumping into the biological application of it, just for functionality's sake.

Now that's all I'm going to say for the moment, but we'll come back to this. My husband was trained as a computer scientist, recently got thrown into the world of biology head first and is now programming (using Python) for bioinformatic analysis. He has a Python tutorial and some advice to dispense to all of you who aspire to move in that direction...but he has to write it up.

In the meantime, one thing you absolutely need to get comfortable with is the ominous black box called the command line. You simply have to learn how to navigate around your computer in the command-line interface. The conference organizers have provided a helpful tutorial so we are going to go through that and I'll add as we go based on my own trial and error experiences.



So the tutorial offers up some commands that you should become familiar with--write them down in a notebook DESIGNATED for remembering command-line stuff as a cheat sheet. My favorite sentence before they jump in is: "Becoming familiar with these key combinations will greatly enhance your ability to become a command-line ninja". Perhaps it's just the fact I like the prospect of being 'ninja' at something.

  • First open a terminal.
  1. Term--wah? If you are on a Mac, you can navigate to /Applications/Utilities/Terminal or, if you have quick search, type Terminal into it and it should pop up--double click.
  2. If you are on Linux, it kind of depends on your environment/system (Ubuntu versus Unity versus Gnome versus you name it) but essentially you are going to go to your Dash or Menu, then Applications. If it doesn't pop up, see if there is a 'More Apps' or 'System' or 'Accessories' and look in all those places (it should be in one of them). If it's still not there, my last advice would be what I found on this site: try pressing Ctrl + Alt + T, which is supposedly the shortcut key combo.
  3. You should see something like this for linux and mac:

Let's learn some commands. Every time you see something bold, type it then hit enter.

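1. type pwd <enter>. This prints the directory you are currently in; pwd stands for 'print working directory'.
  • for example if it shows /home/username/Desktop (on a Mac, /Users/username/Desktop) that means you are 'on the Desktop'. Your prompt may abbreviate this as ~/Desktop.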
  • or it may show /home/username/ (for me that'd be /home/mel/)
2. type ls this will give you a list of everything that is in the directory you are in
  • so if you have a folder on your desktop that is called 'Hello', when you type ls you should see Hello listed.
3. type cd Hello, cd stands for 'change directory'
  • This assumes you are on the Desktop (if not type cd ~/Desktop/Hello)
  • ~ is shorthand for your home directory, so cd ~ is basically the same as typing cd /home/username/ (again, for me that'd be /home/mel for example). So the full path to the Hello folder would be: /home/mel/Desktop/Hello.
  • This assumes you have a folder on your Desktop called Hello (caps sensitive).
  • Now type pwd
  • It should now say you are in the Hello folder: /home/mel/Desktop/Hello (with your own username in place of mel)
  • Alright we don't want to be in this folder anymore so we want to go back to the Desktop. Type cd .. this will take you one directory above where you were.
  • If you have no idea where you are on your computer and just want to get back to the Desktop and try anew: type cd ~/Desktop
4. Let's do some navigating, I'm going to pull from the evomics tutorial now:
  • Type ls
  • Type ls -lrt
  • Type cd /usr/local/bin
  • Type ls -l
  • Type cd ~/Desktop
  • And now you've forgotten that you still want to be in the /usr/local/bin directory and not on your Desktop...doh! Why are you on your Desktop? Because the last command you just typed was cd ~/Desktop which takes you to the Desktop...well rubbish.
  • Instead of re-typing cd /usr/local/bin (because we are uber-lazy and make lots of typing errors on a regular basis) hit the up arrow until that command reappears (three presses here: past cd ~/Desktop and ls -l), then hit enter. You should see your typing 'history' going backwards. Using the up and down arrows is a quick and easy way to reuse commands you've already typed without having to retype them. Perhaps in this example it's not so life altering but consider this:
  • I type: chmod 745 /home/yaya/Alpha134/beta6/hotsauce51/AREAx/mel <enter> then I realize I should've typed 755 not 745--doh! Now using the up and down arrows becomes super useful because do I really want to type all those characters again!?
5. More navigating
  • Let's revisit: chmod 745 /home/yaya/Alpha134/beta6/hotsauce51/AREAx/mel
  • That's a pretty long line, what if I only wanted to change part of it? Like at the beginning near 'chmod' or the end in the filenames?
  • If you have 'Home' and 'End' keys you can use these. If not, like me, use Ctrl + A to go to the beginning of the line (Home) or Ctrl + E to go to the end of the line (End).
6. Make directories
  • If you aren't back on your Desktop, type cd ~/Desktop
  • Type mkdir directory1 you've now made a directory called 'directory1'
  • Type ls
  • You should see directory1 in there.
  • Type cd directory1
  • Type mkdir directory2 this now makes a directory inside directory1 called 'directory2'.
  • To get back to home type cd or cd ~
7. I now hate directory2 and have decided to remove it.
  • You know directory2 is in directory1. So navigate to directory1 via cd ~/Desktop/directory1
  • Type ls
  • You should see directory2
  • Type rm -rf directory2; I type -rf to make it recursive and forced. Before you do this make sure you want to remove it AND ALL files within it.
  • type ls
  • directory2 should now be gone.
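
For anyone already leaning toward Python, here is a minimal sketch of the same pwd / ls / cd / mkdir / rm -rf steps using Python's standard library. It works inside a throwaway temporary folder (my assumption, so nothing on your real Desktop gets touched), and the folder names just mirror the ones above. Treat it as a rough equivalent, not a replacement for practicing in the terminal.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Work in a throwaway sandbox folder (an assumption for safety; the
# walkthrough above uses ~/Desktop instead).
sandbox = Path(tempfile.mkdtemp())
start = Path.cwd()                  # remember where we started

os.chdir(sandbox)                   # cd <sandbox>
print(Path.cwd())                   # pwd

Path("directory1").mkdir()          # mkdir directory1
print(os.listdir("."))              # ls

os.chdir("directory1")              # cd directory1
Path("directory2").mkdir()          # mkdir directory2
after_mkdir = os.listdir(".")       # ls
print(after_mkdir)

shutil.rmtree("directory2")         # roughly rm -rf directory2
after_rm = os.listdir(".")          # ls
print(after_rm)

os.chdir("..")                      # cd ..
os.chdir(start)                     # go back to where we began
shutil.rmtree(sandbox)              # clean up the sandbox
```

Just like rm -rf, shutil.rmtree deletes the folder and everything inside it without asking, so the same warning applies.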
OK...that was a quick and dirty look at moving around command line. The tutorial on the evomics site goes into more detail in addition to explaining the following commands and how to use them with example files they provide--here's the short list.
  1. The head [filename] command will display the first few lines (10 by default) of the file you name in place of the brackets. If you want to display a certain number of lines in that file you can add a dash and number, e.g. head -10 [filename] (also written head -n 10 [filename]).
  2. To copy: cp [oldfilename] [newfilename]
  3. To stick two files together into one big file (called concatenation): cat [file1] [file2] > [file3]
  4. To move a file: mv [file1] [file2] (which is essentially renaming it) or mv [location1/file1] [location2/file1] if you want to move it to a different directory.
  5. Deleting a directory and all the files in it. rm -rf [directory name]
  6. To count the number of lines a document has: wc -l [filename]
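
If it helps to see what these commands are doing, here is a hedged Python sketch of rough equivalents for head, cp, cat, mv and wc -l. The file names and their contents are made up for illustration, and everything happens in a temporary folder.

```python
import shutil
import tempfile
from itertools import islice
from pathlib import Path

# Scratch folder and example files (hypothetical names and contents).
workdir = Path(tempfile.mkdtemp())
file1 = workdir / "file1.txt"
file2 = workdir / "file2.txt"
file1.write_text("a\nb\nc\n")
file2.write_text("d\ne\n")

# head -2 file1.txt : read only the first two lines
with file1.open() as handle:
    first_two = list(islice(handle, 2))

# cp file1.txt copy.txt
copy = workdir / "copy.txt"
shutil.copy(file1, copy)

# cat file1.txt file2.txt > file3.txt
file3 = workdir / "file3.txt"
file3.write_text(file1.read_text() + file2.read_text())

# mv copy.txt renamed.txt
renamed = workdir / "renamed.txt"
shutil.move(str(copy), str(renamed))

# wc -l file3.txt : count the lines
with file3.open() as handle:
    line_count = sum(1 for _ in handle)

print(first_two, line_count)
```

Reading whole files with read_text() is fine for small examples like this; for big sequence files you would read line by line instead.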
Other commands that you should refer to the tutorial for and ought to know:
  • grep
  • split
  • cut
  • paste
  • sort
  • uniq
  • how to combine multiple commands in one line
  • making files executable and changing other file permissions
  • basic shell scripting
  • modifying text files
  • connecting to a remote machine
  • copying files between computers
  • how to convert DOS files from Windows to UNIX files that you can actually use.
  • compressing and uncompressing (zipping and unzipping) files.
If there is a command and you have no idea what it does, type man [command]. The 'manual' page for that command will pop up and you can read about it. To quit out of the manual, type q
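
On the file permissions item, and the chmod 745 versus 755 slip from earlier: each of the three digits covers one group (owner, group, everyone else) and is a sum of read = 4, write = 2, execute = 1. A small Python sketch, using the standard stat module, shows what each number means in the letters that ls -l displays. The helper name describe is mine, purely for illustration.

```python
import stat

def describe(octal_mode):
    """Return the ls -l style permission string for a regular file."""
    # S_IFREG marks a regular file so the string starts with '-'.
    return stat.filemode(stat.S_IFREG | octal_mode)

typed_by_mistake = describe(0o745)
meant_to_type = describe(0o755)

print(typed_by_mistake)   # -rwxr--r-x
print(meant_to_type)      # -rwxr-xr-x
```

So with 745 the group can read the file but not execute it, while 755 lets the group execute it too.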

A little dense with details, no? That's what cheat sheets are for. The tutorial on the prep page is worth a read while typing on your computer so you become familiar with the layout and what the commands do. Play around with it. This way when a facilitator during a lecture or tutorial says the following:

Alrighty...I want everyone to navigate to [this BLAST output file], cut out columns 1, 3, and 6 to a new file. The first column is the organism match, let's grep Escherichia coli and see how many times it pops up.

You won't be the one in the corner rocking, twitching and talking to yourself in a hushed whisper.

For the love of your sanity...take a look and become familiar with this stuff, you are not expected to be a ninja yet but at least know what the terminology is and vaguely what it does.
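
To make the facilitator's request a bit less scary, here is a minimal Python sketch of the same idea: keep columns 1, 3 and 6 of a tab-separated file and count the rows whose first column mentions Escherichia coli. The file name, the rows and the meaning of columns 2 through 6 are invented for illustration; a real BLAST output file will look different, and in the workshop you would use cut and grep as the tutorial describes.

```python
import csv
import tempfile
from pathlib import Path

# Build a tiny, made-up tab-separated file to stand in for BLAST output.
workdir = Path(tempfile.mkdtemp())
blast_file = workdir / "blast_output.tsv"
blast_file.write_text(
    "Escherichia coli\tq1\t98.5\t200\t1e-50\t380\n"
    "Bacillus subtilis\tq2\t91.0\t180\t1e-30\t250\n"
    "Escherichia coli\tq3\t99.1\t210\t1e-60\t400\n"
)

subset_file = workdir / "columns_1_3_6.tsv"
ecoli_hits = 0

with blast_file.open(newline="") as src, subset_file.open("w", newline="") as dst:
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Like cut -f 1,3,6 (Python counts from 0, so indexes 0, 2, 5).
        writer.writerow([row[0], row[2], row[5]])
        # Like counting grep matches, but only looking at column 1.
        if "Escherichia coli" in row[0]:
            ecoli_hits += 1

print(ecoli_hits)
```

The one thing to watch for is the off-by-one: the command line counts columns from 1, Python counts from 0.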

Moving on...

What, you thought we were done??? Silly bio-, micro-, viro-logist...we're almost done.

Back to programming! Once you are familiar with looking at the black (or white) box terminal screen and your eyes have become accustomed to the soothing green (or black) type and your skin has paled somewhat due to staying indoors all day banging your head (I mean your fingers) against the keys...you are ready to consider jumping into programming.

Things like BioPerl, BioPython modules or programs like PyCogent are meant to facilitate and hopefully make our lives analyzing copious amounts of data easier.

PyCogent is a Python-based comparative genomics toolkit. It assists in analysis of sequences, implementing workflows and, probably most exciting for me, generating publishable graphics at the resolution needed. The direct link to the website for PyCogent has all the documentation you will need to start running the program including installation, guidelines, examples and a 'cookbook'. I've reprinted Table 1 from the article recommended on PyCogent because it shows the features of PyCogent as compared to other programs like HyPhy and Mesquite (which I've used), BioPython, ARB and CIPRES.


If any of this is up your alley in terms of things you'd like to be able to do, read the publication and jump into the website.

Finally...some advice from a programmer (Tyghe) to biologists who wish to get into programming or simply understand what programs are doing rather than chalk it all up to fairies and voodoo; which I like to do sometimes. I've grown up over the years though and slowly but surely am getting more adept at understanding what programming has done for bioinformatic analysis and getting less scared at implementing it more and more...and it's pretty amazing.

By introduction, Tyghe Vallard is a programmer...he's awesome but he, like every other true to form programmer, is an elitist when it comes to his programming language. Every programmer has a preference...a HUGE preference for one language over another. I think a lot of it is 'function' for what they are doing...Java and HTML for web stuff for instance. Not that you can't use Java or HTML for other things. Another reason I've heard is 'ease of reading' the language--how organized it looks when written. Two examples below:



Now I'm probably going to get hissed at...but they both look confusing to me, ha! But honestly I get the flow of python more, so that's what I've been attempting to learn. Also, it's what Tyghe programs in, so I have a handy dandy reference that I live with.

Basically use what gets the job done for you; there are endless options: Perl, Python, Ruby, C++, C-sharp, Java, Fortran...

Tyghe just recently found a book which he deems a holy grail for the biologist attempting to cross over into programming: Practical Computing for Biologists by Haddock and Dunn, which coincidentally also has its own webpage and Facebook page. I've given the book a skim via Amazon and we ordered it. I haven't had a chance to truly get into it and read it, but he swears by it. So give it a gander.

He has also posted a down and dirty tutorial for python on his blog...so I now POST A CHALLENGE!

  • Go to his blog, read it, try and do it...probably fail at it like I did the first time around. Mull it over, do it again. Then post questions to him on his blog. Anyone who can stump him for longer than a week I will personally buy you a beer or if we live too far apart, I will owe you. He's a hard guy to stump when it comes to programming.

Also some helpful websites for Python and Perl as well as 'where do you go and what do you do when you don't know what to do'--(that was a mouthful)--with respect to programming:

  1. The Python Standard Library this is great for searching commands or functions when you forget, like I often do.
  2. Alternatively you can use Google itself to search. If your search is Python oriented, type your search term followed by site:docs.python.org (no space after the colon).
  3. For aspiring Perl-ers, try the Perl doc programming website which also has a handy search box. The trick above (#2) works with this site as well: just type into Google your search term followed by site:perldoc.perl.org
  4. Stack Overflow is also a good site for searching common programming snafus.
  5. There is a group on YouTube (you know how I like to watch videos) called EducatorVids2, look on the right side bar you'll see a link to Computer Programming and Software Training Tools. Both are useful and full of tutorials on various programming languages.
  6. There's a shorter, more Mel (that's me) sized book at Amazon geared to Python for beginners...Python for Kids: A Playful Introduction to Programming by JR Briggs. Hey don't judge me, whatever gets the job done right? Plus, I never stopped enjoying being a child and the book is actually really great for beginners. Additionally, page 12 has a picture of an anteater holding a floppy disk. And if you can't find the joy or humor in a silly anteater holding a floppy disk...you are dead inside. No, I kid. But truly, check it out, I got it on Kindle.

Wow...this was a lengthy entry!

And with that, I leave you to it! Tomorrow I am taking the day off of blogging, it's Sunday--I shall do some exploring and I'm sure see some of you tomorrow night at the reception.

Kindest regards to all, I hope this preparation set of blogs has been useful in some way, shape or form. I hope you are as excited as I to jump right in and see what all this can do for you and your research endeavors!

See you Monday!