Dr. Bill Cresko
University of Oregon
Sensei of the Unix Ninja
Topic: Genomic analysis of non-model organisms, RAD-tags and STACKS
Greetings!
So today we launch into how to analyze critters that lack a reference genome or have relatively few annotations associated with their genome: the proverbial black box we bang our heads against.
What is a non-model versus model organism anyway?
Model organisms have 'an entire community trying to dissect one species usually in an effort to understand humans better.' The mouse is a model organism; so is E. coli K-12.
Non-model organisms are studied by relatively few researchers, usually lack good reference genomes, and have no direct connection to research on humans; that is, studying them doesn't necessarily further our knowledge of processes in humans.
However, despite not having the breadth of knowledge that model organisms have (the literal arsenal of annotations), we still have the same questions about non-model organisms as we do about model organisms!
- How do major differences among lineages evolve?
- What is the relatedness between organisms?
- What is adaptation like in these organisms?
There are fundamental processes in evolution:
- Origin of genetic variation: via mutation and migration
- Sorting of that variation: variation is shaped by genetic drift and natural selection, for instance. For those drawing a blank on 'genetic drift'...it is the change in allele frequency from one generation to the next caused by random chance. For those drawing a blank on 'allele'...an allele is a specific 'version' of a gene. One gene can come in more than one 'flavor', and an allele is a particular flavor of that gene as determined by its sequence. For those drawing a blank on 'gene'...click over to the Nyan Cat YouTube video, be entertained, and don't come back to this blog until you know what a gene is.
- Simultaneous genotyping of neutral and adaptive loci (for population genomics): neutral loci provide a genome-wide background that yields estimates of effective population size and can be used for phylogeography. Adaptive loci are outliers from that neutral background and can give insight into selective sweeps or local adaptation.
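To make drift, selection, and the 'neutral background' a bit more concrete, here is a toy Python sketch under heavy simplifying assumptions. It uses a basic Wright-Fisher model (one biallelic locus at a time, constant population size, haploid-style selection), Nei's G_ST as the differentiation measure, and a crude mean + k standard deviations cutoff for outliers rather than a formal outlier test. All function names and parameter values are hypothetical and chosen for illustration; this is not how STACKS or any particular population genomics tool works.

```python
import random
from statistics import mean, stdev


def next_generation(p, pop_size, rng, w_a=1.0, w_b=1.0):
    """Advance the frequency p of allele A by one Wright-Fisher generation.

    Assumption: selection acts first (relative fitness w_a for A, w_b for B),
    then drift, i.e. 2N gene copies are drawn at random from the new frequency.
    """
    p = p * w_a / (p * w_a + (1 - p) * w_b)
    copies = 2 * pop_size
    count = sum(1 for _ in range(copies) if rng.random() < p)
    return count / copies


def fst(p1, p2):
    """Nei's G_ST for one biallelic locus in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # heterozygosity of the pooled population
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def simulate(n_neutral=100, pop_size=250, generations=20, seed=1):
    """Split an ancestral population (p = 0.5 at every locus) into two.

    Loci 0..n_neutral-1 change by drift alone; the last locus is under
    divergent selection (A favoured in population 1, B in population 2),
    a toy stand-in for local adaptation. Returns per-locus F_ST values.
    """
    rng = random.Random(seed)
    values = []
    for locus in range(n_neutral + 1):
        selected = locus == n_neutral
        p1 = p2 = 0.5
        for _ in range(generations):
            if selected:
                p1 = next_generation(p1, pop_size, rng, w_a=2.0, w_b=1.0)
                p2 = next_generation(p2, pop_size, rng, w_a=1.0, w_b=2.0)
            else:
                p1 = next_generation(p1, pop_size, rng)
                p2 = next_generation(p2, pop_size, rng)
        values.append(fst(p1, p2))
    return values


def flag_outliers(values, k=4.0):
    """Crude heuristic: indices of loci whose F_ST exceeds mean + k * stdev."""
    cutoff = mean(values) + k * stdev(values)
    return [i for i, v in enumerate(values) if v > cutoff]


if __name__ == "__main__":
    values = simulate()
    print("outlier loci:", flag_outliers(values))
```

With these settings the drift-only loci drift apart only slightly and form the neutral background, while the divergently selected locus (index 100) should end up near F_ST = 1 and stand far outside it. Real outlier scans model the neutral distribution much more carefully than a single standard-deviation cutoff.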
So the naive solution when approaching a project with a non-model organism is to just 'sequence everything'. Why is that naive? Because sequencing whole genomes is still quite expensive and, for many studies, pretty much a waste.
Genomes are generally organized as linkage blocks, so as long as you have well-spaced markers, they will work just as well for genotyping at a fraction of the cost. Genetic maps are very useful in genome studies and are often a great first step, guiding you on whether full genome sequencing really needs to be done to answer your scientific question.
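Why do well-spaced markers suffice? In a mapping cross, loci that sit close together on a chromosome rarely get separated by recombination, so a marker's genotype also tells you about its neighbors. A minimal sketch of that idea, assuming Haldane's map function (which treats crossovers as independent, i.e. no interference) and evenly spaced markers; the function names are hypothetical:

```python
import math


def haldane_r(distance_cm):
    """Recombination fraction between two loci distance_cm centiMorgans apart.

    Haldane's map function (assumes no crossover interference):
    r = (1 - exp(-2d)) / 2, with d in Morgans.
    """
    d = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def worst_case_linkage(marker_spacing_cm):
    """Probability that a locus stays with its nearest marker through one meiosis.

    Assumes evenly spaced markers, so the worst-placed locus sits midway
    between two markers, half a spacing from the nearest one.
    """
    return 1.0 - haldane_r(marker_spacing_cm / 2.0)


if __name__ == "__main__":
    for spacing in (1, 2, 5, 10, 20):
        print(f"marker every {spacing:>2} cM: at least "
              f"{worst_case_linkage(spacing):.3f} chance of co-inheritance")
```

The recombination fraction never exceeds 0.5 however far apart loci are, and it stays close to zero at short distances, which is the sense in which a sparse but well-spaced set of markers can stand in for the whole genome.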
So the Cresko lab is heavily involved in an alternative approach that exploits these linkage blocks in the genome. It bypasses the need for a full genome sequence and still provides tons of data about the genome, in addition to building a genetic map and genotyping individuals in ways that can guide further studies and hypotheses.
The technique is called RAD-tags (restriction-site associated DNA tags), the program/pipeline is called STACKS, and the 'grand architect' is our resident Unix Ninja, Julian Catchen.
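The core idea behind RAD-tags is that you only sequence the DNA flanking each occurrence of a restriction enzyme's recognition site, which gives a reduced but genome-wide set of markers. Here is a rough in-silico sketch, assuming SbfI (recognition site CCTGCAGG) purely as an example enzyme. The site is palindromic, so scanning one strand finds every site. The function names and the 100 bp tag length are hypothetical. The tags are taken just outside the full recognition sequence rather than from the exact cut point, and the left tag is returned as it reads on the forward strand. This is a simplification of the lab protocol and is not part of STACKS.

```python
SBFI_SITE = "CCTGCAGG"  # SbfI recognition sequence; an example enzyme only


def find_sites(seq, site=SBFI_SITE):
    """0-based start positions of every occurrence of site in seq."""
    positions = []
    pos = seq.find(site)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(site, pos + 1)
    return positions


def rad_tags(seq, site=SBFI_SITE, tag_len=100):
    """Collect tag_len bases on each side of every site (forward strand).

    Simplification: flanks are taken just outside the recognition sequence,
    and flanks that would run off either end of seq are skipped.
    """
    tags = []
    for pos in find_sites(seq, site):
        left_start = pos - tag_len
        right_start = pos + len(site)
        if left_start >= 0:
            tags.append(seq[left_start:pos])
        if right_start + tag_len <= len(seq):
            tags.append(seq[right_start:right_start + tag_len])
    return tags


def expected_sites(genome_len, site_len):
    """Expected site count in random sequence with equal base frequencies."""
    return genome_len / 4 ** site_len


if __name__ == "__main__":
    toy = "A" * 30 + SBFI_SITE + "T" * 30 + SBFI_SITE + "G" * 10
    print(find_sites(toy), rad_tags(toy, tag_len=5))
    print(f"~{expected_sites(10**9, 8):,.0f} sites expected in a 1 Gb random genome")
```

For a hypothetical 1 Gb genome, an 8 bp site is expected about once every 4^8 = 65,536 bp, or roughly 15,000 sites. Reading 100 bp on each side of each site covers about 3 Mb, well under 1% of the genome. Real genomes are not random sequence, so actual site counts for a given enzyme can differ substantially from this estimate.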